Skip to content

Conversation

@watson
Copy link
Collaborator

@watson watson commented Apr 8, 2025

What does this PR do?

Evaluate the log message template in the context of the line the probe is instrumenting.

Data structure

The message template for a log probe is provided with the rest of the probe over Remote Config, and is found in the two probe fields message and segments. Previously, we just used message as-is, which is the raw string that the user entered in the UI. However, the backend parses this string and turns it into an AST which is stored in the segments array. This PR switches to use this AST if any of the segments requires evaluation. If not, the raw message is still just used as-is. The message property and segments array has the following format:

{
  message: 'Value of foo: {foo}!',
  segments: [
    { str: 'Value of foo: ' }, // str: this segment should not be evaluated, but is just a string
    { dsl: '{foo}', json: { ...ast... } }, // dsl: raw string user entered, json: the parsed AST
    { str: '!' } // another string
  ]
}

The AST in the json property has the same format as the DI probe conditions already supported by this tracer, so the compiler used to compile probe conditions is reused to compile these segments in this PR.

Performance optimizations

If there's more than one probe on a specific file/line/column, all evaluations happen in a single Debugger.evaluateOnCallFrame call, as to minimise the number of roundtrips over the Chrome DevTools Protocol, and hence, minimise the time the inspected thread is paused.

Stringification method

The util.inspect method is used to stringify the results of the code evaluations (expect for strings which are just returned as is). The following options are passed to util.inspect to keep the returned string small, and to align as much as possible with the same implementation in the other tracers.

{
  depth: 0, // Only traverse the root level of objects
  customInspect: false, // Do not execute custom [util.inspect.custom](depth, opts, inspect) functions
  maxArrayLength: 3, // Only show the first 3 elements in arrays, sets, maps, etc
  maxStringLength: 8 * 1024, // Truncate strings longer than 8KiB
  breakLength: Infinity // Do not add linebreaks to the returned string
}

The total length of the full message is also truncated at 8KiB.

Motivation

Feature parity with the other tracers.

Additional Notes

The only two major differences between this implementation and the implementations in the other tracers are;

  • If any of the objects traversed by util.inspect contains the magic [Symbol.toStringTag] getter, it will get executed. This is technically a side-effect, which we don't like, but in this specific case, it has been considered tolerable, as anyone including such a function, will do so with the express intent of modifying the string representation of instances of a given class.
  • The RFC for how message templates are evaluated, states that any objects with fields, should have their field count capped at 5 in when stringified. This is not possible to do with util.inspect, so for now all fields are included in the string representation. This is an ok compromise for an MVP.

@watson watson self-assigned this Apr 8, 2025
@watson watson added semver-minor debugger Dynamic Instrumentation & Live Debugger labels Apr 8, 2025
Copy link
Collaborator Author

watson commented Apr 8, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link

github-actions bot commented Apr 8, 2025

Overall package size

Self size: 9.33 MB
Deduped: 102.57 MB
No deduping: 103.09 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.5.1 | 29.73 MB | 29.73 MB | | @datadog/native-appsec | 8.5.2 | 19.33 MB | 19.34 MB | | @datadog/native-iast-taint-tracking | 3.3.1 | 13.99 MB | 13.99 MB | | @datadog/pprof | 5.7.1 | 9.51 MB | 9.88 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.4.0 | 2.77 MB | 5.42 MB | | @datadog/wasm-js-rewriter | 4.0.0 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.13.1 | 117.64 kB | 839.26 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.2 | 53.63 kB | 53.63 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | dc-polyfill | 0.1.8 | 25.08 kB | 25.08 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.2 | 23.54 kB | 23.54 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@codecov
Copy link

codecov bot commented Apr 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.28%. Comparing base (4c9b1c2) to head (aa6c85b).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5548      +/-   ##
==========================================
- Coverage   79.29%   79.28%   -0.01%     
==========================================
  Files         513      513              
  Lines       23342    23342              
==========================================
- Hits        18508    18507       -1     
- Misses       4834     4835       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Apr 8, 2025

Datadog Report

Branch report: watson/DEBUG-2628/process-template
Commit report: a27bb12
Test service: dd-trace-js-integration-tests

✅ 0 Failed, 920 Passed, 0 Skipped, 8m 9.5s Total Time

@pr-commenter
Copy link

pr-commenter bot commented Apr 8, 2025

Benchmarks

Benchmark execution time: 2025-04-11 13:23:13

Comparing candidate commit aa6c85b in PR branch watson/DEBUG-2628/process-template with baseline commit 4c9b1c2 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 946 metrics, 16 unstable metrics.

scenario:startup-with-tracer-22

  • 🟩 execution_time [-19.311ms; -17.509ms] or [-7.401%; -6.710%]

Base automatically changed from watson/DEBUG-2918/multi-probes to master April 8, 2025 12:12
@watson watson force-pushed the watson/DEBUG-2628/process-template branch from 8a66ab4 to 6a1e418 Compare April 8, 2025 12:13
@watson watson marked this pull request as ready for review April 8, 2025 12:13
@watson watson requested review from a team as code owners April 8, 2025 12:13
tylfin
tylfin previously approved these changes Apr 8, 2025
Copy link

@shatzi shatzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome progress.

I suggest we fix the spec on how errors are handled before commit.

shatzi
shatzi previously approved these changes Apr 9, 2025
expression: probe.condition,
returnByValue: true
})
if (result.value !== true) continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a condition that fails with exception?

the spec should generate an evaluation errors every X minutes - so the user can fix those errors.
I think at the current state failed condition simply stop all probes from been trigger, is that right?

Copy link
Collaborator Author

@watson watson Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's an oversight from me when I created the condition logic. We should report any errors that might occur here. I'll do that in a follow up PR since it's nothing to do with log templates (I just moved this code around)

@watson watson force-pushed the watson/DEBUG-2628/process-template branch from da3c9d4 to 457f6fc Compare April 10, 2025 18:12
@watson watson requested a review from shatzi April 10, 2025 18:12
shatzi
shatzi previously approved these changes Apr 10, 2025
@watson watson force-pushed the watson/DEBUG-2628/process-template branch from b143b67 to e13fed4 Compare April 11, 2025 04:59
Copy link
Collaborator

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few suggestions for more performant code. The main aspect is likely the for await.

assert.strictEqual(diagnostics.exception.message, 'Cannot compile expression: original dsl')
done()
} else if (diagnostics.status === 'INSTALLED') {
assert.fail('Should not install when condition cannot be compiled')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess assert.fail() is used to break out of the loop and to not call done multiple times with an error?

I might be wrong but if this fails, I would expect the test to time out and the error to be an unhandled rejection. So instead, maybe this could use the async await pattern that I recently suggested? :)

Copy link
Collaborator Author

@watson watson Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.fail will immediately end the test without having to call done or waiting for a timeout. I added it here as an escape hatch in case somehow diagnostics.status was unexpectedly INSTALLED which should never happen in this test.

It has technically nothing to do with this PR, but while I was working on other tests during the development in this PR I noticed I could make this improvement (those other tests was a copy of this test, but has since been refactored, so that's why this single change remains in this PR)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the test runner does some magic in this case to detect it? Or is this code sync?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the code is async. Yeah, I assume it does so as well. I just tested it to make sure.

Comment on lines +115 to +119
const { result } = await session.post('Debugger.evaluateOnCallFrame', {
callFrameId: params.callFrames[0].callFrameId,
expression: probe.condition,
returnByValue: true
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a similar pattern that I recently implemented it in the collector. All following lines have to be executed in the then of the promise.

That is however only allowed in case the order of the probes and of the expressions does not matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll do that in a later PR as the condition evaluation has nothing to do with this PR. This diff just appeared in this PR because I moved the code a few lines down from where it was before and added the TODO's so that I could remember to deal with this later.

watson added 2 commits April 11, 2025 15:12
The log template is now evaluated in the context of the line the probe
is instrumenting.
@watson watson force-pushed the watson/DEBUG-2628/process-template branch from df0d279 to aa6c85b Compare April 11, 2025 13:12
Copy link
Collaborator

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@watson watson merged commit 32f2f27 into master Apr 11, 2025
432 checks passed
@watson watson deleted the watson/DEBUG-2628/process-template branch April 11, 2025 13:31
github-actions bot pushed a commit that referenced this pull request Apr 14, 2025
Evaluate the log message template in the context of the line the probe is
instrumenting.

If there's more than one probe on a specific file/line/column, all evaluations
happen in a single `Debugger.evaluateOnCallFrame` call, as to minimise the
number of roundtrips over the Chrome DevTools Protocol, and hence, minimise the
time the inspected thread is paused.

The `util.inspect` method is used to stringify the results of the code
evaluations (expect for strings which are just returned as is). Options are
passed to `util.inspect` to keep the returned string small, and to align as
much as possible with the same implementation in the other tracers.

The total length of the full message is also truncated at 8KiB.

The only two major differences between this implementation and the
implementations in the other tracers are:

- If any of the objects traversed by `util.inspect` contains the magic
`[Symbol.toStringTag]` getter, it will get executed. This is technically a
side-effect, which we don't like, but in this specific case, it has been
considered tolerable, as anyone including such a function, will do so with the
express intent of modifying the string representation of instances of a given
class.

- The RFC for how message templates are evaluated, states that any objects with
fields, should have their field count capped at 5 in when stringified. This is
not possible to do with `util.inspect`, so for now all fields are included in
the string representation. This is an ok compromise for an MVP.
@github-actions github-actions bot mentioned this pull request Apr 14, 2025
github-actions bot pushed a commit that referenced this pull request Apr 14, 2025
Evaluate the log message template in the context of the line the probe is
instrumenting.

If there's more than one probe on a specific file/line/column, all evaluations
happen in a single `Debugger.evaluateOnCallFrame` call, as to minimise the
number of roundtrips over the Chrome DevTools Protocol, and hence, minimise the
time the inspected thread is paused.

The `util.inspect` method is used to stringify the results of the code
evaluations (expect for strings which are just returned as is). Options are
passed to `util.inspect` to keep the returned string small, and to align as
much as possible with the same implementation in the other tracers.

The total length of the full message is also truncated at 8KiB.

The only two major differences between this implementation and the
implementations in the other tracers are:

- If any of the objects traversed by `util.inspect` contains the magic
`[Symbol.toStringTag]` getter, it will get executed. This is technically a
side-effect, which we don't like, but in this specific case, it has been
considered tolerable, as anyone including such a function, will do so with the
express intent of modifying the string representation of instances of a given
class.

- The RFC for how message templates are evaluated, states that any objects with
fields, should have their field count capped at 5 in when stringified. This is
not possible to do with `util.inspect`, so for now all fields are included in
the string representation. This is an ok compromise for an MVP.
@github-actions github-actions bot mentioned this pull request Apr 14, 2025
watson added a commit that referenced this pull request Apr 14, 2025
Evaluate the log message template in the context of the line the probe is
instrumenting.

If there's more than one probe on a specific file/line/column, all evaluations
happen in a single `Debugger.evaluateOnCallFrame` call, as to minimise the
number of roundtrips over the Chrome DevTools Protocol, and hence, minimise the
time the inspected thread is paused.

The `util.inspect` method is used to stringify the results of the code
evaluations (expect for strings which are just returned as is). Options are
passed to `util.inspect` to keep the returned string small, and to align as
much as possible with the same implementation in the other tracers.

The total length of the full message is also truncated at 8KiB.

The only two major differences between this implementation and the
implementations in the other tracers are:

- If any of the objects traversed by `util.inspect` contains the magic
`[Symbol.toStringTag]` getter, it will get executed. This is technically a
side-effect, which we don't like, but in this specific case, it has been
considered tolerable, as anyone including such a function, will do so with the
express intent of modifying the string representation of instances of a given
class.

- The RFC for how message templates are evaluated, states that any objects with
fields, should have their field count capped at 5 in when stringified. This is
not possible to do with `util.inspect`, so for now all fields are included in
the string representation. This is an ok compromise for an MVP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

debugger Dynamic Instrumentation & Live Debugger semver-minor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants