ci(llmobs): re-work `deepEqualWithMockValues` to use `node:assert` #6709

sabrenner · 2025-10-21T02:50:07Z

What does this PR do?

Changes the LLM Observability `deepEqualWithMockValues` test utility to use `node:assert` instead of mocha `expect`, and refactors its logic to do bi-directional assertions. This is somewhat complicated because we sometimes want to assert structure with "mock" values in certain places. This is common practice in our LLM integration testing, especially for related metadata: the shape of these objects changes often, and we just want to assert that something exists (`MOCK_ANY`) or exists with a certain type (`MOCK_STRING`, `MOCK_OBJECT`, etc.).

Motivation

Fix the test utility to properly assert and not rely on mocha utilities.

Follow-up items

There are a few follow-up items to re-enable tests that previously passed when they should have failed:

Fix the ai integration for correct tool argument tagging for v5 of the library
Undefined messages for Cohere models with aws-sdk bedrockruntime
No output roles for cached token tests for aws-sdk bedrock runtime
A langchain test seems flaky due to a VCR incompatibility. The test does pass occasionally, so I don't believe it to be an issue with the test. I would like to do a follow-up investigation into how the underlying operation of that test (batched LLM calls) is interacting with the testagent's VCR capabilities
A couple of OpenAI errors where input metadata is not recorded correctly when the underlying openai call fails
Skipped an evaluation metric test, which I will quickly follow up on to re-enable. It just needs a handler in our local testagent server and an additional method in our LLM Observability test utilities

Additionally, there are some other tests that need to be updated in a similar cleanup initiative:

The rest of the LLM Observability SDK tests still use expect, and should be changed to use node:assert
The existing "integration" tests should probably be migrated to actual integration tests

github-actions · 2025-10-21T02:50:48Z

Overall package size

Self size: 13.15 MB
Deduped: 115.95 MB
No deduping: 118.16 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.11.1 | 9.96 MB | 10.34 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.82 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @opentelemetry/resources | 1.9.1 | 306.54 kB | 1.74 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.207.0 | 201.39 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.1.0-preview.12 | 95.11 kB | 401.68 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |
| escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

codecov · 2025-10-21T02:51:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.78%. Comparing base (b874514) to head (ef719cb).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6709      +/-   ##
==========================================
- Coverage   83.89%   83.78%   -0.11%     
==========================================
  Files         500      506       +6     
  Lines       21117    21256     +139     
==========================================
+ Hits        17716    17810      +94     
- Misses       3401     3446      +45

pr-commenter · 2025-10-21T03:01:13Z

Benchmarks

Benchmark execution time: 2025-10-30 20:23:04

Comparing candidate commit ef719cb in PR branch sabrenner/rework-llmobs-test-assertions with baseline commit b874514 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1607 metrics, 63 unstable metrics.

BridgeAR

This mostly LGTM. I have a few questions for my own understanding. Also, the assertions are now all type-insensitive (they use loose comparison). Is that intentional? Could we use strict comparison instead?

packages/dd-trace/test/llmobs/plugins/openai/openaiv4.spec.js

packages/dd-trace/test/llmobs/plugins/ai/index.spec.js

packages/dd-trace/test/llmobs/util.js

packages/dd-trace/test/llmobs/sdk/integration.spec.js

…6709) This changes the LLM Observability deepEqualWithMockValues to use node:assert instead of mocha expect. Also refactor the logic to do assertions. Now all properties will be validated, not only particular entries. That caught a couple of issues that will be handled separately.

sabrenner added 9 commits October 20, 2025 22:48

change test utilities

8452958

ai tests

351b933

anthropic tests

7b8b7a2

bedrock tests

842f720

vertexai tests

12e5167

langchain tests

4977220

openai tests

2d973a7

"integration" tests

0bc153c

typescript tests

6b240ba

sabrenner added the semver-patch label Oct 21, 2025

sabrenner added 10 commits October 20, 2025 23:05

fix ci errors

d33b73c

Merge branch 'master' of github.com:DataDog/dd-trace-js into sabrenne…

c3f78d7

…r/rework-llmobs-test-assertions

unskip some tests

365cf05

Merge branch 'master' of github.com:DataDog/dd-trace-js into sabrenne…

27c43e7

…r/rework-llmobs-test-assertions

remove unused import

0187e72

uncomment dummy api key

5741e92

fix sdk test

b510c4a

openai completion models do not support cached tokens

86d28ea

change MOCK_ANY assertion

93c76e8

remove unnecessary metadata assertion

e0587a3

sabrenner mentioned this pull request Oct 21, 2025

ci(langchain): fix tests for test version bump to 1.0 #6715

Merged

sabrenner and others added 2 commits October 22, 2025 09:43

Merge branch 'master' of github.com:DataDog/dd-trace-js into sabrenne…

a590892

…r/rework-llmobs-test-assertions

Merge branch 'master' into sabrenner/rework-llmobs-test-assertions

216da08

sabrenner marked this pull request as ready for review October 24, 2025 16:54

sabrenner requested review from a team as code owners October 24, 2025 16:54

sabrenner requested a review from a team as a code owner October 24, 2025 16:54

sabrenner and others added 6 commits October 27, 2025 10:45

Merge branch 'master' into sabrenner/rework-llmobs-test-assertions

75bd3dd

Merge branch 'master' into sabrenner/rework-llmobs-test-assertions

161af29

move requires

fd999cb

in-person review comments

e5eea39

lint

9e85bb9

Merge branch 'master' into sabrenner/rework-llmobs-test-assertions

33645e8

BridgeAR reviewed Oct 30, 2025

review comments

ef719cb

BridgeAR approved these changes Oct 31, 2025

BridgeAR merged commit 091130d into master Oct 31, 2025
763 checks passed

BridgeAR deleted the sabrenner/rework-llmobs-test-assertions branch October 31, 2025 15:27

sabrenner mentioned this pull request Oct 31, 2025

fix(ai): fix streamed tool call parsing, tool call arguments in v5 #6813

Draft

dd-octo-sts bot mentioned this pull request Nov 1, 2025

v5.76.0 proposal #6810

Draft
