
@sabrenner sabrenner commented Oct 21, 2025

What does this PR do?

Changes the LLM Observability deepEqualWithMockValues utility to use node:assert instead of mocha expect, and refactors its logic to do bi-directional assertions. This is somewhat complicated because we sometimes want to assert structure with "mock" values in certain places. This is a common practice in our LLM integration testing, especially for related metadata: the shape of these objects changes often, and we just want to assert that something exists (MOCK_ANY) or exists with a certain type (MOCK_STRING, MOCK_OBJECT, etc.).

Motivation

Fix the test utility to properly assert and not rely on mocha utilities.

Follow-up items

There are a few follow-up items to re-enable tests that previously passed when they should have failed:

  • Fix the ai integration for correct tool argument tagging for v5 of the library
  • Undefined messages for Cohere models with aws-sdk bedrockruntime
  • No output roles for cached token tests for aws-sdk bedrock runtime
  • A langchain test seems flaky due to a VCR incompatibility. The test does pass occasionally, so I don't believe it to be an issue with the test itself. I would like to do a follow-up investigation into how the underlying operation of that test (batched LLM calls) interacts with the testagent's VCR capabilities
  • A couple of OpenAI errors where input metadata is not recorded correctly when the underlying openai call fails
  • Skipped an evaluation metric test, which I will quickly follow up on re-enabling. It just needs a handler in our local testagent server and an additional method in our LLM Observability test utilities

Additionally, there are some other tests that need to be updated in a similar cleanup initiative:

  • The rest of the LLM Observability SDK tests still use expect, and should be changed to use node:assert
  • The existing "integration" tests should probably be migrated to actual integration tests

github-actions bot commented Oct 21, 2025

Overall package size

Self size: 13.15 MB
Deduped: 115.95 MB
No deduping: 118.16 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.11.1 | 9.96 MB | 10.34 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.82 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @opentelemetry/resources | 1.9.1 | 306.54 kB | 1.74 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.207.0 | 201.39 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.1.0-preview.12 | 95.11 kB | 401.68 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |
| escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

codecov bot commented Oct 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.78%. Comparing base (b874514) to head (ef719cb).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6709      +/-   ##
==========================================
- Coverage   83.89%   83.78%   -0.11%     
==========================================
  Files         500      506       +6     
  Lines       21117    21256     +139     
==========================================
+ Hits        17716    17810      +94     
- Misses       3401     3446      +45     

pr-commenter bot commented Oct 21, 2025

Benchmarks

Benchmark execution time: 2025-10-30 20:23:04

Comparing candidate commit ef719cb in PR branch sabrenner/rework-llmobs-test-assertions with baseline commit b874514 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1607 metrics, 63 unstable metrics.

@sabrenner sabrenner marked this pull request as ready for review October 24, 2025 16:54
@sabrenner sabrenner requested review from a team as code owners October 24, 2025 16:54
@sabrenner sabrenner requested a review from a team as a code owner October 24, 2025 16:54
@BridgeAR BridgeAR left a comment

This mostly LGTM. I do have a few questions for my own understanding. Also, the assertions are now all type-insensitive (they use loose comparison). Is that intentional? Could we use strict comparison instead?

@BridgeAR BridgeAR merged commit 091130d into master Oct 31, 2025
763 checks passed
@BridgeAR BridgeAR deleted the sabrenner/rework-llmobs-test-assertions branch October 31, 2025 15:27
dd-octo-sts bot pushed a commit that referenced this pull request Nov 1, 2025
…6709)

This changes the LLM Observability deepEqualWithMockValues to use node:assert instead of mocha expect. Also refactor the logic to do assertions. Now all properties will be validated, not only particular entries. That caught a couple of issues that will be handled separately.
@dd-octo-sts dd-octo-sts bot mentioned this pull request Nov 1, 2025