✨ make all sampling decisions deterministic #4194

Merged
thomas-lebeau merged 9 commits into v7 from thomas.lebeau/v7-deterministic-sampling
Feb 20, 2026

Conversation

@thomas-lebeau
Collaborator

@thomas-lebeau thomas-lebeau commented Feb 17, 2026

Motivation

Decouple sampling decisions from session storage. Currently, the session store computes and persists tracking types (e.g. rum=1, logs=0) alongside the session ID. This means sampling logic lives deep in the core session store, and both RUM and Logs depend on it to make sampling decisions at session creation time.

This PR moves sampling responsibility to each product (RUM, Logs), where it's computed on demand using the deterministic isSampled() function. Since sampling is deterministic based on session ID + sample rate, there's no need to persist the decision — it can be recomputed at any time.
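To illustrate why a deterministic decision does not need to be persisted, here is a minimal sketch. It is not the SDK's `isSampled()` implementation: the names `hashToPercent` and `isSampledSketch` are hypothetical, and the FNV-1a hash is an assumption chosen only because it is short. The point is that the same session ID and sample rate always yield the same answer.

```typescript
// Hypothetical stand-in for isSampled(); the SDK's real hash function may differ.
function hashToPercent(sessionId: string): number {
  // FNV-1a 32-bit hash, used here purely for illustration.
  let hash = 0x811c9dc5
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  // Map the unsigned 32-bit hash to a position in [0, 100).
  return (hash / 0x100000000) * 100
}

function isSampledSketch(sessionId: string, sampleRate: number): boolean {
  if (sampleRate >= 100) return true
  if (sampleRate <= 0) return false
  // The decision depends only on the ID and the rate, so it can be recomputed anywhere.
  return hashToPercent(sessionId) < sampleRate
}
```

Because the hash position is fixed per session ID, a session sampled at a given rate is also sampled at any higher rate.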

Additionally, it corrects a monotone correlation issue in chained deterministic sampling: when the same hash is compared against two thresholds (session → replay/tracing/profiling), the effective child rate must be adjusted to parentRate × childRate / 100 so that childRate% of tracked sessions get the child feature.
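The adjustment can be derived as follows. This is a sketch that assumes the hash position $h$ of a session ID is uniformly distributed over $[0, 100)$; the symbols $p$ (parent rate) and $c$ (child rate) are introduced here for illustration.

```latex
% Session is tracked when h < p. Without correction, the child feature is enabled when h < c:
\[
P(\text{child} \mid \text{tracked})
  = \frac{P(h < \min(p, c))}{P(h < p)}
  = \frac{\min(p, c)}{p},
\]
% which equals 1 whenever c \ge p. With the corrected threshold pc/100 \le p:
\[
P(\text{child} \mid \text{tracked})
  = \frac{P\!\left(h < \tfrac{p\,c}{100}\right)}{P(h < p)}
  = \frac{p\,c / 100}{p}
  = \frac{c}{100}.
\]
```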

Changes

Move sampler from rum-core to core

  • Moved sampler.ts from packages/rum-core/src/domain/sampler/ to packages/core/src/domain/ so both RUM and Logs can use it
  • Updated imports in tracer.ts and profilerApi.ts

Simplify core session manager and store

  • SessionManager no longer carries a TrackingType generic, productKey, or computeTrackingType callback
  • SessionStore always generates a session ID on creation (no more "not tracked = no ID" logic)
  • Removed SESSION_NOT_TRACKED checks and tracking type persistence from session state
  • SessionContext no longer has a trackingType field
  • hasSessionInCache() now checks for id instead of product key
  • isSessionInCacheOutdated() only compares session IDs (no more tracking type comparison)
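As a rough sketch of the two simplified checks, assuming a session state shaped like a plain object with an optional `id`: the type `SessionStateSketch` and both function bodies below are hypothetical and only mirror the behavior described in the bullets above, not the actual signatures in the session store.

```typescript
// Hypothetical shape; the real session state carries more fields.
interface SessionStateSketch {
  id?: string
  [key: string]: string | undefined
}

// A session is considered cached as soon as it has an ID (no product key lookup).
function hasSessionInCacheSketch(cache: SessionStateSketch | undefined): boolean {
  return cache !== undefined && cache.id !== undefined
}

// The cache is outdated only when the IDs differ (no tracking type comparison).
function isSessionInCacheOutdatedSketch(cache: SessionStateSketch, current: SessionStateSketch): boolean {
  return cache.id !== current.id
}
```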

Move sampling to each product

  • RUM (rumSessionManager.ts): computeTrackingType() now takes (configuration, sessionId) and calls isSampled() at query time
  • Logs (logsSessionManager.ts): Same pattern
  • Both stubs updated to generate a UUID for deterministic sampling consistency
  • Removed RUM_SESSION_KEY, LOGS_SESSION_KEY constants and hasValidRumSession/hasValidLoggerSession validators
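The query-time pattern for RUM might look roughly like the sketch below. The type names, the string tracking types, and the injected `isSampled` parameter are assumptions made to keep the example self-contained; the actual `computeTrackingType()` in `rumSessionManager.ts` may be structured differently.

```typescript
// Hypothetical types for illustration only.
type IsSampledFn = (sessionId: string, sampleRate: number) => boolean

interface RumSamplingConfigSketch {
  sessionSampleRate: number
  sessionReplaySampleRate: number
}

type TrackingTypeSketch = 'not-tracked' | 'tracked-without-replay' | 'tracked-with-replay'

function computeTrackingTypeSketch(
  configuration: RumSamplingConfigSketch,
  sessionId: string,
  isSampled: IsSampledFn
): TrackingTypeSketch {
  // Nothing is read from storage: the decision is recomputed from the ID and the rates.
  if (!isSampled(sessionId, configuration.sessionSampleRate)) {
    return 'not-tracked'
  }
  // Replay is a child of the session, so its threshold is scaled by the session rate.
  const replayRate = (configuration.sessionSampleRate * configuration.sessionReplaySampleRate) / 100
  return isSampled(sessionId, replayRate) ? 'tracked-with-replay' : 'tracked-without-replay'
}
```

Passing `isSampled` as a parameter is a choice made for this sketch so that the decision can be exercised with a fake hash in tests.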

Apply correction factor for chained deterministic sampling

  • Added correctedChildSampleRate(parentRate, childRate) in core/sampler.ts
  • Applied at 3 call sites: replay (rumSessionManager.ts), tracing (tracer.ts), profiling (profilerApi.ts)
  • Wrapped isProfilingSupported in mockable() to enable testing the profiler sampling path
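The effect of the correction can be checked with a small simulation. The function below follows the formula given in this PR, but the `Sketch` suffix marks it as an illustration rather than the code in `core/sampler.ts`, and `childShareOfTrackedSessions` is a hypothetical helper that sweeps evenly spaced hash positions instead of hashing real session IDs.

```typescript
// Follows the formula from the PR description: parentRate × childRate / 100.
function correctedChildSampleRateSketch(parentRate: number, childRate: number): number {
  return (parentRate * childRate) / 100
}

// Hypothetical helper: percentage of tracked sessions that also get the child feature,
// measured over 1000 evenly spaced hash positions in [0, 100).
function childShareOfTrackedSessions(parentRate: number, childThreshold: number): number {
  let tracked = 0
  let withChild = 0
  for (let i = 0; i < 1000; i++) {
    const hashPercent = i / 10
    if (hashPercent < parentRate) {
      tracked++
      if (hashPercent < childThreshold) withChild++
    }
  }
  return tracked === 0 ? 0 : (withChild * 100) / tracked
}

// Same hash compared against both thresholds, session rate 60 and child rate 60.
const uncorrectedShare = childShareOfTrackedSessions(60, 60)
const correctedShare = childShareOfTrackedSessions(60, correctedChildSampleRateSketch(60, 60))
```

With these inputs, the uncorrected threshold gives the child feature to every tracked session, while the corrected threshold gives it to 60% of them, which is the configured child rate.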

Test instructions

  1. yarn test:unit --spec packages/core/src/domain/sampler.spec.ts — unit tests for correctedChildSampleRate and isSampled
  2. yarn test:unit --spec packages/rum-core/src/domain/rumSessionManager.spec.ts — deterministic sampling with correction for replay
  3. yarn test:unit --spec packages/rum-core/src/domain/tracing/tracer.spec.ts — correction for tracing
  4. yarn test:unit --spec packages/rum/src/boot/profilerApi.spec.ts — correction for profiling
  5. yarn typecheck — no type errors

Correction tests use a MID_HASH_UUID (hash ~50.7%) with sessionSampleRate=60, childRate=60. The corrected rate (36) rejects while the uncorrected rate (60) would pass — tests fail if the correction is removed.
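The reasoning behind that choice of test values can be sketched as a predicate: a hash position is useful for these tests only if the session is tracked, the uncorrected child threshold accepts it, and the corrected threshold rejects it. The helper name `discriminatesCorrection` is hypothetical, and 50.7 is the approximate hash position quoted above.

```typescript
// Hypothetical helper: does this hash position tell corrected and uncorrected sampling apart?
function discriminatesCorrection(hashPercent: number, parentRate: number, childRate: number): boolean {
  const correctedRate = (parentRate * childRate) / 100
  const sessionTracked = hashPercent < parentRate
  const passesUncorrected = hashPercent < childRate
  const passesCorrected = hashPercent < correctedRate
  return sessionTracked && passesUncorrected && !passesCorrected
}

// Approximate hash position of MID_HASH_UUID as stated in the PR description.
const midHashDiscriminates = discriminatesCorrection(50.7, 60, 60)
```

For `sessionSampleRate=60` and `childRate=60`, any hash position in the range [36, 60) satisfies the predicate, so a mid-range hash such as ~50.7% works while very low or very high hashes do not.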

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Feb 17, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 75.00%
Overall Coverage: 76.93% (-0.17%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f5ae67c

@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from 3878686 to f97da14 Compare February 17, 2026 11:29
@thomas-lebeau thomas-lebeau changed the base branch from thomas.lebeau/v7-session-manager-from-scratch-clean to thomas.lebeau/remove-beta-encode-cookie-options February 17, 2026 11:30
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Feb 17, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 171.86 KiB | 172.00 KiB | +144 B | +0.08% | |
| Rum Profiler | 4.67 KiB | 4.67 KiB | 0 B | 0.00% | |
| Rum Recorder | 24.88 KiB | 24.88 KiB | 0 B | 0.00% | |
| Logs | 56.12 KiB | 56.38 KiB | +268 B | +0.47% | |
| Flagging | 944 B | 944 B | 0 B | 0.00% | |
| Rum Slim | 127.61 KiB | 127.63 KiB | +20 B | +0.02% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
|---|---|---|---|
| RUM - add global context | 0.004 | 0.0057 | +42.50% |
| RUM - add action | 0.0134 | 0.0244 | +82.09% |
| RUM - add error | 0.0128 | 0.0214 | +67.19% |
| RUM - add timing | 0.0027 | 0.0031 | +14.81% |
| RUM - start view | 0.0136 | 0.0167 | +22.79% |
| RUM - start/stop session replay recording | 0.0007 | 0.0009 | +28.57% |
| Logs - log message | 0.0159 | 0.0212 | +33.33% |

🧠 Memory Performance

| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| RUM - add global context | 26.40 KiB | 26.65 KiB | +258 B |
| RUM - add action | 115.40 KiB | 114.36 KiB | -1.03 KiB |
| RUM - add timing | 26.81 KiB | 26.53 KiB | -287 B |
| RUM - add error | 119.60 KiB | 120.24 KiB | +652 B |
| RUM - start/stop session replay recording | 25.54 KiB | 25.29 KiB | -262 B |
| RUM - start view | 502.40 KiB | 510.54 KiB | +8.14 KiB |
| Logs - log message | 45.28 KiB | 46.48 KiB | +1.20 KiB |

🔗 RealWorld

@thomas-lebeau thomas-lebeau marked this pull request as ready for review February 17, 2026 11:33
@thomas-lebeau thomas-lebeau requested a review from a team as a code owner February 17, 2026 11:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f97da142c5

Comment thread packages/rum-core/src/domain/rumSessionManager.ts Outdated
Comment thread packages/rum-core/src/domain/rumSessionManager.ts
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from f97da14 to 28c6885 Compare February 17, 2026 11:43
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/remove-beta-encode-cookie-options branch from e49d028 to 1ba1e48 Compare February 17, 2026 12:11
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from 28c6885 to 45e1731 Compare February 17, 2026 12:13
Base automatically changed from thomas.lebeau/remove-beta-encode-cookie-options to v7 February 17, 2026 12:29
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from 45e1731 to 5c33ff2 Compare February 17, 2026 13:15
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from e9f9893 to 31d3ba3 Compare February 18, 2026 07:19
@thomas-lebeau thomas-lebeau changed the title ♻️ Move sampling out of session storage ✨ make all sampling decisions deterministic Feb 18, 2026
Comment thread packages/logs/src/domain/logsSessionManager.ts Outdated
Comment thread packages/rum-core/src/domain/tracing/tracer.spec.ts Outdated
Comment on lines +64 to +66
if (!session || session.id === 'invalid') {
return
}
Member

thought: These invalid session IDs are somewhat bothersome, but I guess they'll be removed in a follow-up step?

Collaborator Author

@thomas-lebeau thomas-lebeau Feb 19, 2026

I didn't think about it yet, but I'm taking note 👍

const traceSampled = isSampled(session.id, configuration.traceSampleRate)
const traceSampled = isSampled(
session.id,
correctedChildSampleRate(configuration.sessionSampleRate, configuration.traceSampleRate)
Contributor

IIUC, the new session manager will start regardless of the product and won't be tied to RUM. Hence, do we need child sample rates? Can we not have plain sample rates that are easier to understand?

Collaborator Author

Traces still depend on RUM: if RUM doesn't start, we don't collect traces.
It's true that we could imagine a separate Trace product, but that's a separate topic.
Currently, we sample traces as "N% of collected resources will have traces". With deterministic sampling, N% of sampled sessions will have traces.

mormubis and others added 7 commits February 19, 2026 14:06
Co-authored-by: Thomas Lebeau <thomas.lebeau@datadoghq.com>
Co-authored-by: Thomas Lebeau <thomas.lebeau@datadoghq.com>
- Change startRumSessionManagerWithDefaults helper to use sessionSampleRate: 100 and sessionReplaySampleRate: 100 as defaults
- Remove redundant sessionSampleRate: 100 overrides from individual test cases that relied on always-tracked behavior
…rdown

- Add deterministic sampling tests to rumSessionManager covering session tracking and replay sampling based on known high/low hash UUIDs
- Move resetSampleDecisionCache() from sampler.spec.ts to the global forEach.spec.ts afterEach hook to ensure consistent cleanup across all test suites
…ampling

- Add correctedChildSampleRate to compute the effective sample rate for
  child features (replay, tracing, profiling) given the parent session rate
- Apply the correction factor (parentRate * childRate) / 100 at each
  sampling decision site so chained sampling produces the expected
  end-to-end probability
@thomas-lebeau thomas-lebeau force-pushed the thomas.lebeau/v7-deterministic-sampling branch from cfa2fb0 to 44a32ad Compare February 19, 2026 13:07
- Replace `{} as any` args with proper typed mocks in profilerApi test

- Fix logsSessionManager stub assertion: session ID should be undefined since the stub no longer generates one
@thomas-lebeau thomas-lebeau merged commit 32325c0 into v7 Feb 20, 2026
19 checks passed
@thomas-lebeau thomas-lebeau deleted the thomas.lebeau/v7-deterministic-sampling branch February 20, 2026 08:22
@github-actions github-actions Bot locked and limited conversation to collaborators Feb 20, 2026

3 participants