Prepare foundations for multi-consumer tracing output by TheJokr · Pull Request #190 · cloudflare/foundations

TheJokr · 2026-04-08T10:02:24Z

- Introduce optional limit for span queue size.
- Add metrics for total spans, dropped spans, current span queue size, and maximum configured span queue size.
- Add `tokio::sync::mpsc` receiver wrapper to allow multi-consumer semantics. The next PR will introduce the option to start multiple consumer tasks.
- Use batching `recv_many` calls in Jaeger UDP tracing output.
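
To make the first two items concrete, here is a minimal sketch of a size-limited span queue that drops on overflow and tracks the four counters. It is not the code from this PR: it uses the standard library's blocking `sync_channel` instead of tokio so that it stays self-contained, and every name in it (`QueueMetrics`, `SpanSender`, `SpanReceiver`, `span_queue`) is hypothetical. It also assumes that "total spans" counts every span handed to the queue, which the description does not spell out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;

/// Hypothetical counters mirroring the four metrics in the list above.
#[derive(Default)]
pub struct QueueMetrics {
    pub spans_total: AtomicU64,
    pub spans_dropped: AtomicU64,
    pub queue_size: AtomicU64,
    pub queue_size_max: AtomicU64,
}

/// Producer half: cheap to clone, never blocks the code that finishes a span.
pub struct SpanSender<T> {
    tx: SyncSender<T>,
    metrics: Arc<QueueMetrics>,
}

impl<T> Clone for SpanSender<T> {
    fn clone(&self) -> Self {
        Self { tx: self.tx.clone(), metrics: Arc::clone(&self.metrics) }
    }
}

impl<T> SpanSender<T> {
    /// Returns `false` if the span was dropped (queue full or consumer gone).
    pub fn send(&self, span: T) -> bool {
        self.metrics.spans_total.fetch_add(1, Ordering::Relaxed);
        // Count the span as queued *before* sending so a fast consumer can
        // never decrement the gauge below zero.
        self.metrics.queue_size.fetch_add(1, Ordering::Relaxed);
        if self.tx.try_send(span).is_ok() {
            return true;
        }
        self.metrics.queue_size.fetch_sub(1, Ordering::Relaxed);
        self.metrics.spans_dropped.fetch_add(1, Ordering::Relaxed);
        false
    }
}

/// Consumer half: keeps the queue-size gauge in sync on every receive.
pub struct SpanReceiver<T> {
    rx: Receiver<T>,
    metrics: Arc<QueueMetrics>,
}

impl<T> SpanReceiver<T> {
    pub fn try_recv(&self) -> Option<T> {
        let span = self.rx.try_recv().ok()?;
        self.metrics.queue_size.fetch_sub(1, Ordering::Relaxed);
        Some(span)
    }
}

/// Builds a queue holding at most `max` spans (`max` should be at least 1).
pub fn span_queue<T>(max: usize) -> (SpanSender<T>, SpanReceiver<T>, Arc<QueueMetrics>) {
    let (tx, rx) = sync_channel(max);
    let metrics = Arc::new(QueueMetrics::default());
    metrics.queue_size_max.store(max as u64, Ordering::Relaxed);
    let sender = SpanSender { tx, metrics: Arc::clone(&metrics) };
    let receiver = SpanReceiver { rx, metrics: Arc::clone(&metrics) };
    (sender, receiver, metrics)
}
```

With `span_queue::<u32>(2)`, a third `send` without an intervening receive returns `false` and increments `spans_dropped`, while `queue_size` stays at 2.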

I reviewed all the well-known async MPMC queue implementations prior to landing on the async mutex wrapper for tokio's own channels. The common problem shared by almost all MPMC implementations is that they do not support batch receive operations (i.e., recv_many). This, combined with the bad locality of single-queue MPMC channels, makes me believe a Mutex-wrapped MPSC channel with batch receives will perform better for the tracing use case.

There are 2 MPMC implementations that do offer batching (batch-channel and burstq). These don't work for our use case either:

- burstq pre-allocates the entire channel size, which is prohibitive since we expect to only use a fraction of it >99% of the time.
- batch-channel does batching inside the sender. This means sending requires exclusive ownership over the sender, so we would have to put it inside a Mutex to share it between spans. This moves the locking from the (few) consumer tasks to the many, many spans that may be generated.

In contrast, the Mutex-wrapped MPSC receiver will have 1 active receiving task at any time and a FIFO queue of other tasks waiting to become the active receiver next. The active receiver gets a batch of spans, and while the Mutex is passed on to the next task a new batch accumulates in the channel.
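
The shape of that wrapper can be sketched with the standard library alone. This is an illustrative analogue, not the implementation in this PR: it swaps tokio's async `Mutex` and `recv_many` for `std::sync::Mutex` plus a blocking `recv` followed by `try_recv` draining, and the names `SharedReceiver`, `recv_batch`, and `demo` are hypothetical. One difference to keep in mind is that `std::sync::Mutex` makes no FIFO fairness guarantee, whereas the description above relies on waiters being served in order.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical wrapper: several consumers share one MPSC receiver.
pub struct SharedReceiver<T> {
    inner: Arc<Mutex<Receiver<T>>>,
}

impl<T> Clone for SharedReceiver<T> {
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl<T> SharedReceiver<T> {
    pub fn new(rx: Receiver<T>) -> Self {
        Self { inner: Arc::new(Mutex::new(rx)) }
    }

    /// Waits for the lock, blocks for the first item, then drains whatever
    /// else is already queued, up to `limit` items in total. An empty batch
    /// means all senders are gone and the channel is empty (or `limit` is 0).
    pub fn recv_batch(&self, limit: usize) -> Vec<T> {
        let mut batch = Vec::new();
        if limit == 0 {
            return batch;
        }
        // Only the lock holder is the "active receiver"; others wait here.
        let rx = self.inner.lock().unwrap();
        match rx.recv() {
            Ok(first) => batch.push(first),
            Err(_) => return batch,
        }
        while batch.len() < limit {
            match rx.try_recv() {
                Ok(item) => batch.push(item),
                Err(_) => break,
            }
        }
        batch
        // The lock is released on return, so the batch is processed outside
        // the critical section while the next consumer starts receiving.
    }
}

/// Sends `spans` integers through the channel and returns everything the
/// consumers saw, sorted, so the caller can check that nothing was lost.
pub fn demo(spans: u32, consumers: usize, batch: usize) -> Vec<u32> {
    let (tx, rx) = channel();
    let shared = SharedReceiver::new(rx);
    let handles: Vec<_> = (0..consumers)
        .map(|_| {
            let shared = shared.clone();
            thread::spawn(move || {
                let mut seen = Vec::new();
                loop {
                    let items = shared.recv_batch(batch);
                    if items.is_empty() {
                        break;
                    }
                    seen.extend(items);
                }
                seen
            })
        })
        .collect();
    for i in 0..spans {
        tx.send(i).unwrap();
    }
    drop(tx); // closing the channel lets the consumers exit their loops
    let mut all: Vec<u32> = handles.into_iter().flat_map(|h| h.join().unwrap()).collect();
    all.sort_unstable();
    all
}
```

Calling `demo(100, 3, 8)` should return every value from 0 to 99 exactly once, regardless of how the batches were split across the three consumers. The producers only ever touch the lock-free sender, which is the property the comparison with batch-channel above is about.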

We can revisit this decision with production metrics later on if needed.

foundations/src/telemetry/settings/tracing.rs

foundations/src/telemetry/tracing/metrics.rs

Added:
- The `ratelimit!` utility macro simplifies the setup required for rate-limiting a code block into a single macro expression. There is also a special `ratelimit=` prefix syntax for log statements specifically. (#182)
- The sentry metrics hook added in v5.5 now also supports rate-limiting for sentry events. To make use of this feature, call the new `foundations::sentry::install_hook_with_settings` setup function. (#183)
- The telemetry server now implements a `/pprof/symbol` endpoint, which can be used for remote symbolization with pprof-compatible tools. (#186)
- `foundations::telemetry::tracing::span_is_sampled()` provides a cheap way to check whether the current trace has been sampled. This allows skipping expensive tag/log formatting code if the values would be discarded anyway. (#187)
- `Secret` (string) and `RawSecret` (bytes) wrappers have been added to aid with confidential values in config files. Both types hide their contents from Debug/Display calls and require an explicit accessor to retrieve the secret. Additionally, they zero their memory when dropped. (#188)
- `MaybeExternal` is a new settings type that can load plain data (strings, bytes, and secrets) from either inline config or external sources (environment variables or file system). (#188)

Improved:
- `serde_yaml` was replaced by the new `serde-saphyr` YAML implementation. `serde_yaml` has been unmaintained since 2024. (#181)
- Loggers can now be frozen, meaning any further mutation (such as `add_fields!`) will lead to an error. This is useful to catch bugs where mutations are applied to the wrong logger instance. (#189)
- The maximum queue size for trace span output can now be limited via telemetry settings. The default has been set at 1 million spans. Additionally, there are new metrics to observe the queue size, total number of spans exported, and how many spans have been dropped. (#190)
- Tracing can now be configured with multiple concurrent output tasks to boost span throughput. The tasks now run independently of the TelemetryDriver to ensure spans are output throughout the lifetime of the process. (#191)

Fixed:
- Log rate limiting now correctly applies across `set_verbosity` calls. (#180)

Deprecated:
- `foundations::sentry::install_hook` is deprecated in favor of `foundations::sentry::install_hook_with_settings`.

TheJokr requested a review from fisherdarling April 8, 2026 10:02

TheJokr self-assigned this Apr 8, 2026

TheJokr force-pushed the lblocher/trace-consumer-perf branch 5 times, most recently from 730730a to aad0d83 April 8, 2026 10:17

fisherdarling approved these changes Apr 8, 2026

TheJokr force-pushed the lblocher/trace-consumer-perf branch from aad0d83 to d1a9bea April 9, 2026 09:31

TheJokr force-pushed the lblocher/trace-consumer-perf branch from d1a9bea to c5a5577 April 9, 2026 09:33

TheJokr merged commit 1c03f04 into main Apr 9, 2026
20 checks passed

TheJokr deleted the lblocher/trace-consumer-perf branch April 9, 2026 09:42

TheJokr mentioned this pull request Apr 9, 2026

Release 5.6.0 #192

Merged

TheJokr merged 1 commit into main from lblocher/trace-consumer-perf