Prepare foundations for multi-consumer tracing output #190

Merged
TheJokr merged 1 commit into main from lblocher/trace-consumer-perf
Apr 9, 2026

@TheJokr TheJokr commented Apr 8, 2026

  • Introduce optional limit for span queue size.
  • Add metrics for total spans, dropped spans, current span queue size, and maximum configured span queue size.
  • Add tokio::sync::mpsc receiver wrapper to allow multi-consumer semantics. The next PR will introduce the option to start multiple consumer tasks.
  • Use batching recv_many calls in Jaeger UDP tracing output.
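The queue limit and the four metrics listed above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `QueueMetrics`, `SpanSender`, `SpanReceiver`, and `span_queue` are hypothetical names, it uses `std::sync::mpsc::sync_channel` instead of tokio's channel, and it always applies a limit (of at least 1) whereas the PR makes the limit optional.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;

/// Hypothetical counters mirroring the four metrics named above.
#[derive(Default)]
pub struct QueueMetrics {
    pub total: AtomicU64,      // every span handed to the queue
    pub dropped: AtomicU64,    // spans rejected because the queue was full
    pub queued: AtomicU64,     // spans currently waiting in the queue
    pub max_queued: AtomicU64, // configured queue limit
}

pub struct SpanSender<T> {
    tx: SyncSender<T>,
    metrics: Arc<QueueMetrics>,
}

impl<T> SpanSender<T> {
    /// Never blocks: a full (or closed) queue drops the span and counts it.
    pub fn send(&self, span: T) -> bool {
        self.metrics.total.fetch_add(1, Ordering::Relaxed);
        // Count the span as queued before sending so that a fast consumer
        // can never decrement the gauge below zero.
        self.metrics.queued.fetch_add(1, Ordering::Relaxed);
        if self.tx.try_send(span).is_ok() {
            return true;
        }
        self.metrics.queued.fetch_sub(1, Ordering::Relaxed);
        self.metrics.dropped.fetch_add(1, Ordering::Relaxed);
        false
    }
}

pub struct SpanReceiver<T> {
    rx: Receiver<T>,
    metrics: Arc<QueueMetrics>,
}

impl<T> SpanReceiver<T> {
    /// Takes one span if available and updates the queue size gauge.
    pub fn try_recv(&self) -> Option<T> {
        let span = self.rx.try_recv().ok()?;
        self.metrics.queued.fetch_sub(1, Ordering::Relaxed);
        Some(span)
    }
}

/// Builds a bounded span queue; `limit` must be at least 1 in this sketch.
pub fn span_queue<T>(limit: usize) -> (SpanSender<T>, SpanReceiver<T>, Arc<QueueMetrics>) {
    let (tx, rx) = sync_channel(limit);
    let metrics = Arc::new(QueueMetrics::default());
    metrics.max_queued.store(limit as u64, Ordering::Relaxed);
    let sender = SpanSender { tx, metrics: Arc::clone(&metrics) };
    let receiver = SpanReceiver { rx, metrics: Arc::clone(&metrics) };
    (sender, receiver, metrics)
}
```

Dropping on a full queue rather than blocking keeps span creation cheap on the hot path, at the cost of losing data under sustained overload; the dropped counter is what makes that loss visible.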

I reviewed all the well-known async MPMC queue implementations prior to landing on the async mutex wrapper for tokio's own channels. The common problem shared by almost all MPMC implementations is that they do not support batch receive operations (i.e., recv_many). This, combined with the bad locality of single-queue MPMC channels, makes me believe a Mutex-wrapped MPSC channel with batch receives will perform better for the tracing use case.

There are 2 MPMC implementations that do offer batching (batch-channel and burstq). These don't work for our use case either:

  • burstq pre-allocates the entire channel size, which is prohibitive since we expect to only use a fraction of it >99% of the time.
  • batch-channel does batching inside the sender. This means sending requires exclusive ownership over the sender, so we would have to put it inside a Mutex to share it between spans. This moves the locking from the (few) consumer tasks to the many, many spans that may be generated.

In contrast, the Mutex-wrapped MPSC receiver will have 1 active receiving task at any time and a FIFO queue of other tasks waiting to become the active receiver next. The active receiver gets a batch of spans, and while the Mutex is passed on to the next task a new batch accumulates in the channel.
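The shape of that wrapper can be sketched as follows. This is a synchronous, std-only stand-in and not the implementation from this PR: `SharedReceiver` is a hypothetical name, it wraps `std::sync::mpsc::Receiver` in `std::sync::Mutex` rather than tokio's receiver in an async mutex, and its `recv_many` only imitates the signature of tokio's method. Note also that `std::sync::Mutex` makes no FIFO guarantee for waiters, so the ordering of waiting consumers described above is not reproduced here.

```rust
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};

/// Cloneable handle that lets several consumers take turns on one receiver.
pub struct SharedReceiver<T> {
    inner: Arc<Mutex<Receiver<T>>>,
}

impl<T> Clone for SharedReceiver<T> {
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl<T> SharedReceiver<T> {
    pub fn new(rx: Receiver<T>) -> Self {
        Self { inner: Arc::new(Mutex::new(rx)) }
    }

    /// Appends up to `limit` items to `buf` and returns how many were added.
    /// Blocks until at least one item is available; returns 0 once the
    /// channel is closed and empty (or if `limit` is 0).
    pub fn recv_many(&self, buf: &mut Vec<T>, limit: usize) -> usize {
        if limit == 0 {
            return 0;
        }
        // Only the lock holder is the active receiver; other consumers wait here.
        let rx = self.inner.lock().expect("receiver mutex poisoned");
        let first = match rx.recv() {
            Ok(item) => item,
            Err(_) => return 0, // all senders dropped and the queue is empty
        };
        buf.push(first);
        let mut received = 1;
        // Drain whatever else is already queued without blocking again.
        while received < limit {
            match rx.try_recv() {
                Ok(item) => {
                    buf.push(item);
                    received += 1;
                }
                Err(_) => break,
            }
        }
        received
        // The guard is dropped on return, handing the receiver to the next consumer.
    }
}
```

Each consumer would hold its own clone and call `recv_many` in a loop, processing its batch after the lock has been released so that another consumer can start collecting the next batch in the meantime.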

We can revisit this decision with production metrics later on if needed.

@TheJokr TheJokr requested a review from fisherdarling April 8, 2026 10:02
@TheJokr TheJokr self-assigned this Apr 8, 2026
@TheJokr TheJokr force-pushed the lblocher/trace-consumer-perf branch 5 times, most recently from 730730a to aad0d83 Compare April 8, 2026 10:17
@TheJokr TheJokr force-pushed the lblocher/trace-consumer-perf branch from aad0d83 to d1a9bea Compare April 9, 2026 09:31
@TheJokr TheJokr force-pushed the lblocher/trace-consumer-perf branch from d1a9bea to c5a5577 Compare April 9, 2026 09:33
@TheJokr TheJokr merged commit 1c03f04 into main Apr 9, 2026
20 checks passed
@TheJokr TheJokr deleted the lblocher/trace-consumer-perf branch April 9, 2026 09:42
TheJokr added a commit that referenced this pull request Apr 9, 2026
Added:
- The `ratelimit!` utility macro simplifies the setup required for
  rate-limiting a code block into a single macro expression. There is
  also a special `ratelimit=` prefix syntax for log statements
  specifically. (#182)
- The sentry metrics hook added in v5.5 now also supports rate-limiting
  for sentry events. To make use of this feature, call the new
  `foundations::sentry::install_hook_with_settings` setup function.
  (#183)
- The telemetry server implements a `/pprof/symbol` endpoint now, which
  can be used for remote symbolization with pprof-compatible tools.
  (#186)
- `foundations::telemetry::tracing::span_is_sampled()` provides a cheap
  way to check whether the current trace has been sampled. This allows
  skipping expensive tag/log formatting code if the values would be
  discarded anyway. (#187)
- `Secret` (string) and `RawSecret` (bytes) wrappers have been added to
  aid with confidential values in config files. Both types hide their
  contents from Debug/Display calls and require an explicit accessor to
  retrieve the secret. Additionally, they zero their memory when
  dropped. (#188)
- `MaybeExternal` is a new settings type that can load plain data
  (strings, bytes, and secrets) from either inline config or external
  sources (environment variables or file system). (#188)

Improved:
- `serde_yaml` was replaced by the new `serde-saphyr` YAML
  implementation. `serde_yaml` has been unmaintained since 2024. (#181)
- Loggers can now be frozen, meaning any further mutation (such as
  `add_fields!`) will lead to an error. This is useful to catch bugs
  where mutations are applied to the wrong logger instance. (#189)
- The maximum queue size for trace span output can now be limited via
  telemetry settings. The default has been set at 1 million spans.
  Additionally, there are new metrics to observe the queue size, total
  number of spans exported, and how many spans have been dropped. (#190)
- Tracing can now be configured with multiple concurrent output tasks to
  boost span throughput. The tasks now run independently of the
  TelemetryDriver to ensure spans are output throughout the lifetime of
  the process. (#191)

Fixed:
- Log rate limiting now correctly applies across `set_verbosity` calls.
  (#180)

Deprecated:
- `foundations::sentry::install_hook` is deprecated in favor of
  `foundations::sentry::install_hook_with_settings`.
@TheJokr TheJokr mentioned this pull request Apr 9, 2026