Skip to content

feat: remote tracing context#669

Merged
Mirko-von-Leipzig merged 3 commits intonextfrom
mirko/otel-remote-context
Feb 6, 2025
Merged

feat: remote tracing context#669
Mirko-von-Leipzig merged 3 commits intonextfrom
mirko/otel-remote-context

Conversation

@Mirko-von-Leipzig
Copy link
Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig commented Feb 5, 2025

This PR enables tracing across RPC boundaries using open-telemetry.

Open-telemetry supports smuggling trace ID context across HTTP/RPC calls by embedding this context in various headers. In our case we are using gRPC metadata headers (which in turn are just binary encoded http headers I believe). Notably, this context includes the sender's trace ID which allows us to connect the dots on the receiver end.

The general idea is that a trace should represent one cohesive item of work. Most examples are of course web-focussed and essentially boil down to a single user request being represented by one trace. In our world this becomes one trace per:

  • RPC component request
  • Single batch building
  • Single block building
  • Any other tasks we might have in the future

All of these cross our internal RPC boundaries and this PR allows us to continue tracing across it.

Implementation details

Open-telemetry tracing context is injected on the client side using tonic's Interceptor trait. On the receiver we extract this context using tonic's trace_fn. Both of these are implemented as middleware and are therefore RPC method agnostic.

Note that the RPC server is missing this -- the tracing is restricted to internal only APIs. We could consider expanding this to include other services e.g. the faucet. We would have to authenticate these requests somehow, maybe with an authentication header or API key. Though this might be of questionable value at this stage.

Concerns

This remote context is open-telemetry specific. I was somewhat hoping that tracing-forest would also magically connect the dots but apparently not :) This does sort of mean our stdout implementation will drift ito feature/quality compared to the otel exporter.

The open-telemetry specification also has a mountain of naming standards both general and specific to RPC and gRPC, http etc.. We will want to identify which of these are important to us. This may also mean that the current implementation can't cater to them e.g. client IP and server IP are not available to the middleware I used.

@Mirko-von-Leipzig Mirko-von-Leipzig force-pushed the mirko/otel-remote-context branch from db2341c to ddc6e3f Compare February 5, 2025 16:33
@Mirko-von-Leipzig Mirko-von-Leipzig force-pushed the mirko/otel-remote-context branch from 624c1fa to 2497c1e Compare February 5, 2025 16:54
@@ -0,0 +1 @@
pub mod grpc;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this in anticipation of splitting the logging.rs into here with otel and stdout modules.

Comment on lines +26 to +42
let span = match request.uri().path().rsplit('/').next() {
Some("ApplyBlock") => tracing::info_span!("store.rpc/ApplyBlock"),
Some("CheckNullifiers") => tracing::info_span!("store.rpc/CheckNullifiers"),
Some("CheckNullifiersByPrefix") => tracing::info_span!("store.rpc/CheckNullifiersByPrefix"),
Some("GetAccountDetails") => tracing::info_span!("store.rpc/GetAccountDetails"),
Some("GetAccountProofs") => tracing::info_span!("store.rpc/GetAccountProofs"),
Some("GetAccountStateDelta") => tracing::info_span!("store.rpc/GetAccountStateDelta"),
Some("GetBlockByNumber") => tracing::info_span!("store.rpc/GetBlockByNumber"),
Some("GetBlockHeaderByNumber") => tracing::info_span!("store.rpc/GetBlockHeaderByNumber"),
Some("GetBlockInputs") => tracing::info_span!("store.rpc/GetBlockInputs"),
Some("GetBatchInputs") => tracing::info_span!("store.rpc/GetBatchInputs"),
Some("GetNotesById") => tracing::info_span!("store.rpc/GetNotesById"),
Some("GetTransactionInputs") => tracing::info_span!("store.rpc/GetTransactionInputs"),
Some("SyncNotes") => tracing::info_span!("store.rpc/SyncNotes"),
Some("SyncState") => tracing::info_span!("store.rpc/SyncState"),
_ => tracing::info_span!("store.rpc/Unknown"),
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing::span! requires a compile-time constant string.

A more maintenance free alternative is to allow the open-telemetry and stdout tracing to drift apart some more. We could set the otel.name property which overrides the span name for open-telemetry (currently it just defaults to the span name).

This would mean stdout would get a more generic name = store.rpc and a separate rpc.method = {method} attribute.

Might be fine/better than this manual matching?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this addresses the m question above. I think we can leave it for a future PR.

/// client's origin trace.
pub fn store_trace_fn(request: &http::Request<()>) -> tracing::Span {
let span = match request.uri().path().rsplit('/').next() {
Some("ApplyBlock") => tracing::info_span!("store.rpc/ApplyBlock"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open-telemetry standard specifies {package}.{service}/{method} for RPC span names. Technically our protobuf says the service is called Api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would renaming the Api to Store allow us to make this method more general? (i.e., not having to list all endpoints).

@Mirko-von-Leipzig Mirko-von-Leipzig marked this pull request as ready for review February 5, 2025 17:23
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! A very light review from me, but I did leave a few comments inline.

/// client's origin trace.
pub fn store_trace_fn(request: &http::Request<()>) -> tracing::Span {
let span = match request.uri().path().rsplit('/').next() {
Some("ApplyBlock") => tracing::info_span!("store.rpc/ApplyBlock"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would renaming the Api to Store allow us to make this method more general? (i.e., not having to list all endpoints).

Comment on lines +26 to +42
let span = match request.uri().path().rsplit('/').next() {
Some("ApplyBlock") => tracing::info_span!("store.rpc/ApplyBlock"),
Some("CheckNullifiers") => tracing::info_span!("store.rpc/CheckNullifiers"),
Some("CheckNullifiersByPrefix") => tracing::info_span!("store.rpc/CheckNullifiersByPrefix"),
Some("GetAccountDetails") => tracing::info_span!("store.rpc/GetAccountDetails"),
Some("GetAccountProofs") => tracing::info_span!("store.rpc/GetAccountProofs"),
Some("GetAccountStateDelta") => tracing::info_span!("store.rpc/GetAccountStateDelta"),
Some("GetBlockByNumber") => tracing::info_span!("store.rpc/GetBlockByNumber"),
Some("GetBlockHeaderByNumber") => tracing::info_span!("store.rpc/GetBlockHeaderByNumber"),
Some("GetBlockInputs") => tracing::info_span!("store.rpc/GetBlockInputs"),
Some("GetBatchInputs") => tracing::info_span!("store.rpc/GetBatchInputs"),
Some("GetNotesById") => tracing::info_span!("store.rpc/GetNotesById"),
Some("GetTransactionInputs") => tracing::info_span!("store.rpc/GetTransactionInputs"),
Some("SyncNotes") => tracing::info_span!("store.rpc/SyncNotes"),
Some("SyncState") => tracing::info_span!("store.rpc/SyncState"),
_ => tracing::info_span!("store.rpc/Unknown"),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this addresses the m question above. I think we can leave it for a future PR.

@bobbinth
Copy link
Contributor

bobbinth commented Feb 6, 2025

Note that the RPC server is missing this -- the tracing is restricted to internal only APIs. We could consider expanding this to include other services e.g. the faucet. We would have to authenticate these requests somehow, maybe with an authentication header or API key. Though this might be of questionable value at this stage.

I would leave this for the future (and not sure it is needed either).

This may also mean that the current implementation can't cater to them e.g. client IP and server IP are not available to the middleware I used.

This means that, for example, the RPC component won't assign IP addresses to the request traces?

@Mirko-von-Leipzig
Copy link
Collaborator Author

This may also mean that the current implementation can't cater to them e.g. client IP and server IP are not available to the middleware I used.

This means that, for example, the RPC component won't assign IP addresses to the request traces?

Correct. Right now the implementation does only the bare minimum to connect traces together. From the tonic docs on Interceptor:

gRPC interceptors are similar to middleware but have less flexibility. An interceptor allows you to do two main things, one is to add/remove/check items in the MetadataMap of each request. Two, cancel a request with a Status.

[snip]

Additionally, interceptors is not the recommended way to add logging to your service. For that a tower middleware is more appropriate since it can also act on the response. For example tower-http’s Trace middleware supports gRPC out of the box.

I just wanted to get it working for now; and this actually had examples on how to achieve it.

Copy link
Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Collaborator

@igamigo igamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Mirko-von-Leipzig Mirko-von-Leipzig merged commit 018dde3 into next Feb 6, 2025
2 checks passed
@Mirko-von-Leipzig Mirko-von-Leipzig deleted the mirko/otel-remote-context branch February 6, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants