Skip to content

chore: improve tracing in network monitor#1366

Merged
SantiagoPittella merged 2 commits intonextfrom
santiagopittella-improve-tracing-in-monitor
Dec 17, 2025
Merged

chore: improve tracing in network monitor#1366
SantiagoPittella merged 2 commits intonextfrom
santiagopittella-improve-tracing-in-monitor

Conversation

@SantiagoPittella
Copy link
Copy Markdown
Collaborator

This PR uses main as base branch

Improves the instrumentation in the miden-network-monitor binary.

This is the current view of the traces (in jaeger):

Screenshot 2025-11-13 at 17 21 17

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-improve-tracing-in-monitor branch from 7c6ec2b to 97531ec Compare November 13, 2025 20:32
@SantiagoPittella
Copy link
Copy Markdown
Collaborator Author

@bobbinth @Mirko-von-Leipzig mdbook publishing a new, non compatible, version a few hours ago. I tried removing the multilingual option but other errors arised. Should I lock the Ci version to the previous (v0.4.52) until we complete the migration? mdbook people provides a migration guide https://github.com/rust-lang/mdBook/blob/master/CHANGELOG.md#05-migration-guide

Copy link
Copy Markdown
Collaborator

@igamigo igamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the plan is still to merge this and patch release it, but LGTM! Sorry for the delay in the review.
Not super familiar with the frequencies of some processes. Doesn't seem like these changes would be very spammy but it's probably something to watch out for.

Comment on lines +39 to +42
debug!(target: COMPONENT, "Initializing monitoring tasks");

// Initialize the RPC Status endpoint checker task.
debug!(target: COMPONENT, "Initializing RPC status checker");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Emitting 2 events here feels a bit redundant/unnecessary

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the first one. For the debug! level I left one log per component.

level = "info",
ret(level = "debug")
)]
pub(crate) async fn check_remote_prover_status(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that could be nice for troubleshooting is to raise info events for service status changes. Something like

  let mut last = Status::Unknown;
  // ...
  let status = check_remote_prover_status(...).await;
  if status.status != last {
      info!(target: COMPONENT, prover = %name, status = ?status.status, "Remote prover status changed");
      last = status.status;
  }

But maybe this is already easily filterable with the current traces

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 443 we have this:

   debug!(target: COMPONENT, prover_name = %name, remote_prover_status = ?status, "Remote prover status check successful");

Is that what you meant?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly meant emitting events when the status changes (so for example, when a service goes from healthy to unhealthy or viceversa). It feels like it could be useful for troubleshooting, but maybe not.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but the idea of the monitor is not troubleshooting but a quick glance of the overall state of the components

@SantiagoPittella
Copy link
Copy Markdown
Collaborator Author

Not sure if the plan is still to merge this and patch release it, but LGTM! Sorry for the delay in the review.

Yeah, maybe now we want to change the base to next and avoid this release + deploy cc @bobbinth

Not super familiar with the frequencies of some processes. Doesn't seem like these changes would be very spammy but it's probably something to watch out for.

Maybe this can be controlled changing the sample rate if we decide that we need to reduce it? That will require some probably minor changes in the utils crate.

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-improve-tracing-in-monitor branch from 01eeccc to fdee8ba Compare December 15, 2025 20:14
@SantiagoPittella SantiagoPittella changed the base branch from main to next December 15, 2025 20:14
@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-improve-tracing-in-monitor branch from fdee8ba to 8791a06 Compare December 16, 2025 20:24
@bobbinth bobbinth requested a review from sergerad December 16, 2025 23:20
@SantiagoPittella SantiagoPittella merged commit 4e27e15 into next Dec 17, 2025
6 checks passed
@SantiagoPittella SantiagoPittella deleted the santiagopittella-improve-tracing-in-monitor branch December 17, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants