feat: add network monitoring application#1217
Conversation
7b88f03 to
5c3c2c3
Compare
5c3c2c3 to
f6d0e0b
Compare
bin/network-monitoring/README.md
Outdated
| @@ -0,0 +1,32 @@ | |||
| # Miden network monitoring | |||
|
|
|||
| This crate contains a binary for running a Miden network monitor that can monitor multiple remote provers. | |||
There was a problem hiding this comment.
Maybe something like
| This crate contains a binary for running a Miden network monitor that can monitor multiple remote provers. | |
| A monitoring app for a Miden network's infrastructure. | |
| It serves a webpage with an overview of the current infrastructure status and emits OpenTelemetry events which can be ingested for more advanced monitoring and alerting. |
bin/network-monitoring/README.md
Outdated
There was a problem hiding this comment.
Maybe also a section on currently supported items, and future items?
There was a problem hiding this comment.
Could we get away with just cargo check? But maybe the full one is better - I'm mainly concerned with keeping github cache usage low. So maybe we keep as is, and then if cache gets clobbered we revisit.
There was a problem hiding this comment.
It is a relatively simple binary, using just check is probably enough. I will remove both jobs (build, install) and add a check one. I can re-add it if we change our minds
bin/network-monitoring/src/main.rs
Outdated
| // Wait for either task to complete or fail, then abort the other | ||
| let (frontend_result, status_result) = tokio::join!(frontend_task, status_task); |
There was a problem hiding this comment.
I don't think this works - join! waits for all tasks to complete:
Waits on multiple concurrent branches, returning when all branches complete.
For short lists like this tokio::select! is probably the way to go; for longer lists I use JoinSet with join_next_with_id but that involves storing the IDs as well to identify which task completed.
| pub async fn get_dashboard() -> Html<&'static str> { | ||
| Html(include_str!("../assets/index.html")) | ||
| } | ||
|
|
||
| pub async fn get_status( | ||
| axum::extract::State(shared_status): axum::extract::State<SharedStatus>, | ||
| ) -> axum::response::Json<crate::status::NetworkStatus> { | ||
| let status = shared_status.lock().await; | ||
| axum::response::Json(status.clone()) | ||
| } |
There was a problem hiding this comment.
Do these need to be pub?
There was a problem hiding this comment.
No, it does not
| /// | ||
| /// * `shared_status` - The shared status of the network. | ||
| /// * `config` - The configuration of the network. | ||
| pub async fn run_frontend(shared_status: SharedStatus, config: MonitoringConfig) { |
There was a problem hiding this comment.
Nit:
| pub async fn run_frontend(shared_status: SharedStatus, config: MonitoringConfig) { | |
| pub async fn serve(shared_status: SharedStatus, config: MonitoringConfig) { |
| // build our application with routes | ||
| let app = Router::new() | ||
| // Serve static files from assets directory | ||
| .nest_service("/assets", ServeDir::new("bin/network-monitoring/assets")) |
There was a problem hiding this comment.
Does this work when installed via cargo install? Because I don't think the assets folder will be bundled with unless ServeDir somehow embeds the assets into the binary.
I've typically forced this by embedding the files directly in the binary by using include_str! and friends, but then you can't use a directory approach.
See 0xMiden/miden-faucet#59 for some additional context.
| .route("/status", get(get_status)) | ||
| .with_state(shared_status); | ||
|
|
||
| // run our app with hyper, listening globally on the configured port |
There was a problem hiding this comment.
Nit: don't think we use hyper directly
| // run our app with hyper, listening globally on the configured port |
There was a problem hiding this comment.
This is fine for now but I think we'll run into problems if/when we start adding more checks.
An alternative I would suggest as follow-up is to split each check into its own task, using a tokio Watch channel to communicate its latest status.
bobbinth
left a comment
There was a problem hiding this comment.
Looks great! Thank you! This is a very shallow review from me, but I left a couple of comments inline. The main one is renaming everything from "miden network monitoring" to "miden network monitor".
|
|
||
| let bind_address = format!("0.0.0.0:{}", config.port); | ||
| println!("Starting web server on {bind_address}"); | ||
| println!("Dashboard available at: http://localhost:{}/", config.port); |
There was a problem hiding this comment.
Did we want to use the same tracing setup we use everywhere else?
There was a problem hiding this comment.
I can do it easily by importing it from miden_node_utils. Should I always mark opentelemetry as disabled? Not sure if we want that here.
There was a problem hiding this comment.
I would want it enabled so that we can keep track of it via Honeycomb too
There was a problem hiding this comment.
I'm actually not sure the monitor should be tied in to Honeycomb - or even have OTEL enabled at all. In my mind it is a simple binary that we run purely as an external user (and anyone can run it as well). So, the simpler we can make it, the better.
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Than you! Again not a very deep review from me, but i left a couple of small comments inline.
bin/network-monitor/Cargo.toml
Outdated
|
|
||
| [dependencies] | ||
| anyhow = { workspace = true } | ||
| axum = { version = "0.8.4" } |
There was a problem hiding this comment.
nit: just 0.8 for version should be enough.
There was a problem hiding this comment.
This PR is being reviewed by Cursor Bugbot
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
|
@SantiagoPittella, @Mirko-von-Leipzig, @sergerad - is this mostly done? or is there anything outstanding? |
Yes, there is a follow up issue that I'm already addressing and has a PR open that can be merged after this one. All comments were addressed here. |
bobbinth
left a comment
There was a problem hiding this comment.
Not a deep review from me - but looks good! Thank you!
@Mirko-von-Leipzig, @sergerad - we should probably deploy this on a small instance (e.g., 1 - 2 core). Maybe one deployment for testnet and one for devenet? It would be cool if this was accessible under something like status.testnet.miden.io and status.devnet.miden.io.
Another cool thing for a future PR would be to include some additional data for block producer (and maybe NTX builder). For example, something like number of tx in the mempool. Let's create an issue for this (unless we already have one).

partially addresses #1190
This is the current look of the frontned
