Add compute node resource monitoring #270

Merged
daniel-thom merged 3 commits into main from feat/compute-node-resource-stats
Apr 19, 2026

@daniel-thom (Collaborator)

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

Adds end-to-end compute-node (system-wide) resource monitoring, persists summary metrics on compute node records, and extends the dashboard/TUI/CLI + plot tooling to display and filter these metrics alongside updated workflow-spec configuration semantics.

Changes:

  • Add compute-node resource summary fields to the DB/API model and populate them from the resource monitor at runner shutdown.
  • Introduce scoped resource_monitor.jobs and resource_monitor.compute_node config blocks (while keeping legacy compatibility) and update docs/examples/tests accordingly.
  • Extend UI surfaces (dashboard tables, resource plots tab, TUI, CLI) and plot-resources tooling to include system timelines/summaries and workflow-based DB filtering.
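
A minimal sketch of how scoped monitoring with a legacy fallback might be modeled. Only the scope names `jobs` and `compute_node` and the `jobs_enabled()` gate appear in this PR; the struct names, the `legacy` field, the per-scope fields, and the rule that a flat legacy block applies to jobs only are assumptions made for illustration, not the actual implementation in `src/client/resource_monitor.rs`.

```rust
/// Hypothetical per-scope settings; the real field set is not shown in this PR page.
#[derive(Debug, Clone, PartialEq)]
struct ScopeConfig {
    enabled: bool,
    sample_interval_seconds: u32,
}

/// Hypothetical in-memory form of the `resource_monitor` block.
#[derive(Debug, Clone, Default)]
struct ResourceMonitorConfig {
    /// Flat, pre-scoping settings (assumed here to apply to jobs only).
    legacy: Option<ScopeConfig>,
    /// Settings from `resource_monitor.jobs`.
    jobs: Option<ScopeConfig>,
    /// Settings from `resource_monitor.compute_node`.
    compute_node: Option<ScopeConfig>,
}

impl ResourceMonitorConfig {
    /// An explicit `jobs` scope wins; otherwise fall back to the legacy block.
    fn effective_jobs(&self) -> Option<&ScopeConfig> {
        self.jobs.as_ref().or(self.legacy.as_ref())
    }

    /// Gate for per-job monitoring.
    fn jobs_enabled(&self) -> bool {
        self.effective_jobs().map_or(false, |scope| scope.enabled)
    }

    /// System-wide monitoring runs only when its scope is present and enabled.
    fn compute_node_enabled(&self) -> bool {
        self.compute_node.as_ref().map_or(false, |scope| scope.enabled)
    }
}
```

Under these assumptions, a spec that only carries the old flat settings keeps per-job monitoring on and leaves compute-node monitoring off, which is one plausible reading of "legacy compatibility".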

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| torc-server/migrations/20260319000000_add_compute_node_resource_summary.up.sql | Adds compute-node summary columns. |
| torc-server/migrations/20260319000000_add_compute_node_resource_summary.down.sql | Drops the added compute-node summary columns. |
| torc-dash/static/js/app-workflows.js | Adds resource-plots workflow selector + robust id comparison. |
| torc-dash/static/js/app-tables.js | Shows compute-node peak/avg CPU + memory columns in tables. |
| torc-dash/static/js/app-resources.js | Filters resource DB list by workflow id; resets plot state on workflow change. |
| torc-dash/static/js/app-details.js | Adds compute-node peak/avg columns to details table body rendering. |
| torc-dash/static/js/app-core.js | Syncs resource-plots workflow selector on tab switch. |
| torc-dash/static/index.html | Adds workflow selector UI to Resource Plots tab. |
| torc-dash/static/css/style.css | Styles workflow labels in resource DB list. |
| tests/workflows/multi_node_parallel_jobs_test/workflow.yaml | Updates workflow spec to new resource_monitor.jobs shape. |
| tests/test_resource_requirements.rs | Improves panic message with underlying error. |
| tests/test_hpc.rs | Adds additional ISO8601 duration parsing test case. |
| tests/test_compute_nodes.rs | Adds integration test for compute-node summary field round-trip. |
| src/tui/ui.rs | Adds Compute Nodes detail tab + table rendering with summary columns. |
| src/tui/app.rs | Adds ComputeNodes view state, loading, and filtering. |
| src/tui/api.rs | Adds client method to list compute nodes for TUI. |
| src/server/api/compute_nodes.rs | Extends compute-node CRUD/list to include new summary columns. |
| src/plot_resources_cmd.rs | Adds system sample/summary loading + system plots; improves bar dashboard axes. |
| src/models.rs | Extends ComputeNodeModel with summary fields + initializes defaults/tests. |
| src/client/workflow_spec.rs | Parses/prints nested resource-monitor scopes; adds legacy-compat tests. |
| src/client/resource_monitor.rs | Implements scoped job vs compute-node monitoring + persists system samples/summary. |
| src/client/job_runner.rs | Captures monitor shutdown summary and writes to compute-node record. |
| src/client/commands/compute_nodes.rs | Displays compute-node system peak/avg metrics in CLI list/get + adds tests. |
| src/client/async_cli_command.rs | Starts/stops per-job monitoring only when jobs scope is enabled. |
| src/bin/torc-dash.rs | Adds workflow_id parsing for resource DB filenames and returns it to UI. |
| python_client/src/torc/openapi_client/models/compute_node_model.py | Adds new compute-node summary fields to Python client model. |
| julia_client/julia_client/docs/ComputeNodeModel.md | Documents new compute-node summary fields for Julia client. |
| julia_client/Torc/src/api/models/model_ComputeNodeModel.jl | Adds new compute-node summary fields to Julia client model. |
| examples/yaml/slurm_staged_pipeline.yaml | Updates example to new resource_monitor.jobs config shape. |
| examples/yaml/resource_monitoring_demo.yaml | Updates example to include compute-node monitoring scope. |
| examples/yaml/multi_node_slurm.yaml | Updates example to new resource_monitor.jobs config shape. |
| examples/kdl/slurm_staged_pipeline.kdl | Updates example to nested jobs config block. |
| examples/kdl/resource_monitoring_demo.kdl | Updates example to include nested jobs + compute_node blocks. |
| examples/json/slurm_staged_pipeline.json5 | Updates example to nested jobs config block. |
| examples/json/resource_monitoring_demo.json5 | Updates example to include nested jobs + compute_node blocks. |
| docs/src/core/reference/workflow-spec.md | Documents new scoped resource-monitor config and legacy behavior. |
| docs/src/core/reference/resource-monitoring.md | Updates resource-monitoring reference with system tables/plots. |
| docs/src/core/how-to/view-resource-plots.md | Updates how-to to use resource_monitor.jobs for time-series. |
| api/openapi.yaml | Adds new compute-node summary fields to OpenAPI schema. |
| api/openapi.codegen.yaml | Mirrors OpenAPI schema changes for codegen. |

Comment thread src/server/api/compute_nodes.rs
Comment thread src/client/resource_monitor.rs
Comment thread src/client/resource_monitor.rs
Contributor

Copilot AI left a comment

Pull request overview

Adds compute-node (system-wide) resource monitoring and surfaces the resulting peak/avg CPU+memory summaries across the API, CLI/TUI, dashboard, and plot generation tooling.

Changes:

  • Extend compute_node with persisted resource summary fields (sample count, peak/avg CPU%, peak/avg memory).
  • Add compute-node/system sampling + storage to the resource monitor and generate system timeline/summary plots in plot-resources.
  • Update dashboard/TUI/CLI plus workflow-spec parsing/docs/examples to support scoped resource_monitor.jobs and resource_monitor.compute_node configuration.
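
The summary fields listed above (sample count, peak/avg CPU%, peak/avg memory) can be derived from the collected system samples in a single pass. The sketch below illustrates that reduction; the type names, field names, and units are hypothetical and are not taken from the PR's code.

```rust
/// One system-wide sample. Field names and units are hypothetical.
#[derive(Debug, Clone, Copy)]
struct SystemSample {
    cpu_percent: f64,
    memory_bytes: u64,
}

/// Summary values of the kind persisted on the compute node record.
#[derive(Debug, Clone, PartialEq)]
struct SystemSummary {
    sample_count: u64,
    peak_cpu_percent: f64,
    avg_cpu_percent: f64,
    peak_memory_bytes: u64,
    avg_memory_bytes: f64,
}

/// Reduces samples to a summary in one pass.
/// Returns `None` when nothing was sampled, so the optional
/// summary fields can stay unset instead of holding zeros.
fn summarize(samples: &[SystemSample]) -> Option<SystemSummary> {
    if samples.is_empty() {
        return None;
    }
    let mut peak_cpu = f64::MIN;
    let mut cpu_sum = 0.0_f64;
    let mut peak_mem = 0_u64;
    let mut mem_sum = 0.0_f64;
    for sample in samples {
        peak_cpu = peak_cpu.max(sample.cpu_percent);
        cpu_sum += sample.cpu_percent;
        peak_mem = peak_mem.max(sample.memory_bytes);
        mem_sum += sample.memory_bytes as f64;
    }
    let count = samples.len() as f64;
    Some(SystemSummary {
        sample_count: samples.len() as u64,
        peak_cpu_percent: peak_cpu,
        avg_cpu_percent: cpu_sum / count,
        peak_memory_bytes: peak_mem,
        avg_memory_bytes: mem_sum / count,
    })
}
```

Returning `None` for an empty sample set matches the file summary's description of the new `ComputeNodeModel` fields as optional, though how the runner actually handles the no-samples case is not shown here.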

Reviewed changes

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| torc-server/migrations/20260319000000_add_compute_node_resource_summary.up.sql | Adds new compute-node summary columns to the DB schema. |
| torc-server/migrations/20260319000000_add_compute_node_resource_summary.down.sql | Removes the new compute-node summary columns on downgrade. |
| torc-dash/static/js/app-workflows.js | Adds a workflow selector for the resource plots tab + robust ID comparison. |
| torc-dash/static/js/app-tables.js | Shows CPU/mem peak/avg columns in compute nodes table. |
| torc-dash/static/js/app-resources.js | Adds workflow filtering + state reset for resource DB selection/plot generation. |
| torc-dash/static/js/app-details.js | Shows CPU/mem peak/avg in compute node details table. |
| torc-dash/static/js/app-core.js | Syncs selected workflow into the resource-plots workflow selector. |
| torc-dash/static/index.html | Adds workflow selector UI to Resource Plots tab. |
| torc-dash/static/css/style.css | Styles workflow label in resource DB list items. |
| tests/workflows/multi_node_parallel_jobs_test/workflow.yaml | Updates resource_monitor config to the new nested jobs structure. |
| tests/test_resource_requirements.rs | Improves panic message with underlying error. |
| tests/test_hpc.rs | Adds additional ISO8601 duration parsing test coverage. |
| tests/test_compute_nodes.rs | Adds API round-trip test for new compute-node summary fields. |
| src/tui/ui.rs | Adds a Compute Nodes detail view/table rendering in the TUI. |
| src/tui/app.rs | Adds compute-nodes state, filtering, and load behavior to the TUI app model. |
| src/tui/api.rs | Adds a TUI client call to list compute nodes. |
| src/server/api/compute_nodes.rs | Extends compute-node CRUD/list queries to include the new summary fields. |
| src/plot_resources_cmd.rs | Loads/merges system samples/summary and generates system timeline/summary plots + tests. |
| src/models.rs | Extends ComputeNodeModel with the new optional summary fields. |
| src/client/workflow_spec.rs | Adds scoped resource_monitor parsing/serialization + legacy compatibility tests. |
| src/client/resource_monitor.rs | Introduces scoped monitoring config and compute-node (system) sampling + DB storage + tests. |
| src/client/job_runner.rs | Persists compute-node system summary to the compute_node record on shutdown. |
| src/client/commands/compute_nodes.rs | Displays compute-node peak/avg CPU+mem in CLI list/get output + tests. |
| src/client/async_cli_command.rs | Gates per-job monitoring by jobs_enabled() for new scoped config. |
| src/bin/torc-dash.rs | Adds workflow-id parsing for resource DB filenames + exposes it via API + tests. |
| python_client/src/torc/openapi_client/models/compute_node_model.py | Regenerates Python client model to include new fields. |
| julia_client/julia_client/docs/ComputeNodeModel.md | Updates Julia client docs for new compute-node fields. |
| julia_client/Torc/src/api/models/model_ComputeNodeModel.jl | Updates Julia client model for new compute-node fields. |
| examples/yaml/slurm_staged_pipeline.yaml | Updates example to nested resource_monitor.jobs. |
| examples/yaml/resource_monitoring_demo.yaml | Updates example to nested jobs + new compute_node block. |
| examples/yaml/multi_node_slurm.yaml | Updates example to nested resource_monitor.jobs. |
| examples/kdl/slurm_staged_pipeline.kdl | Updates example to nested jobs block. |
| examples/kdl/resource_monitoring_demo.kdl | Updates example to nested jobs + compute_node blocks. |
| examples/json/slurm_staged_pipeline.json5 | Updates example to nested resource_monitor.jobs. |
| examples/json/resource_monitoring_demo.json5 | Updates example to nested jobs + compute_node blocks. |
| docs/src/core/reference/workflow-spec.md | Documents new scoped resource_monitor configuration and legacy behavior. |
| docs/src/core/reference/resource-monitoring.md | Documents job vs compute-node monitoring scopes and new DB tables/outputs. |
| docs/src/core/how-to/view-resource-plots.md | Updates how-to to use nested resource_monitor.jobs. |
| api/openapi.yaml | Extends ComputeNodeModel schema with new summary fields. |
| api/openapi.codegen.yaml | Extends codegen schema with new summary fields. |

Comment thread src/client/resource_monitor.rs
Comment thread src/client/resource_monitor.rs
Comment thread docs/src/core/reference/resource-monitoring.md
@daniel-thom merged commit c9bd0ca into main on Apr 19, 2026
9 checks passed
@daniel-thom deleted the feat/compute-node-resource-stats branch on April 19, 2026 15:59

2 participants