concore trace: capturing and visualizing closed-loop execution across a run #546

@Titas-Ghosh

Description

Hi @pradeeban, I hope you are doing well. I've come up with a new idea.
While working on the protocol conformance matrix (phases 1 and 2), one thing that kept coming up for me was: once a study is running, there isn't really a way to see what actually happened across a run, only what the state looks like right now. concore watch gives a great live view of the current port values and edge freshness, and concore status tells you what's up, but neither of them answers questions like:

  1. In iteration 42 of this dicycle, which node fired first?
  2. How long did each node take per iteration, and is one of them the bottleneck?
  3. A value went out of range at some point: when did it happen, and what upstream input caused it?
  4. Does this closed loop actually stabilize, or is it oscillating?
Right now the workflow is "add prints, re-run, read logs by hand." For a framework whose entire purpose is modelling closed-loop systems, I think we can do better.

I am thinking of adding a new CLI command, concore trace, that records a full timeline of a study run and lets you inspect it afterwards.

The key insight that makes this feasible without touching any language implementation: concore is file-based. Every port read and write already goes through a file in an edge directory. That means a tracer can run purely as an external observer of the study directory, no changes to concore.py, concore.hpp, concore.java, concore.v, or any of the MATLAB files. Whatever language a node is written in, we see the same thing: a file got written, with this value, at this time.
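The external-observer idea above can be sketched as a polling loop over the study directory. This is a minimal sketch under assumptions: the edge-directory layout, file naming, and polling interval here are all illustrative, not the actual concore on-disk format.

```python
import os
import time

def poll_edges(study_dir, interval=0.05, seen=None):
    """Yield (path, mtime, raw_contents) each time a file under the
    study directory changes. Purely an external observer: no hooks
    into any language implementation, just filesystem polling.
    """
    seen = {} if seen is None else seen
    while True:
        for root, _dirs, files in os.walk(study_dir):
            for name in files:
                path = os.path.join(root, name)
                try:
                    mtime = os.stat(path).st_mtime
                except FileNotFoundError:
                    continue  # file was replaced between listing and stat
                if seen.get(path) != mtime:
                    seen[path] = mtime
                    with open(path) as f:
                        yield path, mtime, f.read()
        time.sleep(interval)
```

Inotify/FSEvents watchers would cut the overhead further, but plain polling keeps the sketch portable across every platform concore runs on.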

What it would do

Two subcommands, roughly:
concore trace record <study-dir> [--output trace.json]
Wraps a study run (or attaches to one already running) and produces a structured trace file. For each port event it captures:

  • timestamp (monotonic + wall clock)
  • node name, port name, direction (read/write)
  • the value, parsed through the same literal_eval path the protocol uses
  • iteration index (inferred from the per-node iteration counter files concore already maintains)
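The event fields above could look like this in practice. A sketch only: literal_eval mirrors how the protocol parses port payloads, but the field names and framing here are hypothetical, not a proposed final schema.

```python
import ast
import json
import time

def make_event(node, port, direction, raw_value, iteration):
    """Build one trace event with the fields listed above."""
    try:
        # same literal_eval path the protocol uses for payloads
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value  # keep unparseable payloads verbatim
    return {
        "mono": time.monotonic(),  # monotonic: ordering within the run
        "wall": time.time(),       # wall clock: correlating with logs
        "node": node,
        "port": port,
        "dir": direction,          # "read" or "write"
        "value": value,
        "iter": iteration,
    }

# serialized as one JSON object per line
line = json.dumps(make_event("controller", "bpm", "write", "[1, 72]", 3))
```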

concore trace view <trace.json>
Two output modes to start:

  1. Chrome-trace JSON — drop into chrome://tracing or Perfetto and get per-node timelines, durations, and a visual of which nodes are blocking which. This is the "power user" view.
  2. Mermaid sequence diagram — for a bounded window (e.g. --iterations 5..10), emit a sequence diagram of port reads/writes between nodes. This is the one I actually expect to be useful for papers and for explaining a study to someone.
A third mode I'd like to do but want to scope-check with you: a self-contained HTML timeline view, no server required. Probably out of scope for v1.
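To make the Chrome-trace mode concrete, here is a sketch of the conversion. The pairing heuristic (a node's read followed by its write in the same iteration becomes one slice) is an assumption for illustration; the output fields follow the Trace Event Format that chrome://tracing and Perfetto accept.

```python
import json

def to_chrome_trace(events):
    """Convert recorded port events into Chrome-trace JSON.

    Emits one complete slice ("ph": "X") per read/write pair,
    with ts/dur in microseconds and one tid row per node.
    """
    slices, open_reads, tids = [], {}, {}
    for ev in events:
        tid = tids.setdefault(ev["node"], len(tids) + 1)
        key = (ev["node"], ev["iter"])
        if ev["dir"] == "read":
            open_reads[key] = ev
        elif key in open_reads:
            start = open_reads.pop(key)
            slices.append({
                "name": "%s iter %d" % (ev["node"], ev["iter"]),
                "ph": "X",                                  # complete event
                "ts": start["mono"] * 1e6,                  # microseconds
                "dur": (ev["mono"] - start["mono"]) * 1e6,
                "pid": 1,
                "tid": tid,                                 # one row per node
            })
    return json.dumps({"traceEvents": slices})
```

Loading the result in Perfetto would give per-node duration bars for free, which directly answers the "is one node the bottleneck?" question.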

What this is not

  • Not a replacement for watch. watch is live state; trace is historical causality. They compose.
  • Not a protocol change. No new payload format, no version bump, no per-language work.
  • Not a profiler in the CPU-sampling sense. Timing resolution is port-event granularity, which is the level that matters for control loops anyway.

Why I think this is worth doing

Two reasons, beyond the obvious "debugging is easier":

  1. It's the natural next thing after conformance. Phase 1 proved the languages agree on the wire format. Phase 3 (planned for the coding period) extends that. But all of that is about correctness of a single message. Trace is about correctness of the loop, which is the part classic workflow tools (CWL, WDL) fundamentally can't represent, and which the README explicitly calls out as the research gap concore fills. Having a first-class way to inspect loop behaviour feels like it belongs in the repo.

  2. It produces artifacts you can put in the paper. A Mermaid sequence diagram of a cardiac PM controller stabilizing over 20 iterations is a figure I'd want to see in paper.md. Right now there's no way to produce one.

Some things I'd like feedback on before starting:

  • Scope of v1: Would you prefer I ship record + Chrome-trace export only as a first PR, and do Mermaid + the richer views in follow-ups? I'd rather land something small and get it reviewed than sit on a big patch.
  • Overhead: Even as a pure filesystem observer, a busy study could produce a lot of events. Should trace support sampling/filtering by node or port from the start, or is "capture everything" acceptable for v1?
  • Relationship to watch: I'm slightly tempted to share the port-file parsing helpers in concore_cli/commands/watch.py rather than duplicate them. Happy to do the small refactor if you're OK with it, or keep them separate if you'd rather not mix the two.
  • Trace file format: I'm leaning towards newline-delimited JSON (appendable, streamable, survives crashes) with a small header. Open to alternatives if there's a format the project already prefers.
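To illustrate the NDJSON leaning: a small header object on line 1, then one event per line, flushed as it goes so a crash loses at most the line in flight. The header fields here are hypothetical, just enough to show the shape.

```python
import json

def write_trace(path, events, study_dir):
    """Append-friendly NDJSON writer: header line, then one event per line."""
    with open(path, "w") as f:
        header = {"format": "concore-trace", "version": 1, "study": study_dir}
        f.write(json.dumps(header) + "\n")
        for ev in events:
            f.write(json.dumps(ev) + "\n")
            f.flush()  # each line survives a crash on its own

def read_trace(path):
    """Read the header and all complete event lines back."""
    with open(path) as f:
        header = json.loads(f.readline())
        events = [json.loads(line) for line in f if line.strip()]
    return header, events
```

A partial final line from a crashed run would fail to parse in isolation without corrupting anything before it, which is the main appeal over a single JSON array.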

If the direction looks right I'd like to be assigned this and put up a small design-only PR first (a README in concore_cli/commands/ describing the command surface and the trace schema), then the implementation in a second PR.

Happy to adjust scope based on what you think fits best.
