
feat(examples): export rollouts to JSON (s) #68

Merged
FatPigeorz merged 9 commits into Agentix-Project:master from Meirtz:feat/tui-export on May 30, 2026

Meirtz (Collaborator) commented May 29, 2026

Stacks on #63 (the Agentix TUI). GitHub shows the cumulative diff until #63 merges; the net change here is the export feature: `export_payload` / `export_to` on `RolloutsView`, an `s` binding + `action_save` in `app.py`, and two pilot tests.

What

Lets you persist a run straight from the dashboard. `RolloutsView.export_payload()` returns a JSON-friendly snapshot: each `Rollout.to_dict()` (instance id, resolved, patch size, agent exit, score, duration) plus a small aggregate (total / done / resolved / failed). Pressing `s` writes `agentix-rollouts.json` to the cwd and toasts the count; with nothing collected yet, it warns instead of writing an empty file.

This serves Agentix's rollout data collection goal directly — the snapshot is the unit an RL/eval loop persists for offline analysis or replay, and it's built only on the runner's existing Rollout.to_dict().
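A minimal sketch of the snapshot-and-write flow described above, not the PR's actual code. The `Rollout` dataclass is a hypothetical stand-in for the runner's class, the field and key names (`summary`, `rollouts`, `patch_size`, and so on) are assumptions, and "failed" is assumed to mean "done but not resolved". The functions are written as free functions rather than `RolloutsView` methods to keep the example self-contained.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Rollout:
    """Hypothetical stand-in for the runner's Rollout; field names are assumed."""

    instance_id: str
    resolved: bool
    patch_size: int
    agent_exit: int
    score: float
    duration: float

    def to_dict(self) -> dict:
        return asdict(self)


def export_payload(rollouts: list, total: int) -> dict:
    """Build a JSON-friendly snapshot: per-rollout dicts plus a small aggregate."""
    resolved = sum(1 for r in rollouts if r.resolved)
    return {
        "summary": {
            "total": total,
            "done": len(rollouts),
            "resolved": resolved,
            # Assumption: every finished rollout that is not resolved counts as failed.
            "failed": len(rollouts) - resolved,
        },
        "rollouts": [r.to_dict() for r in rollouts],
    }


def export_to(rollouts: list, total: int, path: Path) -> int:
    """Write the snapshot to `path` and return the number of rollouts written.

    Returns 0 without touching the disk when nothing has been collected,
    mirroring the "warn instead of writing an empty file" behaviour.
    """
    if not rollouts:
        return 0
    payload = export_payload(rollouts, total)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return len(rollouts)
```

A caller such as a key-binding handler could toast the returned count, or show a warning when it is 0. Reading the file back with `json.loads` should reproduce the payload, which is the round-trip property the pilot tests are described as checking.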

Verification

  • `ruff check`; headless `run_test` pilots assert that the payload snapshots all instances and that `export_to(tmp_path)` round-trips through `json.loads`; 12 pilot tests green.

Meirtz and others added 9 commits May 30, 2026 03:37
`examples/eval-tui` is a modern, reactive Textual dashboard over
`agentix.runner`: a per-instance grid (pending -> setup -> agent -> scoring ->
PASS/FAIL/skip/error), a live summary bar (done / resolved / failed / running
+ throughput), and an event log. In-flight phases are observed by wrapping the
dataset/agent adapters (`_adapters.py`), so `agentix.runner` is unchanged.

- `--demo N` runs a synthetic, no-Docker batch (reproducible from a seed) — try
  it instantly. Real runs resolve `module:attr` dataset/agent + a provider,
  exactly like `agentix-run`.
- Standalone example (own lock) — its TUI deps stay out of the core-dev venv.
- Verified headlessly: ruff clean + a Textual `run_test()` pilot test that
  drives the demo to completion (no Docker).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
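The first commit says in-flight phases are observed by wrapping the dataset/agent adapters so that `agentix.runner` stays unchanged. The sketch below illustrates that wrapping idea under stated assumptions: the `run(instance_id)` method, the callback signature, and the phase names passed to it are hypothetical and are not taken from `_adapters.py`.

```python
from typing import Callable


class ObservedAgent:
    """Wraps an agent-like object and reports phase changes to a callback.

    Assumption: the wrapped object exposes `run(instance_id)`. The real
    adapter interface may differ.
    """

    def __init__(self, inner, on_phase: Callable[[str, str], None]):
        self._inner = inner
        self._on_phase = on_phase

    def run(self, instance_id: str):
        # Announce that the agent phase has started for this instance.
        self._on_phase(instance_id, "agent")
        try:
            result = self._inner.run(instance_id)
        except Exception:
            # Surface the failure to the observer, then let the caller handle it.
            self._on_phase(instance_id, "error")
            raise
        # Announce the hand-off to the next phase.
        self._on_phase(instance_id, "scoring")
        return result
```

Because the wrapper exposes the same `run` method as the object it wraps, the caller does not need to know it is being observed; a dashboard would pass a callback that updates the grid row for that instance.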
Restructure the single rollout dashboard into a multi-tab Textual app
(AgentixTUI) that surfaces each Agentix area:

- Rollouts — the live dashboard, refactored into a reusable view widget.
- Catalog — installed `agentix*` distributions + `agentix.provider` and
  `agentix.nix` entry points (pure introspection, no Docker).
- Sandboxes / Build / Observability — signposted placeholders for the
  follow-up PRs that flesh them out.

Adds DESIGN.md (the rubrics this iterates against), an idle state so the
app is useful with no run attached, and pilot tests for the tabbed app,
the idle path, and catalog discovery. ruff + headless run_test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l pane

Highlight a row in the Rollouts grid to see that instance's full detail
(verdict, duration, agent exit, patch size, score breakdown, error) in a
side panel, alongside the live event log. The rendered detail text is also
exposed on the view for headless assertions. ruff + 4 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A branded landing tab: a warm-gradient "AGENTIX" banner, live ecosystem
stat cards (packages / providers / nix-closures from the same introspection
the Catalog uses), a Docker-readiness indicator, and quick hints. Registers
a branded Textual theme (best-effort; falls back to the default if the
running Textual version's theme API differs). Pure introspection — renders
with or without Docker. Adds an Aesthetics rubric (DESIGN.md) and a pilot
test. ruff + 5 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the Sandboxes placeholder with a live readiness view: the known
backends (docker / podman / apptainer / daytona / e2b) each probed for
usability here — binary on PATH, daemon reachable (a real `<bin> info`
subprocess in a worker), or SDK + API key present — plus a short note on
the session + remote-invoke model. Degrades gracefully when nothing is
installed. ruff + 6 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
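The readiness probe for CLI backends described in the commit above (binary on PATH, then a real `<bin> info` subprocess) might look like the following sketch. The function name and the three status strings are hypothetical, and the SDK + API key check for hosted backends is not covered.

```python
import shutil
import subprocess


def probe_cli_backend(binary: str, timeout: float = 5.0) -> str:
    """Classify a CLI backend as 'missing', 'unreachable', or 'ready'."""
    # Step 1: is the binary on PATH at all?
    if shutil.which(binary) is None:
        return "missing"
    # Step 2: ask the tool for its status; a non-zero exit or a timeout
    # is treated as "installed but not usable right now".
    try:
        done = subprocess.run(
            [binary, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unreachable"
    return "ready" if done.returncode == 0 else "unreachable"
```

Since the subprocess call can block for up to `timeout` seconds, a TUI would want to run it off the UI thread, which is consistent with the commit's note that the probe runs in a worker.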
Replaces the Observability placeholder with a split live feed of the two
Agentix side channels: /trace (OTel-style spans) on the left, /log (bridged
stdlib logging) on the right. With no run attached it plays a short synthetic
demo so the shape is visible; real streams arrive from running sandboxes.
ruff + 7 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the Build placeholder with an interactive planner: a project-path
input that live-constructs the `agentix build … --platform … --output …`
command, the build model (uv owns Python, Nix owns binaries), and the
`agentix.nix` closures that would be staged (real entry-point introspection).
Adds number keybindings (1–6) to jump between tabs. The control room now has
six live tabs — Overview · Rollouts · Catalog · Sandboxes · Build ·
Observability. ruff + 9 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Catalog tab gets a filter input that narrows the distributions/entry-points
table by name / kind / detail as you type (title shows matched/total). ruff +
10 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
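The filter behaviour described in the commit above (narrowing rows by name / kind / detail as you type, with matched/total in the title) amounts to a case-insensitive substring match over each row. A minimal sketch follows, assuming rows are `(name, kind, detail)` tuples; the function names and the exact title format are hypothetical.

```python
def filter_rows(rows, query):
    """Keep rows where any field contains `query`, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        # An empty filter shows everything.
        return list(rows)
    return [
        row for row in rows
        if any(needle in str(field).lower() for field in row)
    ]


def catalog_title(matched, total):
    """Format a table title showing how many rows survive the filter."""
    return f"Catalog ({matched}/{total})"
```

In a Textual app this would typically be called from the input's change handler, re-populating the table with the filtered rows on each keystroke.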
Adds `RolloutsView.export_payload()` (a JSON-friendly snapshot: each
`Rollout.to_dict()` plus an aggregate) and `export_to(path)`. `s` writes
`agentix-rollouts.json` to the cwd and toasts the count; with no results
yet it warns instead. This is the unit an RL/eval loop persists for
offline analysis or replay. ruff + 12 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FatPigeorz FatPigeorz merged commit 9cdbf45 into Agentix-Project:master May 30, 2026
5 checks passed