Skip to content

feat: add humanize monitor web dashboard for RLCR sessions#63

Merged
SihaoLiu merged 30 commits intoPolyArch:betafrom
zevorn:feat/viz-dashboard
Apr 19, 2026
Merged

feat: add humanize monitor web dashboard for RLCR sessions#63
SihaoLiu merged 30 commits intoPolyArch:betafrom
zevorn:feat/viz-dashboard

Conversation

@zevorn
Copy link
Copy Markdown

@zevorn zevorn commented Apr 2, 2026

Summary

  • Add humanize monitor web subcommand to scripts/humanize.sh — a per-project Flask + SPA dashboard that reads the same .humanize/rlcr/ data the terminal monitors use, runs in the foreground by default, and serves a live pipeline graph + sidebar + streaming cache-log pane.
  • Auto-generate a sanitized methodology report (generic taxonomy, forbidden-token scan, no project-specific leakage) via local Claude CLI, preview it in the dashboard, and submit to PolyArch/humanize via gh issue create — all reachable from the session-detail ops menu.
  • Replaces the earlier /humanize:viz Claude command; its skill and slash-command entries are removed in favor of the CLI subcommand (foreground first; --daemon opt-in for tmux-backed daemon mode).

How to use

# One-shot foreground launch (localhost, no auth).
humanize monitor web --project /path/to/project

# Remote mode: bind to a non-loopback host and require a bearer token.
humanize monitor web --project /path/to/project \
    --host 0.0.0.0 --port 18000 \
    --auth-token "$(openssl rand -hex 16)"

# Daemon mode (opt-in): delegates to the tmux-backed launcher.
humanize monitor web --project /path/to/project --daemon

Open http://localhost:18000 in a browser. The home page lists every session under the project; click an active card to enter the session detail page. The bottom live-log panel streams the most recent round's round-N-codex-run.log; it auto-switches when a new round starts. A three-state toggle (collapsed / normal / expanded) and a node-click auto-fold keep the pipeline canvas legible.

From the ops menu ( top-right on the detail page), Preview Issue invokes local Claude to generate the methodology report (if missing), renders a sanitized preview modal, and can submit the resulting issue to PolyArch/humanize via gh issue create (requires gh auth login once).

Architecture

Backend (Python/Flask, auto-managed venv in .humanize/viz-venv):

  • viz/server/app.py — REST + WebSocket + SSE endpoints; bearer-token auth (header for fetch, ?token= for SSE); same-origin CSRF gate on every mutating route; cancel endpoint routed through project-scoped helper; Claude-CLI-driven report generation; gh issue create wrapper.
  • viz/server/parser.py — RLCR session parser (YAML frontmatter, goal-tracker markdown, round summaries, review results, methodology reports). Accepts list-form and table-form acceptance criteria plus legacy / dashless / nested id forms.
  • viz/server/watcher.py — watchdog observer on .humanize/rlcr/ plus a per-session cache-dir observer that broadcasts round_added the moment round-N-*.log appears, so the frontend switches the live-log pane without waiting for the next state.md write.
  • viz/server/log_streamer.py — per-session SSE log stream: snapshot / append / resync / eof with 256-event Last-Event-Id replay.
  • viz/server/rlcr_sources.py — cache-dir derivation and filename parsing; shared by the backend and tests.

Frontend (vanilla HTML/CSS/JS, zero build step):

  • Pipeline canvas with snake-path node layout, SVG connectors, mouse-wheel zoom + drag pan, per-round flyout detail panel, and incremental node updates (append or replace only the changed node, preserving zoom/pan across rounds).
  • Home page lists Active + Completed session cards. Diff-based refresh: only cards whose rendered HTML changed are replaced; no full-list re-animation on WS events.
  • Session-detail page carries a below-canvas live-log pane driven by SSE; its collapse state survives log-basename switches and WS refreshes.
  • WebSocket-driven targeted subtree refresh (pipeline / sidebar / goal-bar) with 500 ms trailing-edge debounce; remote-mode 10 s polling fallback uses the same targeted path.
  • Analytics route with Chart.js panels (rounds per session, round duration, verdict distribution, P-issues, first-COMPLETE, BitLesson growth) plus a verdict timeline and a session comparison table. Non-destructive theme toggle; only the analytics route re-renders on theme change because its charts bake CSS vars into SVG.
  • DOMPurify-backed Markdown rendering for every piece of session content injected into the DOM.

Integration

  • scripts/humanize.sh — dispatches monitor web to _humanize_monitor_web, which auto-provisions the venv, finds a free port in 18000–18099, probes the configured bind for liveness, fails closed on a non-loopback bind without --auth-token, and either execs Flask in the foreground or delegates to the opt-in tmux launcher.
  • viz/scripts/viz-start.sh / viz-stop.sh / viz-status.sh — the opt-in --daemon path uses per-project tmux session names (SHA-1 of the project path) with tmux -t =name exact-match so prefix collisions cannot touch another project's dashboard. Startup probe fails closed and cleans up viz.port / viz.url when readiness never lands.

Security posture

  • Localhost binds skip auth; any non-loopback bind requires --auth-token (or HUMANIZE_VIZ_TOKEN) and refuses to start without one.
  • Same-origin CSRF layer gates every mutating route against the request's actual Host / Origin (handles --host 0.0.0.0 wildcard binds and IPv6 literals).
  • Session-id path validation rejects .., dotfiles, and symlink escapes; plan-file reads are constrained to the project or session directory.
  • Methodology report sanitization maps free-form Claude output to a constrained taxonomy and blocks outbound payloads carrying paths, symbol names, git hashes, or local endpoints.

Test plan

  • humanize monitor web --project <path> launches in the foreground and serves the dashboard on the chosen port.
  • humanize monitor web --host 0.0.0.0 --port 18000 --auth-token <tok> serves remotely and rejects unauthenticated requests.
  • Home page lists Active vs Completed sessions; active-session count is rendered without full-list re-animation on WS events.
  • Clicking a session opens the pipeline; new rounds land as appended nodes with zoom/pan preserved.
  • The bottom live-log pane streams round-N-codex-run.log and switches to round N+1 the moment the file is created.
  • Three-state toggle (collapsed / normal / expanded) works on desktop and narrow-screen layouts.
  • Ops → Preview Issue generates the methodology report via local Claude CLI, shows a sanitized preview, and submits to PolyArch/humanize when gh is authenticated.
  • test-viz.sh, test-viz-isolation.sh, test-streaming.sh, test-app-auth.sh, test-app-routes-live.sh, test-cancel-session.sh, test-frontend-migration.sh, test-rlcr-sources.sh, test-style-compliance.sh all pass.
  • No regressions in the pre-existing upstream suites (test-template-loader.sh, test-bash-validator-patterns.sh, robustness/*, etc.).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6771bda603

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/parser.py Outdated
Comment thread viz/static/js/pipeline.js Outdated
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 2, 2026

@codex

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9aa75a59bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
Comment thread viz/scripts/viz-status.sh Outdated
@zevorn zevorn force-pushed the feat/viz-dashboard branch from 615fcde to 1b575fe Compare April 2, 2026 09:27
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 7, 2026

@codex

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@ZenusZhang ZenusZhang force-pushed the dev branch 2 times, most recently from 16dbf50 to a1f3614 Compare April 7, 2026 03:38
@SihaoLiu
Copy link
Copy Markdown
Contributor

@codex review this PR as entirety

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1b575fe70a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/scripts/viz-start.sh Outdated
Comment thread viz/scripts/viz-status.sh Outdated
Comment thread viz/static/js/app.js
@zevorn zevorn force-pushed the feat/viz-dashboard branch 4 times, most recently from c6c5e63 to a43cfec Compare April 18, 2026 06:11
@zevorn zevorn changed the title feat: add RLCR visualization dashboard (/humanize:viz) feat: add humanize monitor web dashboard for RLCR sessions Apr 18, 2026
@zevorn zevorn force-pushed the feat/viz-dashboard branch from a43cfec to 3039e7e Compare April 18, 2026 07:06
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 18, 2026

@SihaoLiu

@zevorn zevorn force-pushed the feat/viz-dashboard branch 2 times, most recently from 450d91e to be29c41 Compare April 18, 2026 09:59
Introduces the `humanize monitor web` command and its Flask + SPA
dashboard for RLCR sessions, plus the accumulated review-phase
fixes and UX polish.

Feature
- Per-project session discovery and summary (state.md, goal tracker,
  round summaries, review results, methodology reports).
- Pipeline canvas with snake-path node layout, SVG connectors, pan /
  zoom, and per-round flyout details.
- Session-detail page carries a below-canvas live-log panel with a
  three-state toggle (collapsed / normal / expanded) driven by
  --log-h and a single grid-template-rows declaration. Clicking a
  pipeline node auto-collapses the log; closing the flyout restores
  the prior state.
- Incremental pipeline update (_updatePipelineIncremental) appends
  or replaces individual nodes and redraws only the SVG connectors,
  preserving zoom / pan across round additions.
- Diff-based home refresh: only cards whose rendered HTML changed
  are replaced; unchanged cards stay untouched.
- Per-session SSE live-log streaming from ~/.cache/humanize with
  snapshot / append / resync / eof / retained-replay protocol.
- Active-log selection mirrors the CLI monitor: pick the top
  round's codex-run log, with codex-review / gemini-run /
  gemini-review fallbacks.
- REST + WebSocket + SSE transports. Localhost binds skip auth;
  remote binds require a bearer token (Authorization header for
  fetch, ?token= query for SSE) plus origin-matched CSRF gating on
  every mutating route.
- Ops-menu Preview Issue flow: calls local Claude CLI to generate
  a sanitized methodology report (humanize issue taxonomy), shows
  a preview modal, and submits via `gh issue create --repo
  humania-org/humanize`. Falls back to a copy-to-clipboard path
  when gh is unauthenticated.
- Active-session dynamic progress visualization, methodology report
  generation via local Claude CLI, and a comprehensive frontend test
  suite.

Correctness and safety
- broadcast_message uses set.difference_update to avoid the
  UnboundLocalError from Python treating `-=` as a local rebind.
- Session path validation rejects `..`, dotfiles, symlink escapes;
  plan-file reads are constrained to the project / session dir.
- Remote-mode auth fail-closed when no token is configured.
- CSRF layer enforced on every mutating route, including wildcard
  binds and IPv6 literals.
- Parser accepts table-form acceptance criteria and legacy /
  single-letter criterion ids; multi-id table cells are split
  correctly; COMPLETE verdict only fires on terminal lines.
- Finalize-phase classification scoped to the live round, and
  skip-impl sessions correctly classify round 0 as review-only.

Transport and watcher
- watcher.py starts a cache-log observer per active session that
  broadcasts `round_added` when a new round-N-*.log file appears,
  so the dashboard switches the live-log pane to the new round
  without waiting for the next state.md write.
- SessionWatcher.on_session_created hook primes cache-log observers
  for brand-new sessions; every event retries the hook so the
  state-dir-before-cache-dir race eventually heals.
- Non-destructive toggleTheme (no DOM rebuild). The analytics route
  is the only surface whose charts bake CSS vars into SVG at render
  time, so that route alone re-renders on theme change.
- Remote-mode slow polling (10s) uses the same targeted refresh
  helpers as the WS path so card counters and pipeline nodes catch
  up without a full-page reload.
- Session-detail refresh race guard: after awaiting
  /api/sessions/<id>, re-check the active route + layout skeleton
  before mutating DOM.

Install and CLI
- Preflight codex_hooks and idempotent hook merge when paths contain
  spaces.
- Relaxed runtime detection and tolerate unwritable default shim
  paths.
- `humanize monitor web` foreground mode passes --host explicitly,
  IPv6 brackets in the printed URL, readiness probe is fail-closed.

Parser and API
- /api/sessions summary carries every field the terminal
  `humanize monitor rlcr` status bar needs (push_every_round,
  ask_codex_question, agent_teams, drift, stall count,
  ultimate_goal, tasks_active / deferred, active_log_path,
  git_status, etc.).
- cache_logs entries carry round / tool / role / path / basename
  for deterministic client-side selection.

Version: 1.17.0.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn zevorn force-pushed the feat/viz-dashboard branch from be29c41 to c0c4835 Compare April 18, 2026 10:07
@SihaoLiu
Copy link
Copy Markdown
Contributor

@codex review this PR as entirety against branch dev

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0c4835c59

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/humanize.sh Outdated
Comment thread viz/scripts/viz-restart.sh Outdated
zevorn added 3 commits April 18, 2026 23:43
`scripts/humanize.sh` is meant to be sourced into the user's
interactive shell (the README shows `source humanize.sh` followed
by `humanize monitor web`). Using `exec` to hand control to the
Flask process replaced that shell, so pressing Ctrl+C — or any
server exit — took the whole interactive session down with it.

Replace the `exec` with a plain subprocess invocation so the
function returns normally when the server stops and the user's
shell prompt stays alive. Daemon mode still delegates to
`viz/scripts/viz-start.sh` and is unaffected.

Fixes Codex review P1 on PR PolyArch#63
(PolyArch#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
viz-restart.sh advertised `viz-restart.sh [--project <path>]` in
its usage string but ran `PROJECT_DIR="${1:-.}"`, which treated
the literal flag name `--project` as a directory. Calling the
documented form would fail at `cd --project` before the restart
could happen.

Mirror the flag-parsing loop used by viz-start.sh and viz-stop.sh
(positional path plus `--project <path>` named form), and forward
`--project` to both helpers so the whole chain uses the same
resolved absolute path.

Fixes Codex review P2 on PR PolyArch#63
(PolyArch#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
The six Chart.js analytics panels on /#/analytics (rounds per
session, avg round duration, verdict distribution, P0-P9 issues,
first COMPLETE round, BitLesson growth) were dropped in the
home-analytics refactor; the home page now carries the same
summary numbers as inline stat cards plus a lightweight inline-
SVG sparkline, and the analytics route surfaces the verdict
timeline + session comparison table without any Chart.js panels.

Clean up the remaining dead references that Codex flagged:
- Delete viz/static/js/charts.js (no longer imported).
- Drop the Chart.js CDN <script> from index.html.
- Drop the /js/charts.js <script> tag from index.html.
- Remove the six chart-panel i18n keys from i18n.js — no other
  caller references them.
- Update tests/test-viz.sh's static-file presence check to match.

Fixes Codex review P2 on PR PolyArch#63
(PolyArch#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@SihaoLiu
Copy link
Copy Markdown
Contributor

@codex review this PR as entirety against branch dev

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ebd4ee2d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
Comment thread viz/server/parser.py Outdated
Comment thread viz/static/js/app.js Outdated
zevorn added 2 commits April 19, 2026 00:41
The forbidden-token pattern for imports was
`\b(?:import|require|from)\s+\w+`, which matches ordinary English
prose like "drifted from the original plan structure". That one
phrase appears verbatim in the built-in `plan_execution`
methodology observation text, so every sanitized issue payload
that picked `plan_execution` as a dominant category was marked as
having warnings, and `/api/sessions/<id>/github-issue` rejected the
submission with a 400 even after the outbound body had already
been assembled from the constrained taxonomy.

Replace the single pattern with three code-anchored variants:
  - `^\s*import\s+[\w.]+`                (Python `import x.y` at line start)
  - `^\s*from\s+[\w.]+\s+import\b`       (Python `from x import y` at line start)
  - `\brequire\s*\(`                     (JS / Node `require(` call syntax)

These still catch real import fragments that leak into a
methodology report (bare `import os`, indented `  import sys`,
`from viz.server import app`, `require("fs")`) but leave natural
prose using the preposition "from" alone.

Fixes Codex review P1 on PR PolyArch#63
(PolyArch#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
`read_plan_file()` gated its open() on `os.path.exists(candidate_real)`,
which is True for directories as well as files. A state.md that
carried `plan_file: .` (or any directory under the project / session
that happened to resolve inside the allowed prefix) would therefore
pass validation, skip the backup fallback, and drop into
`open(candidate_real, 'r')` which raises IsADirectoryError. The
exception propagated out through /api/sessions/<id>/plan as an
uncaught 500 instead of surfacing the controlled fallback to the
session-local plan.md (or the intended 404 when no backup exists).

Swap the existence check for `os.path.isfile`, which is directory-
safe and also returns False for broken symlinks. Directories and
dangling symlinks now fall through to the existing backup branch.

Fixes Codex review P2 on PR PolyArch#63
(PolyArch#63 (comment)).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
CI route-backed truncation test (tests/test-app-routes-live.sh Group 6)
is prone to a watcher/writer race: when the writer does
``with open(path, 'wb') as f: f.write(data)`` the file is truncated
to 0 bytes before the second write lands. The per-cache watcher's
callback can fire on that first IN_MODIFY event and invoke
``_poll_locked`` while the file is still empty, so the resync path's
inline ``_snapshot_locked`` emits zero snapshot events and the content
subsequently arrives as ``append`` — violating the protocol contract's
resync -> snapshot sequencing that downstream clients rely on.

Track a ``_resync_pending`` flag that every resync path sets whenever
its immediate snapshot attempt returned empty (file was 0 bytes at
the time). The next poll that observes ``size > self._offset`` while
the flag is set emits the new content via ``_snapshot_locked`` instead
of ``_append``, and clears the flag once caught up. Normal write paths
are unaffected: the flag stays False when the resync snapshot sees
content immediately, so the append fast-path still covers steady-state
streaming.

Fixes the GitHub Actions ``run-all-tests`` failure on Round 20:
  FAIL: route-backed truncation event stream incomplete:
        snapshots=1 resync_truncated=True eof=True

The three existing resync sites (recreated, rotated, truncated) share
the same pattern so the fix is uniform across all of them.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@SihaoLiu
Copy link
Copy Markdown
Contributor

@codex review this PR as entirety against dev branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86f0de9b00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
Comment thread viz/server/parser.py Outdated
zevorn added 2 commits April 19, 2026 12:42
Round 21 Codex P1 (PRRT_kwDOQ4a3IM57_efh). `_TERMINAL_STATE_FILES`
listed `methodology-analysis-state.md` and `finalize-state.md` among
the markers that tell the SSE generator to emit `eof` and close the
stream. Both markers are actually *active* states — `_is_terminal_status`
excludes `analyzing` / `finalizing`, and the dashboard still treats
those sessions as cancellable — so `_session_is_terminal_cheap` was
mis-classifying live sessions as finished.

The observable impact was that as soon as a session entered finalize
or methodology-analysis, every live SSE client saw an immediate
`eof` event and the log pane stopped updating, even though the
backing codex-run / codex-review log files were still being written.

Trim the marker list to the five truly-terminal states (complete,
cancel, stop, maxiter, unexpected) so SSE now keeps delivering
append events throughout the analyzing/finalizing phases. A short
comment pins this list to `_is_terminal_status` so future markers
don't drift out of sync again.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 21 Codex P2 (PRRT_kwDOQ4a3IM57_efi). `parse_git_status` in
`viz/server/parser.py` checked `x == 'M' or y == 'M'` before the
index-side `A` branch. That made a porcelain status like `AM`
(new file added to the index, then modified in the worktree —
the common "stage a new file, then tweak it" workflow) count as
`modified` and never as `added`, so the dashboard git summary
drifted out of parity with `humanize_parse_git_status` in
`scripts/humanize.sh`, which explicitly maps `AM` to `added`.

Reorder the branches so any index-side `A` wins first (covers
`A `, `AM`, `AD`), followed by `R` (rename → modified) then
`D` (deletion), and only then the `M` fallback. Matches the
shell helper's explicit priority exactly and stops the dashboard
header from undercounting added files whenever the user has a
staged-new-file with an in-progress edit.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@codex review this PR as entirety against dev branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08ab1d4547

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/log_streamer.py
Comment thread viz/server/app.py Outdated
zevorn added 2 commits April 19, 2026 14:05
Round 22 Codex P1 (PRRT_kwDOQ4a3IM57_u12).
``LogStreamRegistry.get_or_create`` had no matching delete path, so
every ``(session_id, basename)`` ever opened lived until the dashboard
process exited. Each ``LogStream`` keeps a 256-event retention deque
(often large base64 snapshot/append chunks), so browsing many
sessions/round logs grew memory without bound on long-running viz
servers.

Extend the registry with explicit ``acquire`` / ``release`` helpers
that ref-count active SSE consumers, and expose ``LogStream.eof_emitted``
for the release path to query. Eviction policy:
- refcount > 0: keep the stream (clients still reading).
- refcount == 0 AND stream has emitted EOF: drop from the registry
  and free the retention deque.
- refcount == 0 AND session still active (no EOF yet): keep the
  stream so future reconnects still land inside the 256-event
  ``Last-Event-Id`` replay window the streaming contract mandates.

Wire the SSE route to ``_acquire_log_stream`` / ``_release_log_stream``:
the release now sits in the same generator ``finally`` that already
releases the cache watcher, so normal EOF, client disconnect, and
exception paths all balance the refcount. ``get_or_create`` stays
callable (tests exercise it directly without refcount semantics).

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 22 Codex P2 (PRRT_kwDOQ4a3IM57_u13). The SPA fallback route
probed ``os.path.isfile(os.path.join(STATIC_DIR, path))`` before
handing off to ``send_from_directory``. ``os.path.join`` does not
reject traversal segments, so a request like
``GET /..%2f..%2fetc%2fpasswd`` differentiated its response based on
whether the traversal target existed on disk:
- existing target → ``send_from_directory`` branch → 404
- missing target → SPA fallback → 200 ``index.html``

Because the static route is intentionally unauthenticated, that
turned the dashboard into an unauthenticated filesystem-existence
oracle for any path readable by the server process.

Swap ``os.path.join`` for ``werkzeug.utils.safe_join``: it returns
``None`` whenever the resolved path would escape ``STATIC_DIR``,
so traversal attempts skip the ``os.path.isfile`` probe entirely
and fall straight through to the SPA fallback. Legitimate static
assets keep working (they resolve to an absolute path inside
STATIC_DIR and pass both checks). No new CDN dep is introduced —
``werkzeug.utils`` is already imported transitively via Flask.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@codex review this PR as entirety against dev branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfec5afa83

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/static/js/app.js Outdated
Comment thread viz/server/log_streamer.py Outdated
zevorn added 2 commits April 19, 2026 14:40
Round 23 Codex P2 (PRRT_kwDOQ4a3IM57_zC9). The earlier eviction path
only dropped a stream when ``refcount`` reached zero AND
``eof_emitted`` was already true. A client that disconnected before
its session reached a terminal state therefore left the stream (and
its 256-event retention deque, often full of base64 snapshot/append
chunks) resident for the process lifetime — no later code path marks
EOF on an unpolled stream whose session quietly ended. On long-lived
dashboard instances where users briefly browse many active sessions,
that accumulated memory without bound.

Add a second eviction axis: an idle-TTL timer recorded whenever a
release drops the refcount to zero without EOF. Every ``release``
doubles as an opportunistic sweep — entries whose idle timestamp is
older than ``IDLE_STREAM_TTL_SECONDS`` (5 min) and still have
refcount=0 are dropped regardless of EOF status. ``acquire`` clears
the idle timestamp so normal reconnect-within-window flows keep
retention (the streaming contract's 256-event ``Last-Event-Id``
replay window), while late reconnects fall through to the existing
``resync(overflow)`` + fresh-snapshot path.

The constructor accepts an override so tests can drive eviction
with a deterministic short TTL without waiting 5 minutes of
wall-clock time.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 23 Codex P1 (PRRT_kwDOQ4a3IM57_zC8). ``/api/analytics`` fed the
raw output of ``list_sessions(PROJECT_DIR)`` into ``compute_analytics``
without the ``_is_safe_session_id`` filter that ``/api/sessions``
already applies. A crafted on-disk session directory whose name
contained quote / JS metacharacters could therefore reach the
Analytics comparison table, where ``buildCmpTable`` interpolated
``s.session_id`` directly into an inline
``onclick="navigate('#/session/${id}')"`` handler plus cell HTML —
the same XSS vector the rest of the API had already closed.

Two layers of defence:

1. Backend: drop sessions whose id fails ``_is_safe_session_id``
   before handing them to the analyzer, so the Analytics payload
   cannot carry an unsafe id at all.
2. Frontend: stop interpolating the id through an inline
   ``onclick`` string literal. ``buildCmpTable`` now emits a
   ``data-session-id`` attribute (escaped via ``_esc``) and binds
   navigation through a delegated click listener; every other cell
   value is also escaped for consistency. Even if a future backend
   regression lets an unsafe id slip past the filter, the value
   flows only through ``dataset`` (a DOM-level string that is never
   re-parsed as JS) and ``window.location`` — neither evaluates
   markup.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cde0f712a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/analyzer.py Outdated
Round 24 Codex P2 (PRRT_kwDOQ4a3IM57_5AV). ``compute_analytics`` read
``s['current_round']`` as if it were a round count, but the field is
a 0-based *index* — a session that finished round 0 reports
``current_round=0`` even though its ``rounds`` list has one entry.
Two knock-on effects on the Analytics UI:

- ``overview.average_rounds`` undercounted every non-empty session
  by one. The prior filter ``current_round > 0`` also excluded
  single-round sessions entirely, so the average silently skipped
  the shortest runs instead of representing them.
- ``session_stats[*].rounds`` (the comparison table's ``Rounds``
  column) reported ``0`` for sessions whose only round was round 0
  and ``N-1`` for sessions with N rounds completed.

Switch both sites to ``len(s['rounds'])``. The parser already
constructs ``rounds`` from ``range(max_disk_round + 1)`` so its
length is the authoritative completed-round count, and it naturally
covers sessions where on-disk review files exceed ``current_round``
(the parser's own ``max_disk_round`` rule). Drop the ``> 0`` filter
in favour of ``n > 0`` on the derived length so single-round
sessions now participate in the average.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@codex review this PR as entirety against dev branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53f5b98493

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
Comment thread viz/server/exporter.py Outdated
zevorn added 2 commits April 19, 2026 20:04
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pN). The GitHub-issue generator
rendered ``A {current_round}-round RLCR session ...`` where
``current_round`` is a 0-based *index*. A session whose only round
was round 0 reported ``0-round`` in the outbound issue text, and
every other session lost one round in the summary.

Swap to ``len(s['rounds'])``, matching the Round 24 fix in
``compute_analytics``. The parsed ``rounds`` list is the authoritative
count: the parser builds it from ``range(max_disk_round + 1)`` so it
covers both the normal ``current_round + 1`` case and the edge case
where on-disk review files exceed ``current_round``. The sanitized
taxonomy-only issue body now reflects the real session size.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
Round 25 Codex P2 (PRRT_kwDOQ4a3IM58A-pP). The Markdown exporter put
``session['current_round']`` into the ``Rounds`` overview cell, which
is a 0-based round index. A session whose only round was round 0
therefore exported ``| Rounds | 0 |`` and every other session lost
one round in the generated report. Downstream consumers (users
archiving session summaries, anything grepping the export) saw
incorrect session totals.

Use ``len(session.get('rounds') or [])`` to match the analytics and
sanitized-issue fixes from Round 24/25: the parsed rounds list is
the authoritative count regardless of how ``current_round`` relates
to ``max_disk_round`` (they can diverge when on-disk review files
exceed ``current_round``). The guard against a missing ``rounds``
key also hardens the exporter so a partially-parsed session cannot
raise AttributeError mid-render.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@codex review this PR as entirety against dev branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3655147ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread viz/server/app.py Outdated
Round 26 Codex P2 (PRRT_kwDOQ4a3IM58BEtU). The sanitized GitHub issue
body's ``Quantitative Summary`` table still read ``| Total rounds |
{s["current_round"]} |``. ``current_round`` is a 0-based round
index, so the table under-reported every session by one (``0`` for
a single-round session, ``N-1`` for an N-round session) even though
the Context line above already switched to ``round_total`` in
Round 25.

Reuse the ``round_total = len(s.get('rounds') or [])`` value already
computed earlier in the same function so the Quantitative Summary
and the Context paragraph agree on the session size. Downstream
issue readers no longer see conflicting counts between the two
sections of the generated report.

Signed-off-by: Chao Liu <chao.liu.zevorn@gmail.com>
@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@codex review this PR as entirety against dev branch

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@zevorn
Copy link
Copy Markdown
Author

zevorn commented Apr 19, 2026

@SihaoLiu

@SihaoLiu
Copy link
Copy Markdown
Contributor

@codex review this PR as entirety one more time

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6812053a13

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

exit 3
}

SESSION_DIR="$PROJECT_ROOT/.humanize/rlcr/$SESSION_ID"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate session id before building cancel target path

SESSION_ID is interpolated directly into SESSION_DIR, so values like ../foo escape .humanize/rlcr and let this helper rename state files outside the intended session tree. In practice, cancel-rlcr-session.sh --project <proj> --session-id ../foo will operate on <proj>/.humanize/foo/state.md if present, which breaks the session-scoped safety contract and can mutate unrelated project state if any caller forwards unvalidated ids.

Useful? React with 👍 / 👎.

Comment on lines +445 to +446
self._idle_since[key] = time.monotonic()
self._sweep_idle_streams_locked()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Evict idle streams without requiring a later release call

Idle stream eviction only runs synchronously inside release(), so a stream that drops to refcount 0 without EOF is never removed if no subsequent release() happens after TTL. That leaves LogStream entries (including retained event buffers) resident for process lifetime after a one-off disconnect on an active session, which is a real memory leak in long-lived dashboard processes with low churn.

Useful? React with 👍 / 👎.

@SihaoLiu SihaoLiu changed the base branch from dev to beta April 19, 2026 16:16
@SihaoLiu SihaoLiu merged commit d67da3b into PolyArch:beta Apr 19, 2026
4 checks passed
@SihaoLiu SihaoLiu mentioned this pull request Apr 19, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants