fix: bot resilience + card/parser UI fixes #96

Merged
Time4Mind merged 3 commits into main from fix/bot-resilience-and-ui on May 22, 2026

@Time4Mind
Owner

Summary

Resilience + UI fixes surfaced by auditing prod logs (Conflict storms, stranded supervisors, frozen live cards, missing inline keyboards).

  • lifecycle (149a51d): exit on a sustained getUpdates Conflict instead of long-polling forever while deaf; idempotent, process-gate-first supervisor (probe lock → back off if a healthy instance holds it → only then net-wait + launch); graceful, deliberate restart.sh (SIGTERM → wait for clean lock release → SIGKILL only on timeout).
  • parser (8ae0244): detect Claude Code's "Resume from summary" select prompt so it renders the arrow/Enter/Esc keyboard; harden detection of tall multi-question AskUserQuestion whose header + footer scrolled off the pane.
  • card (63d653b): finalize a live card when the upstream Claude process stalls/exits mid-turn (no final assistant turn); stop the footer page counter from collapsing to 1 during a card repost/stale-reset.
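The restart order from the lifecycle bullet, sketched in Python for illustration. This is not the contents of restart.sh: `graceful_stop` is a hypothetical helper, the 10 s default timeout is an assumed value, and the wait for lock release is left out so only the process side of the sequence is shown.

```python
import signal
import subprocess


def graceful_stop(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """SIGTERM -> wait for exit -> SIGKILL only if the wait times out.

    Hypothetical helper; the default timeout is an assumption.
    Returns "already-exited", "terminated" or "killed".
    """
    if proc.poll() is not None:
        return "already-exited"
    # Ask politely first so the bot can release its lock and flush state.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Last resort: the child ignored SIGTERM or hung while shutting down.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed"
```

Escalating only after the timeout is what makes the restart "deliberate": a healthy bot always gets the chance to exit on its own first.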

Test plan

  • ruff check + ruff format --check clean
  • pyright src/ccbot/: 0 errors
  • shellcheck on supervisor/restart scripts: clean
  • pytest: 528 passed (28 new tests across 6 files)
  • Applied live: bot restarted clean (single instance, no Conflict); supervisor reloaded and now supervising a fresh bot child

🤖 Generated with Claude Code

Time4Mind and others added 3 commits May 23, 2026 00:00
…cess-gate-first supervisor/restart

A sustained "Conflict: terminated by other getUpdates request" was swallowed
by the PTB error handler while the bot kept long-polling, leaving it
permanently deaf until a manual kill. The instance now exits after a short
Conflict streak (or >15s) so the singleton flock + supervisor converge to a
single live bot.
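A minimal sketch of the Conflict-streak exit, assuming the bot's error handler feeds each getUpdates result into a small tracker. `ConflictWatch`, `max_streak=3` and the injectable clock are hypothetical; only the ">15s" bound comes from the commit message. In python-telegram-bot the error to count would be `telegram.error.Conflict`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConflictWatch:
    """Decide when a run of getUpdates Conflicts means this instance should exit.

    Hypothetical helper: max_streak is an assumed knob; max_seconds mirrors
    the ">15s" mentioned in the commit message.
    """

    max_streak: int = 3
    max_seconds: float = 15.0
    clock: Callable[[], float] = time.monotonic
    streak: int = 0
    first_conflict_at: Optional[float] = None

    def record_conflict(self) -> bool:
        """Register one Conflict; return True if the process should exit."""
        now = self.clock()
        if self.first_conflict_at is None:
            self.first_conflict_at = now
        self.streak += 1
        return (
            self.streak >= self.max_streak
            or now - self.first_conflict_at > self.max_seconds
        )

    def record_success(self) -> None:
        """A successful poll ends the streak."""
        self.streak = 0
        self.first_conflict_at = None
```

On a `True` result the caller would stop polling and exit, leaving the singleton flock and the supervisor to converge on one live bot instead of swallowing the error forever.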

Supervisor reworked to be idempotent and fault-tolerant: it probes the
process-level lock FIRST and backs off (never preempts) while a healthy
instance holds it, only then waiting for network and launching. main.py
yields cleanly (exit 0) when another healthy instance owns the lock instead
of looking like a crash. restart.sh is the explicit, deliberate, graceful
restart path: SIGTERM -> wait for full exit + lock release -> SIGKILL only
after timeout.
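The process-gate-first order (probe the lock, back off, only then wait for network and launch) can be sketched with `fcntl.flock`, assuming a POSIX host and an flock-based lock file. `lock_is_held` and `supervisor_step` are hypothetical names, and the sketch treats "lock is held" as "a healthy instance is running", a simplification of whatever health check the supervisor really performs.

```python
import fcntl
import os
from typing import Callable


def lock_is_held(lock_path: str) -> bool:
    """Probe an flock-based singleton lock without keeping it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking: fails immediately if another instance holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(fd, fcntl.LOCK_UN)  # we only wanted to look, so release it
        return False
    finally:
        os.close(fd)


def supervisor_step(
    lock_path: str,
    wait_for_network: Callable[[], None],
    launch: Callable[[], None],
) -> str:
    """One idempotent supervisor pass: gate on the lock before anything else."""
    if lock_is_held(lock_path):
        return "backoff"  # never preempt a running instance
    wait_for_network()
    launch()
    return "launched"
```

The probe releases the lock immediately, so another instance can still win the race between probe and launch; per the commit message, main.py covers that case by yielding with exit 0 rather than looking like a crash.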

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stion AskUserQuestion

Add a UIPattern for Claude Code's "Resume from summary" select shown when
resuming a large/old session, so ccbot renders the arrow/Enter/Esc keyboard
instead of leaving the resumed session hung. Add a footer-scrolled-off
fallback pattern for tall multi-question AskUserQuestion prompts, guarded
against colliding with Permission / ResumeSummary / Settings.
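Roughly how a top/bottom-anchored pattern list with a guarded fallback might look. `UIPattern`, `classify` and every regex below are illustrative assumptions (the real prompt wording, footer text and Settings signature are not shown in this PR); the point is the ordering plus the exclusion guard that stops the bottom-less AskUserQuestion fallback from claiming a Resume-from-summary or Permission screen.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UIPattern:
    """A screen signature: a top anchor plus an optional bottom (footer) anchor."""

    name: str
    top: re.Pattern[str]
    bottom: Optional[re.Pattern[str]] = None  # None: bottom-less fallback
    exclude: tuple[re.Pattern[str], ...] = ()  # guards against other prompts

    def matches(self, lines: Sequence[str]) -> bool:
        # Refuse the frame outright if it carries another prompt's signature.
        if any(x.search(line) for x in self.exclude for line in lines):
            return False
        for i, line in enumerate(lines):
            if self.top.search(line):
                # Require the footer below the first top anchor, unless bottom-less.
                if self.bottom is None:
                    return True
                return any(self.bottom.search(rest) for rest in lines[i + 1 :])
        return False


# Hypothetical signatures; the real Claude Code wording may differ.
PERMISSION = UIPattern("Permission", re.compile(r"Do you want to proceed\?"))
RESUME_SUMMARY = UIPattern(
    "ResumeSummary",
    top=re.compile(r"Resume from summary"),
    bottom=re.compile(r"Enter to (?:confirm|select)"),
)
SETTINGS = re.compile(r"^\s*Settings\b")
AUQ_FALLBACK = UIPattern(
    "AskUserQuestion",
    top=re.compile(r"^\s*❯\s*\d+\.\s"),  # cursor on a numbered option
    exclude=(PERMISSION.top, RESUME_SUMMARY.top, SETTINGS),
)

# Order matters: specific patterns first, the loose fallback last.
PATTERNS = (PERMISSION, RESUME_SUMMARY, AUQ_FALLBACK)


def classify(pane_text: str) -> Optional[str]:
    lines = pane_text.splitlines()
    for pattern in PATTERNS:
        if pattern.matches(lines):
            return pattern.name
    return None
```

With the guard, a Resume-from-summary frame whose footer has scrolled off classifies as nothing rather than as a spurious AskUserQuestion.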

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…psing to 1

When the upstream Claude process stalls/exits mid-turn (no final assistant
turn, only filtered metadata entries), the live card stayed frozen on its
last frame. An active session whose spinner has been idle past a threshold
with no new content is now finalized with a "went idle" note.
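A sketch of the stall rescue under stated assumptions: `LiveCard`, `maybe_finalize` and the 90 s threshold are hypothetical, and the `interactive_waiting` guard reflects the later follow-up in #98, where the rescue misfired once the interactive-prompt signals disappeared.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_FINALIZE_SECONDS = 90.0  # assumed threshold, not ccbot's actual value


@dataclass
class LiveCard:
    active: bool
    last_content_ts: float
    finalized: bool = False
    note: Optional[str] = None


def maybe_finalize(
    card: LiveCard,
    now: float,
    spinner_idle_since: Optional[float],
    threshold: float = IDLE_FINALIZE_SECONDS,
    interactive_waiting: bool = False,
) -> bool:
    """Finalize a live card whose upstream turn stalled without a final assistant turn."""
    if card.finalized or not card.active:
        return False
    # A prompt waiting for the user is not a stall; neither is a running spinner.
    if interactive_waiting or spinner_idle_since is None:
        return False
    # Idle time counts from whichever happened later: spinner stop or last content.
    idle_for = now - max(spinner_idle_since, card.last_content_ts)
    if idle_for < threshold:
        return False
    card.finalized = True
    card.active = False
    card.note = "session went idle"
    return True
```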

The footer page counter collapsed to 1 during a card repost / stale-reset
because state.events was wiped and not re-seeded before the footer was built;
re-seed inline so the turn-based total stays correct.
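The footer fix comes down to ordering: re-seed `state.events` before anything builds the footer from it. A minimal sketch, assuming one "user" event per turn and a hypothetical `page N/M` footer format; `CardState`, `reset_for_repost` and the loader callback are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class CardState:
    events: List[Event] = field(default_factory=list)


def turn_total(events: List[Event]) -> int:
    # Assumption: each turn starts with one "user" event; never report 0 pages.
    return max(1, sum(1 for e in events if e.get("type") == "user"))


def reset_for_repost(state: CardState, load_jsonl_events: Callable[[], List[Event]]) -> str:
    state.events.clear()
    # The fix: re-seed inline, before the footer reads state.events.
    # Building the footer between clear() and extend() would always yield 1.
    state.events.extend(load_jsonl_events())
    total = turn_total(state.events)
    return f"page {total}/{total}"  # hypothetical footer, showing the latest page
```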

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Time4Mind Time4Mind merged commit a64a20d into main May 22, 2026
4 checks passed
@Time4Mind Time4Mind deleted the fix/bot-resilience-and-ui branch May 22, 2026 21:24
Time4Mind added a commit that referenced this pull request May 23, 2026
…select stall (#98)

* fix(card): stop double-repost flicker + duplicate user message

Two coupled live-card regressions, both surfacing on the active
session's card; neither introduced by the #97 refactor.

Bug 1 (delete+resend flicker): repost_card refreshed last_rendered /
last_edit_ts but not last_event_ts, so a card idle >= STALE_CARD_SECONDS
was misjudged stale by the first event after the repost and a second
card was spawned ~1-2s later. Stamp last_event_ts on repost (a repost is
itself user activity).
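Bug 1 in miniature. `CardTimestamps`, `is_stale` and the 600 s value for STALE_CARD_SECONDS are assumptions; the field names `last_rendered`, `last_edit_ts` and `last_event_ts` come from the commit message.

```python
from dataclasses import dataclass

STALE_CARD_SECONDS = 600.0  # assumed value for illustration


@dataclass
class CardTimestamps:
    last_rendered: str = ""
    last_edit_ts: float = 0.0
    last_event_ts: float = 0.0


def is_stale(card: CardTimestamps, now: float, stale_after: float = STALE_CARD_SECONDS) -> bool:
    return now - card.last_event_ts >= stale_after


def repost_card(card: CardTimestamps, rendered: str, now: float) -> None:
    card.last_rendered = rendered
    card.last_edit_ts = now
    # The fix: without this stamp, the next event still sees the pre-repost
    # last_event_ts, judges the fresh card stale, and spawns a second one.
    card.last_event_ts = now
```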

Bug 2 (user message rendered twice): the stale-reset and
release_card_message wipe sites re-seed events from JSONL — which already
holds the just-submitted prompt — then append the same live event again
with no dedup. Guard both append sites with _duplicate_of_seeded
(matches type/started_at/text; distinct turns never collide).
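A sketch of the dedup guard, keyed on the same (type, started_at, text) triple the commit message names. The real `_duplicate_of_seeded` signature is not shown, so `duplicate_of_seeded` and `append_live_event` here are hypothetical, and for brevity the check runs against every event already in the list rather than only the re-seeded ones.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def _key(event: Event) -> tuple:
    # Identity of an event for dedup purposes.
    return (event.get("type"), event.get("started_at"), event.get("text"))


def duplicate_of_seeded(event: Event, seeded: List[Event]) -> bool:
    key = _key(event)
    return any(_key(s) == key for s in seeded)


def append_live_event(events: List[Event], event: Event) -> bool:
    """Append unless the JSONL re-seed already produced this exact event."""
    if duplicate_of_seeded(event, events):
        return False
    events.append(event)
    return True
```

Because `started_at` is part of the key, the same prompt text sent in two different turns is still appended twice, matching "distinct turns never collide".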

Adds tests/test_card_dup_repost.py (5 regression tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(kb-mode): detect multi-select AskUserQuestion + debounce teardown

A parked multi-select AskUserQuestion (numbered bracketed checkboxes
N. [✔] / N. [ ] with the cursor ❯ on a separate Submit line) was not
classified by any AskUserQuestion UIPattern once the user moved the
cursor onto Submit and the ☐ header had scrolled off — none of the
existing top anchors (bare ☐ glyph, or ❯ N. on a numbered option)
match that frame. Detection dropped mid-prompt: the kb-mode keyboard
vanished and, with all interactive_waiting signals gone, the A4
stall-rescue (PR #96) misfired the 'session went idle' note.

terminal_parser: add a multi-select AUQ pattern anchored on the stable
signatures (N. [✔]/[ ] checkbox lines, or the ❯ Submit line) framed by
the Enter-to-select footer, plus the same anchors on the bottom-less
last-resort pattern for the footer-also-scrolled-off case.
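The new anchors, written as regular expressions. These are assumptions modeled on the glyphs quoted above (`N. [✔]`, `N. [ ]`, `❯ Submit`); the exact footer wording ("Enter to select") and the helper name are hypothetical. `require_footer=False` plays the role of the bottom-less last-resort pattern.

```python
import re
from typing import Sequence

# Assumed renderings of the anchors quoted in the commit message.
CHECKBOX_LINE = re.compile(r"^\s*(?:❯\s*)?\d+\.\s+\[(?:✔| )\]")  # N. [✔] / N. [ ]
SUBMIT_LINE = re.compile(r"^\s*❯\s*Submit\b")  # cursor parked on Submit
ENTER_FOOTER = re.compile(r"Enter to select")  # footer wording assumed


def is_multiselect_auq(lines: Sequence[str], require_footer: bool = True) -> bool:
    """Detect a parked multi-select AskUserQuestion after the ☐ header scrolled off."""
    anchor = next(
        (i for i, line in enumerate(lines) if CHECKBOX_LINE.search(line) or SUBMIT_LINE.search(line)),
        None,
    )
    if anchor is None:
        return False
    if not require_footer:
        return True  # bottom-less last resort: the footer scrolled off too
    return any(ENTER_FOOTER.search(line) for line in lines[anchor + 1 :])
```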

status_polling: debounce kb-mode teardown — require KB_CLEAR_CONFIRM_POLLS
consecutive no-UI polls before exit_kb_mode(clear_pending=True), matching
exit_kb_mode's stated 'double-poll confirm' intent, so a single flickered
detection frame can't wipe a prompt that is still on screen. Streak resets
when the prompt is re-detected.
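The teardown debounce is a consecutive-miss counter. `KbModeDebounce` is a hypothetical name, and `KB_CLEAR_CONFIRM_POLLS = 2` is an assumed value chosen to match the "double-poll confirm" wording; the real constant may differ.

```python
from dataclasses import dataclass

KB_CLEAR_CONFIRM_POLLS = 2  # assumed value ("double-poll confirm")


@dataclass
class KbModeDebounce:
    confirm_polls: int = KB_CLEAR_CONFIRM_POLLS
    no_ui_streak: int = 0

    def observe(self, ui_detected: bool) -> bool:
        """Feed one poll result; return True when kb-mode should be torn down."""
        if ui_detected:
            self.no_ui_streak = 0  # prompt re-detected: the earlier miss was a flicker
            return False
        self.no_ui_streak += 1
        if self.no_ui_streak >= self.confirm_polls:
            self.no_ui_streak = 0  # start fresh for the next prompt
            return True
        return False
```

A single flickered frame only bumps the counter to 1; teardown needs the prompt to be missing on consecutive polls.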

Adds tests/test_askuser_multiselect.py + tests/test_kb_mode_debounce.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Time4Mind added a commit that referenced this pull request May 23, 2026
Audit (3 agents) of tests + docs + config vs the current code (post
#96/#97/#98). Findings + fixes:

Tests (suite was already healthy — no stale tests to delete):
- Drop duplicate test_session.py::TestActiveSessions::test_set_and_clear
  (covered by test_session_dm.py create+delete); keep the empty-initial guard.
- Strengthen 3 'is None or != AskUserQuestion' poach-guards to 'is None'
  (verified the parser returns None for those degraded-capture panes).

Docs:
- architecture.md: fix 'background sessions render their own cards' (they
  emit none — panel only); correct commands/ inventory (no /list /use
  /rename; add /health /help); add missing modules (logging_setup, metrics,
  voice_install, local_terminal, card_model, kb_mode, response_builder,
  context_poll [disabled], callbacks/help).
- dm-architecture.md: rewrite the slash-command block to match
  setMyCommands reality (published: menu/help/history/done + forwarded
  CC pickers; hidden: new/kill/stop/archive/screenshot/usage/health).
- dm-multisession-spec.md: BOT_TOKEN -> TELEGRAM_BOT_TOKEN, CLAUDE_BIN ->
  CLAUDE_COMMAND, MODEL_PATH -> WHISPER_MODEL_PATH; /history is published;
  flag /restore-file as not-yet-implemented.
- dm-multisession-plan.md: banner marking it COMPLETED & SUPERSEDED
  (its file:line hotspot map targets the removed monolithic bot.py).

Config:
- .env.example: remove dead SESSION_TOKEN_BUDGET_5H / MAX_5H_TOKENS /
  MAX_WEEKLY_TOKENS (retired local token aggregator); document
  CCBOT_RESUME_SETTLE_TIMEOUT / CARD_EDIT_LAG / BG_STATUS_MAX /
  BG_STATUS_QUOTA_THRESHOLDS / LOG_LEVEL + a supervisor-knobs pointer.
- session.py: stale '/status' -> 'Menu -> Status' in a comment.
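A small lint in the same spirit could catch stale keys in a local `.env`. This is a hypothetical helper, not part of the repo; the key lists are taken from the bullets above (the three retired token-budget keys and the three old names the spec used).

```python
from typing import List

RETIRED_KEYS = {"SESSION_TOKEN_BUDGET_5H", "MAX_5H_TOKENS", "MAX_WEEKLY_TOKENS"}
# Old names from the spec -> names listed as correct in this audit.
RENAMED_KEYS = {
    "BOT_TOKEN": "TELEGRAM_BOT_TOKEN",
    "CLAUDE_BIN": "CLAUDE_COMMAND",
    "MODEL_PATH": "WHISPER_MODEL_PATH",
}


def lint_env(text: str) -> List[str]:
    """Return one message per retired or outdated key in .env-style text."""
    problems: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key in RETIRED_KEYS:
            problems.append(f"line {lineno}: {key} is retired")
        elif key in RENAMED_KEYS:
            problems.append(f"line {lineno}: {key} -> {RENAMED_KEYS[key]}")
    return problems
```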

No behavior change (only a comment edit in src). 560 tests pass.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>