Skip to content

feat: deep test_bot diagnostics framework#3032

Merged
EntchenEric merged 9 commits into
developmentfrom
feat/deep-test-bot-diagnostics
Jun 1, 2026
Merged

feat: deep test_bot diagnostics framework#3032
EntchenEric merged 9 commits into
developmentfrom
feat/deep-test-bot-diagnostics

Conversation

@EntchenEric
Copy link
Copy Markdown
Member

@EntchenEric EntchenEric commented Jun 1, 2026

Summary

Production-safe diagnostics package for t.test_bot: phased runner, ~199 slash handler behavior specs, thread-based reporting, health manager wiring, and command tree manifest drift detection.

Changes

  • diagnostics/ package: runner (phases A–F), harness, registry, discovery, manifest compare
  • t.test_bot refactored to use DiagnosticsRunner with Discord thread output
  • Behavior spec registrations for all slash extensions (admin skips for broken ctx handlers; interactive games/wizards skipped)
  • Parametrized unit tests run every discovered behavior spec
  • scripts/check_diagnostics_manifest.py wired into CI (import-smoke + test workflows)
  • E2E tree tests updated for 13 admin roots + 2 utility roots
  • Removed obsolete tests/test_ping.py, tests/test_database.py, tests/test_commands.py

Test plan

  • pytest tests/unit/diagnostics/
  • python scripts/check_diagnostics_manifest.py
  • CI (unit, integration, lint, mypy, manifest gate)
  • Manual t.test_bot on staging after deploy

Introduce a production-safe diagnostics package with phased checks,
199 slash handler behavior specs, thread-based reporting for test_bot,
health manager wiring, and command tree manifest tooling.

Work in progress: ~8 handler specs still failing, CI manifest gate and
e2e tree test updates not yet included.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a diagnostics framework: models and mocks, assertion and infra checks, spec discovery/registry, handler harness and kwargs defaults, patching helpers and per-spec registrations, manifest-based command-tree validation with scripts, and a DiagnosticsRunner that orchestrates multi-phase checks and reports results.

Changes

Bot Diagnostics System

Layer / File(s) Summary
Data models and mock infrastructure
diagnostics/models.py, diagnostics/mocks.py
Core dataclasses (CheckOutcome, PhaseResult, DiagnosticsSummary, CommandBehaviorSpec) and mock factories (make_guild, make_member, make_text_channel, make_interaction, etc.).
Assertions and infrastructure checks
diagnostics/assertions.py, diagnostics/infra_checks.py
Async assertion helpers (expect_interaction_or_modal, expect_interaction_response, expect_defer, expect_mock_called) and infra checks (check_ping, check_database).
Handler invocation harness & kwargs defaults
diagnostics/harness.py, diagnostics/kwargs_defaults.py
invoke_interaction_command harness to execute handlers with mock context; build_kwargs_for_handler inspects signatures and provides test defaults.
Patching helpers and prefix skips
diagnostics/patches.py, diagnostics/prefix_skips.py
Context managers (api_patches, extension_patches, utility_permission_patches) and constants for prefix-excluded commands with reasons.
Spec discovery and registry execution
diagnostics/discovery.py, diagnostics/registry.py
discover_all_specs() introspects configured extensions to build CommandBehaviorSpec entries; run_spec() executes specs with patching, timeout, and optional assertions.
Spec helpers and configuration
diagnostics/specs/_helpers.py, diagnostics/specs/overrides.py
Helpers to register per-spec kwargs, skips, and custom assertions; configuration dicts (SPEC_SKIPS, SPEC_OVERRIDES, SPEC_PATCH_TARGETS, SPEC_CUSTOM_ASSERTIONS).
Command spec registrations
diagnostics/specs/{admin,ai,channel,fun,games,giveaway,image,level,logs,math,minigames,setup_wizards,utility}.py, diagnostics/specs/__init__.py
Per-extension modules that register spec entries (admin skips/custom assertion, game kwargs/choices, image defer-only) and load_all() to register them.
Command tree validation and manifest
diagnostics/tree.py, diagnostics/manifest.json, scripts/{generate_diagnostics_manifest.py,check_diagnostics_manifest.py}
Traverse bot command tree, compare to JSON manifest, and provide scripts to generate and validate the manifest; includes minigame subgroup diffs.
Diagnostics runner orchestration
diagnostics/runner.py
DiagnosticsRunner orchestrates phases (infrastructure, platform health, background loops, manifest, cogs, handler behaviors), posts progress to thread/status message, and posts final summary.
Package interface and admin integration
diagnostics/__init__.py, extensions/administration.py, main.py, tests/unit/diagnostics/test_behavior_specs.py, tests/integration/extensions/test_administration_comprehensive.py, scripts/check_coverage_per_file.py
Package exports, admin test_bot now runs diagnostics via DiagnosticsRunner, BackgroundLoopHealthCheck wired into startup, updated unit/integration tests, and coverage exemptions for WIP diagnostics files.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

"🐰 I hopped through specs with cheerful cheer,
Built mocks and manifests far and near,
The runner hums and reports what's true,
Tests pass in rows — a carrot for you! 🥕"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: deep test_bot diagnostics framework' clearly and specifically describes the main change: implementing a comprehensive diagnostics framework for the test_bot command.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/deep-test-bot-diagnostics

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diagnostics/discovery.py`:
- Around line 36-47: The final unguarded constructor call in _instantiate_group
can raise a TypeError and abort discovery; wrap the third instantiation attempt
(the group_cls(MagicMock()) call) in a try/except TypeError and on failure
return None (or otherwise skip the group) instead of letting the exception
propagate; update _instantiate_group's return contract (or typing to
Optional[Any]) so callers like discover_extension_specs/discover_all_specs can
safely skip None results.
- Around line 82-103: The code currently appends (callback.__name__, callback)
from group.walk_commands() but later looks up the handler via getattr(group,
method_name), which is fragile; instead, when iterating group.walk_commands()
store and later use the actual command.callback so you don't rely on attribute
name matching. Change the logic around walk_commands to append the pair
(method_name, callback) where method_name can be callback.__name__ (or a derived
id) but ensure later when populating handler you use the stored callback for
invoke_interaction_command paths (reference symbols: walk_commands,
command.callback, command_entries, handler, invoke_interaction_command, group).
Ensure you still skip interaction_check/on_error and preserve SPEC_SKIPS
behavior.

In `@diagnostics/infra_checks.py`:
- Around line 10-13: The latency measurement uses time.time() which is not
monotonic; replace the calls in the latency block (start = time.time(), end =
time.time()) with a monotonic timer such as time.monotonic() or
time.perf_counter() to compute latency for the ctx.send("ping") round-trip (the
variables start, end, and latency calculation should use the chosen monotonic
function).

In `@diagnostics/kwargs_defaults.py`:
- Around line 79-86: Remove the redundant first assignment in _make_role: the
expression make_member().__class__() assigned to role is immediately overwritten
by the next line, so delete that dead statement (the make_member().__class__()
assignment) and keep the subsequent role = make_member() and the property
assignments in the _make_role function.
- Around line 74-75: Duplicate key "giveaway_id" is defined twice in the same
dict (the second occurrence silently overrides the first); remove the redundant
entry so only a single canonical key remains. Keep the snake_case "giveaway_id"
and delete the duplicate mapping (or if the short form "giveawayid" was intended
as an alias, replace it with a single explicit alias key and remove the
duplicate "giveaway_id"); update any tests/usages if they relied on the
duplicate. Ensure the dict contains only one "giveaway_id" entry.

In `@diagnostics/patches.py`:
- Around line 33-62: The code currently enters contexts via stack/_enter before
the try/finally, so if any _enter (e.g., patch.object or patch) raises it leaks
active patches; replace the manual stack and _enter logic with a
contextlib.ExitStack instance (e.g., stack = ExitStack()) and call
stack.enter_context(api_patches()) and stack.enter_context(patch.object(...)) /
stack.enter_context(patch(...)) inside the loops, then yield mocks from within
the ExitStack-managed block so all entered contexts are automatically cleaned up
even on partial failures; remove the manual __enter__/__exit__ handling in
finally and rely on ExitStack to unwind.

In `@diagnostics/prefix_skips.py`:
- Around line 1-19: The two structures PREFIX_COMMANDS_EXCLUDED and
PREFIX_SKIP_REASONS duplicate the same identifiers and can drift; fix by
defining PREFIX_SKIP_REASONS first and then deriving PREFIX_COMMANDS_EXCLUDED
from its keys (e.g., set/frozenset(PREFIX_SKIP_REASONS.keys())) so the excluded
set always mirrors the mapping; ensure type remains frozenset[str] and adjust
ordering so PREFIX_SKIP_REASONS is declared before PREFIX_COMMANDS_EXCLUDED.

In `@diagnostics/registry.py`:
- Around line 29-35: Replace the inline kwargs-resolution in run_spec with a
single call to the existing helper _resolve_kwargs(spec, handler) and remove any
duplicate logic so behavior can't drift; update run_spec to call
_resolve_kwargs(spec, handler) instead of inlining the callable/extra check, and
ensure the consolidated helper continues to fall back to
build_kwargs_for_handler(handler) (or adjust both places to the chosen fallback)
so both paths behave identically; finally delete any now-unused duplicate code
if applicable.

In `@diagnostics/runner.py`:
- Around line 71-76: The phase calls in run_all (_run_phase_a_infra,
_run_phase_b_health, _run_phase_c_loops, _run_phase_d_tree,
_run_phase_f_extensions, _run_phase_e_handlers) must be uniformly protected so a
failure in one phase doesn't abort the rest; wrap each await call in its own
try/except, catch exceptions from manager.run_all() and
compare_tree_to_manifest() inside their respective phase methods (or where
they're invoked), log or record the phase failure outcome, and continue to the
next phase so run_all always attempts every phase before returning.
- Around line 188-202: The loop in Phase E currently calls self._thread_send for
every failing spec which can burst Discord API calls; modify the logic in the
for coro in asyncio.as_completed(tasks) loop to buffer failure messages (e.g.,
collect strings into a local list like fail_buffer) instead of calling
_thread_send immediately, and flush that buffer by calling _thread_send only
when the buffer reaches a configured size (e.g., 10) or when a progress/event
boundary is hit (use self._spec_done % SPEC_PROGRESS_INTERVAL == 0 or at the end
of the phase), ensure you still call _update_parent as before for progress, and
make sure to flush any remaining messages after the loop completes so no
failures are lost; reference functions/variables: _run_one, _thread_send,
_update_parent, SPEC_PROGRESS_INTERVAL, self._spec_done, phase.outcomes.
- Around line 154-157: The extra tree entries are currently recorded as failures
(CheckOutcome(..., False, ...)) while the user-facing message says WARN; change
the outcome to be non-blocking by creating the CheckOutcome for each extra path
with the success/skipped flag indicating a non-failure (e.g., set the
skipped/success parameter instead of False) so phase.outcomes and
DiagnosticsSummary.ok treat extras as warnings rather than failures; update the
CheckOutcome construction in runner.py (the loop that appends to phase.outcomes
for each path) to use the non-blocking status and keep the await
self._thread_send(...) WARN message unchanged.

In `@diagnostics/specs/utility.py`:
- Line 10: The spec registration is using the wrong spec-id and kwargs shape:
replace the register_kwargs call that currently registers
"utility.BoosterRoleCommands.setboosterrole" with
"utility.BoosterRoleCommands.setupboosterrole" and switch the kwargs preset from
the channel-shaped default_channel_kwargs to the role-shaped preset (e.g.,
default_role_kwargs or whatever role-shaped factory exists) so
BoosterRoleCommands.setupboosterrole is exercised with {"role": ...} instead of
{"channel": ...}; update the register_kwargs invocation accordingly.

In `@extensions/administration.py`:
- Around line 136-150: The thread creation call message.create_thread currently
sits before the try/except and can raise uncaught errors; move the thread
creation into the try block (or add a small try/except around
message.create_thread) so failures produce the same user-friendly error flow:
create the thread inside the try that calls DiagnosticsRunner(...).run_all(),
and in any exception path use error_embed with tanjunLocalizer.localize(...,
error=e) and fallback messaging (e.g., message.edit and message.send) rather
than letting exceptions from message.create_thread escape; reference
DiagnosticsRunner, run_all, message.create_thread, error_embed, and
tanjunLocalizer.localize when updating the code.

In `@scripts/check_diagnostics_manifest.py`:
- Around line 28-56: The manifest comparison currently omits the
"minigame_subgroups" key so subgroup drift isn't detected; update the comparison
logic (either in main() of scripts/check_diagnostics_manifest.py or in
compare_tree_to_manifest) to fetch expected.get("minigame_subgroups") and
compare it to actual.get("minigame_subgroups") (or actual["minigame_subgroups"])
as a set/dict as appropriate, and return failure (1) with a concise error
printout showing missing/extra entries when they differ; use the existing
pattern used for "paths" (compute missing/extra, print up to 20) to implement
this check.
- Around line 29-54: Replace the manual manifest read + set-comparison in
scripts/check_diagnostics_manifest.py with the existing diagnostics/tree.py
helpers: call load_manifest() to obtain the expected manifest and build the
diagnostics bot using the existing bot setup logic (extract or reuse the
_actual/_build_manifest bot-setup into a shared build_bot() helper if needed),
then call compare_tree_to_manifest(bot) to perform the comparison and return its
result; keep only the presentation/printing logic here (errors/reporting) and
remove the json.loads(MANIFEST_PATH.read_text(...)) and the explicit
exp_paths/act_paths or exp_roots/act_roots set comparisons so the comparison
logic is centralized in compare_tree_to_manifest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 62a9bbe6-87b1-4ecd-8823-a51ebf4a64c3

📥 Commits

Reviewing files that changed from the base of the PR and between 23e57c4 and f55c03a.

📒 Files selected for processing (35)
  • diagnostics/__init__.py
  • diagnostics/assertions.py
  • diagnostics/discovery.py
  • diagnostics/harness.py
  • diagnostics/infra_checks.py
  • diagnostics/kwargs_defaults.py
  • diagnostics/manifest.json
  • diagnostics/mocks.py
  • diagnostics/models.py
  • diagnostics/patches.py
  • diagnostics/prefix_skips.py
  • diagnostics/registry.py
  • diagnostics/runner.py
  • diagnostics/specs/__init__.py
  • diagnostics/specs/_helpers.py
  • diagnostics/specs/admin.py
  • diagnostics/specs/ai.py
  • diagnostics/specs/channel.py
  • diagnostics/specs/fun.py
  • diagnostics/specs/games.py
  • diagnostics/specs/giveaway.py
  • diagnostics/specs/image.py
  • diagnostics/specs/level.py
  • diagnostics/specs/logs.py
  • diagnostics/specs/math.py
  • diagnostics/specs/minigames.py
  • diagnostics/specs/overrides.py
  • diagnostics/specs/setup_wizards.py
  • diagnostics/specs/utility.py
  • diagnostics/tree.py
  • extensions/administration.py
  • main.py
  • scripts/check_diagnostics_manifest.py
  • scripts/generate_diagnostics_manifest.py
  • tests/unit/diagnostics/test_behavior_specs.py

Comment thread diagnostics/discovery.py Outdated
Comment thread diagnostics/discovery.py Outdated
Comment thread diagnostics/infra_checks.py Outdated
Comment thread diagnostics/kwargs_defaults.py Outdated
Comment thread diagnostics/kwargs_defaults.py
Comment thread diagnostics/runner.py
Comment thread diagnostics/specs/utility.py Outdated
Comment thread extensions/administration.py Outdated
Comment thread scripts/check_diagnostics_manifest.py
Comment thread scripts/check_diagnostics_manifest.py Outdated
EntchenEric and others added 2 commits June 1, 2026 09:12
Harden discovery/registry/runner per review feedback, centralize manifest
comparison in tree.py, fix booster-role spec kwargs, and repair unit tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@EntchenEric EntchenEric self-assigned this Jun 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/diagnostics/test_behavior_specs.py`:
- Around line 26-38: run_spec currently calls _instantiate_group(spec.group_cls)
and handler = getattr(group, spec.method_name) outside its try/except so an
AttributeError escapes; wrap the group instantiation and method lookup (the
calls to _instantiate_group and getattr) in the existing try/except (or add a
new one) and on any exception return a CheckOutcome constructed as
CheckOutcome(spec.id, False, str(exc)) so run_spec returns a failing
CheckOutcome for missing groups/methods instead of raising.
- Around line 9-13: The module-level pytestmark applies asyncio to all tests
including the synchronous test_discovers_many_behavior_specs; remove the
module-level pytestmark and instead add the pytest.mark.asyncio decorator only
to the asynchronous test function test_run_spec_handles_unknown_spec so that
test_discovers_many_behavior_specs remains a normal sync test and only
test_run_spec_handles_unknown_spec runs under the asyncio marker.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: bfbe9213-01b7-499d-8190-99c820ebd489

📥 Commits

Reviewing files that changed from the base of the PR and between f55c03a and 55b6e2f.

📒 Files selected for processing (2)
  • tests/integration/extensions/test_administration_comprehensive.py
  • tests/unit/diagnostics/test_behavior_specs.py

Comment on lines +9 to +13
pytestmark = pytest.mark.asyncio


def test_discovers_many_behavior_specs() -> None:
"""Discover specs without requiring full discord.py mock setup."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="tests/unit/diagnostics/test_behavior_specs.py"
if [ ! -f "$FILE" ]; then
  echo "Missing file: $FILE" >&2
  exit 1
fi

echo "=== $FILE (head -n 120 with line numbers) ==="
nl -ba "$FILE" | sed -n '1,140p'

Repository: TanjunBot/new_tanjun

Length of output: 212


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="tests/unit/diagnostics/test_behavior_specs.py"

echo "=== Top of $FILE (cat -n 1..200) ==="
cat -n "$FILE" | sed -n '1,200p'

echo
echo "=== pytestmark occurrences ==="
rg -n "pytestmark\s*=" "$FILE" || true

echo
echo "=== async defs in file ==="
rg -n "async\s+def\s+" "$FILE" || true

Repository: TanjunBot/new_tanjun

Length of output: 1817


Scope pytest.mark.asyncio to async tests only
The module-level pytestmark = pytest.mark.asyncio applies to test_discovers_many_behavior_specs (sync, line 12). Add the marker only to test_run_spec_handles_unknown_spec (async).

🔧 Proposed fix
-pytestmark = pytest.mark.asyncio
-
-
 def test_discovers_many_behavior_specs() -> None:
+@pytest.mark.asyncio
 async def test_run_spec_handles_unknown_spec() -> None:
🧰 Tools
🪛 GitHub Actions: Test / 1_unit-and-mock.txt

[warning] 12-12: PytestWarning: test marked with @pytest.mark.asyncio but it is not an async function (remove asyncio mark).

🪛 GitHub Actions: Test / unit-and-mock

[warning] 12-12: PytestWarning: Test marked with '@pytest.mark.asyncio' but is not an async function; remove asyncio mark.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/diagnostics/test_behavior_specs.py` around lines 9 - 13, The
module-level pytestmark applies asyncio to all tests including the synchronous
test_discovers_many_behavior_specs; remove the module-level pytestmark and
instead add the pytest.mark.asyncio decorator only to the asynchronous test
function test_run_spec_handles_unknown_spec so that
test_discovers_many_behavior_specs remains a normal sync test and only
test_run_spec_handles_unknown_spec runs under the asyncio marker.

Comment thread tests/unit/diagnostics/test_behavior_specs.py Outdated
EntchenEric and others added 3 commits June 1, 2026 09:18
Guard issubclass when app_commands.Group is mocked in parallel unit runs,
skip discovery count test in that environment, and assert run_all instead
of parent message edit when DiagnosticsRunner is mocked.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
diagnostics/runner.py (3)

136-156: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Phase C does not update parent status message after completion.

Phases A and B call _update_parent after appending results (lines 103, 134), but Phase C omits this call. Users won't see progress confirmation for the background loops phase.

🔄 Proposed fix
     self.summary.phases.append(phase)
+    await self._update_parent("Bot Diagnostics", f"Phase C complete — {phase.failed} failed")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/runner.py` around lines 136 - 156, The Phase C routine in
_run_phase_c_loops builds a PhaseResult and appends it to self.summary.phases
but does not call self._update_parent() afterwards like Phases A and B do;
update _run_phase_c_loops (after self.summary.phases.append(phase)) to call
self._update_parent() so the parent status message is refreshed after completion
of the background loops phase (reference PhaseResult, _run_phase_c_loops, and
self._update_parent).

157-190: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Phase D does not update parent status message after completion.

Phase D omits the _update_parent call that Phases A and B include, leaving the parent message stale until Phase E begins.

🔄 Proposed fix
     self.summary.phases.append(phase)
+    await self._update_parent("Bot Diagnostics", f"Phase D complete — {phase.failed} failed")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/runner.py` around lines 157 - 190, The Phase D runner
(_run_phase_d_tree) creates and appends a PhaseResult but never refreshes the
parent status UI; add a call to self._update_parent(...) after appending phase
to self.summary.phases (same placement other phases use) so the parent message
reflects Phase D completion and its outcomes; ensure you pass the same arguments
pattern used by _run_phase_a_*/_run_phase_b_* (i.e., update the parent message
after self.summary.phases.append(phase)).

192-203: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Phase F does not update parent status message after completion.

Phase F omits the _update_parent call, so the user won't see confirmation that extension checks completed before Phase E starts.

🔄 Proposed fix
     self.summary.phases.append(phase)
+    await self._update_parent("Bot Diagnostics", f"Phase F complete — {phase.failed} failed")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/runner.py` around lines 192 - 203, The Phase F handler
_run_phase_f_extensions currently builds PhaseResult("F", ...) and appends it to
self.summary.phases but never updates the parent status; modify
_run_phase_f_extensions to call the parent-update helper (await
self._update_parent(phase)) after populating phase.outcomes and before appending
to self.summary.phases (or immediately after appending) so the UI/parent status
is updated to reflect completion of Phase F; reference the PhaseResult instance
named phase and the helper method _update_parent in your change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diagnostics/discovery.py`:
- Around line 56-57: Keep the runtime guard that checks "if not
isinstance(group_base, type): return classes" but suppress mypy's
--warn-unreachable for that return; specifically, add a mypy suppression comment
on the return statement (targeting the unreachable warning) so the runtime guard
remains while static analysis ignores the unreachable-return warning for the
Group runtime guard involving the variable "group_base".

In `@diagnostics/prefix_skips.py`:
- Line 10: Remove the redundant .keys() call when building the frozenset:
replace the expression that constructs PREFIX_COMMANDS_EXCLUDED from
PREFIX_SKIP_REASONS.keys() so it instead constructs the frozenset directly from
PREFIX_SKIP_REASONS (i.e., use frozenset(PREFIX_SKIP_REASONS)); update the
assignment to PREFIX_COMMANDS_EXCLUDED accordingly, leaving PREFIX_SKIP_REASONS
and the frozenset usage unchanged.

In `@diagnostics/registry.py`:
- Line 50: The call to _resolve_kwargs currently sits before the try in run_spec
so exceptions from extra() or build_kwargs_for_handler(handler) escape; move the
extra_kwargs = _resolve_kwargs(spec, handler) invocation inside the existing try
block in run_spec and catch any exception from _resolve_kwargs, converting it
into a failed CheckOutcome (or appropriate failure path used elsewhere) so
run_spec never raises; reference _resolve_kwargs, run_spec, extra(), and
build_kwargs_for_handler(handler) when making the change.

In `@diagnostics/specs/_helpers.py`:
- Around line 38-41: default_role_kwargs currently imports the private
_make_role from diagnostics.kwargs_defaults; instead add a public factory (e.g.,
make_role) and use that export. Create a public function make_role either in
diagnostics/mocks.py or export it from diagnostics.kwargs_defaults (rename or
wrap _make_role), then update default_role_kwargs to return {"role":
make_role()} and remove the private _make_role import; reference functions
default_role_kwargs, _make_role, and the new make_role in your change.

In `@diagnostics/tree.py`:
- Line 35: The return value of json.loads(MANIFEST_PATH.read_text(...)) needs an
explicit cast to dict[str, Any] to satisfy mypy; import cast (from typing import
cast) and change the return to return cast(dict[str, Any],
json.loads(MANIFEST_PATH.read_text(encoding="utf-8"))) (ensure Any is available
from typing if not already imported).

---

Outside diff comments:
In `@diagnostics/runner.py`:
- Around line 136-156: The Phase C routine in _run_phase_c_loops builds a
PhaseResult and appends it to self.summary.phases but does not call
self._update_parent() afterwards like Phases A and B do; update
_run_phase_c_loops (after self.summary.phases.append(phase)) to call
self._update_parent() so the parent status message is refreshed after completion
of the background loops phase (reference PhaseResult, _run_phase_c_loops, and
self._update_parent).
- Around line 157-190: The Phase D runner (_run_phase_d_tree) creates and
appends a PhaseResult but never refreshes the parent status UI; add a call to
self._update_parent(...) after appending phase to self.summary.phases (same
placement other phases use) so the parent message reflects Phase D completion
and its outcomes; ensure you pass the same arguments pattern used by
_run_phase_a_*/_run_phase_b_* (i.e., update the parent message after
self.summary.phases.append(phase)).
- Around line 192-203: The Phase F handler _run_phase_f_extensions currently
builds PhaseResult("F", ...) and appends it to self.summary.phases but never
updates the parent status; modify _run_phase_f_extensions to call the
parent-update helper (await self._update_parent(phase)) after populating
phase.outcomes and before appending to self.summary.phases (or immediately after
appending) so the UI/parent status is updated to reflect completion of Phase F;
reference the PhaseResult instance named phase and the helper method
_update_parent in your change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b1cb9f14-d485-4881-b3ed-060e3ad94baa

📥 Commits

Reviewing files that changed from the base of the PR and between 55b6e2f and eb1fafb.

📒 Files selected for processing (14)
  • diagnostics/discovery.py
  • diagnostics/infra_checks.py
  • diagnostics/kwargs_defaults.py
  • diagnostics/patches.py
  • diagnostics/prefix_skips.py
  • diagnostics/registry.py
  • diagnostics/runner.py
  • diagnostics/specs/_helpers.py
  • diagnostics/specs/utility.py
  • diagnostics/tree.py
  • extensions/administration.py
  • scripts/check_diagnostics_manifest.py
  • tests/integration/extensions/test_administration_comprehensive.py
  • tests/unit/diagnostics/test_behavior_specs.py
💤 Files with no reviewable changes (1)
  • diagnostics/kwargs_defaults.py

Comment thread diagnostics/discovery.py
Comment on lines +56 to +57
if not isinstance(group_base, type):
return classes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

mypy unreachable error how to suppress with type ignore unreachable comment

💡 Result:

To suppress the Mypy unreachable code error, you can use the following inline comment on the line causing the error: # type: ignore[unreachable] This suppresses the specific [unreachable] error code reported by Mypy when the --warn-unreachable configuration flag is enabled [1][2]. If you are encountering this error, note that Mypy's behavior is intentional: when Mypy determines a code block is unreachable (e.g., after a return statement, an infinite loop, or based on narrowed types), it intentionally skips type checking for that block [3][4]. A simple type ignore does not change how Mypy analyzes the control flow, so you may occasionally need to ignore subsequent lines within the same unreachable block if they also trigger errors [5]. If you believe the code should be reachable, consider these alternatives: * Use typing.cast to help Mypy understand the types involved, which may resolve false-positive reachability detections [5]. * Utilize TypeGuard (PEP 647) to explicitly define how your types should be narrowed [6]. * Check if the logic relies on conditions that Mypy cannot evaluate, such as complex assertions or external state [3]. You can globally disable this warning by adding # mypy: disable-error-code="unreachable" to the top of the file [3][7].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and display the relevant section of the file (with line numbers)
python3 - <<'PY'
import pathlib
path = pathlib.Path("diagnostics/discovery.py")
text = path.read_text().splitlines()
for i in range(1, len(text)+1):
    if 40 <= i <= 90:
        print(f"{i:4d}: {text[i-1]}")
PY

# Show any mypy config files (if present) and relevant settings for unreachable warnings
ls -la | sed -n '1,120p' || true
find . -maxdepth 3 -type f \( -name "mypy.ini" -o -name "pyproject.toml" -o -name "setup.cfg" -o -name "tox.ini" \) -print

# If pyproject.toml exists, print mypy section quickly
if [ -f pyproject.toml ]; then
  python3 - <<'PY'
import tomllib, pathlib
p = pathlib.Path("pyproject.toml")
data = tomllib.loads(p.read_text())
def find_mypy(obj):
    if isinstance(obj, dict):
        for k,v in obj.items():
            if k == "mypy":
                print("Found [tool.mypy] in pyproject.toml:")
                print(v)
            else:
                find_mypy(v)
    elif isinstance(obj, list):
        for x in obj: find_mypy(x)
find_mypy(data)
PY
fi

# Search for existing ignore-unreachable patterns in the repo
rg -n "#\s*type:\s*ignore\[unreachable\]" -S . || true
rg -n "warn-unreachable|unreachable" -S pyproject.toml mypy.ini setup.cfg tox.ini 2>/dev/null || true

Repository: TanjunBot/new_tanjun

Length of output: 8779


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for mypy invocations / flags in CI and scripts
rg -n "mypy" .github -S || true
rg -n "warn-unreachable|warn_unreachable|unreachable" .github -S || true
rg -n "mypy.*(warn|unreachable)|--warn-unreachable" . -S || true

# Show the workflows that likely run type-check
ls -la .github/workflows || true
rg -n "type-?check|lint|mypy" .github/workflows -S || true

Repository: TanjunBot/new_tanjun

Length of output: 19581


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the exact mypy invocation flags in type_checking.yml (where unreachable warnings likely get enabled)
python3 - <<'PY'
from pathlib import Path
p = Path(".github/workflows/type_checking.yml")
lines = p.read_text().splitlines()
for i in range(60, 115):
    if 1 <= i <= len(lines):
        print(f"{i:4d}: {lines[i-1]}")
PY

# Also show the mypy invocation in ci.yml for comparison
python3 - <<'PY'
from pathlib import Path
p = Path(".github/workflows/ci.yml")
lines = p.read_text().splitlines()
for i in range(50, 90):
    if 1 <= i <= len(lines):
        print(f"{i:4d}: {lines[i-1]}")
PY

Repository: TanjunBot/new_tanjun

Length of output: 4301


Suppress mypy unreachable for the Group runtime guard

CI runs mypy with --warn-unreachable, so this branch is treated as statically unreachable (return at line 57), even though the runtime guard may matter if app_commands.Group is mocked. Keep the check and suppress the specific unreachable warning on the return statement.

🔧 Proposed fix
     group_base = app_commands.Group
     if not isinstance(group_base, type):
-        return classes
+        return classes  # type: ignore[unreachable]
🧰 Tools
🪛 GitHub Actions: Python Type Checking with Issue Management / 0_type-check (3.12).txt

[error] 57-57: mypy: Statement is unreachable [unreachable]

🪛 GitHub Actions: Python Type Checking with Issue Management / type-check (3.12)

[error] 57-57: mypy: Statement is unreachable [unreachable]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/discovery.py` around lines 56 - 57, Keep the runtime guard that
checks "if not isinstance(group_base, type): return classes" but suppress mypy's
--warn-unreachable for that return; specifically, add a mypy suppression comment
on the return statement (targeting the unreachable warning) so the runtime guard
remains while static analysis ignores the unreachable-return warning for the
Group runtime guard involving the variable "group_base".

"update": "Triggers bot restart endpoint",
}

PREFIX_COMMANDS_EXCLUDED = frozenset(PREFIX_SKIP_REASONS.keys())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Optional: .keys() is redundant when constructing frozenset from dict.

frozenset(dict) automatically iterates over the dictionary's keys, so the .keys() call can be omitted for brevity.

♻️ Proposed simplification
-PREFIX_COMMANDS_EXCLUDED = frozenset(PREFIX_SKIP_REASONS.keys())
+PREFIX_COMMANDS_EXCLUDED = frozenset(PREFIX_SKIP_REASONS)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
PREFIX_COMMANDS_EXCLUDED = frozenset(PREFIX_SKIP_REASONS.keys())
PREFIX_COMMANDS_EXCLUDED = frozenset(PREFIX_SKIP_REASONS)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/prefix_skips.py` at line 10, Remove the redundant .keys() call
when building the frozenset: replace the expression that constructs
PREFIX_COMMANDS_EXCLUDED from PREFIX_SKIP_REASONS.keys() so it instead
constructs the frozenset directly from PREFIX_SKIP_REASONS (i.e., use
frozenset(PREFIX_SKIP_REASONS)); update the assignment to
PREFIX_COMMANDS_EXCLUDED accordingly, leaving PREFIX_SKIP_REASONS and the
frozenset usage unchanged.

Comment thread diagnostics/registry.py
Comment on lines +38 to +41
def default_role_kwargs() -> dict[str, Any]:
from diagnostics.kwargs_defaults import _make_role

return {"role": _make_role()}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm _make_role is defined/exported and check for a public role mock factory
rg -nP '\bdef\s+_?make_role\b' diagnostics
rg -nP '\bmake_role\b' diagnostics/mocks.py

Repository: TanjunBot/new_tanjun

Length of output: 123


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show _make_role definition and any nearby exports/usages
echo "=== diagnostics/kwargs_defaults.py (around _make_role) ==="
nl -ba diagnostics/kwargs_defaults.py | sed -n '60,110p'

# Confirm whether diagnostics/mocks.py exists and search for any public role factory
echo "=== diagnostics/mocks.py search for make_role ==="
ls -la diagnostics/mocks.py || true
rg -nP '\bmake_role\b' diagnostics/mocks.py || true

echo "=== diagnostics search for public role factory patterns ==="
rg -nP '\b(make_role|role.*make_)\b' diagnostics || true

echo "=== list other public make_* factories in diagnostics/mocks.py (context) ==="
rg -nP '\bdef\s+make_[a-zA-Z0-9_]+\b' diagnostics/mocks.py || true

Repository: TanjunBot/new_tanjun

Length of output: 184


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== ls diagnostics/ ==="
ls -la diagnostics || true

echo "=== diagnostics/kwargs_defaults.py around _make_role ==="
# Use cat -n since nl isn't available
cat -n diagnostics/kwargs_defaults.py | sed -n '65,95p'

echo "=== check diagnostics/mocks.py for make_role ==="
ls -la diagnostics/mocks.py || true
if [[ -f diagnostics/mocks.py ]]; then
  rg -nP '\bmake_role\b' diagnostics/mocks.py || true
  rg -nP '\bdef\s+make_[a-zA-Z0-9_]+\b' diagnostics/mocks.py || true
fi

echo "=== search for public make_* role factory across diagnostics ==="
rg -nP '\bdef\s+make_role\b' diagnostics || true
rg -nP '\bmake_role\b' diagnostics || true

Repository: TanjunBot/new_tanjun

Length of output: 2709


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== diagnostics/specs/_helpers.py (lines around default_role_kwargs) ==="
cat -n diagnostics/specs/_helpers.py | sed -n '1,120p'

echo "=== find usages of _make_role ==="
rg -n "_make_role" diagnostics || true

echo "=== look for any public make_role/make_*role factories in diagnostics ==="
rg -nP '\bmake_[a-zA-Z0-9_]*role\b' diagnostics || true

Repository: TanjunBot/new_tanjun

Length of output: 2599


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== diagnostics/kwargs_defaults.py (head) ==="
cat -n diagnostics/kwargs_defaults.py | head -n 120

echo "=== diagnostics/mocks.py (head) ==="
cat -n diagnostics/mocks.py | head -n 220

echo "=== diagnostics/__init__.py ==="
cat -n diagnostics/__init__.py | head -n 200

echo "=== check whether kwargs_defaults exports _make_role or imports it elsewhere ==="
rg -n "from diagnostics\.kwargs_defaults import" diagnostics || true
rg -n "import diagnostics\.kwargs_defaults" diagnostics || true

Repository: TanjunBot/new_tanjun

Length of output: 8381


Avoid importing private _make_role from diagnostics.kwargs_defaults in default_role_kwargs

diagnostics/specs/_helpers.py (38-41) imports diagnostics.kwargs_defaults._make_role (defined in diagnostics/kwargs_defaults.py) just to build the "role" kwarg, but there’s no corresponding public role factory (e.g., make_role) in diagnostics/mocks.py. Add a public role factory and use that instead (either in diagnostics.mocks or as a public export from diagnostics.kwargs_defaults).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@diagnostics/specs/_helpers.py` around lines 38 - 41, default_role_kwargs
currently imports the private _make_role from diagnostics.kwargs_defaults;
instead add a public factory (e.g., make_role) and use that export. Create a
public function make_role either in diagnostics/mocks.py or export it from
diagnostics.kwargs_defaults (rename or wrap _make_role), then update
default_role_kwargs to return {"role": make_role()} and remove the private
_make_role import; reference functions default_role_kwargs, _make_role, and the
new make_role in your change.

Comment thread diagnostics/tree.py Outdated
EntchenEric and others added 3 commits June 1, 2026 09:32
Register behavior specs across all extensions, parametrize spec tests,
wire manifest CI gate, update e2e tree tests for admin/utility roots,
and remove legacy test_bot runtime helpers.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ne test

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diagnostics/specs/games.py`:
- Around line 11-18: The register_kwargs calls for games (functions
GameCommands.tic_tac_toe_cmd, .connect4_cmd, .akinator_cmd, .wordle_cmd,
.hangman_cmd, .rps_cmd, .battleship_cmd, .advanced_ttt_cmd) include unnecessary
"# type: ignore[no-untyped-call]" comments that cause mypy unused-ignore
failures; remove those trailing ignore comments from each register_kwargs(...)
invocation so the calls rely on the existing proper typing of register_kwargs
and no longer trigger the CI type-check error.
- Around line 8-9: The _member() function is annotated to return MagicMock but
MagicMock isn't imported for the type checker; add an annotation-only import so
the name is resolved: either add "from typing import TYPE_CHECKING" and inside
"if TYPE_CHECKING: from unittest.mock import MagicMock" or replace the return
annotation with the fully-qualified "unittest.mock.MagicMock"; target the
_member() declaration and the make_member() reference to ensure the type checker
recognizes MagicMock.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 953da3ba-c08c-437e-b4a0-4e48d2c052b7

📥 Commits

Reviewing files that changed from the base of the PR and between eb1fafb and ea5f6a0.

📒 Files selected for processing (2)
  • diagnostics/specs/games.py
  • diagnostics/tree.py

Comment thread diagnostics/specs/games.py Outdated
Comment thread diagnostics/specs/games.py Outdated
@EntchenEric EntchenEric changed the title feat: deep test_bot diagnostics framework (WIP) feat: deep test_bot diagnostics framework Jun 1, 2026
@EntchenEric EntchenEric merged commit b4aa692 into development Jun 1, 2026
25 checks passed
@EntchenEric EntchenEric deleted the feat/deep-test-bot-diagnostics branch June 1, 2026 07:50
@coderabbitai coderabbitai Bot mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant