Automated Test Generation Agent Enhancements by OBenner · Pull Request #176 · OBenner/Auto-Coding

OBenner · 2026-03-23T09:29:15Z

Specialized agent that analyzes code and generates comprehensive unit tests, integration tests, and E2E tests with high coverage.

Summary by CodeRabbit

Release Notes

New Features
- Added edge-case detection for test generation, identifying error handling, boundary conditions, type validation, and assertions in code
- Introduced pytest fixture generation with validation and organization across conftest.py and fixture modules
- Enabled integration test generation for endpoints, services, and external dependencies
- Added integration test analysis to extract endpoint routes, database operations, and external service calls
Improvements
- Implemented 80% code coverage threshold enforcement with detailed gap analysis and prioritization
- Enhanced pytest validation to handle conftest.py files more gracefully

… API end Created IntegrationTestAnalyzer in apps/backend/analysis/ to detect: - API endpoints (FastAPI, Flask, Django routes) - Service classes and boundaries - Database operations (ORM queries) - External service calls (HTTP clients) This analyzer provides the foundation for integration test generation by identifying testable integration points in the codebase. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…ration test guidelines Added comprehensive integration testing section with API endpoint testing patterns and service integration examples

…fixtures

…erator workflow

…hreshold

- Add detect_edge_cases_from_analysis() to extract edge cases from analysis results - Add _detect_edge_cases_from_ast() to detect edge cases in Python source code - Supports detection of: error handling, boundary conditions, type validation, error raising, assertions - Falls back to AST-based detection when edge_cases not in analysis results

- Fix pytest collection issue (pywin32 import error) - Move validate_platform_dependencies() call from module-level to main() - Prevents pytest collection from triggering SystemExit - Validation still runs when script is executed - Create missing test files for Phase 5 - tests/test_fixture_generation.py: Test fixture generator agent - tests/test_integration_test_generation.py: Test integration test analyzer - Follow patterns from test_test_generation.py QA Fix Session: 1 Signed-off-by: Test User <test@example.com>

Fixes: - IntegrationTestAnalyzer now detects SQLAlchemy 2.0 function calls (select(User), insert(User), etc.) by handling both ast.Attribute and ast.Name nodes in _extract_database_operations() - Added _extract_model_from_arg() helper to extract model names from function arguments - Fixture validation now allows conftest.py with no collected tests (fixtures-only files) - Fixed all test mocks to use run_generator_session instead of create_client (correct import path) Verified: - AST parsing correctly detects both node types - Helper method extracts model from function arguments - conftest.py validation exception in place - All test mocks updated to correct import path QA Fix Session: 2

Fixes: - Test mocks incorrectly configured (Issue 1) * Removed 5 incorrect mock patches from test_fixture_generation.py * Fixed run_generator_session and validate_fixture_files mocks to be async * All 13 fixture generation tests now pass (was 4/13) - Database operation classification bug (Issue 2) * Removed 'commit' from DB_QUERY_METHODS set * Commit is transaction control, not a query operation * test_classify_database_operations now passes - Async external call detection missing (Issue 3) * Added ast.Await node detection in _extract_external_services() * Fixed analyze_file() to check ast.AsyncFunctionDef (was only checking FunctionDef) * test_detect_async_external_calls now passes - End-to-end fixture generation test failing (Issue 4) * Fixed multiple mock issues (async functions, log_generator_result) * Added tests directory validation to fixture_generator.py * All test infrastructure now working correctly Test Results: - test_fixture_generation.py: 13/13 passing (100%) - was 4/13 (31%) - test_integration_test_generation.py: 30/30 passing (100%) - was 28/30 (93%) - test_test_generation.py: 27/27 passing (100%) - no regressions - Total: 70/70 tests passing (100%) - was 59/60 (98%) QA Fix Session: 4 Verified: All tests pass, no regressions

coderabbitai · 2026-03-23T09:29:31Z

Warning

Rate limit exceeded

@OBenner has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 24 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cc7b28fc-0398-4209-a051-d1f2f84c5d84

📥 Commits

Reviewing files that changed from the base of the PR and between d916d91 and b0552ea.

📒 Files selected for processing (7)

apps/backend/agents/fixture_generator.py
apps/backend/agents/test_generator.py
apps/backend/analysis/integration_test_analyzer.py
apps/backend/qa/loop.py
apps/backend/runners/ideation_runner.py
guides/TEST-GENERATION-AGENT.md
tests/test_fixture_generation.py

📝 Walkthrough

Walkthrough

Added comprehensive fixture generation infrastructure and integration test capabilities to the backend. New modules include a fixture generator with async orchestration, an AST-based integration test analyzer, and enhanced edge-case detection. Test generator coverage gap analysis now includes threshold enforcement and categorized gap reporting. Supporting prompt documentation and test coverage were added.

Changes

Cohort / File(s)	Summary
Fixture Generation `apps/backend/agents/fixture_generator.py`, `apps/backend/prompts/fixture_generator.md`, `tests/test_fixture_generation.py`	New async module orchestrating pytest fixture generation and validation; validates fixture files and generates fixtures via generator session; includes prompt guidance for fixture patterns and quality standards; comprehensive test suite covering validation and orchestration flows.
Edge-Case Detection and Coverage Analysis `apps/backend/agents/test_generator.py`	Added `detect_edge_cases_from_analysis()` and `_detect_edge_cases_from_ast()` for AST-based edge-case extraction; enhanced `analyze_coverage_gaps()` with `enforce_threshold` parameter and structured error returns; extended gap reporting with threshold validation, per-file coverage, and categorized priority lists (critical/high/medium); improved `format_coverage_gaps_prompt()` output.
Integration Test Generation `apps/backend/analysis/integration_test_analyzer.py`, `apps/backend/agents/test_generator.py`, `apps/backend/prompts/test_generator.md`, `tests/test_integration_test_generation.py`	New AST-based `IntegrationTestAnalyzer` module extracting endpoints, service classes, database operations, and external service calls from source code; added `generate_integration_tests()` async function for orchestration; extended test generator prompts with integration testing criteria, structure, templates, and best practices; test suite validating analyzer behavior across frameworks and code patterns.
QA Loop Coverage Enforcement `apps/backend/qa/loop.py`	Added `MINIMUM_COVERAGE_THRESHOLD` constant (80.0%); implemented coverage gate post-test-generation that rejects validation below threshold, generates `QA_FIX_REQUEST.md`, records critical iteration entry, and dispatches failure webhook.
Configuration and Validation `apps/backend/agents/_validation.py`, `apps/backend/runners/ideation_runner.py`	Updated pytest collection validation to treat `conftest.py` failures differently from other files; moved `validate_platform_dependencies()` invocation from import-time to runtime inside `main()`.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant gen as generate_fixtures()
    participant sess as run_generator_session()
    participant val as validate_fixture_files()
    participant vpt as validate_python_tests()
    participant log as log_generator_result()

    Client->>gen: project_dir, spec_dir, analysis_results
    gen->>gen: compute counts, build starting_message
    gen->>sess: invoke generator session
    sess->>sess: run LLM generation
    sess-->>gen: generated_files
    gen->>gen: scan tests/ for conftest.py & fixtures/
    gen->>val: fixture_files, project_dir
    val->>vpt: validate syntax & imports
    vpt-->>val: True/False
    val-->>gen: validation result
    gen->>log: record generation result
    log-->>gen: logged
    gen-->>Client: {generated_files, success, error, framework}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Automated Test Generation in QA Loop #40: Modifies apps/backend/agents/test_generator.py and adds public helper functions for test generation and analysis workflows.
Fix QA Rejection - Add Authentication and Test Coverage #11: Updates QA/test-generation related code including apps/backend/qa/loop.py and test generation modules.

Suggested labels

size/XL, area/backend

Poem

🐰 Fixtures and tests now hop with grace,
Edge cases found in every place,
Integration flows through AST trees,
Coverage thresholds set to ease,
One hop forward, nine tests deep! 🎯✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Automated Test Generation Agent Enhancements' accurately reflects the main changes: significant enhancements to test generation capabilities including integration test analysis, fixture generation, coverage enforcement, and edge-case detection.
Docstring Coverage	✅ Passed	Docstring coverage is 95.60% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch auto-claude/190-automated-test-generation-agent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 18

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/backend/agents/_validation.py`:
- Around line 69-73: The current validation lets any non-zero pytest exit code
pass for conftest.py; update the check around result.returncode and
file_path.name so that only exit code 5 (NO_TESTS_COLLECTED) is treated as
acceptable for conftest.py. Concretely, replace the unconditional "if
result.returncode != 0 and file_path.name != 'conftest.py':" logic with a branch
that: if file_path.name == 'conftest.py' then allow only result.returncode == 5,
otherwise require result.returncode == 0; keep the existing print_status,
logger.debug, and return False behavior when the check fails (use the same
variables result.returncode, result.stdout, result.stderr, test_file).

In `@apps/backend/agents/fixture_generator.py`:
- Around line 187-198: The current check for an empty fixture_files list returns
{"generated_files": [], "success": True, "error": None, "framework": framework}
which is ambiguous; update the return in the no-files branch inside the function
that contains the fixture_files check so it signals failure and conveys a
reason: set "success": False and populate "error" with a descriptive message
(e.g., "No fixture files generated; check agent output or tests/fixtures/") and
keep "generated_files": [] and "framework": framework; also ensure the printed
warning via print_status remains unchanged so callers and logs consistently
reflect the lack of output.
- Around line 216-224: The function currently ignores validation_success and
always returns "success": True; update the return to reflect validation by
setting "success" to validation_success (or validation_success and generation
checks) and populate "error" with a descriptive message when validation_success
is False (e.g., use the same message from print_status or a new error string) so
callers of the fixture generator see the real outcome; locate the block that
prints validation failure and the return dict (references: validation_success,
print_status, fixture_files, and the returned keys "generated_files", "success",
"error", "framework") and change the "success" and "error" values accordingly.

In `@apps/backend/agents/test_generator.py`:
- Around line 929-1055: The function _detect_edge_cases_from_ast is too
complex—extract each pattern branch into small helper detectors and dispatch to
them from the main loop; specifically create and call helpers like
_detect_try_except_patterns(node: ast.Try, file_path: str),
_detect_comparison_patterns(node: ast.Compare, file_path: str),
_detect_isinstance_patterns(node: ast.Call, file_path: str),
_detect_raise_patterns(node: ast.Raise, file_path: str), and
_detect_assertion_patterns(node: ast.Assert, file_path: str) that return lists
of edge-case dicts and move the corresponding logic (try/except handling,
None/numeric/empty comparisons, isinstance checks, raise parsing, and
assertions) into those helpers, then replace the in-loop branches in
_detect_edge_cases_from_ast with calls to extend the edge_cases list with each
helper's result.
- Around line 1247-1283: The large multiline string assigned to starting_message
should be moved out of the function into a reusable prompt template or
module-level constant (e.g., INTEGRATION_TEST_STARTING_MESSAGE) and
parameterized to accept the analysis JSON; update the function to build
starting_message by formatting that constant with json.dumps(analysis_results,
indent=2) (or passing analysis_json) so the function only composes the formatted
prompt, improving readability and maintainability while preserving the existing
prompt content and usage.
- Around line 308-452: The function format_coverage_gaps_prompt is too complex;
extract logical sections into small helpers to reduce cognitive complexity and
make maintenance easier: create helpers like
_format_priority_breakdown(gaps_summary, threshold) to build the "Gap
Prioritization" lines, _format_file_listing(gaps_summary, threshold,
max_files=15) to produce the detailed per-file block (including priority
indicator, coverage, and missing line ranges using the existing
_format_line_ranges), and optionally _format_statistics(gaps_summary, threshold)
and _format_action_items() for the stats and action items; then replace the
corresponding blocks in format_coverage_gaps_prompt with calls to these helpers
while preserving current output formatting and limits (e.g., max_files_to_show,
truncation of missing lines).
- Around line 888-900: The code mutates the analyzed_files list by checking "if
file_path not in analyzed_files" which is O(n) and can still allow duplicates;
change this to use a set for deduplication and O(1) membership checks: create a
set like seen = set(analyzed_files) (or seen = set() if analyzed_files may be
empty), add file paths from analysis_results.get("functions", []) and
analysis_results.get("classes", []) to seen via seen.add(file_path), then
replace analyzed_files with a list built from the set (e.g., analyzed_files =
list(seen)) so subsequent code uses the deduplicated file list while keeping
references to the variables analyzed_files, analysis_results, functions, and
classes intact.

In `@apps/backend/analysis/integration_test_analyzer.py`:
- Around line 751-760: The current mapping in the analyzer treats method_name
"save" as unconditionally "UPDATE", which is incorrect for ORMs like Django
where save() can INSERT or UPDATE; change the classification for "save" in the
branch handling method_name (the code that currently checks elif method_name in
{"update", "save"} returning "UPDATE") to a neutral/explicit upsert category
(e.g., "UPSERT" or "INSERT_OR_UPDATE") or to "UNKNOWN" if you prefer
conservative analysis; update any downstream consumers/tests that expect
"UPDATE" for save() to handle the new "UPSERT"/"INSERT_OR_UPDATE"/"UNKNOWN"
token accordingly and add a brief comment by the method_name mapping explaining
why save() is treated specially.
- Around line 312-319: The nested check for node.module inside the
ast.ImportFrom branch should be merged to reduce nesting: update the branch that
currently reads "elif isinstance(node, ast.ImportFrom):" to check both
conditions (isinstance(node, ast.ImportFrom) and node.module) so you can remove
the inner "if node.module:" and keep the existing startswith checks for
"fastapi", "flask", and "django" on node.module; adjust the branch in the
analyzer that handles ast.ImportFrom accordingly.
- Around line 546-560: Remove the development debug print statements and the
temporary sys import inside the ast.Await handling block: delete the print(...)
calls that emit "DEBUG: ..." and the import sys, and instead (if you need
runtime diagnostics) use the module logger via logger.debug(...) or
self.logger.debug(...) for non‑polluting debug output; target the block that
checks isinstance(child, ast.Await) and the subsequent branches that inspect
child.value, child.value.func, and compute method_name (references: child, call,
method_name, and self.HTTP_CLIENT_METHODS).
- Around line 519-594: The _extract_external_services method duplicates the
logic that builds ExternalServiceInfo for awaited and non-awaited HTTP calls;
extract a small helper (e.g. _build_external_call or
_create_external_service_info) that accepts parameters like call_node
(ast.Call), function_name, lineno, is_async flag and returns an
ExternalServiceInfo using self._extract_service_name and method name (from
call_node.func.attr) and HTTP_CLIENT_METHODS checks; then replace the duplicated
ExternalServiceInfo construction in both the ast.Await branch and the ast.Call
branch with calls to this helper, keeping the original is_async determination
(is_async = isinstance(node, ast.AsyncFunctionDef)) and preserving method
uppercasing and default service_name fallback.
- Around line 268-291: The traversal currently uses ast.walk(tree), which visits
every node including nested function/class definitions; replace it with
iterating only top-level children using ast.iter_child_nodes(tree) so that
methods like _extract_endpoint, _extract_database_operations,
_extract_external_services, and _extract_service operate only on top-level
FunctionDef/AsyncFunctionDef and ClassDef nodes (i.e., change the for node in
ast.walk(tree) loop to for node in ast.iter_child_nodes(tree) and keep the
existing isinstance checks and result.append/extend logic).

In `@apps/backend/qa/loop.py`:
- Around line 748-755: The filtering is mixing units: low_coverage_files
currently compares f.coverage_percentage to MINIMUM_COVERAGE_THRESHOLD but other
code (e.g., the conversion done when building coverage_report.files) suggests
coverage values may have been converted inconsistently; verify that
coverage_report.files entries expose coverage_percentage as a 0–100 value and,
if so, keep the comparison as-is, removing any earlier "* 100" conversion where
coverage_percentage is set (or alternatively convert MINIMUM_COVERAGE_THRESHOLD
to a 0–1 fraction if coverage_percentage is stored as 0–1); update the
construction code that sets coverage_percentage (the place that
multiplies/divides by 100) to be consistent with the comparison used in
low_coverage_files so all coverage values use the same units.
- Around line 757-765: The per-file coverage calculation in the
uncovered_files_list comprehension is multiplying
FileCoverage.coverage_percentage by 100 (variable uncovered_files_list and
low_coverage_files); remove the * 100 so you pass the existing percentage value
directly (i.e., use f.coverage_percentage instead of f.coverage_percentage *
100) and keep the other fields (lines_missed, lines_total, file_path) unchanged.
- Around line 714-720: The code in the QA loop multiplies
coverage_report.overall_coverage by 100 (creating coverage_percentage) but
overall_coverage is already a 0–100 percentage; remove the extra multiplication
and use coverage_report.overall_coverage directly when comparing to
MINIMUM_COVERAGE_THRESHOLD (i.e., set coverage_percentage =
coverage_report.overall_coverage or compare coverage_report.overall_coverage <
MINIMUM_COVERAGE_THRESHOLD), keeping the existing hasattr check for
overall_coverage.

In `@apps/backend/runners/ideation_runner.py`:
- Around line 29-32: The import block in ideation_runner.py is failing Ruff I001
because a module-level comment breaks the sorted imports; move the explanatory
comment currently above the import of validate_platform_dependencies into the
main() function right above the call to validate_platform_dependencies(), and
keep the import line (from core.dependency_validator import
validate_platform_dependencies) at the top of the file with the rest of the
sorted imports so import ordering remains correct.

In `@tests/test_fixture_generation.py`:
- Around line 73-116: The async test functions
(test_validate_fixture_files_valid_syntax,
test_validate_fixture_files_invalid_syntax,
test_validate_fixture_files_empty_list,
test_validate_fixture_files_missing_file) need to be marked for pytest-asyncio:
add `@pytest.mark.asyncio` above each async def (or add the same decorator to the
test class if present) and ensure pytest is imported in
tests/test_fixture_generation.py so pytest.mark is available; this will allow
validate_fixture_files coroutines to be awaited by pytest.
- Around line 132-140: The test defines an async mock function mock_run_session
(used to replace run_generator_session) but it's flagged because it doesn't
await anything; replace the manual async function with an AsyncMock to clearly
express an async stub: import AsyncMock from unittest.mock and assign
monkeypatch.setattr("agents.fixture_generator.run_generator_session",
AsyncMock(return_value={"success": True, "generated_files": [], "error":
None})); keep the same return payload so tests remain identical while avoiding
the static-analysis warning.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8bcf4716-b513-46c7-9d84-8413ad413e79

📥 Commits

Reviewing files that changed from the base of the PR and between aa0d6f5 and d916d91.

📒 Files selected for processing (11)

SUBTASK-4-2-SUMMARY.md
apps/backend/agents/_validation.py
apps/backend/agents/fixture_generator.py
apps/backend/agents/test_generator.py
apps/backend/analysis/integration_test_analyzer.py
apps/backend/prompts/fixture_generator.md
apps/backend/prompts/test_generator.md
apps/backend/qa/loop.py
apps/backend/runners/ideation_runner.py
tests/test_fixture_generation.py
tests/test_integration_test_generation.py

coderabbitai · 2026-03-23T09:36:39Z

+            # Allow conftest.py with no collected tests (fixtures only)
+            if result.returncode != 0 and file_path.name != "conftest.py":
                print_status(f"pytest collection failed for {test_file}", "error")
                logger.debug(f"pytest output: {result.stdout}\n{result.stderr}")
                return False


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify pytest exit codes for different scenarios echo "Checking pytest documentation for exit codes..." python -c "import pytest; help(pytest.ExitCode)" 2>/dev/null || echo "pytest.ExitCode not directly accessible" # Check if exit code 5 is NO_TESTS_COLLECTED python -c "import pytest; print('NO_TESTS_COLLECTED:', pytest.ExitCode.NO_TESTS_COLLECTED)" 2>/dev/null || echo "Cannot verify exit code enum"

Repository: OBenner/Auto-Coding

Length of output: 9597

🏁 Script executed:

# First, read the file to see the actual code and broader context cat -n apps/backend/agents/_validation.py | head -90 | tail -40

Repository: OBenner/Auto-Coding

Length of output: 2123

🏁 Script executed:

# Also check what command is being run (to see if it's --collect-only) grep -n "collect-only\|--collect" apps/backend/agents/_validation.py

Repository: OBenner/Auto-Coding

Length of output: 207

🏁 Script executed:

# Check for any test files or examples with conftest.py in the project find apps/backend -name "conftest.py" | head -5

Repository: OBenner/Auto-Coding

Length of output: 93

🏁 Script executed:

# Check what's in the project's conftest.py head -50 apps/backend/tests/conftest.py

Repository: OBenner/Auto-Coding

Length of output: 1388

🏁 Script executed:

# Test what exit code pytest returns for conftest.py with only fixtures cd apps/backend && python -c " import subprocess import tempfile import os from pathlib import Path # Create a temporary conftest.py with only fixtures with tempfile.TemporaryDirectory() as tmpdir: conftest_path = Path(tmpdir) / 'conftest.py' conftest_path.write_text(''' import pytest `@pytest.fixture` def sample_fixture(): return 42 ''') result = subprocess.run( ['pytest', str(conftest_path), '--collect-only', '-q'], capture_output=True, text=True, timeout=10 ) print(f'Exit code for fixture-only conftest.py: {result.returncode}') print(f'Stdout: {result.stdout}') print(f'Stderr: {result.stderr}') "

Repository: OBenner/Auto-Coding

Length of output: 619

🏁 Script executed:

# Test what exit code pytest returns for conftest.py with import error cd apps/backend && python -c " import subprocess import tempfile from pathlib import Path with tempfile.TemporaryDirectory() as tmpdir: conftest_path = Path(tmpdir) / 'conftest.py' conftest_path.write_text(''' import nonexistent_module `@pytest.fixture` def sample_fixture(): return 42 ''') result = subprocess.run( ['pytest', str(conftest_path), '--collect-only', '-q'], capture_output=True, text=True, timeout=10 ) print(f'Exit code for conftest.py with import error: {result.returncode}') print(f'Stdout: {result.stdout}') print(f'Stderr (first 500 chars): {result.stderr[:500]}') "

Repository: OBenner/Auto-Coding

Length of output: 619

Check specifically for pytest exit code 5 (NO_TESTS_COLLECTED) when validating conftest.py instead of allowing all non-zero codes.

The current logic permits any non-zero exit code for conftest.py files. This means a conftest.py with import errors or issues caught by pytest (exit codes 3, 4, etc.) would pass validation silently. Exit code 5 specifically indicates "no tests collected," which is the expected behavior for fixture-only conftest.py files. Check for that exit code explicitly:

Proposed refinement

# Allow conftest.py with no collected tests (fixtures only) - if result.returncode != 0 and file_path.name != "conftest.py": + # Exit code 5 = no tests collected (expected for fixture-only conftest.py) + # Other non-zero codes indicate actual errors + is_no_tests_collected = result.returncode == 5 + if result.returncode != 0 and not (file_path.name == "conftest.py" and is_no_tests_collected): print_status(f"pytest collection failed for {test_file}", "error") logger.debug(f"pytest output: {result.stdout}\n{result.stderr}") return False

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/backend/agents/_validation.py` around lines 69 - 73, The current validation lets any non-zero pytest exit code pass for conftest.py; update the check around result.returncode and file_path.name so that only exit code 5 (NO_TESTS_COLLECTED) is treated as acceptable for conftest.py. Concretely, replace the unconditional "if result.returncode != 0 and file_path.name != 'conftest.py':" logic with a branch that: if file_path.name == 'conftest.py' then allow only result.returncode == 5, otherwise require result.returncode == 0; keep the existing print_status, logger.debug, and return False behavior when the check fails (use the same variables result.returncode, result.stdout, result.stderr, test_file).

coderabbitai · 2026-03-23T09:36:39Z

 def format_coverage_gaps_prompt(gaps_summary: dict[str, Any]) -> str:
    """
-    Format coverage gaps as a prompt for the AI agent.
+    Format coverage gaps as a detailed prompt for the AI agent.
+
+    This function creates a comprehensive, prioritized report of coverage gaps
+    to guide test generation. It categorizes gaps by priority and provides
+    specific line numbers needing coverage.

    Args:
-        gaps_summary: Summary from analyze_coverage_gaps
+        gaps_summary: Summary from analyze_coverage_gaps with detailed metrics

    Returns:
-        Formatted string describing coverage gaps
+        Formatted string describing coverage gaps with prioritization
    """
    if "error" in gaps_summary:
-        return f"Coverage analysis failed: {gaps_summary['error']}"
+        return f"## Coverage Analysis Failed\n\n{gaps_summary['error']}"

-    if not gaps_summary.get("files_with_gaps"):
-        return "Coverage meets all thresholds. No gaps detected."
+    # Get threshold information
+    threshold = gaps_summary.get("threshold_percent", 80.0)
+    total_coverage = gaps_summary["total_coverage"]

    lines = []
-    lines.append("## Coverage Gaps Detected")
+    lines.append("## Coverage Gap Analysis")
    lines.append("")
-    lines.append(f"**Total Coverage:** {gaps_summary['total_coverage']:.1f}%")
-    lines.append(f"**Files with Gaps:** {len(gaps_summary['files_with_gaps'])}")
+
+    # Coverage status header
+    if gaps_summary.get("meets_threshold", False):
+        lines.append("✓ **Status:** Coverage meets threshold")
+        lines.append(
+            f"**Total Coverage:** {total_coverage:.1f}% (required: {threshold:.1f}%)"
+        )
+    else:
+        lines.append("⚠ **Status:** Coverage below threshold")
+        lines.append(
+            f"**Total Coverage:** {total_coverage:.1f}% (required: {threshold:.1f}%)"
+        )
+        lines.append(f"**Deficit:** {threshold - total_coverage:.1f}%")
+
+    # Statistics
    lines.append("")
+    lines.append("### Statistics")
+    lines.append(f"- **Files Analyzed:** {gaps_summary.get('total_files_analyzed', 0)}")
+    lines.append(
+        f"- **Files with Gaps:** {len(gaps_summary.get('files_with_gaps', []))}"
+    )
+    lines.append(
+        f"- **Total Missing Lines:** {gaps_summary.get('total_lines_missing', 0)}"
+    )

-    if gaps_summary.get("critical_gaps"):
-        lines.append("### Critical Gaps (Significantly Below Threshold)")
-        for file_path in gaps_summary["critical_gaps"][:5]:
-            lines.append(f"- {file_path}")
+    # Priority breakdown
+    if gaps_summary.get("files_with_gaps"):
        lines.append("")
+        lines.append("### Gap Prioritization")

-    lines.append("### Files Requiring Additional Tests")
-    for file_path in gaps_summary["files_with_gaps"][:10]:
-        missing_lines = gaps_summary["missing_lines_by_file"].get(file_path, [])
-        if missing_lines:
-            line_ranges = _format_line_ranges(missing_lines[:20])
-            lines.append(f"- **{file_path}**")
-            lines.append(f"  - Missing lines: {line_ranges}")
-        else:
-            lines.append(f"- **{file_path}**")
+        if gaps_summary.get("critical_gaps"):
+            critical_count = len(gaps_summary["critical_gaps"])
+            lines.append(
+                f"- **🔴 Critical:** {critical_count} file(s) with coverage < {threshold * 0.5:.1f}%"
+            )

-    if len(gaps_summary["files_with_gaps"]) > 10:
-        lines.append(
-            f"- ... and {len(gaps_summary['files_with_gaps']) - 10} more files"
-        )
+        if gaps_summary.get("high_priority_gaps"):
+            high_count = len(gaps_summary["high_priority_gaps"])
+            lines.append(
+                f"- **🟠 High Priority:** {high_count} file(s) with coverage {threshold * 0.5:.1f}%-{threshold * 0.8:.1f}%"
+            )
+
+        if gaps_summary.get("medium_priority_gaps"):
+            medium_count = len(gaps_summary["medium_priority_gaps"])
+            lines.append(
+                f"- **🟡 Medium Priority:** {medium_count} file(s) with coverage {threshold * 0.8:.1f}%-{threshold:.1f}%"
+            )
+
+    # Detailed file listing by priority
+    if gaps_summary.get("files_with_gaps"):
+        lines.append("")
+        lines.append("### Files Requiring Additional Tests")
+        lines.append("")
+        lines.append("*Files are listed by priority (critical → high → medium)*")
+        lines.append("")
+
+        # Show files with their coverage and missing lines
+        max_files_to_show = 15
+        shown_count = 0
+
+        for file_path in gaps_summary["files_with_gaps"]:
+            if shown_count >= max_files_to_show:
+                break
+
+            coverage = gaps_summary["coverage_by_file"].get(file_path, 0.0)
+            missing_lines = gaps_summary["missing_lines_by_file"].get(file_path, [])
+
+            # Determine priority indicator
+            if file_path in gaps_summary.get("critical_gaps", []):
+                priority = "🔴"
+            elif file_path in gaps_summary.get("high_priority_gaps", []):
+                priority = "🟠"
+            else:
+                priority = "🟡"
+
+            lines.append(f"{priority} **{file_path}**")
+            lines.append(
+                f"   - Coverage: {coverage:.1f}% (threshold: {threshold:.1f}%)"
+            )
+
+            if missing_lines:
+                # Show line ranges (limit to first 30 lines to keep prompt manageable)
+                line_ranges = _format_line_ranges(missing_lines[:30])
+                lines.append(f"   - Missing lines: {line_ranges}")

+                if len(missing_lines) > 30:
+                    lines.append(f"   - ... and {len(missing_lines) - 30} more lines")
+
+            shown_count += 1
+
+        # Show count of remaining files
+        remaining_files = len(gaps_summary["files_with_gaps"]) - shown_count
+        if remaining_files > 0:
+            lines.append("")
+            lines.append(f"... and {remaining_files} more file(s) with coverage gaps")
+
+    # Action items
+    lines.append("")
+    lines.append("### Action Items")
    lines.append("")
    lines.append(
-        "**Action Required:** Generate additional tests to cover the missing lines above."
+        "1. **Prioritize Critical Gaps:** Start with files marked 🔴 (lowest coverage)"
+    )
+    lines.append(
+        "2. **Target Missing Lines:** Write tests specifically for the missing line numbers"
+    )
+    lines.append(
+        "3. **Focus on High Impact:** Address high-priority gaps (🟠) before medium (🟡)"
+    )
+    lines.append(
+        "4. **Verify Coverage:** Run tests and re-check coverage after each batch"
+    )
+    lines.append("")
+    lines.append(
+        "**Note:** Coverage is measured at the line level. Ensure tests execute all"
+    )
+    lines.append(
+        "the missing lines listed above, including edge cases and error paths."
    )
-    lines.append("Focus on the specific line numbers that are not covered.")

    return "\n".join(lines)


🧹 Nitpick | 🔵 Trivial

Consider extracting helper functions to reduce cognitive complexity.

The function has cognitive complexity of 30 (threshold: 15), though this is somewhat expected for a formatting function that builds a detailed report. The linear structure (appending to a list) makes it readable despite the complexity. Consider extracting sections into helper functions if this file needs further maintenance.

♻️ Optional: Extract helper functions

def _format_priority_breakdown(gaps_summary: dict, threshold: float) -> list[str]: """Format the gap prioritization section.""" lines = [] if gaps_summary.get("critical_gaps"): critical_count = len(gaps_summary["critical_gaps"]) lines.append(f"- **🔴 Critical:** {critical_count} file(s) with coverage < {threshold * 0.5:.1f}%") # ... etc return lines def _format_file_listing(gaps_summary: dict, threshold: float, max_files: int = 15) -> list[str]: """Format the detailed file listing section.""" # ... extraction of lines 388-426 pass

🧰 Tools

🪛 GitHub Check: SonarCloud Code Analysis

[failure] 308-308: Refactor this function to reduce its Cognitive Complexity from 30 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=OBenner_Auto-Coding&issues=AZ0aCGR0hy_zYS_iqBkq&open=AZ0aCGR0hy_zYS_iqBkq&pullRequest=176

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/backend/agents/test_generator.py` around lines 308 - 452, The function format_coverage_gaps_prompt is too complex; extract logical sections into small helpers to reduce cognitive complexity and make maintenance easier: create helpers like _format_priority_breakdown(gaps_summary, threshold) to build the "Gap Prioritization" lines, _format_file_listing(gaps_summary, threshold, max_files=15) to produce the detailed per-file block (including priority indicator, coverage, and missing line ranges using the existing _format_line_ranges), and optionally _format_statistics(gaps_summary, threshold) and _format_action_items() for the stats and action items; then replace the corresponding blocks in format_coverage_gaps_prompt with calls to these helpers while preserving current output formatting and limits (e.g., max_files_to_show, truncation of missing lines).

coderabbitai · 2026-03-23T09:36:40Z

+        async def mock_run_session(*args, **kwargs):
+            return {
+                "success": True,
+                "generated_files": [],
+                "error": None
+            }
+        monkeypatch.setattr(
+            "agents.fixture_generator.run_generator_session",
+            mock_run_session


🧹 Nitpick | 🔵 Trivial

Async mock functions without await are intentional but could be simplified.

The static analysis flags these async functions for not using await. While this is intentional (the mocks need to be async to match the signature of run_generator_session), consider using a simpler approach with AsyncMock to make the intent clearer and avoid the warning.

♻️ Optional: Use AsyncMock for cleaner mock definition

`@pytest.fixture`(autouse=True) def _mock_fixture_generator_deps(self, monkeypatch): """Apply common monkeypatches for all fixture generator tests.""" - # Mock run_generator_session instead of create_client - # (fixture_generator imports run_generator_session, not create_client directly) - async def mock_run_session(*args, **kwargs): - return { - "success": True, - "generated_files": [], - "error": None - } monkeypatch.setattr( "agents.fixture_generator.run_generator_session", - mock_run_session + AsyncMock(return_value={ + "success": True, + "generated_files": [], + "error": None + }) )

🧰 Tools

🪛 GitHub Check: SonarCloud Code Analysis

[warning] 132-132: Use asynchronous features in this function or remove the async keyword.

See more on https://sonarcloud.io/project/issues?id=OBenner_Auto-Coding&issues=AZ0aCGSjhy_zYS_iqBku&open=AZ0aCGSjhy_zYS_iqBku&pullRequest=176

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_fixture_generation.py` around lines 132 - 140, The test defines an async mock function mock_run_session (used to replace run_generator_session) but it's flagged because it doesn't await anything; replace the manual async function with an AsyncMock to clearly express an async stub: import AsyncMock from unittest.mock and assign monkeypatch.setattr("agents.fixture_generator.run_generator_session", AsyncMock(return_value={"success": True, "generated_files": [], "error": None})); keep the same return payload so tests remain identical while avoiding the static-analysis warning.

- Fix critical double coverage percentage conversion in qa/loop.py (values were already 0-100, multiplying by 100 gave 7500% instead of 75%) - Remove debug print statements from integration_test_analyzer.py - Fix fixture_generator.py: propagate validation failure to success field, return success=False when no fixture files generated - Fix Ruff I001: unsorted imports in ideation_runner.py - Refactor _detect_edge_cases_from_ast into separate detector functions to reduce cognitive complexity from 48 to manageable helpers - Extract _try_create_http_service_info helper to deduplicate ExternalServiceInfo creation in _extract_external_services - Merge nested if statements in _detect_framework for readability - Use set for O(1) deduplication of analyzed_files in edge case detection - Classify save() as UPSERT instead of UPDATE (can be INSERT or UPDATE) - Extract integration test starting_message to module-level constant - Add @pytest.mark.asyncio to TestFixtureValidation async test methods - Update test assertions to match corrected fixture_generator behavior - Remove SUBTASK-4-2-SUMMARY.md development artifact - Add TEST-GENERATION-AGENT.md documentation guide - Merge latest develop (vite 8.0.1 upgrade, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Initialize captured_message as empty string instead of None to satisfy SonarCloud static analysis. The `in` operator requires a type that supports the membership protocol, and SonarCloud cannot trace nonlocal reassignment through monkeypatched mock callbacks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-03-23T18:57:58Z

Quality Gate passed

Issues
13 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.1% Duplication on New Code

See analysis details on SonarQube Cloud

Test User and others added 11 commits March 5, 2026 10:19

auto-claude: subtask-1-3 - Update test_generator.md prompt with integ…

179dbd6

…ration test guidelines Added comprehensive integration testing section with API endpoint testing patterns and service integration examples

auto-claude: subtask-2-1 - Create fixture generator agent for pytest …

772180e

…fixtures

auto-claude: subtask-2-2 - Integrate fixture generation into test_gen…

fab87ff

…erator workflow

auto-claude: subtask-3-1 - Enhance coverage analyzer to enforce 80% t…

9710df6

…hreshold

Initial commit

960ae48

docs: add subtask-4-2 implementation summary

80a270c

github-actions Bot added area/backend size/XL labels Mar 23, 2026

coderabbitai Bot reviewed Mar 23, 2026

View reviewed changes

Test User and others added 3 commits March 23, 2026 21:42

Merge remote-tracking branch 'origin/develop' into HEAD

ac65214

OBenner merged commit 027c187 into develop Mar 23, 2026
19 checks passed

OBenner deleted the auto-claude/190-automated-test-generation-agent branch March 24, 2026 05:56

Conversation

OBenner commented Mar 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Mar 23, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

OBenner commented Mar 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading