Conversation

@felixjordandev felixjordandev commented Nov 23, 2025

Overview: This PR introduces a new suite of tests for the report summary engine, ensuring the accuracy and reliability of its core functionality.

Changes

  • Implemented tests for the scoring logic within the summary engine.
  • Added verification for numeric aggregation calculations using mock agent data.
  • Developed tests to confirm correct weakness detection and reporting.
  • Ensured the summary construction process generates output JSON with all required keys and accurate values (see the sketch below).
  • All new tests are located in backend/app/services/summary/tests/ and use example data from mock agents for thorough validation.
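
A minimal sketch of that key check (assumptions: ReportSummaryEngine takes no constructor arguments and build_final_summary accepts scores and nlg_outputs keyword arguments; both inferred from the review discussion below rather than confirmed against the production code):

from app.services.summary.report_summary_engine import ReportSummaryEngine  # import path inferred from the repo layout

def test_final_summary_contains_required_keys():
    engine = ReportSummaryEngine()  # assumed no-arg constructor
    scores = {
        "tokenomics_strength": 8.0,
        "sentiment_health": 4.0,
        "code_maturity": 6.0,
        "audit_confidence": 7.5,
        "team_credibility": 3.0,
    }
    # Parameter names are assumptions based on this thread.
    summary = engine.build_final_summary(scores=scores, nlg_outputs={})
    # Every required top-level key must be present in the output.
    for key in ("overall_summary", "scores", "strengths", "weaknesses"):
        assert key in summary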

Summary by CodeRabbit

  • Tests
    • Expanded test coverage for the summary engine: added comprehensive checks for score calculations with partial and boundary inputs, validation of aggregated final summaries (overall text, strengths, weaknesses), and verification of content formatting and expected summary keys.



coderabbitai bot commented Nov 23, 2025

Walkthrough

Adds a new test module backend/app/services/summary/tests/test_summary_engine_advanced.py with pytest fixtures and comprehensive tests for ReportSummaryEngine.generate_scores and ReportSummaryEngine.build_final_summary, covering partial/boundary inputs, default fallbacks, score formulas, and final-summary content and keys.

Changes

Cohort / File(s): New tests: Summary Engine
backend/app/services/summary/tests/test_summary_engine_advanced.py
Summary: Adds pytest fixture summary_engine and multiple tests validating generate_scores (partial data, boundary values, default fallbacks across tokenomics, sentiment, code audit, audit, team) and build_final_summary (all strengths, all weaknesses, mixed, zero-boundary). Asserts numeric score computations; presence/format of overall_summary, scores, weaknesses, and strengths; specific item names/counts; and mapped keys like "Tokenomics Strength" and "Sentiment Health". No production code changes.
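
For orientation, the fixture is likely just a thin wrapper; a minimal sketch, assuming a no-argument constructor (not confirmed in this thread):

import pytest
from app.services.summary.report_summary_engine import ReportSummaryEngine

@pytest.fixture
def summary_engine():
    # Assumed no-arg constructor; import path inferred from the repo layout
    # (backend/ as the source root).
    return ReportSummaryEngine()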

Sequence Diagram(s)

No sequence diagram — changes are test additions only and do not modify or introduce control-flow between components.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Single new test file with repetitive arrange-act-assert patterns.
  • Review attention:
    • Fixture setup and initialization of summary_engine.
    • Correctness of numeric expectations and boundary-case assertions.
    • Exact strings/keys expected in the final summary structure.


Poem

🐰 I hopped through tests with careful cheer,
Checking scores and edges far and near,
Strengths lined up, weaknesses too,
Numbers tidy, strings true-blue,
A little rabbit approves this review 🌿

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: Add comprehensive tests for report summary engine' directly and clearly describes the main change: adding a comprehensive test module for the ReportSummaryEngine with tests for scoring and summary building.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

19-21: Use pytest.approx() for floating point comparisons.

Exact equality checks for floating point numbers can lead to flaky tests due to precision issues.

Apply this diff:

-    assert scores["tokenomics_strength"] == (0.6 + 0.5) / 2 * 10 # 0.5 is default for utility_score
-    assert scores["sentiment_health"] == (0.8 - 0.1 + 1) / 2 * 10
-    assert scores["code_maturity"] == (0.8 * 0.6 + (1 - 0.1) * 0.4) * 10 # 0.1 is default for bug_density
+    assert scores["tokenomics_strength"] == pytest.approx((0.6 + 0.5) / 2 * 10)
+    assert scores["sentiment_health"] == pytest.approx((0.8 - 0.1 + 1) / 2 * 10)
+    assert scores["code_maturity"] == pytest.approx((0.8 * 0.6 + (1 - 0.1) * 0.4) * 10)
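
As a usage note: pytest.approx defaults to a relative tolerance of 1e-6 (with an absolute floor of 1e-12), which comfortably absorbs float rounding in these formulas without masking real regressions.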

9-22: Add assertions for all computed scores.

The test only verifies 3 of 5 scores returned by generate_scores. Add assertions for audit_confidence and team_credibility to ensure complete validation of default fallback behavior.

Add these assertions:

    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8

34-38: Use pytest.approx() for floating point comparisons.

Same issue as the previous test—exact equality for floating point numbers should be avoided.

Apply this diff:

-    assert scores["tokenomics_strength"] == (0.0 + 1.0) / 2 * 10
-    assert scores["sentiment_health"] == (1.0 - 0.0 + 1) / 2 * 10
-    assert scores["code_maturity"] == (0.0 * 0.6 + (1 - 1.0) * 0.4) * 10
-    assert scores["audit_confidence"] == min(0 * 2, 5) + 0.0 * 5
-    assert scores["team_credibility"] == (0.0 * 0.5 + 1.0 * 0.5) * 10
+    assert scores["tokenomics_strength"] == pytest.approx((0.0 + 1.0) / 2 * 10)
+    assert scores["sentiment_health"] == pytest.approx((1.0 - 0.0 + 1) / 2 * 10)
+    assert scores["code_maturity"] == pytest.approx((0.0 * 0.6 + (1 - 1.0) * 0.4) * 10)
+    assert scores["audit_confidence"] == pytest.approx(min(0 * 2, 5) + 0.0 * 5)
+    assert scores["team_credibility"] == pytest.approx((0.0 * 0.5 + 1.0 * 0.5) * 10)

1-141: Consider adding tests for missing data categories and edge boundaries.

For more comprehensive coverage, consider adding:

  1. A test for generate_scores when entire data categories are missing (e.g., no audit_data or team_data keys), to verify all defaults work correctly.
  2. A test for build_final_summary with a score exactly at 5.0 to explicitly verify the weakness/neutral boundary behavior.

Example test for missing categories:

def test_generate_scores_missing_categories(summary_engine):
    # Test with completely missing data categories
    data = {}
    scores = summary_engine.generate_scores(data)
    
    # All scores should use defaults
    assert "tokenomics_strength" in scores
    assert "sentiment_health" in scores
    assert "code_maturity" in scores
    assert "audit_confidence" in scores
    assert "team_credibility" in scores
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffb0a2e and 42a09ab.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)
backend/app/services/summary/report_summary_engine.py (1)
  • ReportSummaryEngine (4-72)
🔇 Additional comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

1-6: LGTM!

Standard pytest setup with a clean fixture definition.


41-68: LGTM!

Test correctly validates the all-strengths scenario with appropriate assertions for structure, counts, and specific items.


69-96: LGTM!

Test correctly validates the all-weaknesses scenario with appropriate assertions.


97-141: LGTM!

Comprehensive test that validates mixed scores, including the edge case of 7.0 at the strength boundary. The assertions for overall summary content and scores dictionary are thorough and correct.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/app/services/summary/tests/test_summary_engine_advanced.py (3)

28-28: Consider removing the unused lines_of_code parameter.

The lines_of_code value is provided in the test data but is not used in the actual score calculation (as seen in the production code). While it doesn't affect test correctness, removing it would make the test data clearer and avoid potential confusion about which fields actually influence the calculations.


47-49: Replace hard-coded values with explicit calculations.

Hard-coding expected values like 7.8 and 7.0 makes the test brittle and obscures the relationship to the production code's default values. If defaults change in the production code, the test failure won't clearly indicate which assumption broke.

Apply this diff to make the expectations explicit and maintainable:

-    assert scores["code_maturity"] == pytest.approx(7.8) # defaults: test_coverage=0.7, bug_density=0.1
-    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
-    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8
+    assert scores["code_maturity"] == pytest.approx((0.7 * 0.6 + (1 - 0.1) * 0.4) * 10) # defaults: test_coverage=0.7, bug_density=0.1
+    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
+    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8

51-62: Consider removing this duplicate test.

This test is redundant with test_generate_scores_default_fallbacks (lines 40-50). Both tests pass an empty data dict and verify the same default behavior. The earlier test already validates that all score keys are present (implicitly through value assertions), making this test unnecessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a09ab and 5911120.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)
backend/app/services/summary/report_summary_engine.py (1)
  • ReportSummaryEngine (4-72)
🔇 Additional comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

1-6: LGTM!

The imports and fixture setup are correct and follow pytest conventions.


9-22: LGTM!

This test correctly validates the fallback behavior when partial data is provided, using the default values from the production code.


64-91: LGTM!

This test thoroughly validates the all-strengths scenario, correctly asserting that scores >= 7.0 are classified as strengths according to the production code logic.


92-197: LGTM!

All three remaining test functions provide excellent coverage:

  • test_build_final_summary_all_weaknesses validates the all-weaknesses scenario
  • test_build_final_summary_mixed_scores thoroughly tests mixed classifications with detailed assertions on content
  • test_build_final_summary_boundary_score_5_0 correctly validates the boundary case where scores of exactly 5.0 fall into neither strengths nor weaknesses


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)

8-185: Consider adding error handling and edge case tests.

While the current test coverage is solid for happy-path and boundary scenarios, consider adding tests for:

  • Invalid input types (e.g., strings instead of numbers in data/scores)
  • Negative values or scores > 10
  • Empty nlg_outputs dict with valid scores
  • Malformed data structures

These additional tests would improve robustness and catch potential runtime errors.
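
For example, the invalid-input case might take this shape (the expected TypeError is an assumption; the engine may instead coerce values or fall back to defaults, in which case the assertion should change accordingly):

import pytest

def test_generate_scores_rejects_non_numeric_input(summary_engine):
    # Hypothetical expectation: arithmetic on a string input raises TypeError.
    data = {"tokenomics_data": {"distribution_score": "high"}}
    with pytest.raises(TypeError):
        summary_engine.generate_scores(data)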

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5911120 and fd68c9d.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🔇 Additional comments (6)
backend/app/services/summary/tests/test_summary_engine_advanced.py (6)

23-39: Good boundary value coverage.

The test effectively covers boundary values (0.0 and 1.0) for all scoring components. The audit_confidence calculation with zero audits correctly produces a score of 0, which appropriately represents projects with no audit history.


108-153: Comprehensive mixed-score test coverage.

This test effectively validates the classification logic for strengths (≥7), weaknesses (<5), and neutral scores (5-6). It also verifies the overall_summary content formatting and score mapping structure.


154-185: Excellent boundary case test for neutral threshold.

This test thoroughly validates that scores exactly at 5.0 are classified as neutral (neither strength nor weakness), with explicit assertions for each score category.


1-6: Import path and module structure are correct.

The ReportSummaryEngine class exists at backend/app/services/summary/report_summary_engine.py, and all required __init__.py files are present in the package hierarchy. The import statement is valid and will resolve correctly.


74-78: Key mappings verified.

The test assertions correctly verify the key mappings. The build_final_summary implementation transforms snake_case score keys (e.g., tokenomics_strength) to TitleCase (e.g., "Tokenomics Strength") using .replace('_', ' ').title(), which matches all assertions in lines 74-78, 102-106, 130-131, and 134-135. The strength/weakness thresholds (≥7.0 and <5.0, respectively) are also correctly implemented and tested.
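
In isolation, that transform produces exactly the asserted labels:

>>> "tokenomics_strength".replace('_', ' ').title()
'Tokenomics Strength'
>>> "sentiment_health".replace('_', ' ').title()
'Sentiment Health'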


19-21: All scoring formulas and default values are correct.

Verification confirms the test assertions in lines 19-21, 34-38, and 45-49 all match the ReportSummaryEngine.generate_scores implementation exactly. The tested formulas and defaults (distribution/utility: 0.5; positive/negative ratio: 0.5; test_coverage: 0.7; bug_density: 0.1; num_audits: 1; critical_resolved: 1.0; experience: 0.7; transparency: 0.8) all align with the implementation. The tests appropriately verify partial data fallback to defaults, boundary conditions, and complete default scenarios.
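
For reference, a sketch of those formulas reconstructed purely from the assertions quoted in this thread (the key names for the audit and team inputs are inferred, not verified against report_summary_engine.py):

def sketch_generate_scores(data: dict) -> dict:
    # Reference sketch only, not a copy of the production file. Defaults per
    # this review: distribution/utility 0.5, positive/negative ratio 0.5,
    # test_coverage 0.7, bug_density 0.1, num_audits 1, critical_resolved 1.0,
    # experience 0.7, transparency 0.8.
    tok = data.get("tokenomics_data", {})
    sen = data.get("sentiment_data", {})
    code = data.get("code_audit_data", {})
    audit = data.get("audit_data", {})
    team = data.get("team_data", {})
    return {
        "tokenomics_strength": (tok.get("distribution_score", 0.5)
                                + tok.get("utility_score", 0.5)) / 2 * 10,
        "sentiment_health": (sen.get("positive_sentiment_ratio", 0.5)
                             - sen.get("negative_sentiment_ratio", 0.5) + 1) / 2 * 10,
        "code_maturity": (code.get("test_coverage", 0.7) * 0.6
                          + (1 - code.get("bug_density", 0.1)) * 0.4) * 10,
        "audit_confidence": min(audit.get("num_audits", 1) * 2, 5)
                            + audit.get("critical_resolved", 1.0) * 5,
        "team_credibility": (team.get("experience", 0.7) * 0.5
                             + team.get("transparency", 0.8) * 0.5) * 10,
    }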

The inline comment below is anchored on the test data in backend/app/services/summary/tests/test_summary_engine_advanced.py:

    data = {
        "tokenomics_data": {"distribution_score": 0.6},  # Missing utility_score
        "sentiment_data": {"positive_sentiment_ratio": 0.8, "negative_sentiment_ratio": 0.1},
        "code_audit_data": {"test_coverage": 0.8},  # Missing lines_of_code, bug_density

⚠️ Potential issue | 🟡 Minor

Clarify the comment about missing fields.

The comment mentions "lines_of_code" as a missing field, but this field is not referenced in any of the test assertions. If lines_of_code is not used in the scoring formula, remove it from the comment to avoid confusion.

-        "code_audit_data": {"test_coverage": 0.8}, # Missing lines_of_code, bug_density
+        "code_audit_data": {"test_coverage": 0.8}, # Missing bug_density

@klingonaston
Collaborator

Nice, the added tests will definitely boost the reliability of the summary engine. 🚀 Approved!

@klingonaston klingonaston merged commit 92835bb into main Nov 23, 2025
1 check passed
@klingonaston klingonaston deleted the feat/add-summary-engine-tests branch November 23, 2025 19:12