Conversation

@felixjordandev felixjordandev commented Nov 23, 2025

Overview: This PR introduces a new suite of tests for the report summary engine, ensuring the accuracy and reliability of its core functionality.

Changes

  • Implemented tests for the scoring logic within the summary engine.
  • Added verification for numeric aggregation calculations using mock agent data.
  • Developed tests to confirm correct weakness detection and reporting.
  • Ensured the summary construction process generates output JSON with all required keys and accurate values (see the sketch below).
  • All new tests are located in backend/app/services/summary/tests/ and use example data from mock agents for thorough validation.
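
A minimal sketch of that key check (assumptions: ReportSummaryEngine takes no constructor arguments and build_final_summary accepts scores and nlg_outputs keyword arguments; both inferred from the review discussion below rather than confirmed against the production code):

from app.services.summary.report_summary_engine import ReportSummaryEngine  # import path inferred from the repo layout

def test_final_summary_contains_required_keys():
    engine = ReportSummaryEngine()  # assumed no-arg constructor
    scores = {
        "tokenomics_strength": 8.0,
        "sentiment_health": 4.0,
        "code_maturity": 6.0,
        "audit_confidence": 7.5,
        "team_credibility": 3.0,
    }
    # Parameter names are assumptions based on this thread.
    summary = engine.build_final_summary(scores=scores, nlg_outputs={})
    # Every required top-level key must be present in the output.
    for key in ("overall_summary", "scores", "strengths", "weaknesses"):
        assert key in summary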

Summary by CodeRabbit

  • Tests
    • Expanded test coverage for the summary engine: added comprehensive checks for score calculations with partial and boundary inputs, validation of aggregated final summaries (overall text, strengths, weaknesses), and verification of content formatting and expected summary keys.



coderabbitai bot commented Nov 23, 2025

Walkthrough

Adds a new test module backend/app/services/summary/tests/test_summary_engine_advanced.py with pytest fixtures and comprehensive tests for ReportSummaryEngine.generate_scores and ReportSummaryEngine.build_final_summary, covering partial/boundary inputs, default fallbacks, score formulas, and final-summary content and keys.

Changes

Cohort / File(s): New tests: Summary Engine
backend/app/services/summary/tests/test_summary_engine_advanced.py
Summary: Adds pytest fixture summary_engine and multiple tests validating generate_scores (partial data, boundary values, default fallbacks across tokenomics, sentiment, code audit, audit, team) and build_final_summary (all strengths, all weaknesses, mixed, zero-boundary). Asserts numeric score computations; presence/format of overall_summary, scores, weaknesses, and strengths; specific item names/counts; and mapped keys like "Tokenomics Strength" and "Sentiment Health". No production code changes.
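
For orientation, the fixture is likely just a thin wrapper; a minimal sketch, assuming a no-argument constructor (not confirmed in this thread):

import pytest
from app.services.summary.report_summary_engine import ReportSummaryEngine

@pytest.fixture
def summary_engine():
    # Assumed no-arg constructor; import path inferred from the repo layout
    # (backend/ as the source root).
    return ReportSummaryEngine()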

Sequence Diagram(s)

No sequence diagram — changes are test additions only and do not modify or introduce control-flow between components.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Single new test file with repetitive arrange-act-assert patterns.
  • Review attention:
    • Fixture setup and initialization of summary_engine.
    • Correctness of numeric expectations and boundary-case assertions.
    • Exact strings/keys expected in the final summary structure.


Poem

🐰 I hopped through tests with careful cheer,
Checking scores and edges far and near,
Strengths lined up, weaknesses too,
Numbers tidy, strings true-blue,
A little rabbit approves this review 🌿

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: Add comprehensive tests for report summary engine' directly and clearly describes the main change: adding a comprehensive test module for the ReportSummaryEngine with tests for scoring and summary building.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

19-21: Use pytest.approx() for floating point comparisons.

Exact equality checks for floating point numbers can lead to flaky tests due to precision issues.

Apply this diff:

-    assert scores["tokenomics_strength"] == (0.6 + 0.5) / 2 * 10 # 0.5 is default for utility_score
-    assert scores["sentiment_health"] == (0.8 - 0.1 + 1) / 2 * 10
-    assert scores["code_maturity"] == (0.8 * 0.6 + (1 - 0.1) * 0.4) * 10 # 0.1 is default for bug_density
+    assert scores["tokenomics_strength"] == pytest.approx((0.6 + 0.5) / 2 * 10)
+    assert scores["sentiment_health"] == pytest.approx((0.8 - 0.1 + 1) / 2 * 10)
+    assert scores["code_maturity"] == pytest.approx((0.8 * 0.6 + (1 - 0.1) * 0.4) * 10)
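
As a usage note: pytest.approx defaults to a relative tolerance of 1e-6 (with an absolute floor of 1e-12), which comfortably absorbs float rounding in these formulas without masking real regressions.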

9-22: Add assertions for all computed scores.

The test only verifies 3 of 5 scores returned by generate_scores. Add assertions for audit_confidence and team_credibility to ensure complete validation of default fallback behavior.

Add these assertions:

    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8

34-38: Use pytest.approx() for floating point comparisons.

Same issue as the previous test—exact equality for floating point numbers should be avoided.

Apply this diff:

-    assert scores["tokenomics_strength"] == (0.0 + 1.0) / 2 * 10
-    assert scores["sentiment_health"] == (1.0 - 0.0 + 1) / 2 * 10
-    assert scores["code_maturity"] == (0.0 * 0.6 + (1 - 1.0) * 0.4) * 10
-    assert scores["audit_confidence"] == min(0 * 2, 5) + 0.0 * 5
-    assert scores["team_credibility"] == (0.0 * 0.5 + 1.0 * 0.5) * 10
+    assert scores["tokenomics_strength"] == pytest.approx((0.0 + 1.0) / 2 * 10)
+    assert scores["sentiment_health"] == pytest.approx((1.0 - 0.0 + 1) / 2 * 10)
+    assert scores["code_maturity"] == pytest.approx((0.0 * 0.6 + (1 - 1.0) * 0.4) * 10)
+    assert scores["audit_confidence"] == pytest.approx(min(0 * 2, 5) + 0.0 * 5)
+    assert scores["team_credibility"] == pytest.approx((0.0 * 0.5 + 1.0 * 0.5) * 10)

1-141: Consider adding tests for missing data categories and edge boundaries.

For more comprehensive coverage, consider adding:

  1. A test for generate_scores when entire data categories are missing (e.g., no audit_data or team_data keys), to verify all defaults work correctly.
  2. A test for build_final_summary with a score exactly at 5.0 to explicitly verify the weakness/neutral boundary behavior.

Example test for missing categories:

def test_generate_scores_missing_categories(summary_engine):
    # Test with completely missing data categories
    data = {}
    scores = summary_engine.generate_scores(data)
    
    # All scores should use defaults
    assert "tokenomics_strength" in scores
    assert "sentiment_health" in scores
    assert "code_maturity" in scores
    assert "audit_confidence" in scores
    assert "team_credibility" in scores
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffb0a2e and 42a09ab.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)
backend/app/services/summary/report_summary_engine.py (1)
  • ReportSummaryEngine (4-72)
🔇 Additional comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

1-6: LGTM!

Standard pytest setup with a clean fixture definition.


41-68: LGTM!

Test correctly validates the all-strengths scenario with appropriate assertions for structure, counts, and specific items.


69-96: LGTM!

Test correctly validates the all-weaknesses scenario with appropriate assertions.


97-141: LGTM!

Comprehensive test that validates mixed scores, including the edge case of 7.0 at the strength boundary. The assertions for overall summary content and scores dictionary are thorough and correct.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/app/services/summary/tests/test_summary_engine_advanced.py (3)

28-28: Consider removing the unused lines_of_code parameter.

The lines_of_code value is provided in the test data but is not used in the actual score calculation (as seen in the production code). While it doesn't affect test correctness, removing it would make the test data clearer and avoid potential confusion about which fields actually influence the calculations.


47-49: Replace hard-coded values with explicit calculations.

Hard-coding expected values like 7.8 and 7.0 makes the test brittle and obscures the relationship to the production code's default values. If defaults change in the production code, the test failure won't clearly indicate which assumption broke.

Apply this diff to make the expectations explicit and maintainable:

-    assert scores["code_maturity"] == pytest.approx(7.8) # defaults: test_coverage=0.7, bug_density=0.1
-    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
-    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8
+    assert scores["code_maturity"] == pytest.approx((0.7 * 0.6 + (1 - 0.1) * 0.4) * 10) # defaults: test_coverage=0.7, bug_density=0.1
+    assert scores["audit_confidence"] == pytest.approx(min(1 * 2, 5) + 1.0 * 5)  # defaults: num_audits=1, critical_resolved=1.0
+    assert scores["team_credibility"] == pytest.approx((0.7 * 0.5 + 0.8 * 0.5) * 10)  # defaults: experience=0.7, transparency=0.8

51-62: Consider removing this duplicate test.

This test is redundant with test_generate_scores_default_fallbacks (lines 40-50). Both tests pass an empty data dict and verify the same default behavior. The earlier test already validates that all score keys are present (implicitly through value assertions), making this test unnecessary.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a09ab and 5911120.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)
backend/app/services/summary/report_summary_engine.py (1)
  • ReportSummaryEngine (4-72)
🔇 Additional comments (4)
backend/app/services/summary/tests/test_summary_engine_advanced.py (4)

1-6: LGTM!

The imports and fixture setup are correct and follow pytest conventions.


9-22: LGTM!

This test correctly validates the fallback behavior when partial data is provided, using the default values from the production code.


64-91: LGTM!

This test thoroughly validates the all-strengths scenario, correctly asserting that scores >= 7.0 are classified as strengths according to the production code logic.


92-197: LGTM!

All three remaining test functions provide excellent coverage:

  • test_build_final_summary_all_weaknesses validates the all-weaknesses scenario
  • test_build_final_summary_mixed_scores thoroughly tests mixed classifications with detailed assertions on content
  • test_build_final_summary_boundary_score_5_0 correctly validates the boundary case where scores of exactly 5.0 fall into neither strengths nor weaknesses


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/services/summary/tests/test_summary_engine_advanced.py (1)

8-185: Consider adding error handling and edge case tests.

While the current test coverage is solid for happy-path and boundary scenarios, consider adding tests for:

  • Invalid input types (e.g., strings instead of numbers in data/scores)
  • Negative values or scores > 10
  • Empty nlg_outputs dict with valid scores
  • Malformed data structures

These additional tests would improve robustness and catch potential runtime errors.
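
For example, the invalid-input case might take this shape (the expected TypeError is an assumption; the engine may instead coerce values or fall back to defaults, in which case the assertion should change accordingly):

import pytest

def test_generate_scores_rejects_non_numeric_input(summary_engine):
    # Hypothetical expectation: arithmetic on a string input raises TypeError.
    data = {"tokenomics_data": {"distribution_score": "high"}}
    with pytest.raises(TypeError):
        summary_engine.generate_scores(data)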

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5911120 and fd68c9d.

⛔ Files ignored due to path filters (1)
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • backend/app/services/summary/tests/test_summary_engine_advanced.py (1 hunks)
🔇 Additional comments (6)
backend/app/services/summary/tests/test_summary_engine_advanced.py (6)

23-39: Good boundary value coverage.

The test effectively covers boundary values (0.0 and 1.0) for all scoring components. The audit_confidence calculation with zero audits correctly produces a score of 0, which appropriately represents projects with no audit history.


108-153: Comprehensive mixed-score test coverage.

This test effectively validates the classification logic for strengths (≥7), weaknesses (<5), and neutral scores (5-6). It also verifies the overall_summary content formatting and score mapping structure.


154-185: Excellent boundary case test for neutral threshold.

This test thoroughly validates that scores exactly at 5.0 are classified as neutral (neither strength nor weakness), with explicit assertions for each score category.


1-6: Import path and module structure are correct.

The ReportSummaryEngine class exists at backend/app/services/summary/report_summary_engine.py, and all required __init__.py files are present in the package hierarchy. The import statement is valid and will resolve correctly.


74-78: Key mappings verified.

The test assertions correctly verify the key mappings. The build_final_summary implementation transforms snake_case score keys (e.g., tokenomics_strength) to TitleCase (e.g., "Tokenomics Strength") using .replace('_', ' ').title(), which matches all assertions in lines 74-78, 102-106, 130-131, and 134-135. The strength/weakness thresholds (≥7.0 and <5.0, respectively) are also correctly implemented and tested.
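
In isolation, that transform produces exactly the asserted labels:

>>> "tokenomics_strength".replace('_', ' ').title()
'Tokenomics Strength'
>>> "sentiment_health".replace('_', ' ').title()
'Sentiment Health'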


19-21: All scoring formulas and default values are correct.

Verification confirms the test assertions in lines 19-21, 34-38, and 45-49 all match the ReportSummaryEngine.generate_scores implementation exactly. The tested formulas and defaults (distribution/utility: 0.5; positive/negative ratio: 0.5; test_coverage: 0.7; bug_density: 0.1; num_audits: 1; critical_resolved: 1.0; experience: 0.7; transparency: 0.8) all align with the implementation. The tests appropriately verify partial data fallback to defaults, boundary conditions, and complete default scenarios.
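
For reference, a sketch of those formulas reconstructed purely from the assertions quoted in this thread (the key names for the audit and team inputs are inferred, not verified against report_summary_engine.py):

def sketch_generate_scores(data: dict) -> dict:
    # Reference sketch only, not a copy of the production file. Defaults per
    # this review: distribution/utility 0.5, positive/negative ratio 0.5,
    # test_coverage 0.7, bug_density 0.1, num_audits 1, critical_resolved 1.0,
    # experience 0.7, transparency 0.8.
    tok = data.get("tokenomics_data", {})
    sen = data.get("sentiment_data", {})
    code = data.get("code_audit_data", {})
    audit = data.get("audit_data", {})
    team = data.get("team_data", {})
    return {
        "tokenomics_strength": (tok.get("distribution_score", 0.5)
                                + tok.get("utility_score", 0.5)) / 2 * 10,
        "sentiment_health": (sen.get("positive_sentiment_ratio", 0.5)
                             - sen.get("negative_sentiment_ratio", 0.5) + 1) / 2 * 10,
        "code_maturity": (code.get("test_coverage", 0.7) * 0.6
                          + (1 - code.get("bug_density", 0.1)) * 0.4) * 10,
        "audit_confidence": min(audit.get("num_audits", 1) * 2, 5)
                            + audit.get("critical_resolved", 1.0) * 5,
        "team_credibility": (team.get("experience", 0.7) * 0.5
                             + team.get("transparency", 0.8) * 0.5) * 10,
    }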

The inline comment below is anchored on the test data in backend/app/services/summary/tests/test_summary_engine_advanced.py:

    data = {
        "tokenomics_data": {"distribution_score": 0.6},  # Missing utility_score
        "sentiment_data": {"positive_sentiment_ratio": 0.8, "negative_sentiment_ratio": 0.1},
        "code_audit_data": {"test_coverage": 0.8},  # Missing lines_of_code, bug_density

⚠️ Potential issue | 🟡 Minor

Clarify the comment about missing fields.

The comment mentions "lines_of_code" as a missing field, but this field is not referenced in any of the test assertions. If lines_of_code is not used in the scoring formula, remove it from the comment to avoid confusion.

-        "code_audit_data": {"test_coverage": 0.8}, # Missing lines_of_code, bug_density
+        "code_audit_data": {"test_coverage": 0.8}, # Missing bug_density

@klingonaston
Collaborator

Nice, the added tests will definitely boost the reliability of the summary engine. 🚀 Approved!

@klingonaston klingonaston merged commit 92835bb into main Nov 23, 2025
1 check passed
@klingonaston klingonaston deleted the feat/add-summary-engine-tests branch November 23, 2025 19:12