Skip to content

Conversation

@klingonaston
Copy link
Collaborator

@klingonaston klingonaston commented Nov 25, 2025

Overview: The orchestrator now updates the report state after each processing stage, providing better visibility into report generation progress.

Changes

  • Integrated report_repository.update_partial calls into orchestrator.py to persist status changes.
  • Each agent execution now triggers a report state update, reflecting pending, running, and completed statuses.
  • Ensured report_id and relevant stage-specific data are correctly passed for partial updates.

Summary by CodeRabbit

  • New Features

    • Multiple analysis agents added (on-chain, social sentiment, team docs, code audit) for richer report content.
    • Orchestrator starts on app startup to run background report processing.
  • Improvements

    • Report processing is now DB-backed with more granular status phases and preserved partial results.
    • Endpoints return clearer responses for in-progress, failed, or completed reports; URL validation reduces noisy failures.
  • Documentation

    • Added migrations docs and migration tooling configuration.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 25, 2025

Walkthrough

Orchestration moved from in-memory to DB-backed: Orchestrator now constructed with a session factory and uses ReportRepository and expanded ReportStatusEnum values. Agents run under AsyncSession contexts, persist partial outputs/statuses, and process_report/NLG/summary stages update report state via the repository.

Changes

Cohort / File(s) Change Summary
Core orchestrator & agents
backend/app/core/orchestrator.py
Replaced in-memory orchestration with a DB-backed Orchestrator(session_factory) that instantiates ReportRepository. Agents run with per-agent DB sessions, URL validation gating, concurrent execution returning name→result mapping, and aggregation of agent outputs into partial report data. Added create_orchestrator factory.
Report state model
backend/app/db/models/report_state.py
Extended ReportStatusEnum with: RUNNING_AGENTS, AGENTS_COMPLETED, AGENTS_FAILED, AGENTS_PARTIAL_SUCCESS, GENERATING_NLG, NLG_COMPLETED, GENERATING_SUMMARY, SUMMARY_COMPLETED. Added error_message column to ReportState.
Repository (session factory + partial updates)
backend/app/db/repositories/report_repository.py
Converted to use a session_factory (constructor now accepts a callable). All methods run session-scoped transactions (async with). Added get_report_by_id and update_partial(report_id, data); improved rollback/commit/error handling.
API routes & background runner
backend/app/api/v1/routes.py
Endpoints now accept AsyncSession = Depends(get_session) and create ReportRepository. generate_report_endpoint enqueues a background runner that updates statuses via repository (RUNNING_AGENTS → ...), calls process_report, and marks failures/completion. Status/data endpoints read DB state and return enum-driven responses.
Report processing service
backend/app/services/report_processor.py
process_report(report_id, token_id, report_repository) now uses DB-backed Orchestrator, updates repository at agent/NLG/summary stages, stores partials and final report, and handles exceptions with partial updates and status transitions.
Report service (persistence API)
backend/app/services/report_service.py
Replaced in-memory persistence with ReportRepository. generate_report, get_report_status, and get_report_data now use repository calls and return enum-based statuses and persisted partial/final payloads.
Summary engine tweak
backend/app/services/summary/report_summary_engine.py
Removed unused placeholder retrieval of lines_of_code from code audit input.
Application startup
backend/main.py
Added module-level `orchestrator_instance: Orchestrator
Dependencies
requirements.txt
Bumped SQLAlchemy and added scraping/text deps: tenacity==9.1.2, textblob==0.19.0, requests==2.32.5, beautifulsoup4==4.14.2.
Alembic / migrations tooling
alembic.ini, backend/app/db/migrations/*
Added Alembic config and migration tooling files (env.py, script template, README) to support DB migrations and autogeneration.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant API as FastAPI
    participant BG as Background Task
    participant Repo as ReportRepository (session_factory)
    participant Orch as Orchestrator (session per-agent)
    participant Agents as Agents (concurrent)
    participant NLG as NLG/Summary

    API->>Repo: create_report_entry(report_id,...)
    API->>BG: enqueue _run_agents_in_background(report_id, token, repo)

    BG->>Repo: update_partial(report_id, {status: RUNNING_AGENTS})
    BG->>Orch: create Orchestrator(session_factory)
    BG->>Orch: Orch.execute_agents(report_id, token)

    Orch->>Agents: dispatch agents (each opens AsyncSession via session_factory)
    par Agents run concurrently
      Agents-->>Repo: update_partial(report_id, {agent: partial_output}) or return result
    end

    Orch-->>Repo: update_partial(report_id, {status: AGENTS_COMPLETED/AGENTS_PARTIAL_SUCCESS, agent_outputs: ...})
    BG->>Repo: update_partial(report_id, {status: GENERATING_NLG})
    BG->>NLG: generate NLG content
    NLG-->>Repo: update_partial(report_id, {nlg_sections: ..., status: NLG_COMPLETED})
    BG->>Repo: update_partial(report_id, {status: GENERATING_SUMMARY})
    BG->>NLG: generate summary
    NLG-->>Repo: update_partial(report_id, {final_report_json: ..., status: SUMMARY_COMPLETED/COMPLETED})
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus:
    • Orchestrator constructor and session_factory usage (concurrency and lifecycle).
    • Transactional semantics and error/rollback behavior in ReportRepository, especially update_partial.
    • Correct, consistent use of new ReportStatusEnum states across API, services, and background runner.
    • Agent implementations: per-agent session handling, timeouts, external HTTP/scraping calls and their error paths.
    • Startup initialization (create_orchestrator) and any global state safety.

Possibly related PRs

Poem

🐰 I hopped from RAM to rows of rows,
Sessions warm where saved data grows.
Agents sprint, each with its part,
I stash their snippets, stitch the chart—
A tidy report straight from the heart. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.71% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: updating report state in the orchestrator after each processing stage, which aligns with the PR's core objective of improving visibility into report generation progress.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/orchestrator-report-state-updates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/core/orchestrator.py (1)

41-251: Critical: Session lifecycle violation will cause database errors.

The create_orchestrator function creates a session within an async with context (line 51), but returns an Orchestrator instance (line 251) that holds a reference to that session via self.report_repository. When the function returns, the context manager closes the session, leaving the returned orchestrator with a closed session.

Any subsequent database operations through orchestrator.report_repository will fail with session-closed errors.

Solution: The orchestrator should either:

  1. Accept a session from the caller (who manages the lifecycle), or
  2. Create and manage its own session internally with proper lifecycle management

Apply this diff for option 1 (caller-managed session):

-async def create_orchestrator(register_dummy: bool = False) -> Orchestrator:
+async def create_orchestrator(session: AsyncSession, register_dummy: bool = False) -> Orchestrator:
     """
     Factory function to create and configure an Orchestrator instance.
 
     Args:
+        session (AsyncSession): Database session for the orchestrator
         register_dummy (bool): If True, a 'dummy_agent' will be registered with the orchestrator.
 
     Returns:
         Orchestrator: A new instance of the Orchestrator.
     """
-    async with get_session() as session:
-        def _is_valid_url(url: str | None, url_name: str) -> bool:
+    def _is_valid_url(url: str | None, url_name: str) -> bool:
         # ... rest of function
-        return orch
+    return orch

Then the caller would manage the session:

async with get_session() as session:
    orchestrator = await create_orchestrator(session)
    # use orchestrator while session is open
🧹 Nitpick comments (1)
backend/app/core/orchestrator.py (1)

141-141: Refine exception handling to catch specific exceptions.

Catching blind Exception (lines 141, 182, 233) is overly broad and can mask unexpected errors. While the current logging with orchestrator_logger.exception() helps, it's better practice to catch specific expected exceptions.

Consider catching specific exceptions for each agent. For example:

-        except Exception as e:
+        except (asyncio.TimeoutError, ValueError, KeyError, httpx.HTTPError) as e:
             orchestrator_logger.exception("Social Sentiment Agent failed for report %s", report_id)
             return {"status": "failed", "error": str(e)}
+        except Exception as e:
+            orchestrator_logger.exception("Unexpected error in Social Sentiment Agent for report %s", report_id)
+            raise  # Re-raise unexpected exceptions

This pattern allows expected errors to be handled gracefully while surfacing unexpected issues for investigation.

Also applies to: 182-182, 233-233

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7303025 and 2e95ee7.

⛔ Files ignored due to path filters (53)
  • __pycache__/main.cpython-313.pyc is excluded by !**/*.pyc
  • backend/__pycache__/main.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/api/v1/__pycache__/__init__.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/api/v1/__pycache__/routes.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/core/__pycache__/config.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/core/__pycache__/exceptions.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/core/__pycache__/orchestrator.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/core/__pycache__/storage.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/db/__pycache__/connection.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/db/models/__pycache__/report_state.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/db/repositories/__pycache__/__init__.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/db/repositories/__pycache__/report_repository.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/db/tests/__pycache__/test_connection.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/__pycache__/report_processor.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/__pycache__/report_processor.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/__pycache__/price_agent.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/__pycache__/trend_agent.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/__pycache__/volume_agent.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_code_audit_agent.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_code_audit_agent.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_onchain_agent.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_onchain_agent.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_social_sentiment_agent.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_social_sentiment_agent.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_llm_client.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_llm_client.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_nlg_engine.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_nlg_engine.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_report_nlg_engine.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/tests/__pycache__/test_report_nlg_engine.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/summary/tests/__pycache__/test_report_summary_engine.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/summary/tests/__pycache__/test_report_summary_engine.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/summary/tests/__pycache__/test_summary_engine_advanced.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/tests/__pycache__/test_onchain_agent.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/validation/__pycache__/__init__.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/validation/__pycache__/validation_engine.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/tests/__pycache__/test_report_processor.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/app/tests/__pycache__/test_report_processor.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_orchestrator.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_orchestrator_config.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_orchestrator_config.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_orchestrator_integration.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_orchestrator_integration.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_report_processor.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_routes.cpython-313-pytest-8.2.0.pyc is excluded by !**/*.pyc
  • backend/tests/__pycache__/test_routes.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
📒 Files selected for processing (4)
  • backend/app/core/orchestrator.py (3 hunks)
  • backend/app/db/models/report_state.py (1 hunks)
  • backend/app/db/repositories/report_repository.py (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/db/repositories/report_repository.py (1)
backend/app/db/models/report_state.py (1)
  • ReportState (25-34)
backend/app/core/orchestrator.py (6)
backend/app/db/repositories/report_repository.py (1)
  • ReportRepository (8-82)
backend/app/db/models/report_state.py (1)
  • ReportState (25-34)
backend/app/services/agents/onchain_agent.py (2)
  • fetch_onchain_metrics (36-78)
  • fetch_tokenomics (86-129)
backend/app/services/agents/social_sentiment_agent.py (3)
  • SocialSentimentAgent (28-148)
  • fetch_social_data (63-91)
  • analyze_sentiment (93-148)
backend/app/services/agents/team_doc_agent.py (2)
  • scrape_team_profiles (100-146)
  • analyze_whitepaper (148-188)
backend/app/services/agents/code_audit_agent.py (4)
  • CodeAuditAgent (44-336)
  • fetch_repo_metrics (195-236)
  • analyze_code_activity (238-266)
  • search_and_summarize_audit_reports (268-296)
🪛 Ruff (0.14.5)
backend/app/db/repositories/report_repository.py

79-79: Consider moving this statement to an else block

(TRY300)

backend/app/core/orchestrator.py

141-141: Do not catch blind exception: Exception

(BLE001)


182-182: Do not catch blind exception: Exception

(BLE001)


233-233: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
backend/app/db/repositories/report_repository.py (1)

73-82: LGTM! Implementation is consistent with existing patterns.

The update_partial method follows the same structure as other update methods in this repository. The use of **data provides flexibility for updating arbitrary fields, which aligns with the PR's goal of persisting various stage-specific updates.

backend/app/core/orchestrator.py (2)

10-13: LGTM! Database integration imports are appropriate.

The imports for ReportRepository, ReportState, AsyncSession, and get_session properly support the shift to database-backed orchestration.


37-39: LGTM! Repository initialization is correct.

The orchestrator properly stores the ReportRepository instance for database operations.

requirements.txt (1)

12-12: Preserved async database driver is appropriate.

The asyncpg==0.30.0 dependency aligns with the orchestrator's shift to AsyncSession for database-backed state persistence as described in the PR objectives.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
backend/app/services/summary/report_summary_engine.py (1)

23-29: Keep code‑maturity comments aligned with actual inputs

You removed the unused lines_of_code placeholder (good), but the surrounding comments still reference it and “reasonable LOC”, while the score now depends only on coverage and bug density. This is slightly misleading.

Either reintroduce lines_of_code into the formula or (simpler) update the comments, e.g.:

-        # Assuming 'code_audit_data' contains 'lines_of_code', 'test_coverage', 'bug_density'
+        # Assuming 'code_audit_data' contains 'test_coverage' and 'bug_density'
@@
-        # Simple rule: higher coverage, lower bug density, reasonable LOC
-        scores["code_maturity"] = (test_coverage * 0.6 + (1 - bug_density) * 0.4) * 10 # Scale to 1-10
+        # Simple rule: higher coverage and lower bug density
+        scores["code_maturity"] = (test_coverage * 0.6 + (1 - bug_density) * 0.4) * 10  # Scale to 1-10
backend/app/core/orchestrator.py (1)

68-127: Critical: orch is undefined, factory is missing, and trailing return orch makes the file invalid.

  • All agent functions and registrations (onchain_data_agent, social_sentiment_agent_func, team_documentation_agent, code_audit_agent_func) call orch.report_repository... and orch.register_agent(...), but there is no definition of orch anywhere in this file. This will raise a NameError at import time when orch.register_agent(...) is executed (e.g., Line [127], Line [160], Line [215], Line [270]).
  • The docstring says instances “should be created using the create_orchestrator factory function”, and backend/main.py imports create_orchestrator, but there is no such function defined here.
  • Line [274] (return orch) is indented without being inside any function/class, which triggers an IndentationError and prevents the module from importing at all.

You need to make the wiring coherent and syntactically valid, for example by:

  • Introducing an async factory create_orchestrator(session: AsyncSession) -> Orchestrator that:
    • Instantiates orch = Orchestrator(session).
    • Registers all these agents on orch.
    • Returns orch.
  • Moving the agent definitions and orch.register_agent(...) calls inside that factory, so they close over the local orch instead of relying on a global.
  • Removing the stray top-level return orch and any other unreachable/incorrectly indented code.

Until this is fixed, backend.app.core.orchestrator (and anything importing it) will fail to import.

Also applies to: 131-161, 163-215, 217-272, 274-274

backend/app/services/report_processor.py (1)

74-77: Fix overall_status handling to avoid type mismatches and crashes on unexpected agent results.

Two issues here:

  1. Type mismatch when persisting status (Line [90]):

    overall_status = "completed"  # or "failed"
    await report_repository.update_report_status(report_id, overall_status)

    update_report_status is typed to accept ReportStatusEnum, and the rest of the code (routes, services) expect values from that enum. Persisting raw strings breaks that contract and makes DB state heterogeneous.

    Prefer:

  •    overall_status = "completed"
    
  •    if any(result["status"] == "failed" for result in agent_results.values()):
    
  •        overall_status = "failed"
    
  •    if any(isinstance(result, dict) and result.get("status") == "failed" for result in agent_results.values()):
    
  •        overall_status = ReportStatusEnum.FAILED
    
  •    else:
    
  •        overall_status = ReportStatusEnum.COMPLETED
    

...

  •    await report_repository.update_report_status(report_id, overall_status)
    
  •    await report_repository.update_report_status(report_id, overall_status)
    
    
    
  1. Potential crash if an agent returns an Exception (Line [75]):

    Because execute_agents uses return_exceptions=True, a failing agent can produce an Exception object instead of a dict. Indexing result["status"] on such an object will raise TypeError. The isinstance(result, dict) guard in the diff above fixes that by only inspecting structured results.

Together, these changes make status updates type-safe and robust to unexpected agent failures.

Also applies to: 80-83, 90-91

backend/app/api/v1/routes.py (1)

48-73: Extend status handling in /reports/{report_id}/data for all ReportStatusEnum values.

Currently, get_report_data_endpoint treats statuses as follows:

  • COMPLETED → return data (Lines [53-55]).
  • PROCESSING, PENDING, RUNNING_AGENTS, GENERATING_NLG, GENERATING_SUMMARY → 202 “still processing” (Lines [56-63]).
  • FAILED → 409 with error detail (Lines [64-73]).

However ReportStatusEnum includes additional states (AGENTS_COMPLETED, AGENTS_FAILED, AGENTS_PARTIAL_SUCCESS, NLG_COMPLETED, SUMMARY_COMPLETED, and likely CANCELLED per other code) that fall through to the final “not found or not completed” branch and are treated as an error.

Consider:

  • Mapping AGENTS_COMPLETED, NLG_COMPLETED, SUMMARY_COMPLETED into the “still processing” path until the final COMPLETED state is reached.
  • Mapping AGENTS_FAILED and CANCELLED into a failure branch similar to FAILED, returning a 4xx with appropriate detail.
  • Ensuring whatever additional failure statuses you add in ReportStatusEnum are properly surfaced here.

This will keep the API semantics aligned with the richer state machine implemented in the orchestrator and processor.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e95ee7 and ae6b1fd.

📒 Files selected for processing (7)
  • backend/app/api/v1/routes.py (1 hunks)
  • backend/app/core/orchestrator.py (6 hunks)
  • backend/app/services/report_processor.py (4 hunks)
  • backend/app/services/report_service.py (1 hunks)
  • backend/app/services/summary/report_summary_engine.py (1 hunks)
  • backend/main.py (2 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backend/main.py (1)
backend/app/core/orchestrator.py (1)
  • Orchestrator (22-49)
backend/app/services/report_processor.py (6)
backend/app/core/orchestrator.py (1)
  • Orchestrator (22-49)
backend/app/core/storage.py (1)
  • set_report_status (7-13)
backend/app/services/nlg/report_nlg_engine.py (2)
  • ReportNLGEngine (12-147)
  • generate_nlg_outputs (24-72)
backend/app/services/summary/report_summary_engine.py (1)
  • ReportSummaryEngine (4-72)
backend/app/db/repositories/report_repository.py (2)
  • ReportRepository (8-82)
  • update_report_status (35-44)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/services/report_service.py (4)
backend/app/db/repositories/report_repository.py (1)
  • ReportRepository (8-82)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/models/report_models.py (2)
  • ReportRequest (4-6)
  • ReportResponse (8-10)
backend/app/utils/id_generator.py (1)
  • generate_report_id (3-8)
🪛 Ruff (0.14.5)
backend/app/services/report_processor.py

93-93: Consider moving this statement to an else block

(TRY300)

backend/app/core/orchestrator.py

115-115: missing closing quote in string literal

(invalid-syntax)


117-117: missing closing quote in string literal

(invalid-syntax)


274-274: Unexpected indentation

(invalid-syntax)


274-274: Expected a statement

(invalid-syntax)

backend/app/api/v1/routes.py

24-24: Do not catch blind exception: Exception

(BLE001)


29-29: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


38-38: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


48-48: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (2)
requirements.txt (1)

12-16: ✓ All dependencies verified as legitimately integrated and security fix confirmed.

The review comment is accurate. All newly added dependencies are actively imported and used in the codebase:

  • tenacity (retry logic): Used in onchain_agent.py and social_sentiment_agent.py
  • textblob (NLG): Used in social_sentiment_agent.py
  • beautifulsoup4 (HTML parsing): Used in team_doc_agent.py (imported as bs4)
  • requests (HTTP): Upgraded from 2.32.3 to 2.32.5, addressing CVE-2024-47081

All versions are current and the dependencies align well with the PR's orchestration and state management objectives. No spurious or unused additions detected.

backend/app/core/orchestrator.py (1)

35-49: Core orchestration helpers look sound.

execute_agents and aggregate_results correctly run all registered agents concurrently and only merge "status" == "completed" payloads, logging everything else. _is_valid_url’s HTTP/HTTPS and netloc checks are a good guard against misconfigured endpoints. No changes needed here.

Also applies to: 52-62

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/report_processor.py (1)

36-42: Orchestrator now creates its own ReportRepository internally.

Line 36 creates an Orchestrator(report_repository.session), but looking at the Orchestrator.__init__ in orchestrator.py, it creates its own ReportRepository from the session. This means there are now two ReportRepository instances for the same session—one passed to process_report and one created by Orchestrator. This could lead to subtle commit/rollback inconsistencies.

Additionally, lines 37-39 register price_agent_run, trend_agent_run, volume_agent_run which appear to be legacy agents, while the create_orchestrator factory in orchestrator.py registers the new DB-backed agents (onchain_data_agent, social_sentiment_agent, etc.). This inconsistency means the processor won't use the new agents.

Consider either:

  1. Use create_orchestrator() factory instead of manually instantiating and registering agents
  2. Or pass the existing report_repository to the orchestrator instead of creating a new one internally
♻️ Duplicate comments (1)
backend/app/api/v1/routes.py (1)

19-19: Remove duplicate import.

get_session is already imported at line 4. This redefinition is flagged by Ruff (F811).

-from backend.app.db.connection import get_session # Added for background task
-
🧹 Nitpick comments (3)
backend/app/services/report_processor.py (1)

7-7: Unused imports from backend.app.core.storage.

save_report_data and set_report_status are imported but never used—only try_set_processing is called. Consider removing the unused imports.

-from backend.app.core.storage import save_report_data, set_report_status, try_set_processing
+from backend.app.core.storage import try_set_processing
backend/app/api/v1/routes.py (1)

60-60: Duplicate condition check for RUNNING_AGENTS.value.

The condition checks ReportStatusEnum.RUNNING_AGENTS.value twice, which is redundant.

-        elif report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.PENDING.value or report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.GENERATING_NLG.value or report_result.get("status") == ReportStatusEnum.GENERATING_SUMMARY.value:
+        elif report_result.get("status") in (
+            ReportStatusEnum.PENDING.value,
+            ReportStatusEnum.RUNNING_AGENTS.value,
+            ReportStatusEnum.GENERATING_NLG.value,
+            ReportStatusEnum.GENERATING_SUMMARY.value,
+        ):
backend/app/core/orchestrator.py (1)

36-41: Add strict=True to zip() for safer iteration.

If the number of task names doesn't match the number of results (which shouldn't happen here but is a good defensive practice), strict=True would raise an error instead of silently truncating.

-        results = await asyncio.gather(*tasks.values(), return_exceptions=True)
-        return dict(zip(tasks.keys(), results))
+        results = await asyncio.gather(*tasks.values(), return_exceptions=True)
+        return dict(zip(tasks.keys(), results, strict=True))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae6b1fd and 27bfc1d.

📒 Files selected for processing (4)
  • backend/app/api/v1/routes.py (1 hunks)
  • backend/app/core/orchestrator.py (3 hunks)
  • backend/app/services/report_processor.py (4 hunks)
  • backend/app/services/report_service.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/services/report_service.py (4)
backend/app/db/repositories/report_repository.py (3)
  • ReportRepository (8-82)
  • create_report_entry (12-33)
  • get_report_state (68-71)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/models/report_models.py (2)
  • ReportRequest (4-6)
  • ReportResponse (8-10)
backend/app/utils/id_generator.py (1)
  • generate_report_id (3-8)
backend/app/services/report_processor.py (4)
backend/app/core/orchestrator.py (1)
  • Orchestrator (23-50)
backend/app/services/nlg/report_nlg_engine.py (2)
  • ReportNLGEngine (12-147)
  • generate_nlg_outputs (24-72)
backend/app/db/repositories/report_repository.py (5)
  • ReportRepository (8-82)
  • update_report_status (35-44)
  • update_partial (73-82)
  • store_partial_report_results (46-55)
  • save_final_report (57-66)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/core/orchestrator.py (6)
backend/app/db/repositories/report_repository.py (3)
  • ReportRepository (8-82)
  • update_report_status (35-44)
  • update_partial (73-82)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/services/agents/onchain_agent.py (2)
  • fetch_onchain_metrics (36-78)
  • fetch_tokenomics (86-129)
backend/app/services/agents/social_sentiment_agent.py (2)
  • fetch_social_data (63-91)
  • analyze_sentiment (93-148)
backend/app/services/agents/team_doc_agent.py (3)
  • TeamDocAgent (9-188)
  • scrape_team_profiles (100-146)
  • analyze_whitepaper (148-188)
backend/app/services/agents/code_audit_agent.py (4)
  • CodeAuditAgent (44-336)
  • fetch_repo_metrics (195-236)
  • analyze_code_activity (238-266)
  • search_and_summarize_audit_reports (268-296)
🪛 Ruff (0.14.5)
backend/app/services/report_processor.py

93-93: Consider moving this statement to an else block

(TRY300)

backend/app/core/orchestrator.py

41-41: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


65-65: Redefinition of unused get_session from line 13

Remove definition: get_session

(F811)


135-135: Consider moving this statement to an else block

(TRY300)


140-140: Do not catch blind exception: Exception

(BLE001)


170-170: Consider moving this statement to an else block

(TRY300)


175-175: Do not catch blind exception: Exception

(BLE001)


227-227: Consider moving this statement to an else block

(TRY300)


232-232: Do not catch blind exception: Exception

(BLE001)


284-284: Consider moving this statement to an else block

(TRY300)


289-289: Do not catch blind exception: Exception

(BLE001)

backend/app/api/v1/routes.py

26-26: Expected except or finally after try block

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


28-28: Expected a statement

(invalid-syntax)


28-29: Expected an expression

(invalid-syntax)


29-29: Unexpected indentation

(invalid-syntax)


32-32: Expected a statement

(invalid-syntax)

🔇 Additional comments (4)
backend/app/services/report_service.py (2)

8-12: Repository integration looks correct.

The generate_report function now properly uses create_report_entry(report_id) and returns the enum value via ReportStatusEnum.PENDING.value, which aligns with the repository API.


14-19: Status retrieval correctly migrated to repository.

The function now uses get_report_state and properly accesses report.status.value for the response.

backend/app/core/orchestrator.py (2)

23-30: Orchestrator class structure is sound.

The class properly initializes with a session and creates a ReportRepository. The agent registration and execution patterns are appropriate.


53-63: URL validation helper is well-implemented.

The _is_valid_url function properly validates HTTP/HTTPS URLs and logs appropriate warnings for invalid configurations.

Comment on lines +50 to +53
except Exception as e:
logger.exception("Error generating NLG outputs for report %s", report_id)
await report_repository.update_partial(report_id, {"status": ReportStatusEnum.FAILED, "error": str(e)})
raise
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Script executed:

#!/bin/bash
# Find ReportState model and examine its structure
find . -type f -name "*.py" | xargs grep -l "class ReportState" | head -5

Repository: LumintelAnalytics/ChainReport-API


<shell_result>
Repository: LumintelAnalytics/ChainReport-API
Exit code: 0

stdout:

./backend/app/db/models/report_state.py

</shell_result>


Script executed:

#!/bin/bash
# Examine ReportState model definition
cat -n backend/app/db/models/report_state.py

Repository: LumintelAnalytics/ChainReport-API


<shell_result>
Repository: LumintelAnalytics/ChainReport-API
Exit code: 0

stdout:

     1	from enum import Enum as PyEnum
     2	from sqlalchemy import Column, String, DateTime, JSON, ForeignKey
     3	from sqlalchemy.sql import func
     4	from sqlalchemy.types import Enum as SQLEnum
     5	
     6	from backend.app.db.base import Base
     7	
     8	class ReportStatusEnum(PyEnum):
     9	    PENDING = "pending"
    10	    RUNNING = "running"
    11	    FAILED = "failed"
    12	    COMPLETED = "completed"
    13	    RUNNING_AGENTS = "running_agents"
    14	    AGENTS_COMPLETED = "agents_completed"
    15	    AGENTS_FAILED = "agents_failed"
    16	    AGENTS_PARTIAL_SUCCESS = "agents_partial_success"
    17	    GENERATING_NLG = "generating_nlg"
    18	    NLG_COMPLETED = "nlg_completed"
    19	    GENERATING_SUMMARY = "generating_summary"
    20	    SUMMARY_COMPLETED = "summary_completed"
    21	
    22	
    23	
    24	
    25	class ReportState(Base):
    26	    __tablename__ = "report_states"
    27	
    28	    report_id = Column(String, ForeignKey("reports.id"), primary_key=True)
    29	    status = Column(SQLEnum(ReportStatusEnum, values_callable=lambda x: [e.value for e in x]), default=ReportStatusEnum.PENDING, index=True)
    30	    created_at = Column(DateTime(timezone=True), server_default=func.now())
    31	    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
    32	    partial_agent_output = Column(JSON, nullable=True)  # Stores partial outputs from agents
    33	    raw_data = Column(JSON, nullable=True)  # Stores raw data collected by agents
    34	    final_report_json = Column(JSON, nullable=True)  # Stores the final generated report JSON

</shell_result>


Script executed:

#!/bin/bash
# Search for update_partial method definition
rg -n "def update_partial" backend/app -A 10

Repository: LumintelAnalytics/ChainReport-API


<shell_result>
Repository: LumintelAnalytics/ChainReport-API
Exit code: 0

stdout:

backend/app/db/repositories/report_repository.py:73:    async def update_partial(self, report_id: str, data: Dict[str, Any]) -> ReportState | None:
backend/app/db/repositories/report_repository.py-74-        try:
backend/app/db/repositories/report_repository.py-75-            stmt = update(ReportState).where(ReportState.report_id == report_id).values(**data).returning(ReportState)
backend/app/db/repositories/report_repository.py-76-            result = await self.session.execute(stmt)
backend/app/db/repositories/report_repository.py-77-            updated_report_state = result.scalar_one_or_none()
backend/app/db/repositories/report_repository.py-78-            await self.session.commit()
backend/app/db/repositories/report_repository.py-79-            return updated_report_state
backend/app/db/repositories/report_repository.py-80-        except Exception:
backend/app/db/repositories/report_repository.py-81-            await self.session.rollback()
backend/app/db/repositories/report_repository.py-82-            raise

</shell_result>


Perfect! I've verified the concern. The review comment is correct - there is indeed a bug:

  1. The ReportState model has columns: report_id, status, created_at, updated_at, partial_agent_output, raw_data, and final_report_json - no error column.

  2. The update_partial method uses SQLAlchemy's .values(**data) which directly maps dictionary keys to model columns.

  3. Passing {"status": ReportStatusEnum.FAILED, "error": str(e)} will cause SQLAlchemy to attempt setting a non-existent error column, resulting in a runtime error.


update_partial call passes undefined "error" key to ReportState model.

The ReportState model has no error column. The call update_partial(report_id, {"status": ReportStatusEnum.FAILED, "error": str(e)}) will raise a SQLAlchemy error at runtime because .values(**data) attempts to set the non-existent column. Either remove the "error" key or add an error column to the ReportState model if error tracking is needed.

🤖 Prompt for AI Agents
In backend/app/services/report_processor.py around lines 50 to 53, the call to
update_partial passes an "error" key which the ReportState model does not have,
causing SQLAlchemy to fail when mapping .values(**data); remove the "error" key
from the data passed (e.g. update_partial(report_id, {"status":
ReportStatusEnum.FAILED})) or alternatively add an error column to the
ReportState model and migrate the DB if you need to persist error text; choose
one approach and implement the corresponding code and migration so
update_partial only receives valid model fields.

Comment on lines +88 to +90
await report_repository.save_final_report(report_id, final_report_content)

set_report_status(report_id, overall_status)
await report_repository.update_report_status(report_id, overall_status)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Type mismatch: passing string to update_report_status expecting ReportStatusEnum.

Line 90 passes overall_status (a string like "completed" or "failed") to update_report_status, but the repository method signature expects status: ReportStatusEnum. This will likely cause a database error or type validation failure.

-        await report_repository.update_report_status(report_id, overall_status)
+        final_status = ReportStatusEnum.COMPLETED if overall_status == "completed" else ReportStatusEnum.FAILED
+        await report_repository.update_report_status(report_id, final_status)
🤖 Prompt for AI Agents
In backend/app/services/report_processor.py around lines 88 to 90, the call
await report_repository.update_report_status(report_id, overall_status) passes
overall_status as a raw string but the repository expects a ReportStatusEnum;
change the code to convert/validate the string into the enum before calling
update_report_status (e.g., import ReportStatusEnum and map or construct the
enum from the string, handling invalid values by raising a clear error or
defaulting to a safe enum value), then pass that enum instance to
update_report_status.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (5)
alembic.ini (1)

63-63: Hardcoded SQLite URL may cause issues in production environments.

The sqlalchemy.url is hardcoded to a local SQLite database. For production deployments, this should typically be overridden via environment variables or a separate configuration. Consider using Alembic's ability to set the URL programmatically in env.py from environment variables.

backend/app/db/repositories/report_repository.py (1)

72-76: Missing exception handling in get_report_by_id.

Unlike other methods, get_report_by_id lacks try/except/rollback handling. While SELECT queries rarely require rollback, consistency with other methods and protection against connection errors would be beneficial.

     async def get_report_by_id(self, report_id: str) -> ReportState | None:
         async with self.session_factory() as session:
-            stmt = select(ReportState).where(ReportState.report_id == report_id)
-            result = await session.execute(stmt)
-            return result.scalar_one_or_none()
+            try:
+                stmt = select(ReportState).where(ReportState.report_id == report_id)
+                result = await session.execute(stmt)
+                return result.scalar_one_or_none()
+            except Exception:
+                await session.rollback()
+                raise
backend/app/core/orchestrator.py (1)

41-41: Add strict=True to zip() for safety.

Per Ruff B905, using zip() without explicit strict= can silently drop elements if iterables have different lengths. Since tasks.keys() and results should always match, adding strict=True ensures early detection of any mismatch.

-        return dict(zip(tasks.keys(), results))
+        return dict(zip(tasks.keys(), results, strict=True))
backend/app/api/v1/routes.py (1)

27-29: Consider narrowing the exception catch.

Catching bare Exception is flagged by Ruff (BLE001). While acceptable for top-level background task handlers to prevent silent failures, consider catching more specific exceptions or at minimum excluding BaseException subclasses like KeyboardInterrupt and SystemExit.

-        except Exception as e:
+        except (ValueError, RuntimeError, IOError) as e:  # or define specific exceptions

Alternatively, if you want to catch broadly but re-raise system exceptions:

except BaseException as e:
    if isinstance(e, (KeyboardInterrupt, SystemExit)):
        raise
    # handle other exceptions
backend/app/db/migrations/script.py.mako (1)

15-18: Broaden down_revision type annotation to support merge revisions

Alembic allows down_revision to be a tuple or list (e.g., ('rev1', 'rev2')) for merge migrations, not just a single string or None. The current annotation is narrower than what Alembic actually generates.

To match real values and align with branch_labels/depends_on, change line 16:

-down_revision: Union[str, None] = ${repr(down_revision)}
+down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}

This preserves runtime behavior while enabling type checkers and IDEs to recognize all valid down_revision shapes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27bfc1d and eff89aa.

⛔ Files ignored due to path filters (2)
  • backend/app/db/__pycache__/base.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/db/migrations/__pycache__/env.cpython-313.pyc is excluded by !**/*.pyc
📒 Files selected for processing (9)
  • alembic.ini (1 hunks)
  • backend/app/api/v1/routes.py (1 hunks)
  • backend/app/core/orchestrator.py (3 hunks)
  • backend/app/db/migrations/README (1 hunks)
  • backend/app/db/migrations/env.py (1 hunks)
  • backend/app/db/migrations/script.py.mako (1 hunks)
  • backend/app/db/models/report_state.py (2 hunks)
  • backend/app/db/repositories/report_repository.py (1 hunks)
  • requirements.txt (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/app/db/migrations/README
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/db/repositories/report_repository.py (2)
backend/app/db/models/report.py (1)
  • Report (4-6)
backend/app/db/models/report_state.py (2)
  • ReportState (25-35)
  • ReportStatusEnum (8-20)
backend/app/api/v1/routes.py (7)
backend/app/db/database.py (1)
  • get_db (26-28)
backend/app/db/repositories/report_repository.py (3)
  • ReportRepository (8-88)
  • update_report_status (36-46)
  • update_partial (78-88)
backend/app/models/report_models.py (2)
  • ReportRequest (4-6)
  • ReportResponse (8-10)
backend/app/services/report_service.py (3)
  • generate_report (8-12)
  • get_report_status (14-19)
  • get_report_data (21-34)
backend/app/core/storage.py (1)
  • get_report_status (15-18)
backend/app/core/exceptions.py (1)
  • ReportNotFoundException (4-6)
backend/app/db/models/report_state.py (1)
  • ReportStatusEnum (8-20)
backend/app/db/migrations/env.py (1)
backend/app/db/connection.py (1)
  • connect (18-62)
🪛 Ruff (0.14.5)
backend/app/core/orchestrator.py

41-41: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


129-129: Consider moving this statement to an else block

(TRY300)


134-134: Do not catch blind exception: Exception

(BLE001)


166-166: Consider moving this statement to an else block

(TRY300)


171-171: Do not catch blind exception: Exception

(BLE001)


225-225: Consider moving this statement to an else block

(TRY300)


230-230: Do not catch blind exception: Exception

(BLE001)


284-284: Consider moving this statement to an else block

(TRY300)


289-289: Do not catch blind exception: Exception

(BLE001)

backend/app/db/repositories/report_repository.py

21-21: Consider moving this statement to an else block

(TRY300)


43-43: Consider moving this statement to an else block

(TRY300)


55-55: Consider moving this statement to an else block

(TRY300)


67-67: Consider moving this statement to an else block

(TRY300)


85-85: Consider moving this statement to an else block

(TRY300)

backend/app/api/v1/routes.py

27-27: Do not catch blind exception: Exception

(BLE001)


33-33: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


33-33: Undefined name get_session

(F821)


42-42: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


42-42: Undefined name get_session

(F821)


52-52: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


52-52: Undefined name get_session

(F821)

🔇 Additional comments (5)
backend/app/db/models/report_state.py (2)

13-20: New status enum values properly support stage tracking.

The expanded ReportStatusEnum provides granular visibility into the report generation pipeline stages as intended by the PR objectives. The values align with the orchestrator's state transitions.


35-35: LGTM on the error_message column addition.

The error_message column provides a proper place to persist error details during failed processing stages, complementing the status transitions.

backend/app/core/orchestrator.py (2)

65-69: Factory function correctly initializes Orchestrator with session factory.

The create_orchestrator function properly passes AsyncSessionLocal as the session factory to the Orchestrator, which is the correct pattern for the repository's per-call session management.


76-138: Agent function structure implements DB-backed state tracking as intended.

The onchain_data_agent function properly:

  • Updates status to RUNNING_AGENTS before execution
  • Handles timeout and exception scenarios
  • Persists partial results and status via update_partial
  • Uses appropriate ReportStatusEnum values

This aligns with the PR objectives to update report state after each processing stage.

However, verify the session/repository instantiation issue flagged earlier is resolved before merging.

backend/app/db/migrations/script.py.mako (1)

1-13: Template structure and stubs look solid

Header metadata, imports, and upgrade/downgrade stubs are idiomatic for Alembic script templates. The use of type hints and optional imports injection is clean and should work well with autogenerated bodies.

Also applies to: 21-26

from fastapi import APIRouter, BackgroundTasks, Depends
from fastapi.responses import JSONResponse
from sqlalchemy.ext.asyncio import AsyncSession
from backend.app.db.database import get_db
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

get_session is undefined; should use get_db.

Line 4 imports get_db, but lines 33, 42, and 52 reference get_session which doesn't exist. This will raise a NameError at runtime.

-from backend.app.db.database import get_db
+from backend.app.db.database import get_db as get_session

Alternatively, update all usages on lines 33, 42, 52 to use get_db instead of get_session.

🤖 Prompt for AI Agents
In backend/app/api/v1/routes.py around line 4 (and usages at lines 33, 42, 52),
the module imports get_db but the route dependencies call get_session
(undefined); replace each get_session reference with get_db in the Depends(...)
calls so the correct dependency is used, or if you intended to keep get_session
instead, import get_session from backend.app.db.database; ensure the dependency
name matches the imported function everywhere.

Comment on lines 19 to +30
async def _run_agents_in_background(report_id: str, token_id: str):
try:
await orchestrator_instance.execute_agents_concurrently(report_id, token_id)
except Exception as e:
api_logger.error(f"Agent execution failed for report {report_id}: {e}")
# Here you might want to update the report status to 'failed' in in_memory_reports
# For now, we'll just log it.
if report_id in in_memory_reports:
in_memory_reports[report_id]["status"] = "failed"
in_memory_reports[report_id]["detail"] = f"Agent execution failed: {e}"
async for session in get_db():
report_repository = ReportRepository(session)
try:
await report_repository.update_report_status(report_id, ReportStatusEnum.RUNNING_AGENTS)
await process_report(report_id, token_id, report_repository)
await report_repository.update_report_status(report_id, ReportStatusEnum.COMPLETED)
break # Exit the async for loop after successful processing
except Exception as e:
api_logger.error(f"Report processing failed for report {report_id}: {e}")
await report_repository.update_partial(report_id, {"status": ReportStatusEnum.FAILED, "error_message": str(e)})
break # Exit the async for loop on failure
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

ReportRepository expects a session factory, not a session instance.

According to ReportRepository.__init__(self, session_factory: Callable[..., AsyncSession]), the constructor expects a callable that creates sessions. The repository methods internally call async with self.session_factory() as session. Passing a raw AsyncSession object will fail when the repository attempts to invoke it as a callable.

Either pass the factory directly:

 async def _run_agents_in_background(report_id: str, token_id: str):
-    async for session in get_db():
-        report_repository = ReportRepository(session)
-        try:
-            await report_repository.update_report_status(report_id, ReportStatusEnum.RUNNING_AGENTS)
-            await process_report(report_id, token_id, report_repository)
-            await report_repository.update_report_status(report_id, ReportStatusEnum.COMPLETED)
-            break  # Exit the async for loop after successful processing
-        except Exception as e:
-            api_logger.error(f"Report processing failed for report {report_id}: {e}")
-            await report_repository.update_partial(report_id, {"status": ReportStatusEnum.FAILED, "error_message": str(e)})
-            break # Exit the async for loop on failure
+    from backend.app.db.database import AsyncSessionLocal
+    report_repository = ReportRepository(AsyncSessionLocal)
+    try:
+        await report_repository.update_report_status(report_id, ReportStatusEnum.RUNNING_AGENTS)
+        await process_report(report_id, token_id, report_repository)
+        await report_repository.update_report_status(report_id, ReportStatusEnum.COMPLETED)
+    except Exception as e:
+        api_logger.error(f"Report processing failed for report {report_id}: {e}")
+        await report_repository.update_partial(report_id, {"status": ReportStatusEnum.FAILED, "error_message": str(e)})

Or refactor ReportRepository to accept either a session or session factory based on your architectural preference.

🧰 Tools
🪛 Ruff (0.14.5)

27-27: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In backend/app/api/v1/routes.py around lines 19-30, the code constructs
ReportRepository with a session instance but ReportRepository.__init__ expects a
session factory (callable). Change the loop to obtain and pass the session
factory (rename the loop variable to session_factory) and instantiate the
repository as ReportRepository(session_factory) so repository methods can call
async with self.session_factory(); do not pass a raw AsyncSession
instance—alternatively, if get_db yields sessions instead of a factory, wrap it
in a no-arg async callable that returns that session and pass that callable to
ReportRepository.

Comment on lines +33 to +36
async def generate_report_endpoint(request: ReportRequest, background_tasks: BackgroundTasks, session: AsyncSession = Depends(get_session)):
api_logger.info(f"Received report generation request for token_id: {request.token_id}")
report_response = await generate_report(request)
report_repository = ReportRepository(session)
report_response = await generate_report(request, report_repository)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Same ReportRepository constructor mismatch in endpoint.

The endpoint passes a raw AsyncSession to ReportRepository, but the repository expects a session factory callable. This inconsistency with the repository's API will cause failures when repository methods try to use self.session_factory() as a context manager.

This applies to all three endpoints (lines 33-35, 42-44, 52-54). Either the ReportRepository needs to be refactored to accept raw sessions, or you need to pass a factory. Given the repository's current design, you might need a wrapper:

report_repository = ReportRepository(lambda: contextmanager_yielding_session(session))

Or consider a design where request-scoped endpoints use a different repository pattern than background tasks.

🧰 Tools
🪛 Ruff (0.14.5)

33-33: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


33-33: Undefined name get_session

(F821)

🤖 Prompt for AI Agents
In backend/app/api/v1/routes.py around lines 33-36 (and similarly at 42-44 and
52-54), the code passes a raw AsyncSession into ReportRepository even though the
repository expects a session-factory callable; wrap the request-scoped session
into a callable that returns a contextmanager yielding that session (e.g.,
create a small lambda or helper using contextlib.contextmanager that yields the
provided session) and pass that callable to ReportRepository for all three
endpoints, or alternatively refactor ReportRepository to accept raw sessions and
update its methods accordingly so the usage is consistent.

api_logger.info(f"Returning data for report_id: {report_id}")
return report_result
elif report_result.get("status") == "processing":
elif report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.PENDING.value or report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.GENERATING_NLG.value or report_result.get("status") == ReportStatusEnum.GENERATING_SUMMARY.value:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Duplicate condition: RUNNING_AGENTS.value checked twice.

The condition checks ReportStatusEnum.RUNNING_AGENTS.value twice, which is redundant and suggests a copy-paste error.

-        elif report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.PENDING.value or report_result.get("status") == ReportStatusEnum.RUNNING_AGENTS.value or report_result.get("status") == ReportStatusEnum.GENERATING_NLG.value or report_result.get("status") == ReportStatusEnum.GENERATING_SUMMARY.value:
+        elif report_result.get("status") in (
+            ReportStatusEnum.PENDING.value,
+            ReportStatusEnum.RUNNING_AGENTS.value,
+            ReportStatusEnum.GENERATING_NLG.value,
+            ReportStatusEnum.GENERATING_SUMMARY.value,
+        ):

This refactor also improves readability by using in with a tuple instead of chained or comparisons.

🤖 Prompt for AI Agents
In backend/app/api/v1/routes.py around line 60, the conditional redundantly
checks ReportStatusEnum.RUNNING_AGENTS.value twice and uses a long chained OR
expression; replace it with a single membership test using "in" and a tuple of
the desired status values (remove the duplicate RUNNING_AGENTS entry) so the
condition reads something like: report_result.get("status") in
(ReportStatusEnum.RUNNING_AGENTS.value, ReportStatusEnum.PENDING.value,
ReportStatusEnum.GENERATING_NLG.value,
ReportStatusEnum.GENERATING_SUMMARY.value).

Comment on lines +28 to +30
def __init__(self, session_factory: Callable[..., AsyncSession]):
self._agents: Dict[str, Callable] = {}
self.report_repository = ReportRepository(session_factory)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent session handling between Orchestrator and agent functions.

The Orchestrator.__init__ correctly receives a session_factory and passes it to ReportRepository. However, the agent functions (lines 78-79, 145-146, 180-181, 241-242) create their own session via get_db() and then pass that session directly to ReportRepository(session). This creates two issues:

  1. The ReportRepository constructor expects a factory, not a session instance
  2. The orchestrator's self.report_repository is never used by agents (they create their own)

Consider either:

  • Having agents use self.report_repository from the orchestrator (requires passing orchestrator reference)
  • Updating agents to pass AsyncSessionLocal as the factory:
-            async with get_db() as session:
-                report_repository = ReportRepository(session)
+            report_repository = ReportRepository(AsyncSessionLocal)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/app/core/orchestrator.py around lines 28-30 and agent usages at lines
78-79, 145-146, 180-181, 241-242: agents are creating a session via get_db() and
instantiating ReportRepository(session) even though Orchestrator.__init__
receives a session_factory and ReportRepository expects a factory, and
Orchestrator.self.report_repository is unused; fix by either (A) have agents use
the orchestrator's report_repository (pass the orchestrator or its
report_repository into agent functions and remove per-agent ReportRepository
creation), or (B) if agents must construct their own repository, pass the
session factory (AsyncSessionLocal) into ReportRepository instead of a session
instance and stop calling get_db() there; ensure ReportRepository always
receives the factory type and that all code paths consistently use the same
factory-based pattern.

Comment on lines +123 to +128
existing_report = await report_repository.get_report_by_id(report_id)
existing_partial_agent_output = existing_report.partial_agent_output if existing_report else {}
if overall_agent_status == "completed":
await report_repository.update_partial(report_id, {"status": ReportStatusEnum.AGENTS_COMPLETED, "partial_agent_output": {**existing_partial_agent_output, "onchain_data_agent": result}})
else:
await report_repository.update_partial(report_id, {"status": ReportStatusEnum.AGENTS_FAILED, "error_message": "Onchain Data Agent failed", "partial_agent_output": {**existing_partial_agent_output, "onchain_data_agent": result}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential AttributeError if partial_agent_output is None.

When existing_report exists but partial_agent_output is None (its default), the dict unpacking {**existing_partial_agent_output, ...} will fail with TypeError: 'NoneType' object is not a mapping.

Apply defensive handling:

                     existing_report = await report_repository.get_report_by_id(report_id)
-                    existing_partial_agent_output = existing_report.partial_agent_output if existing_report else {}
+                    existing_partial_agent_output = (existing_report.partial_agent_output or {}) if existing_report else {}

The same fix is needed at lines 164, 223, and 282.

🤖 Prompt for AI Agents
In backend/app/core/orchestrator.py around lines 123-128,
existing_report.partial_agent_output can be None which will cause a TypeError
when using dict unpacking; change the code to coalesce the value to an empty
dict before unpacking (e.g., set existing_partial_agent_output =
existing_report.partial_agent_output or {}), then use
{**existing_partial_agent_output, "onchain_data_agent": result} and similar for
the failure branch; apply the same defensive coalescing fix at the other
occurrences on lines ~164, ~223, and ~282.

Comment on lines +8 to +15
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove duplicate configuration blocks.

Lines 8-15 and 23-30 contain identical code (config assignment and fileConfig call). This duplication should be removed.

 # this is the Alembic Config object, which provides
 # access to the values within the .ini file in use.
 config = context.config

 # Interpret the config file for Python logging.
 # This line sets up loggers basically.
 if config.config_file_name is not None:
     fileConfig(config.config_file_name)

 # add your model's MetaData object here
 # for 'autogenerate' support
 import os, sys
 sys.path.append(os.getcwd())
 from backend.app.db.base import Base
-
-# this is the Alembic Config object, which provides
-# access to the values within the .ini file in use.
-config = context.config
-
-# Interpret the config file for Python logging.
-# This line sets up loggers basically.
-if config.config_file_name is not None:
-    fileConfig(config.config_file_name)

 # add your model's MetaData object here

Also applies to: 23-30

🤖 Prompt for AI Agents
In backend/app/db/migrations/env.py around lines 8-15 and 23-30, the assignment
"config = context.config" and the subsequent fileConfig call are duplicated;
remove the second duplicate block (lines 23-30) so the config assignment and
fileConfig(config.config_file_name) appear only once before any logging or
environment setup, leaving the rest of the migration setup unchanged.

Comment on lines +74 to +91
import asyncio

async def process_migrations():
connectable = engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)

async with connectable.connect() as connection:
context.configure(
connection=connection, target_metadata=target_metadata
)

async with context.begin_transaction():
context.run_migrations()

asyncio.run(process_migrations())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Async/sync engine mismatch will cause runtime errors.

engine_from_config returns a synchronous Engine, but the code attempts to use it with async with connectable.connect() patterns. This will fail at runtime because a sync engine doesn't support async context managers.

For async migrations with aiosqlite, use the run_sync pattern:

 def run_migrations_online() -> None:
     """Run migrations in 'online' mode.

     In this scenario we need to create an Engine
     and associate a connection with the context.

     """
-    import asyncio
-
-    async def process_migrations():
-        connectable = engine_from_config(
-            config.get_section(config.config_ini_section, {}),
-            prefix="sqlalchemy.",
-            poolclass=pool.NullPool,
-        )
-
-        async with connectable.connect() as connection:
-            context.configure(
-                connection=connection, target_metadata=target_metadata
-            )
-
-            async with context.begin_transaction():
-                context.run_migrations()
-
-    asyncio.run(process_migrations())
+    from sqlalchemy.ext.asyncio import create_async_engine
+    import asyncio
+
+    async def run_async_migrations():
+        connectable = create_async_engine(
+            config.get_main_option("sqlalchemy.url"),
+            poolclass=pool.NullPool,
+        )
+
+        async with connectable.connect() as connection:
+            await connection.run_sync(do_run_migrations)
+
+        await connectable.dispose()
+
+    def do_run_migrations(connection):
+        context.configure(
+            connection=connection,
+            target_metadata=target_metadata,
+        )
+        with context.begin_transaction():
+            context.run_migrations()
+
+    asyncio.run(run_async_migrations())

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +9 to +10
def __init__(self, session_factory: Callable[..., AsyncSession]):
self.session_factory = session_factory
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Constructor signature conflicts with usage in orchestrator agent functions.

The constructor expects session_factory: Callable[..., AsyncSession], but in orchestrator.py (lines 79, 146, 181, 242), agents call ReportRepository(session) passing an AsyncSession directly, not a callable factory. This type mismatch will cause runtime errors when methods try to call self.session_factory().

Either update the orchestrator to pass a factory, or provide an overloaded constructor:

 class ReportRepository:
-    def __init__(self, session_factory: Callable[..., AsyncSession]):
-        self.session_factory = session_factory
+    def __init__(self, session_or_factory: AsyncSession | Callable[..., AsyncSession]):
+        if callable(session_or_factory) and not isinstance(session_or_factory, AsyncSession):
+            self.session_factory = session_or_factory
+        else:
+            # Wrap a single session in a context manager
+            self._session = session_or_factory
+            self.session_factory = self._wrap_session
+
+    @contextlib.asynccontextmanager
+    async def _wrap_session(self):
+        yield self._session

Alternatively, update orchestrator agent functions to pass AsyncSessionLocal instead of session.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/app/db/repositories/report_repository.py around lines 9-10, the
constructor currently types and treats session_factory as Callable[...,
AsyncSession] but orchestrator passes an AsyncSession instance; change the
constructor to accept either a Callable[..., AsyncSession] or an AsyncSession
(use Union[Callable[..., AsyncSession], AsyncSession]) and normalize it: if
given a callable store it as self.session_factory, otherwise wrap the provided
AsyncSession in a zero-arg factory (e.g., a lambda returning that session) so
existing methods can safely call self.session_factory(); update the type hints
and any imports accordingly so runtime calls to self.session_factory() work
without changing orchestrator, or alternatively update the orchestrator to pass
the actual session factory (AsyncSessionLocal) at the call sites (lines 79, 146,
181, 242).

@felixjordandev felixjordandev merged commit 37e4dca into main Nov 25, 2025
1 check passed
@felixjordandev felixjordandev deleted the feat/orchestrator-report-state-updates branch November 25, 2025 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants