
Conversation

@elisafalk (Collaborator) commented Nov 20, 2025

Overview: This PR introduces a new function to summarize team and documentation data, providing a structured analysis for reports.

Changes

  • Implemented generate_team_doc_text within backend/app/services/agents/team_doc_agent.py (see the usage sketch after this list).
  • Utilizes LLM prompts to analyze scraped text, focusing on roles, experience, and credibility.
  • Ensures the output is structured for easy integration into the final report.
  • Added new prompt templates in backend/app/services/nlg/prompt_templates.py to support the summarization.
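
As an illustration, here is a minimal, hypothetical usage sketch of the new method. The no-argument TeamDocAgent() construction mirrors the repo's own test; the input field names are illustrative assumptions, not taken from the PR:

```python
# Hypothetical usage sketch; input field names are illustrative assumptions.
import asyncio

from backend.app.services.agents.team_doc_agent import TeamDocAgent


async def main() -> None:
    team_data = [
        {"name": "Jane Doe", "role": "CTO", "experience": "10 years in fintech"},
    ]
    doc_data = {"title": "Project Whitepaper", "sections": ["Intro", "Tokenomics"]}

    agent = TeamDocAgent()
    # Returns a single Markdown string with four "### ..." sections:
    # roles, experience, credibility, and documentation strength.
    report = await agent.generate_team_doc_text(team_data, doc_data)
    print(report)


asyncio.run(main())
```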

Summary by CodeRabbit

  • New Features

    • Introduced automated team documentation generation that produces a multi-section analysis covering team roles, experience, credibility, and documentation strength.
  • Enhancements

    • Added structured prompt templates and a template rendering utility to improve and standardize generated summaries.
  • Tests

    • Added unit tests validating the multi-section team documentation generation flow.


@coderabbitai bot commented Nov 20, 2025

Walkthrough

A new asynchronous method generate_team_doc_text was added to TeamDocAgent to produce a four-part team/document analysis by building prompts from new templates and performing four sequential LLM calls. New prompt templates and a fill_template utility were introduced, and an async unit test was added to validate the flow.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TeamDocAgent Enhancement / backend/app/services/agents/team_doc_agent.py | Added async def generate_team_doc_text(self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]) -> str, which builds four prompts (roles, experience, credibility, documentation strength) from templates, generates text via LLMClient for each section, and concatenates Markdown-headed sections into a single string. Imports for LLM interaction and templates added. |
| Test Coverage / backend/app/services/agents/tests/test_team_doc_agent_new_feature.py | New async unit test that patches LLMClient as an async context manager, stubs four sequential generate_text responses, invokes generate_team_doc_text, asserts the combined Markdown output, and verifies that four LLM calls were made. |
| Prompt Template System / backend/app/services/nlg/prompt_templates.py | Added the templates team_roles_summary, team_experience_summary, team_credibility_summary, and documentation_strength_summary, plus a fill_template(template: str, **kwargs) -> str utility for rendering templates with keyword arguments. |
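
To make the template flow concrete, here is a minimal sketch of how the registry and fill_template could fit together. Only the two signatures above come from the PR; the template wording and the dict-backed get_template are assumptions:

```python
# Sketch only: the real module defines four templates; wording here is assumed.
_TEMPLATES = {
    "team_roles_summary": (
        "Summarize the roles and responsibilities of this team:\n{team_data}"
    ),
    "documentation_strength_summary": (
        "Assess the strength and completeness of this documentation:\n{doc_data}"
    ),
}


def get_template(name: str) -> str:
    """Return the raw prompt template registered under the given name."""
    return _TEMPLATES[name]


def fill_template(template: str, **kwargs) -> str:
    """Render a template's {placeholders} from keyword arguments."""
    return template.format(**kwargs)


# Example: build the prompt the agent would send for the roles section.
prompt = fill_template(
    get_template("team_roles_summary"),
    team_data='[{"name": "Jane Doe", "role": "CTO"}]',
)
```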

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant TeamDocAgent
    participant PromptTemplates
    participant LLMClient
    participant Output

    Caller->>TeamDocAgent: generate_team_doc_text(team_data, doc_data)
    TeamDocAgent->>PromptTemplates: get_template("team_roles_summary")
    PromptTemplates-->>TeamDocAgent: template_1
    TeamDocAgent->>PromptTemplates: fill_template(template_1, team_data)
    PromptTemplates-->>TeamDocAgent: prompt_1
    TeamDocAgent->>LLMClient: async with LLMClient() -> generate_text(prompt_1)
    LLMClient-->>TeamDocAgent: response_1

    TeamDocAgent->>PromptTemplates: get_template("team_experience_summary")
    PromptTemplates-->>TeamDocAgent: template_2
    TeamDocAgent->>PromptTemplates: fill_template(template_2, team_data)
    PromptTemplates-->>TeamDocAgent: prompt_2
    TeamDocAgent->>LLMClient: generate_text(prompt_2)
    LLMClient-->>TeamDocAgent: response_2

    TeamDocAgent->>PromptTemplates: get_template("team_credibility_summary")
    PromptTemplates-->>TeamDocAgent: template_3
    TeamDocAgent->>PromptTemplates: fill_template(template_3, team_data)
    PromptTemplates-->>TeamDocAgent: prompt_3
    TeamDocAgent->>LLMClient: generate_text(prompt_3)
    LLMClient-->>TeamDocAgent: response_3

    TeamDocAgent->>PromptTemplates: get_template("documentation_strength_summary")
    PromptTemplates-->>TeamDocAgent: template_4
    TeamDocAgent->>PromptTemplates: fill_template(template_4, doc_data)
    PromptTemplates-->>TeamDocAgent: prompt_4
    TeamDocAgent->>LLMClient: generate_text(prompt_4)
    LLMClient-->>TeamDocAgent: response_4

    TeamDocAgent->>Output: concatenate headings + response_1..4
    Output-->>Caller: final_markdown_text
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • Areas to focus:
    • Correct async context manager usage and error propagation around LLMClient.
    • Template placeholders vs. fill_template argument names and escaping.
    • Test mocking of LLMClient as an async context manager, and verification of the four calls and final string formatting (see the sketch after this list).
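
For reference on that last item, here is a minimal sketch of the async-context-manager mocking pattern (assumes pytest, pytest-asyncio, and pytest-mock; the patch target mirrors the test file described above):

```python
import pytest
from unittest.mock import AsyncMock


@pytest.mark.asyncio
async def test_four_llm_calls_are_made(mocker):
    # Patch LLMClient where it is looked up, i.e. in the agent's module.
    mock_client_class = mocker.patch(
        "backend.app.services.agents.team_doc_agent.LLMClient"
    )
    mock_client = AsyncMock()
    # `async with LLMClient() as client:` yields this instance.
    mock_client_class.return_value.__aenter__.return_value = mock_client
    mock_client.generate_text.return_value = {
        "choices": [{"message": {"content": "stub section"}}]
    }

    from backend.app.services.agents.team_doc_agent import TeamDocAgent

    result = await TeamDocAgent().generate_team_doc_text([], {})

    assert mock_client.generate_text.await_count == 4
    assert "stub section" in result
```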

Poem

🐰 Whiskers twitch, prompts alight,

Four little calls in the quiet night.
I stitch team tales, section by section,
Hopping through templates with gentle affection.
Docs bloom bright—a rabbit's reflection ✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main feature being added: team documentation summarization via the new generate_team_doc_text method. |
| Docstring coverage | ✅ Passed | Docstring coverage is 100.00%, above the required threshold of 80.00%. |

@coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (4)
backend/app/services/nlg/prompt_templates.py (1)

106-111: Add error handling for missing template placeholders.

The fill_template function will raise KeyError if a template contains placeholders that aren't provided in kwargs. Consider adding a try-except block to provide a more informative error message.

Apply this diff to add error handling:

```diff
 def fill_template(template: str, **kwargs) -> str:
     """
     Fills a given template with the provided data using keyword arguments.
     This allows for flexible placeholder names in the template.
     """
-    return template.format(**kwargs)
+    try:
+        return template.format(**kwargs)
+    except KeyError as e:
+        raise ValueError(f"Missing required template placeholder: {e}") from e
```
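
With this change, rendering a template without one of its required keywords raises a ValueError naming the missing placeholder (for example, ValueError: Missing required template placeholder: 'team_data') instead of a bare KeyError, which makes the failing template far easier to identify from logs.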
backend/app/services/agents/team_doc_agent.py (2)

30-71: Consider parallelizing independent LLM calls for better performance.

The four LLM calls are executed sequentially, meaning total execution time is the sum of all individual call times. Since these calls are independent, they could be parallelized using asyncio.gather() to significantly reduce total latency.

Here's how to parallelize the calls:

```python
async def generate_team_doc_text(self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]) -> str:
    """..."""
    import asyncio
    orchestrator_logger.info("Generating team and documentation analysis using LLM.")

    def extract_content(response: Dict[str, Any]) -> str:
        """Safely extract content from LLM response."""
        choices = response.get("choices", [])
        if not choices:
            return "N/A"
        return choices[0].get("message", {}).get("content", "N/A")

    try:
        async with LLMClient() as client:
            # Prepare all prompts
            team_data_json = json.dumps(team_data, indent=2)
            doc_data_json = json.dumps(doc_data, indent=2)

            prompts = [
                fill_template(get_template("team_roles_summary"), team_data=team_data_json),
                fill_template(get_template("team_experience_summary"), team_data=team_data_json),
                fill_template(get_template("team_credibility_summary"), team_data=team_data_json),
                fill_template(get_template("documentation_strength_summary"), doc_data=doc_data_json),
            ]

            # Execute all LLM calls in parallel
            responses = await asyncio.gather(
                *[client.generate_text(prompt) for prompt in prompts],
                return_exceptions=True
            )

            # Build output with headings
            headings = [
                "### Team Roles and Responsibilities\n",
                "### Team Experience and Expertise\n",
                "### Team Credibility\n",
                "### Documentation Strength\n"
            ]

            summary_parts = []
            for heading, response in zip(headings, responses):
                summary_parts.append(heading)
                if isinstance(response, Exception):
                    orchestrator_logger.error("LLM call failed: %s", response)
                    summary_parts.append("N/A")
                else:
                    summary_parts.append(extract_content(response))
                summary_parts.append("\n\n")

    except Exception as e:
        orchestrator_logger.error("Error generating team and documentation analysis: %s", e)
        return "Error: Failed to generate team and documentation analysis."

    return "".join(summary_parts)
```

32-65: Optional: Cache JSON serialization to avoid redundant work.

The json.dumps(team_data, indent=2) call is repeated three times (lines 34, 44, 54). While not a critical issue, caching this value would eliminate redundant serialization work.

Apply this diff:

```diff
         orchestrator_logger.info("Generating team and documentation analysis using LLM.")
         summary_parts = []
+        team_data_json = json.dumps(team_data, indent=2)
+        doc_data_json = json.dumps(doc_data, indent=2)

         async with LLMClient() as client:
             # Summarize Team Roles
             team_roles_prompt = fill_template(
                 get_template("team_roles_summary"),
-                team_data=json.dumps(team_data, indent=2)
+                team_data=team_data_json
             )
             ...
             team_experience_prompt = fill_template(
                 get_template("team_experience_summary"),
-                team_data=json.dumps(team_data, indent=2)
+                team_data=team_data_json
             )
             ...
             team_credibility_prompt = fill_template(
                 get_template("team_credibility_summary"),
-                team_data=json.dumps(team_data, indent=2)
+                team_data=team_data_json
             )
             ...
             doc_strength_prompt = fill_template(
                 get_template("documentation_strength_summary"),
-                doc_data=json.dumps(doc_data, indent=2)
+                doc_data=doc_data_json
             )
```
backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (1)

28-33: Remove extraneous comments.

The # Corrected comments on lines 29-32 are leftovers from an earlier fix and add nothing to understanding the test; remove them.

Apply this diff:

```diff
     # Mock LLM responses
     mock_llm_client_instance.generate_text.side_effect = [
-        {"choices": [{"message": {"content": "Summary of team roles."}}]}, # Corrected
-        {"choices": [{"message": {"content": "Summary of team experience."}}]}, # Corrected
-        {"choices": [{"message": {"content": "Summary of team credibility."}}]}, # Corrected
-        {"choices": [{"message": {"content": "Summary of documentation strength."}}]}, # Corrected
+        {"choices": [{"message": {"content": "Summary of team roles."}}]},
+        {"choices": [{"message": {"content": "Summary of team experience."}}]},
+        {"choices": [{"message": {"content": "Summary of team credibility."}}]},
+        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},
     ]
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4896014 and 053d9a6.

⛔ Files ignored due to path filters (4)
  • backend/app/services/agents/__pycache__/team_doc_agent.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/__pycache__/prompt_templates.cpython-313.pyc is excluded by !**/*.pyc
  • backend/logs/app.log is excluded by !**/*.log
📒 Files selected for processing (3)
  • backend/app/services/agents/team_doc_agent.py (1 hunks)
  • backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (1 hunks)
  • backend/app/services/nlg/prompt_templates.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/services/agents/team_doc_agent.py (2)
backend/app/services/nlg/llm_client.py (2)
  • LLMClient (9-55)
  • generate_text (30-55)
backend/app/services/nlg/prompt_templates.py (2)
  • get_template (6-104)
  • fill_template (106-111)
backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (2)
backend/app/services/agents/team_doc_agent.py (2)
  • TeamDocAgent (10-161)
  • generate_team_doc_text (15-71)
backend/app/services/nlg/llm_client.py (1)
  • generate_text (30-55)
🔇 Additional comments (3)
backend/app/services/nlg/prompt_templates.py (1)

69-102: LGTM! Well-structured prompt templates.

The four new templates are clear, well-organized, and appropriately scoped for their respective analysis tasks.

backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (2)

5-26: LGTM! Well-structured test setup.

The test correctly mocks LLMClient as an async context manager and provides appropriate test data for validation.


43-51: Test doesn't cover a latent production code bug.

This test asserts the output matches expected_output, but the production code at backend/app/services/agents/team_doc_agent.py lines 38, 48, 58, 68 has an IndexError risk due to unsafe list indexing. The test's mock data avoids triggering this bug (by always providing non-empty choices), so it doesn't validate the error-handling path or edge cases where the API might return empty responses.

Consider adding additional test cases to cover:

  1. Empty choices list in API response
  2. Missing message or content fields
  3. LLM client exceptions

Example additional test:

```python
import pytest
from unittest.mock import AsyncMock


@pytest.mark.asyncio
async def test_generate_team_doc_text_handles_empty_response(mocker):
    mock_llm_client_class = mocker.patch('backend.app.services.agents.team_doc_agent.LLMClient')
    mock_llm_client_instance = AsyncMock()
    mock_llm_client_class.return_value.__aenter__.return_value = mock_llm_client_instance

    from backend.app.services.agents.team_doc_agent import TeamDocAgent
    agent = TeamDocAgent()

    # Mock an empty response for the first call
    mock_llm_client_instance.generate_text.side_effect = [
        {"choices": []},  # Empty choices should not crash
        {"choices": [{"message": {"content": "Summary of team experience."}}]},
        {"choices": [{"message": {"content": "Summary of team credibility."}}]},
        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},
    ]

    result = await agent.generate_team_doc_text([], {})

    # Should handle gracefully, not crash
    assert "Team Roles and Responsibilities" in result
```

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/services/agents/team_doc_agent.py (2)

29-96: Consider extracting a helper function to reduce duplication.

The four LLM call blocks (Team Roles, Team Experience, Team Credibility, Documentation Strength) follow an identical pattern. Extracting this logic into a helper function would improve maintainability and reduce the ~70 lines of repetitive code.

Here's a suggested refactor:

```diff
+    async def _generate_section(
+        self, client: LLMClient, template_name: str, section_header: str, **template_kwargs
+    ) -> str:
+        """Helper to generate a single analysis section."""
+        prompt = fill_template(get_template(template_name), **template_kwargs)
+        try:
+            response = await client.generate_text(prompt)
+            choices = response.get("choices") or []
+            content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
+            return f"### {section_header}\n{content}\n\n"
+        except Exception as e:
+            orchestrator_logger.error(f"Error generating {section_header}: {e}")
+            return f"### {section_header}\nN/A (Failed to generate {section_header.lower()})\n\n"
+
     async def generate_team_doc_text(self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]) -> str:
         """
         Summarizes team roles, experience, credibility, and documentation strength
         using LLM prompts to turn scraped text into a readable analysis.

         Args:
             team_data: A list of dictionaries, each representing a team member's profile.
             doc_data: A dictionary containing extracted whitepaper/documentation details.

         Returns:
             A structured string containing the summarized analysis.
         """
         orchestrator_logger.info("Generating team and documentation analysis using LLM.")
-        summary_parts = []
+        team_data_json = json.dumps(team_data, indent=2)
+        doc_data_json = json.dumps(doc_data, indent=2)

         async with LLMClient() as client:
-            # Summarize Team Roles
-            team_roles_prompt = fill_template(
-                get_template("team_roles_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_roles_response = await client.generate_text(team_roles_prompt)
-                choices = team_roles_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Roles and Responsibilities\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team roles summary: {e}")
-                summary_parts.append("### Team Roles and Responsibilities\n")
-                summary_parts.append("N/A (Failed to generate team roles summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Team Experience
-            team_experience_prompt = fill_template(
-                get_template("team_experience_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_experience_response = await client.generate_text(team_experience_prompt)
-                choices = team_experience_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Experience and Expertise\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team experience summary: {e}")
-                summary_parts.append("### Team Experience and Expertise\n")
-                summary_parts.append("N/A (Failed to generate team experience summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Team Credibility
-            team_credibility_prompt = fill_template(
-                get_template("team_credibility_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_credibility_response = await client.generate_text(team_credibility_prompt)
-                choices = team_credibility_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Credibility\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team credibility summary: {e}")
-                summary_parts.append("### Team Credibility\n")
-                summary_parts.append("N/A (Failed to generate team credibility summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Documentation Strength
-            doc_strength_prompt = fill_template(
-                get_template("documentation_strength_summary"),
-                doc_data=json.dumps(doc_data, indent=2)
-            )
-            try:
-                doc_strength_response = await client.generate_text(doc_strength_prompt)
-                choices = doc_strength_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Documentation Strength\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating documentation strength summary: {e}")
-                summary_parts.append("### Documentation Strength\n")
-                summary_parts.append("N/A (Failed to generate documentation strength summary)")
-            summary_parts.append("\n\n")
-
-        return "".join(summary_parts)
+            roles = await self._generate_section(
+                client, "team_roles_summary", "Team Roles and Responsibilities", team_data=team_data_json
+            )
+            experience = await self._generate_section(
+                client, "team_experience_summary", "Team Experience and Expertise", team_data=team_data_json
+            )
+            credibility = await self._generate_section(
+                client, "team_credibility_summary", "Team Credibility", team_data=team_data_json
+            )
+            docs = await self._generate_section(
+                client, "documentation_strength_summary", "Documentation Strength", doc_data=doc_data_json
+            )
+
+        return roles + experience + credibility + docs
```

29-96: Consider parallelizing LLM calls for better performance.

The four LLM calls are currently executed sequentially, which means the total execution time is the sum of all individual call times. Since these calls are independent, they could be executed in parallel using asyncio.gather() to reduce the total time to approximately the duration of the slowest call.

If you implement the helper function refactor suggested above, you can easily parallelize like this:

```python
async with LLMClient() as client:
    results = await asyncio.gather(
        self._generate_section(
            client, "team_roles_summary", "Team Roles and Responsibilities", team_data=team_data_json
        ),
        self._generate_section(
            client, "team_experience_summary", "Team Experience and Expertise", team_data=team_data_json
        ),
        self._generate_section(
            client, "team_credibility_summary", "Team Credibility", team_data=team_data_json
        ),
        self._generate_section(
            client, "documentation_strength_summary", "Documentation Strength", doc_data=doc_data_json
        ),
        return_exceptions=False  # Let exceptions propagate from individual sections
    )

return "".join(results)
```

Note: You'll need to add import asyncio at the top of the file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 053d9a6 and 4246453.

⛔ Files ignored due to path filters (4)
  • backend/app/services/agents/__pycache__/team_doc_agent.cpython-313.pyc is excluded by !**/*.pyc
  • backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.4.2.pyc is excluded by !**/*.pyc
  • backend/app/services/nlg/__pycache__/prompt_templates.cpython-313.pyc is excluded by !**/*.pyc
  • backend/logs/app.log is excluded by !**/*.log
📒 Files selected for processing (1)
  • backend/app/services/agents/team_doc_agent.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/agents/team_doc_agent.py (2)
backend/app/services/nlg/llm_client.py (2)
  • LLMClient (9-55)
  • generate_text (30-55)
backend/app/services/nlg/prompt_templates.py (2)
  • get_template (6-104)
  • fill_template (106-111)
🪛 Ruff (0.14.5)
backend/app/services/agents/team_doc_agent.py

  41-41: Do not catch blind exception: Exception (BLE001)
  58-58: Do not catch blind exception: Exception (BLE001)
  75-75: Do not catch blind exception: Exception (BLE001)
  92-92: Do not catch blind exception: Exception (BLE001)

🔇 Additional comments (3)
backend/app/services/agents/team_doc_agent.py (3)

6-7: LGTM!

The new imports are necessary for the LLM integration and are properly used throughout the generate_team_doc_text method.


14-25: LGTM!

The method signature is well-defined with proper type hints, and the docstring clearly explains the purpose, parameters, and return value.


41-41: Static analysis warning about broad exception catching is acceptable here.

The static analysis tool flags catching bare Exception on these lines. However, this is intentional and appropriate for this use case because:

  1. The LLM client can raise multiple exception types (network errors, API errors, timeouts, etc.)
  2. Each catch block properly logs the specific error for debugging
  3. The method gracefully degrades by returning a clear fallback message
  4. We want to ensure the method completes even if individual sections fail

The current error handling strikes a good balance between robustness and user experience.

Also applies to: 58-58, 75-75, 92-92
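
If the team prefers to keep the broad catches while silencing the linter, one lightweight option is a per-line Ruff suppression on each catch; the helper below is a sketch under that assumption, not code from this PR:

```python
import logging

logger = logging.getLogger(__name__)


async def call_llm_safely(client, prompt: str) -> str:
    """One guarded LLM call; the broad catch is deliberate so a single
    failed section cannot abort the whole report."""
    try:
        response = await client.generate_text(prompt)
    except Exception as e:  # noqa: BLE001 - intentional graceful degradation
        logger.error("LLM call failed: %s", e)
        return "N/A"
    choices = response.get("choices") or []
    return choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
```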

@felixjordandev merged commit f006f60 into main on Nov 20, 2025
1 check passed
@felixjordandev deleted the feat/summarize-team-docs branch on November 20, 2025 at 17:54