Feat: Add team documentation summarization #50
Conversation
Walkthrough
A new asynchronous method, `generate_team_doc_text`, is added to `TeamDocAgent`; it builds four prompts from templates, calls the LLM client for each, and concatenates the headed summaries into a single Markdown analysis (see the sequence diagram below).
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant TeamDocAgent
    participant PromptTemplates
    participant LLMClient
    participant Output

    Caller->>TeamDocAgent: generate_team_doc_text(team_data, doc_data)
    TeamDocAgent->>PromptTemplates: get_template("team_roles_summary")
    PromptTemplates-->>TeamDocAgent: template_1
    TeamDocAgent->>PromptTemplates: fill_template(template_1, team_data, doc_data)
    PromptTemplates-->>TeamDocAgent: prompt_1
    TeamDocAgent->>LLMClient: async with LLMClient() -> generate_text(prompt_1)
    LLMClient-->>TeamDocAgent: response_1
    TeamDocAgent->>PromptTemplates: get_template("team_experience_summary")
    PromptTemplates-->>TeamDocAgent: template_2
    TeamDocAgent->>PromptTemplates: fill_template(template_2, ...)
    PromptTemplates-->>TeamDocAgent: prompt_2
    TeamDocAgent->>LLMClient: generate_text(prompt_2)
    LLMClient-->>TeamDocAgent: response_2
    TeamDocAgent->>PromptTemplates: get_template("team_credibility_summary")
    PromptTemplates-->>TeamDocAgent: template_3
    TeamDocAgent->>LLMClient: generate_text(prompt_3)
    LLMClient-->>TeamDocAgent: response_3
    TeamDocAgent->>PromptTemplates: get_template("documentation_strength_summary")
    PromptTemplates-->>TeamDocAgent: template_4
    TeamDocAgent->>LLMClient: generate_text(prompt_4)
    LLMClient-->>TeamDocAgent: response_4
    TeamDocAgent->>Output: concatenate headings + response_1..4
    Output-->>Caller: final_markdown_text
```
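For orientation, the flow above corresponds roughly to the following condensed Python sketch, assembled from the code excerpts quoted later in this review. It omits the per-section error handling and logging of the actual `team_doc_agent.py`, so treat it as illustrative rather than a copy of the implementation.

```python
# Condensed sketch of generate_team_doc_text (error handling/logging omitted).
import json
from typing import Any, Dict, List

from backend.app.services.nlg.llm_client import LLMClient
from backend.app.services.nlg.prompt_templates import fill_template, get_template


class TeamDocAgent:
    async def generate_team_doc_text(
        self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]
    ) -> str:
        team_json = json.dumps(team_data, indent=2)
        doc_json = json.dumps(doc_data, indent=2)
        # (heading, template name, template kwargs) for the four sections
        sections = [
            ("Team Roles and Responsibilities", "team_roles_summary", {"team_data": team_json}),
            ("Team Experience and Expertise", "team_experience_summary", {"team_data": team_json}),
            ("Team Credibility", "team_credibility_summary", {"team_data": team_json}),
            ("Documentation Strength", "documentation_strength_summary", {"doc_data": doc_json}),
        ]
        parts: List[str] = []
        async with LLMClient() as client:
            for heading, template_name, kwargs in sections:
                prompt = fill_template(get_template(template_name), **kwargs)
                response = await client.generate_text(prompt)
                choices = response.get("choices") or []
                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
                parts.append(f"### {heading}\n{content}\n\n")
        return "".join(parts)
```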
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (4)
backend/app/services/nlg/prompt_templates.py (1)
106-111: Add error handling for missing template placeholders.

The `fill_template` function will raise `KeyError` if a template contains placeholders that aren't provided in `kwargs`. Consider adding a try-except block to provide a more informative error message.

Apply this diff to add error handling:

```diff
 def fill_template(template: str, **kwargs) -> str:
     """
     Fills a given template with the provided data using keyword arguments.
     This allows for flexible placeholder names in the template.
     """
-    return template.format(**kwargs)
+    try:
+        return template.format(**kwargs)
+    except KeyError as e:
+        raise ValueError(f"Missing required template placeholder: {e}") from e
```

backend/app/services/agents/team_doc_agent.py (2)
30-71: Consider parallelizing independent LLM calls for better performance.

The four LLM calls are executed sequentially, meaning total execution time is the sum of all individual call times. Since these calls are independent, they could be parallelized using `asyncio.gather()` to significantly reduce total latency.

Here's how to parallelize the calls:

```python
async def generate_team_doc_text(self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]) -> str:
    """..."""
    import asyncio

    orchestrator_logger.info("Generating team and documentation analysis using LLM.")

    def extract_content(response: Dict[str, Any]) -> str:
        """Safely extract content from LLM response."""
        choices = response.get("choices", [])
        if not choices:
            return "N/A"
        return choices[0].get("message", {}).get("content", "N/A")

    try:
        async with LLMClient() as client:
            # Prepare all prompts
            team_data_json = json.dumps(team_data, indent=2)
            doc_data_json = json.dumps(doc_data, indent=2)

            prompts = [
                fill_template(get_template("team_roles_summary"), team_data=team_data_json),
                fill_template(get_template("team_experience_summary"), team_data=team_data_json),
                fill_template(get_template("team_credibility_summary"), team_data=team_data_json),
                fill_template(get_template("documentation_strength_summary"), doc_data=doc_data_json),
            ]

            # Execute all LLM calls in parallel
            responses = await asyncio.gather(
                *[client.generate_text(prompt) for prompt in prompts],
                return_exceptions=True
            )

            # Build output with headings
            headings = [
                "### Team Roles and Responsibilities\n",
                "### Team Experience and Expertise\n",
                "### Team Credibility\n",
                "### Documentation Strength\n"
            ]

            summary_parts = []
            for heading, response in zip(headings, responses):
                summary_parts.append(heading)
                if isinstance(response, Exception):
                    orchestrator_logger.error("LLM call failed: %s", response)
                    summary_parts.append("N/A")
                else:
                    summary_parts.append(extract_content(response))
                summary_parts.append("\n\n")
    except Exception as e:
        orchestrator_logger.error("Error generating team and documentation analysis: %s", e)
        return "Error: Failed to generate team and documentation analysis."

    return "".join(summary_parts)
```
32-65: Optional: Cache JSON serialization to avoid redundant work.

The `json.dumps(team_data, indent=2)` call is repeated three times (lines 34, 44, 54). While not a critical issue, caching this value would eliminate redundant serialization work.

Apply this diff:

```diff
     orchestrator_logger.info("Generating team and documentation analysis using LLM.")
     summary_parts = []
+    team_data_json = json.dumps(team_data, indent=2)
+    doc_data_json = json.dumps(doc_data, indent=2)

     async with LLMClient() as client:
         # Summarize Team Roles
         team_roles_prompt = fill_template(
             get_template("team_roles_summary"),
-            team_data=json.dumps(team_data, indent=2)
+            team_data=team_data_json
         )
         ...
         team_experience_prompt = fill_template(
             get_template("team_experience_summary"),
-            team_data=json.dumps(team_data, indent=2)
+            team_data=team_data_json
         )
         ...
         team_credibility_prompt = fill_template(
             get_template("team_credibility_summary"),
-            team_data=json.dumps(team_data, indent=2)
+            team_data=team_data_json
         )
         ...
         doc_strength_prompt = fill_template(
             get_template("documentation_strength_summary"),
-            doc_data=json.dumps(doc_data, indent=2)
+            doc_data=doc_data_json
         )
```

backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (1)
28-33: Remove extraneous comments.

The `# Corrected` comments on lines 29-32 suggest these values were previously incorrect; they should now be removed as they don't add value to understanding the test.

Apply this diff:

```diff
     # Mock LLM responses
     mock_llm_client_instance.generate_text.side_effect = [
-        {"choices": [{"message": {"content": "Summary of team roles."}}]},  # Corrected
-        {"choices": [{"message": {"content": "Summary of team experience."}}]},  # Corrected
-        {"choices": [{"message": {"content": "Summary of team credibility."}}]},  # Corrected
-        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},  # Corrected
+        {"choices": [{"message": {"content": "Summary of team roles."}}]},
+        {"choices": [{"message": {"content": "Summary of team experience."}}]},
+        {"choices": [{"message": {"content": "Summary of team credibility."}}]},
+        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},
     ]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `backend/app/services/agents/__pycache__/team_doc_agent.cpython-313.pyc` is excluded by `!**/*.pyc`
- `backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.4.2.pyc` is excluded by `!**/*.pyc`
- `backend/app/services/nlg/__pycache__/prompt_templates.cpython-313.pyc` is excluded by `!**/*.pyc`
- `backend/logs/app.log` is excluded by `!**/*.log`
📒 Files selected for processing (3)
- `backend/app/services/agents/team_doc_agent.py` (1 hunks)
- `backend/app/services/agents/tests/test_team_doc_agent_new_feature.py` (1 hunks)
- `backend/app/services/nlg/prompt_templates.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/services/agents/team_doc_agent.py (2)
backend/app/services/nlg/llm_client.py (2)
- `LLMClient` (9-55)
- `generate_text` (30-55)

backend/app/services/nlg/prompt_templates.py (2)
- `get_template` (6-104)
- `fill_template` (106-111)
backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (2)
backend/app/services/agents/team_doc_agent.py (2)
- `TeamDocAgent` (10-161)
- `generate_team_doc_text` (15-71)

backend/app/services/nlg/llm_client.py (1)
- `generate_text` (30-55)
🔇 Additional comments (3)
backend/app/services/nlg/prompt_templates.py (1)
69-102: LGTM! Well-structured prompt templates.

The four new templates are clear, well-organized, and appropriately scoped for their respective analysis tasks.
backend/app/services/agents/tests/test_team_doc_agent_new_feature.py (2)
5-26: LGTM! Well-structured test setup.

The test correctly mocks `LLMClient` as an async context manager and provides appropriate test data for validation.
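For readers unfamiliar with the pattern, mocking an async context manager this way typically looks like the sketch below (using pytest-mock's `mocker` fixture and `unittest.mock.AsyncMock`, as the PR's test does; the exact test data and assertions in the real test differ):

```python
from unittest.mock import AsyncMock

import pytest


@pytest.mark.asyncio
async def test_generate_team_doc_text(mocker):
    # Patch LLMClient where team_doc_agent imports it.
    mock_client_class = mocker.patch(
        "backend.app.services.agents.team_doc_agent.LLMClient"
    )
    # "async with LLMClient() as client" yields whatever __aenter__ returns,
    # so wire the mocked instance through the async context manager protocol.
    mock_client = AsyncMock()
    mock_client_class.return_value.__aenter__.return_value = mock_client

    # Each awaited generate_text call pops the next canned response.
    mock_client.generate_text.side_effect = [
        {"choices": [{"message": {"content": "Summary of team roles."}}]},
        {"choices": [{"message": {"content": "Summary of team experience."}}]},
        {"choices": [{"message": {"content": "Summary of team credibility."}}]},
        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},
    ]

    from backend.app.services.agents.team_doc_agent import TeamDocAgent

    result = await TeamDocAgent().generate_team_doc_text([], {})
    assert "Team Roles and Responsibilities" in result
```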
43-51: Test will fail due to production code bug.

This test expects the output to match `expected_output`, but the production code at `backend/app/services/agents/team_doc_agent.py` lines 38, 48, 58, 68 has an IndexError risk due to unsafe list indexing. While the test's mock data avoids triggering this bug (by providing non-empty `choices`), the test doesn't validate the error-handling path or edge cases where the API might return empty responses.

Consider adding additional test cases to cover:

- Empty `choices` list in API response
- Missing `message` or `content` fields
- LLM client exceptions

Example additional test:

```python
@pytest.mark.asyncio
async def test_generate_team_doc_text_handles_empty_response(mocker):
    mock_llm_client_class = mocker.patch('backend.app.services.agents.team_doc_agent.LLMClient')
    mock_llm_client_instance = AsyncMock()
    mock_llm_client_class.return_value.__aenter__.return_value = mock_llm_client_instance

    from backend.app.services.agents.team_doc_agent import TeamDocAgent
    agent = TeamDocAgent()

    # Mock empty response
    mock_llm_client_instance.generate_text.side_effect = [
        {"choices": []},  # Empty choices should not crash
        {"choices": [{"message": {"content": "Summary of team experience."}}]},
        {"choices": [{"message": {"content": "Summary of team credibility."}}]},
        {"choices": [{"message": {"content": "Summary of documentation strength."}}]},
    ]

    result = await agent.generate_team_doc_text([], {})

    # Should handle gracefully, not crash
    assert "Team Roles and Responsibilities" in result
```
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/app/services/agents/team_doc_agent.py (2)
29-96: Consider extracting a helper function to reduce duplication.

The four LLM call blocks (Team Roles, Team Experience, Team Credibility, Documentation Strength) follow an identical pattern. Extracting this logic into a helper function would improve maintainability and reduce the ~70 lines of repetitive code.

Here's a suggested refactor:

```diff
+    async def _generate_section(
+        self, client: LLMClient, template_name: str, section_header: str, **template_kwargs
+    ) -> str:
+        """Helper to generate a single analysis section."""
+        prompt = fill_template(get_template(template_name), **template_kwargs)
+        try:
+            response = await client.generate_text(prompt)
+            choices = response.get("choices") or []
+            content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
+            return f"### {section_header}\n{content}\n\n"
+        except Exception as e:
+            orchestrator_logger.error(f"Error generating {section_header}: {e}")
+            return f"### {section_header}\nN/A (Failed to generate {section_header.lower()})\n\n"
+
     async def generate_team_doc_text(self, team_data: List[Dict[str, Any]], doc_data: Dict[str, Any]) -> str:
         """
         Summarizes team roles, experience, credibility, and documentation strength
         using LLM prompts to turn scraped text into a readable analysis.

         Args:
             team_data: A list of dictionaries, each representing a team member's profile.
             doc_data: A dictionary containing extracted whitepaper/documentation details.

         Returns:
             A structured string containing the summarized analysis.
         """
         orchestrator_logger.info("Generating team and documentation analysis using LLM.")
-        summary_parts = []
+        team_data_json = json.dumps(team_data, indent=2)
+        doc_data_json = json.dumps(doc_data, indent=2)

         async with LLMClient() as client:
-            # Summarize Team Roles
-            team_roles_prompt = fill_template(
-                get_template("team_roles_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_roles_response = await client.generate_text(team_roles_prompt)
-                choices = team_roles_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Roles and Responsibilities\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team roles summary: {e}")
-                summary_parts.append("### Team Roles and Responsibilities\n")
-                summary_parts.append("N/A (Failed to generate team roles summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Team Experience
-            team_experience_prompt = fill_template(
-                get_template("team_experience_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_experience_response = await client.generate_text(team_experience_prompt)
-                choices = team_experience_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Experience and Expertise\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team experience summary: {e}")
-                summary_parts.append("### Team Experience and Expertise\n")
-                summary_parts.append("N/A (Failed to generate team experience summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Team Credibility
-            team_credibility_prompt = fill_template(
-                get_template("team_credibility_summary"),
-                team_data=json.dumps(team_data, indent=2)
-            )
-            try:
-                team_credibility_response = await client.generate_text(team_credibility_prompt)
-                choices = team_credibility_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Team Credibility\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating team credibility summary: {e}")
-                summary_parts.append("### Team Credibility\n")
-                summary_parts.append("N/A (Failed to generate team credibility summary)")
-            summary_parts.append("\n\n")
-
-            # Summarize Documentation Strength
-            doc_strength_prompt = fill_template(
-                get_template("documentation_strength_summary"),
-                doc_data=json.dumps(doc_data, indent=2)
-            )
-            try:
-                doc_strength_response = await client.generate_text(doc_strength_prompt)
-                choices = doc_strength_response.get("choices") or []
-                content = choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
-                summary_parts.append("### Documentation Strength\n")
-                summary_parts.append(content)
-            except Exception as e:
-                orchestrator_logger.error(f"Error generating documentation strength summary: {e}")
-                summary_parts.append("### Documentation Strength\n")
-                summary_parts.append("N/A (Failed to generate documentation strength summary)")
-            summary_parts.append("\n\n")
-
-        return "".join(summary_parts)
+            roles = await self._generate_section(
+                client, "team_roles_summary", "Team Roles and Responsibilities", team_data=team_data_json
+            )
+            experience = await self._generate_section(
+                client, "team_experience_summary", "Team Experience and Expertise", team_data=team_data_json
+            )
+            credibility = await self._generate_section(
+                client, "team_credibility_summary", "Team Credibility", team_data=team_data_json
+            )
+            docs = await self._generate_section(
+                client, "documentation_strength_summary", "Documentation Strength", doc_data=doc_data_json
+            )
+
+            return roles + experience + credibility + docs
```
29-96: Consider parallelizing LLM calls for better performance.

The four LLM calls are currently executed sequentially, which means the total execution time is the sum of all individual call times. Since these calls are independent, they could be executed in parallel using `asyncio.gather()` to reduce the total time to approximately the duration of the slowest call.

If you implement the helper function refactor suggested above, you can easily parallelize like this:

```python
async with LLMClient() as client:
    results = await asyncio.gather(
        self._generate_section(
            client, "team_roles_summary", "Team Roles and Responsibilities", team_data=team_data_json
        ),
        self._generate_section(
            client, "team_experience_summary", "Team Experience and Expertise", team_data=team_data_json
        ),
        self._generate_section(
            client, "team_credibility_summary", "Team Credibility", team_data=team_data_json
        ),
        self._generate_section(
            client, "documentation_strength_summary", "Documentation Strength", doc_data=doc_data_json
        ),
        return_exceptions=False  # Let exceptions propagate from individual sections
    )
    return "".join(results)
```

Note: You'll need to add `import asyncio` at the top of the file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `backend/app/services/agents/__pycache__/team_doc_agent.cpython-313.pyc` is excluded by `!**/*.pyc`
- `backend/app/services/agents/tests/__pycache__/test_team_doc_agent_new_feature.cpython-313-pytest-8.4.2.pyc` is excluded by `!**/*.pyc`
- `backend/app/services/nlg/__pycache__/prompt_templates.cpython-313.pyc` is excluded by `!**/*.pyc`
- `backend/logs/app.log` is excluded by `!**/*.log`
📒 Files selected for processing (1)
- `backend/app/services/agents/team_doc_agent.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/agents/team_doc_agent.py (2)
backend/app/services/nlg/llm_client.py (2)
- `LLMClient` (9-55)
- `generate_text` (30-55)

backend/app/services/nlg/prompt_templates.py (2)
- `get_template` (6-104)
- `fill_template` (106-111)
🪛 Ruff (0.14.5)
backend/app/services/agents/team_doc_agent.py
41-41: Do not catch blind exception: Exception
(BLE001)
58-58: Do not catch blind exception: Exception
(BLE001)
75-75: Do not catch blind exception: Exception
(BLE001)
92-92: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (3)
backend/app/services/agents/team_doc_agent.py (3)
6-7: LGTM!

The new imports are necessary for the LLM integration and are properly used throughout the `generate_team_doc_text` method.
14-25: LGTM!

The method signature is well-defined with proper type hints, and the docstring clearly explains the purpose, parameters, and return value.
41-41: Static analysis warning about broad exception catching is acceptable here.

The static analysis tool flags catching bare `Exception` on these lines. However, this is intentional and appropriate for this use case because:
- The LLM client can raise multiple exception types (network errors, API errors, timeouts, etc.)
- Each catch block properly logs the specific error for debugging
- The method gracefully degrades by returning a clear fallback message
- We want to ensure the method completes even if individual sections fail
The current error handling strikes a good balance between robustness and user experience.
Also applies to: 58-58, 75-75, 92-92
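If the team prefers to make that intent visible to Ruff rather than leave the BLE001 findings in the report, one lightweight option is an explicit `# noqa: BLE001` marker on the catch. A sketch under that assumption (the helper name and logger setup below are illustrative, not taken from the PR):

```python
import logging

logger = logging.getLogger("orchestrator")  # stand-in for orchestrator_logger


async def summarize_section(client, prompt: str) -> str:
    """Return the section text, or a fallback if the LLM call fails for any reason."""
    try:
        response = await client.generate_text(prompt)
        choices = response.get("choices") or []
        return choices[0].get("message", {}).get("content", "N/A") if choices else "N/A"
    except Exception as e:  # noqa: BLE001 - deliberate blanket catch so the report degrades gracefully
        logger.error("LLM call failed: %s", e)
        return "N/A"
```

Alternatively, if `LLMClient` ever documents the exception types it raises, the catch could be narrowed to that tuple and the marker dropped.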
Overview: This PR introduces a new function to summarize team and documentation data, providing a structured analysis for reports.
Changes
- Added `generate_team_doc_text` within `backend/app/services/agents/team_doc_agent.py`.
- Added new prompt templates to `backend/app/services/nlg/prompt_templates.py` to support the summarization.

Summary by CodeRabbit
New Features
Enhancements
Tests
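As a quick illustration of how the new method might be consumed, here is a hypothetical caller; the field names inside `team_data` and `doc_data` are invented for the example, and only the overall shapes (a list of member profiles and a dict of documentation details) follow the docstring quoted in the review.

```python
# Hypothetical usage of the new method; input field names are illustrative only.
import asyncio

from backend.app.services.agents.team_doc_agent import TeamDocAgent


async def main() -> None:
    team_data = [
        {"name": "Alice Example", "role": "CTO", "experience": "10 years in distributed systems"},
    ]
    doc_data = {"sections": ["Architecture", "Tokenomics"], "source": "whitepaper.pdf"}

    markdown = await TeamDocAgent().generate_team_doc_text(team_data, doc_data)
    print(markdown)  # e.g. "### Team Roles and Responsibilities\n...\n\n### ..."


if __name__ == "__main__":
    asyncio.run(main())
```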