-
Notifications
You must be signed in to change notification settings - Fork 0
Feat: solving issues #37 & #38! #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added RealTimeSummaryWorkflow to fetch and summarize recent data. - Introduced fetch_and_summarize_realtime_data activity for real-time data processing. - Updated schema to include RealTimeSummaryWorkflowInput for input parameters. - Refactored existing activities to use platform_id instead of platform_name. - Enhanced logging and error handling for better traceability.
|
Warning Rate limit exceeded@amindadgar has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 21 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
""" WalkthroughThe changes remove the use of platform names in favor of platform IDs across activities, schemas, and workflows. A new real-time summarization workflow and activity are introduced, enabling the retrieval and summarization of recent data from Qdrant using OpenAI. Registry and import statements are updated to reflect these modifications and new components. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TemporalWorkflow
participant Activity
participant Qdrant
participant OpenAI
Client->>TemporalWorkflow: Start RealTimeSummaryWorkflow(input)
TemporalWorkflow->>Activity: fetch_and_summarize_realtime_data(input)
Activity->>Qdrant: Query recent data points (by period/date/collection)
Qdrant-->>Activity: Return data points
Activity->>OpenAI: Request summary (chat completion)
OpenAI-->>Activity: Return summary text
Activity-->>TemporalWorkflow: Return summary
TemporalWorkflow-->>Client: Return summary
Possibly related PRs
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (6)
workflows.py (1)
16-17: Address unused imports flagged by Ruff.The imports for
PlatformSummariesWorkflowandRealTimeSummaryWorkfloware flagged as unused by the static analyzer, but they appear to be re-exported for use inregistry.py.Either use these imports directly within this module or add a comment to explicitly indicate they are re-exported:
-from hivemind_summarizer.summarizer_workflow import PlatformSummariesWorkflow -from hivemind_summarizer.real_time_summary_workflow import RealTimeSummaryWorkflow +# These imports are re-exported for use in registry.py +from hivemind_summarizer.summarizer_workflow import PlatformSummariesWorkflow +from hivemind_summarizer.real_time_summary_workflow import RealTimeSummaryWorkflowAlternatively, consider using
__all__to explicitly declare the exported names:+__all__ = [ + "PlatformSummariesWorkflow", + "RealTimeSummaryWorkflow", + "CommunityWebsiteWorkflow", + "WebsiteIngestionSchedulerWorkflow", + "MediaWikiETLWorkflow", + "IngestionWorkflow", +]🧰 Tools
🪛 Ruff (0.8.2)
16-16:
hivemind_summarizer.summarizer_workflow.PlatformSummariesWorkflowimported but unusedRemove unused import:
hivemind_summarizer.summarizer_workflow.PlatformSummariesWorkflow(F401)
17-17:
hivemind_summarizer.real_time_summary_workflow.RealTimeSummaryWorkflowimported but unusedRemove unused import:
hivemind_summarizer.real_time_summary_workflow.RealTimeSummaryWorkflow(F401)
hivemind_summarizer/real_time_summary_workflow.py (1)
36-41: Consider configuring more robust retry policies.The current retry policy is good, but for interactions with external services (OpenAI), you might want to add handling for specific exceptions.
Consider enhancing the retry policy with exception filtering:
retry_policy=RetryPolicy( maximum_attempts=3, initial_interval=timedelta(seconds=1), maximum_interval=timedelta(minutes=1), backoff_coefficient=2.0, + non_retryable_error_types=["ValueError", "TypeError"], ),hivemind_summarizer/summarizer_workflow.py (1)
36-36: Update docstring to reflect removed parameter.The docstring still mentions
platform_namewhich has been removed from the input model.- Input containing platform_id, community_id, start_date, end_date, extract_text_only and platform_name + Input containing platform_id, community_id, start_date, end_date, and extract_text_onlyhivemind_summarizer/activities.py (3)
185-189: Error message mentions “Platform name” instead ofplatform_id.The parameter was renamed, but the message was not:
- raise ValueError("Platform name is required but was not provided") + raise ValueError("platform_id is required but was not provided")Keeping wording consistent prevents confusion during debugging.
333-343: Risk of exceeding model context length; consider chunked summarisation.
combined_textconcatenates up to 500 payloads and feeds them directly to GPT-4o. Large communities can easily breach the model’s context window and generate “context length exceeded” errors.Recommended pattern:
- Summarise in batches (e.g., 2-3 k token chunks) locally.
- Feed the intermediate summaries into a final “summary of summaries” call.
This keeps calls deterministic, reduces cost, and avoids runtime failures.
303-306: ```diff
try:content = json.loads(content)except json.JSONDecodeError:pass
from contextlib import suppresswith suppress(json.JSONDecodeError):content = json.loads(content)Adopting `contextlib.suppress` both shortens the code and follows Ruff’s SIM105 recommendation. <details> <summary>🧰 Tools</summary> <details> <summary>🪛 Ruff (0.8.2)</summary> 303-306: Use `contextlib.suppress(json.JSONDecodeError)` instead of `try`-`except`-`pass` Replace with `contextlib.suppress(json.JSONDecodeError)` (SIM105) </details> </details> </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used: CodeRabbit UI** **Review profile: CHILL** **Plan: Pro** <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 525bae07317f570e1402f05bdb13a736a1da69f2 and 40ac8ffcaa4233f3a26ffea7ed7a438f4d272029. </details> <details> <summary>📒 Files selected for processing (6)</summary> * `hivemind_summarizer/activities.py` (7 hunks) * `hivemind_summarizer/real_time_summary_workflow.py` (1 hunks) * `hivemind_summarizer/schema.py` (1 hunks) * `hivemind_summarizer/summarizer_workflow.py` (2 hunks) * `registry.py` (3 hunks) * `workflows.py` (1 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧬 Code Graph Analysis (3)</summary> <details> <summary>workflows.py (2)</summary><blockquote> <details> <summary>hivemind_summarizer/summarizer_workflow.py (1)</summary> * `PlatformSummariesWorkflow` (21-82) </details> <details> <summary>hivemind_summarizer/real_time_summary_workflow.py (1)</summary> * `RealTimeSummaryWorkflow` (11-44) </details> </blockquote></details> <details> <summary>hivemind_summarizer/real_time_summary_workflow.py (2)</summary><blockquote> <details> <summary>hivemind_summarizer/activities.py (1)</summary> * `fetch_and_summarize_realtime_data` (228-349) </details> <details> <summary>hivemind_summarizer/schema.py (1)</summary> * `RealTimeSummaryWorkflowInput` (27-34) </details> </blockquote></details> <details> <summary>hivemind_summarizer/activities.py (1)</summary><blockquote> <details> <summary>hivemind_summarizer/schema.py (3)</summary> * `PlatformSummariesActivityInput` (4-8) * `PlatformSummariesRangeActivityInput` (11-16) * `RealTimeSummaryWorkflowInput` (27-34) </details> </blockquote></details> </details><details> <summary>🪛 Ruff (0.8.2)</summary> <details> <summary>workflows.py</summary> 16-16: `hivemind_summarizer.summarizer_workflow.PlatformSummariesWorkflow` imported but unused Remove unused import: `hivemind_summarizer.summarizer_workflow.PlatformSummariesWorkflow` (F401) --- 17-17: `hivemind_summarizer.real_time_summary_workflow.RealTimeSummaryWorkflow` imported but unused Remove unused import: `hivemind_summarizer.real_time_summary_workflow.RealTimeSummaryWorkflow` (F401) </details> <details> <summary>hivemind_summarizer/activities.py</summary> 303-306: Use `contextlib.suppress(json.JSONDecodeError)` instead of `try`-`except`-`pass` Replace with `contextlib.suppress(json.JSONDecodeError)` (SIM105) </details> </details> </details> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms (1)</summary> * GitHub Check: ci / test / Test </details> <details> <summary>🔇 Additional comments (7)</summary><blockquote> <details> <summary>hivemind_summarizer/real_time_summary_workflow.py (1)</summary> `1-44`: **LGTM! Well-structured workflow implementation.** This new workflow is well-implemented with proper documentation, error handling, and retry policies. The code follows the established workflow patterns in the codebase. </details> <details> <summary>registry.py (4)</summary> `16-16`: **LGTM! New activity registration.** The new activity `fetch_and_summarize_realtime_data` is properly registered. --- `24-24`: **LGTM! New workflow registration.** The new `RealTimeSummaryWorkflow` is properly registered. --- `33-33`: **LGTM! Added workflow to registry list.** The `RealTimeSummaryWorkflow` is correctly added to the `WORKFLOWS` list. --- `49-49`: **LGTM! Added activity to registry list.** The `fetch_and_summarize_realtime_data` activity is correctly added to the `ACTIVITIES` list. </details> <details> <summary>hivemind_summarizer/summarizer_workflow.py (2)</summary> `60-60`: **LGTM! Successfully migrated from platform_name to platform_id.** The parameter has been correctly updated to use `platform_id` directly instead of resolving the platform name. --- `75-75`: **LGTM! Successfully migrated from platform_name to platform_id.** The parameter has been correctly updated to use `platform_id` directly instead of resolving the platform name. </details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
README.md (2)
15-15: New Real-Time Summaries bullet
The addition of the Real-Time Summaries bullet aligns well with the new feature. Consider adding a hyperlink to theRealTimeSummaryWorkflow(e.g.,[RealTimeSummaryWorkflow](src/hivemind_summarizer/workflows/real_time_summary.py)) for quick navigation.
88-121: Refine Real-Time Summary example
The new example for Real-Time Summaries is comprehensive. A few nitpicks for clarity and grammar:
- Fix typo in Note 2: change “tha raw data” to “the raw data”.
- Rephrase the note to:
Note: Provide either a
collection_namefilter or bothplatform_idandcommunity_id.- To avoid repeating “Use this when,” consider varying the phrasing (e.g., “Use this workflow if you want to …”).
🧰 Tools
🪛 LanguageTool
[style] ~120-~120: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ries for recent data. Use this when you want to create fresh summaries for the specifie...(REP_WANT_TO_VB)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md(3 hunks)hivemind_summarizer/schema.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- hivemind_summarizer/schema.py
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~120-~120: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ries for recent data. Use this when you want to create fresh summaries for the specifie...
(REP_WANT_TO_VB)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / lint / Lint
- GitHub Check: ci / test / Test
🔇 Additional comments (1)
README.md (1)
55-87: Platform Summaries example updated
The code snippet correctly replacesplatform_name/community_namewithplatform_id/community_idand clarifies that it fetches existing summaries from Qdrant. The formatting and parameter descriptions are clear and consistent.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes