feat: updated to get latest summary docs! #31
Walkthrough
This update refactors the handling of collection identification within the summarization workflow and activities. Instead of passing a constructed collection name string, the code now manages `platform_name` and `community_id` separately.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Workflow
participant Activities
participant Qdrant
participant CustomPipeline
User->>Workflow: Start summary fetch (with platform_name, community_id, [date])
Workflow->>Activities: get_platform_name(input)
Activities-->>Workflow: platform_name
alt date is provided
Workflow->>Activities: fetch_telegram_summaries_by_date(platform_name, community_id, date)
Activities->>Qdrant: Query summaries for date
Qdrant-->>Activities: Return summaries
else no date provided
Activities->>CustomPipeline: Get latest date for [platform_name]_summary
CustomPipeline-->>Activities: latest_date
Activities->>Qdrant: Query summaries for latest_date
Qdrant-->>Activities: Return summaries
end
Activities-->>Workflow: Return summaries
Workflow-->>User: Return result
Actionable comments posted: 1
🧹 Nitpick comments (3)
hivemind_summarizer/activities.py (3)
239-241: Update the log message to match the new collection name construction.
Since the collection name is now constructed internally, the log message should reference the constructed value.

     logging.info(
    -    f"Fetching summaries for date range: {start_date} to {end_date} from collection: {collection_name}"
    +    f"Fetching summaries for date range: {start_date} to {end_date} from collection: {community_id}_{input.platform_name}_summary"
     )
274-277: Update error message in the exception handler.
The error message still references `collection_name` directly, but it should use the constructed collection name for consistency.

     logging.error(
    -    f"Error fetching summaries for date range {start_date} to {end_date} from collection {collection_name}: {str(e)}"
    +    f"Error fetching summaries for date range {start_date} to {end_date} from collection {community_id}_{input.platform_name}_summary: {str(e)}"
     )
104-106: Update docstrings to reflect parameter changes.
The docstrings for both `fetch_telegram_summaries_by_date` and `fetch_telegram_summaries_by_date_range` still mention `collection_name` in the parameter descriptions, but the schema now uses separate `platform_name` and `community_id` fields.
Update the docstrings to reflect the new parameter structure:

     Parameters
     ----------
     input : TelegramSummariesActivityInput
    -    Input object containing date, collection_name and extract_text_only
    +    Input object containing date, platform_name, community_id and extract_text_only

     Parameters
     ----------
     input : TelegramSummariesRangeActivityInput
    -    Input object containing start_date, end_date, collection_name and extract_text_only
    +    Input object containing start_date, end_date, platform_name, community_id and extract_text_only

Also applies to: 218-220
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- hivemind_summarizer/activities.py (8 hunks)
- hivemind_summarizer/schema.py (2 hunks)
- hivemind_summarizer/workflows.py (4 hunks)
- registry.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
registry.py (1)
hivemind_summarizer/activities.py (1)
get_platform_name(45-92)
hivemind_summarizer/workflows.py (1)
hivemind_summarizer/activities.py (1)
get_platform_name(45-92)
hivemind_summarizer/activities.py (1)
hivemind_summarizer/schema.py (1)
TelegramGetCollectionNameInput(19-21)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / test / Test
- GitHub Check: ci / lint / Lint
🔇 Additional comments (14)
registry.py (2)
15-15: Import updated to match renamed activity function.
The change correctly updates the import statement to use `get_platform_name` instead of the previous `get_collection_name`, aligning with the renamed activity function in the activities module.
45-45: Export list updated to include the renamed activity.
The `ACTIVITIES` list is properly updated to export the renamed `get_platform_name` activity, ensuring consistency with the import changes above.
hivemind_summarizer/workflows.py (4)
12-12: Import updated to reflect activity function renaming.
The import statement has been correctly updated to use the renamed activity function `get_platform_name`.
57-59: Variable and function name updated to reflect new functionality.
The renamed variable (`platform_name` instead of `collection_name`) and activity function call (`get_platform_name`) properly reflect that the function now returns only the platform name rather than constructing a collection name string.
72-75: Inputs correctly separated to match the new schema design.
The activity input has been properly updated to pass `platform_name` and `community_id` separately instead of a combined `collection_name`, aligning with the schema changes.
88-90: Consistent implementation of parameter changes in date range functionality.
The date range activity call has been updated with the same parameter structure as the single date function, ensuring consistency across the codebase.
hivemind_summarizer/schema.py (3)
5-8: Schema updated to support optional date and separate identifiers.
The changes to `TelegramSummariesActivityInput`:
- Make `date` optional with a default of `None` to support fetching the latest date
- Replace `collection_name` with separate `platform_name` and `community_id` fields
These changes enable the new functionality for fetching the latest summary when no date is provided.
15-16: Range activity input updated for consistency.
The `TelegramSummariesRangeActivityInput` class has been updated with the same pattern of separate `platform_name` and `community_id` fields, ensuring consistency across all input models.
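The two input models described above can be sketched as follows. This is only an illustration using standard-library dataclasses: the review names the fields but not their types, the base class used in `schema.py`, or the default for `extract_text_only`, so all of those are assumptions here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelegramSummariesActivityInput:
    # Field names come from the review; types and defaults are assumed.
    platform_name: str
    community_id: str
    date: Optional[str] = None  # None means "use the latest available date"
    extract_text_only: bool = True  # assumed default


@dataclass
class TelegramSummariesRangeActivityInput:
    start_date: str
    end_date: str
    platform_name: str
    community_id: str
    extract_text_only: bool = True  # assumed default


# Omitting `date` requests the latest summary (sample values are made up).
latest_request = TelegramSummariesActivityInput(
    platform_name="telegram", community_id="1234"
)
```

Keeping `platform_name` and `community_id` as separate fields means callers never need to know how the collection name is formatted.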
27-27: Workflow input updated to support optional start date.
Making `start_date` optional with a default of `None` aligns with the changes in the activity input and allows the workflow to fetch the latest summary when no specific date is provided.
hivemind_summarizer/activities.py (5)
9-13: New imports added to support latest date retrieval.
The imports for `CustomIngestionPipeline` and `models` from `qdrant_client.http` are necessary additions to support the new functionality for retrieving the latest document date when no specific date is provided.
45-88: Activity renamed and return value simplified.
The activity has been properly renamed from `get_collection_name` to `get_platform_name` and now returns only the platform name instead of constructing a collection name. The docstring has been updated to reflect this change.
114-120: Collection name construction moved inside the activity.
The activity now correctly constructs the collection name internally using the provided `community_id` and `platform_name`, and validates that `platform_name` is provided. This aligns with the refactoring approach of managing these identifiers separately.
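A minimal sketch of the construction and validation step, assuming the `{community_id}_{platform_name}_summary` format that appears in the suggested log messages. The helper name `build_collection_name` and the choice of `ValueError` are hypothetical; the review does not show how the activity reports a missing `platform_name`.

```python
def build_collection_name(community_id: str, platform_name: str) -> str:
    """Hypothetical helper: derive the collection name from the two identifiers."""
    if not platform_name:
        # Assumed failure mode; the actual activity may handle this differently.
        raise ValueError("platform_name is required to build the collection name")
    return f"{community_id}_{platform_name}_summary"


# Sample identifiers are made up for illustration.
collection_name = build_collection_name("1234", "telegram")
```

Computing the name once and reusing the variable would also keep the log messages flagged in the nitpick comments in sync with the value actually queried.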
131-170: Added support for fetching latest date when none is provided.
This new logic implements a key feature of this PR: the ability to retrieve the latest available summary when no specific date is provided. It uses `CustomIngestionPipeline` to get the latest document date from the collection and then queries for summaries on that date.
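The fallback described above can be sketched roughly as below. The two callables are hypothetical stand-ins: `get_latest_date` represents the `CustomIngestionPipeline` lookup and `query_by_date` represents the Qdrant query, neither of whose real signatures is shown in this review. Returning an empty list when nothing is stored is likewise an assumption.

```python
from typing import Callable, List, Optional


def fetch_summaries(
    date: Optional[str],
    get_latest_date: Callable[[], Optional[str]],
    query_by_date: Callable[[str], List[str]],
) -> List[str]:
    """Return summaries for `date`, or for the latest stored date if `date` is None."""
    if date is None:
        date = get_latest_date()
        if date is None:
            # No documents stored yet; an empty result is an assumed behavior.
            return []
    return query_by_date(date)


# In-memory stand-in for the summary collection (sample data is made up).
store = {"2025-01-01": ["summary A"], "2025-01-02": ["summary B"]}

latest = fetch_summaries(
    None,
    get_latest_date=lambda: max(store, default=None),  # ISO dates sort lexicographically
    query_by_date=lambda d: store.get(d, []),
)
```

Passing the lookups in as callables keeps the branching logic testable without a running Qdrant instance.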
265-267: Input parameters properly updated in date range function.
The `fetch_telegram_summaries_by_date` call correctly passes `platform_name` and `community_id` separately as required by the updated schema.