feat: converted mediawiki activities to workflows! #54
Walkthrough
The MediaWiki ETL workflow has been refactored by splitting the extract, transform, and load steps into three separate child workflows. Each step is now encapsulated in its own class with a dedicated asynchronous run method.
Sequence Diagram(s)
sequenceDiagram
participant MainWorkflow
participant ExtractWorkflow
participant TransformWorkflow
participant LoadWorkflow
MainWorkflow->>ExtractWorkflow: execute_child_workflow (extract)
ExtractWorkflow-->>MainWorkflow: Extraction complete
MainWorkflow->>TransformWorkflow: execute_child_workflow (transform)
TransformWorkflow-->>MainWorkflow: Transformation complete
MainWorkflow->>LoadWorkflow: execute_child_workflow (load)
LoadWorkflow-->>MainWorkflow: Load complete
Actionable comments posted: 0
🧹 Nitpick comments (1)
hivemind_etl/mediawiki/workflows.py (1)
13-14: Consider adding class docstrings for better documentation.
While not critical for functionality, adding class docstrings would improve code documentation and address the static analysis warnings.
 @workflow.defn
+class ExtractMediaWikiWorkflow:
+    """Workflow for extracting MediaWiki data and storing in S3."""
+
 @workflow.defn
+class TransformMediaWikiWorkflow:
+    """Workflow for transforming MediaWiki data and storing in S3."""
+
 @workflow.defn
+class LoadMediaWikiWorkflow:
+    """Workflow for loading transformed MediaWiki data into the database."""

Also applies to: 38-39, 68-69
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 14-14: Missing class docstring
(C0115)
[refactor] 14-14: Too few public methods (1/2)
(R0903)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hivemind_etl/mediawiki/workflows.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
hivemind_etl/mediawiki/workflows.py (1)
hivemind_etl/mediawiki/activities.py (3)
extract_mediawiki (56-80)
transform_mediawiki_data (84-125)
load_mediawiki_data (129-167)
🪛 Pylint (3.3.7)
hivemind_etl/mediawiki/workflows.py
[convention] 14-14: Missing class docstring
(C0115)
[convention] 25-25: Import outside toplevel (hivemind_etl.mediawiki.activities.extract_mediawiki)
(C0415)
[refactor] 14-14: Too few public methods (1/2)
(R0903)
[convention] 39-39: Missing class docstring
(C0115)
[convention] 55-55: Import outside toplevel (hivemind_etl.mediawiki.activities.transform_mediawiki_data)
(C0415)
[refactor] 39-39: Too few public methods (1/2)
(R0903)
[convention] 69-69: Missing class docstring
(C0115)
[convention] 80-80: Import outside toplevel (hivemind_etl.mediawiki.activities.load_mediawiki_data)
(C0415)
[refactor] 69-69: Too few public methods (1/2)
(R0903)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci / build-push / Build + Push Image
🔇 Additional comments (4)
hivemind_etl/mediawiki/workflows.py (4)
13-35: LGTM! Well-structured extract workflow implementation.
The extraction workflow is correctly implemented with appropriate timeouts and retry policies. The import placement inside the run method follows Temporal best practices for workflow isolation.
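To make "timeouts and retry policies" concrete, here is a minimal sketch in plain asyncio rather than Temporal's own API. The helper name `run_with_retry`, the attempt count, and the backoff values are assumptions chosen for illustration and are not taken from the PR; in the actual workflow the equivalent settings would be handed to Temporal when the activity is scheduled.

```python
import asyncio


async def run_with_retry(make_call, *, timeout_s, max_attempts, backoff_s=0.01):
    """Await make_call() with a per-attempt timeout, retrying on failure.

    Hypothetical helper: it mimics the idea of a timeout plus a retry
    policy, not Temporal's actual scheduling API.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Each attempt gets its own time budget.
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except Exception as exc:  # includes asyncio.TimeoutError
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts.
                await asyncio.sleep(backoff_s * 2 ** (attempt - 1))
    raise last_error


# Stand-in for an extraction step that fails twice before succeeding.
attempts = {"count": 0}


async def flaky_extract():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "extracted"


result = asyncio.run(
    run_with_retry(flaky_extract, timeout_s=1.0, max_attempts=5)
)
```

The per-attempt timeout bounds how long a single try may run, while the attempt cap bounds how many tries are made in total.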
38-65: LGTM! Transform workflow correctly returns the S3 key.
The transformation workflow properly returns the S3 key, which is essential for the subsequent load step. The timeout of 6 hours is appropriate for data transformation operations.
68-90: LGTM! Load workflow implementation is correct.
The load workflow correctly accepts the mediawiki_platform dict containing the transformed_data_key and has appropriate timeout settings for database operations.
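The handoff described in the two comments above (transform returns an S3 key, load receives it inside the platform dict) can be sketched as follows. This uses plain asyncio coroutines as stand-ins for the Temporal child workflows; the function names, the `community_id` field, and the key format are assumptions for illustration. Only the `transformed_data_key` field name comes from the review.

```python
import asyncio


async def extract_step(platform: dict) -> None:
    # Stand-in for the extract child workflow; it returns nothing.
    await asyncio.sleep(0)


async def transform_step(platform: dict) -> str:
    # Stand-in for the transform child workflow; it returns an S3 key.
    # The key format is made up for this example.
    await asyncio.sleep(0)
    return f"{platform['community_id']}/transformed.json"


async def load_step(platform: dict) -> str:
    # Stand-in for the load child workflow; it reads the key from the dict.
    await asyncio.sleep(0)
    return platform["transformed_data_key"]


async def run_etl(platform: dict) -> str:
    await extract_step(platform)
    key = await transform_step(platform)
    # Copy rather than mutate, so the caller's dict is left untouched.
    load_input = {**platform, "transformed_data_key": key}
    return await load_step(load_input)


loaded_key = asyncio.run(run_etl({"community_id": "community-1"}))
```

Each step is awaited before the next one starts, so the load step can only run once the transform step has produced its key.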
128-149: LGTM! Child workflow execution is properly implemented.
The refactoring from activities to child workflows is architecturally sound and provides better separation of concerns. The unique workflow IDs per community ensure proper isolation, and the timeout configurations are appropriately preserved.
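As a rough illustration of per-community workflow IDs, the sketch below builds one ID per step and community. The helper name `child_workflow_id` and the `mediawiki:<step>:<community>` format are hypothetical; the PR's actual ID scheme is not shown in this review.

```python
def child_workflow_id(step: str, community_id: str) -> str:
    """Build a per-community ID for one ETL step (hypothetical format)."""
    return f"mediawiki:{step}:{community_id}"


steps = ("extract", "transform", "load")
communities = ("community-a", "community-b")

# One ID per (community, step) pair; no two pairs share an ID.
ids = [child_workflow_id(s, c) for c in communities for s in steps]
```

Because the community identifier is part of every ID, runs for different communities cannot collide, and each child workflow can be looked up individually.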