Refactor provisioning service and extract worker tasks #565
- Moved provisioning logic from middleware to a dedicated provisioning service, improving code organization and clarity.
- Updated worker tasks to handle graph provisioning and materialization, replacing the previous direct monitoring patterns.
- Introduced autoscaling parameters for worker tasks in the deployment configuration, allowing dynamic scaling based on workload.
- Cleaned up imports and adjusted related tests to reflect the new structure and keep the provisioning process covered.
@claude please review this PR
Claude finished @jfrench9's task in 2m 12s

PR Review: Refactor provisioning service and extract worker tasks

This is a solid architectural improvement that significantly reduces coupling between middleware, routers, and long-running operations. The net -126 lines speaks for itself. A few issues follow, ranging from a rule violation to a runtime bug worth addressing before merge.

Issues
1.
- Replaced direct lock release methods in `DagsterJobMonitorTask` and `GraphMaterializationTask` with a shared `release_lock` method from the `BaseTask` class, promoting code reuse and consistency.
- Updated the Dagster report timeout test to check for a positive float value instead of a fixed default, making the test more robust.
- Cleaned up comments and import statements for clarity and maintainability.
- Added error handling in `graph_creation_queue_sensor` to revert graph status to QUEUED upon enqueue failure, improving resilience and logging for failed tasks.
- Updated `worker_inflight_reaper_sensor` to include detailed metadata in the dead-letter queue (DLQ) for tasks that exceed retry limits, aiding troubleshooting and monitoring.
- Refactored `DagsterJobMonitorTask` to support cancellation during job monitoring, giving better control over long-running tasks.
- Cleaned up code and comments for clarity and maintainability.
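The revert-on-enqueue-failure behaviour described in the first bullet might look something like the sketch below. It deliberately avoids the Dagster sensor API and shows only the status handling. `GraphRecord`, `dispatch_queued_graph`, the `enqueue` callable and the `PROVISIONING` status name are all hypothetical; only the `QUEUED` status is taken from the commit message.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass
class GraphRecord:
    """Hypothetical stand-in for a persisted graph row."""
    graph_id: str
    status: str = "QUEUED"


def dispatch_queued_graph(graph: GraphRecord, enqueue: Callable[[str], None]) -> bool:
    """Claim a queued graph and hand it to the worker queue.

    If enqueueing raises, the status is put back to QUEUED so that a
    later sensor tick can pick the graph up again.
    """
    graph.status = "PROVISIONING"  # assumed name for the claimed state
    try:
        enqueue(graph.graph_id)
    except Exception:
        graph.status = "QUEUED"  # revert so the graph is not stranded
        logger.exception(
            "Enqueue failed for graph %s; reverted to QUEUED", graph.graph_id
        )
        return False
    return True
```

Without the revert, a graph whose enqueue call failed would stay in the claimed state with no task working on it.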
…ponse
- Introduced a custom JSON serialization function to handle date and datetime objects, ensuring they are properly formatted in the streaming response.
- Updated the `generate_stream` function to use the new serialization method, improving the handling of large result sets.
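One common way to get this behaviour is the `default` hook of the standard library's `json.dumps`. The sketch below assumes that approach and a newline-delimited output format; neither is confirmed by the commit message. `json_default` is a hypothetical name, and this `generate_stream` is a simplified stand-in for the function mentioned above.

```python
import json
from datetime import date, datetime
from typing import Any, Iterable, Iterator


def json_default(value: Any) -> str:
    """Fallback used by json.dumps for types it cannot encode itself."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def generate_stream(rows: Iterable[dict]) -> Iterator[str]:
    """Yield one JSON document per row so large results are never held in full."""
    for row in rows:
        yield json.dumps(row, default=json_default) + "\n"
```

Re-raising `TypeError` for anything other than dates keeps unexpected types visible instead of silently stringifying them.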
Summary
Refactors the Dagster job handling and graph provisioning architecture by extracting core operations into dedicated worker tasks and consolidating the provisioning logic into a standalone service. The SSE direct monitor is restructured into a proper `ProvisioningService` under the operations layer, and long-running concerns (Dagster monitoring, graph materialization, subgraph creation) are broken out into discrete, independently deployable worker tasks.

Key Accomplishments

Architecture Restructuring
- … `middleware/sse/direct_monitor.py` → `operations/graph/provisioning_service.py`, significantly reducing its scope and complexity (~509 lines changed, net reduction in code)
- `dagster_monitoring.py`: Handles Dagster job status polling and lifecycle tracking
- `graph_materialization.py`: Manages graph materialization workflows as background tasks
- `subgraph_creation.py`: Encapsulates subgraph creation/provisioning logic
- … (`materialize.py`, `subgraphs/main.py`, `backups/backup.py`, `backups/restore.py`) by delegating heavy lifting to the new worker tasks and provisioning service

Infrastructure
- … (`cloudformation/worker.yaml`) with additional resource definitions to support the new task-based architecture (+107 lines)
- … (`.github/workflows/deploy-dagster.yml`) to align with the refactored job structure

Cleanup
- … `robosystems/middleware/sse/__init__.py`
- … `robosystems/worker/constants.py`

Breaking Changes
- The `middleware/sse/direct_monitor` module has been removed and relocated to `operations/graph/provisioning_service`. Any external references to the old import path will break.
- … `robosystems.middleware.sse` should verify their imports still resolve.

Testing Notes
- … `tests/middleware/sse/test_direct_monitor.py` have been substantially refactored (~173 lines changed) to align with the new `ProvisioningService` interface and location
- … (`dagster_monitoring`, `graph_materialization`, `subgraph_creation`) should have integration tests added in follow-up work to validate task execution, retry behavior, and failure handling

Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
- `feature/worker-tasks`
- `main`

Co-Authored-By: Claude <noreply@anthropic.com>