Skip to content

Refactor graph provisioning and consolidate subscription handling#331

Merged
jfrench9 merged 7 commits intomainfrom
feature/subscription-improvements
Feb 10, 2026
Merged

Refactor graph provisioning and consolidate subscription handling#331
jfrench9 merged 7 commits intomainfrom
feature/subscription-improvements

Conversation

@jfrench9
Copy link
Copy Markdown
Member

Summary

This PR consolidates graph provisioning and billing subscription management into a more streamlined architecture. The changes eliminate redundant provisioning workflows, improve subscription handling, and enhance rate limiting capabilities across the platform.

Key Accomplishments

Architecture Simplification

  • Removed dedicated provisioning pipeline: Eliminated separate Dagster jobs and sensors for graph provisioning, consolidating this functionality into existing graph management workflows
  • Streamlined subscription handling: Unified billing and graph subscription logic to reduce complexity and improve maintainability
  • Enhanced rate limiting: Expanded rate limiting middleware with additional configuration options and improved monitoring

Billing & Subscription Improvements

  • Consolidated subscription workflows: Merged graph subscription management into the main billing flow, eliminating duplicate subscription handling logic
  • Improved webhook processing: Enhanced admin webhook handling for better subscription lifecycle management
  • Enhanced audit logging: Strengthened billing audit trails with improved logging for subscription events
  • Simplified checkout flow: Streamlined checkout process by removing redundant offering validation steps

API & Router Enhancements

  • New graph subscription endpoints: Added dedicated subscription management endpoints under the graphs router
  • Simplified graph management: Reduced complexity in main graph router by consolidating provisioning logic
  • Enhanced SSE monitoring: Improved real-time monitoring capabilities for direct graph connections

Breaking Changes

  • Removed provisioning-specific endpoints: Applications relying on dedicated provisioning APIs will need to migrate to the consolidated graph management endpoints
  • Environment variable cleanup: Removed unused provisioning-related environment variables
  • Modified webhook signatures: Admin webhook handlers have updated signatures that may affect external integrations

Testing Notes

  • Comprehensive test suite updates: Updated 631 test cases to reflect the new consolidated architecture
  • Removed provisioning tests: Eliminated 192 dedicated provisioning tests as functionality is now covered by existing graph management tests
  • Enhanced integration testing: Strengthened webhook and subscription flow testing

Infrastructure Considerations

  • Reduced Dagster complexity: Simplified pipeline architecture reduces resource overhead and improves maintainability
  • Environment cleanup: Removed unused configuration variables and streamlined setup processes
  • Enhanced monitoring: Improved observability for subscription and billing workflows
  • Rate limiting enhancements: New rate limiting configurations may require infrastructure review for optimal performance

This refactoring significantly reduces codebase complexity while maintaining all core functionality, resulting in a net reduction of over 1,000 lines of code through consolidation and elimination of redundant systems.


🤖 Generated with Claude Code

Branch Info:

  • Source: feature/subscription-improvements
  • Target: main
  • Type: feature

Co-Authored-By: Claude noreply@anthropic.com

- Removed the DIRECT_GRAPH_PROVISIONING_ENABLED parameter from configuration files and code to streamline provisioning logic.
- Added new functionality to create billing subscriptions for graphs and entity graphs directly within the provisioning process.
- Updated rate limiting for billing endpoints to ensure generous limits during checkout.
- Cleaned up unused provisioning jobs and sensors to enhance code maintainability and clarity.
@jfrench9
Copy link
Copy Markdown
Member Author

@claude please review this PR

@claude
Copy link
Copy Markdown

claude bot commented Feb 10, 2026

Claude finished @jfrench9's task in 2m 6s —— View job


Code Review Complete ✅

Reviewing PR #331: Refactor graph provisioning and consolidate subscription handling


Summary

This is a substantial architectural refactoring that successfully consolidates graph provisioning and billing subscription handling. The changes eliminate redundant systems, improve maintainability, and provide a cleaner separation of concerns. The PR achieves its stated goals with a net reduction of over 1,000 lines of code.

✅ Strengths

1. Excellent Consolidation

  • Successfully eliminated the separate provisioning pipeline (deleted dagster/jobs/provisioning.py and dagster/sensors/provisioning.py)
  • Consolidated provisioning logic into direct execution via middleware/sse/direct_monitor.py
  • Removed ~600 lines from dagster/jobs/graph.py by moving graph creation jobs to direct execution
  • Tests properly updated to reflect architectural changes (no more Dagster mocking for provisioning)

2. Improved Webhook Handling (routers/admin/webhooks.py)

  • Enhanced webhook processing with proper background task execution
  • Added setup_intent.succeeded handler for payment method tracking
  • Improved subscription cancellation handling with portal reactivation support
  • Better idempotency checking and audit logging

3. Better Subscription Management (dagster/jobs/billing.py)

  • Enhanced _handle_subscription_updated with three clear scenarios:
    • Portal cancellation (cancel_at_period_end=true)
    • Portal reactivation (removing pending cancellation)
    • Standard status transitions
  • Proper period date syncing from Stripe
  • Metadata preservation (checkout session ID saved before overwriting)

4. Enhanced Rate Limiting

  • Added billing_rate_limit_dependency for checkout/payment flows
  • Clear documentation that billing operations should never be blocked
  • Proper separation of billing-specific rate limits

5. Improved Checkout Flow (routers/billing/checkout.py)

  • Cleanup of orphaned pending_payment subscriptions before creating new ones
  • Fallback lookup by checkout_session_id in metadata after webhook processing
  • Better error handling and status descriptions

⚠️ Issues & Concerns

1. Audit Log Workaround (robosystems/models/billing/audit_log.py:284-299)

Severity: Medium

The mark_webhook_processed method includes a workaround for webhooks without entity references:

# Many webhook events (invoice.created, invoice.paid) have no metadata
# with entity references. log_event() requires at least one FK reference,
# but actor_user_id is a FK to users — we can't fake it. Create directly.
if not any([subscription_id, actor_user_id]):
    audit_log = cls(
        event_type=BillingEventType.WEBHOOK_RECEIVED.value,
        description=f"{provider} webhook: {event_type}",
        actor_type=f"{provider}_webhook",
        event_data=stored_event_data,
    )
    session.add(audit_log)
    session.commit()

Issue: This bypasses the standard log_event() method which provides consistency and validation. While the comment explains why, this creates two paths for audit logging.

Recommendation: Consider refactoring log_event() to make FK references truly optional, or create a dedicated log_webhook_event() method that handles this case properly.

2. Direct Provisioning Environment Variable Removed

Severity: Low

The PR removes DIRECT_GRAPH_PROVISIONING_ENABLED from env.py and aws.sh, suggesting all provisioning is now direct. However:

# From billing.py:553-557
async def _trigger_resource_provisioning(...):
    """Trigger resource provisioning after payment confirmation.
    
    Directly provisions the resource, eliminating sensor polling delay and
    ECS cold start.
    """

Observation: The code now always uses direct provisioning, which is good for consistency. However, the migration path isn't documented.

Recommendation: Consider adding a note in the PR description or CLAUDE.md about this architectural decision and that there's no rollback option to sensor-based provisioning.

3. Missing Error Handling (routers/graphs/subscriptions.py:390-397)

Severity: Medium

When repository provisioning fails, the subscription is marked as failed, but there's no rollback of the Stripe subscription:

except Exception as e:
    logger.error(f"Repository provisioning failed: {e}")
    failed_sub = db.query(BillingSubscription).filter_by(id=subscription_id).first()
    if failed_sub:
        failed_sub.status = "failed"
        db.commit()
    raise HTTPException(status_code=500, detail=f"Repository provisioning failed: {e}")

Issue: If provisioning fails after creating a Stripe subscription, the customer will be charged but won't get access. The Stripe subscription should be canceled.

Recommendation: Add Stripe subscription cleanup in the error path:

if failed_sub and failed_sub.stripe_subscription_id:
    try:
        provider = get_payment_provider("stripe")
        provider.cancel_subscription(failed_sub.stripe_subscription_id)
    except Exception as cancel_error:
        logger.error(f"Failed to cancel Stripe subscription: {cancel_error}")

4. Rate Limit Category Addition (config/rate_limits.py:38)

Severity: Low

Added BILLING category with comment "never block paying customers", which is philosophically good. However:

EndpointCategory.BILLING: (60, RateLimitPeriod.MINUTE),  # Never block payments

Observation: The limit is still 60/minute for free tier. While appropriate, the comment "never block" is slightly misleading since there IS still a rate limit (just a generous one).

Recommendation: Consider rewording comment to "Generous limits to avoid blocking legitimate payment flows" or similar.

5. Test Coverage Gap

Severity: Medium

The PR removes tests/dagster/test_provisioning.py (192 tests) but the provisioning logic now lives in middleware/sse/direct_monitor.py.

Observation: I don't see new tests specifically for run_graph_provisioning() and run_user_repository_provisioning() in the direct_monitor module.

Recommendation: Consider adding integration tests for the direct provisioning functions to ensure the same functionality is tested, just in a different location.

6. Webhook Handler Context Object (routers/admin/webhooks.py:17-26)

Severity: Low (code smell)

The _WebhookLogContext class is a lightweight stand-in for Dagster's OpExecutionContext:

class _WebhookLogContext:
    """Lightweight stand-in for Dagster's OpExecutionContext.
    
    The billing handler functions use context.log.info/warning/error.
    This routes those calls to our standard logger.
    """
    def __init__(self) -> None:
        self.log = logger

Issue: This creates a coupling where webhook handlers expect a Dagster-like context object. If the billing handlers change their logging API, this breaks.

Recommendation: Consider refactoring the billing handlers to accept a standard logger instead of a context object, or create a proper LogContext protocol/ABC.


🔍 Minor Observations

  1. Justfile Changes: Substantial reformatting (168 line changes) but functionality appears preserved. Good cleanup.

  2. Subscription Metadata Handling: Good use of JSONB dict creation to force SQLAlchemy change detection (billing.py:223-226).

  3. Period Date Syncing: Excellent addition of Stripe period syncing in _handle_subscription_updated (billing.py:418-424).

  4. Checkout Session Cleanup: Smart addition of orphaned subscription cleanup before creating new checkouts (checkout.py:123-140).


📋 Breaking Changes Review

The PR correctly identifies breaking changes:

✅ Confirmed Breaking Changes:

  1. Removed provisioning endpoints - Correctly removed dedicated Dagster provisioning jobs
  2. Environment variable cleanup - DIRECT_GRAPH_PROVISIONING_ENABLED removed (now always true)
  3. Webhook signatures - Admin webhook handlers have new signatures (added setup_intent handling)

⚠️ Migration Considerations:

  • Any external systems calling provisioning-specific APIs will need updates
  • Deployments should ensure the new direct provisioning flow is working before removing old environment variables
  • Webhook testing should verify all event types still work correctly

🎯 Recommendations

High Priority:

  1. Fix Stripe subscription cleanup on provisioning failure (Issue Enhance SEC pipeline security and add reconsolidation support #3 above)
  2. Add tests for direct provisioning functions (Issue Remove unused Account and Disclosure transaction nodes from graph schema #5 above)

Medium Priority:

  1. Refactor audit log webhook handling (Issue Inital open source release of robosystems #1 above)
  2. Document the architectural decision to remove sensor-based provisioning

Low Priority:

  1. Refactor webhook context object to use standard logger
  2. Clarify rate limit comments for billing endpoints

✅ Verdict

APPROVE with minor concerns

This is a well-executed refactoring that significantly simplifies the codebase architecture. The consolidation of provisioning logic and subscription handling is clean and maintainable. The issues identified are mostly minor and can be addressed in follow-up PRs.

The code quality is high, the tests are appropriately updated, and the PR achieves its stated goals. The net reduction of 1,000+ lines while maintaining functionality is impressive.

Suggested merge strategy: Merge after addressing Issue #3 (Stripe subscription cleanup on failure) to prevent customer billing issues.


- Enhanced error handling in the graph provisioning and user repository provisioning processes to cancel Stripe subscriptions when provisioning fails.
- Added a new abstract method `cancel_subscription` in the PaymentProvider class and implemented it in the StripePaymentProvider.
- Updated tests to verify that Stripe subscriptions are canceled correctly upon provisioning failures, ensuring customers are not charged for unsuccessful operations.
…Provider class to simplify product naming in non-production environments.
- Introduced a new demo-sec-query target in the justfile to run preset queries for the SEC demo.
- The target allows for conditional execution based on the 'all' parameter, enhancing demo capabilities.
…oml and uv.lock files for improved functionality and compatibility.
- Updated the logic for retrieving the user ID from credentials to make it optional, allowing authentication with just the API key.
- Simplified the output by removing the error message for missing user ID and adjusting the print statements for clarity.
- Introduced a new function `_get_plan_display_name` to convert internal plan names into human-readable display names based on resource type (graph or repository).
- Updated the `subscription_to_response` function to include the display name in the response, enhancing clarity for subscription details.
@jfrench9 jfrench9 merged commit 7a76887 into main Feb 10, 2026
7 checks passed
@jfrench9 jfrench9 deleted the feature/subscription-improvements branch February 13, 2026 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant