Integrate compute service with Milo quota system #90

@scotwells

Parent Issue

Tracked by datum-cloud/enhancements#682 (Launch Workload Compute Service — "UFOs")

Summary

Integrate the compute service with the Milo quota service (quota.miloapis.com) so that resource consumption is tracked and enforced per project. This covers Instance, WorkloadDeployment, and the underlying compute dimensions (instance count, vCPUs, memory, and eventually GPUs).

Motivation

Without quota integration, there is no mechanism to:

  • Enforce per-project resource limits set by the pricing tier
  • Give users visibility into how much of their quota they are consuming
  • Prevent runaway workloads from monopolizing platform capacity
  • Gate new projects to safe default limits during onboarding

Resources to Quota

The following resource dimensions need ResourceRegistration entries:

Resource Type                     Unit         Description
compute.datumapis.com/instances   count        Total running instances per project
compute.datumapis.com/vcpus       millicores   Total vCPU allocation across all instances
compute.datumapis.com/memory      MiB          Total memory allocation across all instances
compute.datumapis.com/gpus        count        GPU units (future, phase 2)
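
A minimal sketch of one registration, built with unstructured so nothing here depends on Milo's published Go types. The v1alpha1 version and the spec field names other than consumerType (which appears in the task list below) are assumptions about Milo's schema:

import (
    "context"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// registerInstanceQuota creates the registration for the instance-count
// dimension; the other dimensions follow the same pattern.
func registerInstanceQuota(ctx context.Context, c client.Client) error {
    reg := &unstructured.Unstructured{}
    reg.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceRegistration",
    })
    reg.SetName("compute-instances")
    // Spec field names below are illustrative; the real Milo schema may differ.
    if err := unstructured.SetNestedMap(reg.Object, map[string]interface{}{
        "resourceType": "compute.datumapis.com/instances",
        "unit":         "count",
        "consumerType": "Project",
    }, "spec"); err != nil {
        return err
    }
    return c.Create(ctx, reg)
}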

Integration Pattern

Compute should use Pattern 2: Service-Managed Claims (not admission-blocking ClaimCreationPolicy). Reasons:

  • WorkloadDeployment creates Instance objects ahead of scheduling — creation should not be blocked, but provisioning should be gated
  • Quota denial state needs to be visible to users as a resource condition, not a silent API error
  • Auto-scaling scenarios need instances to exist in a pending state while quota is being negotiated

The instance controller creates a ResourceClaim when an Instance is created, watches for Granted/Denied on the claim, and gates actual VM provisioning on claim grant.

The ConsumerRef on each ResourceClaim will reference the Project the instance belongs to.
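
A minimal sketch of the claim-creation step, assuming the reconciler in instance_controller.go has the usual controller-runtime Client/Scheme fields and that Milo serves ResourceClaim at quota.miloapis.com/v1alpha1. The spec layout (consumerRef, requests) is an assumption; the issue only states that the ConsumerRef points at the Project:

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// ensureResourceClaim creates, if absent, the ResourceClaim that gates
// provisioning of an Instance. The claim is owned by the Instance so that
// deleting the instance garbage-collects the claim (Scenario 4 below).
func (r *InstanceReconciler) ensureResourceClaim(ctx context.Context, inst *computev1alpha1.Instance, project string) error {
    claim := &unstructured.Unstructured{}
    claim.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceClaim",
    })
    claim.SetNamespace(inst.Namespace)
    claim.SetName(inst.Name) // one claim per instance
    if err := controllerutil.SetControllerReference(inst, claim, r.Scheme); err != nil {
        return err
    }
    // Hypothetical spec shape: consumerRef ties usage to the Project;
    // requests carry one entry per registered quota dimension.
    if err := unstructured.SetNestedMap(claim.Object, map[string]interface{}{
        "consumerRef": map[string]interface{}{"kind": "Project", "name": project},
        "requests": map[string]interface{}{
            "compute.datumapis.com/instances": int64(1),
            // vcpus/memory requests would be derived from inst.Spec here.
        },
    }, "spec"); err != nil {
        return err
    }
    if err := r.Client.Create(ctx, claim); err != nil && !apierrors.IsAlreadyExists(err) {
        return err
    }
    return nil
}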

User Experience: Quota Exceeded States

This is the primary UX concern to resolve before implementation.

Scenario 1: User creates an Instance that exceeds their quota

$ kubectl apply -f my-instance.yaml
instance.compute.datumapis.com/my-instance created   # Creation succeeds

The instance is created, but it immediately surfaces a non-ready condition:

$ kubectl get instance my-instance
NAME          READY   REASON          NETWORK IP   EXTERNAL IP
my-instance   False   QuotaExceeded                
$ kubectl describe instance my-instance
...
Conditions:
  Type           Status  Reason          Message
  ----           ------  ------          -------
  QuotaGranted   False   QuotaExceeded   Quota for compute.datumapis.com/vcpus exceeded: 
                                         requested 4 vCPUs, 0 available (16/16 vCPUs used)
  Programmed     False   PendingQuota    Waiting for quota claim to be granted
  Running        False   PendingQuota    Waiting for quota claim to be granted
  Ready          False   PendingQuota    Waiting for quota claim to be granted

Open questions:

  • Should the instance be auto-deleted after some TTL when quota is denied, or left in place for the user to clean up?
  • Should we surface a link/annotation pointing the user to quota management UI or docs?

Scenario 2: WorkloadDeployment scales up and partially exceeds quota

If a WorkloadDeployment is scaled up to replicas: 10 but only 3 additional instances fit within the project's quota:

$ kubectl get workloaddeployment my-wld
NAME      REPLICAS  READY  DESIRED  UP-TO-DATE
my-wld    7         7      10       7

The 3 over-quota instances exist but are stuck in the QuotaExceeded state. The WorkloadDeployment status should reflect the shortfall and surface a condition:

Conditions:
  Type             Status  Reason          Message
  ----             ------  ------          -------
  ReplicasReady    False   QuotaExceeded   3 of 10 desired replicas are pending quota 
                                           (compute.datumapis.com/instances: 7/7 used)
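
A sketch of how the deployment controller might compute that condition, using the condition helpers from k8s.io/apimachinery; the computev1alpha1 status shapes (Conditions slices, Replicas as *int32) are assumptions:

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// surfaceQuotaShortfall recomputes ReplicasReady from the QuotaGranted
// condition of the deployment's instances.
func surfaceQuotaShortfall(wld *computev1alpha1.WorkloadDeployment, instances []computev1alpha1.Instance) {
    blocked := 0
    for i := range instances {
        c := meta.FindStatusCondition(instances[i].Status.Conditions, "QuotaGranted")
        if c != nil && c.Status == metav1.ConditionFalse {
            blocked++
        }
    }
    cond := metav1.Condition{Type: "ReplicasReady", Status: metav1.ConditionTrue, Reason: "AllReplicasReady"}
    if blocked > 0 {
        cond.Status = metav1.ConditionFalse
        cond.Reason = "QuotaExceeded"
        cond.Message = fmt.Sprintf("%d of %d desired replicas are pending quota", blocked, *wld.Spec.Replicas)
    }
    meta.SetStatusCondition(&wld.Status.Conditions, cond)
}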

Open questions:

  • Should quota-blocked instances count against currentReplicas or be in a separate status field?
  • Should the deployment controller stop creating new quota-blocked instances once the first denial is observed, or keep trying (important for when quota is later increased)?

Scenario 3: User has quota restored (upgrade or admin grant)

When a project's quota is increased (tier upgrade, manual grant), the quota system should automatically re-evaluate the pending ResourceClaim. The instance controller watches the claim and resumes provisioning when it flips to Granted=True; no user action is needed.

This should be tested explicitly: quota increase → instances automatically begin provisioning without user intervention.
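
Making this work without user action is mostly watch wiring: because each claim carries a controller owner reference to its Instance, an Owns() watch re-queues the instance whenever the claim's status changes. A sketch, with the same GVK assumption as the claim-creation sketch above:

import (
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    ctrl "sigs.k8s.io/controller-runtime"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

func (r *InstanceReconciler) SetupWithManager(mgr ctrl.Manager) error {
    claim := &unstructured.Unstructured{}
    claim.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceClaim",
    })
    return ctrl.NewControllerManagedBy(mgr).
        For(&computev1alpha1.Instance{}).
        Owns(claim). // claim status flips re-queue the owning Instance
        Complete(r)
}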

Scenario 4: Instance deleted before claim is granted

When a QuotaExceeded instance is deleted, the ResourceClaim (owned by the instance) is garbage collected. No quota was ever consumed, so no release is needed. The controller should handle this gracefully and not attempt to provision after deletion begins.

Status Condition Design

Add QuotaGranted as a new condition type on Instance:

Condition      Status    Reason              Meaning
QuotaGranted   Unknown   PendingEvaluation   Claim created, awaiting evaluation
QuotaGranted   True      QuotaAvailable      Quota allocated; provisioning may proceed
QuotaGranted   False     QuotaExceeded       Quota denied; the instance will not be provisioned
QuotaGranted   False     ValidationFailed    Misconfiguration in quota policy (platform error)

The QuotaGranted=False state should cascade: while quota is denied, the Programmed, Running, and Ready conditions must remain False.
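
One way to enforce the cascade in the instance controller, using the condition helpers from k8s.io/apimachinery (the Instance status shape is assumed, as above):

import (
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// blockOnQuota pins the downstream conditions at False until QuotaGranted
// is True. It returns true when the caller should skip provisioning.
func blockOnQuota(inst *computev1alpha1.Instance) bool {
    if meta.IsStatusConditionTrue(inst.Status.Conditions, "QuotaGranted") {
        return false // quota granted; provisioning may proceed
    }
    for _, t := range []string{"Programmed", "Running", "Ready"} {
        meta.SetStatusCondition(&inst.Status.Conditions, metav1.Condition{
            Type:    t,
            Status:  metav1.ConditionFalse,
            Reason:  "PendingQuota",
            Message: "Waiting for quota claim to be granted",
        })
    }
    return true
}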

Error Message Quality

Messages on denied claims (sourced from ResourceClaim.status.allocations[].message) should be user-actionable. The goal is for a user to understand:

  1. What was exceeded (vCPUs, instance count, memory)
  2. How much they're currently using vs. their limit
  3. What to do — upgrade tier, delete unused resources, or contact support

Example good message:

"Quota for compute.datumapis.com/vcpus exceeded: requested 4, available 0 (16/16 vCPUs allocated). To increase your limit, upgrade your plan or contact support."

Example bad message (avoid):

"QuotaExceeded"

Operator Alerting

As operators, we need internal visibility when consumers are approaching or exceeding their quota limits — both to proactively reach out to users who may be hitting growth ceilings, and to catch misconfigured grants or runaway usage before it becomes a support escalation.

This requires surfacing AllowanceBucket consumption data into our observability stack. Concretely:

  • Metrics: Export per-project, per-resource-type utilization from AllowanceBucket status as Prometheus metrics (e.g., quota_used, quota_limit, quota_utilization_ratio); see the sketch after this list. This enables alerting rules like "any project above 80% vCPU utilization for more than 10 minutes."
  • Alerts: Define alerting thresholds at ~80% and 100% per quota dimension. The 80% threshold gives the ops team and GTM lead time to reach out to users approaching upgrade triggers before they hit a hard wall.
  • Dashboards: An operator-facing view showing quota utilization across all projects, sortable by utilization ratio, to identify which projects are closest to their limits at a glance.
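
A sketch of the metrics surface, using prometheus/client_golang and controller-runtime's shared registry; the metric and label names follow the bullet above and are suggestions, and wiring them to AllowanceBucket status updates is left to whichever controller watches those objects:

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    quotaUsed = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_used", Help: "Quota currently consumed, per project and resource type.",
    }, []string{"project", "resource_type"})
    quotaLimit = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_limit", Help: "Quota limit, per project and resource type.",
    }, []string{"project", "resource_type"})
    quotaUtilization = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_utilization_ratio", Help: "quota_used divided by quota_limit.",
    }, []string{"project", "resource_type"})
)

func init() {
    metrics.Registry.MustRegister(quotaUsed, quotaLimit, quotaUtilization)
}

// recordQuota would be called on each AllowanceBucket status update.
func recordQuota(project, resourceType string, used, limit float64) {
    quotaUsed.WithLabelValues(project, resourceType).Set(used)
    quotaLimit.WithLabelValues(project, resourceType).Set(limit)
    if limit > 0 {
        quotaUtilization.WithLabelValues(project, resourceType).Set(used / limit)
    }
}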

Open questions:

  • Where do these metrics live — in the compute service's observability stack, or should Milo expose AllowanceBucket metrics directly?
  • What is the right escalation path when the 100% threshold fires — automated notification to the user, a GTM alert, or both?
  • Should operators be able to configure per-tenant alert thresholds (e.g., VIP customers get alerted at 70%)?

Implementation Tasks

  • Create a ResourceRegistration for each quota dimension above (consumerType: Project)
  • Create GrantCreationPolicy to provision default quota grants when a Project is created (tier-based defaults TBD with GTM)
  • Add QuotaGranted condition type to Instance API
  • Update instance_controller.go to create/watch ResourceClaim with consumerRef pointing to the instance's Project, and gate provisioning on grant
  • Update workloaddeployment_controller.go to surface quota-shortfall condition when instances are quota-blocked
  • Export AllowanceBucket utilization as Prometheus metrics for operator alerting
  • Define alerting rules at ~80% and 100% utilization per quota dimension
  • Write integration tests covering: quota granted flow, quota exceeded flow, quota increase unblocks pending instances
  • Determine default quota values per tier (requires GTM input)

Open Questions

  1. Claim namespace: Where do ResourceClaim objects live — in the same namespace as the Instance, or in a dedicated quota-system namespace?
  2. Partial WorkloadDeployment scaling: Should the controller back off from creating more instances after the first quota denial, or create all desired instances and let them all show quota status?
  3. Auto-cleanup TTL: Should quota-denied instances be auto-deleted after a period, or left indefinitely for user cleanup?
  4. Quota visibility API: Should there be a read API (or AllowanceBucket surfacing) so users can check remaining quota before attempting to create resources?
  5. Tier defaults: What are the default vCPU/instance/memory limits per tier? (Needs GTM/commercial input)
  6. Operator alerting ownership: Do AllowanceBucket metrics come from Milo directly, or does the compute service own this instrumentation?
  7. Operator alert escalation: When a project hits 100% quota, what is the automated response — user notification, GTM alert, both?
  8. Configurable operator thresholds: Should alert thresholds be configurable per project (e.g., VIP customers alerted earlier)?
