Parent Issue
Tracked by datum-cloud/enhancements#682 (Launch Workload Compute Service — "UFOs")
Summary
Integrate the compute service with the Milo quota service (quota.miloapis.com) so that resource consumption is tracked and enforced per project. This covers Instance, WorkloadDeployment, and the underlying compute dimensions (instance count, vCPUs, memory, and eventually GPUs).
Motivation
Without quota integration, there is no mechanism to:
- Enforce per-project resource limits set by the pricing tier
- Give users visibility into how much of their quota they are consuming
- Prevent runaway workloads from monopolizing platform capacity
- Gate new projects to safe default limits during onboarding
Resources to Quota
The following resource dimensions need ResourceRegistration entries:
| Resource Type | Unit | Description |
| --- | --- | --- |
| compute.datumapis.com/instances | count | Total running instances per project |
| compute.datumapis.com/vcpus | millicores | Total vCPU allocation across all instances |
| compute.datumapis.com/memory | MiB | Total memory allocation across all instances |
| compute.datumapis.com/gpus | count | GPU units (future — phase 2) |
Integration Pattern
Compute should use Pattern 2: Service-Managed Claims (not admission-blocking ClaimCreationPolicy). Reasons:
- WorkloadDeployment creates Instance objects ahead of scheduling — creation should not be blocked, but provisioning should be gated
- Quota denial state needs to be visible to users as a resource condition, not a silent API error
- Auto-scaling scenarios need instances to exist in a pending state while quota is being negotiated
The instance controller creates a ResourceClaim when an Instance is created, watches for Granted/Denied on the claim, and gates actual VM provisioning on claim grant.
The ConsumerRef on each ResourceClaim will reference the Project the instance belongs to.
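As a sketch of what the instance controller would create, a claim for a 4-vCPU instance might look like the following. The ResourceClaim apiVersion and field names are assumptions against the quota.miloapis.com schema; the owner reference is what lets the claim be garbage collected with the instance (see Scenario 4 below).

```yaml
# Sketch only: claim schema fields are assumed, not confirmed.
apiVersion: quota.miloapis.com/v1alpha1
kind: ResourceClaim
metadata:
  name: my-instance-claim
  ownerReferences:                     # claim is GC'd with the instance
    - apiVersion: compute.datumapis.com/v1alpha1
      kind: Instance
      name: my-instance
      uid: <instance-uid>              # placeholder
spec:
  consumerRef:                         # quota is charged to the Project
    kind: Project
    name: my-project
  resources:
    - resourceType: compute.datumapis.com/instances
      quantity: 1
    - resourceType: compute.datumapis.com/vcpus
      quantity: 4000                   # 4 vCPUs in millicores
    - resourceType: compute.datumapis.com/memory
      quantity: 8192                   # MiB
```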
User Experience: Quota Exceeded States
This is the primary UX concern to resolve before implementation.
Scenario 1: User creates an Instance that exceeds their quota
$ kubectl apply -f my-instance.yaml
instance.compute.datumapis.com/my-instance created # Creation succeeds
The instance is created, but it immediately surfaces a non-ready condition:
$ kubectl get instance my-instance
NAME READY REASON NETWORK IP EXTERNAL IP
my-instance False QuotaExceeded
$ kubectl describe instance my-instance
...
Conditions:
Type Status Reason Message
---- ------ ------ -------
QuotaGranted False QuotaExceeded Quota for compute.datumapis.com/vcpus exceeded:
requested 4 vCPUs, 0 available (16/16 vCPUs used)
Programmed False PendingQuota Waiting for quota claim to be granted
Running False PendingQuota Waiting for quota claim to be granted
Ready False PendingQuota Waiting for quota claim to be granted
Open questions:
- Should the instance be auto-deleted after some TTL when quota is denied, or left in place for the user to clean up?
- Should we surface a link/annotation pointing the user to quota management UI or docs?
Scenario 2: WorkloadDeployment scales up and partially exceeds quota
If a WorkloadDeployment that already has 4 running instances is scaled to replicas: 10, but only 3 additional instances fit within quota:
$ kubectl get workloaddeployment my-wld
NAME REPLICAS READY DESIRED UP-TO-DATE
my-wld 7 7 10 7
The 3 new instances exist but are stuck in QuotaExceeded state. The WorkloadDeployment status should reflect the shortfall and surface a condition:
Conditions:
Type Status Reason Message
---- ------ ------ -------
ReplicasReady False QuotaExceeded 3 of 10 desired replicas are pending quota
(compute.datumapis.com/instances: 7/7 used)
Open questions:
- Should quota-blocked instances count against currentReplicas or be in a separate status field?
- Should the deployment controller stop creating new quota-blocked instances once the first denial is observed, or keep trying (important for when quota is later increased)?
Scenario 3: User has quota restored (upgrade or admin grant)
When a project's quota is increased (tier upgrade, manual grant), the pending ResourceClaim should be re-evaluated automatically by the quota system. The instance controller watches the claim and resumes provisioning when it flips to Granted=True — no user action needed.
This should be tested explicitly: quota increase → instances automatically begin provisioning without user intervention.
Scenario 4: Instance deleted before claim is granted
When a QuotaExceeded instance is deleted, the ResourceClaim (owned by the instance) is garbage collected. No quota was ever consumed, so no release is needed. The controller should handle this gracefully and not attempt to provision after deletion begins.
Status Condition Design
Add QuotaGranted as a new condition type on Instance:
| Condition | Status | Reason | Meaning |
| --- | --- | --- | --- |
| QuotaGranted | Unknown | PendingEvaluation | Claim created, awaiting evaluation |
| QuotaGranted | True | QuotaAvailable | Quota allocated, provisioning may proceed |
| QuotaGranted | False | QuotaExceeded | Quota denied — instance will not be provisioned |
| QuotaGranted | False | ValidationFailed | Misconfiguration in quota policy (platform error) |
While QuotaGranted is False, that state should cascade to keep the Programmed, Running, and Ready conditions from becoming True until the claim is granted.
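Assuming the standard Kubernetes metav1.Condition shape, the denied state from Scenario 1 would serialize on the Instance roughly as:

```yaml
status:
  conditions:
    - type: QuotaGranted
      status: "False"
      reason: QuotaExceeded
      message: >-
        Quota for compute.datumapis.com/vcpus exceeded: requested 4,
        available 0 (16/16 vCPUs used)
      lastTransitionTime: "2025-01-01T00:00:00Z"   # illustrative
      observedGeneration: 1
```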
Error Message Quality
Messages on denied claims (sourced from ResourceClaim.status.allocations[].message) should be user-actionable. The goal is for a user to understand:
- What was exceeded (vCPUs, instance count, memory)
- How much they're currently using vs. their limit
- What to do — upgrade tier, delete unused resources, or contact support
Example good message:
"Quota for compute.datumapis.com/vcpus exceeded: requested 4, available 0 (16/16 vCPUs allocated). To increase your limit, upgrade your plan or contact support."
Example bad message (avoid):
"QuotaExceeded"
Operator Alerting
As operators, we need internal visibility when consumers are approaching or exceeding their quota limits — both to proactively reach out to users who may be hitting growth ceilings, and to catch misconfigured grants or runaway usage before it becomes a support escalation.
This requires surfacing AllowanceBucket consumption data into our observability stack. Concretely:
- Metrics: Export per-project, per-resource-type utilization from AllowanceBucket status as Prometheus metrics (e.g., quota_used, quota_limit, quota_utilization_ratio). This enables alerting rules like "any project above 80% vCPU utilization for more than 10 minutes"; a rule along these lines is sketched after this list.
- Alerts: Define alerting thresholds at ~80% and 100% per quota dimension. The 80% threshold gives the ops team and GTM a lead time to reach out to users approaching upgrade triggers before they hit a hard wall.
- Dashboards: An operator-facing view showing quota utilization across all projects, sortable by utilization ratio, to identify which projects are closest to their limits at a glance.
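Assuming the proposed quota_used and quota_limit gauges exist with project and resource_type labels (the metric names are the proposals above, not an existing export), the 80%-for-10-minutes rule could be sketched as:

```yaml
# Sketch of a Prometheus alerting rule; metric and label names are
# the proposed ones above, not an existing export.
groups:
  - name: quota-utilization
    rules:
      - alert: ProjectQuotaNearLimit
        expr: |
          quota_used{resource_type="compute.datumapis.com/vcpus"}
            / quota_limit{resource_type="compute.datumapis.com/vcpus"} > 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Project {{ $labels.project }} is above 80% vCPU quota'
```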
Open questions:
- Where do these metrics live — in the compute service's observability stack, or should Milo expose AllowanceBucket metrics directly?
- What is the right escalation path when the 100% threshold fires — automated notification to the user, a GTM alert, or both?
- Should operators be able to configure per-tenant alert thresholds (e.g., VIP customers get alerted at 70%)?
Implementation Tasks
- Create ResourceRegistration resources for each quota dimension above (consumerType: Project)
- Create a GrantCreationPolicy to provision default quota grants when a Project is created (tier-based defaults TBD with GTM)
- Add the QuotaGranted condition type to the Instance API
- Update instance_controller.go to create/watch a ResourceClaim with consumerRef pointing to the instance's Project, and gate provisioning on grant
- Update workloaddeployment_controller.go to surface a quota-shortfall condition when instances are quota-blocked
- Export AllowanceBucket utilization as Prometheus metrics for operator alerting
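As a sketch of the default-grant task, a GrantCreationPolicy might look along these lines. The apiVersion, trigger, and grant fields are all assumptions against quota.miloapis.com, and the amounts are placeholders, not agreed tier defaults (TBD with GTM).

```yaml
# Sketch only: GrantCreationPolicy fields are assumed against
# quota.miloapis.com; amounts are placeholders, not agreed tier defaults.
apiVersion: quota.miloapis.com/v1alpha1
kind: GrantCreationPolicy
metadata:
  name: compute-project-defaults
spec:
  trigger:
    kind: Project                      # fires when a Project is created
  grants:
    - resourceType: compute.datumapis.com/instances
      amount: 10                       # placeholder
    - resourceType: compute.datumapis.com/vcpus
      amount: 16000                    # placeholder: 16 vCPUs in millicores
    - resourceType: compute.datumapis.com/memory
      amount: 65536                    # placeholder: 64 GiB in MiB
```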
Open Questions
- Claim namespace: Where do ResourceClaim objects live — in the same namespace as the Instance, or in a dedicated quota-system namespace?
- Partial WorkloadDeployment scaling: Should the controller back off from creating more instances after the first quota denial, or create all desired instances and let them all show quota status?
- Auto-cleanup TTL: Should quota-denied instances be auto-deleted after a period, or left indefinitely for user cleanup?
- Quota visibility API: Should there be a read API (or AllowanceBucket surfacing) so users can check remaining quota before attempting to create resources?
- Tier defaults: What are the default vCPU/instance/memory limits per tier? (Needs GTM/commercial input)
- Operator alerting ownership: Do AllowanceBucket metrics come from Milo directly, or does the compute service own this instrumentation?
- Operator alert escalation: When a project hits 100% quota, what is the automated response — user notification, GTM alert, both?
- Configurable operator thresholds: Should alert thresholds be configurable per project (e.g., VIP customers alerted earlier)?
Related