Integrate compute service with Milo quota system #90

@scotwells

Parent Issue

Tracked by datum-cloud/enhancements#682 (Launch Workload Compute Service — "UFOs")

Summary

Integrate the compute service with the Milo quota service (quota.miloapis.com) so that resource consumption is tracked and enforced per project. This covers Instance, WorkloadDeployment, and the underlying compute dimensions (instance count, vCPUs, memory, and eventually GPUs).

Motivation

Without quota integration, there is no mechanism to:

  • Enforce per-project resource limits set by the pricing tier
  • Give users visibility into how much of their quota they are consuming
  • Prevent runaway workloads from monopolizing platform capacity
  • Gate new projects to safe default limits during onboarding

Resources to Quota

The following resource dimensions need ResourceRegistration entries:

Resource Type                     Unit         Description
compute.datumapis.com/instances   count        Total running instances per project
compute.datumapis.com/vcpus       millicores   Total vCPU allocation across all instances
compute.datumapis.com/memory      MiB          Total memory allocation across all instances
compute.datumapis.com/gpus        count        GPU units (future, phase 2)
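
A minimal sketch of one registration, built with unstructured so nothing here depends on Milo's published Go types. The v1alpha1 version and the spec field names other than consumerType (which appears in the task list below) are assumptions about Milo's schema:

import (
    "context"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// registerInstanceQuota creates the registration for the instance-count
// dimension; the other dimensions follow the same pattern.
func registerInstanceQuota(ctx context.Context, c client.Client) error {
    reg := &unstructured.Unstructured{}
    reg.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceRegistration",
    })
    reg.SetName("compute-instances")
    // Spec field names below are illustrative; the real Milo schema may differ.
    if err := unstructured.SetNestedMap(reg.Object, map[string]interface{}{
        "resourceType": "compute.datumapis.com/instances",
        "unit":         "count",
        "consumerType": "Project",
    }, "spec"); err != nil {
        return err
    }
    return c.Create(ctx, reg)
}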

Integration Pattern

Compute should use Pattern 2: Service-Managed Claims (not admission-blocking ClaimCreationPolicy). Reasons:

  • WorkloadDeployment creates Instance objects ahead of scheduling — creation should not be blocked, but provisioning should be gated
  • Quota denial state needs to be visible to users as a resource condition, not a silent API error
  • Auto-scaling scenarios need instances to exist in a pending state while quota is being negotiated

The instance controller creates a ResourceClaim when an Instance is created, watches for Granted/Denied on the claim, and gates actual VM provisioning on claim grant.

The ConsumerRef on each ResourceClaim will reference the Project the instance belongs to.
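
A minimal sketch of the claim-creation step, assuming the reconciler in instance_controller.go has the usual controller-runtime Client/Scheme fields and that Milo serves ResourceClaim at quota.miloapis.com/v1alpha1. The spec layout (consumerRef, requests) is an assumption; the issue only states that the ConsumerRef points at the Project:

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// ensureResourceClaim creates, if absent, the ResourceClaim that gates
// provisioning of an Instance. The claim is owned by the Instance so that
// deleting the instance garbage-collects the claim (Scenario 4 below).
func (r *InstanceReconciler) ensureResourceClaim(ctx context.Context, inst *computev1alpha1.Instance, project string) error {
    claim := &unstructured.Unstructured{}
    claim.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceClaim",
    })
    claim.SetNamespace(inst.Namespace)
    claim.SetName(inst.Name) // one claim per instance
    if err := controllerutil.SetControllerReference(inst, claim, r.Scheme); err != nil {
        return err
    }
    // Hypothetical spec shape: consumerRef ties usage to the Project;
    // requests carry one entry per registered quota dimension.
    if err := unstructured.SetNestedMap(claim.Object, map[string]interface{}{
        "consumerRef": map[string]interface{}{"kind": "Project", "name": project},
        "requests": map[string]interface{}{
            "compute.datumapis.com/instances": int64(1),
            // vcpus/memory requests would be derived from inst.Spec here.
        },
    }, "spec"); err != nil {
        return err
    }
    if err := r.Client.Create(ctx, claim); err != nil && !apierrors.IsAlreadyExists(err) {
        return err
    }
    return nil
}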

User Experience: Quota Exceeded States

This is the primary UX concern to resolve before implementation.

Scenario 1: User creates an Instance that exceeds their quota

$ kubectl apply -f my-instance.yaml
instance.compute.datumapis.com/my-instance created   # Creation succeeds

The instance is created, but it immediately surfaces a non-ready condition:

$ kubectl get instance my-instance
NAME          READY   REASON          NETWORK IP   EXTERNAL IP
my-instance   False   QuotaExceeded                
$ kubectl describe instance my-instance
...
Conditions:
  Type           Status  Reason          Message
  ----           ------  ------          -------
  QuotaGranted   False   QuotaExceeded   Quota for compute.datumapis.com/vcpus exceeded: 
                                         requested 4 vCPUs, 0 available (16/16 vCPUs used)
  Programmed     False   PendingQuota    Waiting for quota claim to be granted
  Running        False   PendingQuota    Waiting for quota claim to be granted
  Ready          False   PendingQuota    Waiting for quota claim to be granted

Open questions:

  • Should the instance be auto-deleted after some TTL when quota is denied, or left in place for the user to clean up?
  • Should we surface a link/annotation pointing the user to quota management UI or docs?

Scenario 2: WorkloadDeployment scales up and partially exceeds quota

If a WorkloadDeployment is scaled up to replicas: 10 but only 3 additional instances fit within the project's quota:

$ kubectl get workloaddeployment my-wld
NAME      REPLICAS  READY  DESIRED  UP-TO-DATE
my-wld    7         7      10       7

The 3 over-quota instances exist but are stuck in the QuotaExceeded state. The WorkloadDeployment status should reflect the shortfall and surface a condition:

Conditions:
  Type             Status  Reason          Message
  ----             ------  ------          -------
  ReplicasReady    False   QuotaExceeded   3 of 10 desired replicas are pending quota 
                                           (compute.datumapis.com/instances: 7/7 used)
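
A sketch of how the deployment controller might compute that condition, using the condition helpers from k8s.io/apimachinery; the computev1alpha1 status shapes (Conditions slices, Replicas as *int32) are assumptions:

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// surfaceQuotaShortfall recomputes ReplicasReady from the QuotaGranted
// condition of the deployment's instances.
func surfaceQuotaShortfall(wld *computev1alpha1.WorkloadDeployment, instances []computev1alpha1.Instance) {
    blocked := 0
    for i := range instances {
        c := meta.FindStatusCondition(instances[i].Status.Conditions, "QuotaGranted")
        if c != nil && c.Status == metav1.ConditionFalse {
            blocked++
        }
    }
    cond := metav1.Condition{Type: "ReplicasReady", Status: metav1.ConditionTrue, Reason: "AllReplicasReady"}
    if blocked > 0 {
        cond.Status = metav1.ConditionFalse
        cond.Reason = "QuotaExceeded"
        cond.Message = fmt.Sprintf("%d of %d desired replicas are pending quota", blocked, *wld.Spec.Replicas)
    }
    meta.SetStatusCondition(&wld.Status.Conditions, cond)
}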

Open questions:

  • Should quota-blocked instances count against currentReplicas or be in a separate status field?
  • Should the deployment controller stop creating new quota-blocked instances once the first denial is observed, or keep trying (important for when quota is later increased)?

Scenario 3: User has quota restored (upgrade or admin grant)

When a project's quota is increased (tier upgrade, manual grant), the quota system should automatically re-evaluate the pending ResourceClaim. The instance controller watches the claim and resumes provisioning when it flips to Granted=True; no user action is needed.

This should be tested explicitly: quota increase → instances automatically begin provisioning without user intervention.
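
Making this work without user action is mostly watch wiring: because each claim carries a controller owner reference to its Instance, an Owns() watch re-queues the instance whenever the claim's status changes. A sketch, with the same GVK assumption as the claim-creation sketch above:

import (
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    ctrl "sigs.k8s.io/controller-runtime"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

func (r *InstanceReconciler) SetupWithManager(mgr ctrl.Manager) error {
    claim := &unstructured.Unstructured{}
    claim.SetGroupVersionKind(schema.GroupVersionKind{
        Group: "quota.miloapis.com", Version: "v1alpha1", Kind: "ResourceClaim",
    })
    return ctrl.NewControllerManagedBy(mgr).
        For(&computev1alpha1.Instance{}).
        Owns(claim). // claim status flips re-queue the owning Instance
        Complete(r)
}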

Scenario 4: Instance deleted before claim is granted

When a QuotaExceeded instance is deleted, the ResourceClaim (owned by the instance) is garbage collected. No quota was ever consumed, so no release is needed. The controller should handle this gracefully and not attempt to provision after deletion begins.

Status Condition Design

Add QuotaGranted as a new condition type on Instance:

Condition      Status    Reason              Meaning
QuotaGranted   Unknown   PendingEvaluation   Claim created, awaiting evaluation
QuotaGranted   True      QuotaAvailable      Quota allocated; provisioning may proceed
QuotaGranted   False     QuotaExceeded       Quota denied; the instance will not be provisioned
QuotaGranted   False     ValidationFailed    Misconfiguration in quota policy (platform error)

The QuotaGranted=False state should cascade: while quota is denied, the Programmed, Running, and Ready conditions must remain False.
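
One way to enforce the cascade in the instance controller, using the condition helpers from k8s.io/apimachinery (the Instance status shape is assumed, as above):

import (
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    computev1alpha1 "go.datum.net/compute/api/v1alpha1" // hypothetical import path
)

// blockOnQuota pins the downstream conditions at False until QuotaGranted
// is True. It returns true when the caller should skip provisioning.
func blockOnQuota(inst *computev1alpha1.Instance) bool {
    if meta.IsStatusConditionTrue(inst.Status.Conditions, "QuotaGranted") {
        return false // quota granted; provisioning may proceed
    }
    for _, t := range []string{"Programmed", "Running", "Ready"} {
        meta.SetStatusCondition(&inst.Status.Conditions, metav1.Condition{
            Type:    t,
            Status:  metav1.ConditionFalse,
            Reason:  "PendingQuota",
            Message: "Waiting for quota claim to be granted",
        })
    }
    return true
}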

Error Message Quality

Messages on denied claims (sourced from ResourceClaim.status.allocations[].message) should be user-actionable. The goal is for a user to understand:

  1. What was exceeded (vCPUs, instance count, memory)
  2. How much they're currently using vs. their limit
  3. What to do — upgrade tier, delete unused resources, or contact support

Example good message:

"Quota for compute.datumapis.com/vcpus exceeded: requested 4, available 0 (16/16 vCPUs allocated). To increase your limit, upgrade your plan or contact support."

Example bad message (avoid):

"QuotaExceeded"

Operator Alerting

As operators, we need internal visibility when consumers are approaching or exceeding their quota limits — both to proactively reach out to users who may be hitting growth ceilings, and to catch misconfigured grants or runaway usage before it becomes a support escalation.

This requires surfacing AllowanceBucket consumption data into our observability stack. Concretely:

  • Metrics: Export per-project, per-resource-type utilization from AllowanceBucket status as Prometheus metrics (e.g., quota_used, quota_limit, quota_utilization_ratio); see the sketch after this list. This enables alerting rules like "any project above 80% vCPU utilization for more than 10 minutes."
  • Alerts: Define alerting thresholds at ~80% and 100% per quota dimension. The 80% threshold gives the ops team and GTM lead time to reach out to users approaching upgrade triggers before they hit a hard wall.
  • Dashboards: An operator-facing view showing quota utilization across all projects, sortable by utilization ratio, to identify which projects are closest to their limits at a glance.
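
A sketch of the metrics surface, using prometheus/client_golang and controller-runtime's shared registry; the metric and label names follow the bullet above and are suggestions, and wiring them to AllowanceBucket status updates is left to whichever controller watches those objects:

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    quotaUsed = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_used", Help: "Quota currently consumed, per project and resource type.",
    }, []string{"project", "resource_type"})
    quotaLimit = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_limit", Help: "Quota limit, per project and resource type.",
    }, []string{"project", "resource_type"})
    quotaUtilization = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "quota_utilization_ratio", Help: "quota_used divided by quota_limit.",
    }, []string{"project", "resource_type"})
)

func init() {
    metrics.Registry.MustRegister(quotaUsed, quotaLimit, quotaUtilization)
}

// recordQuota would be called on each AllowanceBucket status update.
func recordQuota(project, resourceType string, used, limit float64) {
    quotaUsed.WithLabelValues(project, resourceType).Set(used)
    quotaLimit.WithLabelValues(project, resourceType).Set(limit)
    if limit > 0 {
        quotaUtilization.WithLabelValues(project, resourceType).Set(used / limit)
    }
}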

Open questions:

  • Where do these metrics live — in the compute service's observability stack, or should Milo expose AllowanceBucket metrics directly?
  • What is the right escalation path when the 100% threshold fires — automated notification to the user, a GTM alert, or both?
  • Should operators be able to configure per-tenant alert thresholds (e.g., VIP customers get alerted at 70%)?

Implementation Tasks

  • Create a ResourceRegistration for each quota dimension above (consumerType: Project)
  • Create GrantCreationPolicy to provision default quota grants when a Project is created (tier-based defaults TBD with GTM)
  • Add QuotaGranted condition type to Instance API
  • Update instance_controller.go to create/watch ResourceClaim with consumerRef pointing to the instance's Project, and gate provisioning on grant
  • Update workloaddeployment_controller.go to surface quota-shortfall condition when instances are quota-blocked
  • Export AllowanceBucket utilization as Prometheus metrics for operator alerting
  • Define alerting rules at ~80% and 100% utilization per quota dimension
  • Write integration tests covering: quota granted flow, quota exceeded flow, quota increase unblocks pending instances
  • Determine default quota values per tier (requires GTM input)

Open Questions

  1. Claim namespace: Where do ResourceClaim objects live — in the same namespace as the Instance, or in a dedicated quota-system namespace?
  2. Partial WorkloadDeployment scaling: Should the controller back off from creating more instances after the first quota denial, or create all desired instances and let them all show quota status?
  3. Auto-cleanup TTL: Should quota-denied instances be auto-deleted after a period, or left indefinitely for user cleanup?
  4. Quota visibility API: Should there be a read API (or AllowanceBucket surfacing) so users can check remaining quota before attempting to create resources?
  5. Tier defaults: What are the default vCPU/instance/memory limits per tier? (Needs GTM/commercial input)
  6. Operator alerting ownership: Do AllowanceBucket metrics come from Milo directly, or does the compute service own this instrumentation?
  7. Operator alert escalation: When a project hits 100% quota, what is the automated response — user notification, GTM alert, both?
  8. Configurable operator thresholds: Should alert thresholds be configurable per project (e.g., VIP customers alerted earlier)?
