dstack-cloud: add gcp_config.provisioning_model for SPOT instances by kvinwang · Pull Request #15 · Phala-Network/meta-dstack-cloud

kvinwang · 2026-05-27T04:19:06Z

Summary

Add a provisioning_model field to gcp_config (STANDARD / SPOT), and pass the corresponding flags through to gcloud compute instances create.

Why

Many GCP projects only ship preemptible (SPOT) quota for newer GPUs. For example, on a typical project PREEMPTIBLE-NVIDIA-H100-GPUS-per-project-{region,zone} is granted while NVIDIA-H100-GPUS-per-project-region is zero. Without on-demand quota, the only way to launch a3-highgpu-1g (H100) in a Confidential TDX VM is to request --provisioning-model=SPOT. Currently, dstack-cloud deploy hard-codes STANDARD, so the launch fails with QUOTA_EXCEEDED.

Behavior

- provisioning_model defaults to STANDARD, which is fully backwards-compatible.
- When set to SPOT, the deploy adds:
  - --provisioning-model=SPOT
  - --instance-termination-action=STOP (so the LUKS-encrypted data disk survives preemption and dstack-cloud start can resume the instance; gcloud's default is DELETE)
- Any other value raises RuntimeError early instead of silently dropping through.
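The validation and flag mapping described above can be sketched in Python as follows. This is a minimal illustration, not the PR's actual code: the helper name `provisioning_flags` is hypothetical, and it only emits the two gcloud flags listed above.

```python
def provisioning_flags(gcp_config: dict) -> list[str]:
    """Return the extra gcloud flags implied by gcp_config.provisioning_model.

    Hypothetical helper for illustration; not the actual dstack-cloud code.
    """
    # A missing field falls back to STANDARD, so existing app.json files keep working.
    model = gcp_config.get("provisioning_model", "STANDARD")
    if model == "STANDARD":
        # On-demand: add nothing, leaving the gcloud invocation unchanged.
        return []
    if model == "SPOT":
        return [
            "--provisioning-model=SPOT",
            # Explicitly request STOP so the disks survive preemption and the
            # instance can be resumed later.
            "--instance-termination-action=STOP",
        ]
    # Fail early instead of silently ignoring an unrecognized value.
    raise RuntimeError(f"Unsupported provisioning_model: {model!r}")
```

With this sketch, `provisioning_flags({"provisioning_model": "SPOT"})` returns the two SPOT flags, while an empty config or `"STANDARD"` returns an empty list.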

Example app.json:

```json
"gcp_config": {
  "machine_type": "a3-highgpu-1g",
  "zone": "us-central1-a",
  "provisioning_model": "SPOT"
}
```
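For illustration, here is how the snippet above might be turned into a gcloud compute instances create argument list. It is a sketch under assumptions: the surrounding `{}` wrapper, the instance name `my-h100-vm`, and the helper name `build_create_command` are hypothetical, and the real deploy command passes further flags (image, confidential-compute settings, disks) that are omitted here.

```python
import json

# The gcp_config snippet from above, wrapped in an object so it parses as JSON.
APP_JSON = """
{
  "gcp_config": {
    "machine_type": "a3-highgpu-1g",
    "zone": "us-central1-a",
    "provisioning_model": "SPOT"
  }
}
"""

# Extra flags per supported provisioning model (STANDARD adds nothing).
EXTRA_FLAGS = {
    "STANDARD": [],
    "SPOT": ["--provisioning-model=SPOT", "--instance-termination-action=STOP"],
}


def build_create_command(instance_name: str, gcp_config: dict) -> list[str]:
    """Build a partial gcloud argv; hypothetical helper, not dstack-cloud's code."""
    model = gcp_config.get("provisioning_model", "STANDARD")
    if model not in EXTRA_FLAGS:
        raise RuntimeError(f"Unsupported provisioning_model: {model!r}")
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        f"--machine-type={gcp_config['machine_type']}",
        f"--zone={gcp_config['zone']}",
        *EXTRA_FLAGS[model],
    ]


if __name__ == "__main__":
    config = json.loads(APP_JSON)["gcp_config"]
    print(" ".join(build_create_command("my-h100-vm", config)))
```

Run as a script, this prints `gcloud compute instances create my-h100-vm --machine-type=a3-highgpu-1g --zone=us-central1-a --provisioning-model=SPOT --instance-termination-action=STOP`. Keeping the flags in a dict makes the set of accepted values explicit, so the early-error check and the flag lookup cannot drift apart.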

Test plan

- dstack-cloud new: template now includes "provisioning_model": "STANDARD"
- dstack-cloud deploy with the default (STANDARD): unchanged gcloud invocation
- dstack-cloud deploy with SPOT: emits --provisioning-model=SPOT --instance-termination-action=STOP; verified on GCP a3-highgpu-1g
- dstack-cloud deploy with a bogus value: raises Unsupported provisioning_model

Many GCP projects only ship preemptible (SPOT) quota for newer GPUs. In particular, `PREEMPTIBLE-NVIDIA-H100-GPUS-per-project-{region,zone}` is granted by default while `NVIDIA-H100-GPUS-per-project-region` is zero. Without on-demand quota, the only way to launch an H100 in a Confidential TDX VM is to request `--provisioning-model=SPOT`.

Expose a `provisioning_model` field in `gcp_config` (default `STANDARD`, backwards-compatible). When it is set to `SPOT`, also emit `--instance-termination-action=STOP` so the boot and data disks survive preemption and the instance can be resumed via `dstack-cloud start`. This is important for the LUKS-encrypted data disk, which is keyed by the KMS-provisioned per-instance secret. Anything other than `STANDARD`/`SPOT` raises an early error rather than silently dropping through.

Example `app.json` snippet for an H100 deploy: `"gcp_config": { "machine_type": "a3-highgpu-1g", "zone": "us-central1-a", "provisioning_model": "SPOT" }`

Copilot

Copilot wasn't able to review any files in this pull request.


Copilot AI review requested due to automatic review settings May 27, 2026 04:19

Copilot AI reviewed May 27, 2026


kvinwang merged commit 5a1bfea into main May 27, 2026

kvinwang mentioned this pull request May 27, 2026 in #17, bump DISTRO_VERSION to 0.6.1 (Merged)

kvinwang merged 1 commit into main from kvin/dstack-cloud-spot-provisioning
