Dev environments
Zed
dstack now supports Zed as a dev environment IDE:
type: dev-environment
ide: zed
resources:
  gpu: L4

Once the dev environment is up, the CLI prints a zed:// link that opens the remote project in Zed over SSH. Since Zed doesn't require any plugins, no server pre-installation is needed; the Zed server is installed automatically on first connect.
✗ dstack apply
...
Submit a new run? [y/n]: y
NAME BACKEND GPU PRICE STATUS SUBMITTED
fast-fly-1 aws (us-east-2) gpu=L4:24GB:1 $0.1838 running 16:36
(spot)
fast-fly-1 provisioning completed (running)
pip install ipykernel...
To open in Zed, use link below:
zed://ssh/fast-fly-1/dstack/run
To connect via SSH, use: `ssh fast-fly-1`
To exit, press Ctrl+C.
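The link printed in the output appears to follow a `zed://ssh/<host>/<remote path>` layout. The snippet below is a minimal sketch that splits such a link into its parts, assuming that layout holds; `parse_zed_ssh_link` is a hypothetical helper for illustration, not part of dstack or Zed.

```python
from urllib.parse import urlsplit


def parse_zed_ssh_link(link: str) -> tuple[str, str]:
    """Split a zed://ssh/<host>/<path> link into (host, remote_path).

    Assumption: the first path segment is the SSH host alias and the
    remainder is the absolute path of the project on the remote machine.
    """
    parts = urlsplit(link)
    if parts.scheme != "zed" or parts.netloc != "ssh":
        raise ValueError(f"not a zed://ssh link: {link!r}")
    host, _, rest = parts.path.lstrip("/").partition("/")
    if not host:
        raise ValueError(f"missing host in link: {link!r}")
    return host, "/" + rest


host, remote_path = parse_zed_ssh_link("zed://ssh/fast-fly-1/dstack/run")
print(host, remote_path)
```

Under this assumption, the host part matches the alias used by `ssh fast-fly-1`.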
Services
Replica groups
The spot_policy and reservation properties can now be specified at the replica group level. This allows distributing replicas across reserved and spot capacity, e.g., running baseline replicas on a reservation while autoscaling overflow replicas on spot instances:
type: service
image: my-image
port: 80
replicas:
  - name: baseline
    reservation: my-reservation
    count: 1
  - name: overflow
    spot_policy: auto
    count: 0..3
scaling:
  metric: rps
  target: 1

Shepherd Model Gateway
Services using Shepherd Model Gateway now support gRPC communication with both vLLM and SGLang workers. Previously, only the SGLang runtime with the HTTP connection mode was supported.
Below is an example service configuration running vLLM gRPC workers:
type: service
name: prefill-decode
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    image: python:3.12-slim
    commands:
      - pip install smg
      - |
        smg launch \
          --pd-disaggregation \
          --model-path $MODEL_ID \
          --enable-igw \
          --host 0.0.0.0 \
          --port 8000 \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4
  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
        python3 -m vllm.entrypoints.grpc_server \
          --model $MODEL_ID \
          --host 0.0.0.0 \
          --port 8000 \
          --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'
    resources:
      gpu: H200
  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
        python3 -m vllm.entrypoints.grpc_server \
          --model $MODEL_ID \
          --host 0.0.0.0 \
          --port 8000 \
          --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
    resources:
      gpu: H200
port: 8000

dstack automatically detects each worker's runtime (vLLM or SGLang) and connection mode (HTTP or gRPC) by probing it. With gRPC, the SMG router tokenizes requests once and routes on tokens instead of raw text, reducing duplicate work and making cache_aware routing more effective.
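To give a rough intuition for routing on tokens, here is a toy longest-prefix-match router. It is only a sketch and not SMG's actual cache_aware policy: the `cached` mapping, the worker names, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading token IDs two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(tokens: Sequence[int], cached: Dict[str, List[List[int]]]) -> str:
    """Pick the worker whose previously seen sequences share the longest
    token prefix with the request. Ties go to the alphabetically first worker.

    `cached` is a hypothetical map: worker name -> token sequences it served.
    """
    if not cached:
        raise ValueError("no workers available")
    best_worker, best_len = "", -1
    for worker in sorted(cached):
        longest = max(
            (shared_prefix_len(tokens, seq) for seq in cached[worker]),
            default=0,
        )
        if longest > best_len:
            best_worker, best_len = worker, longest
    return best_worker


cached = {"worker-a": [[1, 2, 3, 4]], "worker-b": [[1, 2, 9]]}
print(pick_worker([1, 2, 3, 7], cached))
```

Because the request is already tokenized when the router sees it, the same token IDs can be compared here and then forwarded to the chosen worker without tokenizing again.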
JarvisLabs
The jarvislabs backend now supports offers with RTXPRO6000 GPUs.
Azure
subnet_ids
Similarly to vpc_ids, the azure backend now allows selecting specific subnets to be attached to dstack VMs via the new subnet_ids property, mapping regions to subnets in the <resource-group>/<vnet>/<subnet> format:
projects:
  - name: main
    backends:
      - type: azure
        subscription_id: ...
        tenant_id: ...
        creds:
          type: default
        regions: [westeurope]
        subnet_ids:
          westeurope: my-resource-group/my-vnet/my-subnet

This is useful when the VNet contains subnets that dstack shouldn't pick automatically, e.g. subnets delegated to other Azure services.
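The `<resource-group>/<vnet>/<subnet>` format can be checked before it goes into the server config. The snippet below is a minimal sketch of such a check; `AzureSubnetRef` and `parse_subnet_id` are hypothetical names for illustration, not dstack APIs, and dstack's own validation may differ.

```python
from typing import NamedTuple


class AzureSubnetRef(NamedTuple):
    resource_group: str
    vnet: str
    subnet: str


def parse_subnet_id(value: str) -> AzureSubnetRef:
    """Split a '<resource-group>/<vnet>/<subnet>' string into its parts.

    Raises ValueError unless there are exactly three non-empty segments.
    """
    parts = value.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <resource-group>/<vnet>/<subnet>, got {value!r}"
        )
    return AzureSubnetRef(*parts)


ref = parse_subnet_id("my-resource-group/my-vnet/my-subnet")
print(ref.resource_group, ref.vnet, ref.subnet)
```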
What's changed
- Fix zero scaled services assigned to wrong fleets by @r4victor in #3939
- Set runner/shim default compiled versions to `latest` by @r4victor in #3941
- Implement SSH connection pool for runner instances by @r4victor in #3936
- [chore]: Move `format_backend()` to common utils by @jvstme in #3942
- Drop non-linux runner builds and local backend by @r4victor in #3944
- Support Zed as dev-environment IDE by @r4victor in #3947
- Fix dropping ssh connections to non-provisioned terminating instances by @r4victor in #3948
- Replica group `spot_policy` and `reservation` by @jvstme in #3932
- Fix jpd.hostname AssertionError on container stop by @r4victor in #3951
- Add NVIDIA Dynamo blog post by @peterschmidt85 in #3949
- Support gRPC communication with SMG (Shepherd Model Gateway) workers by @Bihan in #3946
- Allow configuring `subnet_ids` in Azure settings by @jvstme in #3955
- [JarvisLabs] Support RTX PRO 6000; update gpuhunt dependency by @peterschmidt85 in #3943
Full changelog: 0.20.23...0.20.24