Support nvidia dynamo #3868

Merged

Bihan merged 5 commits into dstackai:master from Bihan:support_nvidia_dynamo on May 14, 2026

Conversation


@Bihan Bihan commented May 8, 2026

Service Configuration example

type: service
name: dynamo-pd


env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct

replicas: 
  - name: router
    count: 1
    docker: true
    router:
      type: dynamo
    commands:
      # DIND ships docker but not pip — set up a venv for ai-dynamo.
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install --pre "ai-dynamo[sglang]"
      # Pull Dynamo and start the supporting compose stack (NATS, etcd, ...).
      - git clone https://github.com/ai-dynamo/dynamo.git
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      # Run the Dynamo frontend.
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4

  # ── prefill worker ─────────────────────────────────────────────────────
  # dstackai/base + Python 3.12 + NVCC (needed for CUDA kernels in sglang).
  - name: prefill
    count: 1..2
    scaling:
      metric: rps
      target: 4 
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP once the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting
      # connections — having the IP isn't the same as having the services up.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

  # ── decode worker ──────────────────────────────────────────────────────
  - name: decode
    count: 1
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

port: 8000
model: meta-llama/Llama-3.2-3B-Instruct

probes:
  - type: http
    url: /health
    interval: 15s
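
The `until (echo > /dev/tcp/…)` loops in the worker commands are a bare-bones TCP readiness check. The same pattern can be sketched in Python (a hypothetical helper for illustration, not part of dstack or Dynamo), in case the bash `/dev/tcp` trick is unfamiliar:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 3.0) -> bool:
    """Poll until a TCP connection to host:port succeeds.

    Mirrors the /dev/tcp loop in the worker commands above: having the
    router's IP is not the same as having etcd/NATS accepting connections.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection completes the TCP handshake; we then close it.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

As with the bash version, this only proves the port accepts connections, not that the service behind it is fully initialized.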

@Bihan Bihan requested review from jvstme and r4victor and removed request for r4victor May 8, 2026 17:53
Comment thread src/dstack/_internal/server/services/runs/spec.py
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment thread src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py Outdated
Comment thread src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py Outdated
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment on lines +30 to +32
Using an enum (rather than empty-dict sentinels) means callers can rely
on either `is` or `==` to compare — both yield correct, unambiguous
results — and stray dicts from elsewhere can never accidentally match.
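
The enum-vs-sentinel point can be illustrated with a toy example (the names below are hypothetical, not the actual code on lines +30 to +32): enum members are singletons, so `is` and `==` agree, while an empty-dict sentinel is only safe to compare with `is`:

```python
import enum


class SpecialReplicaCount(enum.Enum):
    # Hypothetical sentinel values, for illustration only.
    AUTO = enum.auto()
    UNSET = enum.auto()


# Enum members are singletons: identity and equality give the same answer.
assert SpecialReplicaCount.AUTO is SpecialReplicaCount.AUTO
assert SpecialReplicaCount.AUTO == SpecialReplicaCount.AUTO
assert SpecialReplicaCount.AUTO != SpecialReplicaCount.UNSET

# With an empty-dict sentinel, any stray empty dict compares equal,
# so `==` can no longer distinguish "the sentinel" from "some dict".
AUTO_SENTINEL = {}
stray = {}
assert stray == AUTO_SENTINEL       # equal by value...
assert stray is not AUTO_SENTINEL   # ...but not the sentinel object
```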

(nit) I think this comparison between enums and empty dicts won't make much sense to a reader who hasn't seen the previous implementation with empty dicts.

Overall, many comments and docstrings in the PR look a bit too verbose and redundant to me. I'd only keep minimal comments that add important context which is not otherwise clear from the naming or the implementation

Comment thread src/tests/_internal/core/models/test_run_spec_validators.py Outdated

@jvstme jvstme left a comment

Looks good overall.

Some comments (especially 1 and 2) may still be worth addressing, although there doesn't seem to be anything critical.

@Bihan Bihan merged commit d454f19 into dstackai:master May 14, 2026
25 checks passed