Support nvidia dynamo #3868

Merged

Bihan merged 5 commits into dstackai:master from Bihan:support_nvidia_dynamo on May 14, 2026

Conversation


@Bihan Bihan commented May 8, 2026

Service Configuration example

type: service
name: dynamo-pd


env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-3B-Instruct

replicas: 
  - name: router
    count: 1
    docker: true
    router:
      type: dynamo
    commands:
      # DIND ships docker but not pip — set up a venv for ai-dynamo.
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install --pre "ai-dynamo[sglang]"
      # Pull Dynamo and start the supporting compose stack (NATS, etcd, ...).
      - git clone https://github.com/ai-dynamo/dynamo.git
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      # Run the Dynamo frontend.
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4

  # ── prefill worker ─────────────────────────────────────────────────────
  # dstackai/base + Python 3.12 + NVCC (needed for CUDA kernels in sglang).
  - name: prefill
    count: 1..2
    scaling:
      metric: rps
      target: 4 
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP once the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting
      # connections — having the IP isn't the same as having the services up.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

  # ── decode worker ──────────────────────────────────────────────────────
  - name: decode
    count: 1
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_HOST="0.0.0.0"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install --pre "ai-dynamo[sglang]"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: L4

port: 8000
model: meta-llama/Llama-3.2-3B-Instruct

probes:
  - type: http
    url: /health
    interval: 15s
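
The `until (echo > /dev/tcp/…)` loops in the worker commands are a bare-bones TCP readiness check. The same pattern can be sketched in Python (a hypothetical helper for illustration, not part of dstack or Dynamo), in case the bash `/dev/tcp` trick is unfamiliar:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 3.0) -> bool:
    """Poll until a TCP connection to host:port succeeds.

    Mirrors the /dev/tcp loop in the worker commands above: having the
    router's IP is not the same as having etcd/NATS accepting connections.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection completes the TCP handshake; we then close it.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

As with the bash version, this only proves the port accepts connections, not that the service behind it is fully initialized.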

@Bihan Bihan requested review from jvstme and r4victor and removed request for r4victor May 8, 2026 17:53
Comment thread src/dstack/_internal/server/services/runs/spec.py
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment thread src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py Outdated
Comment thread src/dstack/_internal/server/background/pipeline_tasks/jobs_running.py Outdated
Comment thread src/dstack/_internal/server/services/runs/replicas.py Outdated
Comment on lines +30 to +32
Using an enum (rather than empty-dict sentinels) means callers can rely
on either `is` or `==` to compare — both yield correct, unambiguous
results — and stray dicts from elsewhere can never accidentally match.
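
The enum-vs-sentinel point can be illustrated with a toy example (the names below are hypothetical, not the actual code on lines +30 to +32): enum members are singletons, so `is` and `==` agree, while an empty-dict sentinel is only safe to compare with `is`:

```python
import enum


class SpecialReplicaCount(enum.Enum):
    # Hypothetical sentinel values, for illustration only.
    AUTO = enum.auto()
    UNSET = enum.auto()


# Enum members are singletons: identity and equality give the same answer.
assert SpecialReplicaCount.AUTO is SpecialReplicaCount.AUTO
assert SpecialReplicaCount.AUTO == SpecialReplicaCount.AUTO
assert SpecialReplicaCount.AUTO != SpecialReplicaCount.UNSET

# With an empty-dict sentinel, any stray empty dict compares equal,
# so `==` can no longer distinguish "the sentinel" from "some dict".
AUTO_SENTINEL = {}
stray = {}
assert stray == AUTO_SENTINEL       # equal by value...
assert stray is not AUTO_SENTINEL   # ...but not the sentinel object
```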

(nit) I think this comparison between enums and empty dicts won't make much sense to a reader who hasn't seen the previous implementation with empty dicts.

Overall, many comments and docstrings in the PR look a bit too verbose and redundant to me. I'd only keep minimal comments that add important context which is not otherwise clear from the naming or the implementation

Comment thread src/tests/_internal/core/models/test_run_spec_validators.py Outdated

@jvstme jvstme left a comment

Looks good overall.

Some comments (especially 1 and 2) may still be worth addressing, although there doesn't seem to be anything critical.

@Bihan Bihan merged commit d454f19 into dstackai:master May 14, 2026
25 checks passed