```yaml
type: fleet
@@ -296,7 +296,7 @@ Once you've configured the `gcp` backend, create the fleet configuration:
```shell
- $ dstack apply -f examples/clusters/gcp/a3high-fleet.dstack.yml
+ $ dstack apply -f a3high-fleet.dstack.yml
FLEET INSTANCE BACKEND GPU PRICE STATUS CREATED
a3mega-fleet 1 gcp (europe-west4) H100:80GB:8 $20.5688 (spot) idle 9 mins ago
@@ -324,7 +324,7 @@ Use a distributed task that runs NCCL tests to validate cluster network bandwidt
```shell
- $ dstack apply -f examples/clusters/nccl-tests/.dstack.yml
+ $ dstack apply -f nccl-tests.dstack.yml
Provisioning...
---> 100%
@@ -351,15 +351,70 @@ Use a distributed task that runs NCCL tests to validate cluster network bandwidt
=== "A3 Mega"
- !!! info "Source code"
- The source code of the task can be found at [examples/clusters/gcp/a3mega-nccl-tests.dstack.yml](https://github.com/dstackai/dstack/blob/master/examples/clusters/gcp/a3mega-nccl-tests.dstack.yml).
+
+ ```yaml
+ type: task
+ name: nccl-tests
+ nodes: 2
+ image: nvcr.io/nvidia/pytorch:24.04-py3
+ entrypoint: "bash -c" # Need to use bash instead of default dash for nccl-env-profile.sh
+ commands:
+ - |
+ # Setup TCPXO NCCL env variables
+ NCCL_LIB_DIR="/var/lib/tcpxo/lib64"
+ source ${NCCL_LIB_DIR}/nccl-env-profile-ll128.sh
+ export NCCL_FASTRAK_CTRL_DEV=enp0s12
+ export NCCL_FASTRAK_IFNAME=enp6s0,enp7s0,enp13s0,enp14s0,enp134s0,enp135s0,enp141s0,enp142s0
+ export NCCL_SOCKET_IFNAME=enp0s12
+ export NCCL_FASTRAK_LLCM_DEVICE_DIRECTORY="/dev/aperture_devices"
+ export LD_LIBRARY_PATH="${NCCL_LIB_DIR}:${LD_LIBRARY_PATH}"
+ # Build NCCL Tests
+ git clone https://github.com/NVIDIA/nccl-tests.git
+ cd nccl-tests
+ MPI=1 CC=mpicc CXX=mpicxx make -j
+ cd build
+ # We use FIFO for inter-node communication
+ FIFO=/tmp/dstack_job
+ if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
+ sleep 10
+ echo "${DSTACK_NODES_IPS}" > hostfile
+ MPIRUN='mpirun --allow-run-as-root --hostfile hostfile'
+ # Wait for other nodes
+ while true; do
+ if ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 true >/dev/null 2>&1; then
+ break
+ fi
+ echo 'Waiting for nodes...'
+ sleep 5
+ done
+ # Run NCCL Tests
+ ${MPIRUN} \
+ -n ${DSTACK_GPUS_NUM} -N ${DSTACK_GPUS_PER_NODE} \
+ --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 \
+ $(env | awk -F= '{print "-x", $1}' | xargs) \
+ ./all_gather_perf -b 8M -e 8G -f 2 -g 1 -w 5 --iters 200 -c 0;
+ # Notify nodes the job is done
+ ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 sh -c "echo done > ${FIFO}"
+ else
+ mkfifo ${FIFO}
+ # Wait for a message from the first node
+ cat ${FIFO}
+ fi
+ spot_policy: auto
+ resources:
+ shm_size: 16GB
+ ```
+
Pass the configuration to `dstack apply`:
```shell
- $ dstack apply -f examples/clusters/gcp/a3mega-nccl-tests.dstack.yml
+ $ dstack apply -f nccl-tests.dstack.yml
nccl-tests provisioning completed (running)
nThread 1 nGpus 1 minBytes 8388608 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 200 agg iters: 1 validation: 0 graph: 0
@@ -385,15 +440,57 @@ Use a distributed task that runs NCCL tests to validate cluster network bandwidt
=== "A3 High/Edge"
- !!! info "Source code"
- The source code of the task can be found at [examples/clusters/nccl-tests/.dstack.yml](https://github.com/dstackai/dstack/blob/master/examples/clusters/nccl-tests/.dstack.yml).
-
+
+ ```yaml
+ type: task
+ name: nccl-tests
+ nodes: 2
+ image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/nccl-plugin-gpudirecttcpx
+ commands:
+ - |
+ export NCCL_DEBUG=INFO
+ export LD_LIBRARY_PATH=/usr/local/tcpx/lib64:$LD_LIBRARY_PATH
+ # We use FIFO for inter-node communication
+ FIFO=/tmp/dstack_job
+ if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
+ mkdir -p /scripts/hostfiles2
+ : > /scripts/hostfiles2/hostfile8
+ for ip in ${DSTACK_NODES_IPS}; do
+ echo "${ip} slots=${DSTACK_GPUS_PER_NODE}" >> /scripts/hostfiles2/hostfile8
+ done
+ MPIRUN='mpirun --allow-run-as-root --hostfile /scripts/hostfiles2/hostfile8'
+ # Wait for other nodes
+ while true; do
+ if ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 true >/dev/null 2>&1; then
+ break
+ fi
+ echo 'Waiting for nodes...'
+ sleep 5
+ done
+ # Run NCCL Tests
+ NCCL_GPUDIRECTTCPX_FORCE_ACK=0 /scripts/run-allgather.sh 8 eth1,eth2,eth3,eth4 8M 8GB 2
+ # Notify nodes the job is done
+ ${MPIRUN} -n ${DSTACK_NODES_NUM} -N 1 sh -c "echo done > ${FIFO}"
+ else
+ mkfifo ${FIFO}
+ # Wait for a message from the first node
+ cat ${FIFO}
+ fi
+ spot_policy: auto
+ resources:
+ shm_size: 16GB
+ ```
+
Pass the configuration to `dstack apply`:
```shell
- $ dstack apply -f examples/clusters/gcp/a3high-nccl-tests.dstack.yml
+ $ dstack apply -f nccl-tests.dstack.yml
nccl-tests provisioning completed (running)
nThread 1 nGpus 1 minBytes 8388608 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 200 agg iters: 1 validation: 0 graph: 0
@@ -418,16 +515,13 @@ Use a distributed task that runs NCCL tests to validate cluster network bandwidt
- !!! info "Source code"
- The source code of the task can be found at [examples/clusters/gcp/a3high-nccl-tests.dstack.yml](https://github.com/dstackai/dstack/blob/master/examples/clusters/gcp/a3high-nccl-tests.dstack.yml).
-
### Distributed training
=== "A4"
- You can use the standard [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) example to run distributed training on A4 instances.
+ You can use the standard [distributed task](../../concepts/tasks.md#distributed-tasks) example to run distributed training on A4 instances.
=== "A3 Mega"
- You can use the standard [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) example to run distributed training on A3 Mega instances. To enable GPUDirect-TCPX, make sure the required [NCCL environment variables](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot#environment-variables-nccl) are properly set, for example by adding the following commands at the beginning:
+    You can use the standard [distributed task](../../concepts/tasks.md#distributed-tasks) example to run distributed training on A3 Mega instances. To enable GPUDirect-TCPXO, make sure the required [NCCL environment variables](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot#environment-variables-nccl) are properly set, for example by adding the following commands at the beginning:
```shell
# ...
@@ -446,7 +540,7 @@ Use a distributed task that runs NCCL tests to validate cluster network bandwidt
```
=== "A3 High/Edge"
- You can use the standard [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) example to run distributed training on A3 High/Edge instances. To enable GPUDirect-TCPX0, make sure the required [NCCL environment variables](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot#environment-variables-nccl) are properly set, for example by adding the following commands at the beginning:
+    You can use the standard [distributed task](../../concepts/tasks.md#distributed-tasks) example to run distributed training on A3 High/Edge instances. To enable GPUDirect-TCPX, make sure the required [NCCL environment variables](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot#environment-variables-nccl) are properly set, for example by adding the following commands at the beginning:
```shell
# ...
@@ -483,6 +577,6 @@ In addition to distributed training, you can of course run regular tasks, dev en
## What's next
-1. Learn about [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), [services](https://dstack.ai/docs/concepts/services)
-2. Read the [Clusters](https://dstack.ai/docs/guides/clusters) guide
+1. Learn about [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), [services](../../concepts/services.md)
+2. Read about [cluster placement](../../concepts/fleets.md#cluster-placement)
3. Check GCP's docs on using [A4](https://docs.cloud.google.com/compute/docs/gpus/create-gpu-vm-a3u-a4), and [A3 Mega/High/Edge](https://docs.cloud.google.com/compute/docs/gpus/gpudirect) instances
diff --git a/examples/clusters/lambda/README.md b/mkdocs/docs/examples/clusters/lambda.md
similarity index 84%
rename from examples/clusters/lambda/README.md
rename to mkdocs/docs/examples/clusters/lambda.md
index 07fb0ce926..1ebe35ce76 100644
--- a/examples/clusters/lambda/README.md
+++ b/mkdocs/docs/examples/clusters/lambda.md
@@ -19,7 +19,7 @@ description: Setting up Lambda clusters using Kubernetes or 1-Click Clusters wit
### Configure the backend
-Follow the standard instructions for setting up a [Kubernetes](https://dstack.ai/docs/concepts/backends/#kubernetes) backend:
+Follow the standard instructions for setting up a [Kubernetes](../../concepts/backends.md#kubernetes) backend:
@@ -68,11 +68,11 @@ $ dstack apply -f lambda-fleet.dstack.yml
-Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
+Once the fleet is created, you can run [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md).
## 1-Click Clusters
-Another way to work with Lambda clusters is through [1CC](https://lambda.ai/1-click-clusters). While `dstack` supports automated cluster provisioning via [VM-based backends](https://dstack.ai/docs/concepts/backends#vm-based), there is currently no programmatic way to provision Lambda 1CCs. As a result, to use a 1CC cluster with `dstack`, you must use [SSH fleets](https://dstack.ai/docs/concepts/fleets).
+Another way to work with Lambda clusters is through [1CC](https://lambda.ai/1-click-clusters). While `dstack` supports automated cluster provisioning via [VM-based backends](../../concepts/backends.md#vm-based), there is currently no programmatic way to provision Lambda 1CCs. As a result, to use a 1CC cluster with `dstack`, you must use [SSH fleets](../../concepts/fleets.md).
### Prerequisites
@@ -80,7 +80,7 @@ Another way to work with Lambda clusters is through [1CC](https://lambda.ai/1-cl
### Create a fleet
-Follow the standard instructions for setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets/#ssh-fleets):
+Follow the standard instructions for setting up an [SSH fleet](../../concepts/fleets.md#ssh-fleets):
@@ -116,11 +116,11 @@ $ dstack apply -f lambda-fleet.dstack.yml
-Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
+Once the fleet is created, you can run [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md).
## Run tasks
-To run tasks on a cluster, you must use [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-task).
+To run tasks on a cluster, you must use [distributed tasks](../../concepts/tasks.md#distributed-tasks).
### Run NCCL tests
@@ -213,6 +213,6 @@ Provisioning...
## What's next
-1. Learn about [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), [services](https://dstack.ai/docs/concepts/services)
-2. Read the [Kuberentes](https://dstack.ai/docs/guides/kubernetes), and [Clusters](https://dstack.ai/docs/guides/clusters) guides
+1. Learn about [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), [services](../../concepts/services.md)
+2. Read about the [Kubernetes backend](../../concepts/backends.md#kubernetes) and [cluster placement](../../concepts/fleets.md#cluster-placement)
3. Check Lambda's docs on [Kubernetes](https://docs.lambda.ai/public-cloud/1-click-clusters/managed-kubernetes/#accessing-mk8s) and [1CC](https://docs.lambda.ai/public-cloud/1-click-clusters/)
diff --git a/examples/clusters/nccl-rccl-tests/README.md b/mkdocs/docs/examples/clusters/nccl-rccl-tests.md
similarity index 82%
rename from examples/clusters/nccl-rccl-tests/README.md
rename to mkdocs/docs/examples/clusters/nccl-rccl-tests.md
index a9cadd82e8..196f08d495 100644
--- a/examples/clusters/nccl-rccl-tests/README.md
+++ b/mkdocs/docs/examples/clusters/nccl-rccl-tests.md
@@ -5,10 +5,10 @@ description: Running NCCL and RCCL tests to validate cluster network bandwidth
# NCCL/RCCL tests
-This example shows how to run [NCCL](https://github.com/NVIDIA/nccl-tests) or [RCCL](https://github.com/ROCm/rccl-tests) tests on a cluster using [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks).
+This example shows how to run [NCCL](https://github.com/NVIDIA/nccl-tests) or [RCCL](https://github.com/ROCm/rccl-tests) tests on a cluster using [distributed tasks](../../concepts/tasks.md#distributed-tasks).
!!! info "Prerequisites"
- Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](https://dstack.ai/docs/concepts/fleets#cluster-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
+ Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](../../concepts/fleets.md#cluster-placement) or an [SSH fleet](../../concepts/fleets.md#ssh-placement)).
## Running as a task
@@ -16,7 +16,7 @@ Here's an example of a task that runs AllReduce test on 2 nodes, each with 4 GPU
=== "NCCL tests"
-
+
```yaml
type: task
@@ -59,7 +59,7 @@ Here's an example of a task that runs AllReduce test on 2 nodes, each with 4 GPU
=== "RCCL tests"
-
+
```yaml
type: task
@@ -120,12 +120,12 @@ Here's an example of a task that runs AllReduce test on 2 nodes, each with 4 GPU
### Apply a configuration
-To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply/) command.
+To run a configuration, use the [`dstack apply`](../../reference/cli/dstack/apply.md) command.
```shell
-$ dstack apply -f examples/clusters/nccl-rccl-tests/nccl-tests.dstack.yml
+$ dstack apply -f nccl-tests.dstack.yml
# BACKEND REGION INSTANCE RESOURCES SPOT PRICE
1 aws us-east-1 g4dn.12xlarge 48xCPU, 192GB, 4xT4 (16GB), 100.0GB (disk) no $3.912
@@ -139,5 +139,5 @@ Submit the run nccl-tests? [y/n]: y
## What's next?
-1. Check [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
- [services](https://dstack.ai/docsconcepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets).
+1. Check [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md),
+ [services](../../concepts/services.md), and [fleets](../../concepts/fleets.md).
diff --git a/examples/clusters/nebius/README.md b/mkdocs/docs/examples/clusters/nebius.md
similarity index 90%
rename from examples/clusters/nebius/README.md
rename to mkdocs/docs/examples/clusters/nebius.md
index 9f8bd349a0..20b1a47555 100644
--- a/examples/clusters/nebius/README.md
+++ b/mkdocs/docs/examples/clusters/nebius.md
@@ -75,7 +75,7 @@ $ dstack apply -f nebius-fleet.dstack.yml
This will automatically create a Nebius cluster and provision instances.
-Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
+Once the fleet is created, you can run [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md).
> If you want instances to be provisioned on demand, you can set `nodes` to `0..2`. In this case, `dstack` will create instances only when you run workloads.
@@ -107,7 +107,7 @@ $ nebius mk8s cluster get-credentials --id <cluster id> --external
### Configure a backend
-Follow the standard instructions for setting up a [`kubernetes`](https://dstack.ai/docs/concepts/backends/#kubernetes) backend:
+Follow the standard instructions for setting up a [`kubernetes`](../../concepts/backends.md#kubernetes) backend:
@@ -154,11 +154,11 @@ $ dstack apply -f nebius-fleet.dstack.yml
-Once the fleet is created, you can run [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), and [services](https://dstack.ai/docs/concepts/services).
+Once the fleet is created, you can run [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md).
## NCCL tests
-Use a [distributed task](https://dstack.ai/docs/concepts/tasks#distributed-tasks) to run NCCL tests and validate the cluster’s network bandwidth.
+Use a [distributed task](../../concepts/tasks.md#distributed-tasks) to run NCCL tests and validate the cluster’s network bandwidth.
@@ -252,6 +252,6 @@ nccl-tests provisioning completed (running)
## What's next
-1. Learn about [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks), [services](https://dstack.ai/docs/concepts/services)
-2. Check out [backends](https://dstack.ai/docs/concepts/backends) and [fleets](https://dstack.ai/docs/concepts/fleets)
+1. Learn about [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), [services](../../concepts/services.md)
+2. Check out [backends](../../concepts/backends.md) and [fleets](../../concepts/fleets.md)
3. Read Nebius' docs on [networking for VMs](https://docs.nebius.com/compute/clusters/gpu) and the [managed Kubernetes service](https://docs.nebius.com/kubernetes).
diff --git a/examples/inference/nim/README.md b/mkdocs/docs/examples/inference/nim.md
similarity index 80%
rename from examples/inference/nim/README.md
rename to mkdocs/docs/examples/inference/nim.md
index 680c51f498..f7d1c03edf 100644
--- a/examples/inference/nim/README.md
+++ b/mkdocs/docs/examples/inference/nim.md
@@ -8,7 +8,7 @@ description: Deploying Nemotron-3-Super-120B-A12B using NVIDIA NIM
This example shows how to deploy Nemotron-3-Super-120B-A12B using [NVIDIA NIM](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html) and `dstack`.
??? info "Prerequisites"
- Once `dstack` is [installed](https://dstack.ai/docs/installation), clone the repo with examples.
+ Once `dstack` is [installed](../../installation.md), clone the repo with examples.
@@ -23,7 +23,7 @@ This example shows how to deploy Nemotron-3-Super-120B-A12B using [NVIDIA NIM](h
Here's an example of a service that deploys Nemotron-3-Super-120B-A12B using NIM.
-
+
```yaml
type: service
@@ -54,13 +54,13 @@ resources:
### Running a configuration
-Save the configuration above as `nemotron120.dstack.yml`, then use the
-[`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+Save the configuration above as `service.dstack.yml`, then use the
+[`dstack apply`](../../reference/cli/dstack/apply.md) command.
```shell
$ NGC_API_KEY=...
-$ dstack apply -f nemotron120.dstack.yml
+$ dstack apply -f service.dstack.yml
```
@@ -91,9 +91,9 @@ $ curl http://127.0.0.1:3000/proxy/services/main/nemotron120/v1/chat/completions
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://nemotron120.<gateway domain>/`.
+When a [gateway](../../concepts/gateways.md) is configured, the service endpoint will be available at `https://nemotron120.<gateway domain>/`.
## What's next?
-1. Check [services](https://dstack.ai/docs/services)
+1. Check [services](../../concepts/services.md)
2. Browse the [Nemotron-3-Super-120B-A12B model page](https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b)
diff --git a/examples/inference/sglang/README.md b/mkdocs/docs/examples/inference/sglang.md
similarity index 89%
rename from examples/inference/sglang/README.md
rename to mkdocs/docs/examples/inference/sglang.md
index 3f2694c655..775dcedd48 100644
--- a/examples/inference/sglang/README.md
+++ b/mkdocs/docs/examples/inference/sglang.md
@@ -9,7 +9,7 @@ This example shows how to deploy `Qwen/Qwen3.6-27B` using
[SGLang](https://github.com/sgl-project/sglang) and `dstack`.
> For a `DeepSeek-V4-Pro` deployment on `B200:8`, see the
-[DeepSeek V4](../../models/deepseek-v4/index.md) model page.
+[DeepSeek V4](../models/deepseek-v4.md) model page.
## Apply a configuration
@@ -18,7 +18,7 @@ Here's an example of a service that deploys
=== "NVIDIA"
-
+
```yaml
type: service
@@ -53,7 +53,7 @@ Here's an example of a service that deploys
=== "AMD"
-
+
```yaml
type: service
@@ -94,13 +94,13 @@ guidance: a pinned ROCm image, tensor parallelism across all four GPUs, and the
standard `qwen3` reasoning parser without extra ROCm-specific tuning flags.
The first startup on MI300X can take longer while SGLang compiles ROCm kernels.
-Save one of the configurations above as `qwen36.dstack.yml`, then use the
-[`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+Save one of the configurations above as `service.dstack.yml`, then use the
+[`dstack apply`](../../reference/cli/dstack/apply.md) command.
```shell
-$ dstack apply -f qwen36.dstack.yml
+$ dstack apply -f service.dstack.yml
```
@@ -132,7 +132,7 @@ Qwen3.6 uses thinking mode by default. To disable thinking, pass
`"chat_template_kwargs": {"enable_thinking": false}` in the request body. To
enable tool calling, add `--tool-call-parser qwen3_coder` to the serve command.
-> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the service endpoint will be available at `https://qwen36.<gateway domain>/`.
+> If a [gateway](../../concepts/gateways.md) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the service endpoint will be available at `https://qwen36.<gateway domain>/`.
## Configuration options
@@ -221,5 +221,5 @@ Currently, auto-scaling only supports `rps` as the metric. TTFT and ITL metrics
## What's next?
-1. Read about [services](https://dstack.ai/docs/concepts/services) and [gateways](https://dstack.ai/docs/concepts/gateways)
+1. Read about [services](../../concepts/services.md) and [gateways](../../concepts/gateways.md)
2. Browse the [Qwen 3.6 SGLang cookbook](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.6) and the [SGLang server arguments reference](https://docs.sglang.ai/advanced_features/server_arguments.html)
diff --git a/examples/inference/trtllm/README.md b/mkdocs/docs/examples/inference/trtllm.md
similarity index 83%
rename from examples/inference/trtllm/README.md
rename to mkdocs/docs/examples/inference/trtllm.md
index ae3666d225..c058820b0a 100644
--- a/examples/inference/trtllm/README.md
+++ b/mkdocs/docs/examples/inference/trtllm.md
@@ -13,7 +13,7 @@ This example shows how to deploy `nvidia/Qwen3-235B-A22B-FP8` using
Here's an example of a service that deploys
`nvidia/Qwen3-235B-A22B-FP8` using TensorRT-LLM.
-
+
```yaml
type: service
@@ -53,12 +53,12 @@ resources:
```
-Apply it with [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md):
+Apply it with [`dstack apply`](../../reference/cli/dstack/apply.md):
```shell
-$ dstack apply -f qwen235.dstack.yml
+$ dstack apply -f service.dstack.yml
```
@@ -90,10 +90,10 @@ $ curl http://127.0.0.1:3000/proxy/services/main/qwen235/v1/chat/completions \
-When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://qwen235.<gateway domain>/`.
+When a [gateway](../../concepts/gateways.md) is configured, the service endpoint will be available at `https://qwen235.<gateway domain>/`.
## What's next?
-1. Read about [services](https://dstack.ai/docs/concepts/services) and [gateways](https://dstack.ai/docs/concepts/gateways)
+1. Read about [services](../../concepts/services.md) and [gateways](../../concepts/gateways.md)
2. Browse the [TensorRT-LLM deployment guides](https://nvidia.github.io/TensorRT-LLM/deployment-guide/index.html) and the [Qwen3 deployment guide](https://nvidia.github.io/TensorRT-LLM/deployment-guide/deployment-guide-for-qwen3-on-trtllm.html)
3. See the [`trtllm-serve` reference](https://nvidia.github.io/TensorRT-LLM/commands/trtllm-serve/trtllm-serve.html)
diff --git a/examples/inference/vllm/README.md b/mkdocs/docs/examples/inference/vllm.md
similarity index 77%
rename from examples/inference/vllm/README.md
rename to mkdocs/docs/examples/inference/vllm.md
index 75d6add9be..b5b83c4664 100644
--- a/examples/inference/vllm/README.md
+++ b/mkdocs/docs/examples/inference/vllm.md
@@ -15,7 +15,7 @@ Here's an example of a service that deploys
=== "NVIDIA"
-
+
```yaml
type: service
@@ -49,7 +49,7 @@ Here's an example of a service that deploys
=== "AMD"
-
+
```yaml
type: service
@@ -88,13 +88,13 @@ Qwen3.6-27B is a multimodal model. For text-only workloads, add
`--language-model-only` to free more memory for the KV cache. To enable tool
calling, add `--enable-auto-tool-choice --tool-call-parser qwen3_coder`.
-Save one of the configurations above as `qwen36.dstack.yml`, then use the
-[`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
+Save one of the configurations above as `service.dstack.yml`, then use the
+[`dstack apply`](../../reference/cli/dstack/apply.md) command.
```shell
-$ dstack apply -f qwen36.dstack.yml
+$ dstack apply -f service.dstack.yml
```
@@ -122,9 +122,9 @@ curl http://127.0.0.1:3000/proxy/services/main/qwen36/v1/chat/completions \
-> If a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the service endpoint will be available at `https://qwen36.<gateway domain>/`.
+> If a [gateway](../../concepts/gateways.md) is configured (e.g. to enable auto-scaling, HTTPS, rate limits, etc.), the service endpoint will be available at `https://qwen36.<gateway domain>/`.
## What's next?
-1. Read about [services](https://dstack.ai/docs/concepts/services) and [gateways](https://dstack.ai/docs/concepts/gateways)
-2. Browse the [Qwen 3.5 & 3.6 vLLM recipe](https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html) and the [SGLang](https://dstack.ai/examples/inference/sglang/) example
+1. Read about [services](../../concepts/services.md) and [gateways](../../concepts/gateways.md)
+2. Browse the [Qwen 3.5 & 3.6 vLLM recipe](https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html) and the [SGLang](../inference/sglang.md) example
diff --git a/docs/examples/accelerators/intel/index.md b/mkdocs/docs/examples/llms/deepseek/index.md
similarity index 100%
rename from docs/examples/accelerators/intel/index.md
rename to mkdocs/docs/examples/llms/deepseek/index.md
diff --git a/docs/examples/accelerators/tenstorrent/index.md b/mkdocs/docs/examples/llms/llama/index.md
similarity index 100%
rename from docs/examples/accelerators/tenstorrent/index.md
rename to mkdocs/docs/examples/llms/llama/index.md
diff --git a/docs/examples/accelerators/tpu/index.md b/mkdocs/docs/examples/misc/docker-compose/index.md
similarity index 100%
rename from docs/examples/accelerators/tpu/index.md
rename to mkdocs/docs/examples/misc/docker-compose/index.md
diff --git a/examples/models/deepseek-v4/README.md b/mkdocs/docs/examples/models/deepseek-v4.md
similarity index 93%
rename from examples/models/deepseek-v4/README.md
rename to mkdocs/docs/examples/models/deepseek-v4.md
index b36a343018..833e5163d7 100644
--- a/examples/models/deepseek-v4/README.md
+++ b/mkdocs/docs/examples/models/deepseek-v4.md
@@ -6,7 +6,7 @@ description: Deploying DeepSeek-V4-Pro using SGLang on NVIDIA B200:8
# DeepSeek V4
This example shows how to deploy `deepseek-ai/DeepSeek-V4-Pro` as a
-[service](https://dstack.ai/docs/services) using
+[service](../../concepts/services.md) using
[SGLang](https://github.com/sgl-project/sglang) and `dstack`.
## Apply a configuration
@@ -64,7 +64,7 @@ This configuration uses the single-node Blackwell `DeepSeek-V4-Pro` recipe
shape for `8 x NVIDIA B200`.
Export your Hugging Face token and apply the configuration with
-[`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md).
+[`dstack apply`](../../reference/cli/dstack/apply.md).
@@ -151,4 +151,4 @@ This returns both:
1. Read the [DeepSeek-V4-Pro model card](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro)
2. Read the [DeepSeek-V4 SGLang cookbook](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4)
-3. Browse the dedicated [SGLang](https://dstack.ai/examples/inference/sglang/) and [vLLM](https://dstack.ai/examples/inference/vllm/) examples
+3. Browse the dedicated [SGLang](../inference/sglang.md) and [vLLM](../inference/vllm.md) examples
diff --git a/examples/models/qwen36/README.md b/mkdocs/docs/examples/models/qwen36.md
similarity index 91%
rename from examples/models/qwen36/README.md
rename to mkdocs/docs/examples/models/qwen36.md
index bc92271b27..35ea72fd11 100644
--- a/examples/models/qwen36/README.md
+++ b/mkdocs/docs/examples/models/qwen36.md
@@ -6,7 +6,7 @@ description: Deploying Qwen3.6-27B using SGLang on NVIDIA and AMD GPUs
# Qwen 3.6
This example shows how to deploy `Qwen/Qwen3.6-27B` as a
-[service](https://dstack.ai/docs/services) using
+[service](../../concepts/services.md) using
[SGLang](https://github.com/sgl-project/sglang) and `dstack`.
## Apply a configuration
@@ -92,7 +92,7 @@ The NVIDIA and AMD configurations above use pinned SGLang images and the same
straightforward 4-GPU layout used across the Qwen 3.6 docs and examples.
Apply the configuration with
-[`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md).
+[`dstack apply`](../../reference/cli/dstack/apply.md).
@@ -162,7 +162,7 @@ curl http://127.0.0.1:3000/proxy/services/main/qwen36/v1/chat/completions \
1. Read the [Qwen/Qwen3.6-27B model card](https://huggingface.co/Qwen/Qwen3.6-27B)
2. Read the [Qwen 3.6 SGLang cookbook](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.6)
3. Read the [Qwen 3.5 & 3.6 vLLM recipe](https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html)
-4. Browse the dedicated [SGLang](https://dstack.ai/examples/inference/sglang/)
- and [vLLM](https://dstack.ai/examples/inference/vllm/) examples
-5. Check the [AMD](https://dstack.ai/examples/accelerators/amd/) example for
+4. Browse the dedicated [SGLang](../inference/sglang.md)
+ and [vLLM](../inference/vllm.md) examples
+5. Check the [AMD](../accelerators/amd.md) example for
more AMD deployment and training configurations
diff --git a/docs/examples/clusters/aws/index.md b/mkdocs/docs/examples/models/wan22/index.md
similarity index 100%
rename from docs/examples/clusters/aws/index.md
rename to mkdocs/docs/examples/models/wan22/index.md
diff --git a/mkdocs/docs/examples/training/axolotl.md b/mkdocs/docs/examples/training/axolotl.md
new file mode 100644
index 0000000000..5266a86745
--- /dev/null
+++ b/mkdocs/docs/examples/training/axolotl.md
@@ -0,0 +1,185 @@
+---
+title: Axolotl
+description: Fine-tuning Llama models with Axolotl — single-node SFT with FSDP and QLoRA, or distributed across multiple nodes
+---
+
+# Axolotl
+
+This example shows how to use [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) with `dstack` to fine-tune Llama models — on a single node with SFT, FSDP, and QLoRA, or distributed across multiple nodes.
+
+## Single-node training
+
+This section walks through fine-tuning 4-bit quantized `Llama-4-Scout-17B-16E` using SFT with FSDP and QLoRA.
+
+### Define a configuration
+
+Axolotl reads the model, QLoRA, and dataset arguments, as well as the trainer configuration, from a [`scout-qlora-flexattn-fsdp2.yaml`](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-4/scout-qlora-flexattn-fsdp2.yaml) file. The configuration uses the 4-bit Axolotl-quantized version of `meta-llama/Llama-4-Scout-17B-16E`, which requires only ~43GB of VRAM per GPU at a 4K context length.
+
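+For orientation, an Axolotl file of this kind combines the model, adapter, and trainer settings in one place. Here's a hypothetical, abridged sketch; the field values are illustrative, so refer to the linked file for the real ones:
+
+```yaml
+# Illustrative excerpt, not a copy of scout-qlora-flexattn-fsdp2.yaml
+base_model: meta-llama/Llama-4-Scout-17B-16E  # the upstream file points at a 4-bit quantized variant
+load_in_4bit: true
+adapter: qlora
+lora_r: 32
+lora_alpha: 64
+sequence_len: 4096
+micro_batch_size: 1
+gradient_accumulation_steps: 4
+```
+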
+Below is a task configuration that does fine-tuning.
+
+```yaml
+type: task
+# The name is optional, if not specified, generated randomly
+name: axolotl-nvidia-llama-scout-train
+
+# Using Axolotl's official Docker image
+image: axolotlai/axolotl:main-latest
+
+# Required environment variables
+env:
+ - HF_TOKEN
+ - WANDB_API_KEY
+ - WANDB_PROJECT
+ - HUB_MODEL_ID
+# Commands of the task
+commands:
+ - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-flexattn-fsdp2.yaml
+ - |
+ axolotl train scout-qlora-flexattn-fsdp2.yaml \
+ --wandb-project $WANDB_PROJECT \
+ --wandb-name $DSTACK_RUN_NAME \
+ --hub-model-id $HUB_MODEL_ID
+
+resources:
+  # Four GPUs (required by FSDP)
+ gpu: H100:4
+ # Shared memory size for inter-process communication
+ shm_size: 64GB
+ disk: 500GB..
+```
+
+The task uses Axolotl's Docker image, where Axolotl is already pre-installed.
+
+!!! info "AMD"
+ The example above uses NVIDIA accelerators. To use it with AMD, check out [AMD](../accelerators/amd.md#axolotl).
+
+### Run the configuration
+
+Once the configuration is ready, run `dstack apply -f <configuration path>`, and `dstack` will automatically provision the
+cloud resources and run the configuration.
+
+```shell
+$ HF_TOKEN=...
+$ WANDB_API_KEY=...
+$ WANDB_PROJECT=...
+$ HUB_MODEL_ID=...
+$ dstack apply -f train.dstack.yml
+
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 vastai (cz-czechia) cpu=64 mem=128GB H100:80GB:2 18794506 $3.8907
+ 2 vastai (us-texas) cpu=52 mem=64GB H100:80GB:2 20442365 $3.6926
+ 3 vastai (fr-france) cpu=64 mem=96GB H100:80GB:2 20379984 $3.7389
+
+Submit the run axolotl-nvidia-llama-scout-train? [y/n]:
+
+Provisioning...
+---> 100%
+```
+
+
+## Distributed training
+
+!!! info "Prerequisites"
+ Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](../../concepts/fleets.md#cluster-placement) or an [SSH fleet](../../concepts/fleets.md#ssh-placement)).
+
+This section walks through running distributed fine-tuning of `Llama-3.1-70B` with QLoRA and FSDP across multiple nodes.
+
+### Define a configuration
+
+Once the fleet is created, define a distributed task configuration. Here's an example of a distributed `QLoRA` task using `FSDP`.
+
+```yaml
+type: task
+name: axolotl-multi-node-qlora-llama3-70b
+
+nodes: 2
+
+image: nvcr.io/nvidia/pytorch:25.01-py3
+
+env:
+ - HF_TOKEN
+ - WANDB_API_KEY
+ - WANDB_PROJECT
+ - HUB_MODEL_ID
+ - CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ - NCCL_DEBUG=INFO
+ - ACCELERATE_LOG_LEVEL=info
+
+commands:
+  # Replacing the default Torch and FlashAttention in the NGC container with Axolotl-compatible versions.
+ # The preinstalled versions are incompatible with Axolotl.
+ - pip uninstall -y torch flash-attn
+ - pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/test/cu124
+ - pip install --no-build-isolation axolotl[flash-attn,deepspeed]
+ - wget https://raw.githubusercontent.com/huggingface/trl/main/examples/accelerate_configs/fsdp1.yaml
+ - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-3/qlora-fsdp-70b.yaml
+ # Axolotl includes hf-xet version 1.1.0, which fails during downloads. Replacing it with the latest version (1.1.2).
+ - pip uninstall -y hf-xet
+ - pip install hf-xet --no-cache-dir
+ - |
+ accelerate launch \
+ --config_file=fsdp1.yaml \
+ -m axolotl.cli.train qlora-fsdp-70b.yaml \
+ --hub-model-id $HUB_MODEL_ID \
+ --output-dir /checkpoints/qlora-llama3-70b \
+ --wandb-project $WANDB_PROJECT \
+ --wandb-name $DSTACK_RUN_NAME \
+ --main_process_ip=$DSTACK_MASTER_NODE_IP \
+ --main_process_port=8008 \
+ --machine_rank=$DSTACK_NODE_RANK \
+ --num_processes=$DSTACK_GPUS_NUM \
+ --num_machines=$DSTACK_NODES_NUM
+
+resources:
+ gpu: 80GB:8
+ shm_size: 128GB
+
+volumes:
+ - /checkpoints:/checkpoints
+```
+
+!!! info "Docker image"
+ We are using `nvcr.io/nvidia/pytorch:25.01-py3` from NGC because it includes the necessary libraries and packages for RDMA and InfiniBand support.
+
+### Run the configuration
+
+To run a configuration, use the [`dstack apply`](../../reference/cli/dstack/apply.md) command.
+
+```shell
+$ HF_TOKEN=...
+$ WANDB_API_KEY=...
+$ WANDB_PROJECT=...
+$ HUB_MODEL_ID=...
+$ dstack apply -f train-distrib.dstack.yml
+
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+ 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+
+Submit the run axolotl-multi-node-qlora-llama3-70b? [y/n]: y
+
+Provisioning...
+---> 100%
+```
+
+## What's next?
+
+1. Check [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md),
+ [services](../../concepts/services.md), and [fleets](../../concepts/fleets.md)
+2. Read about [cluster placement](../../concepts/fleets.md#cluster-placement)
+3. See the [AMD](../accelerators/amd.md#axolotl) example
diff --git a/examples/distributed-training/ray-ragen/README.md b/mkdocs/docs/examples/training/ray-ragen.md
similarity index 86%
rename from examples/distributed-training/ray-ragen/README.md
rename to mkdocs/docs/examples/training/ray-ragen.md
index f7bd80d5c2..73e8749e83 100644
--- a/examples/distributed-training/ray-ragen/README.md
+++ b/mkdocs/docs/examples/training/ray-ragen.md
@@ -11,7 +11,7 @@ to fine-tune an agent on multiple nodes.
Under the hood `RAGEN` uses [verl](https://github.com/volcengine/verl) for Reinforcement Learning and [Ray](https://docs.ray.io/en/latest/) for distributed training.
!!! info "Prerequisites"
- Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](https://dstack.ai/docs/concepts/fleets#cluster-placement) or an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh-placement)).
+ Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](../../concepts/fleets.md#cluster-placement) or an [SSH fleet](../../concepts/fleets.md#ssh-placement)).
## Run a Ray cluster
@@ -19,11 +19,11 @@ If you want to use Ray with `dstack`, you have to first run a Ray cluster.
The task below runs a Ray cluster on an existing fleet:
-
+
```yaml
type: task
-name: ray-ragen-cluster
+name: ray-cluster
nodes: 2
@@ -76,7 +76,7 @@ Now, if you run this task via `dstack apply`, it will automatically forward the
```shell
-$ dstack apply -f examples/distributed-training/ray-ragen/.dstack.yml
+$ dstack apply -f ray-cluster.dstack.yml
```
@@ -130,6 +130,5 @@ $ ray job submit \
Using Ray via `dstack` is a powerful way to get access to the rich Ray ecosystem while benefiting from `dstack`'s provisioning capabilities.
!!! info "What's next"
- 1. Check the [Clusters](https://dstack.ai/docs/guides/clusters) guide
- 2. Read about [distributed tasks](https://dstack.ai/docs/concepts/tasks#distributed-tasks) and [fleets](https://dstack.ai/docs/concepts/fleets)
- 3. Browse Ray's [docs](https://docs.ray.io/en/latest/train/examples.html) for other examples.
+ 1. Read about [distributed tasks](../../concepts/tasks.md#distributed-tasks), [fleets](../../concepts/fleets.md), and [cluster placement](../../concepts/fleets.md#cluster-placement)
+ 2. Browse Ray's [docs](https://docs.ray.io/en/latest/train/examples.html) for other examples.
diff --git a/mkdocs/docs/examples/training/trl.md b/mkdocs/docs/examples/training/trl.md
new file mode 100644
index 0000000000..ffeb3766f8
--- /dev/null
+++ b/mkdocs/docs/examples/training/trl.md
@@ -0,0 +1,272 @@
+---
+title: TRL
+description: Fine-tuning Llama with TRL — single-node SFT with QLoRA, or distributed across multiple nodes with FSDP and DeepSpeed
+---
+
+# TRL
+
+This example walks you through how to use [TRL](https://github.com/huggingface/trl) with `dstack` to fine-tune `Llama-3.1-8B` — on a single node with SFT and QLoRA, or distributed across multiple nodes with [Accelerate](https://github.com/huggingface/accelerate) and [DeepSpeed](https://github.com/deepspeedai/DeepSpeed).
+
+## Single-node training
+
+### Define a configuration
+
+Below is a task configuration that does fine-tuning.
+
+```yaml
+type: task
+name: trl-train
+
+python: 3.12
+# Ensure nvcc is installed (req. for Flash Attention)
+nvcc: true
+
+env:
+ - HF_TOKEN
+ - WANDB_API_KEY
+ - HUB_MODEL_ID
+commands:
+ # Pin torch==2.6.0 to avoid building Flash Attention from source.
+ # Prebuilt Flash Attention wheels are not available for the latest torch==2.7.0.
+ - uv pip install torch==2.6.0
+ - uv pip install transformers bitsandbytes peft wandb
+ - uv pip install flash_attn --no-build-isolation
+ - git clone https://github.com/huggingface/trl
+ - cd trl
+ - uv pip install .
+ - |
+ accelerate launch \
+ --config_file=examples/accelerate_configs/multi_gpu.yaml \
+ --num_processes $DSTACK_GPUS_PER_NODE \
+ trl/scripts/sft.py \
+ --model_name meta-llama/Meta-Llama-3.1-8B \
+ --dataset_name OpenAssistant/oasst_top1_2023-08-25 \
+ --dataset_text_field="text" \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 1 \
+ --gradient_accumulation_steps 4 \
+ --learning_rate 2e-4 \
+ --report_to wandb \
+ --bf16 \
+ --max_seq_length 1024 \
+ --lora_r 16 \
+ --lora_alpha 32 \
+ --lora_target_modules q_proj k_proj v_proj o_proj \
+ --load_in_4bit \
+ --use_peft \
+ --attn_implementation "flash_attention_2" \
+ --logging_steps=10 \
+ --output_dir models/llama31 \
+    --hub_model_id $HUB_MODEL_ID
+
+resources:
+ gpu:
+ # 24GB or more VRAM
+ memory: 24GB..
+    # One or more GPUs
+ count: 1..
+ # Shared memory (for multi-gpu)
+ shm_size: 24GB
+```
+
+Change the `resources` property to specify more GPUs.
+
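+For example, here's a minimal sketch that pins the run to exactly four GPUs (the memory bound and count are illustrative, not requirements):
+
+```yaml
+resources:
+  gpu:
+    # 24GB or more VRAM
+    memory: 24GB..
+    # Exactly four GPUs
+    count: 4
+  # Shared memory (for multi-gpu)
+  shm_size: 24GB
+```
+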
+!!! info "AMD"
+ The example above uses NVIDIA accelerators. To use it with AMD, check out [AMD](../accelerators/amd.md#trl).
+
+??? info "DeepSpeed"
+ For more memory-efficient use of multiple GPUs, consider using DeepSpeed and ZeRO Stage 3.
+
+ To do this, use the `examples/accelerate_configs/deepspeed_zero3.yaml` configuration file instead of
+ `examples/accelerate_configs/multi_gpu.yaml`.
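+
+    A sketch of the swapped flag; everything else in the task's launch command stays as above:
+
+    ```shell
+    accelerate launch \
+      --config_file=examples/accelerate_configs/deepspeed_zero3.yaml \
+      --num_processes $DSTACK_GPUS_PER_NODE \
+      trl/scripts/sft.py --model_name meta-llama/Meta-Llama-3.1-8B  # ...remaining flags as in the task above
+    ```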
+
+### Run the configuration
+
+Once the configuration is ready, run `dstack apply -f <configuration path>`, and `dstack` will automatically provision the
+cloud resources and run the configuration.
+
+```shell
+$ HF_TOKEN=...
+$ WANDB_API_KEY=...
+$ HUB_MODEL_ID=...
+$ dstack apply -f train.dstack.yml
+
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 vastai (cz-czechia) cpu=64 mem=128GB H100:80GB:2 18794506 $3.8907
+ 2 vastai (us-texas) cpu=52 mem=64GB H100:80GB:2 20442365 $3.6926
+ 3 vastai (fr-france) cpu=64 mem=96GB H100:80GB:2 20379984 $3.7389
+
+Submit the run trl-train? [y/n]:
+
+Provisioning...
+---> 100%
+```
+
+## Distributed training
+
+!!! info "Prerequisites"
+ Before running a distributed task, make sure to create a fleet with `placement` set to `cluster` (can be a [managed fleet](../../concepts/fleets.md#cluster-placement) or an [SSH fleet](../../concepts/fleets.md#ssh-placement)).
+
+### Define a configuration
+
+Once the fleet is created, define a distributed task configuration. Here's an example using either FSDP or DeepSpeed ZeRO-3.
+
+=== "FSDP"
+
+ ```yaml
+ type: task
+ name: trl-train-fsdp-distrib
+
+ nodes: 2
+
+ image: nvcr.io/nvidia/pytorch:25.01-py3
+
+ env:
+ - HF_TOKEN
+ - ACCELERATE_LOG_LEVEL=info
+ - WANDB_API_KEY
+ - MODEL_ID=meta-llama/Llama-3.1-8B
+ - HUB_MODEL_ID
+
+ commands:
+ - pip install transformers bitsandbytes peft wandb
+ - git clone https://github.com/huggingface/trl
+ - cd trl
+ - pip install .
+ - |
+ accelerate launch \
+ --config_file=examples/accelerate_configs/fsdp1.yaml \
+ --main_process_ip=$DSTACK_MASTER_NODE_IP \
+ --main_process_port=8008 \
+ --machine_rank=$DSTACK_NODE_RANK \
+ --num_processes=$DSTACK_GPUS_NUM \
+ --num_machines=$DSTACK_NODES_NUM \
+ trl/scripts/sft.py \
+ --model_name $MODEL_ID \
+ --dataset_name OpenAssistant/oasst_top1_2023-08-25 \
+ --dataset_text_field="text" \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 1 \
+ --gradient_accumulation_steps 4 \
+ --learning_rate 2e-4 \
+ --report_to wandb \
+ --bf16 \
+ --max_seq_length 1024 \
+ --attn_implementation flash_attention_2 \
+ --logging_steps=10 \
+ --output_dir /checkpoints/llama31-ft \
+ --hub_model_id $HUB_MODEL_ID \
+ --torch_dtype bfloat16
+
+ resources:
+ gpu: 80GB:8
+ shm_size: 128GB
+
+ volumes:
+ - /checkpoints:/checkpoints
+ ```
+
+=== "DeepSpeed ZeRO-3"
+
+ ```yaml
+ type: task
+ name: trl-train-deepspeed-distrib
+
+ nodes: 2
+
+ image: nvcr.io/nvidia/pytorch:25.01-py3
+
+ env:
+ - HF_TOKEN
+ - WANDB_API_KEY
+ - HUB_MODEL_ID
+ - MODEL_ID=meta-llama/Llama-3.1-8B
+ - ACCELERATE_LOG_LEVEL=info
+
+ commands:
+ - pip install transformers bitsandbytes peft wandb deepspeed
+ - git clone https://github.com/huggingface/trl
+ - cd trl
+ - pip install .
+ - |
+ accelerate launch \
+ --config_file=examples/accelerate_configs/deepspeed_zero3.yaml \
+ --main_process_ip=$DSTACK_MASTER_NODE_IP \
+ --main_process_port=8008 \
+ --machine_rank=$DSTACK_NODE_RANK \
+ --num_processes=$DSTACK_GPUS_NUM \
+ --num_machines=$DSTACK_NODES_NUM \
+ trl/scripts/sft.py \
+ --model_name $MODEL_ID \
+ --dataset_name OpenAssistant/oasst_top1_2023-08-25 \
+ --dataset_text_field="text" \
+ --per_device_train_batch_size 1 \
+ --per_device_eval_batch_size 1 \
+ --gradient_accumulation_steps 4 \
+ --learning_rate 2e-4 \
+ --report_to wandb \
+ --bf16 \
+ --max_seq_length 1024 \
+ --attn_implementation flash_attention_2 \
+ --logging_steps=10 \
+ --output_dir /checkpoints/llama31-ft \
+ --hub_model_id $HUB_MODEL_ID \
+ --torch_dtype bfloat16
+
+ resources:
+ gpu: 80GB:8
+ shm_size: 128GB
+
+ volumes:
+ - /checkpoints:/checkpoints
+ ```
+
+!!! info "Docker image"
+ We are using `nvcr.io/nvidia/pytorch:25.01-py3` from NGC because it includes the necessary libraries and packages for RDMA and InfiniBand support.
+
+### Run the configuration
+
+To run a configuration, use the [`dstack apply`](../../reference/cli/dstack/apply.md) command.
+
+```shell
+$ HF_TOKEN=...
+$ WANDB_API_KEY=...
+$ HUB_MODEL_ID=...
+$ dstack apply -f train-distrib.dstack.yml
+
+ # BACKEND RESOURCES INSTANCE TYPE PRICE
+ 1 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+ 2 ssh (remote) cpu=208 mem=1772GB H100:80GB:8 instance $0 idle
+
+Submit the run trl-train-fsdp-distrib? [y/n]: y
+
+Provisioning...
+---> 100%
+```
+
+## What's next?
+
+1. Check [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md),
+ [services](../../concepts/services.md), and [fleets](../../concepts/fleets.md)
+2. Read about [cluster placement](../../concepts/fleets.md#cluster-placement)
+3. See the [AMD](../accelerators/amd.md#trl) example
diff --git a/docs/docs/guides/migration/slurm.md b/mkdocs/docs/guides/migration/slurm.md
similarity index 99%
rename from docs/docs/guides/migration/slurm.md
rename to mkdocs/docs/guides/migration/slurm.md
index 97b4546b58..2791075e8d 100644
--- a/docs/docs/guides/migration/slurm.md
+++ b/mkdocs/docs/guides/migration/slurm.md
@@ -1847,4 +1847,4 @@ fi
1. Check out [Quickstart](../../quickstart.md)
2. Read about [dev environments](../../concepts/dev-environments.md), [tasks](../../concepts/tasks.md), and [services](../../concepts/services.md)
-3. Browse the [examples](../../../examples.md)
+3. Browse the [examples](../../examples.md)
diff --git a/docs/docs/guides/protips.md b/mkdocs/docs/guides/protips.md
similarity index 100%
rename from docs/docs/guides/protips.md
rename to mkdocs/docs/guides/protips.md
diff --git a/docs/docs/guides/server-deployment.md b/mkdocs/docs/guides/server-deployment.md
similarity index 100%
rename from docs/docs/guides/server-deployment.md
rename to mkdocs/docs/guides/server-deployment.md
diff --git a/docs/docs/guides/troubleshooting.md b/mkdocs/docs/guides/troubleshooting.md
similarity index 100%
rename from docs/docs/guides/troubleshooting.md
rename to mkdocs/docs/guides/troubleshooting.md
diff --git a/docs/docs/guides/upgrade.md b/mkdocs/docs/guides/upgrade.md
similarity index 100%
rename from docs/docs/guides/upgrade.md
rename to mkdocs/docs/guides/upgrade.md
diff --git a/docs/docs/index.md b/mkdocs/docs/index.md
similarity index 100%
rename from docs/docs/index.md
rename to mkdocs/docs/index.md
diff --git a/docs/docs/installation.md b/mkdocs/docs/installation.md
similarity index 100%
rename from docs/docs/installation.md
rename to mkdocs/docs/installation.md
diff --git a/docs/docs/quickstart.md b/mkdocs/docs/quickstart.md
similarity index 99%
rename from docs/docs/quickstart.md
rename to mkdocs/docs/quickstart.md
index 80a98f79bf..da37d46ded 100644
--- a/docs/docs/quickstart.md
+++ b/mkdocs/docs/quickstart.md
@@ -277,5 +277,5 @@ Something not working? See the [troubleshooting](guides/troubleshooting.md) guid
!!! info "What's next?"
    1. Read about [backends](concepts/backends.md), [dev environments](concepts/dev-environments.md), [tasks](concepts/tasks.md), [services](concepts/services.md), and [fleets](concepts/fleets.md)
- 2. Browse [examples](../examples.md)
+ 2. Browse [examples](examples.md)
3. Join [Discord](https://discord.gg/u8SmfwPpMd)
diff --git a/docs/docs/reference/api/http/index.md b/mkdocs/docs/reference/api/http/index.md
similarity index 100%
rename from docs/docs/reference/api/http/index.md
rename to mkdocs/docs/reference/api/http/index.md
diff --git a/docs/docs/reference/api/python/index.md b/mkdocs/docs/reference/api/python/index.md
similarity index 100%
rename from docs/docs/reference/api/python/index.md
rename to mkdocs/docs/reference/api/python/index.md
diff --git a/docs/docs/reference/cli/dstack/apply.md b/mkdocs/docs/reference/cli/dstack/apply.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/apply.md
rename to mkdocs/docs/reference/cli/dstack/apply.md
diff --git a/docs/docs/reference/cli/dstack/attach.md b/mkdocs/docs/reference/cli/dstack/attach.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/attach.md
rename to mkdocs/docs/reference/cli/dstack/attach.md
diff --git a/docs/docs/reference/cli/dstack/delete.md b/mkdocs/docs/reference/cli/dstack/delete.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/delete.md
rename to mkdocs/docs/reference/cli/dstack/delete.md
diff --git a/docs/docs/reference/cli/dstack/event.md b/mkdocs/docs/reference/cli/dstack/event.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/event.md
rename to mkdocs/docs/reference/cli/dstack/event.md
diff --git a/docs/docs/reference/cli/dstack/export.md b/mkdocs/docs/reference/cli/dstack/export.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/export.md
rename to mkdocs/docs/reference/cli/dstack/export.md
diff --git a/docs/docs/reference/cli/dstack/fleet.md b/mkdocs/docs/reference/cli/dstack/fleet.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/fleet.md
rename to mkdocs/docs/reference/cli/dstack/fleet.md
diff --git a/docs/docs/reference/cli/dstack/gateway.md b/mkdocs/docs/reference/cli/dstack/gateway.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/gateway.md
rename to mkdocs/docs/reference/cli/dstack/gateway.md
diff --git a/docs/docs/reference/cli/dstack/import.md b/mkdocs/docs/reference/cli/dstack/import.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/import.md
rename to mkdocs/docs/reference/cli/dstack/import.md
diff --git a/docs/docs/reference/cli/dstack/init.md b/mkdocs/docs/reference/cli/dstack/init.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/init.md
rename to mkdocs/docs/reference/cli/dstack/init.md
diff --git a/docs/docs/reference/cli/dstack/login.md b/mkdocs/docs/reference/cli/dstack/login.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/login.md
rename to mkdocs/docs/reference/cli/dstack/login.md
diff --git a/docs/docs/reference/cli/dstack/logs.md b/mkdocs/docs/reference/cli/dstack/logs.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/logs.md
rename to mkdocs/docs/reference/cli/dstack/logs.md
diff --git a/docs/docs/reference/cli/dstack/metrics.md b/mkdocs/docs/reference/cli/dstack/metrics.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/metrics.md
rename to mkdocs/docs/reference/cli/dstack/metrics.md
diff --git a/docs/docs/reference/cli/dstack/offer.md b/mkdocs/docs/reference/cli/dstack/offer.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/offer.md
rename to mkdocs/docs/reference/cli/dstack/offer.md
diff --git a/docs/docs/reference/cli/dstack/project.md b/mkdocs/docs/reference/cli/dstack/project.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/project.md
rename to mkdocs/docs/reference/cli/dstack/project.md
diff --git a/docs/docs/reference/cli/dstack/ps.md b/mkdocs/docs/reference/cli/dstack/ps.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/ps.md
rename to mkdocs/docs/reference/cli/dstack/ps.md
diff --git a/docs/docs/reference/cli/dstack/secret.md b/mkdocs/docs/reference/cli/dstack/secret.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/secret.md
rename to mkdocs/docs/reference/cli/dstack/secret.md
diff --git a/docs/docs/reference/cli/dstack/server.md b/mkdocs/docs/reference/cli/dstack/server.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/server.md
rename to mkdocs/docs/reference/cli/dstack/server.md
diff --git a/docs/docs/reference/cli/dstack/stop.md b/mkdocs/docs/reference/cli/dstack/stop.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/stop.md
rename to mkdocs/docs/reference/cli/dstack/stop.md
diff --git a/docs/docs/reference/cli/dstack/volume.md b/mkdocs/docs/reference/cli/dstack/volume.md
similarity index 100%
rename from docs/docs/reference/cli/dstack/volume.md
rename to mkdocs/docs/reference/cli/dstack/volume.md
diff --git a/docs/docs/reference/dstack.yml.md b/mkdocs/docs/reference/dstack.yml.md
similarity index 100%
rename from docs/docs/reference/dstack.yml.md
rename to mkdocs/docs/reference/dstack.yml.md
diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/mkdocs/docs/reference/dstack.yml/dev-environment.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/dev-environment.md
rename to mkdocs/docs/reference/dstack.yml/dev-environment.md
diff --git a/docs/docs/reference/dstack.yml/fleet.md b/mkdocs/docs/reference/dstack.yml/fleet.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/fleet.md
rename to mkdocs/docs/reference/dstack.yml/fleet.md
diff --git a/docs/docs/reference/dstack.yml/gateway.md b/mkdocs/docs/reference/dstack.yml/gateway.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/gateway.md
rename to mkdocs/docs/reference/dstack.yml/gateway.md
diff --git a/docs/docs/reference/dstack.yml/service.md b/mkdocs/docs/reference/dstack.yml/service.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/service.md
rename to mkdocs/docs/reference/dstack.yml/service.md
diff --git a/docs/docs/reference/dstack.yml/task.md b/mkdocs/docs/reference/dstack.yml/task.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/task.md
rename to mkdocs/docs/reference/dstack.yml/task.md
diff --git a/docs/docs/reference/dstack.yml/volume.md b/mkdocs/docs/reference/dstack.yml/volume.md
similarity index 100%
rename from docs/docs/reference/dstack.yml/volume.md
rename to mkdocs/docs/reference/dstack.yml/volume.md
diff --git a/docs/docs/reference/environment-variables.md b/mkdocs/docs/reference/environment-variables.md
similarity index 100%
rename from docs/docs/reference/environment-variables.md
rename to mkdocs/docs/reference/environment-variables.md
diff --git a/docs/docs/reference/plugins/python/index.md b/mkdocs/docs/reference/plugins/python/index.md
similarity index 100%
rename from docs/docs/reference/plugins/python/index.md
rename to mkdocs/docs/reference/plugins/python/index.md
diff --git a/docs/docs/reference/plugins/rest/index.md b/mkdocs/docs/reference/plugins/rest/index.md
similarity index 100%
rename from docs/docs/reference/plugins/rest/index.md
rename to mkdocs/docs/reference/plugins/rest/index.md
diff --git a/docs/docs/reference/profiles.yml.md b/mkdocs/docs/reference/profiles.yml.md
similarity index 100%
rename from docs/docs/reference/profiles.yml.md
rename to mkdocs/docs/reference/profiles.yml.md
diff --git a/docs/docs/reference/server/config.yml.md b/mkdocs/docs/reference/server/config.yml.md
similarity index 100%
rename from docs/docs/reference/server/config.yml.md
rename to mkdocs/docs/reference/server/config.yml.md
diff --git a/docs/index.md b/mkdocs/index.md
similarity index 100%
rename from docs/index.md
rename to mkdocs/index.md
diff --git a/docs/layouts/custom.yml b/mkdocs/layouts/custom.yml
similarity index 100%
rename from docs/layouts/custom.yml
rename to mkdocs/layouts/custom.yml
diff --git a/docs/overrides/.icons/custom/colored/discord.svg b/mkdocs/overrides/.icons/custom/colored/discord.svg
similarity index 100%
rename from docs/overrides/.icons/custom/colored/discord.svg
rename to mkdocs/overrides/.icons/custom/colored/discord.svg
diff --git a/docs/overrides/.icons/custom/colored/github.svg b/mkdocs/overrides/.icons/custom/colored/github.svg
similarity index 100%
rename from docs/overrides/.icons/custom/colored/github.svg
rename to mkdocs/overrides/.icons/custom/colored/github.svg
diff --git a/docs/overrides/.icons/custom/colored/twitter.svg b/mkdocs/overrides/.icons/custom/colored/twitter.svg
similarity index 100%
rename from docs/overrides/.icons/custom/colored/twitter.svg
rename to mkdocs/overrides/.icons/custom/colored/twitter.svg
diff --git a/docs/overrides/.icons/custom/github.svg b/mkdocs/overrides/.icons/custom/github.svg
similarity index 100%
rename from docs/overrides/.icons/custom/github.svg
rename to mkdocs/overrides/.icons/custom/github.svg
diff --git a/docs/overrides/assets/images/github-logo.png b/mkdocs/overrides/assets/images/github-logo.png
similarity index 100%
rename from docs/overrides/assets/images/github-logo.png
rename to mkdocs/overrides/assets/images/github-logo.png
diff --git a/docs/overrides/assets/images/hero.svg b/mkdocs/overrides/assets/images/hero.svg
similarity index 100%
rename from docs/overrides/assets/images/hero.svg
rename to mkdocs/overrides/assets/images/hero.svg
diff --git a/docs/overrides/assets/images/new.svg b/mkdocs/overrides/assets/images/new.svg
similarity index 100%
rename from docs/overrides/assets/images/new.svg
rename to mkdocs/overrides/assets/images/new.svg
diff --git a/docs/overrides/assets/images/quotes/alvarobartt.jpg b/mkdocs/overrides/assets/images/quotes/alvarobartt.jpg
similarity index 100%
rename from docs/overrides/assets/images/quotes/alvarobartt.jpg
rename to mkdocs/overrides/assets/images/quotes/alvarobartt.jpg
diff --git a/docs/overrides/assets/images/quotes/chansung.jpg b/mkdocs/overrides/assets/images/quotes/chansung.jpg
similarity index 100%
rename from docs/overrides/assets/images/quotes/chansung.jpg
rename to mkdocs/overrides/assets/images/quotes/chansung.jpg
diff --git a/docs/overrides/assets/images/quotes/cudopete.png b/mkdocs/overrides/assets/images/quotes/cudopete.png
similarity index 100%
rename from docs/overrides/assets/images/quotes/cudopete.png
rename to mkdocs/overrides/assets/images/quotes/cudopete.png
diff --git a/docs/overrides/assets/images/quotes/eckart.png b/mkdocs/overrides/assets/images/quotes/eckart.png
similarity index 100%
rename from docs/overrides/assets/images/quotes/eckart.png
rename to mkdocs/overrides/assets/images/quotes/eckart.png
diff --git a/docs/overrides/assets/images/quotes/movchan.jpg b/mkdocs/overrides/assets/images/quotes/movchan.jpg
similarity index 100%
rename from docs/overrides/assets/images/quotes/movchan.jpg
rename to mkdocs/overrides/assets/images/quotes/movchan.jpg
diff --git a/docs/overrides/assets/images/quotes/spott.jpg b/mkdocs/overrides/assets/images/quotes/spott.jpg
similarity index 100%
rename from docs/overrides/assets/images/quotes/spott.jpg
rename to mkdocs/overrides/assets/images/quotes/spott.jpg
diff --git a/docs/overrides/assets/images/slack.png b/mkdocs/overrides/assets/images/slack.png
similarity index 100%
rename from docs/overrides/assets/images/slack.png
rename to mkdocs/overrides/assets/images/slack.png
diff --git a/docs/overrides/assets/images/twitter.png b/mkdocs/overrides/assets/images/twitter.png
similarity index 100%
rename from docs/overrides/assets/images/twitter.png
rename to mkdocs/overrides/assets/images/twitter.png
diff --git a/docs/overrides/header-2.html b/mkdocs/overrides/header-2.html
similarity index 100%
rename from docs/overrides/header-2.html
rename to mkdocs/overrides/header-2.html
diff --git a/docs/overrides/header.html b/mkdocs/overrides/header.html
similarity index 100%
rename from docs/overrides/header.html
rename to mkdocs/overrides/header.html
diff --git a/docs/overrides/home.html b/mkdocs/overrides/home.html
similarity index 100%
rename from docs/overrides/home.html
rename to mkdocs/overrides/home.html
diff --git a/docs/overrides/landing.html b/mkdocs/overrides/landing.html
similarity index 100%
rename from docs/overrides/landing.html
rename to mkdocs/overrides/landing.html
diff --git a/docs/overrides/main.html b/mkdocs/overrides/main.html
similarity index 95%
rename from docs/overrides/main.html
rename to mkdocs/overrides/main.html
index 3ae52c2be3..342e29303b 100644
--- a/docs/overrides/main.html
+++ b/mkdocs/overrides/main.html
@@ -223,11 +223,11 @@