```yaml
type: fleet
- name: my-fleet
+ # The name is optional, if not specified, generated randomly
+ name: ah-fleet-distrib
+ # Size of the cluster
nodes: 2
+ # Ensure instances are interconnected
placement: cluster
- backends: [aws]
+ # Use either spot or on-demand instances
+ spot_policy: auto
resources:
- gpu: 24GB
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPUs
+ count: 1..
```
@@ -41,14 +49,17 @@ are both acceptable).
To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.
-
+
```yaml
type: fleet
- name: my-fleet
+ # The name is optional, if not specified, generated randomly
+ name: my-on-prem-fleet
+ # Ensure instances are interconnected
placement: cluster
+ # The user, private SSH key, and hostnames of the on-prem servers
ssh_config:
user: ubuntu
identity_file: ~/.ssh/id_rsa
@@ -65,21 +76,22 @@ are both acceptable).
Set `placement` to `cluster` if the nodes are interconnected (e.g. if you'd like to use them for multi-node tasks).
In that case, by default, `dstack` will automatically detect the private network.
- You can specify the [`network`](../reference/dstack.yml/fleet.md#network) parameter manually.
+ You can specify the [`network`](reference/dstack.yml/fleet.md#network) parameter manually.
!!! info "Reference"
- See the [.dstack.yml reference](reference/dstack.yml/fleet.md)
- for all supported configuration options and examples.
+ See [.dstack.yml](reference/dstack.yml/fleet.md) for all the options supported by
+ fleets, along with multiple examples.
-## Creating and updating fleets
+## Create or update a fleet
To create or update the fleet, simply call the [`dstack apply`](reference/cli/index.md#dstack-apply) command:
```shell
-$ dstack apply -f examples/fleets/cluster.dstack.yml
-Fleet my-fleet does not exist yet. Create the fleet? [y/n]: y
+$ dstack apply -f examples/fine-tuning/alignment-handbook/fleet-distributed.dstack.yml
+Fleet ah-fleet-distrib does not exist yet. Create the fleet? [y/n]: y
+
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED
-my-fleet           0         pending  now
+ah-fleet-distrib   0         pending  now
1 pending now
@@ -87,22 +99,26 @@ Fleet my-fleet does not exist yet. Create the fleet? [y/n]: y
-Once the status of instances change to `idle`, they can be used by `dstack run`.
+Once the status of instances changes to `idle`, they can be used by dev environments, tasks, and services.
## Creation policy
-> By default, `dstack run` tries to reuse `idle` instances from existing fleets.
-If no `idle` instances meet the requirements, `dstack run` creates a new fleet automatically.
-To avoid creating new fleet, specify pass `--reuse` to `dstack run`.
+By default, when running dev environments, tasks, and services, `dstack apply` tries to reuse `idle`
+instances from existing fleets.
+If no `idle` instances meet the requirements, it creates a new fleet automatically.
+To avoid creating a new fleet, pass `--reuse` to `dstack apply` (or set [
+`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the configuration).
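+
+For example, to make sure a run only reuses an existing fleet, the command might look like the following (the configuration file name is illustrative):
+
+```shell
+$ dstack apply -f .dstack.yml --reuse
+```
+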
## Termination policy
-> If you want a fleet to be automatically deleted after a certain idle time, you can set the
-you can set the [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) property.
+> If you want a fleet to be automatically deleted after a certain idle time, you can set the
+> [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) property.
+
+[//]: # (Add Idle time example to the reference page)
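+
+Below is a minimal sketch of a fleet configuration with an idle timeout (the fleet name and the `30min` value are illustrative):
+
+```yaml
+type: fleet
+# The name is optional, if not specified, generated randomly
+name: my-fleet
+
+nodes: 2
+
+# Delete the fleet after 30 minutes of inactivity
+termination_idle_time: 30min
+```
+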
-## Managing fleets
+## Manage fleets
-### Listing fleets
+### List fleets
The [`dstack fleet`](reference/cli/index.md#dstack-fleet) command lists fleet instances and their status:
@@ -117,7 +133,7 @@ $ dstack fleet
-### Deleting fleets
+### Delete fleets
When a fleet isn't used by a run, you can delete it via `dstack delete`:
@@ -133,4 +149,14 @@ Fleet my-gcp-fleet deleted
You can pass either the path to the configuration file or the fleet name directly.
-To terminate and delete specific instances from a fleet, pass `-i INSTANCE_NUM`.
\ No newline at end of file
+To terminate and delete specific instances from a fleet, pass `-i INSTANCE_NUM`.
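+
+For example, to remove only the first instance of a fleet, an invocation might look like this (the fleet name is illustrative):
+
+```shell
+$ dstack delete my-fleet -i 0
+```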
+
+## What's next?
+
+1. Read about [dev environments](dev-environments.md), [tasks](tasks.md), and
+ [services](services.md)
+2. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+
+!!! info "Reference"
+ See [.dstack.yml](reference/dstack.yml/fleet.md) for all the options supported by
+ fleets, along with multiple examples.
\ No newline at end of file
diff --git a/docs/docs/guides/dstack-sky.md b/docs/docs/guides/dstack-sky.md
new file mode 100644
index 000000000..345b6276d
--- /dev/null
+++ b/docs/docs/guides/dstack-sky.md
@@ -0,0 +1,44 @@
+# dstack Sky
+
+If you don't want to host the `dstack` server or would like to access GPUs from the `dstack` marketplace,
+sign up with [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
+
+## Set up the CLI
+
+If you've signed up, open your project settings, and copy the `dstack config` command to point the CLI to the project.
+
+{ width=800 }
+
+Then, install the CLI on your machine and use the copied command.
+
+
+
+```shell
+$ pip install dstack
+$ dstack config --url https://sky.dstack.ai \
+ --project peterschmidt85 \
+ --token bbae0f28-d3dd-4820-bf61-8f4bb40815da
+
+Configuration is updated at ~/.dstack/config.yml
+```
+
+
+
+## Configure clouds
+
+By default, [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}
+uses the GPU from its marketplace, which requires a credit card to be attached in your account
+settings.
+
+To use your own cloud accounts, click the settings icon of the corresponding backend and specify credentials:
+
+{ width=800 }
+
+For more details on how to configure your own cloud accounts, check
+the [server/config.yml reference](../reference/server/config.yml.md).
+
+## What's next?
+
+1. Follow [quickstart](../quickstart.md)
+2. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples)
+3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
\ No newline at end of file
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 49fdb0ddb..44299a711 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -1,13 +1,12 @@
# What is dstack?
-`dstack` is an open-source container orchestration engine for AI.
-It accelerates the development, training, and deployment of AI models, and simplifies the management of clusters.
+`dstack` is a lightweight alternative to Kubernetes, designed specifically for managing the development, training, and
+deployment of AI models at any scale.
-#### Cloud and on-prem
+`dstack` is easy to use with any cloud provider (AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, etc.) or
+with on-prem clusters.
-`dstack` is easy to use with any cloud or on-prem servers.
-Supported cloud providers include AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, and CUDO.
-For using `dstack` with on-prem servers, see [fleets](fleets.md#__tabbed_1_2).
+If you already use Kubernetes, `dstack` can be used with it.
#### Accelerators
@@ -15,35 +14,31 @@ For using `dstack` with on-prem servers, see [fleets](fleets.md#__tabbed_1_2).
## How does it work?
-> Before using `dstack`, [install](installation/index.md) the server and configure
-backends for each cloud account (or Kubernetes cluster) that you intend to use.
+> Before using `dstack`, [install](installation/index.md) the server and configure backends.
-#### 1. Define run configurations
+#### 1. Define configurations
-`dstack` supports three types of run configurations:
+`dstack` supports the following configurations:
* [Dev environments](dev-environments.md) — for interactive development using a desktop IDE
-* [Tasks](tasks.md) — for any kind of batch jobs or web applications (supports distributed jobs)
-* [Services](services.md)— for production-grade deployment (supports auto-scaling and authorization)
-
-Each type of run configuration allows you to specify commands for execution, required compute resources, retry policies, auto-scaling rules, authorization settings, and more.
+* [Tasks](tasks.md) — for scheduling jobs (incl. distributed jobs) or running web apps
+* [Services](services.md) — for deployment of models and web apps (with auto-scaling and authorization)
+* [Fleets](fleets.md) — for managing cloud and on-prem clusters
+* [Volumes](concepts/volumes.md) — for managing persistent volumes
+* [Gateways](concepts/gateways.md) — for configuring ingress traffic and public endpoints
Configurations can be defined as YAML files within your repo.
-#### 2. Run configurations
-
-Run any defined configuration either via `dstack` CLI or API.
-
-`dstack` automatically handles provisioning, interruptions, port-forwarding, auto-scaling, network, volumes,
-run failures, out-of-capacity errors, and more.
+#### 2. Apply configurations
-#### 3. Manage fleets
+Apply the configuration either via the `dstack apply` CLI command or through a programmatic API.
-Use [fleets](fleets.md) to provision and manage clusters and instances, both in the cloud and on-prem.
+`dstack` automatically manages provisioning, job queuing, auto-scaling, networking, volumes, run failures,
+out-of-capacity errors, port-forwarding, and more — across clouds and on-prem clusters.
## Where do I start?
1. Proceed to [installation](installation/index.md)
2. See [quickstart](quickstart.md)
-3. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"}
+3. Browse [examples](/docs/examples)
4. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}
\ No newline at end of file
diff --git a/docs/docs/installation/index.md b/docs/docs/installation/index.md
index 46725cad5..8d984d441 100644
--- a/docs/docs/installation/index.md
+++ b/docs/docs/installation/index.md
@@ -13,7 +13,7 @@ Follow the steps below to set up the server.
### 1. Configure backends
-> If you want the `dstack` server to run containers or manage clusters in your cloud accounts (or use Kubernetes),
+If you want the `dstack` server to run containers or manage clusters in your cloud accounts (or use Kubernetes),
create the [~/.dstack/server/config.yml](../reference/server/config.yml.md) file and configure backends.
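+
+As a minimal sketch, a `config.yml` with a single AWS backend using default credentials might look like the following (see the [server/config.yml reference](../reference/server/config.yml.md) for the full schema):
+
+```yaml
+projects:
+- name: main
+  backends:
+  - type: aws
+    creds:
+      type: default
+```
+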
### 2. Start the server
@@ -55,16 +55,16 @@ Once the `~/.dstack/server/config.yml` file is configured, proceed to start the
> For more details on how to deploy `dstack` using Docker, check its [Docker repo](https://hub.docker.com/r/dstackai/dstack).
-> By default, the `dstack` server stores its state in `~/.dstack/server/data` using SQLite.
-> To use a database, set the [`DSTACK_DATABASE_URL`](../reference/cli/index.md#environment-variables) environment variable.
+By default, the `dstack` server stores its state in `~/.dstack/server/data` using SQLite.
+To use a database, set the [`DSTACK_DATABASE_URL`](../reference/cli/index.md#environment-variables) environment variable.
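+
+For example, the server could be started against an external PostgreSQL database like this (the connection string is a placeholder; adjust it to your setup):
+
+```shell
+$ DSTACK_DATABASE_URL=postgresql+asyncpg://user:password@db-host:5432/dstack dstack server
+```
+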
-The server can be set up anywhere: on your laptop, a dedicated server, or in the cloud.
-Once the `dstack` server is up, you can use the CLI or API.
+The `dstack` server can run anywhere: on your laptop, a dedicated server, or in the cloud. Once it's up, you
+can use either the CLI or the API.
### 3. Set up the CLI
To point the CLI to the `dstack` server, configure it
-with the server address, user token and project name:
+with the server address, user token, and project name:
@@ -81,55 +81,18 @@ Configuration is updated at ~/.dstack/config.yml
This configuration is stored in `~/.dstack/config.yml`.
-### 4. Add on-prem servers
+### 4. Create on-prem fleets
-!!! info "Fleets"
- If you want the `dstack` server to run containers on your on-prem servers,
- use [fleets](../fleets.md#__tabbed_1_2).
-
-## dstack Sky
-
-If you don't want to host the `dstack` server yourself or would like to access GPU from the `dstack` marketplace, sign up with
-[dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
-
-### Set up the CLI
-
-If you've signed up,
-open your project settings, and copy the `dstack config` command to point the CLI to the project.
-
-{ width=800 }
-
-Then, install the CLI on your machine and use the copied command.
-
-
-
-```shell
-$ pip install dstack
-$ dstack config --url https://sky.dstack.ai \
- --project peterschmidt85 \
- --token bbae0f28-d3dd-4820-bf61-8f4bb40815da
-
-Configuration is updated at ~/.dstack/config.yml
-```
-
-
-
-### Configure clouds
-
-By default, [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}
-uses the GPU from its marketplace, which requires a credit card to be attached in your account
-settings.
-
-To use your own cloud accounts, click the settings icon of the corresponding backend and specify credentials:
-
-{ width=800 }
-
-[//]: # (The `dstack server` command automatically updates `~/.dstack/config.yml`)
-[//]: # (with the `main` project.)
+If you want the `dstack` server to run containers on your on-prem servers,
+use [fleets](../fleets.md#__tabbed_1_2).
## What's next?
1. Check the [server/config.yml reference](../reference/server/config.yml.md) on how to configure backends
2. Follow [quickstart](../quickstart.md)
3. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples)
-4. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
\ No newline at end of file
+4. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+
+!!! info "dstack Sky"
+ If you don't want to host the `dstack` server or would like to access GPUs from the `dstack` marketplace,
+ check [dstack Sky](../guides/dstack-sky.md).
\ No newline at end of file
diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md
index 5f4700810..81e3fb9ce 100644
--- a/docs/docs/quickstart.md
+++ b/docs/docs/quickstart.md
@@ -1,7 +1,6 @@
# Quickstart
-> Before using `dstack`, [install](installation/index.md) the server and configure
-backends.
+> Before using `dstack`, [install](installation/index.md) the server.
## Initialize a repo
@@ -18,118 +17,220 @@ $ dstack init
Your folder can be a regular local folder or a Git repo.
-## Define a configuration
-
-Define what you want to run as a YAML file. The filename must end with `.dstack.yml` (e.g., `.dstack.yml`
-or `train.dstack.yml` are both acceptable).
+## Run a configuration
=== "Dev environment"
- Dev environments allow you to quickly provision a machine with a pre-configured environment, resources, IDE, code, etc.
+ A dev environment lets you provision a remote machine with your code, dependencies, and resources, and access it
+ with your desktop IDE.
+
+ ##### Define a configuration
+
+ Create the following configuration file inside the repo:
```yaml
type: dev-environment
-
- # Use either `python` or `image` to configure environment
+ # The name is optional, if not specified, generated randomly
+ name: vscode
+
python: "3.11"
- # image: ghcr.io/huggingface/text-generation-inference:latest
+ # Uncomment to use a custom Docker image
+ #image: dstackai/base:py3.10-0.4-cuda-12.1
ide: vscode
-
- # (Optional) Configure `gpu`, `memory`, `disk`, etc
- resources:
- gpu: 24GB
+
+ # Use either spot or on-demand instances
+ spot_policy: auto
+
+ # Uncomment to request resources
+ #resources:
+ # gpu: 24GB
```
+ ##### Run the configuration
+
+ Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply):
+
+
+
+ ```shell
+ $ dstack apply -f .dstack.yml
+
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052
+ 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132
+ 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248
+
+ Submit the run vscode? [y/n]: y
+
+ Launching `vscode`...
+ ---> 100%
+
+ To open in VS Code Desktop, use this link:
+ vscode://vscode-remote/ssh-remote+vscode/workflow
+ ```
+
+
+
+ Open the link to access the dev environment using your desktop IDE.
+
=== "Task"
- Tasks make it very easy to run any scripts, be it for training, data processing, or web apps. They allow you to pre-configure the environment, resources, code, etc.
+ A task allows you to schedule a job or run a web app. It lets you configure
+ dependencies, resources, ports, the number of nodes (if you want to run the task on a cluster), etc.
-
+ ##### Define a configuration
+
+ Create the following configuration file inside the repo:
+
+
```yaml
type: task
-
+ # The name is optional, if not specified, generated randomly
+ name: streamlit
+
python: "3.11"
- env:
- - HF_HUB_ENABLE_HF_TRANSFER=1
+ # Uncomment to use a custom Docker image
+ #image: dstackai/base:py3.10-0.4-cuda-12.1
+
+ # Commands of the task
commands:
- - pip install -r fine-tuning/qlora/requirements.txt
- - python fine-tuning/qlora/train.py
-
- # (Optional) Configure `gpu`, `memory`, `disk`, etc
- resources:
- gpu: 24GB
+ - pip install streamlit
+ - streamlit hello
+ # Ports to forward
+ ports:
+ - 8501
+
+ # Use either spot or on-demand instances
+ spot_policy: auto
+
+ # Uncomment to request resources
+ #resources:
+ # gpu: 24GB
```
- Ensure `requirements.txt` and `train.py` are in your folder. You can take them from [`examples`](https://github.com/dstackai/dstack/tree/master/examples/fine-tuning/qlora).
+ ##### Run the configuration
+
+ Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply):
+
+
+
+ ```shell
+ $ dstack apply -f streamlit.dstack.yml
+
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052
+ 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132
+ 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248
+
+ Submit the run streamlit? [y/n]: y
+
+ Continue? [y/n]: y
+
+ Provisioning `streamlit`...
+ ---> 100%
+
+ Welcome to Streamlit. Check out our demo in your browser.
+
+ Local URL: http://localhost:8501
+ ```
+
+
+
+ `dstack apply` automatically forwards the remote ports to `localhost` for convenient access.
=== "Service"
- Services make it easy to deploy models and apps cost-effectively as public endpoints, allowing you to use any frameworks.
+ A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
+ dependencies, resources, authorization, auto-scaling rules, etc.
+
+ ??? info "Prerequisites"
+ If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service.
-
+ If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
+ the gateway is already set up for you.
+
+ ##### Define a configuration
+
+ Create the following configuration file inside the repo:
+
+
```yaml
type: service
-
- image: ghcr.io/huggingface/text-generation-inference:latest
- env:
- - HUGGING_FACE_HUB_TOKEN # required to run gated models
- - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
+ # The name is optional, if not specified, generated randomly
+ name: streamlit-service
+
+ python: "3.11"
+ # Uncomment to use a custom Docker image
+ #image: dstackai/base:py3.10-0.4-cuda-12.1
+
+ # Commands of the service
commands:
- - text-generation-launcher --port 8000 --trust-remote-code
- port: 8000
+ - pip install streamlit
+ - streamlit hello
+ # Port of the service
+ port: 8501
- # (Optional) Configure `gpu`, `memory`, `disk`, etc
- resources:
- gpu: 24GB
+ # Comment out to enable authorization
+ auth: false
+
+ # Use either spot or on-demand instances
+ spot_policy: auto
+
+ # Uncomment to request resources
+ #resources:
+ # gpu: 24GB
```
-## Run configuration
+ ##### Run the configuration
-Run a configuration using the [`dstack run`](reference/cli/index.md#dstack-run) command, followed by the working directory path (e.g., `.`),
-and the path to the configuration file.
+ Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply):
-
+
-```shell
-$ dstack run . -f train.dstack.yml
-
- BACKEND REGION RESOURCES SPOT PRICE
- tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
- azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673
- azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673
-
-Continue? [y/n]: y
-
-Provisioning...
----> 100%
+ ```shell
+ $ dstack apply -f streamlit.dstack.yml
+
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052
+ 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132
+ 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248
+
+ Submit the run streamlit? [y/n]: y
+
+ Continue? [y/n]: y
+
+ Provisioning `streamlit`...
+ ---> 100%
-Epoch 0: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
-Epoch 1: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
-Epoch 2: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
-```
+ Welcome to Streamlit. Check out our demo in your browser.
-
+ Local URL: https://streamlit-service.example.com
+ ```
+
+
-The `dstack run` command automatically uploads your code, including any local uncommitted changes.
+ Once the service is up, its endpoint is accessible at `https://<run name>.<gateway domain>`.
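+
+ For example, since `auth` is disabled in this configuration, you could check the endpoint with `curl` (the domain is taken from the sample output above):
+
+ ```shell
+ $ curl https://streamlit-service.example.com
+ ```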
-!!! info "Fleets"
- By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md).
- If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends.
+> `dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes.
## What's next?
1. Read about [dev environments](dev-environments.md), [tasks](tasks.md),
[services](services.md), and [fleets](fleets.md)
2. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"}
-3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
\ No newline at end of file
+3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+
+!!! info "Examples"
+ To see how dev environments, tasks, services, and fleets can be used for
+ training and deploying AI models, check out the [examples](examples/index.md).
\ No newline at end of file
diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/docs/docs/reference/dstack.yml/dev-environment.md
index 2a39c0788..c516099c2 100644
--- a/docs/docs/reference/dstack.yml/dev-environment.md
+++ b/docs/docs/reference/dstack.yml/dev-environment.md
@@ -2,9 +2,9 @@
The `dev-environment` configuration type allows running [dev environments](../../dev-environments.md).
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be run via [`dstack run`](../cli/index.md#dstack-run).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `dev.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
@@ -18,25 +18,46 @@ The `python` property determines which default Docker image is used.
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
-python: "3.11"
+# If `image` is not specified, dstack uses its base image
+python: "3.10"
ide: vscode
```
-!!! info "nvcc"
+??? info "nvcc"
Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels.
To install it, use `conda install cuda`.
+
+ ```yaml
+ type: dev-environment
+ # The name is optional, if not specified, generated randomly
+ name: vscode
+
+ python: "3.10"
+
+ ide: vscode
+
+ # Run this command on start
+ init:
+ - conda install cuda
+ ```
+
### Docker image
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
+# Any custom Docker image
image: ghcr.io/huggingface/text-generation-inference:latest
ide: vscode
@@ -50,8 +71,12 @@ ide: vscode
```yaml
type: dev-environment
-
+ # The name is optional, if not specified, generated randomly
+ name: vscode
+
+ # Any private Docker image
image: ghcr.io/huggingface/text-generation-inference:latest
+ # Credentials of the private Docker registry
registry_auth:
username: peterschmidt85
password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5
@@ -68,19 +93,19 @@ range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`).
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
ide: vscode
resources:
# 200GB or more RAM
memory: 200GB..
-
# 4 GPUs from 40GB to 80GB
gpu: 40GB..80GB:4
-
- # Shared memory
+ # Shared memory (required for multi-GPU workloads)
shm_size: 16GB
-
+ # Disk size
disk: 500GB
```
@@ -96,6 +121,8 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10
```yaml
type: dev-environment
+ # The name is optional, if not specified, generated randomly
+ name: vscode
ide: vscode
@@ -115,7 +142,10 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
+# Environment variables
env:
- HUGGING_FACE_HUB_TOKEN
- HF_HUB_ENABLE_HF_TRANSFER=1
@@ -125,12 +155,12 @@ ide: vscode
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
`dstack` will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`.
-#### Default environment variables
+#### System environment variables
The following environment variables are available in any run and are passed by `dstack` by default:
@@ -148,9 +178,12 @@ You can choose whether to use spot instances, on-demand instances, or any availa
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
ide: vscode
+# Use either spot or on-demand instances
spot_policy: auto
```
@@ -166,9 +199,12 @@ By default, `dstack` provisions instances in all configured backends. However, y
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
ide: vscode
+# Use only listed backends
backends: [aws, gcp]
```
@@ -182,9 +218,12 @@ By default, `dstack` uses all configured regions. However, you can specify the l
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
ide: vscode
+# Use only listed regions
regions: [eu-west-1, eu-west-2]
```
@@ -199,9 +238,12 @@ To attach a volume, simply specify its name using the `volumes` property and spe
```yaml
type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: vscode
ide: vscode
+# Map the name of the volume to any path
volumes:
- name: my-new-volume
path: /volume_data
@@ -212,7 +254,7 @@ volumes:
Once you run this configuration, the contents of the volume will be attached to `/volume_data` inside the dev
environment, and its contents will persist across runs.
-!!! info "Limitations"
+??? info "Limitations"
When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents
to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to
attach volumes to `/workflow` or any of its subdirectories.
diff --git a/docs/docs/reference/dstack.yml/fleet.md b/docs/docs/reference/dstack.yml/fleet.md
index e763ccded..ccccd4c21 100644
--- a/docs/docs/reference/dstack.yml/fleet.md
+++ b/docs/docs/reference/dstack.yml/fleet.md
@@ -2,35 +2,52 @@
The `fleet` configuration type allows creating and updating fleets.
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `fleet.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `fleet.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
-### Creating a cloud fleet { #create-cloud-fleet }
+### Cloud fleet { #cloud-fleet }
-
+
```yaml
type: fleet
-name: my-gcp-fleet
+# The name is optional, if not specified, generated randomly
+name: my-fleet
+
+# The number of instances
nodes: 4
+# Ensure the instances are interconnected
placement: cluster
-backends: [gcp]
+
+# Use either spot or on-demand instances
+spot_policy: auto
+
resources:
- gpu: 1
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
```
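+
+To restrict provisioning to specific clouds, you can also set the `backends` property (a minimal sketch based on the example above):
+
+```yaml
+type: fleet
+# The name is optional, if not specified, generated randomly
+name: my-fleet
+
+nodes: 2
+
+# Use only listed backends
+backends: [aws, gcp]
+
+resources:
+  gpu: 24GB
+```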
-### Creating an on-prem fleet { #create-ssh-fleet }
+### On-prem fleet { #on-prem-fleet }
-
+
```yaml
type: fleet
-name: my-ssh-fleet
+# The name is optional, if not specified, generated randomly
+name: my-on-prem-fleet
+
+# Ensure instances are interconnected
+placement: cluster
+
+# The user, private SSH key, and hostnames of the on-prem servers
ssh_config:
user: ubuntu
identity_file: ~/.ssh/id_rsa
@@ -43,6 +60,8 @@ ssh_config:
[//]: # (TODO: a cluster, individual user and identity file, etc)
+[//]: # (TODO: other examples, for all properties like in dev-environment/task/service)
+
## Root reference
#SCHEMA# dstack._internal.core.models.fleets.FleetConfiguration
@@ -57,7 +76,6 @@ ssh_config:
overrides:
show_root_heading: false
-
## `ssh.hosts[n]`
#SCHEMA# dstack._internal.core.models.fleets.SSHHostParams
diff --git a/docs/docs/reference/dstack.yml/gateway.md b/docs/docs/reference/dstack.yml/gateway.md
index 66fc3c21f..c9b2cd4b9 100644
--- a/docs/docs/reference/dstack.yml/gateway.md
+++ b/docs/docs/reference/dstack.yml/gateway.md
@@ -2,25 +2,32 @@
The `gateway` configuration type allows creating and updating [gateways](../../services.md).
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `gateway.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `gateway.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
+### Creating a new gateway { #new-gateway }
+
```yaml
type: gateway
+# The name of the gateway
name: example-gateway
+# Gateways are bound to a specific backend and region
backend: aws
region: eu-west-1
+
+# This domain will be used to access the endpoint
domain: example.com
```
+[//]: # (TODO: other examples, e.g. private gateways)
## Root reference
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
index 9ca184d57..99c4101e5 100644
--- a/docs/docs/reference/dstack.yml/service.md
+++ b/docs/docs/reference/dstack.yml/service.md
@@ -2,9 +2,9 @@
The `service` configuration type allows running [services](../../services.md).
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be run via [`dstack run . -f PATH`](../cli/index.md#dstack-run).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `serve.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
@@ -14,16 +14,20 @@ If you don't specify `image`, `dstack` uses the default Docker image pre-configu
`python`, `pip`, `conda` (Miniforge), and essential CUDA drivers.
The `python` property determines which default Docker image is used.
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
-python: "3.11"
+# If `image` is not specified, dstack uses its base image
+python: "3.10"
+# Commands of the service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
```
@@ -31,20 +35,24 @@ port: 8000
!!! info "nvcc"
Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels.
- To install it, use `conda install cuda`.
+ To install it, use `conda install cuda` as the first command.
### Docker image
-
+
```yaml
type: service
+ # The name is optional, if not specified, generated randomly
+ name: http-server-service
+
+ # Any custom Docker image
+ image: dstackai/base:py3.10-0.4-cuda-12.1
- image: dstackai/base:py3.11-0.4-cuda-12.1
-
+ # Commands of the service
commands:
- python3 -m http.server
-
+ # The port of the service
port: 8000
```
@@ -56,47 +64,55 @@ port: 8000
```yaml
type: service
+ # The name is optional, if not specified, generated randomly
+ name: http-server-service
- image: dstackai/base:py3.11-0.4-cuda-12.1
-
- commands:
- - python3 -m http.server
+ # Any private Docker image
+ image: dstackai/base:py3.10-0.4-cuda-12.1
+ # Credentials of the private registry
registry_auth:
username: peterschmidt85
password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5
-
+
+ # Commands of the service
+ commands:
+ - python3 -m http.server
+ # The port of the service
port: 8000
```
-### OpenAI-compatible interface { #model-mapping }
+### Model gateway { #model-mapping }
By default, if you run a service, its endpoint is accessible at `https://<run name>.<gateway domain>`.
If you run a model, you can optionally configure the mapping to make it accessible via the
OpenAI-compatible interface.
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: llama31-service
-python: "3.11"
+python: "3.10"
-env:
- - MODEL=NousResearch/Llama-2-7b-chat-hf
+# Commands of the service
commands:
- - pip install vllm
- - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+ - pip install vllm==0.5.3.post1
+ - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+# Expose the port of the service
port: 8000
resources:
+ # Change to what is required
gpu: 24GB
-# Enable the OpenAI-compatible endpoint
+# Comment out if you don't want to access the model via https://gateway.<gateway domain>
model:
- format: openai
type: chat
- name: NousResearch/Llama-2-7b-chat-hf
+ name: meta-llama/Meta-Llama-3.1-8B-Instruct
+ format: openai
```
@@ -149,32 +165,32 @@ and `openai` (if you are using Text Generation Inference or vLLM with OpenAI-com
By default, `dstack` runs a single replica of the service.
You can configure the number of replicas as well as the auto-scaling rules.
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: llama31-service
-python: "3.11"
+python: "3.10"
-env:
- - MODEL=NousResearch/Llama-2-7b-chat-hf
+# Commands of the service
commands:
- - pip install vllm
- - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+ - pip install vllm==0.5.3.post1
+ - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+# Expose the port of the service
port: 8000
resources:
+ # Change to what is required
gpu: 24GB
-# Enable the OpenAI-compatible endpoint
-model:
- format: openai
- type: chat
- name: NousResearch/Llama-2-7b-chat-hf
-
+# Minimum and maximum number of replicas
replicas: 1..4
scaling:
+ # Requests per second
metric: rps
+ # Target metric value
target: 10
```
@@ -192,31 +208,31 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`).
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
+
+python: "3.10"
-python: "3.11"
+# Commands of the service
commands:
- pip install vllm
- python -m vllm.entrypoints.openai.api_server
--model mistralai/Mixtral-8X7B-Instruct-v0.1
--host 0.0.0.0
- --tensor-parallel-size 2 # Match the number of GPUs
+ --tensor-parallel-size $DSTACK_GPUS_NUM
+# Expose the port of the service
port: 8000
resources:
# 2 GPUs of 80GB
gpu: 80GB:2
+ # Minimum disk size
disk: 200GB
-
-# Enable the OpenAI-compatible endpoint
-model:
- type: chat
- name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
- format: openai
```
@@ -235,41 +251,51 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10
By default, the service endpoint requires the `Authorization` header with `"Bearer <dstack token>"`.
Authorization can be disabled by setting `auth` to `false`.
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
-python: "3.11"
+# Disable authorization
+auth: false
+
+python: "3.10"
+# Commands of the service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
-
-auth: false
```
### Environment variables
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: llama-2-7b-service
-python: "3.11"
+python: "3.10"
+# Environment variables
env:
- HUGGING_FACE_HUB_TOKEN
- MODEL=NousResearch/Llama-2-7b-chat-hf
+# Commands of the service
commands:
- pip install vllm
- python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+# The port of the service
port: 8000
resources:
+ # Required GPU vRAM
gpu: 24GB
```
@@ -280,7 +306,7 @@ If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TO
For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`.
-#### Default environment variables
+#### System environment variables
The following environment variables are available in any run and are passed by `dstack` by default:
@@ -294,16 +320,19 @@ The following environment variables are available in any run and are passed by `
You can choose whether to use spot instances, on-demand instances, or any available type.
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
+# Use either spot or on-demand instances
spot_policy: auto
```
@@ -315,16 +344,20 @@ The `spot_policy` accepts `spot`, `on-demand`, and `auto`. The default for servi
By default, `dstack` provisions instances in all configured backends. However, you can specify the list of backends:
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
+# Commands of the service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
+# Use only listed backends
backends: [aws, gcp]
```
@@ -334,16 +367,20 @@ backends: [aws, gcp]
By default, `dstack` uses all configured regions. However, you can specify the list of regions:
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
+# Commands of the service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
+# Use only listed regions
regions: [eu-west-1, eu-west-2]
```
@@ -354,16 +391,20 @@ regions: [eu-west-1, eu-west-2]
Volumes allow you to persist data between runs.
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents:
-
+
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: http-server-service
+# Commands of the service
commands:
- python3 -m http.server
-
+# The port of the service
port: 8000
+# Map the name of the volume to any path
volumes:
- name: my-new-volume
path: /volume_data
diff --git a/docs/docs/reference/dstack.yml/task.md b/docs/docs/reference/dstack.yml/task.md
index 0e0653e5f..330bfff08 100644
--- a/docs/docs/reference/dstack.yml/task.md
+++ b/docs/docs/reference/dstack.yml/task.md
@@ -2,9 +2,9 @@
The `task` configuration type allows running [tasks](../../tasks.md).
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be run via [`dstack run`](../cli/index.md#dstack-run).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `train.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
@@ -18,9 +18,13 @@ The `python` property determines which default Docker image is used.
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
-python: "3.11"
+# If `image` is not specified, dstack uses its base image
+python: "3.10"
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
@@ -28,10 +32,25 @@ commands:
-!!! info "nvcc"
+??? info "nvcc"
Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels.
To install it, use `conda install cuda`.
+
+ ```yaml
+ type: task
+ # The name is optional, if not specified, generated randomly
+ name: train
+
+ python: "3.10"
+
+ # Before other commands, install `nvcc` (via `conda install cuda`)
+ commands:
+ - conda install cuda
+ - pip install -r fine-tuning/qlora/requirements.txt
+ - python fine-tuning/qlora/train.py
+ ```
+
### Ports { #_ports }
A task can configure ports. In this case, if the task is running an application on a port, `dstack run`
@@ -41,14 +60,17 @@ will securely allow you to access this port from your local machine through port
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
-python: "3.11"
+python: "3.10"
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- tensorboard --logdir results/runs &
- python fine-tuning/qlora/train.py
-
+# Expose the port to access TensorBoard
ports:
- 6000
```
@@ -65,9 +87,13 @@ When running it, `dstack run` forwards `6000` port to `localhost:6000`, enabling
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
-image: dstackai/base:py3.11-0.4-cuda-12.1
+# Any custom Docker image
+image: dstackai/base:py3.10-0.4-cuda-12.1
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
@@ -80,12 +106,17 @@ commands:
```yaml
type: task
+ # The name is optional, if not specified, generated randomly
+ name: train
- image: dstackai/base:py3.11-0.4-cuda-12.1
+ # Any private Docker image
+ image: dstackai/base:py3.10-0.4-cuda-12.1
+ # Credentials of the private Docker registry
registry_auth:
username: peterschmidt85
password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5
+ # Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
@@ -100,7 +131,10 @@ range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`).
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
@@ -108,13 +142,11 @@ commands:
resources:
# 200GB or more RAM
memory: 200GB..
-
# 4 GPUs from 40GB to 80GB
gpu: 40GB..80GB:4
-
- # Shared memory
+ # Shared memory (required for multi-GPU workloads)
shm_size: 16GB
-
+ # Disk size
disk: 500GB
```
@@ -130,9 +162,12 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10
```yaml
type: task
+ # The name is optional, if not specified, generated randomly
+ name: train
- python: "3.11"
+ python: "3.10"
+ # Commands of the task
commands:
- pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
- git clone --recursive https://github.com/pytorch/xla.git
@@ -155,12 +190,14 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10
```yaml
type: task
-python: "3.11"
+python: "3.10"
+# Environment variables
env:
- HUGGING_FACE_HUB_TOKEN
- HF_HUB_ENABLE_HF_TRANSFER=1
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
@@ -168,12 +205,12 @@ commands:
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
`dstack` will require the value to be passed via the CLI or set in the current process.
For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`.
-##### Default environment variables
+##### System environment variables
The following environment variables are available in any run and are passed by `dstack` by default:
@@ -186,7 +223,7 @@ The following environment variables are available in any run and are passed by `
| `DSTACK_NODE_RANK` | The rank of the node |
| `DSTACK_MASTER_NODE_IP` | The internal IP address of the master node |
-### Distributed tasks { #_nodes }
+### Distributed tasks
By default, the task runs on a single node. However, you can run it on a cluster of nodes.
@@ -194,13 +231,15 @@ By default, the task runs on a single node. However, you can run it on a cluster
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train-distrib
# The size of the cluster
nodes: 2
-python: "3.11"
-env:
- - HF_HUB_ENABLE_HF_TRANSFER=1
+python: "3.10"
+
+# Commands of the task
commands:
- pip install -r requirements.txt
- torchrun
@@ -220,41 +259,13 @@ resources:
If you run the task, `dstack` first provisions the master node and then runs the other nodes of the cluster.
All nodes are provisioned in the same region.
-`dstack` is easy to use with `accelerate`, `torchrun`, and other distributed frameworks. All you need to do
+> `dstack` is easy to use with `accelerate`, `torchrun`, and other distributed frameworks. All you need to do
is pass the corresponding environment variables such as `DSTACK_GPUS_PER_NODE`, `DSTACK_NODE_RANK`, `DSTACK_NODES_NUM`,
-`DSTACK_MASTER_NODE_IP`, and `DSTACK_GPUS_NUM` (see [System environment variables](#default-environment-variables)).
+`DSTACK_MASTER_NODE_IP`, and `DSTACK_GPUS_NUM` (see [System environment variables](#system-environment-variables)).
??? info "Backends"
- Running on multiple nodes is supported only with `aws`, `gcp`, `azure`, `oci`, and instances added via
- [`dstack pool add-ssh`](../../fleets.md#__tabbed_1_2).
-
-### Arguments
-
-You can parameterize tasks with user arguments using `${{ run.args }}` in the configuration.
-
-
-
-```yaml
-type: task
-
-python: "3.11"
-
-commands:
- - pip install -r fine-tuning/qlora/requirements.txt
- - python fine-tuning/qlora/train.py ${{ run.args }}
-```
-
-
-
-Now, you can pass your arguments to the `dstack run` command:
-
-
-
-```shell
-$ dstack run . -f train.dstack.yml --train_batch_size=1 --num_train_epochs=100
-```
-
-
+ Running on multiple nodes is supported only with the `aws`, `gcp`, `azure`, `oci` backends, or
+ [on-prem fleets](../../fleets.md#__tabbed_1_2).
### Web applications
@@ -264,13 +275,16 @@ Here's an example of using `ports` to run web apps with `tasks`.
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: streamlit-hello
-python: "3.11"
+python: "3.10"
+# Commands of the task
commands:
- pip3 install streamlit
- streamlit hello
-
+# Expose the port to access the web app
ports:
- 8501
@@ -286,11 +300,15 @@ You can choose whether to use spot instances, on-demand instances, or any availa
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
+# Use either spot or on-demand instances
spot_policy: auto
```
@@ -298,6 +316,34 @@ spot_policy: auto
The `spot_policy` accepts `spot`, `on-demand`, and `auto`. The default for tasks is `auto`.
+### Queueing tasks { #queueing-tasks }
+
+By default, if `dstack apply` cannot find capacity, the task fails.
+
+To queue the task and wait for capacity, specify the [`retry`](#retry)
+property:
+
+
+
+```yaml
+type: task
+# The name is optional, if not specified, generated randomly
+name: train
+
+# Commands of the task
+commands:
+ - pip install -r fine-tuning/qlora/requirements.txt
+ - python fine-tuning/qlora/train.py
+
+retry:
+ # Retry on no-capacity errors
+ on_events: [no-capacity]
+ # Retry within 1 day
+ duration: 1d
+```
+
+
+
### Backends
By default, `dstack` provisions instances in all configured backends. However, you can specify the list of backends:
@@ -306,11 +352,15 @@ By default, `dstack` provisions instances in all configured backends. However, y
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
+# Use only listed backends
backends: [aws, gcp]
```
@@ -324,11 +374,15 @@ By default, `dstack` uses all configured regions. However, you can specify the l
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
+# Use only listed regions
regions: [eu-west-1, eu-west-2]
```
@@ -343,13 +397,17 @@ To attach a volume, simply specify its name using the `volumes` property and spe
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: train
-python: "3.11"
+python: "3.10"
+# Commands of the task
commands:
- pip install -r fine-tuning/qlora/requirements.txt
- python fine-tuning/qlora/train.py
+# Map the name of the volume to any path
volumes:
- name: my-new-volume
path: /volume_data
@@ -375,6 +433,15 @@ The `task` configuration type supports many other options. See below.
type:
required: true
+## `retry`
+
+#SCHEMA# dstack._internal.core.models.profiles.ProfileRetry
+ overrides:
+ show_root_heading: false
+ type:
+ required: true
+ item_id_prefix: retry-
+
## `resources`
#SCHEMA# dstack._internal.core.models.resources.ResourcesSpecSchema
diff --git a/docs/docs/reference/dstack.yml/volume.md b/docs/docs/reference/dstack.yml/volume.md
index 03351fb6e..26e75b8a8 100644
--- a/docs/docs/reference/dstack.yml/volume.md
+++ b/docs/docs/reference/dstack.yml/volume.md
@@ -2,35 +2,45 @@
The `volume` configuration type allows creating, registering, and updating volumes.
-> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `vol.dstack.yml` are both acceptable)
-> and can be located in the project's root directory or any nested folder.
-> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply).
+> Configuration files must be inside the project repo, and their names must end with `.dstack.yml`
+> (e.g. `.dstack.yml` or `fleet.dstack.yml` are both acceptable).
+> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply).
## Examples
-### Creating a new volume { #create-volume }
+### Creating a new volume { #new-volume }
```yaml
type: volume
-name: my-aws-volume
+# The name of the volume
+name: my-new-volume
+
+# Volumes are bound to a specific backend and region
backend: aws
region: eu-central-1
+
+# The size of the volume
size: 100GB
```
-### Registering an existing volume { #register-volume }
+### Registering an existing volume { #existing-volume }
-
+
```yaml
type: volume
-name: my-external-volume
+# The name of the volume
+name: my-existing-volume
+
+# Volumes are bound to a specific backend and region
backend: aws
region: eu-central-1
+
+# The ID of the volume in AWS
volume_id: vol1235
```
diff --git a/docs/docs/services.md b/docs/docs/services.md
index e755cd2c4..0c0120a4e 100644
--- a/docs/docs/services.md
+++ b/docs/docs/services.md
@@ -1,8 +1,10 @@
# Services
-Services make it easy to deploy models and web applications as public,
-secure, and scalable endpoints. They are provisioned behind a [gateway](concepts/gateways.md) that
-automatically provides an HTTPS domain, handles authentication, distributes load, and performs auto-scaling.
+A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
+dependencies, resources, authorization, auto-scaling rules, etc.
+
+Services are provisioned behind a [gateway](concepts/gateways.md) which provides an HTTPS endpoint mapped to your domain,
+handles authentication, distributes load, and performs auto-scaling.
??? info "Gateways"
If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service.
@@ -10,32 +12,43 @@ automatically provides an HTTPS domain, handles authentication, distributes load
If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
the gateway is already set up for you.
-## Configuration
+## Define a configuration
-First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `serve.dstack.yml`
+First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or
+`serve.dstack.yml`
are both acceptable).
```yaml
type: service
+# The name is optional, if not specified, generated randomly
+name: llama31-service
+
+# If `image` is not specified, dstack uses its default image
+python: "3.10"
-python: "3.11"
+# Required environment variables
env:
- - MODEL=NousResearch/Llama-2-7b-chat-hf
+ - HUGGING_FACE_HUB_TOKEN
commands:
- - pip install vllm
- - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+ - pip install vllm==0.5.3.post1
+ - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+# Expose the vllm server port
port: 8000
+# Use either spot or on-demand instances
+spot_policy: auto
+
resources:
- gpu: 80GB
+ # Change to what is required
+ gpu: 24GB
-# (Optional) Enable the OpenAI-compatible endpoint
+# Comment out if you don't want to access the model via https://gateway.<gateway domain>
model:
- format: openai
type: chat
- name: NousResearch/Llama-2-7b-chat-hf
+ name: meta-llama/Meta-Llama-3.1-8B-Instruct
+ format: openai
```
@@ -49,25 +62,26 @@ If you don't specify your Docker image, `dstack` uses the [base](https://hub.doc
In this case, `dstack` auto-scales it based on the load.
!!! info "Reference"
- See the [.dstack.yml reference](reference/dstack.yml/service.md)
- for all supported configuration options and examples.
+ See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by
+ services, along with multiple examples.
-## Running
+## Run a service
-To run a configuration, use the [`dstack run`](reference/cli/index.md#dstack-run) command followed by the working directory path,
-configuration file path, and any other options.
+To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-apply) command.
```shell
+$ HUGGING_FACE_HUB_TOKEN=...
+
-$ dstack run . -f serve.dstack.yml
+$ dstack apply -f serve.dstack.yml
- BACKEND REGION RESOURCES SPOT PRICE
- tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
- azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673
- azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
+ 2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
+ 3 gcp us-west4 27xCPU, 150GB, A5000:24GB:3 yes $0.33
-Continue? [y/n]: y
+Submit the run llama31-service? [y/n]: y
Provisioning...
---> 100%
@@ -77,31 +91,14 @@ Service is published at https://yellow-cat-1.example.com
-When deploying the service, `dstack run` mounts the current folder's contents.
-
-[//]: # (TODO: Fleets and idle duration)
-
-??? info ".gitignore"
- If there are large files or folders you'd like to avoid uploading,
- you can list them in `.gitignore`.
-
-??? info "Fleets"
- By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md).
- If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends.
-
- To have the fleet deleted after a certain idle time automatically, set
- [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time).
- By default, it's set to `5min`.
-
-!!! info "Reference"
- See the [CLI reference](reference/cli/index.md#dstack-run) for more details
- on how `dstack run` works.
+`dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes.
+To avoid uploading large files, ensure they are listed in `.gitignore`.
-## Service endpoint
+## Access the endpoint
Once the service is up, its endpoint is accessible at `https://<run name>.<gateway domain>`.
-By default, the service endpoint requires the `Authorization` header with `Bearer `.
+By default, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
@@ -110,7 +107,7 @@ $ curl https://yellow-cat-1.example.com/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <dstack token>' \
-d '{
- "model": "NousResearch/Llama-2-7b-chat-hf",
+ "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"messages": [
{
"role": "user",
@@ -122,27 +119,50 @@ $ curl https://yellow-cat-1.example.com/v1/chat/completions \
-Authorization can be disabled by setting `auth` to `false` in the service configuration file.
+Authorization can be disabled by setting [`auth`](reference/dstack.yml/service.md#authorization) to `false` in the
+service configuration file.
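For example, a minimal sketch of a public (no-auth) service configuration might look like this; the name, model, port, and resources are illustrative and should be adjusted to your case:

```yaml
type: service
name: llama31-service-public

python: "3.10"

env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - pip install vllm==0.5.3.post1
  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
port: 8000

# Disable the `Authorization` header requirement
auth: false

resources:
  gpu: 24GB
```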
+
+### Gateway endpoint
-### Model endpoint
+In case the service has the [model mapping](reference/dstack.yml/service.md#model-mapping) configured, you will also be
+able to access the model at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.
-In case the service has the [model mapping](reference/dstack.yml/service.md#model-mapping) configured, you will also be able
-to access the model at `https://gateway.` via the OpenAI-compatible interface.
+## Manage runs
-## Managing runs
+### List runs
-### Listing runs
+The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running jobs and their statuses.
+Use `--watch` (or `-w`) to monitor the live status of runs.
-The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running runs and their status.
+### Stop a run
-### Stopping runs
+Once the run exceeds the [`max_duration`](reference/dstack.yml/task.md#max_duration), or when you use [`dstack stop`](reference/cli/index.md#dstack-stop),
+the service is stopped. Use `--abort` or `-x` to stop the run abruptly.
-When you use [`dstack stop`](reference/cli/index.md#dstack-stop), the service and its cloud resources are deleted.
+[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`)
+
+## Manage fleets
+
+By default, `dstack apply` reuses `idle` instances from one of the existing [fleets](fleets.md),
+or creates a new fleet using one of the configured backends.
+
+!!! info "Idle duration"
+ To ensure the created fleets are deleted automatically, set
+ [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time).
+ By default, it's set to `5min`.
+
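As a sketch, a single-node fleet that is torn down after one hour of inactivity could be defined like this (the name and resources are illustrative):

```yaml
type: fleet
name: my-idle-fleet

# Need one instance only
nodes: 1

# Use either spot or on-demand instances
spot_policy: auto
# Delete the instance if not used for one hour
termination_idle_time: 1h

resources:
  gpu: 24GB
```
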
+!!! info "Creation policy"
+    To ensure `dstack apply` always reuses an existing fleet and doesn't create a new one,
+    pass `--reuse` to `dstack apply` (or set [`creation_policy`](reference/dstack.yml/service.md#creation_policy) to `reuse` in the service configuration).
+ The default policy is `reuse_or_create`.
## What's next?
-1. Check the [Text Generation Inference :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/tgi/README.md){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/vllm/README.md){:target="_blank"} examples
-2. Check the [`.dstack.yml` reference](reference/dstack.yml/service.md) for more details and examples
-3. See [gateways](concepts/gateways.md) on how to set up a gateway
-4. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"}
-5. See [fleets](fleets.md) on how to manage fleets
\ No newline at end of file
+1. Check the [TGI :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/tgi/README.md){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/vllm/README.md){:target="_blank"} examples
+2. See [gateways](concepts/gateways.md) on how to set up a gateway
+3. Browse [examples](/docs/examples)
+4. See [fleets](fleets.md) on how to manage fleets
+
+!!! info "Reference"
+ See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by
+ services, along with multiple examples.
\ No newline at end of file
diff --git a/docs/docs/tasks.md b/docs/docs/tasks.md
index acae1b7bc..330834093 100644
--- a/docs/docs/tasks.md
+++ b/docs/docs/tasks.md
@@ -1,34 +1,42 @@
# Tasks
-Tasks allow for convenient scheduling of various batch jobs, such as training, fine-tuning, or
-data processing. They can also be used to run web applications
-when features offered by [services](services.md) are not needed, such as for debugging.
+A task allows you to schedule a job or run a web app. It lets you configure dependencies, resources, ports, and more.
+Tasks can be distributed and run on clusters.
-You can run tasks on a single machine or on a cluster of nodes.
+Tasks are ideal for training and fine-tuning jobs. They can also be used instead of services if you want to run a web
+app but don't need a public endpoint.
-## Configuration
+## Define a configuration
First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `train.dstack.yml`
are both acceptable).
-
+[//]: # (TODO: Make tabs - single machine & distributed tasks & web app)
+
+
```yaml
type: task
+# The name is optional, if not specified, generated randomly
+name: axolotl-train
+
-python: "3.11"
+# Using the official Axolotl's Docker image
+image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
+# Required environment variables
env:
- - HF_HUB_ENABLE_HF_TRANSFER=1
+ - HUGGING_FACE_HUB_TOKEN
+ - WANDB_API_KEY
+# Commands of the task
commands:
- - pip install -r fine-tuning/qlora/requirements.txt
- - tensorboard --logdir results/runs &
- - python fine-tuning/qlora/train.py
-ports:
- - 6000
+ - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
-# (Optional) Configure `gpu`, `memory`, `disk`, etc
resources:
- gpu: 80GB
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # Two or more GPU
+ count: 2..
```
@@ -36,84 +44,91 @@ resources:
If you don't specify your Docker image, `dstack` uses the [base](https://hub.docker.com/r/dstackai/base/tags) image
(pre-configured with Python, Conda, and essential CUDA drivers).
-
!!! info "Distributed tasks"
By default, tasks run on a single instance. However, you can specify
- the [number of nodes](reference/dstack.yml/task.md#_nodes).
- In this case, `dstack` provisions a cluster of instances.
+ the [number of nodes](reference/dstack.yml/task.md#distributed-tasks).
+  In this case, the task will run on a cluster of instances (see the sketch below).
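
Here's a minimal sketch of a distributed task; `train.py` is a hypothetical script, and the resources are illustrative:

```yaml
type: task
name: train-distrib

python: "3.10"

commands:
  # Hypothetical training script; each node runs the same commands
  - python train.py

# Run the task on a cluster of two interconnected instances
nodes: 2

resources:
  gpu: 24GB
```
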
!!! info "Reference"
- See the [.dstack.yml reference](reference/dstack.yml/task.md)
- for all supported configuration options and examples.
+ See [.dstack.yml](reference/dstack.yml/task.md) for all the options supported by
+ tasks, along with multiple examples.
-## Running
+## Run a configuration
-To run a configuration, use the [`dstack run`](reference/cli/index.md#dstack-run) command followed by the working directory path,
-configuration file path, and other options.
+To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-apply) command.
```shell
-$ dstack run . -f train.dstack.yml
+$ HUGGING_FACE_HUB_TOKEN=...
+$ WANDB_API_KEY=...
- BACKEND REGION RESOURCES SPOT PRICE
- tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595
- azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673
- azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673
-
-Continue? [y/n]: y
+$ dstack apply -f examples/fine-tuning/axolotl/train.dstack.yml
-Provisioning...
----> 100%
+ # BACKEND REGION RESOURCES SPOT PRICE
+ 1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
+ 2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22
+ 3 gcp us-west4 27xCPU, 150GB, A5000:24GB:3 yes $0.33
-TensorBoard 2.13.0 at http://localhost:6006/ (Press CTRL+C to quit)
+Submit the run axolotl-train? [y/n]: y
-Epoch 0: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
-Epoch 1: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
-Epoch 2: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969]
+Launching `axolotl-train`...
+---> 100%
+
+{'loss': 1.4967, 'grad_norm': 1.2734375, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
+ 0% 1/24680 [00:13<95:34:17, 13.94s/it]
+ 6% 73/1300 [00:48<13:57, 1.47it/s]
```
-If the task specifies `ports`, `dstack run` automatically forwards them to your local machine for
-convenient and secure access.
+`dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes.
+To avoid uploading large files, ensure they are listed in `.gitignore`.
-When running the task, `dstack run` mounts the current folder's contents.
+!!! info "Ports"
+    If the task specifies [`ports`](reference/dstack.yml/task.md#_ports), `dstack apply` automatically forwards them to your
+    local machine for convenient and secure access (see the sketch below).
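
For instance, a task exposing TensorBoard could be sketched as follows; `train.py` is a hypothetical script, and 6006 is just an example port:

```yaml
type: task
name: train-with-tensorboard

python: "3.10"

commands:
  - pip install tensorboard
  - tensorboard --logdir results/runs &
  - python train.py  # hypothetical training script

# Forwarded to your local machine when the run starts
ports:
  - 6006
```
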
-[//]: # (TODO: Fleets and idle duration)
+!!! info "Queueing tasks"
+ By default, if `dstack apply` cannot find capacity, the task fails.
+ To queue the task and wait for capacity, specify the [`retry`](reference/dstack.yml/task.md#queueing-tasks)
+ property in the task configuration.
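
A possible sketch is shown below; the exact shape of `retry` should be checked against the reference, as the fields here are assumptions, and `train.py` is a hypothetical script:

```yaml
type: task
name: train-queued

python: "3.10"

commands:
  - python train.py  # hypothetical training script

# Assumed schema: queue the run and retry while there is no capacity
retry:
  on_events: [no-capacity]
  duration: 1d

resources:
  gpu: 24GB
```
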
-??? info ".gitignore"
- If there are large files or folders you'd like to avoid uploading,
- you can list them in `.gitignore`.
+## Manage runs
-??? info "Fleets"
- By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md).
- If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends.
+### List runs
- To have the fleet deleted after a certain idle time automatically, set
- [`termination_idle_time`](../reference/dstack.yml/fleet.md#termination_idle_time).
- By default, it's set to `5min`.
+The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running jobs and their statuses.
+Use `--watch` (or `-w`) to monitor the live status of runs.
-!!! info "Reference"
- See the [CLI reference](reference/cli/index.md#dstack-run) for more details
- on how `dstack run` works.
+### Stop a run
-## Managing runs
+Once the run exceeds the [`max_duration`](reference/dstack.yml/task.md#max_duration), or when you use [`dstack stop`](reference/cli/index.md#dstack-stop),
+the task is stopped. Use `--abort` or `-x` to stop the run abruptly.
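
For example, to have a run stopped automatically after six hours, a task could set `max_duration` (a sketch; the duration value and `train.py` script are illustrative):

```yaml
type: task
name: train-limited

python: "3.10"

commands:
  - python train.py  # hypothetical training script

# Stop the run automatically after six hours
max_duration: 6h
```
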
-### Listing runs
+[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`)
-The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running runs and their status.
+## Manage fleets
-### Stopping runs
+By default, `dstack apply` reuses `idle` instances from one of the existing [fleets](fleets.md),
+or creates a new fleet using one of the configured backends.
-Once you use [`dstack stop`](reference/cli/index.md#dstack-stop) (or when the run exceeds the
-`max_duration`), the instances return to the [fleet](fleets.md).
+!!! info "Idle duration"
+ To ensure the created fleets are deleted automatically, set
+ [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time).
+ By default, it's set to `5min`.
-[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`)
+!!! info "Creation policy"
+ To ensure `dstack apply` always reuses an existing fleet and doesn't create a new one,
+ pass `--reuse` to `dstack apply` (or set [`creation_policy`](reference/dstack.yml/task.md#creation_policy) to `reuse` in the task configuration).
+ The default policy is `reuse_or_create`.
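
As a sketch, forcing reuse from the configuration side could look like this (the name, resources, and `train.py` script are illustrative):

```yaml
type: task
name: train-reuse

python: "3.10"

commands:
  - python train.py  # hypothetical training script

# Only reuse `idle` instances from existing fleets; don't provision new ones
creation_policy: reuse

resources:
  gpu: 24GB
```
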
## What's next?
-1. Check the [QLoRA :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/qlora/README.md){:target="_blank"} example
-2. Check the [`.dstack.yml` reference](../reference/dstack.yml/task.md) for more details and examples
-3. Browse [all examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"}
-4. See [fleets](fleets.md) on how to manage fleets
\ No newline at end of file
+1. Check the [Axolotl](/docs/examples/fine-tuning/axolotl) example
+2. Browse [all examples](/docs/examples)
+3. See [fleets](fleets.md) on how to manage fleets
+
+!!! info "Reference"
+ See [.dstack.yml](reference/dstack.yml/task.md) for all the options supported by
+ tasks, along with multiple examples.
diff --git a/docs/overrides/home.html b/docs/overrides/home.html
index d65ddd63c..89083a996 100644
--- a/docs/overrides/home.html
+++ b/docs/overrides/home.html
@@ -112,8 +112,8 @@
AI container orchestration engine for everyone
- dstack is an open-source orchestration engine that simplifies developing, training, and deploying AI
- models, as well as managing clusters on any cloud or data center.
+ dstack is a lightweight alternative to Kubernetes for AI. It simplifies container orchestration for
+ AI on any cloud or on-premises, accelerating the development, training, and deployment of models.
@@ -228,10 +228,11 @@ Dev environments
Tasks
- Tasks allow for convenient scheduling of various batch jobs, such as training, fine-tuning, or
- data processing, as well as running web applications.
- You can run tasks on a single machine or on a cluster of nodes.
+ A task allows you to schedule a job or run a web app. It lets you configure dependencies,
+ resources, ports, and more. Tasks can be distributed and run on clusters.
+ Tasks are ideal for training and fine-tuning jobs or running apps
+ for development purposes.
Tasks
Services
- Services make it very easy to deploy any kind of model as public,
- secure, and scalable endpoints.
+ A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
+ dependencies, resources, authorization, auto-scaling rules, etc.
@@ -343,9 +344,9 @@
-
-
+
Axolotl
diff --git a/examples/.dstack.yml b/examples/.dstack.yml
index 143c04aec..9a09e8641 100644
--- a/examples/.dstack.yml
+++ b/examples/.dstack.yml
@@ -1,10 +1,16 @@
type: dev-environment
+# The name is optional, if not specified, generated randomly
name: vscode
-# This configuration launches a blank dev environment
-
python: "3.11"
+# Uncomment to use a custom Docker image
+#image: dstackai/base:py3.10-0.4-cuda-12.1
ide: vscode
+# Use either spot or on-demand instances
spot_policy: auto
+
+# Uncomment to request resources
+#resources:
+# gpu: 24GB
\ No newline at end of file
diff --git a/examples/README.md b/examples/README.md
index 7bffa9914..23accbb3f 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -10,7 +10,9 @@ cd dstack
dstack init
```
-Now you are ready to run examples! Select any example from the left-hand sidebar.
+Now you are ready to run examples!
+
+> Browse the examples using the menu on the left.
## Source code
diff --git a/examples/fine-tuning/alignment-handbook/.dstack.yml b/examples/fine-tuning/alignment-handbook/.dstack.yml
index 5d121d1ef..fc97d6b96 100644
--- a/examples/fine-tuning/alignment-handbook/.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/.dstack.yml
@@ -1,4 +1,5 @@
type: dev-environment
+# The name is optional, if not specified, generated randomly
name: ah-vscode
# If `image` is not specified, dstack uses its default image
@@ -25,5 +26,8 @@ ide: vscode
spot_policy: auto
resources:
- # Minimum 24GB, one or more GPU
- gpu: 24GB..:1..
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/fine-tuning/alignment-handbook/config.yaml b/examples/fine-tuning/alignment-handbook/config.yaml
index fee2964b4..330c43c10 100644
--- a/examples/fine-tuning/alignment-handbook/config.yaml
+++ b/examples/fine-tuning/alignment-handbook/config.yaml
@@ -40,7 +40,7 @@ gradient_accumulation_steps: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
-hub_model_id: chansung/coding_llamaduo_60k_v0.2
+hub_model_id: peterschmidt85/coding_llamaduo_60k_v0.2
hub_strategy: every_save
learning_rate: 2.0e-04
log_level: info
diff --git a/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml
index 0fa773f16..10c049017 100644
--- a/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml
@@ -2,16 +2,19 @@ type: fleet
# The name is optional, if not specified, generated randomly
name: ah-fleet-distrib
+# Number of instances in fleet
+nodes: 2
+# Ensure instances are interconnected
+placement: cluster
+
# Use either spot or on-demand instances
spot_policy: auto
-# Terminate the instance if not used for one hour
+# Terminate instances if not used for one hour
termination_idle_time: 1h
resources:
- # Change to what is required
- gpu: 24GB
-
-# Specify a number of instances
-nodes: 2
-# Ensure instances are interconnected
-placement: cluster
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/fine-tuning/alignment-handbook/fleet.dstack.yml b/examples/fine-tuning/alignment-handbook/fleet.dstack.yml
index 2388cc745..d8ae8872d 100644
--- a/examples/fine-tuning/alignment-handbook/fleet.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/fleet.dstack.yml
@@ -2,14 +2,17 @@ type: fleet
# The name is optional, if not specified, generated randomly
name: ah-fleet
+# Number of instances in fleet
+nodes: 1
+
# Use either spot or on-demand instances
spot_policy: auto
# Terminate the instance if not used for one hour
termination_idle_time: 1h
resources:
- # Change to what is required
- gpu: 24GB
-
-# Need one instance only
-nodes: 1
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
index cf09c0290..b33902a5d 100644
--- a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
@@ -25,17 +25,20 @@ commands:
--machine_rank=$DSTACK_NODE_RANK
--num_processes=$DSTACK_GPUS_NUM
--num_machines=$DSTACK_NODES_NUM
- scripts/run_sft.py
+ scripts/run_sft.py
../examples/fine-tuning/alignment-handbook/config.yaml
# Expose 6006 to access TensorBoard
ports:
- 6006
-# The number of interconnected instances required
+# Number of instances in cluster
nodes: 2
resources:
- # Required resources
- gpu: 24GB
- # Shared memory size for inter-process communication
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
+ # Shared memory (for multi-gpu)
shm_size: 24GB
\ No newline at end of file
diff --git a/examples/fine-tuning/alignment-handbook/train.dstack.yml b/examples/fine-tuning/alignment-handbook/train.dstack.yml
index fc57a2adc..a52a3b08f 100644
--- a/examples/fine-tuning/alignment-handbook/train.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train.dstack.yml
@@ -28,5 +28,8 @@ commands:
# - 6006
resources:
- # Required resources
- gpu: 24GB
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/fine-tuning/axolotl/.dstack.yml b/examples/fine-tuning/axolotl/.dstack.yml
index f419d6ac8..4b9096cfa 100644
--- a/examples/fine-tuning/axolotl/.dstack.yml
+++ b/examples/fine-tuning/axolotl/.dstack.yml
@@ -1,4 +1,5 @@
type: dev-environment
+# The name is optional, if not specified, generated randomly
name: axolotl-vscode
# Using the official Axolotl's Docker image
@@ -15,5 +16,8 @@ ide: vscode
spot_policy: auto
resources:
- # Two or more 24GB GPUs (required by FSDP)
- gpu: 24GB:2..
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # Two or more GPU
+ count: 2..
\ No newline at end of file
diff --git a/examples/fine-tuning/axolotl/README.md b/examples/fine-tuning/axolotl/README.md
index 85dc7d77a..c7ffd762b 100644
--- a/examples/fine-tuning/axolotl/README.md
+++ b/examples/fine-tuning/axolotl/README.md
@@ -47,13 +47,13 @@ env:
# Commands of the task
commands:
- accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
-# Expose 6006 to access TensorBoard
-ports:
- - 6006
resources:
- # Two or more 24GB GPUs (required by FSDP)
- gpu: 24GB:2..
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # Two or more GPU
+ count: 2..
```
The task uses Axolotl's Docker image, where Axolotl is already pre-installed.
@@ -67,9 +67,6 @@ WANDB_API_KEY=...
dstack apply -f examples/fine-tuning/axolotl/train.dstack.yml
```
-If you list `tensorbord` via `report_to` in [`examples/fine-tuning/axolotl/config.yaml`](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/config.yaml),
-you'll be able to access experiment metrics via `http://localhost:6006` (while the task is running).
-
## Fleets
> By default, `dstack run` reuses `idle` instances from one of the existing [fleets](https://dstack.ai/docs/fleets).
diff --git a/examples/fine-tuning/axolotl/config.yaml b/examples/fine-tuning/axolotl/config.yaml
index 5087bc434..7f3c08745 100644
--- a/examples/fine-tuning/axolotl/config.yaml
+++ b/examples/fine-tuning/axolotl/config.yaml
@@ -79,4 +79,4 @@ fsdp_config:
special_tokens:
pad_token: <|end_of_text|>
-hub_model_id: chansung/axolotl_llama3_8b_fsdp_qlora
\ No newline at end of file
+hub_model_id: peterschmidt85/axolotl_llama3_8b_fsdp_qlora
\ No newline at end of file
diff --git a/examples/fine-tuning/axolotl/fleet.dstack.yml b/examples/fine-tuning/axolotl/fleet.dstack.yml
index b3aefe6a9..0a10d67e1 100644
--- a/examples/fine-tuning/axolotl/fleet.dstack.yml
+++ b/examples/fine-tuning/axolotl/fleet.dstack.yml
@@ -2,14 +2,16 @@ type: fleet
# The name is optional, if not specified, generated randomly
name: axolotl-fleet
+# Number of instances in fleet
+nodes: 1
+
# Use either spot or on-demand instances
spot_policy: auto
# Terminate the instance if not used for one hour
termination_idle_time: 1h
resources:
- # Two or more 24GB GPUs (required by FSDP)
- gpu: 24GB:2..
-
-# Need one instance only
-nodes: 1
\ No newline at end of file
+  gpu:
+    # 24GB or more vRAM
+    memory: 24GB..
+    # Two or more GPU (required by FSDP)
+    count: 2..
\ No newline at end of file
diff --git a/examples/fine-tuning/axolotl/train.dstack.yaml b/examples/fine-tuning/axolotl/train.dstack.yaml
index 9accbe5fc..b81c5fc8c 100644
--- a/examples/fine-tuning/axolotl/train.dstack.yaml
+++ b/examples/fine-tuning/axolotl/train.dstack.yaml
@@ -12,10 +12,10 @@ env:
# Commands of the task
commands:
- accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
-# Uncomment to access TensorBoard
-#ports:
-# - 6006
resources:
- # Two or more 24GB GPUs (required by FSDP)
- gpu: 24GB:2..
\ No newline at end of file
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # Two or more GPU (required by FSDP)
+ count: 2..
\ No newline at end of file
diff --git a/examples/fine-tuning/qlora/train.dstack.yml b/examples/fine-tuning/qlora/train.dstack.yml
index 029c472de..a51bb1ff0 100644
--- a/examples/fine-tuning/qlora/train.dstack.yml
+++ b/examples/fine-tuning/qlora/train.dstack.yml
@@ -1,5 +1,4 @@
type: task
-# This task fine-tunes Llama 2 with QLoRA. Learn more at https://dstack.ai/examples/qlora/
python: "3.11"
diff --git a/examples/fine-tuning/trl/.dstack.yml b/examples/fine-tuning/trl/.dstack.yml
new file mode 100644
index 000000000..13685d624
--- /dev/null
+++ b/examples/fine-tuning/trl/.dstack.yml
@@ -0,0 +1,35 @@
+type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: trl-vscode
+
+# If `image` is not specified, dstack uses its default image
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+ - ACCELERATE_LOG_LEVEL=info
+ - WANDB_API_KEY
+# Uncomment if you want the dependencies to be pre-installed
+#init:
+# - conda install cuda
+# - pip install flash-attn --no-build-isolation
+# - pip install "transformers>=4.43.2"
+# - pip install bitsandbytes
+# - pip install peft
+# - pip install wandb
+# - git clone https://github.com/huggingface/trl
+# - cd trl
+# - pip install .
+
+ide: vscode
+
+# Use either spot or on-demand instances
+spot_policy: auto
+
+resources:
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/fine-tuning/trl/train-distrib.dstack.yml b/examples/fine-tuning/trl/train-distrib.dstack.yml
new file mode 100644
index 000000000..f17d42997
--- /dev/null
+++ b/examples/fine-tuning/trl/train-distrib.dstack.yml
@@ -0,0 +1,62 @@
+type: task
+# The name is optional, if not specified, generated randomly
+name: trl-train-distrib
+
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+ - ACCELERATE_LOG_LEVEL=info
+ - WANDB_API_KEY
+# Commands of the task
+commands:
+ - conda install cuda
+ - pip install "transformers>=4.43.2"
+ - pip install bitsandbytes
+ - pip install flash-attn --no-build-isolation
+ - pip install peft
+ - pip install wandb
+ - git clone https://github.com/huggingface/trl
+ - cd trl
+ - pip install .
+ - accelerate launch
+ --config_file=examples/accelerate_configs/fsdp_qlora.yaml
+ --main_process_ip=$DSTACK_MASTER_NODE_IP
+ --main_process_port=8008
+ --machine_rank=$DSTACK_NODE_RANK
+ --num_processes=$DSTACK_GPUS_NUM
+ --num_machines=$DSTACK_NODES_NUM
+ examples/scripts/sft.py
+ --model_name meta-llama/Meta-Llama-3.1-8B
+ --dataset_name OpenAssistant/oasst_top1_2023-08-25
+ --dataset_text_field="text"
+ --per_device_train_batch_size 1
+ --per_device_eval_batch_size 1
+ --gradient_accumulation_steps 4
+ --learning_rate 2e-4
+ --report_to wandb
+ --bf16
+ --max_seq_length 1024
+ --lora_r 16 --lora_alpha 32
+ --lora_target_modules q_proj k_proj v_proj o_proj
+ --load_in_4bit
+ --use_peft
+ --attn_implementation "flash_attention_2"
+ --logging_steps=10
+ --output_dir models/llama31
+ --hub_model_id peterschmidt85/FineLlama-3.1-8B
+ --torch_dtype bfloat16
+ --use_bnb_nested_quant
+
+# Size of the cluster
+nodes: 2
+
+resources:
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
+ # Shared memory (for multi-gpu)
+ shm_size: 24GB
\ No newline at end of file
diff --git a/examples/fine-tuning/trl/train.dstack.yml b/examples/fine-tuning/trl/train.dstack.yml
new file mode 100644
index 000000000..f965654ac
--- /dev/null
+++ b/examples/fine-tuning/trl/train.dstack.yml
@@ -0,0 +1,51 @@
+type: task
+# The name is optional, if not specified, generated randomly
+name: trl-train
+
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+ - ACCELERATE_LOG_LEVEL=info
+ - WANDB_API_KEY
+# Commands of the task
+commands:
+ - conda install cuda
+ - pip install "transformers>=4.43.2"
+ - pip install bitsandbytes
+ - pip install flash-attn --no-build-isolation
+ - pip install peft
+ - pip install wandb
+ - git clone https://github.com/huggingface/trl
+ - cd trl
+ - pip install .
+ - accelerate launch
+ --config_file=examples/accelerate_configs/multi_gpu.yaml
+ --num_processes $DSTACK_GPUS_PER_NODE
+ examples/scripts/sft.py
+ --model_name meta-llama/Meta-Llama-3.1-8B
+ --dataset_name OpenAssistant/oasst_top1_2023-08-25
+ --dataset_text_field="text"
+ --per_device_train_batch_size 1
+ --per_device_eval_batch_size 1
+ --gradient_accumulation_steps 4
+ --learning_rate 2e-4
+ --report_to wandb
+ --bf16
+ --max_seq_length 1024
+ --lora_r 16 --lora_alpha 32
+ --lora_target_modules q_proj k_proj v_proj o_proj
+ --load_in_4bit
+ --use_peft
+ --attn_implementation "flash_attention_2"
+ --logging_steps=10
+ --output_dir models/llama31
+ --hub_model_id peterschmidt85/FineLlama-3.1-8B
+
+resources:
+ gpu:
+ # 24GB or more vRAM
+ memory: 24GB..
+ # One or more GPU
+ count: 1..
\ No newline at end of file
diff --git a/examples/llms/llama31/.dstack.yml b/examples/llms/llama31/.dstack.yml
new file mode 100644
index 000000000..b9782c82a
--- /dev/null
+++ b/examples/llms/llama31/.dstack.yml
@@ -0,0 +1,20 @@
+type: dev-environment
+# The name is optional, if not specified, generated randomly
+name: llama31-vscode
+
+# If `image` is not specified, dstack uses its default image
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+ide: vscode
+
+# Use either spot or on-demand instances
+spot_policy: auto
+# Uncomment to ensure it doesn't create a new fleet
+#creation_policy: reuse
+
+resources:
+ # Required resources
+ gpu: 24GB
diff --git a/examples/llms/llama31/fleet.dstack.yml b/examples/llms/llama31/fleet.dstack.yml
new file mode 100644
index 000000000..51136e5cd
--- /dev/null
+++ b/examples/llms/llama31/fleet.dstack.yml
@@ -0,0 +1,15 @@
+type: fleet
+# The name is optional, if not specified, generated randomly
+name: llama31-fleet
+
+# Need one instance only
+nodes: 1
+
+# Use either spot or on-demand instances
+spot_policy: auto
+# Terminate the instance if not used for one hour
+termination_idle_time: 1h
+
+resources:
+ # Required resources
+ gpu: 24GB
diff --git a/examples/llms/llama31/service.dstack.yml b/examples/llms/llama31/service.dstack.yml
new file mode 100644
index 000000000..400d02379
--- /dev/null
+++ b/examples/llms/llama31/service.dstack.yml
@@ -0,0 +1,30 @@
+type: service
+# The name is optional, if not specified, generated randomly
+name: llama31-service
+
+# If `image` is not specified, dstack uses its base image
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+commands:
+  - pip install vllm==0.5.3.post1
+ - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+# Expose the vllm server port
+port: 8000
+
+# Use either spot or on-demand instances
+spot_policy: auto
+# Uncomment to ensure it doesn't create a new fleet
+#creation_policy: reuse
+
+resources:
+ # Change to what is required
+ gpu: 24GB
+
+# Comment if you don't want to access the model via https://gateway.<gateway domain>
+model:
+ type: chat
+ name: meta-llama/Meta-Llama-3.1-8B-Instruct
+ format: openai
\ No newline at end of file
diff --git a/examples/llms/llama31/task.dstack.yml b/examples/llms/llama31/task.dstack.yml
new file mode 100644
index 000000000..1a8516927
--- /dev/null
+++ b/examples/llms/llama31/task.dstack.yml
@@ -0,0 +1,24 @@
+type: task
+name: llama31-task
+
+# If `image` is not specified, dstack uses its default image
+python: "3.10"
+
+# Required environment variables
+env:
+ - HUGGING_FACE_HUB_TOKEN
+commands:
+ - pip install vllm==0.5.3.post1
+ - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+# Expose the vllm server port
+ports:
+ - 8000
+
+# Use either spot or on-demand instances
+spot_policy: auto
+# Uncomment to ensure it doesn't create a new fleet
+#creation_policy: reuse
+
+resources:
+ # Required resources
+ gpu: 24GB
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index f07d87d0e..a75e8240d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -206,6 +206,7 @@ nav:
- Volumes: docs/concepts/volumes.md
- Guides:
- Protips: docs/guides/protips.md
+ - dstack Sky: docs/guides/dstack-sky.md
- Examples: docs/examples
- Reference:
- server/config.yml: docs/reference/server/config.yml.md