diff --git a/README.md b/README.md index 67f24af4c..93227dbf5 100644 --- a/README.md +++ b/README.md @@ -14,14 +14,13 @@ -`dstack` is an open-source container orchestration engine for AI. -It accelerates the development, training, and deployment of AI models, and simplifies the management of clusters. +`dstack` is a lightweight alternative to Kubernetes, designed specifically for managing the development, training, and +deployment of AI models at any scale. -#### Cloud and on-prem +`dstack` is easy to use with any cloud provider (AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, etc.) or +any on-prem clusters. -`dstack` is easy to use with any cloud or on-prem servers. -Supported cloud providers include AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, and CUDO. -For using `dstack` with on-prem servers, see [fleets](https://dstack.ai/docs/fleets#__tabbed_1_2). +If you already use Kubernetes, `dstack` can be used with it. #### Accelerators @@ -29,40 +28,29 @@ For using `dstack` with on-prem servers, see [fleets](https://dstack.ai/docs/fle ## Major news ✨ -- [2024/07] [dstack 0.18.7: Fleets, RunPod Volumes, dstack apply, and more](https://github.com/dstackai/dstack/releases/tag/0.18.7) (Release) +- [2024/07] [dstack 0.18.8: GCP volumes](https://github.com/dstackai/dstack/releases/tag/0.18.8) (Release) +- [2024/07] [dstack 0.18.7: Fleets, RunPod volumes, dstack apply, and more](https://github.com/dstackai/dstack/releases/tag/0.18.7) (Release) - [2024/05] [dstack 0.18.4: Google Cloud TPU, and more](https://github.com/dstackai/dstack/releases/tag/0.18.4) (Release) - [2024/05] [dstack 0.18.3: OCI, and more](https://github.com/dstackai/dstack/releases/tag/0.18.3) (Release) - [2024/05] [dstack 0.18.2: On-prem clusters, private subnets, and more](https://github.com/dstackai/dstack/releases/tag/0.18.2) (Release) -- [2024/04] [dstack 0.18.0: RunPod, multi-node tasks, and more](https://github.com/dstackai/dstack/releases/tag/0.18.0) (Release) ## Installation Before using `dstack` through CLI or API, set up a `dstack` server. -### Install the server - -The easiest way to install the server, is via `pip`: - -```shell -pip install "dstack[all]" -U -``` - -### Configure backends +### 1. Configure backends -If you have default AWS, GCP, Azure, or OCI credentials on your machine, the `dstack` server will pick them up automatically. +If you want the `dstack` server to run containers or manage clusters in your cloud accounts (or use Kubernetes), +create the [~/.dstack/server/config.yml](https://dstack.ai/docs/reference/server/config.yml.md) file and configure backends. -Otherwise, you need to manually specify the cloud credentials in `~/.dstack/server/config.yml`. +### 2. Start the server -See the [server/config.yml reference](https://dstack.ai/docs/reference/server/config.yml.md#examples) -for details on how to configure backends for all supported cloud providers. - -### Start the server - -To start the server, use the `dstack server` command: +Once the `~/.dstack/server/config.yml` file is configured, proceed to start the server:
```shell
+$ pip install "dstack[all]" -U
 $ dstack server

Applying ~/.dstack/server/config.yml...

@@ -76,42 +64,58 @@ The server is running at http://127.0.0.1:3000/

 > **Note**
 > It's also possible to run the server via [Docker](https://hub.docker.com/r/dstackai/dstack).

-### Add on-prem servers
+The `dstack` server can run anywhere: on your laptop, a dedicated server, or in the cloud. Once it's up, you
+can use either the CLI or the API.
+
+### 3. Set up the CLI
+
+To point the CLI to the `dstack` server, configure it
+with the server address, user token, and project name:
+
+```shell
+$ pip install dstack
+$ dstack config --url http://127.0.0.1:3000 \
+    --project main \
+    --token bbae0f28-d3dd-4820-bf61-8f4bb40815da
+
+Configuration is updated at ~/.dstack/config.yml
+```
+
+### 4. Create on-prem fleets

-If you'd like to use `dstack` to run workloads on your on-prem servers,
-see [on-prem fleets](https://dstack.ai/docs/fleets#__tabbed_1_2) command.
+> If you want the `dstack` server to run containers on your on-prem servers,
+> use [fleets](https://dstack.ai/docs/fleets#__tabbed_1_2).

 ## How does it work?

-### 1. Define run configurations
+> Before using `dstack`, [install](https://dstack.ai/docs/installation/index.md) the server and configure backends.

-`dstack` supports three types of run configurations:
+### 1. Define configurations
+
+`dstack` supports the following configurations:

 * [Dev environments](https://dstack.ai/docs/dev-environments.md) — for interactive development using a desktop IDE
-* [Tasks](https://dstack.ai/docs/tasks.md) — for any kind of batch jobs or web applications (supports distributed jobs)
-* [Services](https://dstack.ai/docs/services.md)— for production-grade deployment (supports auto-scaling and authorization)
-
-Each type of run configuration allows you to specify commands for execution, required compute resources, retry policies, auto-scaling rules, authorization settings, and more.
+* [Tasks](https://dstack.ai/docs/tasks.md) — for scheduling jobs (incl. distributed jobs) or running web apps
+* [Services](https://dstack.ai/docs/services.md) — for deployment of models and web apps (with auto-scaling and authorization)
+* [Fleets](https://dstack.ai/docs/fleets.md) — for managing cloud and on-prem clusters
+* [Volumes](https://dstack.ai/docs/concepts/volumes.md) — for managing persistent volumes
+* [Gateways](https://dstack.ai/docs/concepts/gateways.md) — for configuring ingress traffic and public endpoints

 Configuration can be defined as YAML files within your repo.

-### 2. Run configurations
-
-Run any defined configuration either via `dstack` CLI or API.
-
-`dstack` automatically handles provisioning, interruptions, port-forwarding, auto-scaling, network, volumes,
-run failures, out-of-capacity errors, and more.
+### 2. Apply configurations

-### 3. Manage fleets
+Apply the configuration either via the `dstack apply` CLI command or through a programmatic API.

-Use [fleets](https://dstack.ai/docs/fleets.md) to provision and manage clusters and instances, both in the cloud and on-prem.
+`dstack` automatically manages provisioning, job queuing, auto-scaling, networking, volumes, run failures,
+out-of-capacity errors, port-forwarding, and more — across clouds and on-prem clusters.
 
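+For example, once a configuration file is defined (the file name `.dstack.yml` below is only an illustration), applying it is a single command:
+
+```shell
+$ dstack apply -f .dstack.yml
+```
+
+`dstack` then shows the offers that match the requirements and provisions the run once you confirm.
 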
## More information For additional information and examples, see the following links: * [Docs](https://dstack.ai/docs) -* [Examples](examples) +* [Examples](https://dstack.ai/docs/examples) * [Changelog](https://github.com/dstackai/dstack/releases) * [Discord](https://discord.gg/u8SmfwPpMd) diff --git a/docs/assets/stylesheets/extra.css b/docs/assets/stylesheets/extra.css index 08c3c0f8c..74b09fd15 100644 --- a/docs/assets/stylesheets/extra.css +++ b/docs/assets/stylesheets/extra.css @@ -598,7 +598,7 @@ code .md-code__nav:hover .md-code__button { } .md-typeset h5 { - font-size: 16px; + font-size: 18px; } .md-typeset h3 { diff --git a/docs/docs/concepts/gateways.md b/docs/docs/concepts/gateways.md index 81e9a22b4..49dc69e84 100644 --- a/docs/docs/concepts/gateways.md +++ b/docs/docs/concepts/gateways.md @@ -1,13 +1,14 @@ # Gateways -Gateways handle the ingress traffic of running services. -They provide [services](services.md) with HTTPS domains, handle authentication, distribute load, and perform auto-scaling. -In order to run a service, you need to have at least one gateway set up. +Gateways manage the ingress traffic of running services and provide them with an HTTPS endpoint mapped to your domain, +handling authentication, load distribution, and auto-scaling. + +To run a service, you need at least one gateway set up. > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, -the gateway is already set up for you. +> the gateway is already set up for you. -## Configuration +## Define a configuration First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `gateway.dstack.yml` are both acceptable). @@ -16,10 +17,14 @@ are both acceptable). ```yaml type: gateway +# A name of the gateway name: example-gateway +# Gateways are bound to a specific backend and region backend: aws region: eu-west-1 + +# This domain will be used to access the endpoint domain: example.com ``` @@ -28,10 +33,10 @@ domain: example.com A domain name is required to create a gateway. !!! info "Reference" - See the [.dstack.yml reference](../reference/dstack.yml/gateway.md) - for all supported configuration options and examples. + See [.dstack.yml](../reference/dstack.yml/gateway.md) for all the options supported by + gateways, along with multiple examples. -## Creating and updating gateways +## Create or update a gateway To create or update the gateway, simply call the [`dstack apply`](../reference/cli/index.md#dstack-apply) command: @@ -39,7 +44,6 @@ To create or update the gateway, simply call the [`dstack apply`](../reference/c ```shell $ dstack apply . -f examples/deployment/gateway.dstack.yml - The example-gateway doesn't exist. Create it? [y/n]: y BACKEND REGION NAME HOSTNAME DOMAIN DEFAULT STATUS @@ -49,36 +53,37 @@ The example-gateway doesn't exist. Create it? [y/n]: y
-## Updating DNS records +## Update DNS records Once the gateway is assigned a hostname, go to your domain's DNS settings and add an `A` DNS record for `*.` (e.g., `*.example.com`) pointing to the gateway's hostname. -This will allow you to access runs and models using this domain. +## Manage gateways -## Managing gateways - -### Listing gateways +### List gateways The [`dstack gateway list`](../reference/cli/index.md#dstack-gateway-list) command lists existing gateways and their status. -### Deleting gateways +### Delete a gateway To delete a gateway, pass gateway configuration to [`dstack delete`](../reference/cli/index.md#dstack-delete):
```shell -$ dstack delete . -f examples/deployment/gateway.dstack.yml +$ dstack delete -f examples/deployment/gateway.dstack.yml ```
-[//]: # (TODO: Ellaborate on default`) +[//]: # (TODO: Ellaborate on default) [//]: # (TODO: ## Accessing endpoints) ## What's next? 1. See [services](../services.md) on how to run services -2. Check the [`.dstack.yml` reference](../reference/dstack.yml/gateway.md) for more details and examples \ No newline at end of file + +!!! info "Reference" + See [.dstack.yml](../reference/dstack.yml/gateway.md) for all the options supported by + gateways, along with multiple examples. \ No newline at end of file diff --git a/docs/docs/concepts/volumes.md b/docs/docs/concepts/volumes.md index ffbc92074..e35bed730 100644 --- a/docs/docs/concepts/volumes.md +++ b/docs/docs/concepts/volumes.md @@ -1,13 +1,13 @@ # Volumes -Volumes allow you to persist data between runs. `dstack` simplifies managing volumes and lets you mount them to a specific -directory when working with dev environments, tasks, and services. +Volumes allow you to persist data between runs. `dstack` allows to create and attach volumes to +dev environments, tasks, and services. !!! info "Experimental" Volumes are currently experimental and work with the `aws`, `gcp`, and `runpod` backends. Support for other backends is coming soon. -## Configuration +## Define a configuration First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `vol.dstack.yml` are both acceptable). @@ -16,9 +16,14 @@ are both acceptable). ```yaml type: volume +# A name of the volume name: my-new-volume + +# Volumes are bound to a specific backend and region backend: aws region: eu-central-1 + +# Required size size: 100GB ``` @@ -28,13 +33,13 @@ If you use this configuration, `dstack` will create a new volume based on the sp !!! info "Registering existing volumes" If you prefer not to create a new volume but to reuse an existing one (e.g., created manually), you can - [specify its ID via `volume_id`](../reference/dstack.yml/volume.md#register-volume). In this case, `dstack` will register the specified volume so that you can use it with dev environments, tasks, and services. + [specify its ID via `volume_id`](../reference/dstack.yml/volume.md#existing-volume). In this case, `dstack` will register the specified volume so that you can use it with dev environments, tasks, and services. !!! info "Reference" - See the [.dstack.yml reference](../reference/dstack.yml/dev-environment.md) - for all supported configuration options and examples. + See [.dstack.yml](../reference/dstack.yml/volume.md) for all the options supported by + volumes, along with multiple examples. -## Creating and registering volumes +## Create, register, or update a volume To create or register the volume, simply call the `dstack apply` command: @@ -43,6 +48,7 @@ To create or register the volume, simply call the `dstack apply` command: ```shell $ dstack apply -f volume.dstack.yml Volume my-new-volume does not exist yet. Create the volume? [y/n]: y + NAME BACKEND REGION STATUS CREATED my-new-volume aws eu-central-1 submitted now @@ -54,7 +60,7 @@ Volume my-new-volume does not exist yet. Create the volume? [y/n]: y Once created, the volume can be attached with dev environments, tasks, and services. -## Attaching volumes +## Attach a volume Dev environments, tasks, and services let you attach any number of volumes. 
To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents: @@ -63,7 +69,12 @@ To attach a volume, simply specify its name using the `volumes` property and spe ```yaml type: dev-environment +# A name of the dev environment +name: vscode-vol + ide: vscode + +# Map the name of the volume to any path volumes: - name: my-new-volume path: /volume_data @@ -79,9 +90,9 @@ and its contents will persist across runs. to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to attach volumes to `/workflow` or any of its subdirectories. -## Managing volumes +## Manage volumes -### Listing volumes +### List volumes The [`dstack volume list`](../reference/cli/index.md#dstack-gateway-list) command lists created and registered volumes: @@ -91,7 +102,7 @@ NAME BACKEND REGION STATUS CREATED my-new-volume aws eu-central-1 active 3 weeks ago ``` -### Deleting volumes +### Delete volumes When the volume isn't attached to any active dev environment, task, or service, you can delete it using `dstack delete`: @@ -104,15 +115,17 @@ If you've registered an existing volume, it will be de-registered with `dstack` ## FAQ -??? info "Using volumes across backends" - Since volumes are backed up by cloud network disks, you can only use them within the same cloud. If you need to access - data across different backends, you should either use object storage or replicate the data across multiple volumes. +##### Can I use volumes across backends? + +Since volumes are backed up by cloud network disks, you can only use them within the same cloud. If you need to access +data across different backends, you should either use object storage or replicate the data across multiple volumes. + +##### Can I use volumes across regions? + +Typically, network volumes are associated with specific regions, so you can't use them in other regions. Often, +volumes are also linked to availability zones, but some providers support volumes that can be used across different +availability zones within the same region. -??? info "Using volumes across regions" - Typically, network volumes are associated with specific regions, so you can't use them in other regions. Often, - volumes are also linked to availability zones, but some providers support volumes that can be used across different - availability zones within the same region. +##### Can I attach volumes to multiple runs or instances? -??? info "Attaching volumes to multiple runs and instances" - You can mount a volume in multiple runs. - This feature is currently supported only by the `runpod` backend. +You can mount a volume in multiple runs. This feature is currently supported only by the `runpod` backend. \ No newline at end of file diff --git a/docs/docs/dev-environments.md b/docs/docs/dev-environments.md index 3afb69a7f..0c7c60070 100644 --- a/docs/docs/dev-environments.md +++ b/docs/docs/dev-environments.md @@ -1,27 +1,34 @@ # Dev environments -Before scheduling a task or deploying a model, you may want to run code interactively. Dev environments allow you to -provision a remote machine set up with your code and favorite IDE with just one command. +A dev environment lets you provision a remote machine with your code, dependencies, and resources, and access it with +your desktop IDE. -## Configuration +Dev environments are perfect when you need to run code interactively. -First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. 
`.dstack.yml` or `dev.dstack.yml` are +## Define a configuration + +First, create a YAML file in your project repo. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `dev.dstack.yml` are both acceptable). -
+
```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode -# Specify the Python version, or your Docker image python: "3.11" +# Uncomment to use a custom Docker image +#image: dstackai/base:py3.10-0.4-cuda-12.1 -# This pre-configures the IDE with required extensions ide: vscode -# Specify GPU, disk, and other resource requirements +# Use either spot or on-demand instances +spot_policy: auto + resources: - gpu: 80GB + # Required resources + gpu: 24GB ```
@@ -30,52 +37,36 @@ If you don't specify your Docker image, `dstack` uses the [base](https://hub.doc (pre-configured with Python, Conda, and essential CUDA drivers). !!! info "Reference" - See the [.dstack.yml reference](reference/dstack.yml/dev-environment.md) - for all supported configuration options and examples. + See [.dstack.yml](reference/dstack.yml/dev-environment.md) for all the options supported by + dev environments, along with multiple examples. -## Running +## Run a configuration -To run a configuration, use the [`dstack run`](reference/cli/index.md#dstack-run) command followed by the working directory path, -configuration file path, and other options. +To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-apply) command.
```shell -$ dstack run . -f .dstack.yml +$ dstack apply -f examples/.dstack.yml - BACKEND REGION RESOURCES SPOT PRICE - tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595 - azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - -Continue? [y/n]: y + # BACKEND REGION RESOURCES SPOT PRICE + 1 runpod CA-MTL-1 9xCPU, 48GB, A5000:24GB yes $0.11 + 2 runpod EU-SE-1 9xCPU, 43GB, A5000:24GB yes $0.11 + 3 gcp us-west4 4xCPU, 16GB, L4:24GB yes $0.214516 -Provisioning `fast-moth-1`... +Submit the run vscode? [y/n]: y + +Launching `vscode`... ---> 100% To open in VS Code Desktop, use this link: - vscode://vscode-remote/ssh-remote+fast-moth-1/workflow + vscode://vscode-remote/ssh-remote+vscode/workflow ```
-When `dstack` provisions the dev environment, it mounts the project folder contents. - -??? info ".gitignore" - If there are large files or folders you'd like to avoid uploading, - you can list them in `.gitignore`. - -??? info "Fleets" - By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md). - If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends. - - To have the fleet deleted after a certain idle time automatically, set - [`termination_idle_time`](../reference/dstack.yml/fleet.md#termination_idle_time). - By default, it's set to `5min`. - -!!! info "Reference" - See the [CLI reference](reference/cli/index.md#dstack-run) for more details - on how `dstack run` works. +`dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes. +To avoid uploading large files, ensure they are listed in `.gitignore`. ### VS Code @@ -96,21 +87,41 @@ $ ssh fast-moth-1
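+You can also tail the run's output from the terminal with `dstack logs` (the run name below assumes the configuration above, which is named `vscode`):
+
+```shell
+$ dstack logs vscode
+```
+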
-## Managing runs +## Manage runs -### Listing runs +### List runs -The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running runs and their status. +The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running jobs and their statuses. +Use `--watch` (or `-w`) to monitor the live status of runs. -### Stopping runs +### Stop a run -Once the run exceeds the max duration, -or when you use [`dstack stop`](reference/cli/index.md#dstack-stop), -the dev environment and its cloud resources are deleted. +Once the run exceeds the [`max_duration`](reference/dstack.yml/dev-environment.md#max_duration), or when you use [`dstack stop`](reference/cli/index.md#dstack-stop), +the dev environment is stopped. Use `--abort` or `-x` to stop the run abruptly. [//]: # (TODO: Mention `dstack logs` and `dstack logs -d`) +## Manage fleets + +By default, `dstack apply` reuses `idle` instances from one of the existing [fleets](fleets.md), +or creates a new fleet through backends. + +!!! info "Idle duration" + To ensure the created fleets are deleted automatically, set + [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time). + By default, it's set to `5min`. + +!!! info "Creation policy" + To ensure `dstack apply` always reuses an existing fleet and doesn't create a new one, + pass `--reuse` to `dstack apply` (or set [`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the task configuration). + The default policy is `reuse_or_create`. + ## What's next? -1. Check the [`.dstack.yml` reference](reference/dstack.yml/dev-environment.md) for more details and examples -2. See [fleets](fleets.md) on how to manage fleets \ No newline at end of file +1. Read about [dev environments](dev-environments.md), [tasks](tasks.md), and + [services](services.md) +2. See [fleets](fleets.md) on how to manage fleets + +!!! info "Reference" + See [.dstack.yml](reference/dstack.yml/dev-environment.md) for all the options supported by + dev environments, along with multiple examples. diff --git a/docs/docs/fleets.md b/docs/docs/fleets.md index 68386e967..38a71133e 100644 --- a/docs/docs/fleets.md +++ b/docs/docs/fleets.md @@ -5,7 +5,7 @@ fleet is created, it can be reused by dev environments, tasks, and services. > Fleets is a new feature. To use it, ensure you've installed version `0.18.7` or higher. -## Configuration +## Define a configuration To create a fleet, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `fleet.dstack.yml` are both acceptable). @@ -15,19 +15,27 @@ are both acceptable). To provision a fleet in the cloud using the configured backends, specify the required resources, number of nodes, and other optional parameters. -
+
```yaml type: fleet - name: my-fleet + # The name is optional, if not specified, generated randomly + name: ah-fleet-distrib + # Size of the cluster nodes: 2 + # Ensure instances are interconnected placement: cluster - backends: [aws] + # Use either spot or on-demand instances + spot_policy: auto resources: - gpu: 24GB + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. ```
@@ -41,14 +49,17 @@ are both acceptable). To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH. -
+
```yaml type: fleet - name: my-fleet + # The name is optional, if not specified, generated randomly + name: my-on-prem-fleet + # Ensure instances are interconnected placement: cluster + # The user, private SSH key, and hostnames of the on-prem servers ssh_config: user: ubuntu identity_file: ~/.ssh/id_rsa @@ -65,21 +76,22 @@ are both acceptable). Set `placement` to `cluster` if the nodes are interconnected (e.g. if you'd like to use them for multi-node tasks). In that case, by default, `dstack` will automatically detect the private network. - You can specify the [`network`](../reference/dstack.yml/fleet.md#network) parameter manually. + You can specify the [`network`](reference/dstack.yml/fleet.md#network) parameter manually. !!! info "Reference" - See the [.dstack.yml reference](reference/dstack.yml/fleet.md) - for all supported configuration options and examples. + See [.dstack.yml](reference/dstack.yml/fleet.md) for all the options supported by + fleets, along with multiple examples. -## Creating and updating fleets +## Create or update a fleet To create or update the fleet, simply call the [`dstack apply`](reference/cli/index.md#dstack-apply) command:
```shell -$ dstack apply -f examples/fleets/cluster.dstack.yml -Fleet my-fleet does not exist yet. Create the fleet? [y/n]: y +$ dstack apply -f examples/fine-tuning/alignment-handbook/fleet-distributed.dstack.yml +Fleet ah-fleet-distrib does not exist yet. Create the fleet? [y/n]: y + FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED my-fleet 0 pending now 1 pending now @@ -87,22 +99,26 @@ Fleet my-fleet does not exist yet. Create the fleet? [y/n]: y
-Once the status of instances change to `idle`, they can be used by `dstack run`.
+Once the status of instances changes to `idle`, they can be used by dev environments, tasks, and services.
 
 ## Creation policy
 
-> By default, `dstack run` tries to reuse `idle` instances from existing fleets.
-If no `idle` instances meet the requirements, `dstack run` creates a new fleet automatically.
-To avoid creating new fleet, specify pass `--reuse` to `dstack run`.
+By default, when running dev environments, tasks, and services, `dstack apply` tries to reuse `idle`
+instances from existing fleets.
+If no `idle` instances meet the requirements, it creates a new fleet automatically.
+To avoid creating a new fleet, pass `--reuse` to `dstack apply` (or set
+[`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the configuration).
 
 ## Termination policy
 
-> If you want a fleet to be automatically deleted after a certain idle time, you can set the
-you can set the [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) property.
+> If you want a fleet to be automatically deleted after a certain idle time,
+> set the [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) property.
+
+[//]: # (Add Idle time example to the reference page)
 
-## Managing fleets
+## Manage fleets
 
-### Listing fleets
+### List fleets
 
 The [`dstack fleet`](reference/cli/index.md#dstack-gateway-list) command lists fleet instances and their status:
 
@@ -117,7 +133,7 @@ $ dstack fleet
```

</div>
-### Deleting fleets
+### Delete fleets
 
When a fleet isn't used by a run, you can delete it via `dstack delete`:
 
@@ -133,4 +149,14 @@ Fleet my-gcp-fleet deleted
 
You can pass either the path to the configuration file or the fleet name directly.
 
-To terminate and delete specific instances from a fleet, pass `-i INSTANCE_NUM`.
\ No newline at end of file
+To terminate and delete specific instances from a fleet, pass `-i INSTANCE_NUM`.
+
+## What's next?
+
+1. Read about [dev environments](dev-environments.md), [tasks](tasks.md), and
+   [services](services.md)
+2. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+
+!!! info "Reference"
+    See [.dstack.yml](reference/dstack.yml/fleet.md) for all the options supported by
+    fleets, along with multiple examples.
\ No newline at end of file
diff --git a/docs/docs/guides/dstack-sky.md b/docs/docs/guides/dstack-sky.md
new file mode 100644
index 000000000..345b6276d
--- /dev/null
+++ b/docs/docs/guides/dstack-sky.md
@@ -0,0 +1,44 @@
+# dstack Sky
+
+If you don't want to host the `dstack` server or would like to access GPUs from the `dstack` marketplace,
+sign up with [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
+
+### Set up the CLI
+
+If you've signed up, open your project settings, and copy the `dstack config` command to point the CLI to the project.
+
+![](https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-sky-project-config.png){ width=800 }
+
+Then, install the CLI on your machine and use the copied command.
+
+<div class="termy">
+ +```shell +$ pip install dstack +$ dstack config --url https://sky.dstack.ai \ + --project peterschmidt85 \ + --token bbae0f28-d3dd-4820-bf61-8f4bb40815da + +Configuration is updated at ~/.dstack/config.yml +``` + +
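+To check that the CLI is pointed at the right project, you can run any CLI command, for example listing the project's runs (shown here only as an illustration):
+
+```shell
+$ dstack ps
+```
+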
+ +### Configure clouds + +By default, [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"} +uses the GPU from its marketplace, which requires a credit card to be attached in your account +settings. + +To use your own cloud accounts, click the settings icon of the corresponding backend and specify credentials: + +![](https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-sky-edit-backend-config.png){ width=800 } + +For more details on how to configure your own cloud accounts, check +the [server/config.yml reference](../reference/server/config.yml.md). + +## What's next? + +1. Follow [quickstart](../quickstart.md) +2. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples) +3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd) \ No newline at end of file diff --git a/docs/docs/index.md b/docs/docs/index.md index 49fdb0ddb..44299a711 100644 --- a/docs/docs/index.md +++ b/docs/docs/index.md @@ -1,13 +1,12 @@ # What is dstack? -`dstack` is an open-source container orchestration engine for AI. -It accelerates the development, training, and deployment of AI models, and simplifies the management of clusters. +`dstack` is a lightweight alternative to Kubernetes, designed specifically for managing the development, training, and +deployment of AI models at any scale. -#### Cloud and on-prem +`dstack` is easy to use with any cloud provider (AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, etc.) or +any on-prem clusters. -`dstack` is easy to use with any cloud or on-prem servers. -Supported cloud providers include AWS, GCP, Azure, OCI, Lambda, TensorDock, Vast.ai, RunPod, and CUDO. -For using `dstack` with on-prem servers, see [fleets](fleets.md#__tabbed_1_2). +If you already use Kubernetes, `dstack` can be used with it. #### Accelerators @@ -15,35 +14,31 @@ For using `dstack` with on-prem servers, see [fleets](fleets.md#__tabbed_1_2). ## How does it work? -> Before using `dstack`, [install](installation/index.md) the server and configure -backends for each cloud account (or Kubernetes cluster) that you intend to use. +> Before using `dstack`, [install](installation/index.md) the server and configure backends. -#### 1. Define run configurations +#### 1. Define configurations -`dstack` supports three types of run configurations: +`dstack` supports the following configurations: * [Dev environments](dev-environments.md) — for interactive development using a desktop IDE -* [Tasks](tasks.md) — for any kind of batch jobs or web applications (supports distributed jobs) -* [Services](services.md)— for production-grade deployment (supports auto-scaling and authorization) - -Each type of run configuration allows you to specify commands for execution, required compute resources, retry policies, auto-scaling rules, authorization settings, and more. +* [Tasks](tasks.md) — for scheduling jobs (incl. distributed jobs) or running web apps +* [Services](services.md) — for deployment of models and web apps (with auto-scaling and authorization) +* [Fleets](fleets.md) — for managing cloud and on-prem clusters +* [Volumes](concepts/volumes.md) — for managing persisted volumes +* [Gateways](concepts/volumes.md) — for configuring the ingress traffic and public endpoints Configuration can be defined as YAML files within your repo. -#### 2. 
Run configurations - -Run any defined configuration either via `dstack` CLI or API. - -`dstack` automatically handles provisioning, interruptions, port-forwarding, auto-scaling, network, volumes, -run failures, out-of-capacity errors, and more. +#### 2. Apply configurations -#### 3. Manage fleets +Apply the configuration either via the `dstack apply` CLI command or through a programmatic API. -Use [fleets](fleets.md) to provision and manage clusters and instances, both in the cloud and on-prem. +`dstack` automatically manages provisioning, job queuing, auto-scaling, networking, volumes, run failures, +out-of-capacity errors, port-forwarding, and more — across clouds and on-prem clusters. ## Where do I start? 1. Proceed to [installation](installation/index.md) 2. See [quickstart](quickstart.md) -3. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"} +3. Browse [examples](/docs/examples) 4. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"} \ No newline at end of file diff --git a/docs/docs/installation/index.md b/docs/docs/installation/index.md index 46725cad5..8d984d441 100644 --- a/docs/docs/installation/index.md +++ b/docs/docs/installation/index.md @@ -13,7 +13,7 @@ Follow the steps below to set up the server. ### 1. Configure backends -> If you want the `dstack` server to run containers or manage clusters in your cloud accounts (or use Kubernetes), +If you want the `dstack` server to run containers or manage clusters in your cloud accounts (or use Kubernetes), create the [~/.dstack/server/config.yml](../reference/server/config.yml.md) file and configure backends. ### 2. Start the server @@ -55,16 +55,16 @@ Once the `~/.dstack/server/config.yml` file is configured, proceed to start the > For more details on how to deploy `dstack` using Docker, check its [Docker repo](https://hub.docker.com/r/dstackai/dstack). -> By default, the `dstack` server stores its state in `~/.dstack/server/data` using SQLite. -> To use a database, set the [`DSTACK_DATABASE_URL`](../reference/cli/index.md#environment-variables) environment variable. +By default, the `dstack` server stores its state in `~/.dstack/server/data` using SQLite. +To use a database, set the [`DSTACK_DATABASE_URL`](../reference/cli/index.md#environment-variables) environment variable. -The server can be set up anywhere: on your laptop, a dedicated server, or in the cloud. -Once the `dstack` server is up, you can use the CLI or API. +The `dstack` server can run anywhere: on your laptop, a dedicated server, or in the cloud. Once it's up, you +can use either the CLI or the API. ### 3. Set up the CLI To point the CLI to the `dstack` server, configure it -with the server address, user token and project name: +with the server address, user token, and project name:
@@ -81,55 +81,18 @@ Configuration is updated at ~/.dstack/config.yml This configuration is stored in `~/.dstack/config.yml`. -### 4. Add on-prem servers +### 4. Create on-prem fleets -!!! info "Fleets" - If you want the `dstack` server to run containers on your on-prem servers, - use [fleets](../fleets.md#__tabbed_1_2). - -## dstack Sky - -If you don't want to host the `dstack` server yourself or would like to access GPU from the `dstack` marketplace, sign up with -[dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}. - -### Set up the CLI - -If you've signed up, -open your project settings, and copy the `dstack config` command to point the CLI to the project. - -![](https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-sky-project-config.png){ width=800 } - -Then, install the CLI on your machine and use the copied command. - -
- -```shell -$ pip install dstack -$ dstack config --url https://sky.dstack.ai \ - --project peterschmidt85 \ - --token bbae0f28-d3dd-4820-bf61-8f4bb40815da - -Configuration is updated at ~/.dstack/config.yml -``` - -
- -### Configure clouds - -By default, [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"} -uses the GPU from its marketplace, which requires a credit card to be attached in your account -settings. - -To use your own cloud accounts, click the settings icon of the corresponding backend and specify credentials: - -![](https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-sky-edit-backend-config.png){ width=800 } - -[//]: # (The `dstack server` command automatically updates `~/.dstack/config.yml`) -[//]: # (with the `main` project.) +If you want the `dstack` server to run containers on your on-prem servers, +use [fleets](../fleets.md#__tabbed_1_2). ## What's next? 1. Check the [server/config.yml reference](../reference/server/config.yml.md) on how to configure backends 2. Follow [quickstart](../quickstart.md) 3. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples) -4. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd) \ No newline at end of file +4. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd) + +!!! info "dstack Sky" + If you don't want to host the `dstack` server or would like to access GPU from the `dstack` marketplace, + check [dstack Sky](../guides/dstack-sky.md). \ No newline at end of file diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md index 5f4700810..81e3fb9ce 100644 --- a/docs/docs/quickstart.md +++ b/docs/docs/quickstart.md @@ -1,7 +1,6 @@ # Quickstart -> Before using `dstack`, [install](installation/index.md) the server and configure -backends. +> Before using `dstack`, [install](installation/index.md) the server. ## Initialize a repo @@ -18,118 +17,220 @@ $ dstack init Your folder can be a regular local folder or a Git repo. -## Define a configuration - -Define what you want to run as a YAML file. The filename must end with `.dstack.yml` (e.g., `.dstack.yml` -or `train.dstack.yml` are both acceptable). +## Run a configuration === "Dev environment" - Dev environments allow you to quickly provision a machine with a pre-configured environment, resources, IDE, code, etc. + A dev environment lets you provision a remote machine with your code, dependencies, and resources, and access it + with your desktop IDE. + + ##### Define a configuration + + Create the following configuration file inside the repo:
```yaml type: dev-environment - - # Use either `python` or `image` to configure environment + # The name is optional, if not specified, generated randomly + name: vscode + python: "3.11" - # image: ghcr.io/huggingface/text-generation-inference:latest + # Uncomment to use a custom Docker image + #image: dstackai/base:py3.10-0.4-cuda-12.1 ide: vscode - - # (Optional) Configure `gpu`, `memory`, `disk`, etc - resources: - gpu: 24GB + + # Use either spot or on-demand instances + spot_policy: auto + + # Uncomment to request resources + #resources: + # gpu: 24GB ```
+ ##### Run the configuration + + Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply): + +
+ + ```shell + $ dstack apply -f .dstack.yml + + # BACKEND REGION RESOURCES SPOT PRICE + 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052 + 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132 + 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248 + + Submit the run vscode? [y/n]: y + + Launching `vscode`... + ---> 100% + + To open in VS Code Desktop, use this link: + vscode://vscode-remote/ssh-remote+vscode/workflow + ``` + +
+ + Open the link to access the dev environment using your desktop IDE. + === "Task" - Tasks make it very easy to run any scripts, be it for training, data processing, or web apps. They allow you to pre-configure the environment, resources, code, etc. + A task allows you to schedule a job or run a web app. It lets you configure + dependencies, resources, ports, the number of nodes (if you want to run the task on a cluster), etc. -
+ ##### Define a configuration + + Create the following configuration file inside the repo: + +
```yaml type: task - + # The name is optional, if not specified, generated randomly + name: streamlit + python: "3.11" - env: - - HF_HUB_ENABLE_HF_TRANSFER=1 + # Uncomment to use a custom Docker image + #image: dstackai/base:py3.10-0.4-cuda-12.1 + + # Commands of the task commands: - - pip install -r fine-tuning/qlora/requirements.txt - - python fine-tuning/qlora/train.py - - # (Optional) Configure `gpu`, `memory`, `disk`, etc - resources: - gpu: 24GB + - pip install streamlit + - streamlit hello + # Ports to forward + ports: + - 8501 + + # Use either spot or on-demand instances + spot_policy: auto + + # Uncomment to request resources + #resources: + # gpu: 24GB ```
- Ensure `requirements.txt` and `train.py` are in your folder. You can take them from [`examples`](https://github.com/dstackai/dstack/tree/master/examples/fine-tuning/qlora). + ##### Run the configuration + + Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply): + +
+ + ```shell + $ dstack apply -f streamlit.dstack.yml + + # BACKEND REGION RESOURCES SPOT PRICE + 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052 + 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132 + 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248 + + Submit the run streamlit? [y/n]: y + + Continue? [y/n]: y + + Provisioning `streamlit`... + ---> 100% + + Welcome to Streamlit. Check out our demo in your browser. + + Local URL: http://localhost:8501 + ``` + +
+
+    `dstack apply` automatically forwards the remote ports to `localhost` for convenient access.
 
 === "Service"
 
-    Services make it easy to deploy models and apps cost-effectively as public endpoints, allowing you to use any frameworks.
+    A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
+    dependencies, resources, authorization, auto-scaling rules, etc.
+
+    ??? info "Prerequisites"
+        If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service.
 
-
+ If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, + the gateway is already set up for you. + + ##### Define a configuration + + Create the following configuration file inside the repo: + +
```yaml type: service - - image: ghcr.io/huggingface/text-generation-inference:latest - env: - - HUGGING_FACE_HUB_TOKEN # required to run gated models - - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1 + # The name is optional, if not specified, generated randomly + name: streamlit-service + + python: "3.11" + # Uncomment to use a custom Docker image + #image: dstackai/base:py3.10-0.4-cuda-12.1 + + # Commands of the service commands: - - text-generation-launcher --port 8000 --trust-remote-code - port: 8000 + - pip install streamlit + - streamlit hello + # Port of the service + port: 8501 - # (Optional) Configure `gpu`, `memory`, `disk`, etc - resources: - gpu: 24GB + # Comment to enable authorizartion + auth: False + + # Use either spot or on-demand instances + spot_policy: auto + + # Uncomment to request resources + #resources: + # gpu: 24GB ```
-## Run configuration + ##### Run the configuration -Run a configuration using the [`dstack run`](reference/cli/index.md#dstack-run) command, followed by the working directory path (e.g., `.`), -and the path to the configuration file. + Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply): -
+
-```shell -$ dstack run . -f train.dstack.yml - - BACKEND REGION RESOURCES SPOT PRICE - tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595 - azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - -Continue? [y/n]: y - -Provisioning... ----> 100% + ```shell + $ dstack apply -f streamlit.dstack.yml + + # BACKEND REGION RESOURCES SPOT PRICE + 1 gcp us-west4 2xCPU, 8GB, 100GB (disk) yes $0.010052 + 2 azure westeurope 2xCPU, 8GB, 100GB (disk) yes $0.0132 + 3 gcp europe-central2 2xCPU, 8GB, 100GB (disk) yes $0.013248 + + Submit the run streamlit? [y/n]: y + + Continue? [y/n]: y + + Provisioning `streamlit`... + ---> 100% -Epoch 0: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] -Epoch 1: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] -Epoch 2: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] -``` + Welcome to Streamlit. Check out our demo in your browser. -
+ Local URL: https://streamlit-service.example.com + ``` + +
-The `dstack run` command automatically uploads your code, including any local uncommitted changes. + One the service is up, its endpoint is accessible at `https://.`. -!!! info "Fleets" - By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md). - If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends. +> `dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes. ## What's next? 1. Read about [dev environments](dev-environments.md), [tasks](tasks.md), [services](services.md), and [fleets](fleets.md) 2. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"} -3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd) \ No newline at end of file +3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd) + +!!! info "Examples" + To see how dev environments, tasks, services, and fleets can be used for + training and deploying AI models, check out the [examples](examples/index.md). \ No newline at end of file diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/docs/docs/reference/dstack.yml/dev-environment.md index 2a39c0788..c516099c2 100644 --- a/docs/docs/reference/dstack.yml/dev-environment.md +++ b/docs/docs/reference/dstack.yml/dev-environment.md @@ -2,9 +2,9 @@ The `dev-environment` configuration type allows running [dev environments](../../dev-environments.md). -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be run via [`dstack run`](../cli/index.md#dstack-run). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `dev.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples @@ -18,25 +18,46 @@ The `python` property determines which default Docker image is used. ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode -python: "3.11" +# If `image` is not specified, dstack uses its base image +python: "3.10" ide: vscode ```
-!!! info "nvcc" +??? info "nvcc" Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels. To install it, use `conda install cuda`. + + ```yaml + type: dev-environment + # The name is optional, if not specified, generated randomly + name: vscode + + python: "3.10" + + ide: vscode + + # Run this command on start + init: + - conda install cuda + ``` + ### Docker image
```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode +# Any custom Docker image image: ghcr.io/huggingface/text-generation-inference:latest ide: vscode @@ -50,8 +71,12 @@ ide: vscode ```yaml type: dev-environment - + # The name is optional, if not specified, generated randomly + name: vscode + + # Any private Docker image image: ghcr.io/huggingface/text-generation-inference:latest + # Credentials of the private Docker registry registry_auth: username: peterschmidt85 password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5 @@ -68,19 +93,19 @@ range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`). ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode ide: vscode resources: # 200GB or more RAM memory: 200GB.. - # 4 GPUs from 40GB to 80GB gpu: 40GB..80GB:4 - - # Shared memory + # Shared memory (required by multi-gpu) shm_size: 16GB - + # Disk size disk: 500GB ``` @@ -96,6 +121,8 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10 ```yaml type: dev-environment + # The name is optional, if not specified, generated randomly + name: vscode ide: vscode @@ -115,7 +142,10 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10 ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode +# Environment variables env: - HUGGING_FACE_HUB_TOKEN - HF_HUB_ENABLE_HF_TRANSFER=1 @@ -125,12 +155,12 @@ ide: vscode
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above), +> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above), `dstack` will require the value to be passed via the CLI or set in the current process. For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`. -#### Default environment variables +#### System environment variables The following environment variables are available in any run and are passed by `dstack` by default: @@ -148,9 +178,12 @@ You can choose whether to use spot instances, on-demand instances, or any availa ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode ide: vscode +# Use either spot or on-demand instances spot_policy: auto ``` @@ -166,9 +199,12 @@ By default, `dstack` provisions instances in all configured backends. However, y ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode ide: vscode +# Use only listed backends backends: [aws, gcp] ``` @@ -182,9 +218,12 @@ By default, `dstack` uses all configured regions. However, you can specify the l ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode ide: vscode +# Use only listed regions regions: [eu-west-1, eu-west-2] ``` @@ -199,9 +238,12 @@ To attach a volume, simply specify its name using the `volumes` property and spe ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: vscode ide: vscode +# Map the name of the volume to any path volumes: - name: my-new-volume path: /volume_data @@ -212,7 +254,7 @@ volumes: Once you run this configuration, the contents of the volume will be attached to `/volume_data` inside the dev environment, and its contents will persist across runs. -!!! info "Limitations" +??? info "Limitations" When you're running a dev environment, task, or service with `dstack`, it automatically mounts the project folder contents to `/workflow` (and sets that as the current working directory). Right now, `dstack` doesn't allow you to attach volumes to `/workflow` or any of its subdirectories. diff --git a/docs/docs/reference/dstack.yml/fleet.md b/docs/docs/reference/dstack.yml/fleet.md index e763ccded..ccccd4c21 100644 --- a/docs/docs/reference/dstack.yml/fleet.md +++ b/docs/docs/reference/dstack.yml/fleet.md @@ -2,35 +2,52 @@ The `fleet` configuration type allows creating and updating fleets. -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `fleet.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `fleet.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples -### Creating a cloud fleet { #create-cloud-fleet } +### Cloud fleet { #cloud-fleet } -
+
```yaml type: fleet -name: my-gcp-fleet +# The name is optional, if not specified, generated randomly +name: my-fleet + +# The number of instances nodes: 4 +# Ensure the instances are interconnected placement: cluster -backends: [gcp] + +# Use either spot or on-demand instances +spot_policy: auto + resources: - gpu: 1 + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. ```
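+To restrict provisioning to specific clouds, you can additionally set the `backends` property. A minimal sketch, assuming the `gcp` backend is configured on your server (adjust to the backends you actually use):
+
+```yaml
+type: fleet
+name: my-gcp-fleet
+
+nodes: 2
+placement: cluster
+
+# Provision only via the listed backends
+backends: [gcp]
+
+resources:
+  gpu: 24GB
+```
+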
-### Creating an on-prem fleet { #create-ssh-fleet } +### On-prem fleet { #on-prem-fleet } -
+
```yaml type: fleet -name: my-ssh-fleet +# The name is optional, if not specified, generated randomly +name: my-on-prem-fleet + +# Ensure instances are interconnected +placement: cluster + +# The user, private SSH key, and hostnames of the on-prem servers ssh_config: user: ubuntu identity_file: ~/.ssh/id_rsa @@ -43,6 +60,8 @@ ssh_config: [//]: # (TODO: a cluster, individual user and identity file, etc) +[//]: # (TODO: other examples, for all properties like in dev-environment/task/service) + ## Root reference #SCHEMA# dstack._internal.core.models.fleets.FleetConfiguration @@ -57,7 +76,6 @@ ssh_config: overrides: show_root_heading: false - ## `ssh.hosts[n]` #SCHEMA# dstack._internal.core.models.fleets.SSHHostParams diff --git a/docs/docs/reference/dstack.yml/gateway.md b/docs/docs/reference/dstack.yml/gateway.md index 66fc3c21f..c9b2cd4b9 100644 --- a/docs/docs/reference/dstack.yml/gateway.md +++ b/docs/docs/reference/dstack.yml/gateway.md @@ -2,25 +2,32 @@ The `gateway` configuration type allows creating and updating [gateways](../../services.md). -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `gateway.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `gateway.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples +### Creating a new gateway { #new-gateway } +
```yaml type: gateway +# A name of the gateway name: example-gateway +# Gateways are bound to a specific backend and region backend: aws region: eu-west-1 + +# This domain will be used to access the endpoint domain: example.com ```
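+Like any other configuration, the gateway is created or updated by applying the file (the file name below is illustrative):
+
+```shell
+$ dstack apply -f gateway.dstack.yml
+```
+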
+[//]: # (TODO: other examples, e.g. private gateways) ## Root reference diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md index 9ca184d57..99c4101e5 100644 --- a/docs/docs/reference/dstack.yml/service.md +++ b/docs/docs/reference/dstack.yml/service.md @@ -2,9 +2,9 @@ The `service` configuration type allows running [services](../../services.md). -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be run via [`dstack run . -f PATH`](../cli/index.md#dstack-run). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `serve.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples @@ -14,16 +14,20 @@ If you don't specify `image`, `dstack` uses the default Docker image pre-configu `python`, `pip`, `conda` (Miniforge), and essential CUDA drivers. The `python` property determines which default Docker image is used. -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service -python: "3.11" +# If `image` is not specified, dstack uses its base image +python: "3.10" +# Commands of the service commands: - python3 -m http.server - +# The port of the service port: 8000 ``` @@ -31,20 +35,24 @@ port: 8000 !!! info "nvcc" Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels. - To install it, use `conda install cuda`. + To install it, use `conda install cuda` as the first command. ### Docker image -
+
```yaml type: service + # The name is optional, if not specified, generated randomly + name: http-server-service + + # Any custom Docker image + image: dstackai/base:py3.10-0.4-cuda-12.1 - image: dstackai/base:py3.11-0.4-cuda-12.1 - + # Commands of the service commands: - python3 -m http.server - + # The port of the service port: 8000 ``` @@ -56,47 +64,55 @@ port: 8000 ```yaml type: service + # The name is optional, if not specified, generated randomly + name: http-server-service - image: dstackai/base:py3.11-0.4-cuda-12.1 - - commands: - - python3 -m http.server + # Any private Docker iamge + image: dstackai/base:py3.10-0.4-cuda-12.1 + # Credentials of the private registry registry_auth: username: peterschmidt85 password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5 - + + # Commands of the service + commands: + - python3 -m http.server + # The port of the service port: 8000 ``` -### OpenAI-compatible interface { #model-mapping } +### Model gateway { #model-mapping } By default, if you run a service, its endpoint is accessible at `https://.`. If you run a model, you can optionally configure the mapping to make it accessible via the OpenAI-compatible interface. -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: llama31-service -python: "3.11" +python: "3.10" -env: -  - MODEL=NousResearch/Llama-2-7b-chat-hf +# Commands of the service commands: -  - pip install vllm -  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000 + - pip install vllm==0.5.3.post1 + - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096 +# Expose the port of the service port: 8000 resources: + # Change to what is required gpu: 24GB -# Enable the OpenAI-compatible endpoint +# Comment if you don't want to access the model via https://gateway. model: - format: openai type: chat - name: NousResearch/Llama-2-7b-chat-hf + name: meta-llama/Meta-Llama-3.1-8B-Instruct + format: openai ``` </div>
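Once such a service is up behind a gateway, the model can be queried through the OpenAI-compatible endpoint. A sketch using `curl`, mirroring the example later in these docs (the run domain and token are placeholders):

```shell
$ curl https://yellow-cat-1.example.com/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <dstack token>' \
    -d '{
      "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```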
@@ -149,32 +165,32 @@ and `openai` (if you are using Text Generation Inference or vLLM with OpenAI-com By default, `dstack` runs a single replica of the service. You can configure the number of replicas as well as the auto-scaling rules. -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: llama31-service -python: "3.11" +python: "3.10" -env: -  - MODEL=NousResearch/Llama-2-7b-chat-hf +# Commands of the service commands: -  - pip install vllm -  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000 + - pip install vllm==0.5.3.post1 + - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096 +# Expose the port of the service port: 8000 resources: + # Change to what is required gpu: 24GB -# Enable the OpenAI-compatible endpoint -model: - format: openai - type: chat - name: NousResearch/Llama-2-7b-chat-hf - +# Minimum and maximum number of replicas replicas: 1..4 scaling: + # Requests per second metric: rps + # Target metric value target: 10 ``` @@ -192,31 +208,31 @@ Setting the minimum number of replicas to `0` allows the service to scale down t If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`). -</div>
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service + +python: "3.10" -python: "3.11" +# Commands of the service commands: - pip install vllm - python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8X7B-Instruct-v0.1 --host 0.0.0.0 - --tensor-parallel-size 2 # Match the number of GPUs + --tensor-parallel-size $DSTACK_GPUS_NUM +# Expose the port of the service port: 8000 resources: # 2 GPUs of 80GB gpu: 80GB:2 + # Minimum disk size disk: 200GB - -# Enable the OpenAI-compatible endpoint -model: - type: chat - name: TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ - format: openai ```
@@ -235,41 +251,51 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10 By default, the service endpoint requires the `Authorization` header with `"Bearer "`. Authorization can be disabled by setting `auth` to `false`. -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service -python: "3.11" +# Disable authorization +auth: false + +python: "3.10" +# Commands of the service commands: - python3 -m http.server - +# The port of the service port: 8000 - -auth: false ```
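With `auth` disabled, the endpoint can be called without the `Authorization` header; for example (the run domain is a placeholder):

```shell
$ curl https://yellow-cat-1.example.com/
```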
### Environment variables -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: llama-2-7b-service -python: "3.11" +python: "3.10" +# Environment variables env: - HUGGING_FACE_HUB_TOKEN - MODEL=NousResearch/Llama-2-7b-chat-hf +# Commands of the service commands: - pip install vllm - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000 +# The port of the service port: 8000 resources: + # Required GPU vRAM gpu: 24GB ``` @@ -280,7 +306,7 @@ If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TO For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`. -#### Default environment variables +#### System environment variables The following environment variables are available in any run and are passed by `dstack` by default: @@ -294,16 +320,19 @@ The following environment variables are available in any run and are passed by ` You can choose whether to use spot instances, on-demand instances, or any available type. -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service commands: - python3 -m http.server - +# The port of the service port: 8000 +# Use either spot or on-demand instances spot_policy: auto ``` @@ -315,16 +344,20 @@ The `spot_policy` accepts `spot`, `on-demand`, and `auto`. The default for servi By default, `dstack` provisions instances in all configured backends. However, you can specify the list of backends: -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service +# Commands of the service commands: - python3 -m http.server - +# The port of the service port: 8000 +# Use only listed backends backends: [aws, gcp] ``` @@ -334,16 +367,20 @@ backends: [aws, gcp] By default, `dstack` uses all configured regions. However, you can specify the list of regions: -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service +# Commands of the service commands: - python3 -m http.server - +# The port of the service port: 8000 +# Use only listed regions regions: [eu-west-1, eu-west-2] ``` @@ -354,16 +391,20 @@ regions: [eu-west-1, eu-west-2] Volumes allow you to persist data between runs. To attach a volume, simply specify its name using the `volumes` property and specify where to mount its contents: -
+
```yaml type: service +# The name is optional, if not specified, generated randomly +name: http-server-service +# Commands of the service commands: - python3 -m http.server - +# The port of the service port: 8000 +# Map the name of the volume to any path volumes: - name: my-new-volume path: /volume_data diff --git a/docs/docs/reference/dstack.yml/task.md b/docs/docs/reference/dstack.yml/task.md index 0e0653e5f..330bfff08 100644 --- a/docs/docs/reference/dstack.yml/task.md +++ b/docs/docs/reference/dstack.yml/task.md @@ -2,9 +2,9 @@ The `task` configuration type allows running [tasks](../../tasks.md). -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `serve.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be run via [`dstack run`](../cli/index.md#dstack-run). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `train.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples @@ -18,9 +18,13 @@ The `python` property determines which default Docker image is used. ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train -python: "3.11" +# If `image` is not specified, dstack uses its base image +python: "3.10" +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py @@ -28,10 +32,25 @@ commands:
-!!! info "nvcc" +??? info "nvcc" Note that the default Docker image doesn't bundle `nvcc`, which is required for building custom CUDA kernels. To install it, use `conda install cuda`. + + ```yaml + type: task + # The name is optional, if not specified, generated randomly + name: train + + python: "3.10" + + # Before other commands, install `nvcc` (via `conda install cuda`) + commands: + - conda install cuda + - pip install -r fine-tuning/qlora/requirements.txt + - python fine-tuning/qlora/train.py + ``` + ### Ports { #_ports } A task can configure ports. In this case, if the task is running an application on a port, `dstack run` @@ -41,14 +60,17 @@ will securely allow you to access this port from your local machine through port ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train -python: "3.11" +python: "3.10" +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - tensorboard --logdir results/runs & - python fine-tuning/qlora/train.py - +# Expose the port to access TensorBoard ports: - 6000 ``` @@ -65,9 +87,13 @@ When running it, `dstack run` forwards `6000` port to `localhost:6000`, enabling ```yaml type: dev-environment +# The name is optional, if not specified, generated randomly +name: train -image: dstackai/base:py3.11-0.4-cuda-12.1 +# Any custom Docker image +image: dstackai/base:py3.10-0.4-cuda-12.1 +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py @@ -80,12 +106,17 @@ commands: ```yaml type: dev-environment + # The name is optional, if not specified, generated randomly + name: train - image: dstackai/base:py3.11-0.4-cuda-12.1 + # Any private Docker image + image: dstackai/base:py3.10-0.4-cuda-12.1 + # Credentials of the private Docker registry registry_auth: username: peterschmidt85 password: ghp_e49HcZ9oYwBzUbcSk2080gXZOU2hiT9AeSR5 + # Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py @@ -100,7 +131,10 @@ range (e.g. `24GB..`, or `24GB..80GB`, or `..80GB`). ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py @@ -108,13 +142,11 @@ commands: resources: # 200GB or more RAM memory: 200GB.. - # 4 GPUs from 40GB to 80GB gpu: 40GB..80GB:4 - - # Shared memory + # Shared memory (required by multi-gpu) shm_size: 16GB - + # Disk size disk: 500GB ``` @@ -130,9 +162,12 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10 ```yaml type: task + # The name is optional, if not specified, generated randomly + name: train - python: "3.11" + python: "3.10" + # Commands of the task commands: - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html - git clone --recursive https://github.com/pytorch/xla.git @@ -155,12 +190,14 @@ and their quantity. Examples: `A100` (one A100), `A10G,A100` (either A10G or A10 ```yaml type: task -python: "3.11" +python: "3.10" +# Environment variables env: - HUGGING_FACE_HUB_TOKEN - HF_HUB_ENABLE_HF_TRANSFER=1 +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py @@ -168,12 +205,12 @@ commands:
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above), +> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above), `dstack` will require the value to be passed via the CLI or set in the current process. For instance, you can define environment variables in a `.env` file and utilize tools like `direnv`. -##### Default environment variables +##### System environment variables The following environment variables are available in any run and are passed by `dstack` by default: @@ -186,7 +223,7 @@ The following environment variables are available in any run and are passed by ` | `DSTACK_NODE_RANK` | The rank of the node | | `DSTACK_MASTER_NODE_IP` | The internal IP address the master node | -### Distributed tasks { #_nodes } +### Distributed tasks By default, the task runs on a single node. However, you can run it on a cluster of nodes. @@ -194,13 +231,15 @@ By default, the task runs on a single node. However, you can run it on a cluster ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train-distrib # The size of the cluster nodes: 2 -python: "3.11" -env: - - HF_HUB_ENABLE_HF_TRANSFER=1 +python: "3.10" + +# Commands of the task commands: - pip install -r requirements.txt - torchrun @@ -220,41 +259,13 @@ resources: If you run the task, `dstack` first provisions the master node and then runs the other nodes of the cluster. All nodes are provisioned in the same region. -`dstack` is easy to use with `accelerate`, `torchrun`, and other distributed frameworks. All you need to do +> `dstack` is easy to use with `accelerate`, `torchrun`, and other distributed frameworks. All you need to do is pass the corresponding environment variables such as `DSTACK_GPUS_PER_NODE`, `DSTACK_NODE_RANK`, `DSTACK_NODES_NUM`, -`DSTACK_MASTER_NODE_IP`, and `DSTACK_GPUS_NUM` (see [System environment variables](#default-environment-variables)). +`DSTACK_MASTER_NODE_IP`, and `DSTACK_GPUS_NUM` (see [System environment variables](#system-environment-variables)). ??? info "Backends" - Running on multiple nodes is supported only with `aws`, `gcp`, `azure`, `oci`, and instances added via - [`dstack pool add-ssh`](../../fleets.md#__tabbed_1_2). - -### Arguments - -You can parameterize tasks with user arguments using `${{ run.args }}` in the configuration. - -
- -```yaml -type: task - -python: "3.11" - -commands: - - pip install -r fine-tuning/qlora/requirements.txt - - python fine-tuning/qlora/train.py ${{ run.args }} -``` - -
- -Now, you can pass your arguments to the `dstack run` command: - -
- -```shell -$ dstack run . -f train.dstack.yml --train_batch_size=1 --num_train_epochs=100 -``` - -
+ Running on multiple nodes is supported only with the `aws`, `gcp`, `azure`, `oci` backends, or + [on-prem fleets](../../fleets.md#__tabbed_1_2). ### Web applications @@ -264,13 +275,16 @@ Here's an example of using `ports` to run web apps with `tasks`. ```yaml type: task +# The name is optional, if not specified, generated randomly +name: streamlit-hello -python: "3.11" +python: "3.10" +# Commands of the task commands: - pip3 install streamlit - streamlit hello - +# Expose the port to access the web app ports: - 8501 @@ -286,11 +300,15 @@ You can choose whether to use spot instances, on-demand instances, or any availa ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py +# Use either spot or on-demand instances spot_policy: auto ``` @@ -298,6 +316,34 @@ spot_policy: auto The `spot_policy` accepts `spot`, `on-demand`, and `auto`. The default for tasks is `auto`. +### Queueing tasks { #queueing-tasks } + +By default, if `dstack apply` cannot find capacity, the task fails. + +To queue the task and wait for capacity, specify the [`retry`](#retry) +property: + +
+ +```yaml +type: task +# The name is optional, if not specified, generated randomly +name: train + +# Commands of the task +commands: + - pip install -r fine-tuning/qlora/requirements.txt + - python fine-tuning/qlora/train.py + +retry: + # Retry on no-capacity errors + on_events: [no-capacity] + # Retry within 1 day + duration: 1d +``` + +
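With `retry` configured, a submitted run stays queued until capacity appears instead of failing. Assuming the configuration above is saved as `train.dstack.yml`, it could be submitted and monitored like this:

```shell
$ dstack apply -f train.dstack.yml
$ dstack ps --watch
```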
+ ### Backends By default, `dstack` provisions instances in all configured backends. However, you can specify the list of backends: @@ -306,11 +352,15 @@ By default, `dstack` provisions instances in all configured backends. However, y ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py +# Use only listed backends backends: [aws, gcp] ``` @@ -324,11 +374,15 @@ By default, `dstack` uses all configured regions. However, you can specify the l ```yaml type: task +# The name is optional, if not specified, generated randomly +name: train +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py +# Use only listed regions regions: [eu-west-1, eu-west-2] ``` @@ -343,13 +397,17 @@ To attach a volume, simply specify its name using the `volumes` property and spe ```yaml type: task +# The name is optional, if not specified, generated randomly +name: vscode -python: "3.11" +python: "3.10" +# Commands of the task commands: - pip install -r fine-tuning/qlora/requirements.txt - python fine-tuning/qlora/train.py +# Map the name of the volume to any path volumes: - name: my-new-volume path: /volume_data @@ -375,6 +433,15 @@ The `task` configuration type supports many other options. See below. type: required: true +## `retry` + +#SCHEMA# dstack._internal.core.models.profiles.ProfileRetry + overrides: + show_root_heading: false + type: + required: true + item_id_prefix: retry- + ## `resources` #SCHEMA# dstack._internal.core.models.resources.ResourcesSpecSchema diff --git a/docs/docs/reference/dstack.yml/volume.md b/docs/docs/reference/dstack.yml/volume.md index 03351fb6e..26e75b8a8 100644 --- a/docs/docs/reference/dstack.yml/volume.md +++ b/docs/docs/reference/dstack.yml/volume.md @@ -2,35 +2,45 @@ The `volume` configuration type allows creating, registering, and updating volumes. -> Configuration files must have a name ending with `.dstack.yml` (e.g., `.dstack.yml` or `vol.dstack.yml` are both acceptable) -> and can be located in the project's root directory or any nested folder. -> Any configuration can be applied via [`dstack apply`](../cli/index.md#dstack-apply). +> Configuration files must be inside the project repo, and their names must end with `.dstack.yml` +> (e.g. `.dstack.yml` or `fleet.dstack.yml` are both acceptable). +> Any configuration can be run via [`dstack apply`](../cli/index.md#dstack-apply). ## Examples -### Creating a new volume { #create-volume } +### Creating a new volume { #new-volume }
```yaml type: volume -name: my-aws-volume +# The name of the volume +name: my-new-volume + +# Volumes are bound to a specific backend and region backend: aws region: eu-central-1 + +# The size of the volume size: 100GB ```
-### Registering an existing volume { #register-volume } +### Registering an existing volume { #existing-volume } -
+
```yaml type: volume -name: my-external-volume +# The name of the volume +name: my-existing-volume + +# Volumes are bound to a specific backend and region backend: aws region: eu-central-1 + +# The ID of the volume in AWS volume_id: vol1235 ``` diff --git a/docs/docs/services.md b/docs/docs/services.md index e755cd2c4..0c0120a4e 100644 --- a/docs/docs/services.md +++ b/docs/docs/services.md @@ -1,8 +1,10 @@ # Services -Services make it easy to deploy models and web applications as public, -secure, and scalable endpoints. They are provisioned behind a [gateway](concepts/gateways.md) that -automatically provides an HTTPS domain, handles authentication, distributes load, and performs auto-scaling. +A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure +dependencies, resources, authorization, auto-scaling rules, etc. + +Services are provisioned behind a [gateway](concepts/gateways.md) which provides an HTTPS endpoint mapped to your domain, +handles authentication, distributes load, and performs auto-scaling. ??? info "Gateways" If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service. @@ -10,32 +12,43 @@ automatically provides an HTTPS domain, handles authentication, distributes load If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, the gateway is already set up for you. -## Configuration +## Define a configuration -First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `serve.dstack.yml` +First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or +`serve.dstack.yml` are both acceptable). </div>
```yaml type: service +# The name is optional, if not specified, generated randomly +name: llama31-service + +# If `image` is not specified, dstack uses its default image +python: "3.10" -python: "3.11" +# Required environment variables env: -  - MODEL=NousResearch/Llama-2-7b-chat-hf + - HUGGING_FACE_HUB_TOKEN commands: -  - pip install vllm -  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000 + - pip install vllm==0.5.3.post1 + - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096 +# Expose the vllm server port port: 8000 +# Use either spot or on-demand instances +spot_policy: auto + resources: - gpu: 80GB + # Change to what is required + gpu: 24GB -# (Optional) Enable the OpenAI-compatible endpoint +# Comment if you don't want to access the model via https://gateway. model: - format: openai type: chat - name: NousResearch/Llama-2-7b-chat-hf + name: meta-llama/Meta-Llama-3.1-8B-Instruct + format: openai ``` </div>
@@ -49,25 +62,26 @@ If you don't specify your Docker image, `dstack` uses the [base](https://hub.doc In this case, `dstack` auto-scales it based on the load. !!! info "Reference" - See the [.dstack.yml reference](reference/dstack.yml/service.md) - for all supported configuration options and examples. + See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by + services, along with multiple examples. -## Running +## Run a service -To run a configuration, use the [`dstack run`](reference/cli/index.md#dstack-run) command followed by the working directory path, -configuration file path, and any other options. +To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-apply) command.
```shell +$ HUGGING_FACE_HUB_TOKEN=... + $ dstack run . -f serve.dstack.yml - BACKEND REGION RESOURCES SPOT PRICE - tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595 - azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673 + # BACKEND REGION RESOURCES SPOT PRICE + 1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22 + 2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22 + 3 gcp us-west4 27xCPU, 150GB, A5000:24GB:3 yes $0.33 -Continue? [y/n]: y +Submit the run llama31-service? [y/n]: y Provisioning... ---> 100% @@ -77,31 +91,14 @@ Service is published at https://yellow-cat-1.example.com
-When deploying the service, `dstack run` mounts the current folder's contents. - -[//]: # (TODO: Fleets and idle duration) - -??? info ".gitignore" - If there are large files or folders you'd like to avoid uploading, - you can list them in `.gitignore`. - -??? info "Fleets" - By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md). - If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends. - - To have the fleet deleted after a certain idle time automatically, set - [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time). - By default, it's set to `5min`. - -!!! info "Reference" - See the [CLI reference](reference/cli/index.md#dstack-run) for more details - on how `dstack run` works. +`dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes. +To avoid uploading large files, ensure they are listed in `.gitignore`. -## Service endpoint +## Access the endpoint One the service is up, its endpoint is accessible at `https://.`. -By default, the service endpoint requires the `Authorization` header with `Bearer `. +By default, the service endpoint requires the `Authorization` header with `Bearer `.
@@ -110,7 +107,7 @@ $ curl https://yellow-cat-1.example.com/v1/chat/completions \ -H 'Content-Type: application/json' \ -H 'Authorization: Bearer <dstack token>' \ -d '{ - "model": "NousResearch/Llama-2-7b-chat-hf", + "model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "messages": [ { "role": "user", @@ -122,27 +119,50 @@ $ curl https://yellow-cat-1.example.com/v1/chat/completions \
-Authorization can be disabled by setting `auth` to `false` in the service configuration file. +Authorization can be disabled by setting [`auth`](reference/dstack.yml/service.md#authorization) to `false` in the +service configuration file. + +### Gateway endpoint -### Model endpoint +In case the service has the [model mapping](reference/dstack.yml/service.md#model-mapping) configured, you will also be +able to access the model at `https://gateway.` via the OpenAI-compatible interface. -In case the service has the [model mapping](reference/dstack.yml/service.md#model-mapping) configured, you will also be able -to access the model at `https://gateway.` via the OpenAI-compatible interface. +## Manage runs -## Managing runs +### List runs -### Listing runs +The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running jobs and their statuses. +Use `--watch` (or `-w`) to monitor the live status of runs. -The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running runs and their status. +### Stop a run -### Stopping runs +Once the run exceeds the [`max_duration`](reference/dstack.yml/task.md#max_duration), or when you use [`dstack stop`](reference/cli/index.md#dstack-stop), +the dev environment is stopped. Use `--abort` or `-x` to stop the run abruptly. -When you use [`dstack stop`](reference/cli/index.md#dstack-stop), the service and its cloud resources are deleted. +[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`) + +## Manage fleets + +By default, `dstack apply` reuses `idle` instances from one of the existing [fleets](fleets.md), +or creates a new fleet through backends. + +!!! info "Idle duration" + To ensure the created fleets are deleted automatically, set + [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time). + By default, it's set to `5min`. + +!!! info "Creation policy" + To ensure `dstack apply` always reuses an existing fleet and doesn't create a new one, + pass `--reuse` to `dstack apply` (or set [`creation_policy`](reference/dstack.yml/task.md#creation_policy) to `reuse` in the task configuration). + The default policy is `reuse_or_create`. ## What's next? -1. Check the [Text Generation Inference :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/tgi/README.md){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/vllm/README.md){:target="_blank"} examples -2. Check the [`.dstack.yml` reference](reference/dstack.yml/service.md) for more details and examples -3. See [gateways](concepts/gateways.md) on how to set up a gateway -4. Browse [examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"} -5. See [fleets](fleets.md) on how to manage fleets \ No newline at end of file +1. Check the [TGI :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/tgi/README.md){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/deployment/vllm/README.md){:target="_blank"} examples +2. See [gateways](concepts/gateways.md) on how to set up a gateway +3. Browse [examples](/docs/examples) +4. See [fleets](fleets.md) on how to manage fleets + +!!! 
info "Reference" + See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by + services, along with multiple examples. \ No newline at end of file diff --git a/docs/docs/tasks.md b/docs/docs/tasks.md index acae1b7bc..330834093 100644 --- a/docs/docs/tasks.md +++ b/docs/docs/tasks.md @@ -1,34 +1,42 @@ # Tasks -Tasks allow for convenient scheduling of various batch jobs, such as training, fine-tuning, or -data processing. They can also be used to run web applications -when features offered by [services](services.md) are not needed, such as for debugging. +A task allows you to schedule a job or run a web app. It lets you configure dependencies, resources, ports, and more. +Tasks can be distributed and run on clusters. -You can run tasks on a single machine or on a cluster of nodes. +Tasks are ideal for training and fine-tuning jobs. They can also be used instead of services if you want to run a web +app but don't need a public endpoint. -## Configuration +## Define a configuration First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or `train.dstack.yml` are both acceptable). -
+[//]: # (TODO: Make tabs - single machine & distributed tasks & web app) + +
```yaml type: task +# The name is optional, if not specified, generated randomly +name: axolotl-train + +# Using the official Axolotl's Docker image +image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1 -python: "3.11" +# Required environment variables env: - - HF_HUB_ENABLE_HF_TRANSFER=1 + - HUGGING_FACE_HUB_TOKEN + - WANDB_API_KEY +# Commands of the task commands: - - pip install -r fine-tuning/qlora/requirements.txt - - tensorboard --logdir results/runs & - - python fine-tuning/qlora/train.py -ports: - - 6000 + - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml -# (Optional) Configure `gpu`, `memory`, `disk`, etc resources: - gpu: 80GB + gpu: + # 24GB or more vRAM + memory: 24GB.. + # Two or more GPU + count: 2.. ```
@@ -36,84 +44,91 @@ resources: If you don't specify your Docker image, `dstack` uses the [base](https://hub.docker.com/r/dstackai/base/tags) image (pre-configured with Python, Conda, and essential CUDA drivers). - !!! info "Distributed tasks" By default, tasks run on a single instance. However, you can specify - the [number of nodes](reference/dstack.yml/task.md#_nodes). - In this case, `dstack` provisions a cluster of instances. + the [number of nodes](reference/dstack.yml/task.md#distributed-tasks). + In this case, the task will run a cluster of instances. !!! info "Reference" - See the [.dstack.yml reference](reference/dstack.yml/task.md) - for all supported configuration options and examples. + See [.dstack.yml](reference/dstack.yml/task.md) for all the options supported by + tasks, along with multiple examples. -## Running +## Run a configuration -To run a configuration, use the [`dstack run`](reference/cli/index.md#dstack-run) command followed by the working directory path, -configuration file path, and other options. +To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-apply) command.
```shell -$ dstack run . -f train.dstack.yml +$ HUGGING_FACE_HUB_TOKEN=... +$ WANDB_API_KEY=... - BACKEND REGION RESOURCES SPOT PRICE - tensordock unitedkingdom 10xCPU, 80GB, 1xA100 (80GB) no $1.595 - azure westus3 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - azure westus2 24xCPU, 220GB, 1xA100 (80GB) no $3.673 - -Continue? [y/n]: y +$ dstack apply -f examples/.dstack.yml -Provisioning... ----> 100% + # BACKEND REGION RESOURCES SPOT PRICE + 1 runpod CA-MTL-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22 + 2 runpod EU-SE-1 18xCPU, 100GB, A5000:24GB:2 yes $0.22 + 3 gcp us-west4 27xCPU, 150GB, A5000:24GB:3 yes $0.33 -TensorBoard 2.13.0 at http://localhost:6006/ (Press CTRL+C to quit) +Submit the run axolotl-train? [y/n]: y -Epoch 0: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] -Epoch 1: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] -Epoch 2: 100% 1719/1719 [00:18<00:00, 92.32it/s, loss=0.0981, acc=0.969] +Launching `axolotl-train`... +---> 100% + +{'loss': 1.4967, 'grad_norm': 1.2734375, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0} + 0% 1/24680 [00:13<95:34:17, 13.94s/it] + 6% 73/1300 [00:48<13:57, 1.47it/s] ```
-If the task specifies `ports`, `dstack run` automatically forwards them to your local machine for -convenient and secure access. +`dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes. +To avoid uploading large files, ensure they are listed in `.gitignore`. -When running the task, `dstack run` mounts the current folder's contents. +!!! info "Ports" + If the task specifies [`ports`](reference/dstack.yml/task.md#_ports), `dstack run` automatically forwards them to your + local machine for convenient and secure access. -[//]: # (TODO: Fleets and idle duration) +!!! info "Queueing tasks" + By default, if `dstack apply` cannot find capacity, the task fails. + To queue the task and wait for capacity, specify the [`retry`](reference/dstack.yml/task.md#queueing-tasks) + property in the task configuration. -??? info ".gitignore" - If there are large files or folders you'd like to avoid uploading, - you can list them in `.gitignore`. +## Manage runs -??? info "Fleets" - By default, `dstack run` reuses `idle` instances from one of the existing [fleets](fleets.md). - If no `idle` instances meet the requirements, it creates a new fleet using one of the configured backends. +### List runs - To have the fleet deleted after a certain idle time automatically, set - [`termination_idle_time`](../reference/dstack.yml/fleet.md#termination_idle_time). - By default, it's set to `5min`. +The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running jobs and their statuses. +Use `--watch` (or `-w`) to monitor the live status of runs. -!!! info "Reference" - See the [CLI reference](reference/cli/index.md#dstack-run) for more details - on how `dstack run` works. +### Stop a run -## Managing runs +Once the run exceeds the [`max_duration`](reference/dstack.yml/task.md#max_duration), or when you use [`dstack stop`](reference/cli/index.md#dstack-stop), +the dev environment is stopped. Use `--abort` or `-x` to stop the run abruptly. -### Listing runs +[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`) -The [`dstack ps`](reference/cli/index.md#dstack-ps) command lists all running runs and their status. +## Manage fleets -### Stopping runs +By default, `dstack apply` reuses `idle` instances from one of the existing [fleets](fleets.md), +or creates a new fleet through backends. -Once you use [`dstack stop`](reference/cli/index.md#dstack-stop) (or when the run exceeds the -`max_duration`), the instances return to the [fleet](fleets.md). +!!! info "Idle duration" + To ensure the created fleets are deleted automatically, set + [`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time). + By default, it's set to `5min`. -[//]: # (TODO: Mention `dstack logs` and `dstack logs -d`) +!!! info "Creation policy" + To ensure `dstack apply` always reuses an existing fleet and doesn't create a new one, + pass `--reuse` to `dstack apply` (or set [`creation_policy`](reference/dstack.yml/task.md#creation_policy) to `reuse` in the task configuration). + The default policy is `reuse_or_create`. ## What's next? -1. Check the [QLoRA :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/qlora/README.md){:target="_blank"} example -2. Check the [`.dstack.yml` reference](../reference/dstack.yml/task.md) for more details and examples -3. Browse [all examples :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/examples){:target="_blank"} -4. 
See [fleets](fleets.md) on how to manage fleets \ No newline at end of file +1. Check the [Axolotl](/docs/examples/fine-tuning/axolotl) example +2. Browse [all examples](/docs/examples) +3. See [fleets](fleets.md) on how to manage fleets + +!!! info "Reference" + See [.dstack.yml](reference/dstack.yml/task.md) for all the options supported by + tasks, along with multiple examples. diff --git a/docs/overrides/home.html b/docs/overrides/home.html index d65ddd63c..89083a996 100644 --- a/docs/overrides/home.html +++ b/docs/overrides/home.html @@ -112,8 +112,8 @@

AI container orchestration engine for everyone

- dstack is an open-source orchestration engine that simplifies developing, training, and deploying AI - models, as well as managing clusters on any cloud or data center. + dstack is a lightweight alternative to Kubernetes for AI. It simplifies container orchestration for + AI on any cloud or on-premises, accelerating the development, training, and deployment of AI models.</div>

@@ -228,10 +228,11 @@

Dev environments

Tasks

-

Tasks allow for convenient scheduling of various batch jobs, such as training, fine-tuning, or - data processing, as well as running web applications.

+

A task allows you to schedule a job or run a web app. It lets you configure dependencies, + resources, ports, and more. Tasks can be distributed and run on clusters.

-

You can run tasks on a single machine or on a cluster of nodes.

+

Tasks are ideal for training and fine-tuning jobs or running apps + for development purposes.

Tasks

Services

- Services make it very easy to deploy any kind of model as public, - secure, and scalable endpoints. + A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure + dependencies, resources, authorization, auto-scaling rules, etc.</div>

@@ -343,9 +344,9 @@

- -

+

Axolotl

diff --git a/examples/.dstack.yml b/examples/.dstack.yml index 143c04aec..9a09e8641 100644 --- a/examples/.dstack.yml +++ b/examples/.dstack.yml @@ -1,10 +1,16 @@ type: dev-environment +# The name is optional, if not specified, generated randomly name: vscode -# This configuration launches a blank dev environment - python: "3.11" +# Uncomment to use a custom Docker image +#image: dstackai/base:py3.10-0.4-cuda-12.1 ide: vscode +# Use either spot or on-demand instances spot_policy: auto + +# Uncomment to request resources +#resources: +# gpu: 24GB \ No newline at end of file diff --git a/examples/README.md b/examples/README.md index 7bffa9914..23accbb3f 100644 --- a/examples/README.md +++ b/examples/README.md @@ -10,7 +10,9 @@ cd dstack dstack init ``` -Now you are ready to run examples! Select any example from the left-hand sidebar. +Now you are ready to run examples! + +> Browse the examples using the menu on the left. ## Source code diff --git a/examples/fine-tuning/alignment-handbook/.dstack.yml b/examples/fine-tuning/alignment-handbook/.dstack.yml index 5d121d1ef..fc97d6b96 100644 --- a/examples/fine-tuning/alignment-handbook/.dstack.yml +++ b/examples/fine-tuning/alignment-handbook/.dstack.yml @@ -1,4 +1,5 @@ type: dev-environment +# The name is optional, if not specified, generated randomly name: ah-vscode # If `image` is not specified, dstack uses its default image @@ -25,5 +26,8 @@ ide: vscode spot_policy: auto resources: - # Minimum 24GB, one or more GPU - gpu: 24GB..:1.. \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. \ No newline at end of file diff --git a/examples/fine-tuning/alignment-handbook/config.yaml b/examples/fine-tuning/alignment-handbook/config.yaml index fee2964b4..330c43c10 100644 --- a/examples/fine-tuning/alignment-handbook/config.yaml +++ b/examples/fine-tuning/alignment-handbook/config.yaml @@ -40,7 +40,7 @@ gradient_accumulation_steps: 2 gradient_checkpointing: true gradient_checkpointing_kwargs: use_reentrant: false -hub_model_id: chansung/coding_llamaduo_60k_v0.2 +hub_model_id: peterschmidt85/coding_llamaduo_60k_v0.2 hub_strategy: every_save learning_rate: 2.0e-04 log_level: info diff --git a/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml index 0fa773f16..10c049017 100644 --- a/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml +++ b/examples/fine-tuning/alignment-handbook/fleet-distrib.dstack.yml @@ -2,16 +2,19 @@ type: fleet # The name is optional, if not specified, generated randomly name: ah-fleet-distrib +# Number of instances in fleet +nodes: 2 +# Ensure instances are interconnected +placement: cluster + # Use either spot or on-demand instances spot_policy: auto -# Terminate the instance if not used for one hour +# Terminate instances if not used for one hour termination_idle_time: 1h resources: - # Change to what is required - gpu: 24GB - -# Specify a number of instances -nodes: 2 -# Ensure instances are interconnected -placement: cluster \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. 
\ No newline at end of file diff --git a/examples/fine-tuning/alignment-handbook/fleet.dstack.yml b/examples/fine-tuning/alignment-handbook/fleet.dstack.yml index 2388cc745..d8ae8872d 100644 --- a/examples/fine-tuning/alignment-handbook/fleet.dstack.yml +++ b/examples/fine-tuning/alignment-handbook/fleet.dstack.yml @@ -2,14 +2,17 @@ type: fleet # The name is optional, if not specified, generated randomly name: ah-fleet +# Number of instances in fleet +nodes: 1 + # Use either spot or on-demand instances spot_policy: auto # Terminate the instance if not used for one hour termination_idle_time: 1h resources: - # Change to what is required - gpu: 24GB - -# Need one instance only -nodes: 1 \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. \ No newline at end of file diff --git a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml index cf09c0290..b33902a5d 100644 --- a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml +++ b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml @@ -25,17 +25,20 @@ commands: --machine_rank=$DSTACK_NODE_RANK --num_processes=$DSTACK_GPUS_NUM --num_machines=$DSTACK_NODES_NUM - scripts/run_sft.py + scripts/run_sft.py ../examples/fine-tuning/alignment-handbook/config.yaml # Expose 6006 to access TensorBoard ports: - 6006 -# The number of interconnected instances required +# Number of instances in cluster nodes: 2 resources: - # Required resources - gpu: 24GB - # Shared memory size for inter-process communication + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. + # Shared memory (for multi-gpu) shm_size: 24GB \ No newline at end of file diff --git a/examples/fine-tuning/alignment-handbook/train.dstack.yml b/examples/fine-tuning/alignment-handbook/train.dstack.yml index fc57a2adc..a52a3b08f 100644 --- a/examples/fine-tuning/alignment-handbook/train.dstack.yml +++ b/examples/fine-tuning/alignment-handbook/train.dstack.yml @@ -28,5 +28,8 @@ commands: # - 6006 resources: - # Required resources - gpu: 24GB \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. \ No newline at end of file diff --git a/examples/fine-tuning/axolotl/.dstack.yml b/examples/fine-tuning/axolotl/.dstack.yml index f419d6ac8..4b9096cfa 100644 --- a/examples/fine-tuning/axolotl/.dstack.yml +++ b/examples/fine-tuning/axolotl/.dstack.yml @@ -1,4 +1,5 @@ type: dev-environment +# The name is optional, if not specified, generated randomly name: axolotl-vscode # Using the official Axolotl's Docker image @@ -15,5 +16,8 @@ ide: vscode spot_policy: auto resources: - # Two or more 24GB GPUs (required by FSDP) - gpu: 24GB:2.. \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # Two or more GPU + count: 2.. \ No newline at end of file diff --git a/examples/fine-tuning/axolotl/README.md b/examples/fine-tuning/axolotl/README.md index 85dc7d77a..c7ffd762b 100644 --- a/examples/fine-tuning/axolotl/README.md +++ b/examples/fine-tuning/axolotl/README.md @@ -47,13 +47,13 @@ env: # Commands of the task commands: - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml -# Expose 6006 to access TensorBoard -ports: - - 6006 resources: - # Two or more 24GB GPUs (required by FSDP) - gpu: 24GB:2.. + gpu: + # 24GB or more vRAM + memory: 24GB.. + # Two or more GPU + count: 2.. 
``` The task uses Axolotl's Docker image, where Axolotl is already pre-installed. @@ -67,9 +67,6 @@ WANDB_API_KEY=... dstack apply -f examples/fine-tuning/axolotl/train.dstack.yml ``` -If you list `tensorbord` via `report_to` in [`examples/fine-tuning/axolotl/config.yaml`](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/config.yaml), -you'll be able to access experiment metrics via `http://localhost:6006` (while the task is running). - ## Fleets > By default, `dstack run` reuses `idle` instances from one of the existing [fleets](https://dstack.ai/docs/fleets). diff --git a/examples/fine-tuning/axolotl/config.yaml b/examples/fine-tuning/axolotl/config.yaml index 5087bc434..7f3c08745 100644 --- a/examples/fine-tuning/axolotl/config.yaml +++ b/examples/fine-tuning/axolotl/config.yaml @@ -79,4 +79,4 @@ fsdp_config: special_tokens: pad_token: <|end_of_text|> -hub_model_id: chansung/axolotl_llama3_8b_fsdp_qlora \ No newline at end of file +hub_model_id: peterschmidt85/axolotl_llama3_8b_fsdp_qlora \ No newline at end of file diff --git a/examples/fine-tuning/axolotl/fleet.dstack.yml b/examples/fine-tuning/axolotl/fleet.dstack.yml index b3aefe6a9..0a10d67e1 100644 --- a/examples/fine-tuning/axolotl/fleet.dstack.yml +++ b/examples/fine-tuning/axolotl/fleet.dstack.yml @@ -2,14 +2,16 @@ type: fleet # The name is optional, if not specified, generated randomly name: axolotl-fleet +# Number of instances in fleet +nodes: 1 + # Use either spot or on-demand instances spot_policy: auto # Terminate the instance if not used for one hour termination_idle_time: 1h resources: - # Two or more 24GB GPUs (required by FSDP) - gpu: 24GB:2.. - -# Need one instance only -nodes: 1 \ No newline at end of file + # 24GB or more vRAM + memory: 24GB.. + # Two or more GPU (required by FSDP) + count: 2.. \ No newline at end of file diff --git a/examples/fine-tuning/axolotl/train.dstack.yaml b/examples/fine-tuning/axolotl/train.dstack.yaml index 9accbe5fc..b81c5fc8c 100644 --- a/examples/fine-tuning/axolotl/train.dstack.yaml +++ b/examples/fine-tuning/axolotl/train.dstack.yaml @@ -12,10 +12,10 @@ env: # Commands of the task commands: - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml -# Uncomment to access TensorBoard -#ports: -# - 6006 resources: - # Two or more 24GB GPUs (required by FSDP) - gpu: 24GB:2.. \ No newline at end of file + gpu: + # 24GB or more vRAM + memory: 24GB.. + # Two or more GPU (required by FSDP) + count: 2.. \ No newline at end of file diff --git a/examples/fine-tuning/qlora/train.dstack.yml b/examples/fine-tuning/qlora/train.dstack.yml index 029c472de..a51bb1ff0 100644 --- a/examples/fine-tuning/qlora/train.dstack.yml +++ b/examples/fine-tuning/qlora/train.dstack.yml @@ -1,5 +1,4 @@ type: task -# This task fine-tunes Llama 2 with QLoRA. 
Learn more at https://dstack.ai/examples/qlora/ python: "3.11" diff --git a/examples/fine-tuning/trl/.dstack.yml b/examples/fine-tuning/trl/.dstack.yml new file mode 100644 index 000000000..13685d624 --- /dev/null +++ b/examples/fine-tuning/trl/.dstack.yml @@ -0,0 +1,35 @@ +type: dev-environment +# The name is optional, if not specified, generated randomly +name: trl-vscode + +# If `image` is not specified, dstack uses its default image +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN + - ACCELERATE_LOG_LEVEL=info + - WANDB_API_KEY +# Uncomment if you want the environment to be pre-installed +#init: +# - conda install cuda +# - pip install flash-attn --no-build-isolation +# - pip install "transformers>=4.43.2" +# - pip install bitsandbytes +# - pip install peft +# - pip install wandb +# - git clone https://github.com/huggingface/trl +# - cd trl +# - pip install . + +ide: vscode + +# Use either spot or on-demand instances +spot_policy: auto + +resources: + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. \ No newline at end of file diff --git a/examples/fine-tuning/trl/train-distrib.dstack.yml b/examples/fine-tuning/trl/train-distrib.dstack.yml new file mode 100644 index 000000000..f17d42997 --- /dev/null +++ b/examples/fine-tuning/trl/train-distrib.dstack.yml @@ -0,0 +1,62 @@ +type: task +# The name is optional, if not specified, generated randomly +name: trl-train-distrib + +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN + - ACCELERATE_LOG_LEVEL=info + - WANDB_API_KEY +# Commands of the task +commands: + - conda install cuda + - pip install "transformers>=4.43.2" + - pip install bitsandbytes + - pip install flash-attn --no-build-isolation + - pip install peft + - pip install wandb + - git clone https://github.com/huggingface/trl + - cd trl + - pip install . + - accelerate launch + --config_file=examples/accelerate_configs/fsdp_qlora.yaml + --main_process_ip=$DSTACK_MASTER_NODE_IP + --main_process_port=8008 + --machine_rank=$DSTACK_NODE_RANK + --num_processes=$DSTACK_GPUS_NUM + --num_machines=$DSTACK_NODES_NUM + examples/scripts/sft.py + --model_name meta-llama/Meta-Llama-3.1-8B + --dataset_name OpenAssistant/oasst_top1_2023-08-25 + --dataset_text_field="text" + --per_device_train_batch_size 1 + --per_device_eval_batch_size 1 + --gradient_accumulation_steps 4 + --learning_rate 2e-4 + --report_to wandb + --bf16 + --max_seq_length 1024 + --lora_r 16 --lora_alpha 32 + --lora_target_modules q_proj k_proj v_proj o_proj + --load_in_4bit + --use_peft + --attn_implementation "flash_attention_2" + --logging_steps=10 + --output_dir models/llama31 + --hub_model_id peterschmidt85/FineLlama-3.1-8B + --torch_dtype bfloat16 + --use_bnb_nested_quant + +# Size of the cluster +nodes: 2 + +resources: + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. 
+ # Shared memory (for multi-gpu) + shm_size: 24GB \ No newline at end of file diff --git a/examples/fine-tuning/trl/train.dstack.yml b/examples/fine-tuning/trl/train.dstack.yml new file mode 100644 index 000000000..f965654ac --- /dev/null +++ b/examples/fine-tuning/trl/train.dstack.yml @@ -0,0 +1,51 @@ +type: task +# The name is optional, if not specified, generated randomly +name: trl-train + +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN + - ACCELERATE_LOG_LEVEL=info + - WANDB_API_KEY +# Commands of the task +commands: + - conda install cuda + - pip install "transformers>=4.43.2" + - pip install bitsandbytes + - pip install flash-attn --no-build-isolation + - pip install peft + - pip install wandb + - git clone https://github.com/huggingface/trl + - cd trl + - pip install . + - accelerate launch + --config_file=examples/accelerate_configs/multi_gpu.yaml + --num_processes $DSTACK_GPUS_PER_NODE + examples/scripts/sft.py + --model_name meta-llama/Meta-Llama-3.1-8B + --dataset_name OpenAssistant/oasst_top1_2023-08-25 + --dataset_text_field="text" + --per_device_train_batch_size 1 + --per_device_eval_batch_size 1 + --gradient_accumulation_steps 4 + --learning_rate 2e-4 + --report_to wandb + --bf16 + --max_seq_length 1024 + --lora_r 16 --lora_alpha 32 + --lora_target_modules q_proj k_proj v_proj o_proj + --load_in_4bit + --use_peft + --attn_implementation "flash_attention_2" + --logging_steps=10 + --output_dir models/llama31 + --hub_model_id peterschmidt85/FineLlama-3.1-8B + +resources: + gpu: + # 24GB or more vRAM + memory: 24GB.. + # One or more GPU + count: 1.. \ No newline at end of file diff --git a/examples/llms/llama31/.dstack.yml b/examples/llms/llama31/.dstack.yml new file mode 100644 index 000000000..b9782c82a --- /dev/null +++ b/examples/llms/llama31/.dstack.yml @@ -0,0 +1,20 @@ +type: dev-environment +# The name is optional, if not specified, generated randomly +name: llama31-vscode + +# If `image` is not specified, dstack uses its default image +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN +ide: vscode + +# Use either spot or on-demand instances +spot_policy: auto +# Uncomment to ensure it doesn't create a new fleet +#creation_policy: reuse + +resources: + # Required resources + gpu: 24GB diff --git a/examples/llms/llama31/fleet.dstack.yml b/examples/llms/llama31/fleet.dstack.yml new file mode 100644 index 000000000..51136e5cd --- /dev/null +++ b/examples/llms/llama31/fleet.dstack.yml @@ -0,0 +1,15 @@ +type: fleet +# The name is optional, if not specified, generated randomly +name: llama31-fleet + +# Need one instance only +nodes: 1 + +# Use either spot or on-demand instances +spot_policy: auto +# Terminate the instance if not used for one hour +termination_idle_time: 1h + +resources: + # Required resources + gpu: 24GB diff --git a/examples/llms/llama31/service.dstack.yml b/examples/llms/llama31/service.dstack.yml new file mode 100644 index 000000000..400d02379 --- /dev/null +++ b/examples/llms/llama31/service.dstack.yml @@ -0,0 +1,30 @@ +type: service +# The name is optional, if not specified, generated randomly +name: llama31-service + +# If `image` is not specified, dstack uses its base image +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN +commands: + - install vllm==0.5.3.post1 + - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096 +# Expose the vllm server port +port: 8000 + +# Use either spot or on-demand instances +spot_policy: auto +# 
Uncomment to ensure it doesn't create a new fleet +#creation_policy: reuse + +resources: + # Change to what is required + gpu: 24GB + +# Comment if you don't want to access the model via https://gateway. +model: + type: chat + name: meta-llama/Meta-Llama-3.1-8B-Instruct + format: openai \ No newline at end of file diff --git a/examples/llms/llama31/task.dstack.yml b/examples/llms/llama31/task.dstack.yml new file mode 100644 index 000000000..1a8516927 --- /dev/null +++ b/examples/llms/llama31/task.dstack.yml @@ -0,0 +1,24 @@ +type: task +name: llama31-task + +# If `image` is not specified, dstack uses its default image +python: "3.10" + +# Required environment variables +env: + - HUGGING_FACE_HUB_TOKEN +commands: + - pip install vllm==0.5.3.post1 + - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096 +# Expose the vllm server port +ports: + - 8000 + +# Use either spot or on-demand instances +spot_policy: auto +# Uncomment to ensure it doesn't create a new fleet +#creation_policy: reuse + +resources: + # Required resources + gpu: 24GB \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index f07d87d0e..a75e8240d 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -206,6 +206,7 @@ nav: - Volumes: docs/concepts/volumes.md - Guides: - Protips: docs/guides/protips.md + - dstack Sky: docs/guides/dstack-sky.md - Examples: docs/examples - Reference: - server/config.yml: docs/reference/server/config.yml.md