2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ Backends can be set up in `~/.dstack/server/config.yml` or through the [project

For more details, see [Backends](https://dstack.ai/docs/concepts/backends).

- > When using `dstack` with on-prem servers, backend configuration isn’t required. Simply create [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh) once the server is up.
+ > When using `dstack` with on-prem servers, backend configuration isn’t required. Simply create [SSH fleets](https://dstack.ai/docs/concepts/fleets#ssh-fleets) once the server is up.

##### Start the server

4 changes: 2 additions & 2 deletions docs/blog/posts/amd-on-tensorwave.md
@@ -15,7 +15,7 @@ to orchestrate AI containers with any AI cloud vendor, whether they provide on-d

In this tutorial, we’ll walk you through how `dstack` can be used with
[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
- [SSH fleets](../../docs/concepts/fleets.md#ssh).
+ [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets).

<img src="https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png" width="630"/>

@@ -235,6 +235,6 @@ Want to see how it works? Check out the video below:
<iframe width="750" height="520" src="https://www.youtube.com/embed/b1vAgm5fCfE?si=qw2gYHkMjERohdad&rel=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

!!! info "What's next?"
- 1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
+ 1. See [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets)
2. Read about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
2 changes: 1 addition & 1 deletion docs/blog/posts/benchmark-amd-containers-and-partitions.md
@@ -122,7 +122,7 @@ The full, reproducible steps are available in our GitHub repository. Below is a

#### Creating a fleet

- We first defined a `dstack` [SSH fleet](../../docs/concepts/fleets.md#ssh) to manage the two-node cluster.
+ We first defined a `dstack` [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets) to manage the two-node cluster.

```yaml
type: fleet
# ... (remaining lines collapsed in the diff)
```
4 changes: 2 additions & 2 deletions docs/blog/posts/gh200-on-lambda.md
@@ -11,7 +11,7 @@ categories:
# Supporting ARM and NVIDIA GH200 on Lambda

The latest update to `dstack` introduces support for NVIDIA GH200 instances on [Lambda](../../docs/concepts/backends.md#lambda)
- and enables ARM-powered hosts, including GH200 and GB200, with [SSH fleets](../../docs/concepts/fleets.md#ssh).
+ and enables ARM-powered hosts, including GH200 and GB200, with [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets).

<img src="https://dstack.ai/static-assets/static-assets/images/dstack-arm--gh200-lambda-min.png" width="630"/>

@@ -78,7 +78,7 @@ $ dstack apply -f .dstack.yml
!!! info "Retry policy"
Note, if GH200s are not available at the moment, you can specify the [retry policy](../../docs/concepts/dev-environments.md#retry-policy) in your run configuration so that `dstack` can run the configuration once the GPU becomes available.

- > If you have GH200 or GB200-powered hosts already provisioned via Lambda, another cloud provider, or on-prem, you can now use them with [SSH fleets](../../docs/concepts/fleets.md#ssh).
+ > If you have GH200 or GB200-powered hosts already provisioned via Lambda, another cloud provider, or on-prem, you can now use them with [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets).

!!! info "What's next?"
1. Sign up with [Lambda :material-arrow-top-right-thin:{ .external }](https://cloud.lambda.ai/sign-up?_gl=1*1qovk06*_gcl_au*MTg2MDc3OTAyOS4xNzQyOTA3Nzc0LjE3NDkwNTYzNTYuMTc0NTQxOTE2MS4xNzQ1NDE5MTYw*_ga*MTE2NDM5MzI0My4xNzQyOTA3Nzc0*_ga_43EZT1FM6Q*czE3NDY3MTczOTYkbzM0JGcxJHQxNzQ2NzE4MDU2JGo1NyRsMCRoMTU0Mzg1NTU1OQ..){:target="_blank"}
2 changes: 1 addition & 1 deletion docs/blog/posts/gpu-health-checks.md
@@ -55,7 +55,7 @@ For active checks today, you can run [NCCL tests](../../examples/clusters/nccl-t

## Supported backends

- Passive GPU health checks work on AWS (except with custom `os_images`), Azure (except A10 GPUs), GCP, OCI, and [SSH fleets](../../docs/concepts/fleets.md#ssh) where DCGM is installed and configured for background checks.
+ Passive GPU health checks work on AWS (except with custom `os_images`), Azure (except A10 GPUs), GCP, OCI, and [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets) where DCGM is installed and configured for background checks.

> Fleets created before version 0.19.22 need to be recreated to enable this feature.

4 changes: 2 additions & 2 deletions docs/blog/posts/instance-volumes.md
@@ -41,8 +41,8 @@ resources:

<!-- more -->

- > Instance volumes work with both [SSH fleets](../../docs/concepts/fleets.md#ssh)
- > and [cloud fleets](../../docs/concepts/fleets.md#cloud), and it is possible to mount any folders on the instance,
+ > Instance volumes work with both [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets)
+ > and [cloud fleets](../../docs/concepts/fleets.md#backend-fleets), and it is possible to mount any folders on the instance,
> whether they are regular folders or NFS share mounts.

The configuration above mounts `/root/.dstack/cache` on the instance to `/root/.cache` inside container.
2 changes: 1 addition & 1 deletion docs/blog/posts/intel-gaudi.md
@@ -44,7 +44,7 @@ machines equipped with Intel Gaudi accelerators.
## Create a fleet

To manage container workloads on on-prem machines with Intel Gaudi accelerators, start by configuring an
- [SSH fleet](../../docs/concepts/fleets.md#ssh). Here’s an example configuration for your fleet:
+ [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets). Here’s an example configuration for your fleet:

<div editor-title="examples/misc/fleets/gaudi.dstack.yml">

2 changes: 1 addition & 1 deletion docs/blog/posts/kubernetes-beta.md
@@ -299,7 +299,7 @@ VM-based backends also offer more granular control over cluster provisioning.

### SSH fleets vs Kubernetes backend

- If you’re using on-prem servers and Kubernetes isn’t a requirement, [SSH fleets](../../docs/concepts/fleets.md#ssh) may be simpler.
+ If you’re using on-prem servers and Kubernetes isn’t a requirement, [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets) may be simpler.
They provide a lightweight and flexible alternative.

### AMD GPUs
2 changes: 1 addition & 1 deletion docs/blog/posts/nebius.md
@@ -103,7 +103,7 @@ $ dstack apply -f .dstack.yml
The new `nebius` backend supports CPU and GPU instances, [fleets](../../docs/concepts/fleets.md),
[distributed tasks](../../docs/concepts/tasks.md#distributed-tasks), and more.

- > Support for [network volumes](../../docs/concepts/volumes.md#network) and accelerated cluster
+ > Support for [network volumes](../../docs/concepts/volumes.md#network-volumes) and accelerated cluster
interconnects is coming soon.

!!! info "What's next?"
2 changes: 1 addition & 1 deletion docs/blog/posts/prometheus.md
@@ -49,7 +49,7 @@ For a full list of available metrics and labels, check out [Metrics](../../docs/

??? info "NVIDIA"
NVIDIA DCGM metrics are automatically collected for `aws`, `azure`, `gcp`, and `oci` backends,
-     as well as for [SSH fleets](../../docs/concepts/fleets.md#ssh).
+     as well as for [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets).

To ensure NVIDIA DCGM metrics are collected from SSH fleets, ensure the `datacenter-gpu-manager-4-core`,
`datacenter-gpu-manager-4-proprietary`, and `datacenter-gpu-manager-exporter` packages are installed on the hosts.
4 changes: 2 additions & 2 deletions docs/docs/concepts/backends.md
@@ -1157,12 +1157,12 @@ Also, the `vastai` backend supports on-demand instances only. Spot instance supp
## On-prem

In on-prem environments, the [Kubernetes](#kubernetes) backend can be used if a Kubernetes cluster is already set up and configured.
- However, often [SSH fleets](../concepts/fleets.md#ssh) are a simpler and lighter alternative.
+ However, often [SSH fleets](../concepts/fleets.md#ssh-fleets) are a simpler and lighter alternative.

### SSH fleets

SSH fleets require no backend configuration.
- All you need to do is [provide hostnames and SSH credentials](../concepts/fleets.md#ssh), and `dstack` sets up a fleet that can orchestrate container-based runs on your servers.
+ All you need to do is [provide hostnames and SSH credentials](../concepts/fleets.md#ssh-fleets), and `dstack` sets up a fleet that can orchestrate container-based runs on your servers.

SSH fleets support the same features as [VM-based](#vm-based) backends.
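
For context: a minimal SSH fleet definition might look like the sketch below; the fleet name, SSH user, key path, and host addresses are hypothetical.

```yaml
type: fleet
# Hypothetical fleet name
name: on-prem-fleet

# Existing on-prem servers reachable over SSH
ssh_config:
  user: ubuntu                  # assumed SSH user with sudo access
  identity_file: ~/.ssh/id_rsa  # assumed private key path
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```

Applying this file with `dstack apply -f` registers the hosts as a fleet that runs can then reuse.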

4 changes: 2 additions & 2 deletions docs/docs/concepts/fleets.md
@@ -9,7 +9,7 @@ Fleets act both as pools of instances and as templates for how those instances a

When you run `dstack apply` to start a dev environment, task, or service, `dstack` will reuse idle instances from an existing fleet whenever available.

- ## Backend fleets { #backend-fleets }
+ ## Backend fleets

If you configured [backends](backends.md), `dstack` can provision fleets on the fly.
However, it’s recommended to define fleets explicitly.
@@ -269,7 +269,7 @@ retry:
[`max_price`](../reference/dstack.yml/fleet.md#max_price), and
among [others](../reference/dstack.yml/fleet.md).

- ## SSH fleets { #ssh-fleets }
+ ## SSH fleets

If you have a group of on-prem servers accessible via SSH, you can create an SSH fleet.

12 changes: 6 additions & 6 deletions docs/docs/concepts/volumes.md
@@ -4,12 +4,12 @@ Volumes enable data persistence between runs of dev environments, tasks, and ser

`dstack` supports two kinds of volumes:

- * [Network volumes](#network) &mdash; provisioned via backends and mounted to specific container directories.
+ * [Network volumes](#network-volumes) &mdash; provisioned via backends and mounted to specific container directories.
Ideal for persistent storage.
- * [Instance volumes](#instance) &mdash; bind directories on the host instance to container directories.
+ * [Instance volumes](#instance-volumes) &mdash; bind directories on the host instance to container directories.
Useful as a cache for cloud fleets or for persistent storage with SSH fleets.

- ## Network volumes { #network }
+ ## Network volumes

Network volumes are currently supported for the `aws`, `gcp`, and `runpod` backends.
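
For context: a network volume is itself created from a `dstack` configuration. A sketch, with the name, backend, region, and size as assumed values:

```yaml
type: volume
# Hypothetical volume name
name: my-volume
backend: aws
region: eu-west-1
size: 100GB
```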

@@ -222,7 +222,7 @@ If you've registered an existing volume, it will be de-registered with `dstack`
??? info "Can I attach network volumes to multiple runs or instances?"
You can mount a volume in multiple runs. This feature is currently supported only by the `runpod` backend.

- ## Instance volumes { #instance }
+ ## Instance volumes

Instance volumes allow mapping any directory on the instance where the run is executed to any path inside the container.
This means that the data in instance volumes is persisted only if the run is executed on the same instance.
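
For context: an instance volume is declared in the run configuration as a `host-path:container-path` mapping; the run name and paths below are hypothetical.

```yaml
type: task
name: train  # hypothetical run name
commands:
  - python train.py
volumes:
  # Persists the host's /dstack-cache directory into the container's cache path
  - /dstack-cache:/root/.cache
```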
@@ -257,7 +257,7 @@ Since persistence isn't guaranteed (instances may be interrupted or runs may occ
volumes only for caching or with directories manually mounted to network storage.

!!! info "Backends"
-     Instance volumes are currently supported for all backends except `runpod`, `vastai` and `kubernetes`, and can also be used with [SSH fleets](fleets.md#ssh).
+     Instance volumes are currently supported for all backends except `runpod`, `vastai` and `kubernetes`, and can also be used with [SSH fleets](fleets.md#ssh-fleets).

??? info "Optional volumes"
If the volume is not critical for your workload, you can mark it as `optional`.
@@ -297,7 +297,7 @@ volumes:

### Use instance volumes with SSH fleets

- If you control the instances (e.g. they are on-prem servers configured via [SSH fleets](fleets.md#ssh)),
+ If you control the instances (e.g. they are on-prem servers configured via [SSH fleets](fleets.md#ssh-fleets)),
you can mount network storage (e.g., NFS or SMB) and use the mount points as instance volumes.

For example, if you mount a network storage to `/mnt/nfs-storage` on all hosts of your SSH fleet,
8 changes: 4 additions & 4 deletions docs/docs/guides/clusters.md
@@ -8,13 +8,13 @@ Ensure a fleet is created before you run any distributed task. This can be eithe

### SSH fleets

- [SSH fleets](../concepts/fleets.md#ssh) can be used to create a fleet out of existing baremetals or VMs, e.g. if they are already pre-provisioned, or set up on-premises.
+ [SSH fleets](../concepts/fleets.md#ssh-fleets) can be used to create a fleet out of existing baremetals or VMs, e.g. if they are already pre-provisioned, or set up on-premises.

> For SSH fleets, fast interconnect is supported provided that the hosts are pre-configured with the appropriate interconnect drivers.

### Cloud fleets

- [Cloud fleets](../concepts/fleets.md#cloud) allow to provision interconnected clusters across supported backends.
+ [Cloud fleets](../concepts/fleets.md#backend-fleets) allow to provision interconnected clusters across supported backends.
For cloud fleets, fast interconnect is currently supported only on the `aws`, `gcp`, `nebius`, and `runpod` backends.
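
For context: a backend (cloud) fleet requesting cluster placement might be sketched as follows; the fleet name, node count, backend, and GPU spec are assumptions.

```yaml
type: fleet
name: my-cluster    # hypothetical fleet name
nodes: 2
placement: cluster  # provision nodes on the same low-latency network
backends: [aws]
resources:
  gpu: H100:8       # assumed GPU requirement
```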

=== "AWS"
@@ -68,7 +68,7 @@ To test the interconnect of a created fleet, ensure you run [NCCL](../../example

### Instance volumes

- [Instance volumes](../concepts/volumes.md#instance) enable mounting any folder from the host into the container, allowing data persistence during distributed tasks.
+ [Instance volumes](../concepts/volumes.md#instance-volumes) enable mounting any folder from the host into the container, allowing data persistence during distributed tasks.

Instance volumes can be used to mount:

@@ -77,7 +77,7 @@ Instance volumes can be used to mount:

### Network volumes

- Currently, no backend supports multi-attach [network volumes](../concepts/volumes.md#network) for distributed tasks. However, single-attach volumes can be used by leveraging volume name [interpolation syntax](../concepts/volumes.md#distributed-tasks). This approach mounts a separate single-attach volume to each node.
+ Currently, no backend supports multi-attach [network volumes](../concepts/volumes.md#network-volumes) for distributed tasks. However, single-attach volumes can be used by leveraging volume name [interpolation syntax](../concepts/volumes.md#distributed-tasks). This approach mounts a separate single-attach volume to each node.

!!! info "What's next?"
1. Read about [distributed tasks](../concepts/tasks.md#distributed-tasks), [fleets](../concepts/fleets.md), and [volumes](../concepts/volumes.md)
2 changes: 1 addition & 1 deletion docs/docs/guides/kubernetes.md
@@ -111,4 +111,4 @@ For more details on clusters, see the [corresponding guide](clusters.md).

If your priority is orchestrating cloud GPUs and Kubernetes isn’t a must, [VM-based](../concepts/backends.md#vm-based) backends are a better fit thanks to their native cloud integration.

- For on-prem GPUs where Kubernetes is optional, [SSH fleets](../concepts/fleets.md#ssh) provide a simpler and more lightweight alternative.
+ For on-prem GPUs where Kubernetes is optional, [SSH fleets](../concepts/fleets.md#ssh-fleets) provide a simpler and more lightweight alternative.
2 changes: 1 addition & 1 deletion docs/docs/guides/metrics.md
@@ -44,7 +44,7 @@ In addition to the essential metrics available via the CLI and UI, `dstack` expo

??? info "NVIDIA DCGM"
NVIDIA DCGM metrics are automatically collected for `aws`, `azure`, `gcp`, and `oci` backends,
-     as well as for [SSH fleets](../concepts/fleets.md#ssh).
+     as well as for [SSH fleets](../concepts/fleets.md#ssh-fleets).

To ensure NVIDIA DCGM metrics are collected from SSH fleets, ensure the `datacenter-gpu-manager-4-core`,
`datacenter-gpu-manager-4-proprietary`, and `datacenter-gpu-manager-exporter` packages are installed on the hosts.
6 changes: 3 additions & 3 deletions docs/docs/guides/protips.md
@@ -215,11 +215,11 @@ To change the default idle duration, set
## Volumes

To persist data across runs, it is recommended to use volumes.
- `dstack` supports two types of volumes: [network](../concepts/volumes.md#network)
+ `dstack` supports two types of volumes: [network](../concepts/volumes.md#network-volumes)
(for persisting data even if the instance is interrupted)
- and [instance](../concepts/volumes.md#instance) (useful for persisting cached data across runs while the instance remains active).
+ and [instance](../concepts/volumes.md#instance-volumes) (useful for persisting cached data across runs while the instance remains active).

- > If you use [SSH fleets](../concepts/fleets.md#ssh), you can mount network storage (e.g., NFS or SMB) to the hosts and access it in runs via instance volumes.
+ > If you use [SSH fleets](../concepts/fleets.md#ssh-fleets), you can mount network storage (e.g., NFS or SMB) to the hosts and access it in runs via instance volumes.
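
For context: with NFS mounted at an assumed path `/mnt/nfs-storage` on every fleet host, a run could persist data through an instance volume, as in this sketch:

```yaml
type: dev-environment
ide: vscode
volumes:
  # /mnt/nfs-storage is an assumed NFS mount point on the fleet hosts
  - /mnt/nfs-storage/datasets:/workspace/datasets
```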

## Environment variables

2 changes: 1 addition & 1 deletion docs/docs/guides/server-deployment.md
@@ -80,7 +80,7 @@ The server loads this file on startup.
Alternatively, you can configure backends on the [project settings page](../concepts/projects.md#backends) via UI.

> For using `dstack` with on-prem servers, no backend configuration is required.
- > Use [SSH fleets](../concepts/fleets.md#ssh) instead.
+ > Use [SSH fleets](../concepts/fleets.md#ssh-fleets) instead.

## State persistence

10 changes: 5 additions & 5 deletions docs/docs/guides/troubleshooting.md
@@ -38,7 +38,7 @@ Below are some of the reasons why this might happen.
#### Cause 1: No capacity providers

Before you can run any workloads, you need to configure a [backend](../concepts/backends.md),
- create an [SSH fleet](../concepts/fleets.md#ssh), or sign up for
+ create an [SSH fleet](../concepts/fleets.md#ssh-fleets), or sign up for
[dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
If you have configured a backend and still can't use it, check the output of `dstack server`
for backend configuration errors.
@@ -93,7 +93,7 @@ Examples: `gpu: amd` (one AMD GPU), `gpu: A10:4..8` (4 to 8 A10 GPUs),

#### Cause 6: Network volumes

- If your run configuration uses [network volumes](../concepts/volumes.md#network),
+ If your run configuration uses [network volumes](../concepts/volumes.md#network-volumes),
`dstack` will only select instances from the same backend and region as the volumes.
For AWS, the availability zone of the volume and the instance should also match.

@@ -102,8 +102,8 @@ For AWS, the availability zone of the volume and the instance should also match.
Some `dstack` features are not supported by all backends. If your configuration uses
one of these features, `dstack` will only select offers from the backends that support it.

- - [Cloud fleet](../concepts/fleets.md#cloud) configurations,
-   [Instance volumes](../concepts/volumes.md#instance),
+ - [Backend fleets](../concepts/fleets.md#backend-fleets) configurations,
+   [Instance volumes](../concepts/volumes.md#instance-volumes),
and [Privileged containers](../reference/dstack.yml/dev-environment.md#privileged)
are supported by all backends except `runpod`, `vastai`, and `kubernetes`.
- [Clusters](../concepts/fleets.md#cloud-placement)
@@ -120,7 +120,7 @@ If you are using
you will not see marketplace offers until you top up your balance.
Alternatively, you can configure your own cloud accounts
on the [project settings page](../concepts/projects.md#backends)
- or use [SSH fleets](../concepts/fleets.md#ssh).
+ or use [SSH fleets](../concepts/fleets.md#ssh-fleets).

### Provisioning fails

2 changes: 1 addition & 1 deletion docs/docs/installation/index.md
@@ -15,7 +15,7 @@ Backends can be set up in `~/.dstack/server/config.yml` or through the [project
For more details, see [Backends](../concepts/backends.md).

??? info "SSH fleets"
-     When using `dstack` with on-prem servers, backend configuration isn’t required. Simply create [SSH fleets](../concepts/fleets.md#ssh) once the server is up.
+     When using `dstack` with on-prem servers, backend configuration isn’t required. Simply create [SSH fleets](../concepts/fleets.md#ssh-fleets) once the server is up.

### Start the server

30 changes: 15 additions & 15 deletions docs/layouts/custom.yml
@@ -50,17 +50,17 @@ size: { width: 1200, height: 630 }
layers:
- background:
color: "black"
- - size: { width: 44, height: 44 }
-   offset: { x: 970, y: 521 }
+ - size: { width: 50, height: 50 }
+   offset: { x: 935, y: 521 }
background:
image: *logo
- - size: { width: 300, height: 42 }
-   offset: { x: 1018, y: 525 }
+ - size: { width: 340, height: 55 }
+   offset: { x: 993, y: 521 }
typography:
content: *site_name
color: "white"
- - size: { width: 850, height: 320 }
-   offset: { x: 80, y: 115 }
+ - size: { width: 1000, height: 220 }
+   offset: { x: 80, y: 280 }
typography:
content: *page_title
overflow: shrink
@@ -69,15 +69,15 @@
line:
amount: 3
height: 1.25
- - size: { width: 850, height: 64 }
-   offset: { x: 80, y: 495 }
-   typography:
-     content: *page_description
-     align: start
-     color: "white"
-     line:
-       amount: 2
-       height: 1.5
+ # - size: { width: 850, height: 64 }
+ #   offset: { x: 80, y: 495 }
+ #   typography:
+ #     content: *page_description
+ #     align: start
+ #     color: "white"
+ #     line:
+ #       amount: 2
+ #       height: 1.5

tags:
