[Feature]: Support dstack volumes #1158

Open
r4victor opened this issue Apr 23, 2024 · 4 comments

@r4victor
Collaborator

r4victor commented Apr 23, 2024

Problem

Currently, dstack lacks built-in functionality that would allow users to persist data between runs. Cloud providers usually provide data persistence via network volumes. The proposal is to introduce volumes to dstack.

  • There should be a way to register an existing volume, such as an existing EBS volume, with dstack (aka external volumes).
  • Users may not need to create volumes themselves, so dstack should be able to create and manage volumes (aka dstack-managed volumes).

Solution

Users will manage volumes via dstack apply. A new dstack apply configuration of type volume will be introduced. Here are some examples of volume configurations:

  • AWS EBS dstack-managed volume

    type: volume
    name: my-dstack-aws-volume
    backend: aws
    region: eu-west-1
    size: 500GB
    
  • AWS EBS external volume

    type: volume
    name: my-external-aws-volume
    backend: aws
    region: eu-west-1
    volume_id: vol23at3432
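
As a rough illustration of how the two kinds of configuration differ, here is a minimal Python sketch. `VolumeConfig` and `kind` are hypothetical names and not part of dstack. The sketch assumes that a dstack-managed volume is identified by `size`, that an external volume is identified by `volume_id`, and that exactly one of the two is given; the proposal does not state that rule explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeConfig:
    """Hypothetical in-memory form of a `type: volume` configuration."""

    name: str
    backend: str
    region: str
    size: Optional[str] = None       # e.g. "500GB"; set for dstack-managed volumes
    volume_id: Optional[str] = None  # set for external (pre-existing) volumes

    def kind(self) -> str:
        """Classify the volume.

        Assumption: exactly one of `size` and `volume_id` must be present.
        """
        if (self.size is None) == (self.volume_id is None):
            raise ValueError("specify exactly one of `size` or `volume_id`")
        return "external" if self.volume_id is not None else "dstack-managed"


managed = VolumeConfig("my-dstack-aws-volume", "aws", "eu-west-1", size="500GB")
external = VolumeConfig(
    "my-external-aws-volume", "aws", "eu-west-1", volume_id="vol23at3432"
)
```

With this sketch, `managed.kind()` classifies the first example as dstack-managed and `external.kind()` classifies the second as external.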
    

Once a volume has been added to dstack, it can be mounted in a run like this:

type: task
commands:
  - ...
volumes:
  - name: volume-name
    path: /dstack_data

dstack will try to provision the instance in the backend/region of that volume. In case of no availability, the run will fail – other backends/regions won’t be tried since the specified volume cannot be mounted there.
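
The placement rule described above could be sketched roughly as follows. This is only an illustration under assumed names: `Offer`, `NoCapacityError`, and `offers_for_volume` are hypothetical and do not reflect dstack's actual internals. The idea is that candidate instance offers are narrowed to the volume's backend and region, and the run fails rather than falling back elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    """Hypothetical instance offer that a backend could provision."""

    backend: str
    region: str
    instance_type: str


class NoCapacityError(Exception):
    """Raised when no offer exists where the volume lives."""


def offers_for_volume(
    offers: List[Offer], volume_backend: str, volume_region: str
) -> List[Offer]:
    """Keep only the offers co-located with the volume.

    Other backends/regions are never used as a fallback, because the
    volume cannot be mounted there.
    """
    matching = [
        o for o in offers
        if o.backend == volume_backend and o.region == volume_region
    ]
    if not matching:
        raise NoCapacityError(
            f"no offers available in {volume_backend}/{volume_region}"
        )
    return matching
```

For a volume in `aws`/`eu-west-1`, an offer in `gcp` or in `aws`/`us-east-1` is dropped even if it is the only one with capacity, which is why the run fails in that case.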

Scope and Future plans

We'll start by implementing volume support for the AWS backend. Other backends such as GCP, Azure, OCI, and RunPod are likely to follow.

dstack will implement default volume configurations for all backends to allow for simple volume configurations like the ones above. Users may also need to specify advanced configurations to make use of backend-specific features. This can be done by introducing a spec property whose type differs per volume storage option, such as EBS, Persistent Disks, Hyperdisks, etc.:

type: volume
name: my-dstack-aws-advanced-volume
backend: aws
region: eu-west-1
size: 500GB
spec:
  type: ebs
  volume_type: io2
  iops: 256000
  az: eu-west-1a
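
One possible way to model a `spec` property that varies by storage option is a tagged union keyed on `type`. The sketch below is hypothetical: `EbsSpec`, `SPEC_TYPES`, and `parse_spec` are illustrative names, only the `ebs` type and its three fields come from the example above, and which fields are optional is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class EbsSpec:
    """Hypothetical EBS-specific options taken from the example above."""

    volume_type: str
    iops: Optional[int] = None  # assumed optional
    az: Optional[str] = None    # assumed optional


# Registry mapping the `type` tag to a spec class. Other storage options
# (e.g. Persistent Disks) would register their own classes here.
SPEC_TYPES = {"ebs": EbsSpec}


def parse_spec(raw: Dict[str, Any]) -> Any:
    """Build a typed spec object from the raw `spec` mapping."""
    fields = dict(raw)  # copy so the caller's mapping is not mutated
    spec_type = fields.pop("type", None)
    if spec_type not in SPEC_TYPES:
        raise ValueError(f"unsupported spec type: {spec_type!r}")
    return SPEC_TYPES[spec_type](**fields)


spec = parse_spec(
    {"type": "ebs", "volume_type": "io2", "iops": 256000, "az": "eu-west-1a"}
)
```

Keeping the backend-specific fields behind a `type` tag lets the simple top-level configuration stay identical across backends.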

The following is not planned for the initial volume release but may be added later:

  • Volumes pricing.
  • Volumes support in dstack Sky (requires pricing).
  • Different volume kinds such as GCP’s Hyperdisks and Persistent Disks.
  • Volumes support for Tensordock and VastAI. These backends have no explicit support for volumes. Persistence can only be achieved via instance stopping/restarting.
  • Volumes support for Kubernetes. It is not clear which volume types to support or how to do that.
  • Attaching volumes to multiple instances.
@JosvanderWesthuizen

Would you be able to use the aws volume with other backends? (e.g. gcp or tensorboard)

@peterschmidt85
Contributor

Would you be able to use the aws volume with other backends? (e.g. gcp or tensorboard)

@JosvanderWesthuizen Volumes can be used only within the same provider (and within the same region). That's how volumes work.
Replicating volume data across providers and regions is possible in theory via a custom script.

@peterschmidt85
Contributor

@r4victor BTW, would be great to update the issue description to reflect the plans to use dstack apply and remove what is not relevant anymore.

@r4victor
Collaborator Author

Updated issue description with dstack apply-based approach.

r4victor self-assigned this Jun 18, 2024