[Feature]: Support Runpod network volumes #1296

Closed
dinosaursarecool opened this issue Jun 2, 2024 · 6 comments · Fixed by #1420

@dinosaursarecool

Problem

In order to get the most out of Runpod deployments, it would be amazing to have support for things like network storage, selecting specific data centers, or specifying a template_id that includes a lot of the existing configuration.

The create_pod function in the api_client of the runpod backend accepts these parameters, namely template_id, data_center_id, and network_volume_id. However, when they are defined in a configuration, e.g. in example.dstack.yml:

type: task

spot_policy: auto
template_id: runpod-torch-v21
data_center_id: EU-RO-1

backends: [runpod]

dstack run . -f example.dstack.yml fails with:

3 validation errors for RunConfigurationRequest
__root__ -> TaskConfigurationRequest -> data_center_id
  extra fields not permitted (type=value_error.extra)
__root__ -> TaskConfigurationRequest -> template_id
  extra fields not permitted (type=value_error.extra)
__root__ -> TaskConfigurationRequest -> __root__
  Either `commands` or `image` must be set (type=value_error)

There are 2 problems with this:

  1. It appears the configuration values such as template_id, data_center_id, network_volume_id are not picked up as valid variables.
  2. On a philosophical level, there's a question of whether image or command should be required in the dstack task itself when a runpod template is used (i.e., there is a template_id reference), since that template already defines the image and command. My biased view is that the template should override what's in the dstack configuration, but either approach is workable, so it has little practical importance and probably comes down to what best fits the principles of the dstack architecture.
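
To make the two problems concrete, here is a minimal sketch of how a strict schema produces this kind of error list. It does not use dstack's real models: `TaskConfigSketch` and `validate` are hypothetical names, and the allowed fields are an assumption chosen only to mirror the example configuration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class TaskConfigSketch:
    """Hypothetical stand-in for a strict task schema (not dstack's real model)."""

    type: str
    spot_policy: Optional[str] = None
    backends: Optional[List[str]] = None
    commands: Optional[List[str]] = None
    image: Optional[str] = None


def validate(raw: Dict[str, Any]) -> List[str]:
    """Return human-readable validation errors for a raw config dict."""
    allowed = {f.name for f in fields(TaskConfigSketch)}
    # Problem 1: keys the schema does not declare are rejected outright.
    errors = [
        f"{key}: extra fields not permitted"
        for key in sorted(raw)
        if key not in allowed
    ]
    # Problem 2: the schema insists on commands or image, template or not.
    if not raw.get("commands") and not raw.get("image"):
        errors.append("Either `commands` or `image` must be set")
    return errors


raw_config = {
    "type": "task",
    "spot_policy": "auto",
    "template_id": "runpod-torch-v21",
    "data_center_id": "EU-RO-1",
    "backends": ["runpod"],
}
errors = validate(raw_config)
for error in errors:
    print(error)
```

With the example configuration this reports the two unknown keys plus the missing `commands`/`image` check, i.e. the same three complaints as the output above.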

Having support for (1) would be incredibly helpful, as it would enable network volume usage on runpod, which in turn enables using dstack for larger-scale deployments where downloading remote models for each instance is too expensive.

Solution

Add support for runpod variables to the dstack configuration.
Pass those variables to the runpod backend and the create_pod function.
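
A rough sketch of what the two steps above could look like, assuming the parsed configuration is available as a plain dict. `runpod_pod_options` and `create_pod_stub` are hypothetical helpers written for illustration; the stub only echoes its arguments and is not the backend's actual create_pod. Only the three parameter names come from this issue.

```python
from typing import Any, Dict, Optional

# Parameter names taken from the issue text; everything else is assumed.
RUNPOD_OPTION_KEYS = ("template_id", "data_center_id", "network_volume_id")


def runpod_pod_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the RunPod-specific options that are set in a parsed config."""
    return {
        key: config[key]
        for key in RUNPOD_OPTION_KEYS
        if config.get(key) is not None
    }


def create_pod_stub(
    name: str,
    template_id: Optional[str] = None,
    data_center_id: Optional[str] = None,
    network_volume_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for create_pod; it only echoes its arguments."""
    return {
        "name": name,
        "template_id": template_id,
        "data_center_id": data_center_id,
        "network_volume_id": network_volume_id,
    }


config = {
    "type": "task",
    "spot_policy": "auto",
    "template_id": "runpod-torch-v21",
    "data_center_id": "EU-RO-1",
    "backends": ["runpod"],
}
# Only the options present in the config are forwarded; the rest keep defaults.
request = create_pod_stub("example-pod", **runpod_pod_options(config))
print(request)
```

Forwarding only the keys that are set keeps the call backwards compatible for configurations that use none of these options.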

Workaround

None to my knowledge, but I recognize there's an open issue for general volume support, #1158, which would alleviate some of these pains. However, having support for these configuration variables in general seems like a quick win to increase runpod adoption.

Would you like to help us implement this feature by sending a PR?

No

@peterschmidt85
Contributor

@dinosaursarecool Thank you very much for the request.
Here are a few questions that may help us move forward with this:

  1. data_center_id. AFAIK, dstack supports this via regions:
type: task

spot_policy: auto
regions: [EU-RO-1]
backends: [runpod]
  2. network_volume_id: this feature is planned as part of [Feature]: Support dstack volumes #1158

First, we'll support AWS and GCP, and after that we're happy to support RunPod too!

  3. As to template_id, is there anything you need template_id for that dstack doesn't support? I wonder why you may need to use templates. You can specify everything via commands and your repo files. Please let me know!

@dinosaursarecool
Author

@peterschmidt85 Thanks, got it. Yeah, I think everything should be achievable through the current dstack configuration except for volumes. So if volume support is solved in #1158, then I can see how we could consider template support to be superfluous.

@peterschmidt85
Contributor

@dinosaursarecool Don't mind if we update the title/description of this issue to focus on just volumes with RunPod?

@dinosaursarecool changed the title from "[Feature]: Support Runpod custom arguments (template_id, network_volume_id, data_center_id, etc)" to "[Feature]: Support Runpod network volumes" on Jun 4, 2024
@dinosaursarecool
Author

@peterschmidt85 absolutely, updated the title

@r4victor mentioned this issue on Jun 28, 2024
@peterschmidt85
Contributor

This issue is stale because it has been open for 30 days with no activity.

@r4victor
Collaborator

@dinosaursarecool, the support for runpod network volumes is in master. Give it a try! It will be coming in the next 0.18.7 release within two weeks.
