Restructure of 'How-to-add-a backend' contribution guide #917
Merged
4 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,96 +1,165 @@ | ||
| # How to add a new backend | ||
| # How to add a Backend to dstack.ai | ||
| ## Introduction | ||
|
|
||
| ## Overview | ||
| Welcome to the Integration Guide for adding a backend by integrating new cloud providers into gpuhunt and extending the capabilities of dstack.<br> | ||
| This document is designed to assist developers and contributors in integrating additional cloud computing resources into dstack. | ||
|
|
||
|
|
||
| ## Overview of Steps | ||
|
|
||
| 1. Add cloud provider to `gpuhunt` | ||
| 1. Add `src/gpuhunt/providers/<name>.py` | ||
| 2. Define class attribute `NAME` and implement | ||
| 2. Add Backend, Compute, and configuration models in `dstack` | ||
| 2. Integrating a Cloud Provider into dstackai/dstack | ||
|
|
||
| ## dstackai/gpuhunt | ||
| ## Adding a cloud provider to dstackai/gpuhunt | ||
| To integrate a new cloud provider into `gpuhunt`, follow these steps: | ||
|
|
||
| Clone and open https://github.com/dstackai/gpuhunt. Create `<YourName>Provider` class | ||
| in `src/gpuhunt/providers/<yourprovider>.py`. | ||
| 1. **Clone the Repository**: Start by cloning the `gpuhunt` repository from GitHub: | ||
| ```bash | ||
| git clone https://github.com/dstackai/gpuhunt.git | ||
| ``` | ||
| 2. **Create the Provider Class**: Navigate to the `providers` directory and create a new Python file for your provider: | ||
| - Path: `src/gpuhunt/providers/<YourProvider>.py` | ||
| - Replace `<YourProvider>` with the name of your cloud provider. | ||
|
|
||
| Your class must inherit from `AbstractProvider`, define the `NAME` class variable, and implement the `get` method. Use the | ||
| optional `query_filter` to speed up the query. Use `balance_resources` if your backend provides fine-grained control over | ||
| resources like RAM and CPU, to prevent suboptimal configurations (e.g., A100 80GB with 1 GB of RAM). | ||
| 3. **Implement the Provider Class**: Your class should meet the following criteria: | ||
|
|
||
| `get` method is called during catalog generation for `offline` providers and every query for `online` providers. | ||
| - **Inherit from `AbstractProvider`**: Ensure your class extends the `AbstractProvider` base class. | ||
| ```python | ||
| from gpuhunt.providers import AbstractProvider | ||
|
|
||
| > There are two types of providers in `gpuhunt`: | ||
| >1. `offline`: providers that take a lot of time to get all offers. A catalog is precomputed and stored as a CSV file | ||
| >2. `online`: providers that take a few seconds to get all offers. A catalog is computed in real time as needed | ||
| class <YourName>Provider(AbstractProvider): | ||
| ``` | ||
|
|
||
| If your provider is `offline`, also add data quality tests to `src/integrity_tests/test_<yourprovider>.py` to verify | ||
| generated csv files before publication. | ||
| - **Define the `NAME` Class Variable**: This should be a unique identifier for your provider. | ||
|
|
||
| ## dstackai/dstack | ||
| ```python | ||
| NAME = '<YourProvider>_name' | ||
| ``` | ||
|
|
||
| Clone and open https://github.com/dstackai/dstack. Follow `CONTRIBUTING.md` to setup your environment. | ||
| - **Implement the `get` Method**: This method is responsible for fetching the available GPU resources information from your cloud provider. Implement it according to the `AbstractProvider` interface. | ||
|
|
||
| Add your dependencies to `setup.py` in a separate `<yourprovider>` section. Also, update `all` section. | ||
| ```python | ||
| def get(self, query_filter: Optional[QueryFilter] = None, balance_resources: bool = True) -> List[RawCatalogItem]: | ||
| # Implementation here | ||
| ``` | ||
| - **Utilize `query_filter`**: (Optional) Use this parameter to speed up the query process by filtering results early on. | ||
|
|
||
| Add a new enum entry `BackendType.<YourBackend>` at `src/dstack/_internal/core/models/backends/base.py`. | ||
| - **Use `balance_resources`**: If your backend offers detailed control over resources (like RAM and CPU), use this parameter to prevent suboptimal configurations, such as pairing a high-end GPU with insufficient RAM (e.g., A100 80GB with 1 GB of RAM). | ||
|
|
||
| Create `src/dstack/_internal/core/backends/<yourprovider>` directory: | ||
| 4. **Understand Provider Types**: | ||
| - `gpuhunt` distinguishes between two types of providers: | ||
| 1. **`offline`**: These providers take a significant amount of time to retrieve all offers. A catalog is precomputed and stored as a CSV file. | ||
| 2. **`online`**: These providers can fetch all offers within a few seconds. A catalog is computed in real-time as needed. | ||
|
|
||
| - Implement `YourProviderBackend` in `__init__.py`, inherit it from `BaseBackend`. | ||
| - Define the `TYPE` class variable. | ||
| - Implement `<YourProvider>Compute` in `compute.py`, and inherit it from `Compute`. | ||
| - Implement `get_offers`. It will be called every time the user wants to provision something. Add availability | ||
| information if possible. | ||
| - Implement `run_job`. Here you create a compute resource and run `dstack-shim` or `dstack-runner`. | ||
| - Implement `terminate_instance`. This method should not raise an error, if there is no such instance. | ||
| - Implement `<YourProvider>Config` in `config.py`, inherit it from `BackendConfig` and `<YourProvider>StoredConfig`. | ||
| This config is accepted by `<YourProvider>Backend` class. | ||
|
|
||
| > There are two types of compute in `dstack`: | ||
| >1. `dockerized: False` — the backend runs `dstack-shim`. Later, `dstack-shim` will create a job container | ||
| with `dstack-runner` in it. This is common for VM. | ||
| >2. `dockerized: True` — the backend runs `dstack-runner` inside a docker container. | ||
| 5. **Data Quality Tests for Offline Providers**: | ||
| - If your provider is classified as `offline`, you should add data quality tests to ensure the integrity of the precomputed CSV files. These tests are located in: | ||
| ``` | ||
| src/integrity_tests/test_<YourProvider>.py | ||
| ``` | ||
| - Replace `<YourProvider>` with the name of your cloud provider. These tests verify the generated CSV files before publication to ensure accuracy and reliability. | ||
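The provider steps above can be sketched as follows. This is a minimal illustration, not the `gpuhunt` implementation: `QueryFilter`, `RawCatalogItem`, and `AbstractProvider` are simplified local stand-ins so the example runs without `gpuhunt` installed, and `ExampleProvider`, its hard-coded offers, and the "16 GB of RAM per GPU" balancing rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins so the sketch runs on its own. A real provider would
# use the corresponding gpuhunt classes, which carry more fields than these.
@dataclass
class QueryFilter:
    min_gpu_count: Optional[int] = None


@dataclass
class RawCatalogItem:
    instance_name: str
    gpu_name: str
    gpu_count: int
    memory_gb: float
    price: float


class AbstractProvider:
    NAME: str = "abstract"

    def get(self, query_filter=None, balance_resources=True):
        raise NotImplementedError


class ExampleProvider(AbstractProvider):
    NAME = "example"  # unique identifier for the provider

    # Hard-coded offers stand in for a call to the cloud provider's API.
    _OFFERS = [
        RawCatalogItem("small", "A100", 1, 1.0, 1.10),
        RawCatalogItem("medium", "A100", 1, 80.0, 1.50),
        RawCatalogItem("large", "A100", 8, 640.0, 12.00),
    ]

    def get(
        self,
        query_filter: Optional[QueryFilter] = None,
        balance_resources: bool = True,
    ) -> List[RawCatalogItem]:
        offers = list(self._OFFERS)
        # Filter early so fewer offers need further processing.
        if query_filter is not None and query_filter.min_gpu_count is not None:
            offers = [o for o in offers if o.gpu_count >= query_filter.min_gpu_count]
        if balance_resources:
            # Assumed rule for illustration: require at least 16 GB RAM per GPU.
            offers = [o for o in offers if o.memory_gb >= 16 * o.gpu_count]
        return offers
```

With `balance_resources` enabled, the 1 GB "small" offer is dropped, which mirrors the "A100 80GB with 1 GB of RAM" case the guide warns about.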
|
|
||
| > Note that the Compute class interface is subject to change with the upcoming pools feature release. | ||
|
|
||
| Create configuration models in `src/dstack/_internal/core/models/backends/<yourprovider>.py`. `<YourProvider>ConfigInfo` | ||
| contains everything except for the credentials. You may have multiple models for credentials (i.e., default | ||
| credentials & explicit credentials). Create a model with creds: `<YourProvider>ConfigInfoWithCreds`. Create a model with | ||
| all fields being optional: `<YourProvider>ConfigInfoWithCredsPartial`. Create a model representing UI elements for | ||
| configurator: `<YourProvider>ConfigValues`. | ||
| ## Integrating a Cloud Provider into dstackai/dstack | ||
|
|
||
| Import all created models to `src/dstack/_internal/core/models/backends/__init__.py`. | ||
| Integrating a new cloud provider into `dstack` involves several key steps, from setting up your development environment to implementing specific backend configurations. Here’s how to proceed: | ||
|
|
||
| Implement `<YourProvider>Configurator` | ||
| in `src/dstack/_internal/server/services/backends/configurators/<yourprovider>.py` | ||
| ### Setup and Initial Configuration | ||
|
|
||
| Add `<YourProvider>Config` in `src/dstack/_internal/server/services/config.py`. This model represents the YAML | ||
| configuration. | ||
| 1. **Clone the `dstack` Repository**: Begin by cloning the `dstack` repository from GitHub: | ||
|
|
||
| Add safe import for your backend in `src/dstack/_internal/server/services/backends/__init__.py`. Update expected | ||
| backends in tests in `src/tests/_internal/server/routers/test_backends.py`. | ||
| ```bash | ||
| git clone https://github.com/dstackai/dstack.git | ||
| ``` | ||
|
|
||
| ## Appendix | ||
| 2. **Follow Setup Instructions**: Consult the `CONTRIBUTING.md` document within the repository for instructions on setting up your development environment. | ||
|
|
||
| ### Modifying `setup.py` | ||
|
|
||
| 1. **Add Dependencies**: Add any dependencies required by your cloud provider to `setup.py`. Create a separate section named `<YourProvider>` for these dependencies, and update the `all` section to include them. | ||
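As a rough sketch of the dependency step, assuming the "sections" in `setup.py` are setuptools extras passed via `extras_require`: the package names and variable names below are hypothetical placeholders, not entries from the actual `dstack` `setup.py`.

```python
# Hypothetical dependency lists; replace with the SDK your provider needs.
EXAMPLE_DEPS = ["example-cloud-sdk>=1.0"]
OTHER_BACKEND_DEPS = ["some-existing-backend-dep"]

# The provider gets its own section, and "all" is updated to include it.
extras_require = {
    "example": EXAMPLE_DEPS,
    "all": OTHER_BACKEND_DEPS + EXAMPLE_DEPS,
}
```

Building `all` from the per-provider lists keeps the two sections from drifting apart.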
|
|
||
| ### Extending Backend Models | ||
|
|
||
| 1. **Add Backend Type**: Insert a new enumeration entry for your backend in `src/dstack/_internal/core/models/backends/base.py`: | ||
|
|
||
| ```python | ||
| <YOURBACKEND> = '<your_backend>' | ||
| ``` | ||
| 2. **Create Provider Directory**: Create a new directory at `src/dstack/_internal/core/backends/<YourProvider>` to house your provider’s backend and compute implementations. | ||
|
|
||
|
|
||
| 3. **Backend Implementation:** | ||
| In `__init__.py`, implement `<YourProvider>Backend`, inheriting from `BaseBackend`. Define the `TYPE` class variable to associate your backend with the newly added enum entry. | ||
|
|
||
| 4. **Compute Implementation:** | ||
| In `compute.py`, develop `<YourProvider>Compute`, inheriting from `Compute`.<br> | ||
|
|
||
| You'll need to implement methods like | ||
| - `get_offers`: Called every time the user wants to provision something. Add availability information if possible. | ||
| - `run_job`: Creates a compute resource and runs `dstack-shim` or `dstack-runner`. | ||
| - `terminate_instance`: Must not raise an error if there is no such instance. | ||
|
|
||
| 5. **Configuration Implementation**: | ||
| - Implement the `<YourProvider>Config` class in `config.py`, inheriting from both `BackendConfig` and `<YourProvider>StoredConfig`. This configuration is accepted by the `<YourProvider>Backend` class. | ||
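The backend and compute steps above might look roughly like this. It is a self-contained sketch, not the `dstack` interface: the real `BaseBackend` and `Compute` base classes have their own method signatures, so `ExampleBackend` and `ExampleCompute` here are plain classes, and the offer dictionaries, the in-memory instance registry, and the `instance_id` format are assumptions made for the example.

```python
import itertools
from enum import Enum
from typing import Dict, List


class BackendType(str, Enum):
    # Stand-in for the enum in models/backends/base.py, with one new entry.
    EXAMPLE = "example"


class ExampleCompute:
    """Stand-in for a class that would inherit from dstack's Compute."""

    def __init__(self) -> None:
        self._instances: Dict[str, str] = {}  # instance_id -> instance type
        self._ids = itertools.count(1)

    def get_offers(self) -> List[dict]:
        # Called whenever the user wants to provision something.
        # Availability is included where the provider reports it.
        return [{"instance": "gpu-small", "price": 1.5, "available": True}]

    def run_job(self, instance_type: str) -> str:
        # A real backend would create the resource through the provider API
        # and start dstack-shim or dstack-runner on it.
        instance_id = f"i-{next(self._ids)}"
        self._instances[instance_id] = instance_type
        return instance_id

    def terminate_instance(self, instance_id: str) -> None:
        # Idempotent: a missing instance is not an error.
        self._instances.pop(instance_id, None)


class ExampleBackend:
    """Stand-in for a class that would inherit from dstack's BaseBackend."""

    TYPE = BackendType.EXAMPLE  # ties the backend to the new enum entry

    def __init__(self) -> None:
        self._compute = ExampleCompute()

    def compute(self) -> ExampleCompute:
        return self._compute
```

Using `dict.pop` with a default is one simple way to satisfy the requirement that `terminate_instance` does not raise when the instance is already gone.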
|
|
||
|
|
||
| ### Configuration Models | ||
| 1. **Create Configuration Models:** | ||
|
|
||
| ### Adding VM compute backend | ||
| You may have multiple models for credentials (e.g., default credentials and explicit credentials). | ||
| In `src/dstack/_internal/core/models/backends/<YourProvider>.py`, create models for your provider's configuration: | ||
| - `<YourProvider>ConfigInfo`: a model with all configuration details except credentials. | ||
| - `<YourProvider>ConfigInfoWithCreds`: a model with credentials. | ||
| - `<YourProvider>ConfigInfoWithCredsPartial`: a model with all fields optional. | ||
| - `<YourProvider>ConfigValues`: a model representing UI elements for the configurator. | ||
|
|
||
| `dstack` expects the following features from your backend: | ||
| 2. **Import Models:** | ||
| Ensure all new models are imported into `src/dstack/_internal/core/models/backends/__init__.py`. | ||
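To illustrate how the configuration models relate to each other, here is a minimal sketch using standard-library dataclasses. In the repository these would use whatever model base class `dstack` already uses; the `regions` and `api_key` fields and the `ExampleAPIKeyCreds` class are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExampleAPIKeyCreds:
    # One possible credentials model; a provider may define several.
    api_key: str


@dataclass
class ExampleConfigInfo:
    # Everything except credentials.
    regions: List[str]


@dataclass
class ExampleConfigInfoWithCreds(ExampleConfigInfo):
    # The same settings plus credentials.
    creds: ExampleAPIKeyCreds


@dataclass
class ExampleConfigInfoWithCredsPartial:
    # All fields optional, for a configuration that is still being filled in.
    regions: Optional[List[str]] = None
    creds: Optional[ExampleAPIKeyCreds] = None
```

Deriving the credentialed model from the credential-free one keeps the shared fields defined in a single place.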
|
|
||
| ### Finalizing Integration | ||
| 1. **Implement Configurator:** | ||
| Develop `<YourProvider>Configurator` in `src/dstack/_internal/server/services/backends/configurators/<YourProvider>.py`. | ||
|
|
||
| 2. **Add YAML Configuration Model:** | ||
| Insert `<YourProvider>Config` in `src/dstack/_internal/server/services/config.py` to represent the provider’s configuration in YAML. | ||
|
|
||
| 3. **Ensure Safe Import:** | ||
| Add a safe import for your backend in `src/dstack/_internal/server/services/backends/__init__.py` and update expected backends in tests within `src/tests/_internal/server/routers/test_backends.py`. | ||
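A "safe import" is assumed here to mean guarding the import with `try`/`except ImportError`, so that a missing optional dependency disables one backend instead of breaking the whole server. The helper below is a hypothetical illustration of that pattern, not code from the `dstack` repository.

```python
import importlib
from types import ModuleType
from typing import Iterable, List


def safe_import_all(module_names: Iterable[str]) -> List[ModuleType]:
    """Import each module, skipping those whose dependencies are missing."""
    available = []
    for name in module_names:
        try:
            available.append(importlib.import_module(name))
        except ImportError:
            # The backend's optional dependencies are not installed; skip it.
            continue
    return available
```

For example, `safe_import_all(["json", "no_such_backend_module"])` returns only the `json` module, because the second name cannot be imported.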
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
| ## Appendix | ||
Egor-S marked this conversation as resolved.
|
||
| ### Adding VM Compute Backend | ||
| dstack expects VM backends to have: | ||
|
|
||
| - Ubuntu 22.04 LTS | ||
| - Nvidia Drivers 535 | ||
| - Docker with Nvidia runtime | ||
| - OpenSSH server | ||
| - External IP & 1 port for SSH (any) | ||
| - cloud-init script (preferable) | ||
| - cloud-init script (preferred) | ||
| - API for creating and terminating instances | ||
|
|
||
| To accelerate provisioning — we prebuild VM images with necessary dependencies. You can find configurations | ||
| in `packer/`. | ||
| To speed up provisioning, we prebuild VM images with necessary dependencies, available in `packer/`. | ||
|
|
||
| ### Adding Docker-only compute backend | ||
| Examples: `aws`, `azure`, `gcp`, etc. | ||
|
|
||
| `dstack` expects the following features from your backend: | ||
| ### Adding Docker-only Compute Backend | ||
| For Docker-only backends, dstack requires: | ||
|
|
||
| - Docker with Nvidia runtime | ||
| - External IP & 1 port for SSH (any) | ||
| - Container entrypoint override (~2KB) | ||
| - API for creating and terminating containers | ||
| - API for creating and terminating containers | ||
|
|
||
| Examples: `kubernetes`, `vastai`, etc. | ||
|
|
||
| Note: There are two types of compute in dstack: | ||
|
|
||
| - `dockerized: False`: the backend runs `dstack-shim`. This setup is common for VMs. | ||
| - `dockerized: True`: the backend directly runs `dstack-runner` inside a Docker container. | ||
|
|
||
| The Compute class interface may undergo changes with the upcoming pools feature release, so keep an eye out for updates. | ||
|
|
||