diff --git a/contributing/BACKENDS.md b/contributing/BACKENDS.md
index 416110e67..719d8473b 100644
--- a/contributing/BACKENDS.md
+++ b/contributing/BACKENDS.md
@@ -1,96 +1,165 @@
-# How to add a new backend
+# How to add a Backend to dstack.ai
+## Introduction
-## Overview
+Welcome to the Integration Guide for adding a backend by integrating new cloud providers into gpuhunt and extending the capabilities of dstack.
+This document is designed to assist developers and contributors in integrating additional cloud computing resources into dstack.
+
+
+## Overview of Steps
 1. Add cloud provider to `gpuhunt`
-   1. Add `src/gpuhunt/providers/.py`
-   2. Define class attribute `NAME` and implement
-2. Add Backend, Compute, and configuration models in `dstack`
+2. Integrating a Cloud Provider into dstackai/dstack
-## dstackai/gpuhunt
+## Adding a cloud provider to dstackai/gpuhunt
+To integrate a new cloud provider into `gpuhunt`, follow these steps:
-Clone and open https://github.com/dstackai/gpuhunt. Create `Provider` class
-in `src/gpuhunt/providers/.py`.
+1. **Clone the Repository**: Start by cloning the `gpuhunt` repository from GitHub:
+```bash
+git clone https://github.com/dstackai/gpuhunt.git
+```
+2. **Create the Provider Class**: Navigate to the `providers` directory and create a new Python file for your provider:
+- Path: `src/gpuhunt/providers/.py`
+- Replace `` with the name of your cloud provider.
-Your class must inherit `AbstractProvider`, have `NAME` class variable, and implement `get` method. Use
-optional `query_filter` to speed up the query. Use `balance_resources` if your backend provides fine-grained control on
-resources like RAM and CPU to prevent under-optimal configurations (i.e., A100 80GB with 1 GB of RAM).
+3. **Implement the Provider Class**: Your class should meet the following criteria:
-`get` method is called during catalog generation for `offline` providers and every query for `online` providers.
+- **Inherit from `AbstractProvider`**: Ensure your class extends the `AbstractProvider` base class.
+    ```python
+    from gpuhunt.providers import AbstractProvider
-> There are two types of providers in `gpuhunt`:
->1. `offline` — providers that take a lot of time to get all offers. A catalog is precomputed and stored as csv file
->2. `online` — providers that take a few seconds to get all offers. 
A catalog is computed in a real-time as needed
+    class Provider(AbstractProvider):
+    ```
-If your provider is `offline`, also add data quality tests to `src/integrity_tests/test_.py` to verify
-generated csv files before publication.
+- **Define the `NAME` Class Variable**: This should be a unique identifier for your provider.
-## dstackai/dstack
+    ```python
+    NAME = '_name'
+    ```
-Clone and open https://github.com/dstackai/dstack. Follow `CONTRIBUTING.md` to setup your environment.
+- **Implement the `get` Method**: This method is responsible for fetching information about the GPU resources available from your cloud provider. Implement it according to the `AbstractProvider` interface.
-Add your dependencies to `setup.py` in a separate `` section. Also, update `all` section.
+    ```python
+    def get(self, query_filter: Optional[QueryFilter] = None, balance_resources: bool = True) -> List[RawCatalogItem]:
+        ...  # Implementation here
+    ```
+- **Utilize `query_filter`**: (Optional) Use this parameter to speed up the query process by filtering results early on.
-Add a new enum entry `BackendType.` at `src/dstack/_internal/core/models/backends/base.py`.
+- **Use `balance_resources`**: If your backend offers fine-grained control over resources (like RAM and CPU), use this parameter to prevent suboptimal configurations, such as pairing a high-end GPU with insufficient RAM (e.g., A100 80GB with 1 GB of RAM).
-Create `src/dstack/_internal/core/backends/` directory:
+4. **Understand Provider Types**:
+- `gpuhunt` distinguishes between two types of providers:
+    1. **`offline`**: These providers take a significant amount of time to retrieve all offers. A catalog is precomputed and stored as a CSV file.
+    2. **`online`**: These providers can fetch all offers within a few seconds. A catalog is computed in real time as needed.
-- Implement `YourProviderBackend` in `__init__.py`, inherit it from `BaseBackend`.
-    - Define the `TYPE` class variable. 
-- Implement `Compute` in `compute.py`, and inherit it from `Compute`.
-    - Implement `get_offers`. It will be called every time the user wants to provision something. Add availability
-      information if possible.
-    - Implement `run_job`. Here you create a compute resource and run `dstack-shim` or `dstack-runner`.
-    - Implement `terminate_instance`. This method should not raise an error, if there is no such instance.
-- Implement `Config` in `config.py`, inherit it from `BackendConfig` and `StoredConfig`.
-  This config is accepted by `Backend` class.
-> There are two types of compute in `dstask`:
->1. `dockerized: False` — the backend runs `dstack-shim`. Later, `dstack-shim` will create a job container
-   with `dstack-runner` in it. This is common for VM.
->2. `dockerized: True` — the backend runs `dstack-runner` inside a docker container.
+5. **Data Quality Tests for Offline Providers**:
+- If your provider is classified as `offline`, you should add data quality tests to ensure the integrity of the precomputed CSV files. These tests are located in:
+    ```
+    src/integrity_tests/test_.py
+    ```
+- Replace `` with the name of your cloud provider. These tests verify the generated CSV files before publication to ensure accuracy and reliability.
-> Note, that the Compute class interface is subject to changes with the coming pools feature release.
-Create configuration models in `src/dstack/_internal/core/models/backends/.py`. `ConfigInfo`
-contains everything except for the credentials. You may have multiple models for credentials (i.e., default
-credentials & explicit credentials). Create a model with creds: `ConfigInfoWithCreds`. Create a model with
-all fields being optional: `ConfigInfoWithCredsPartial`. Create a model representing UI elements for
-configurator: `ConfigValues`.
+## Integrating a Cloud Provider into dstackai/dstack
-Import all created models to `src/dstack/_internal/core/models/backends/__init__.py`. 
+Integrating a new cloud provider into `dstack` involves several key steps, from setting up your development environment to implementing specific backend configurations. Here’s how to proceed:
-Implement `Configurator`
-in `src/dstack/_internal/server/services/backends/configurators/.py`
+### Setup and Initial Configuration
-Add `Config` in `src/dstack/_internal/server/services/config.py`. This model represents the YAML
-configuration.
+1. **Clone the `dstack` Repository**: Begin by cloning the `dstack` repository from GitHub:
-Add safe import for your backend in `src/dstack/_internal/server/services/backends/__init__.py`. Update expected
-backends in tests in `src/tests/_internal/server/routers/test_backends.py`.
+```bash
+git clone https://github.com/dstackai/dstack.git
+```
-## Appendix
+2. **Follow Setup Instructions**: Consult the `CONTRIBUTING.md` document within the repository for instructions on setting up your development environment.
+
+### Modifying `setup.py`
+
+1. **Add Dependencies**: Add any dependencies required by your cloud provider to `setup.py`. Create a separate section named `` for these dependencies, and be sure to update the `all` section to include them.
+
+### Extending Backend Models
+
+1. **Add Backend Type**: Insert a new enumeration entry for your backend in `src/dstack/_internal/core/models/backends/base.py`:
+
+```python
+ = ''
+```
+2. **Create Provider Directory**: Create a new directory at `src/dstack/_internal/core/backends/` to house your provider’s backend and compute implementations.
+
+
+3. **Backend Implementation:**
+In `__init__.py`, implement `Backend`, inheriting from `BaseBackend`. Define the `TYPE` class variable to associate your backend with the newly added enum entry.
+
+4. **Compute Implementation:**
+In `compute.py`, develop `Compute`, inheriting from `Compute`.
+
+You'll need to implement the following methods:
+  - `get_offers`: It is called every time the user wants to provision something. Add availability information if possible.
+  - `run_job`: Here you create a compute resource and run `dstack-shim` or `dstack-runner`.
+  - `terminate_instance`: This method should not raise an error if there is no such instance.
+
+5. **Configuration Implementation**:
+- Implement the `Config` class in `config.py`, inheriting from both `BackendConfig` and `StoredConfig`. This configuration is accepted by the `Backend` class.
+
+
+### Configuration Models
+1. **Create Configuration Models:**
-### Adding VM compute backend
+You may have multiple models for credentials (e.g., default credentials & explicit credentials).
+In `src/dstack/_internal/core/models/backends/.py`, create models for your provider's configuration:
+- `ConfigInfo`: create a model with all configuration details except credentials.
+- `ConfigInfoWithCreds`: create a model with credentials.
+- `ConfigInfoWithCredsPartial`: create a model with all fields optional.
+- `ConfigValues`: create a model representing UI elements for the configurator.
-`dstack` expects the following features from your backend:
+2. **Import Models:**
+Ensure all new models are imported into `src/dstack/_internal/core/models/backends/__init__.py`.
+
+### Finalizing Integration
+1. **Implement Configurator:**
+Develop `Configurator` in `src/dstack/_internal/server/services/backends/configurators/.py`.
+
+2. **Add YAML Configuration Model:**
+Insert `Config` in `src/dstack/_internal/server/services/config.py` to represent the provider’s configuration in YAML.
+
+3. 
**Ensure Safe Import:**
+Add a safe import for your backend in `src/dstack/_internal/server/services/backends/__init__.py` and update the expected backends in the tests in `src/tests/_internal/server/routers/test_backends.py`.
+
+
+
+
+
+## Appendix
+### Adding VM Compute Backend
+dstack expects VM backends to have:
 - Ubuntu 22.04 LTS
 - Nvidia Drivers 535
 - Docker with Nvidia runtime
 - OpenSSH server
 - External IP & 1 port for SSH (any)
-- cloud-init script (preferable)
+- cloud-init script (preferred)
 - API for creating and terminating instances
-To accelerate provisioning — we prebuild VM images with necessary dependencies. You can find configurations
-in `packer/`.
+To speed up provisioning, we prebuild VM images with the necessary dependencies; the configurations are available in `packer/`.
-### Adding Docker-only compute backend
+Examples: `aws`, `azure`, `gcp`, etc.
-`dstack` expects the following features from your backend:
+### Adding Docker-only Compute Backend
+For Docker-only backends, dstack requires:
 - Docker with Nvidia runtime
 - External IP & 1 port for SSH (any)
 - Container entrypoint override (~2KB)
-- API for creating and terminating containers
\ No newline at end of file
+- API for creating and terminating containers
+
+Examples: `kubernetes`, `vastai`, etc.
+
+Note: There are two types of compute in dstack:
+
+- `dockerized: False`: the backend runs `dstack-shim`. This setup is common for VMs.
+- `dockerized: True`: the backend runs `dstack-runner` directly inside a Docker container.
+
+The Compute class interface may change with the upcoming pools feature release, so keep an eye out for updates.
+