Merged

44 commits
826ddde
Import gateway code, add README
Egor-S Dec 20, 2023
c676c5d
Test Store.register agains failures
Egor-S Dec 20, 2023
acecc6f
Refactor rollbacks with AsyncExitStack
Egor-S Dec 20, 2023
b3cc3f8
Register entrypoint
Egor-S Dec 20, 2023
e48a2b9
Disable SSH tunnel from the runner
Egor-S Dec 21, 2023
952dc31
Pick the latest runner build from the same branch
Egor-S Dec 21, 2023
c33a5d8
Fix get_latest_runner_build
Egor-S Dec 21, 2023
bc3bdd5
Add gateway TODOs
Egor-S Dec 22, 2023
15f1882
Add gateway-build job to build.yml
Egor-S Dec 22, 2023
8e512d9
Fix python version
Egor-S Dec 22, 2023
0ef4ac5
Fix dependencies
Egor-S Dec 22, 2023
f95fa88
Handle in get_latest_runner_build if commit is not fetched
Egor-S Dec 25, 2023
b33808b
Implement GatewayClient
Egor-S Dec 25, 2023
b187aff
Implement gateway tunnel
Egor-S Dec 25, 2023
430641e
Register a service on RUNNING status
Egor-S Dec 26, 2023
7a9cab7
Implement gateway preflight: check conflicts, issue SSL certificate
Egor-S Dec 26, 2023
1dfe5ff
Enable ServerAliveInterval
Egor-S Dec 26, 2023
54caec6
Add openai to service configuration
Egor-S Dec 27, 2023
d7d9d67
Pull tokenizer_config from HF repo
Egor-S Dec 27, 2023
0e193a1
Deploy gateway as systemd service
Egor-S Dec 28, 2023
e51ae1c
Remove www-data from gateway cloud-config
Egor-S Dec 28, 2023
8d669b2
Install dstack.gateway on gateway provisioning
Egor-S Dec 28, 2023
8f277ec
Fix gateway provisioning after testing
Egor-S Jan 8, 2024
719c0c9
Fix GCP gateway deletion
Egor-S Jan 8, 2024
9e92a01
Update server_names_hash_bucket_size on gateway creation
Egor-S Jan 8, 2024
49d7e7d
Fix GatewayComputeModel.ssh_public_key
Egor-S Jan 8, 2024
2e908c8
Update ARCHITECTURE.md
Egor-S Jan 8, 2024
4e7332b
Unregister service on instance termination
Egor-S Jan 8, 2024
11b27a3
Merge branch 'master' into issue-799-gateway-app
Egor-S Jan 9, 2024
f117098
Use gateway.<domain> for OpenAI entrypoint
Egor-S Jan 9, 2024
ee136ec
Add GatewayComputeModel.deleted flag
Egor-S Jan 9, 2024
b16fd54
Drop unused code
Egor-S Jan 9, 2024
ea7f36d
Update gateways on server start
Egor-S Jan 9, 2024
a7ecdb8
Build gateway in release
Egor-S Jan 9, 2024
7c443e4
Preserve gateway state on restart
Egor-S Jan 10, 2024
6e9dd20
Set ConnectTimeout on gateway update
Egor-S Jan 10, 2024
60b7fca
Attach only if provisioning was successful
Egor-S Jan 11, 2024
e4591f2
Merge branch 'master' into issue-799-gateway-app
Egor-S Jan 11, 2024
9ca201d
Add service_url to Run
Egor-S Jan 11, 2024
7232efc
Address review comments
Egor-S Jan 11, 2024
142393f
Bump gpuhunt version
Egor-S Jan 11, 2024
d3d164d
Use python 3.10 for dstack-gateway
Egor-S Jan 15, 2024
bc79fd9
Small gateway fixes
Egor-S Jan 15, 2024
cb1fce3
Add openai interface docs
Egor-S Jan 15, 2024
29 changes: 29 additions & 0 deletions .github/workflows/build.yml
@@ -175,3 +175,32 @@ jobs:
aws s3 cp configuration.json "s3://dstack-runner-downloads-stgn/latest/schemas/configuration.json" --acl public-read
aws s3 cp profiles.json "s3://dstack-runner-downloads-stgn/$VERSION/schemas/profiles.json" --acl public-read
aws s3 cp profiles.json "s3://dstack-runner-downloads-stgn/latest/schemas/profiles.json" --acl public-read

gateway-build:
runs-on: ubuntu-latest
defaults:
run:
working-directory: gateway
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.11
uses: actions/setup-python@v5
with:
python-version: 3.11
- name: Install AWS
run: pip install awscli
- name: Install dependencies
run: pip install wheel build
- name: Compute version
run: echo VERSION=$((${{ github.run_number }} + ${{ env.BUILD_INCREMENT }})) > $GITHUB_ENV
- name: Build package
run: |
echo "__version__ = \"${{ env.VERSION }}\"" > src/dstack/gateway/version.py
python -m build .
- name: Upload to S3
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
run: |
WHEEL=dstack_gateway-${{ env.VERSION }}-py3-none-any.whl
aws s3 cp dist/$WHEEL "s3://dstack-gateway-downloads/stgn/$WHEEL"
29 changes: 29 additions & 0 deletions .github/workflows/release.yml
@@ -216,3 +216,32 @@ jobs:
aws s3 cp configuration.json "s3://dstack-runner-downloads/latest/schemas/configuration.json" --acl public-read
aws s3 cp profiles.json "s3://dstack-runner-downloads/$VERSION/schemas/profiles.json" --acl public-read
aws s3 cp profiles.json "s3://dstack-runner-downloads/latest/schemas/profiles.json" --acl public-read

gateway-build:
runs-on: ubuntu-latest
defaults:
run:
working-directory: gateway
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.11
uses: actions/setup-python@v5
with:
python-version: 3.11
- name: Install AWS
run: pip install awscli
- name: Install dependencies
run: pip install wheel build
- name: Store version
run: echo VERSION=${GITHUB_REF#refs/tags/} > $GITHUB_ENV
- name: Build package
run: |
echo "__version__ = \"${{ env.VERSION }}\"" > src/dstack/gateway/version.py
python -m build .
- name: Upload to S3
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
run: |
WHEEL=dstack_gateway-${{ env.VERSION }}-py3-none-any.whl
aws s3 cp dist/$WHEEL "s3://dstack-gateway-downloads/release/$WHEEL"
11 changes: 9 additions & 2 deletions ARCHITECTURE.md
@@ -3,13 +3,14 @@

## Overview

The `dstack` platform consists of five major components:
The `dstack` platform consists of six major components:

* the server
* the Python API
* the CLI
* the runner
* the shim
* the gateway (optional)

The server provides an HTTP API for submitting runs and managing all of the `dstack` functionality including users, projects, backends, repos, secrets, and gateways.

@@ -19,6 +20,8 @@ When the server provisions a cloud instance for a run, it launches a Docker imag

The shim may be or may not be present depending on which type of cloud is used. If it's a GPU cloud that provides an API for running Docker images, then no shim is required. If it's a traditional cloud that provisions VMs, then the shim is started on the VM launch. It pulls and runs the Docker image, controls its execution, and implements any cloud-specific functionality such as terminating the instance.

The gateway makes jobs available via a public URL. It works like a reverse proxy that forwards requests to the job instance via an SSH tunnel.
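
The reverse-proxy arrangement can be pictured with a minimal nginx server block (a hypothetical sketch — the hostname and ports are illustrative, not taken from this PR): the service's public hostname is proxied to a gateway-local port that an SSH tunnel binds to the job's service port.

```nginx
server {
    listen 443 ssl;
    server_name myservice.example.com;  # illustrative service hostname

    location / {
        # 3000 is the gateway-local end of an SSH tunnel
        # that forwards to the service port on the job instance
        proxy_pass http://localhost:3000;
    }
}
```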

## Implementation of dstack run

When a user invokes `dstack run`, the CLI first sends the run configuration and other profile parameters to the server to get the run plan. The server iterates over configured backends to get all instance offers matching the requirements and their availability. If the user is willing to proceed with the offers suggested, the CLI uploads the code from the user's machine to the server and submits the run configuration.
@@ -77,4 +80,8 @@ The server is a FastAPI app backed by SQLite. The runner and shim are written i
* `_public` – the implementation of the high-level Python API
* `server` – the low-level Python API (a Python wrapper around server's HTTP API)
* `core/` – core Python API modules (e.g. `dstack` errors)
* `tests/`
* `gateway/src/dstack/gateway` – source code for the gateway application
* `openai/` – OpenAI API proxy
* `registry/` – gateway services registry
* `systemd/` – systemd service files
68 changes: 68 additions & 0 deletions docs/docs/guides/services.md
@@ -104,6 +104,74 @@ Once the service is deployed, its endpoint will be available at

[//]: # (TODO: Example)

### Enable OpenAI interface

To use your model via the OpenAI-compatible interface, extend the service configuration with a `model` section.

<div editor-title="mistral_openai.dstack.yml">

```yaml
type: service

image: ghcr.io/huggingface/text-generation-inference:1.3

env:
- MODEL_ID=TheBloke/Mistral-7B-OpenOrca-AWQ

port: 8000

commands:
- text-generation-launcher --hostname 0.0.0.0 --port 8000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096

model:
type: chat
name: TheBloke/Mistral-7B-OpenOrca-AWQ
format: tgi
```

</div>

!!! info "Experimental feature"
The OpenAI interface is an experimental feature.
Only TGI chat models are supported at the moment.
Streaming is not supported yet.

Run the configuration. Text Generation Inference requires a GPU with compute capability 8.0 or higher, e.g., L4 or A100.

<div class="termy">

```shell
$ dstack run . -f mistral_openai.dstack.yml --gpu L4
```

</div>

Once the service is deployed,
the OpenAI-compatible interface will be available at `https://gateway.<domain-name>` for all models deployed in the project.

The example below shows how to use the model with the `openai` Python library:

<div editor-title="mistral_complete.py">

```python
from openai import OpenAI

client = OpenAI(
base_url="https://gateway.<domain-name>",
api_key="none",
)
r = client.chat.completions.create(
model="TheBloke/Mistral-7B-OpenOrca-AWQ",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Three main ingredients of a burger in one sentence?"}
],
)
print(r)
```

</div>

What's next?

1. Check the [Text Generation Inference](../../learn/tgi.md) and [vLLM](../../learn/vllm.md) examples
29 changes: 29 additions & 0 deletions gateway/README.md
@@ -0,0 +1,29 @@
# dstack gateway

## Purpose

* Make dstack services available to the outside world
* Manage SSL certificates
* Manage nginx configs
* Establish SSH tunnels from gateway to dstack runner
* Proxy OpenAI API requests to different formats (e.g. TGI)

## Development

1. Build the wheel:
```shell
python -m build .
```
2. Upload the wheel:
```shell
scp dist/dstack_gateway-0.0.0-py3-none-any.whl ubuntu@${GATEWAY}:/tmp/
```
3. Install the wheel:
```shell
ssh ubuntu@${GATEWAY} "pip install --force-reinstall /tmp/dstack_gateway-0.0.0-py3-none-any.whl"
```
4. Run the tunnel and the gateway:
```shell
ssh -L 9001:localhost:8000 -t ubuntu@${GATEWAY} "uvicorn dstack.gateway.main:app"
```
5. Visit the gateway docs page at http://localhost:9001/docs
24 changes: 24 additions & 0 deletions gateway/pyproject.toml
@@ -0,0 +1,24 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "dstack-gateway"
authors = [
{ name = "dstack GmbH" },
]
requires-python = ">=3.10"
dynamic = ["version"]
dependencies = [
"fastapi",
"pydantic >=2.0.0",
"httpx",
"jinja2",
"uvicorn",
]

[tool.setuptools.package-data]
"dstack.gateway" = ["systemd/resources/*"]

[tool.setuptools.dynamic]
version = {attr = "dstack.gateway.version.__version__"}
Empty file added gateway/src/dstack/__init__.py
Empty file.
Empty file.
11 changes: 11 additions & 0 deletions gateway/src/dstack/gateway/common.py
@@ -0,0 +1,11 @@
import asyncio
import functools
from typing import Callable, ParamSpec, TypeVar

R = TypeVar("R")
P = ParamSpec("P")


async def run_async(func: Callable[P, R], *args: P.args, **kwargs: P.kwargs) -> R:
func_with_args = functools.partial(func, *args, **kwargs)
return await asyncio.get_running_loop().run_in_executor(None, func_with_args)
28 changes: 28 additions & 0 deletions gateway/src/dstack/gateway/errors.py
@@ -0,0 +1,28 @@
from fastapi import HTTPException


class GatewayError(Exception):
def http(self, code: int = 400, **kwargs) -> HTTPException:
return HTTPException(
code,
{
"error": self.__class__.__name__,
"message": str(self),
**kwargs,
},
)


class SSHError(GatewayError):
pass


class NotFoundError(HTTPException):
def __init__(self, message: str = "Not found", **kwargs):
super().__init__(
404,
{
"message": message,
**kwargs,
},
)
11 changes: 11 additions & 0 deletions gateway/src/dstack/gateway/logging.py
@@ -0,0 +1,11 @@
import logging


def configure_logging(level: int = logging.INFO):
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("dstack.gateway")
logger.setLevel(level)
logger.addHandler(handler)
43 changes: 43 additions & 0 deletions gateway/src/dstack/gateway/main.py
@@ -0,0 +1,43 @@
import logging
from contextlib import asynccontextmanager

import pydantic_core
from fastapi import FastAPI

import dstack.gateway.openai.store as openai_store
import dstack.gateway.version
from dstack.gateway.logging import configure_logging
from dstack.gateway.openai.routes import router as openai_router
from dstack.gateway.registry.routes import router as registry_router
from dstack.gateway.services.persistent import save_persistent_state
from dstack.gateway.services.store import get_store


@asynccontextmanager
async def lifespan(app: FastAPI):
store = get_store()
openai = openai_store.get_store()
await store.subscribe(openai)
yield

async with store._lock, store.nginx._lock, openai._lock:
# Store the state between restarts
save_persistent_state(
pydantic_core.to_json(
{
"store": store,
"openai": openai,
}
)
)


configure_logging(logging.DEBUG)
app = FastAPI(lifespan=lifespan)
app.include_router(registry_router, prefix="/api/registry")
app.include_router(openai_router, prefix="/api/openai")


@app.get("/")
def get_info():
return {"version": dstack.gateway.version.__version__}
Empty file.
18 changes: 18 additions & 0 deletions gateway/src/dstack/gateway/openai/clients/__init__.py
@@ -0,0 +1,18 @@
from abc import ABC, abstractmethod
from typing import AsyncIterator

from dstack.gateway.openai.schemas import (
ChatCompletionsChunk,
ChatCompletionsRequest,
ChatCompletionsResponse,
)


class ChatCompletionsClient(ABC):
@abstractmethod
async def generate(self, request: ChatCompletionsRequest) -> ChatCompletionsResponse:
pass

@abstractmethod
async def stream(self, request: ChatCompletionsRequest) -> AsyncIterator[ChatCompletionsChunk]:
pass
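
As a hypothetical sketch of how a concrete client could satisfy this interface — the request/response types below are simplified stand-ins, not the real pydantic models from `dstack.gateway.openai.schemas`, and `EchoClient` is a toy implementation for illustration only:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import AsyncIterator


# Simplified stand-ins for the real schemas
@dataclass
class ChatCompletionsRequest:
    messages: list


@dataclass
class ChatCompletionsResponse:
    content: str


@dataclass
class ChatCompletionsChunk:
    delta: str


class ChatCompletionsClient(ABC):
    @abstractmethod
    async def generate(self, request: ChatCompletionsRequest) -> ChatCompletionsResponse: ...

    @abstractmethod
    async def stream(self, request: ChatCompletionsRequest) -> AsyncIterator[ChatCompletionsChunk]: ...


class EchoClient(ChatCompletionsClient):
    """Toy client that echoes the last message back."""

    async def generate(self, request):
        return ChatCompletionsResponse(content=request.messages[-1])

    async def stream(self, request):
        # An async generator satisfies the AsyncIterator contract.
        for word in request.messages[-1].split():
            yield ChatCompletionsChunk(delta=word)


async def main():
    client = EchoClient()
    resp = await client.generate(ChatCompletionsRequest(messages=["hello world"]))
    chunks = [c.delta async for c in client.stream(ChatCompletionsRequest(messages=["hello world"]))]
    return resp.content, chunks


result, chunks = asyncio.run(main())
print(result)  # hello world
print(chunks)  # ['hello', 'world']
```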