Merged
README.md (48 changes: 13 additions & 35 deletions)
@@ -1,6 +1,10 @@
-# LMStack
+<p align="center">
+  <img src="docs/LMStack-light.png" alt="LMStack" height="80">
+</p>

-[中文文檔](README_zh-TW.md)
+<p align="center">
+  <a href="README_zh-TW.md">中文文檔</a>
+</p>

 LLM Deployment Management Platform - Deploy and manage Large Language Models on distributed GPU workers.

@@ -48,29 +52,14 @@ docker compose -f docker-compose.deploy.yml up -d
 - Frontend: http://localhost:3000
 - Backend API: http://localhost:52000

-### Start Worker (on GPU machine)
-
-```bash
-docker run -d \
-  --name lmstack-worker \
-  --gpus all \
-  --privileged \
-  -p 52001:52001 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  -v ~/.cache/huggingface:/root/.cache/huggingface \
-  -v /:/host:ro \
-  -e BACKEND_URL=http://YOUR_SERVER_IP:52000 \
-  -e WORKER_NAME=gpu-worker-01 \
-  infinirc/lmstack-worker:latest
-```
-
 ### Usage

 1. Login with `admin` / `admin` (change password after first login)
-2. Check **Workers** page - workers auto-register
-3. Add model in **Models** page
-4. Create deployment in **Deployments** page
-5. Use OpenAI-compatible API:
+2. Go to **Workers** page and click **Add Worker** to get the Docker command
+3. Run the Docker command on your GPU machine to register a worker
+4. Add model in **Models** page
+5. Create deployment in **Deployments** page
+6. Use OpenAI-compatible API:

 ```bash
 curl http://localhost:52000/v1/chat/completions \
@@ -96,21 +85,10 @@ Build and run Docker images locally:

 # Run locally built backend + frontend
 docker compose -f docker-compose.local.yml up -d
-
-# Run locally built worker (on GPU machine)
-docker run -d \
-  --name lmstack-worker \
-  --gpus all \
-  --privileged \
-  -p 52001:52001 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  -v ~/.cache/huggingface:/root/.cache/huggingface \
-  -v /:/host:ro \
-  -e BACKEND_URL=http://YOUR_SERVER_IP:52000 \
-  -e WORKER_NAME=gpu-worker-01 \
-  infinirc/lmstack-worker:local
 ```

+Then go to **Workers** page in the UI to add a worker.
+
 ### Without Docker

 ```bash
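Step 6's OpenAI-compatible endpoint can be exercised from Python as well as curl. Below is a minimal sketch assuming the default backend port from the README and a deployment registered under the illustrative name `llama-3`; authentication, if your instance enforces it, is omitted:

```python
import httpx

# Minimal chat completion against LMStack's OpenAI-compatible API.
# "llama-3" is a placeholder deployment name, not something this PR defines.
resp = httpx.post(
    "http://localhost:52000/v1/chat/completions",
    json={
        "model": "llama-3",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60.0,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```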
README_zh-TW.md (52 changes: 15 additions & 37 deletions)
@@ -1,6 +1,10 @@
-# LMStack
+<p align="center">
+  <img src="docs/LMStack-light.png" alt="LMStack" height="80">
+</p>

-[English](README.md)
+<p align="center">
+  <a href="README.md">English</a>
+</p>

 LLM 部署管理平台 - 在分散式 GPU 節點上部署和管理大型語言模型。

@@ -23,8 +27,8 @@ LLM 部署管理平台 - 在分散式 GPU 節點上部署和管理大型語言
 ┌────────────┴────────────┐
 ▼                         ▼
 ┌──────────────┐    ┌──────────────┐
-│ Worker Agent │    │ Worker Agent │
-│  (GPU 節點)  │    │  (GPU 節點)  │
+│    Worker    │    │    Worker    │
+│  (GPU 節點)  │    │  (GPU 節點)  │
 └──────────────┘    └──────────────┘
 ```

@@ -48,29 +52,14 @@ docker compose -f docker-compose.deploy.yml up -d
 - 前端: http://localhost:3000
 - 後端 API: http://localhost:52000

-### 啟動 Worker(在 GPU 機器上)
-
-```bash
-docker run -d \
-  --name lmstack-worker \
-  --gpus all \
-  --privileged \
-  -p 52001:52001 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  -v ~/.cache/huggingface:/root/.cache/huggingface \
-  -v /:/host:ro \
-  -e BACKEND_URL=http://你的伺服器IP:52000 \
-  -e WORKER_NAME=gpu-worker-01 \
-  infinirc/lmstack-worker:latest
-```
-
 ### 使用方式

 1. 使用 `admin` / `admin` 登入(首次登入後請更改密碼)
-2. 查看 **Workers** 頁面 - Workers 會自動註冊
-3. 在 **Models** 頁面新增模型
-4. 在 **Deployments** 頁面建立部署
-5. 使用 OpenAI 相容 API:
+2. 前往 **Workers** 頁面,點擊 **Add Worker** 取得 Docker 指令
+3. 在 GPU 機器上執行該 Docker 指令以註冊 Worker
+4. 在 **Models** 頁面新增模型
+5. 在 **Deployments** 頁面建立部署
+6. 使用 OpenAI 相容 API:

 ```bash
 curl http://localhost:52000/v1/chat/completions \
@@ -96,21 +85,10 @@ curl http://localhost:52000/v1/chat/completions \

 # 運行本地構建的 backend + frontend
 docker compose -f docker-compose.local.yml up -d
-
-# 運行本地構建的 worker(在 GPU 機器上)
-docker run -d \
-  --name lmstack-worker \
-  --gpus all \
-  --privileged \
-  -p 52001:52001 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  -v ~/.cache/huggingface:/root/.cache/huggingface \
-  -v /:/host:ro \
-  -e BACKEND_URL=http://你的伺服器IP:52000 \
-  -e WORKER_NAME=gpu-worker-01 \
-  infinirc/lmstack-worker:local
 ```

+然後前往 UI 中的 **Workers** 頁面新增 Worker。
+
 ### 不使用 Docker

 ```bash
backend/Dockerfile (6 changes: 6 additions & 0 deletions)
@@ -19,6 +19,12 @@ FROM python:3.11-slim

 WORKDIR /app

+# Install docker CLI for local worker spawn feature
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    curl \
+    && curl -fsSL https://download.docker.com/linux/static/stable/x86_64/docker-24.0.7.tgz | tar xz --strip-components=1 -C /usr/local/bin docker/docker \
+    && rm -rf /var/lib/apt/lists/*
+
 # Copy installed packages from builder
 COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
 COPY --from=builder /usr/local/bin /usr/local/bin
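The docker CLI added here can only reach a daemon through a socket mounted into the backend container. As a rough sketch of the "local worker spawn" feature the comment names, the backend might shell out along these lines; the helper and its flags are assumptions pieced together from the worker command removed from the README, not code from this PR:

```python
import subprocess

def spawn_local_worker(backend_url: str, name: str = "gpu-worker-01") -> str:
    """Hypothetical: start a worker next to the backend via the bundled docker CLI.

    Assumes /var/run/docker.sock is mounted into the backend container.
    """
    cmd = [
        "docker", "run", "-d",
        "--name", name,
        "--gpus", "all",
        "-p", "52001:52001",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-e", f"BACKEND_URL={backend_url}",
        "infinirc/lmstack-worker:latest",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()  # docker prints the new container ID
```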
backend/app/api/apps/deployment.py (37 changes: 36 additions & 1 deletion)
@@ -84,12 +84,47 @@ async def pull_image_with_progress(
         Exception: On pull failure
     """
     url = f"http://{worker.address}/images/pull"
+    progress_url = f"http://{worker.address}/images/pull-progress/{app_id}"

     set_deployment_progress(app_id, "pulling", 0, f"Pulling image {image}...")

     try:
         async with httpx.AsyncClient(timeout=IMAGE_PULL_TIMEOUT) as client:
-            response = await client.post(url, json={"image": image})
+            # Start the pull request in a task with app_id for progress tracking
+            pull_task = asyncio.create_task(
+                client.post(url, json={"image": image, "app_id": app_id})
+            )
+
+            # Poll for progress while waiting
+            while not pull_task.done():
+                try:
+                    progress_resp = await client.get(progress_url, timeout=5.0)
+                    if progress_resp.status_code == 200:
+                        progress_data = progress_resp.json()
+                        status = progress_data.get("status", "")
+                        progress = progress_data.get("progress", 0)
+
+                        if status == "pulling":
+                            set_deployment_progress(
+                                app_id,
+                                "pulling",
+                                progress,
+                                f"Pulling image {image}... ({progress}%)",
+                            )
+                        elif status == "completed":
+                            set_deployment_progress(
+                                app_id,
+                                "pulling",
+                                100,
+                                "Image pulled successfully",
+                            )
+                except Exception:
+                    pass  # Progress polling is best-effort
+
+                await asyncio.sleep(2)
+
+            # Get the final response
+            response = await pull_task
             if response.status_code >= 400:
                 raise Exception(f"Failed to pull image: {response.text}")

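The polling loop expects the worker to expose `GET /images/pull-progress/{app_id}` returning `status` and `progress` fields; the worker side is not part of this diff. A minimal FastAPI sketch of what such an endpoint might look like:

```python
from fastapi import FastAPI

app = FastAPI()

# app_id -> {"status": "pulling" | "completed", "progress": 0..100};
# assumed to be updated by whatever task performs the actual pull.
_pull_progress: dict[str, dict] = {}

@app.get("/images/pull-progress/{app_id}")
async def pull_progress(app_id: str) -> dict:
    # Unknown app_ids get a neutral record; the backend treats polling as
    # best-effort and swallows anything it cannot use.
    return _pull_progress.get(app_id, {"status": "unknown", "progress": 0})
```

Polling every two seconds with a five-second per-request timeout keeps the long-running pull POST from blocking progress updates.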
backend/app/api/apps/routes.py (13 changes: 11 additions & 2 deletions)
@@ -179,6 +179,15 @@ async def deploy_app(
     proxy_path = f"/apps/{app_type.value}"
     port = await _find_available_port(db, worker.id)

+    # Auto-disable proxy for localhost workers to avoid port conflicts
+    # When worker is on localhost (using --network host), app container binds
+    # directly to host port, so proxy would conflict
+    worker_host = worker.address.split(":")[0]
+    use_proxy = deploy_request.use_proxy
+    if worker_host in ("localhost", "127.0.0.1"):
+        use_proxy = False
+        logger.info(f"Auto-disabled proxy for localhost worker {worker.name}")
+
     # Create app record
     app = App(
         app_type=app_type.value,
@@ -188,7 +197,7 @@ async def deploy_app(
         status=AppStatus.PENDING.value,
         proxy_path=proxy_path,
         port=port,
-        use_proxy=deploy_request.use_proxy,
+        use_proxy=use_proxy,
     )
     db.add(app)
     await db.commit()
@@ -222,7 +231,7 @@ async def deploy_app(
         port=port,
         app_def=app_def,
         lmstack_port=lmstack_port,
-        use_proxy=deploy_request.use_proxy,
+        use_proxy=use_proxy,
     )

     return app_to_response(app, request)
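The localhost check keys off the host portion of `worker.address`. A small illustration of the behavior with a hypothetical helper (not from this PR); note that a plain `split(":")` would need more care if IPv6 literals were ever accepted as worker addresses:

```python
def is_localhost_worker(address: str) -> bool:
    """Mirror the diff's check: take the host part of "host:port"."""
    host = address.split(":")[0]
    return host in ("localhost", "127.0.0.1")

assert is_localhost_worker("localhost:52001")
assert is_localhost_worker("127.0.0.1:52001")
assert not is_localhost_worker("10.0.0.7:52001")
```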
backend/app/api/headscale.py (10 changes: 9 additions & 1 deletion)
@@ -9,7 +9,7 @@
 from pydantic import BaseModel, Field

 from app.api.auth import require_admin
-from app.services.headscale_manager import LMSTACK_USER, get_headscale_manager
+from app.services.headscale_manager import LMSTACK_USER, get_headscale_manager, get_startup_progress

 logger = logging.getLogger(__name__)
 router = APIRouter()
@@ -94,6 +94,14 @@ async def get_headscale_status(
     return HeadscaleStatusResponse(enabled=False, running=False)


+@router.get("/progress")
+async def get_headscale_progress(
+    _: dict = Depends(require_admin),
+):
+    """Get Headscale startup progress."""
+    return get_startup_progress()
+
+
 @router.post("/start", response_model=HeadscaleStatusResponse)
 async def start_headscale(
     request: Request,
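A sketch of how a client might poll the new route. The `/api/headscale` prefix, port, and bearer-token auth are assumptions; only the `/progress` path and the admin guard are visible in this diff, and the response shape is whatever `get_startup_progress()` returns:

```python
import httpx

def poll_headscale_startup(token: str) -> dict:
    # Hypothetical URL; adjust prefix/port to your deployment.
    resp = httpx.get(
        "http://localhost:52000/api/headscale/progress",
        headers={"Authorization": f"Bearer {token}"},
        timeout=5.0,
    )
    resp.raise_for_status()
    return resp.json()
```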