CloudSigma's sovereign cloud LLM inference platform. OpenAI-compatible API with local data residency.
```
Clients (OpenAI SDK compatible)
        │
Caddy (TLS + API key auth + logging)
        │
        ├── /v1/chat/completions
        ├── /v1/completions
        ├── /v1/models
        └── /v1/embeddings
        │
vLLM Backends (GPU inference)
        ├── DeepSeek V3 685B (4×B200, TP=4) — port 8001
        ├── Qwen 2.5 72B (1×B200) — port 8002
        ├── Qwen2.5-Coder 32B (1×B200) — port 8003
        └── BGE-M3 Embeddings (CPU) — port 8004
```
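The gateway fans requests out by model name to the backend ports listed above. A minimal sketch of that routing table (the model identifiers and the `route_model` helper are illustrative, not taken from `gateway/`):

```python
# Hypothetical sketch of the gateway's model -> backend routing.
# Model names and the helper are illustrative; the real mapping
# lives in the gateway/ middleware.
BACKENDS = {
    "deepseek-ai/DeepSeek-V3": "http://127.0.0.1:8001",
    "Qwen/Qwen2.5-72B-Instruct": "http://127.0.0.1:8002",
    "Qwen/Qwen2.5-Coder-32B-Instruct": "http://127.0.0.1:8003",
    "BAAI/bge-m3": "http://127.0.0.1:8004",
}

def route_model(model: str) -> str:
    """Return the backend base URL for a requested model, or raise."""
    try:
        return BACKENDS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model}") from None
```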
```bash
# Install vLLM
pip install vllm

# Start a model (DeepSeek V3 is sharded across 4 GPUs via tensor parallelism)
vllm serve deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 4 \
  --port 8001

# Use it through the OpenAI-compatible endpoint
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-ai/DeepSeek-V3","messages":[{"role":"user","content":"Hello"}]}'
```

- `deploy/` — Deployment scripts and systemd services
- `gateway/` — Caddy configuration and API key middleware
- `admin/` — Admin CLI for key management
- `docs/` — Architecture and operational docs
Phase 1 MVP — In Development