Public deployment examples for serving large language models with OpenAI-compatible APIs.
Each model lives in its own directory with a self-contained Compose file, environment template, README, and optional benchmark scripts.
.
├── Qwen3.6-35B-A3B-FP8/
│ ├── .env.example
│ ├── README.md
│ ├── bench.py
│ └── docker-compose.yml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── SECURITY.md
| Model | Runtime | API | Directory | Status |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-FP8 | vLLM v0.20.0 | OpenAI-compatible | Qwen3.6-35B-A3B-FP8/ | Available |
Every model directory should be usable on its own and follow this shape:
<model-name>/
├── .env.example # Local path and runtime placeholders
├── README.md # Model-specific setup and notes
├── docker-compose.yml # Serving configuration
└── bench.py # Optional local benchmark script
Model directories may include additional files when a runtime needs them, but public examples should keep private environment details out of version control.
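
The layout above can be checked mechanically before a directory is published. The following is a minimal sketch, not part of this repository: the helper name `missing_files` and the `REQUIRED_FILES` tuple are hypothetical, and `bench.py` is left out of the required set because the layout marks it as optional.

```python
from pathlib import Path

# Required files taken from the layout above; bench.py is optional there.
REQUIRED_FILES = (".env.example", "README.md", "docker-compose.yml")


def missing_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files("Qwen3.6-35B-A3B-FP8")` from the repository root should return an empty list when the directory follows the expected shape.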
This repository includes:
- Docker Compose service definitions
- OpenAI-compatible API smoke tests
- Runtime-specific configuration notes
- Optional benchmark scripts for TTFT and throughput checks
- .env.example files with placeholder paths
- Generic, sanitized prompts and examples

It does not include:
- Model weights
- Private deployment paths
- Internal IP addresses or hostnames
- Credentials, tokens, or API keys
- Production monitoring configuration
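
The TTFT (time to first token) and throughput checks mentioned for the benchmark scripts reduce to simple arithmetic on token arrival times. The sketch below is not taken from `bench.py`. It assumes a streaming client has recorded one timestamp per received token, and the function name `summarize_stream` is hypothetical. Definitions vary between tools: here throughput covers only the decode phase, after the first token arrives.

```python
def summarize_stream(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Compute TTFT and decode throughput from token arrival timestamps (seconds)."""
    if not token_times:
        raise ValueError("no tokens were received")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start
    # Decode window: time between the first and the last token.
    decode_time = token_times[-1] - token_times[0]
    # Throughput counts the tokens after the first one, over the decode window only.
    decode_tps = (len(token_times) - 1) / decode_time if decode_time > 0 else 0.0
    return {"ttft_s": ttft, "decode_tokens_per_s": decode_tps}
```

Timestamps from `time.perf_counter()` are a reasonable input, since only differences between them are used.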
Choose a model directory, copy its environment template to .env, set MODEL_PATH in that file, and start the service:
cd Qwen3.6-35B-A3B-FP8
cp .env.example .env
docker compose up -d
Check the local service:
curl -fsS http://localhost:8000/health
curl -s http://localhost:8000/v1/models | python3 -m json.tool
Stop the service:
docker compose down
Add a new top-level directory named after the model or deployment target:
<model-name>/
├── .env.example
├── README.md
└── docker-compose.yml
The model README should document:
- Model name and upstream model card
- Runtime image and version
- Required hardware assumptions
- Exposed API and model aliases
- Required environment variables
- Start, stop, health check, and smoke test commands
- Any parser, tool-calling, multimodal, or quantization settings
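
For the health check and smoke test commands in the list above, a scripted variant can be easier to reuse than raw curl calls. The following is a sketch only, using the Python standard library. It assumes the service listens on `http://localhost:8000` as in the curl commands earlier, and that `/v1/models` returns the usual OpenAI-style body with a `data` list of objects carrying an `id`. The names `model_ids` and `smoke_test` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as the curl commands earlier


def model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def smoke_test(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Check /health, then return the model IDs served at /v1/models."""
    # urlopen raises HTTPError on 4xx/5xx and URLError if the server is unreachable.
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout):
        pass
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the service running, `print(smoke_test())` should list the model aliases that the README documents.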
Before publishing, scan the new files for private infrastructure details, credentials, production paths, logs, or domain-specific sensitive examples.
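
Part of that scan can be automated. The sketch below flags lines containing private IPv4 addresses or credential-like assignments. The patterns are illustrative assumptions, not a complete secret scanner, and the names `PATTERNS` and `scan_text` are hypothetical. It can also flag placeholder values, so treat hits as prompts for manual review.

```python
import re

# Illustrative patterns only; extend them for your own environment.
PATTERNS = {
    # RFC 1918 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
    "private IPv4 address": re.compile(
        r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}\b"
    ),
    # Names such as API_KEY, HF_TOKEN, or password followed by a non-empty value
    "credential assignment": re.compile(
        r"(?i)(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for lines matching any pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Running `scan_text(Path(name).read_text())` over each new file gives a quick first pass; it does not replace reading the files before they are published.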
To check the current Qwen3.6 example for Python syntax errors and Compose configuration errors:
python3 -m py_compile Qwen3.6-35B-A3B-FP8/bench.py
docker compose -f Qwen3.6-35B-A3B-FP8/docker-compose.yml config
This repository is released under the MIT License. See LICENSE.