/`.
+
+```shell
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-v4/v1/chat/completions \
+ -X POST \
+ -H 'Authorization: Bearer <dstack token>' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "model": "deepseek-ai/DeepSeek-V4-Pro",
+ "messages": [
+ {
+ "role": "user",
+ "content": "What is 15% of 240? Reply with just the number."
+ }
+ ],
+ "temperature": 0,
+ "max_tokens": 32
+ }'
+```
+
+## Reasoning mode
+
+To separate the model's reasoning into `reasoning_content`, keep
+`--reasoning-parser deepseek-v4` in the server command and send
+`chat_template_kwargs` in the request body.
+
+For raw HTTP requests, `chat_template_kwargs` and `separate_reasoning` must be
+top-level JSON fields.
+
+```shell
+curl http://127.0.0.1:3000/proxy/services/main/deepseek-v4/v1/chat/completions \
+ -X POST \
+ -H 'Authorization: Bearer <dstack token>' \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "model": "deepseek-ai/DeepSeek-V4-Pro",
+ "messages": [
+ {
+ "role": "user",
+ "content": "Solve step by step: If 3x + 5 = 20, what is x?"
+ }
+ ],
+ "temperature": 0,
+ "max_tokens": 256,
+ "chat_template_kwargs": {
+ "thinking": true
+ },
+ "separate_reasoning": true
+ }'
+```
+
+This returns both:
+
+- `reasoning_content`: a separate reasoning trace
+- `content`: the final user-visible answer
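+
+As a quick sketch, here is how you might pull both fields out of the JSON
+response. The payload below is a hypothetical fragment based on the
+OpenAI-compatible response schema that SGLang follows; only the
+`reasoning_content` field is SGLang-specific, and real responses contain
+additional fields.
+
+```python
+import json
+
+# Hypothetical response fragment for the equation example above
+raw = '''{
+  "choices": [{
+    "message": {
+      "reasoning_content": "3x + 5 = 20, so 3x = 15, so x = 5.",
+      "content": "x = 5"
+    }
+  }]
+}'''
+
+message = json.loads(raw)["choices"][0]["message"]
+print(message["reasoning_content"])  # the separate reasoning trace
+print(message["content"])            # the final user-visible answer
+```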
+
+## Deployment notes
+
+- The first startup can take several minutes while the model loads and SGLang
+ finishes initialization.
+- The optional `/root/.cache` instance volume caches downloaded model weights
+  across runs on backends that support instance volumes.
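+
+As a sketch, the volume mount in the service configuration might look like
+the following. The field names follow dstack's instance-volume syntax, but
+the `/dstack-cache` instance path is illustrative and the exact configuration
+used by this example may differ:
+
+```yaml
+volumes:
+  # Persist the Hugging Face model cache between runs on the same instance;
+  # `optional: true` lets the run proceed on backends without instance
+  # volume support.
+  - instance_path: /dstack-cache
+    path: /root/.cache
+    optional: true
+```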
+
+## What's next?
+
+1. Read the [DeepSeek-V4-Pro model card](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro)
+2. Read the [DeepSeek-V4 SGLang cookbook](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4)
+3. Browse the dedicated [SGLang](https://dstack.ai/examples/inference/sglang/) and [vLLM](https://dstack.ai/examples/inference/vllm/) examples
diff --git a/mkdocs.yml b/mkdocs.yml
index 1baa53015..1b75f0ebe 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -306,12 +306,13 @@ nav:
- vLLM: examples/inference/vllm/index.md
- NIM: examples/inference/nim/index.md
- TensorRT-LLM: examples/inference/trtllm/index.md
+ - Models:
+ - DeepSeek V4: examples/models/deepseek-v4/index.md
+ - Qwen 3.6: examples/models/qwen36/index.md
- Accelerators:
- AMD: examples/accelerators/amd/index.md
- TPU: examples/accelerators/tpu/index.md
- Tenstorrent: examples/accelerators/tenstorrent/index.md
- - Models:
- - Qwen 3.6: examples/models/qwen36/index.md
- Blog:
- blog/index.md
- Case studies: blog/case-studies.md