7 changes: 7 additions & 0 deletions docs/source/quick-start-guide.md
@@ -22,6 +22,13 @@ To start the server, you can run a command like the following example inside a Docker container:
trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```

You can also deploy pre-quantized models to improve performance.
Ensure your GPU natively supports FP8 quantization (for example, NVIDIA Hopper or Ada Lovelace architectures) before running the following:

```bash
trtllm-serve "nvidia/Qwen3-8B-FP8"
```
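Once the server is up, you can exercise its OpenAI-compatible chat completions endpoint. The sketch below only builds and prints the JSON request body; the port (8000), endpoint path, and prompt are assumptions to adjust for your own `trtllm-serve` invocation:

```python
import json

# Build a chat-completions request for the server started above.
# The model name must match the one passed to `trtllm-serve`;
# the prompt and max_tokens values here are illustrative.
payload = {
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "What is TensorRT-LLM?"}],
    "max_tokens": 32,
}
body = json.dumps(payload)
print(body)

# With the server running, the same body can be sent with curl:
#   curl -s http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d "$BODY"
```

The same payload works for the FP8 model by swapping in its name.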

```{note}
If you are running trtllm-serve inside a Docker container, you have two options for sending API requests:
1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.