4 changes: 2 additions & 2 deletions docs/best_practices/PaddleOCR-VL-0.9B.md
@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 128 \
+  --max-num-seqs 128
```
**Example 2:** Deploying a 16K Context Service on a Single A100 GPU
```shell
@@ -36,7 +36,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 256 \
+  --max-num-seqs 256
```

Each example is a set of configurations that runs stably while also delivering relatively good performance. If you have further requirements for precision or performance, please continue reading below.
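The change in this diff drops the line-continuation backslash after the final flag. A minimal sketch of why that matters, using a hypothetical `echo` command in place of the server launch: a trailing `\` tells the shell to splice the next physical line into the same command, so a `\` after the last argument swallows whatever follows when the snippet is copy-pasted.

```shell
# Hypothetical illustration: a trailing backslash joins the next physical
# line into the same command, which is why the last flag of a multi-line
# command must not end with "\".
out=$(echo one \
two)           # the shell sees a single command: echo one two
echo "$out"    # prints "one two"
```

The same reasoning applies to the `--max-num-seqs` lines above: as the final flag of each command, they must not carry a continuation backslash.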
4 changes: 2 additions & 2 deletions docs/zh/best_practices/PaddleOCR-VL-0.9B.md
@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 128 \
+  --max-num-seqs 128
```

**Example 2:** Deploying a 16K-context service on a single A100 GPU
@@ -37,7 +37,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 256 \
+  --max-num-seqs 256
```

The example is a set of configurations that runs stably while also delivering relatively good performance.