4 changes: 2 additions & 2 deletions docs/best_practices/PaddleOCR-VL-0.9B.md
@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 128 \
+  --max-num-seqs 128
```
**Example 2:** Deploying a 16K Context Service on a Single A100 GPU
```shell
@@ -36,7 +36,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 256 \
+  --max-num-seqs 256
```

Each example is a set of configurations that runs stably while also delivering relatively good performance. If you have further requirements for precision or performance, please continue reading below.
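The change in this diff drops the line-continuation backslash after the final flag. A minimal sketch of why that matters, using a hypothetical `echo` command in place of the server launch: a trailing `\` tells the shell to splice the next physical line into the same command, so a `\` after the last argument swallows whatever follows when the snippet is copy-pasted.

```shell
# Hypothetical illustration: a trailing backslash joins the next physical
# line into the same command, which is why the last flag of a multi-line
# command must not end with "\".
out=$(echo one \
two)           # the shell sees a single command: echo one two
echo "$out"    # prints "one two"
```

The same reasoning applies to the `--max-num-seqs` lines above: as the final flag of each command, they must not carry a continuation backslash.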
4 changes: 2 additions & 2 deletions docs/zh/best_practices/PaddleOCR-VL-0.9B.md
@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 128 \
+  --max-num-seqs 128
```

**Example 2:** Deploying a 16K-context service on a single A100 GPU
@@ -37,7 +37,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--max-model-len 16384 \
--max-num-batched-tokens 16384 \
--gpu-memory-utilization 0.8 \
-  --max-num-seqs 256 \
+  --max-num-seqs 256
```

The example is a set of configurations that runs stably while also delivering relatively good performance.