[PP] feat: optimize PP performance through throttling#1447

Merged
AlpinDale merged 3 commits into main from pp-optimizations on Aug 29, 2025

AlpinDale (Member) commented on Aug 29, 2025

Benchmark setup: Qwen3-30B-A3B, 2x A100, pipeline parallel size 2 (PP=2)

Before:

============ Serving Benchmark Result ============
Successful requests:                     2048      
Benchmark duration (s):                  268.78    
Total input tokens:                      1046373   
Total generated tokens:                  1044480   
Request throughput (req/s):              7.62      
Output token throughput (tok/s):         3885.99   
Total Token throughput (tok/s):          7779.01   
---------------Time to First Token----------------
Mean TTFT (ms):                          118972.89 
Median TTFT (ms):                        103316.73 
P99 TTFT (ms):                           241757.32 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          64.21     
Median TPOT (ms):                        66.32     
P99 TPOT (ms):                           68.40     
---------------Inter-token Latency----------------
Mean ITL (ms):                           64.18     
Median ITL (ms):                         56.77     
P99 ITL (ms):                            188.79    
==================================================

After:

============ Serving Benchmark Result ============
Successful requests:                     2048      
Benchmark duration (s):                  251.71    
Total input tokens:                      1046373   
Total generated tokens:                  1042887   
Request throughput (req/s):              8.14      
Output token throughput (tok/s):         4143.20   
Total Token throughput (tok/s):          8300.26   
---------------Time to First Token----------------
Mean TTFT (ms):                          110023.63 
Median TTFT (ms):                        97427.31  
P99 TTFT (ms):                           223116.10 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          59.75     
Median TPOT (ms):                        60.11     
P99 TPOT (ms):                           63.72     
---------------Inter-token Latency----------------
Mean ITL (ms):                           59.72     
Median ITL (ms):                         50.97     
P99 ITL (ms):                            173.29    
==================================================
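
To make the before/after comparison easier to read, the relative change for each headline metric can be computed directly from the two result tables above. This is only a minimal sketch: the numbers are copied from the benchmark output in this PR, while the names `BEFORE`, `AFTER`, `pct_change`, and `relative_changes` are hypothetical helpers introduced for illustration and are not part of the benchmark script.

```python
# Figures copied from the "Before" and "After" benchmark results in this PR.
# The dictionary and function names are illustrative, not from the codebase.
BEFORE = {
    "Benchmark duration (s)": 268.78,
    "Request throughput (req/s)": 7.62,
    "Output token throughput (tok/s)": 3885.99,
    "Total token throughput (tok/s)": 7779.01,
    "Mean TTFT (ms)": 118972.89,
    "Mean TPOT (ms)": 64.21,
    "Mean ITL (ms)": 64.18,
    "P99 ITL (ms)": 188.79,
}

AFTER = {
    "Benchmark duration (s)": 251.71,
    "Request throughput (req/s)": 8.14,
    "Output token throughput (tok/s)": 4143.20,
    "Total token throughput (tok/s)": 8300.26,
    "Mean TTFT (ms)": 110023.63,
    "Mean TPOT (ms)": 59.75,
    "Mean ITL (ms)": 59.72,
    "P99 ITL (ms)": 173.29,
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def relative_changes(before: dict, after: dict) -> dict:
    """Percent change for every metric present in `before`."""
    return {name: pct_change(before[name], after[name]) for name in before}


changes = relative_changes(BEFORE, AFTER)

if __name__ == "__main__":
    # Positive is better for throughput; negative is better for
    # duration and latency metrics.
    for name, delta in changes.items():
        print(f"{name:<34} {delta:+.2f}%")
```

On these numbers, throughput rises by roughly 6.6 to 6.8 percent, and mean TTFT and TPOT fall by roughly 7 to 7.5 percent. Each configuration appears here as a single run, so small differences should be read with that in mind.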

AlpinDale merged commit 5353f8f into main on Aug 29, 2025
0 of 4 checks passed
AlpinDale deleted the pp-optimizations branch on August 29, 2025 00:07