Add prefill only benchmark for different token length #68

FanhaiLu1 · 2024-05-03T03:50:50Z

Context:

For disaggregated serving, I would like to know the maximum performance of prefill on its own. Example questions:

How many prefills can we do per second?
What is the bottleneck?
What percentage of the theoretical performance do we actually achieve?
If prefill is compute bound, how can we improve it?

Benefits:

This PR collects prefill latency metrics for different token lengths (from 16 to 32768). It helps me explore the questions above, and it should also help other engineers who want to do similar analysis in the future.
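
For readers who want a feel for the measurement, here is a minimal sketch of a prefill-only timing loop over the same token lengths. It is not the code in this PR: `fake_prefill`, `benchmark_prefill`, `warmup`, and `repeats` are hypothetical names, and the stand-in function only mimics the shape of a prefill call. On a real accelerator you would also need to wait for the asynchronous computation to finish before stopping the timer.

```python
import time

# Token lengths from 16 to 32768, doubling each step (12 sizes).
TOKEN_LENGTHS = [2 ** i for i in range(4, 16)]


def fake_prefill(tokens):
    """Hypothetical stand-in for an engine's prefill call; returns a 'first token'."""
    return sum(tokens) % 32000


def benchmark_prefill(prefill_fn, token_lengths, warmup=1, repeats=3):
    """Return the average latency in seconds of prefill_fn for each token length."""
    results = {}
    for n in token_lengths:
        tokens = list(range(n))  # dummy input of the requested length
        for _ in range(warmup):  # untimed runs, e.g. to absorb compilation cost
            prefill_fn(tokens)
        start = time.perf_counter()
        for _ in range(repeats):
            first_token = prefill_fn(tokens)
        elapsed = (time.perf_counter() - start) / repeats
        results[n] = elapsed
        print(f"---- execute First Token: {first_token}")
        print(f"---- execute time: {elapsed} for token_len: {n}")
    return results


results = benchmark_prefill(fake_prefill, TOKEN_LENGTHS)
```

The warmup run is a design choice for this sketch: it keeps one-time costs out of the timed region so that each reported number reflects steady-state prefill latency.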

Results for llama2 7B:

---- execute First Token: 737
---- execute time: 0.0057392120361328125 for token_len: 16

---- execute First Token: 13
---- execute time: 0.005574226379394531 for token_len: 32

---- execute First Token: 3634
---- execute time: 0.006200551986694336 for token_len: 64

---- execute First Token: 29874
---- execute time: 0.007735252380371094 for token_len: 128

---- execute First Token: 29871
---- execute time: 0.011573553085327148 for token_len: 256

---- execute First Token: 29964
---- execute time: 0.020201683044433594 for token_len: 512

---- execute First Token: 414
---- execute time: 0.038352012634277344 for token_len: 1024

---- execute First Token: 1319
---- execute time: 0.07815766334533691 for token_len: 2048

---- execute First Token: 1068
---- execute time: 0.16911959648132324 for token_len: 4096

---- execute First Token: 313
---- execute time: 0.4015989303588867 for token_len: 8192

---- execute First Token: 404
---- execute time: 0.9996061325073242 for token_len: 16384

---- execute First Token: 5519
---- execute time: 2.7764148712158203 for token_len: 32768
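
To relate these latencies back to the questions in the context section, the sketch below converts them into prefills per second and tokens per second. The latencies are copied from the log above. The FLOP estimate is an assumption, not something measured in this PR: it uses the common rule of thumb of about 2 FLOPs per parameter per token with roughly 7e9 parameters, and it ignores the attention term that grows with sequence length. Comparing it against a hardware peak would require the peak FLOP/s of the device used, which is not stated here.

```python
# Latencies in seconds, copied from the results above, keyed by token_len.
LATENCY_S = {
    16: 0.0057392120361328125,
    32: 0.005574226379394531,
    64: 0.006200551986694336,
    128: 0.007735252380371094,
    256: 0.011573553085327148,
    512: 0.020201683044433594,
    1024: 0.038352012634277344,
    2048: 0.07815766334533691,
    4096: 0.16911959648132324,
    8192: 0.4015989303588867,
    16384: 0.9996061325073242,
    32768: 2.7764148712158203,
}

# Assumption: ~7e9 parameters and ~2 FLOPs per parameter per token (rule of thumb).
APPROX_PARAMS = 7e9


def summarize(latency_s):
    """Derive per-length throughput figures from measured prefill latencies."""
    rows = []
    for token_len, seconds in sorted(latency_s.items()):
        rows.append({
            "token_len": token_len,
            "prefills_per_s": 1.0 / seconds,
            "tokens_per_s": token_len / seconds,
            # Rough estimate only; excludes attention FLOPs.
            "approx_tflops": 2 * APPROX_PARAMS * token_len / seconds / 1e12,
        })
    return rows


rows = summarize(LATENCY_S)
best = max(rows, key=lambda r: r["tokens_per_s"])

print(f"{'len':>6} {'prefill/s':>10} {'tokens/s':>10} {'~TFLOP/s':>9}")
for r in rows:
    print(f"{r['token_len']:>6} {r['prefills_per_s']:>10.1f} "
          f"{r['tokens_per_s']:>10.0f} {r['approx_tflops']:>9.1f}")
print("peak tokens/s at token_len:", best["token_len"])
```

With these numbers, tokens per second rises up to a token length of 1024 and declines after that. Latency also grows faster than linearly at the long end: going from 16384 to 32768 tokens doubles the input but takes about 2.8x as long, which is consistent with attention cost becoming significant, though this PR does not confirm the cause.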

add prefill only benchmark for different token length

b25c6e8

FanhaiLu1 requested review from qihqi and wang2yn84 May 3, 2024 03:51

qihqi approved these changes May 3, 2024

qihqi merged commit 9606a1f into AI-Hypercomputer:main May 3, 2024
