Add prefill only benchmark for different token length #68
Context:
For disaggregated serving, I would like to know the maximum achievable performance for prefill only. Below are examples of the questions:
Benefits:
This PR collects prefill latency metrics for different token lengths (from 16 to 32768). It helps me explore the questions above and should also help other engineers who want to do similar analysis in the future.
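For readers who want the general shape of such a sweep, here is a minimal sketch, not the code in this PR. It assumes the token lengths are powers of two from 16 to 32768 (which matches the results below) and takes a hypothetical `prefill_fn` callable that runs one prefill pass over a prompt and returns the first generated token ID. The names `token_lengths` and `benchmark_prefill` are illustrative only.

```python
import time
from typing import Callable, Dict, List


def token_lengths(start: int = 16, stop: int = 32768) -> List[int]:
    """Return the powers-of-two sweep from start to stop, inclusive."""
    lengths = []
    n = start
    while n <= stop:
        lengths.append(n)
        n *= 2
    return lengths


def benchmark_prefill(
    prefill_fn: Callable[[List[int]], int],  # hypothetical: prompt tokens -> first token ID
    lengths: List[int],
) -> Dict[int, float]:
    """Time one prefill call per token length and return elapsed time per length."""
    results: Dict[int, float] = {}
    for n in lengths:
        tokens = [1] * n  # dummy prompt of n token IDs (assumption: content does not matter)
        start = time.perf_counter()
        first_token = prefill_fn(tokens)
        elapsed = time.perf_counter() - start
        print(f"---- execute First Token: {first_token}")
        print(f"---- execute time: {elapsed} for token_len: {n}")
        results[n] = elapsed
    return results
```

On an accelerator, `prefill_fn` would need to block until the result is actually available before the timer stops, otherwise asynchronous dispatch can make the measured time too small. A warm-up call per length, to exclude compilation time, is also worth considering.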
Results for llama2 7B:
---- execute First Token: 737
---- execute time: 0.0057392120361328125 for token_len: 16
---- execute First Token: 13
---- execute time: 0.005574226379394531 for token_len: 32
---- execute First Token: 3634
---- execute time: 0.006200551986694336 for token_len: 64
---- execute First Token: 29874
---- execute time: 0.007735252380371094 for token_len: 128
---- execute First Token: 29871
---- execute time: 0.011573553085327148 for token_len: 256
---- execute First Token: 29964
---- execute time: 0.020201683044433594 for token_len: 512
---- execute First Token: 414
---- execute time: 0.038352012634277344 for token_len: 1024
---- execute First Token: 1319
---- execute time: 0.07815766334533691 for token_len: 2048
---- execute First Token: 1068
---- execute time: 0.16911959648132324 for token_len: 4096
---- execute First Token: 313
---- execute time: 0.4015989303588867 for token_len: 8192
---- execute First Token: 404
---- execute time: 0.9996061325073242 for token_len: 16384
---- execute First Token: 5519
---- execute time: 2.7764148712158203 for token_len: 32768
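To compare the runs above, it can help to convert each latency into prefill throughput. The sketch below parses log lines in the format shown and divides token length by elapsed time. It assumes the reported times are in seconds, which the log does not state; `parse_latencies` and `tokens_per_second` are hypothetical helper names, not part of this PR.

```python
import re
from typing import Dict

# Matches lines such as:
# "---- execute time: 0.0057392120361328125 for token_len: 16"
LINE_RE = re.compile(r"execute time: ([0-9.eE+-]+) for token_len: (\d+)")


def parse_latencies(log: str) -> Dict[int, float]:
    """Map token length to elapsed time for every matching line in the log."""
    return {int(m.group(2)): float(m.group(1)) for m in LINE_RE.finditer(log)}


def tokens_per_second(latencies: Dict[int, float]) -> Dict[int, float]:
    """Prefill throughput per token length (assumption: times are in seconds)."""
    return {n: n / t for n, t in latencies.items()}


if __name__ == "__main__":
    sample = (
        "---- execute First Token: 737\n"
        "---- execute time: 0.0057392120361328125 for token_len: 16\n"
        "---- execute First Token: 5519\n"
        "---- execute time: 2.7764148712158203 for token_len: 32768\n"
    )
    for n, tps in tokens_per_second(parse_latencies(sample)).items():
        print(f"token_len={n}: {tps:.0f} tokens/s")
```

Under the seconds assumption, the numbers above work out to roughly 2,800 tokens/s at 16 tokens and roughly 11,800 tokens/s at 32768 tokens.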