
Conversation

@Bslabe123 (Collaborator) commented Mar 27, 2025

Sleep time is now configurable via the SLEEP_TIME env var
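
For context, a minimal sketch of how the configurable sleep might slot into the benchmark loop. The loop over $REQUEST_RATES and the echo/sleep lines mirror the trace below; the 10-second default and everything else here are illustrative, not the exact script:

```bash
# Sketch only: read the delay between request-rate runs from SLEEP_TIME,
# falling back to 10 seconds if the variable is unset (matches the log below).
SLEEP_TIME="${SLEEP_TIME:-10}"

for request_rate in $(echo "$REQUEST_RATES" | tr ',' ' '); do
  echo "Benchmarking request rate: $request_rate"
  # ... run benchmark_serving.py for this request rate ...
  echo "Sleeping for ${SLEEP_TIME} seconds..."
  sleep "$SLEEP_TIME"
done
```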

@Bslabe123 requested a review from achandrasekar on March 27, 2025 at 20:52
@achandrasekar (Collaborator)

Looks good Brendan! Is there a small example run with multiple request rates to confirm?

@Bslabe123 (Collaborator, Author) commented Mar 27, 2025

Relevant logs:

```
+ python3 benchmark_serving.py --save-json-results --host=vllm-inference-server --port=8000 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Llama-2-7b-hf --backend=vllm --max-input-length=256 --max-output-length=256 --file-prefix=benchmark --models=meta-llama/Llama-2-7b-hf --pm-namespace= --pm-job= --scrape-server-metrics --save-aggregated-result --request-rate=600 --num-prompts=1002
Finding Mean vllm:cpu_cache_usage_perc with the following query: avg_over_time(vllm:cpu_cache_usage_perc{job='',namespace=''}[158s])
Got response from metrics server: {'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
Cloud Monitoring PromQL Error: {'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
Finding Mean vllm:cpu_cache_usage_perc with the following query: avg_over_time(vllm:cpu_cache_usage_perc{job='',namespace=''}[158s])
Got response from metrics server: {'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
Cloud Monitoring PromQL Error: {'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
+ cat latency-profile-2025-03-27_21-31-41.txt
Namespace(backend='vllm', sax_model='', file_prefix='benchmark', endpoint='generate', host='vllm-inference-server', port=8000, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', models='meta-llama/Llama-2-7b-hf', traffic_split=None, stream_request=False, request_timeout=10800.0, tokenizer='meta-llama/Llama-2-7b-hf', best_of=1, use_beam_search=False, num_prompts=1002, max_input_length=256, max_output_length=256, top_k=32000, request_rate=600.0, seed=1743111104, trust_remote_code=False, machine_cost=None, use_dummy_text=False, save_json_results=True, output_bucket=None, output_bucket_filepath=None, save_aggregated_result=True, additional_metadata_metrics_to_save=None, scrape_server_metrics=True, pm_namespace='', pm_job='')
Models to benchmark: ['meta-llama/Llama-2-7b-hf']
No traffic split specified. Defaulting to uniform traffic split.
Starting Prometheus Server on port 9090
send all requests
====Result for Model: weighted====
Errors: {'ClientConnectorError': 0, 'TimeoutError': 0, 'ContentTypeError': 0, 'ClientOSError': 0, 'ServerDisconnectedError': 0, 'unknown_error': 0}
Total time: 158.22 s
Successful/total requests: 1002/1002
Requests/min: 379.99
Output_tokens/min: 41124.43
Input_tokens/min: 25982.92
Tokens/min: 67107.35
Average seconds/token (includes waiting time on server): 0.61
Average milliseconds/request (includes waiting time on server): 75679.43
Average milliseconds/output_token (includes waiting time on server): 1453.33
Average input length: 68.38
Average output length: 108.23
====Result for Model: meta-llama/Llama-2-7b-hf====
Errors: {'ClientConnectorError': 0, 'TimeoutError': 0, 'ContentTypeError': 0, 'ClientOSError': 0, 'ServerDisconnectedError': 0, 'unknown_error': 0}
Total time: 158.22 s
Successful/total requests: 1002/1002
Requests/min: 379.99
Output_tokens/min: 41124.43
Input_tokens/min: 25982.92
Tokens/min: 67107.35
Average seconds/token (includes waiting time on server): 0.61
Average milliseconds/request (includes waiting time on server): 75679.43
Average milliseconds/output_token (includes waiting time on server): 1453.33
Average input length: 68.38
Average output length: 108.23
+ echo 'Sleeping for 10 seconds...'
Sleeping for 10 seconds...
+ sleep 10
+ for request_rate in $(echo $REQUEST_RATES | tr ',' ' ')
+ echo 'Benchmarking request rate: 650'
Benchmarking request rate: 650
++ date +%Y-%m-%d_%H-%M-%S
+ timestamp=2025-03-27_21-35-02
+ output_file=latency-profile-2025-03-27_21-35-02.txt
+ '[' 650 == 0 ']'
++ awk 'BEGIN {print int(650 * 1.67)}'
+ num_prompts=1085
+ echo 'TOTAL prompts: 1085'
TOTAL prompts: 1085
+ PYTHON_OPTS=("${BASE_PYTHON_OPTS[@]}" "--request-rate=$request_rate" "--num-prompts=$num_prompts")
+ python3 benchmark_serving.py --save-json-results --host=vllm-inference-server --port=8000 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Llama-2-7b-hf --backend=vllm --max-input-length=256 --max-output-length=256 --file-prefix=benchmark --models=meta-llama/Llama-2-7b-hf --pm-namespace= --pm-job= --scrape-server-metrics --save-aggregated-result --request-rate=650 --num-prompts=1085
```
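
A note on the prompt counts in the trace: the script scales num_prompts with the request rate via awk (600 gives 1002, 650 gives 1085, i.e. int(rate * 1.67)). A minimal standalone reproduction of that arithmetic; the surrounding variable assignments are assumed for illustration rather than copied from the script:

```bash
# Reproduce the prompt-count derivation seen in the trace:
# int(600 * 1.67) = 1002, int(650 * 1.67) = 1085.
request_rate=650
num_prompts=$(awk "BEGIN {print int($request_rate * 1.67)}")
echo "TOTAL prompts: $num_prompts"
```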
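The server-metric means in the log are gathered with avg_over_time over a window equal to the benchmark duration (158s). As a sketch only, a query like the one above could be issued against any Prometheus-compatible HTTP API as follows; the endpoint URL and the use of curl/jq are assumptions for illustration, not necessarily what benchmark_serving.py does internally:

```bash
# Assumption: a Prometheus-compatible server is reachable at PROM_URL.
# The PromQL expression is copied from the log above; the empty job/namespace
# selectors match the empty --pm-job/--pm-namespace flags in this run.
PROM_URL="http://localhost:9090"   # hypothetical endpoint
QUERY="avg_over_time(vllm:cpu_cache_usage_perc{job='',namespace=''}[158s])"

curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}" | jq .
```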

@achandrasekar merged commit 1330e4d into main on Mar 27, 2025 (1 check passed).