See the stack trace below. When the backend returned a 503 error, the tool did not handle it: the run crashed and the results were never saved.

Expected behavior: the tool should handle the error gracefully, record the error count, and still save the results (see the sketch after the log below).
```
+ python3 benchmark_serving.py --save-json-results --host=envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local --port=8081 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Llama-2-7b-hf --request-rate=80.0 --backend=vllm --num-prompts=4800 --max-input-length=1024 --max-output-length=2048 --file-prefix=benchmark-catalog --models=meta-llama/Llama-2-7b-hf --pm-namespace=epp-before-metrics-run2 --pm-job=model-server-monitoring --scrape-server-metrics
Traceback (most recent call last):
  File "/workspace/benchmark_serving.py", line 1067, in <module>
    asyncio.run(main(cmd_args))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/workspace/benchmark_serving.py", line 873, in main
    await benchmark(args, api_url, tokenizer,models, args.traffic_split)
  File "/workspace/benchmark_serving.py", line 479, in benchmark
    prompt_len, output_len, request_latency = latency
TypeError: cannot unpack non-iterable NoneType object
+ cat latency-profile-2025-03-12_16-49-50.txt
Namespace(backend='vllm', sax_model='', file_prefix='benchmark-catalog', endpoint='generate', host='envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local', port=8081, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', models='meta-llama/Llama-2-7b-hf', traffic_split=None, stream_request=False, request_timeout=10800.0, tokenizer='meta-llama/Llama-2-7b-hf', best_of=1, use_beam_search=False, num_prompts=4800, max_input_length=1024, max_output_length=2048, top_k=32000, request_rate=80.0, seed=1741798193, trust_remote_code=False, machine_cost=None, use_dummy_text=False, save_json_results=True, output_bucket=None, output_bucket_filepath=None, save_aggregated_result=False, additional_metadata_metrics_to_save=None, scrape_server_metrics=True, pm_namespace='epp-before-metrics-run2', pm_job='model-server-monitoring')
Models to benchmark: ['meta-llama/Llama-2-7b-hf']
No traffic split specified. Defaulting to uniform traffic split.
Starting Prometheus Server on port 9090
ContentTypeError: 503, message='Attempt to decode JSON with unexpected mimetype: text/plain', url='http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions', response: <ClientResponse(http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions) [503 Service Unavailable]>
<CIMultiDictProxy('Content-Length': '95', 'Content-Type': 'text/plain', 'Date': 'Wed, 12 Mar 2025 16:51:07 GMT')>
```
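For context, here is a minimal sketch of the kind of handling the expected behavior describes: catch the decode failure per request, count it, and let the run finish. The coroutine name `send_request`, its signature, and the `errors` counter are illustrative assumptions, not the actual code in benchmark_serving.py; the only detail taken from the trace is that a failed request currently yields `None`, which then breaks the tuple unpacking at benchmark_serving.py line 479.

```python
import asyncio
import time

import aiohttp

# Hypothetical counters for failed requests, keyed by error type.
errors = {"ClientConnectorError": 0, "TimeoutError": 0, "ContentTypeError": 0}


async def send_request(session: aiohttp.ClientSession, api_url: str,
                       payload: dict, prompt_len: int, output_len: int):
    """Send one request; return (prompt_len, output_len, latency) or None."""
    start = time.perf_counter()
    try:
        async with session.post(api_url, json=payload) as response:
            # A 503 from the gateway comes back as text/plain, so
            # response.json() raises aiohttp.ContentTypeError.
            await response.json()
    except aiohttp.ContentTypeError:
        errors["ContentTypeError"] += 1
        return None  # signal failure instead of crashing the run
    except asyncio.TimeoutError:
        errors["TimeoutError"] += 1
        return None
    except aiohttp.ClientConnectorError:
        errors["ClientConnectorError"] += 1
        return None
    return prompt_len, output_len, time.perf_counter() - start


def summarize(latencies: list) -> None:
    """Skip failed requests so the unpacking never sees None."""
    for latency in latencies:
        if latency is None:
            continue  # failure was already counted in `errors`
        prompt_len, output_len, request_latency = latency
        # ... aggregate metrics as before ...
    print(f"Error counts: {errors}")
```

With the `None` results filtered out before unpacking, the run can finish and serialize its metrics, with the error counts included, under `--save-json-results` instead of dying on the first 503.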