
Bug report: The tool doesn't handle a 503 backend error properly #11

@liu-cong

Description

See the stack trace below. When the backend returned a 503 error, the tool did not handle it properly and failed to save the results.

Expected behavior: The tool should handle the error gracefully, record the error count, and save the results. A sketch of one possible fix follows the log below.

+ python3 benchmark_serving.py --save-json-results --host=envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local --port=8081 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Llama-2-7b-hf --request-rate=80.0 --backend=vllm --num-prompts=4800 --max-input-length=1024 --max-output-length=2048 --file-prefix=benchmark-catalog --models=meta-llama/Llama-2-7b-hf --pm-namespace=epp-before-metrics-run2 --pm-job=model-server-monitoring --scrape-server-metrics
Traceback (most recent call last):
  File "/workspace/benchmark_serving.py", line 1067, in <module>
    asyncio.run(main(cmd_args))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/workspace/benchmark_serving.py", line 873, in main
    await benchmark(args, api_url, tokenizer,models, args.traffic_split)
  File "/workspace/benchmark_serving.py", line 479, in benchmark
    prompt_len, output_len, request_latency = latency
TypeError: cannot unpack non-iterable NoneType object
+ cat latency-profile-2025-03-12_16-49-50.txt
Namespace(backend='vllm', sax_model='', file_prefix='benchmark-catalog', endpoint='generate', host='envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local', port=8081, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', models='meta-llama/Llama-2-7b-hf', traffic_split=None, stream_request=False, request_timeout=10800.0, tokenizer='meta-llama/Llama-2-7b-hf', best_of=1, use_beam_search=False, num_prompts=4800, max_input_length=1024, max_output_length=2048, top_k=32000, request_rate=80.0, seed=1741798193, trust_remote_code=False, machine_cost=None, use_dummy_text=False, save_json_results=True, output_bucket=None, output_bucket_filepath=None, save_aggregated_result=False, additional_metadata_metrics_to_save=None, scrape_server_metrics=True, pm_namespace='epp-before-metrics-run2', pm_job='model-server-monitoring')
Models to benchmark: ['meta-llama/Llama-2-7b-hf']
No traffic split specified. Defaulting to uniform traffic split.
Starting Prometheus Server on port 9090
ContentTypeError: 503, message='Attempt to decode JSON with unexpected mimetype: text/plain', url='http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions', response: <ClientResponse(http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions) [503 Service Unavailable]>
<CIMultiDictProxy('Content-Length': '95', 'Content-Type': 'text/plain', 'Date': 'Wed, 12 Mar 2025 16:51:07 GMT')>
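
For reference, here is a minimal sketch of the handling the expected behavior describes, assuming an aiohttp-based request path like the one in the trace. All identifiers (send_request, error_counts, summarize) are hypothetical and not taken from benchmark_serving.py; only the unpacking at line 479 and the ContentTypeError raised on a text/plain 503 body come from the log above.

import time
from collections import Counter

import aiohttp

# Hypothetical sketch; names below are illustrative, not the actual
# identifiers used in benchmark_serving.py.
error_counts = Counter()

async def send_request(session: aiohttp.ClientSession, api_url: str,
                       payload: dict, prompt_len: int):
    """Send one request; return (prompt_len, output_len, latency) or None on error."""
    start = time.perf_counter()
    try:
        async with session.post(api_url, json=payload) as response:
            if response.status != 200:
                # A 503 (or any non-2xx) from the backend is counted, not raised.
                error_counts[f"http_{response.status}"] += 1
                return None
            output = await response.json()
    except aiohttp.ContentTypeError:
        # Error bodies can be text/plain, so response.json() raises
        # ContentTypeError, as in the log above; record it as an error.
        error_counts["content_type_error"] += 1
        return None
    except aiohttp.ClientError as exc:
        error_counts[type(exc).__name__] += 1
        return None
    latency = time.perf_counter() - start
    # Character count stands in for token count in this sketch.
    output_len = len((output.get("choices") or [{}])[0].get("text", ""))
    return prompt_len, output_len, latency

def summarize(results: list) -> dict:
    """Aggregate successes only; failures were already tallied in error_counts."""
    completed = [r for r in results if r is not None]  # avoids the TypeError at line 479
    return {
        "num_completed": len(completed),
        "errors": dict(error_counts),
        "avg_latency_s": (sum(r[2] for r in completed) / len(completed)
                          if completed else None),
    }

With this shape, benchmark() can filter out the None entries before unpacking, so a burst of 503s shows up in the saved results as error counts instead of aborting the whole run.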
