
Bug report: The tool doesn't handle a 503 backend error properly #11

@liu-cong

Description

See the stack trace below. When the backend returned a 503 error, the tool did not handle it properly and failed to save the results.

Expected behavior: The tool should handle the error gracefully, record the error count, and save the results. A sketch of one possible fix follows the log below.

+ python3 benchmark_serving.py --save-json-results --host=envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local --port=8081 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Llama-2-7b-hf --request-rate=80.0 --backend=vllm --num-prompts=4800 --max-input-length=1024 --max-output-length=2048 --file-prefix=benchmark-catalog --models=meta-llama/Llama-2-7b-hf --pm-namespace=epp-before-metrics-run2 --pm-job=model-server-monitoring --scrape-server-metrics
Traceback (most recent call last):
  File "/workspace/benchmark_serving.py", line 1067, in <module>
    asyncio.run(main(cmd_args))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/workspace/benchmark_serving.py", line 873, in main
    await benchmark(args, api_url, tokenizer,models, args.traffic_split)
  File "/workspace/benchmark_serving.py", line 479, in benchmark
    prompt_len, output_len, request_latency = latency
TypeError: cannot unpack non-iterable NoneType object
+ cat latency-profile-2025-03-12_16-49-50.txt
Namespace(backend='vllm', sax_model='', file_prefix='benchmark-catalog', endpoint='generate', host='envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local', port=8081, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', models='meta-llama/Llama-2-7b-hf', traffic_split=None, stream_request=False, request_timeout=10800.0, tokenizer='meta-llama/Llama-2-7b-hf', best_of=1, use_beam_search=False, num_prompts=4800, max_input_length=1024, max_output_length=2048, top_k=32000, request_rate=80.0, seed=1741798193, trust_remote_code=False, machine_cost=None, use_dummy_text=False, save_json_results=True, output_bucket=None, output_bucket_filepath=None, save_aggregated_result=False, additional_metadata_metrics_to_save=None, scrape_server_metrics=True, pm_namespace='epp-before-metrics-run2', pm_job='model-server-monitoring')
Models to benchmark: ['meta-llama/Llama-2-7b-hf']
No traffic split specified. Defaulting to uniform traffic split.
Starting Prometheus Server on port 9090
ContentTypeError: 503, message='Attempt to decode JSON with unexpected mimetype: text/plain', url='http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions', response: <ClientResponse(http://envoy-epp-before-metrics-run2-inference-gateway-0cb11840.envoy-gateway-system.svc.cluster.local:8081/v1/completions) [503 Service Unavailable]>
<CIMultiDictProxy('Content-Length': '95', 'Content-Type': 'text/plain', 'Date': 'Wed, 12 Mar 2025 16:51:07 GMT')>
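
For reference, here is a minimal sketch of the handling the expected behavior describes, assuming an aiohttp-based request path like the one in the trace. All identifiers (send_request, error_counts, summarize) are hypothetical and not taken from benchmark_serving.py; only the unpacking at line 479 and the ContentTypeError raised on a text/plain 503 body come from the log above.

import time
from collections import Counter

import aiohttp

# Hypothetical sketch; names below are illustrative, not the actual
# identifiers used in benchmark_serving.py.
error_counts = Counter()

async def send_request(session: aiohttp.ClientSession, api_url: str,
                       payload: dict, prompt_len: int):
    """Send one request; return (prompt_len, output_len, latency) or None on error."""
    start = time.perf_counter()
    try:
        async with session.post(api_url, json=payload) as response:
            if response.status != 200:
                # A 503 (or any non-2xx) from the backend is counted, not raised.
                error_counts[f"http_{response.status}"] += 1
                return None
            output = await response.json()
    except aiohttp.ContentTypeError:
        # Error bodies can be text/plain, so response.json() raises
        # ContentTypeError, as in the log above; record it as an error.
        error_counts["content_type_error"] += 1
        return None
    except aiohttp.ClientError as exc:
        error_counts[type(exc).__name__] += 1
        return None
    latency = time.perf_counter() - start
    # Character count stands in for token count in this sketch.
    output_len = len((output.get("choices") or [{}])[0].get("text", ""))
    return prompt_len, output_len, latency

def summarize(results: list) -> dict:
    """Aggregate successes only; failures were already tallied in error_counts."""
    completed = [r for r in results if r is not None]  # avoids the TypeError at line 479
    return {
        "num_completed": len(completed),
        "errors": dict(error_counts),
        "avg_latency_s": (sum(r[2] for r in completed) / len(completed)
                          if completed else None),
    }

With this shape, benchmark() can filter out the None entries before unpacking, so a burst of 503s shows up in the saved results as error counts instead of aborting the whole run.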
