muliturn benchmark by shihaobai · Pull Request #1313 · ModelTC/LightLLM

shihaobai · 2026-05-19T08:46:06Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a multi-turn dialogue benchmark tool, a quantization configuration for Qwen 3.5 122B, and a server startup script. The benchmark script simulates concurrent multi-turn sessions to measure metrics such as TTFT, TPOT, and cache hit ratios. Review feedback highlights a potential issue with SSE message parsing in the benchmark script, suggesting the use of readline() for robustness. Other recommendations include optimizing prompt length calculations to prevent performance bottlenecks in long sessions, improving the handling of existing output files, and using script-relative paths in the shell script to increase portability.

gemini-code-assist · 2026-05-19T08:48:19Z

+            async for raw in response.content:
+                line = raw.strip()
+                if not line or not line.startswith(b"data:"):
+                    continue
+                data_str = line[len(b"data:"):].strip()
+                if data_str == b"[DONE]":
+                    break


The current implementation iterates over response.content using async for, which in aiohttp yields arbitrary chunks of data as they arrive from the network. This is not guaranteed to align with SSE message boundaries (lines starting with data:). If a chunk contains multiple lines or a partial line, json.loads will fail or data will be skipped. Using readline() ensures that each iteration processes a complete line.

Suggested change

async for raw in response.content:

line = raw.strip()

if not line or not line.startswith(b"data:"):

continue

data_str = line[len(b"data:"):].strip()

if data_str == b"[DONE]":

break

while True:

line = await response.content.readline()

if not line:

break

line = line.strip()

if not line or not line.startswith(b"data:"):

continue

data_str = line[len(b"data:"):].strip()

if data_str == b"[DONE]":

break

gemini-code-assist · 2026-05-19T08:48:19Z

+    else:
+        new_text = ""
+    new_prompt = prompt + generated_text + new_text
+    new_len = len(tokenizer.encode(new_prompt, add_special_tokens=False))


Re-encoding the entire prompt on every turn leads to $O(N^2)$ complexity relative to the number of turns. For long contexts (e.g., 160k tokens), this becomes a significant performance bottleneck on the client side, potentially deflating the reported QPS and TPM metrics as the session progresses. Consider calculating the new length by adding the lengths of the model's output and the new user tokens to the server-reported prompt_tokens from the previous turn to avoid redundant computation.

gemini-code-assist · 2026-05-19T08:48:19Z

+    if args.dump_file and os.path.exists(args.dump_file) and os.path.getsize(args.dump_file) > 0:
+        with open(args.dump_file, "r") as f:
+            print(json.dumps(json.load(f), indent=4))
+        return


The script exits immediately if the dump_file already exists and is non-empty. This behavior is counter-intuitive for a benchmark tool; users typically expect the script to run a new benchmark and either overwrite or append to the output file. It would be better to provide a warning or a flag to control this behavior (e.g., --force to overwrite or a separate command to view results).

gemini-code-assist · 2026-05-19T08:48:19Z

+)
+
+if [[ "${ENABLE_DEEPEP:-0}" == "1" || "${ENABLE_DEEPEP:-}" == "true" ]]; then
+  ARGS+=(--quant_cfg ../../advanced_config/mixed_quantization/qwen3_5-122b-moe-only-fp8.yaml --enable_ep_moe --dp 8 --batch_max_tokens 4096 --graph_max_batch_size 64 --chunked_prefill_size 2048 --mem_fraction 0.8 --linear_att_cache_size 300)


Using a relative path for --quant_cfg makes the script fragile as it depends on the current working directory from which the script is executed. It is safer to resolve the path relative to the script's own location using $(dirname "$0").

Suggested change

ARGS+=(--quant_cfg ../../advanced_config/mixed_quantization/qwen3_5-122b-moe-only-fp8.yaml --enable_ep_moe --dp 8 --batch_max_tokens 4096 --graph_max_batch_size 64 --chunked_prefill_size 2048 --mem_fraction 0.8 --linear_att_cache_size 300)

ARGS+=(--quant_cfg "$(dirname "$0")/../../advanced_config/mixed_quantization/qwen3_5-122b-moe-only-fp8.yaml" --enable_ep_moe --dp 8 --batch_max_tokens 4096 --graph_max_batch_size 64 --chunked_prefill_size 2048 --mem_fraction 0.8 --linear_att_cache_size 300)

multirun benchmakr

250135d

shihaobai merged commit f850264 into main May 19, 2026
1 check failed

shihaobai deleted the benchmark branch May 19, 2026 08:46

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

muliturn benchmark#1313

muliturn benchmark#1313
shihaobai merged 1 commit into
mainfrom
benchmark

shihaobai commented May 19, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	ARGS+=(--quant_cfg ../../advanced_config/mixed_quantization/qwen3_5-122b-moe-only-fp8.yaml --enable_ep_moe --dp 8 --batch_max_tokens 4096 --graph_max_batch_size 64 --chunked_prefill_size 2048 --mem_fraction 0.8 --linear_att_cache_size 300)
	ARGS+=(--quant_cfg "$(dirname "$0")/../../advanced_config/mixed_quantization/qwen3_5-122b-moe-only-fp8.yaml" --enable_ep_moe --dp 8 --batch_max_tokens 4096 --graph_max_batch_size 64 --chunked_prefill_size 2048 --mem_fraction 0.8 --linear_att_cache_size 300)

Conversation

shihaobai commented May 19, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant