Summary
When using the Bailian / Alibaba Cloud DashScope realtime ASR provider, the raw transcript can contain repeated cumulative prefixes. This looks like the client is appending multiple result-generated interim results as final text segments.
This is not an LLM polish issue: the duplication is already present in the raw ASR transcript before polishing.
Why this appears to happen
The Alibaba Cloud Fun-ASR realtime WebSocket documentation describes result-generated events as carrying both interim and final sentence results. The documented finality flag is:
payload.output.sentence.sentence_end
sentence_end: false means the current sentence has not ended yet.
sentence_end: true means the current sentence is final.
The official Python SDK examples similarly use RecognitionResult.is_sentence_end(sentence) before treating a sentence as ended.
OpenLess currently appears to use end_time presence as the finality check in app/src-tauri/src/asr/bailian.rs:
let is_sentence_final = sentence.get("end_time").is_some();
st.last_result_text = trimmed.to_string();
if is_sentence_final && st.final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
    st.final_segments.push(trimmed.to_string());
}
Then final output joins all collected segments:
st.final_segments.join("")
If DashScope sends cumulative/interim texts such as:
我看一下
我看一下阿里云这个
我看一下阿里云这个模型会不会...
OpenLess can produce duplicated raw transcript text by appending all of them.
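A minimal sketch of how this accumulation could produce the duplication, assuming every interim event carries an end_time field (an assumption, not confirmed against actual DashScope payloads). The function name join_as_if_all_final is hypothetical and is not the OpenLess code; it only mirrors the push-then-join logic quoted above.

```rust
/// Hypothetical stand-in for the quoted accumulation logic: every event is
/// treated as final, as if `end_time` were present on interim results too.
fn join_as_if_all_final(events: &[&str]) -> String {
    let mut final_segments: Vec<String> = Vec::new();
    for text in events {
        let trimmed = text.trim();
        // Same guard as the quoted snippet: it only skips an exact repeat of
        // the last segment, so a longer cumulative text is still appended.
        if final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
            final_segments.push(trimmed.to_string());
        }
    }
    final_segments.join("")
}
```

Feeding it the three cumulative texts above yields all three concatenated, which matches the shape of the observed raw transcripts.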
Example observed output
Short dictation using Bailian/DashScope realtime ASR produced raw transcript patterns like:
那我试试看呗那我试试看呗,用阿里云的那我试试看呗,用阿里云的这个是不是可那我试试看呗,用阿里云的这个是不是可效果更那我试试看呗,用阿里云的这个是不是更效果更好一点?
Another example:
我看一下我看一下把阿里云这个我看一下把阿里云这个模型会不会输...
These are cumulative prefix repetitions, not normal acoustic recognition errors.
Expected behavior
Only final sentence text should be committed, and each sentence only once. Interim results should update the current partial sentence rather than being appended to the final output.
Suggested fix
- In record_result, skip heartbeat events:
let is_heartbeat = sentence
    .get("heartbeat")
    .and_then(Value::as_bool)
    .unwrap_or(false);
if is_heartbeat {
    return;
}
- Use the documented finality flag:
let is_sentence_final = sentence
    .get("sentence_end")
    .and_then(Value::as_bool)
    .unwrap_or(false);
- Track text by sentence_id instead of pushing every final-looking event into a Vec. Suggested shape:
final_segments: BTreeMap<i64, String>,
partial_segments: BTreeMap<i64, String>,
- For sentence_end == false, update the current partial segment only.
- For sentence_end == true, commit that sentence_id once and remove the partial.
- Keep a prefix/overlap merge guard to tolerate duplicate/replayed server events.
- Add tests for multiple partial results, duplicate final events, heartbeat events, and multiple sentence IDs assembled in order.
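The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the OpenLess implementation: SentenceEvent, TranscriptState, final_text, and current_partial are hypothetical names, and the event is assumed to be already decoded from JSON so the sketch needs only the standard library. Re-committing a sentence_id overwrites the earlier entry, which acts as a simple guard against replayed final events; a real prefix/overlap merge would need more than this.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded form of one `result-generated` sentence payload.
pub struct SentenceEvent<'a> {
    pub sentence_id: i64,
    pub text: &'a str,
    pub sentence_end: bool,
    pub heartbeat: bool,
}

/// Hypothetical state holder keyed by sentence_id (not the OpenLess struct).
#[derive(Default)]
pub struct TranscriptState {
    final_segments: BTreeMap<i64, String>,
    partial_segments: BTreeMap<i64, String>,
}

impl TranscriptState {
    pub fn record_result(&mut self, ev: &SentenceEvent<'_>) {
        // Heartbeat events carry no new text, so they are ignored.
        if ev.heartbeat {
            return;
        }
        let trimmed = ev.text.trim();
        if ev.sentence_end {
            // Commit once per sentence_id; a replayed final overwrites
            // the same key instead of appending a second copy.
            self.partial_segments.remove(&ev.sentence_id);
            self.final_segments
                .insert(ev.sentence_id, trimmed.to_string());
        } else if !self.final_segments.contains_key(&ev.sentence_id) {
            // Interim result: replace the partial, never append.
            // Late interims for an already-final sentence are dropped.
            self.partial_segments
                .insert(ev.sentence_id, trimmed.to_string());
        }
    }

    /// Final sentences joined in ascending sentence_id order.
    pub fn final_text(&self) -> String {
        self.final_segments.values().map(String::as_str).collect()
    }

    /// Partial text of the highest pending sentence_id, if any.
    pub fn current_partial(&self) -> Option<&str> {
        self.partial_segments
            .values()
            .next_back()
            .map(String::as_str)
    }
}
```

Because BTreeMap iterates in key order, sentences are assembled by sentence_id even if their final events arrive out of order, which is one of the test cases listed above.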