[bug] Bailian/DashScope realtime ASR duplicates cumulative partial results

## Summary

When using the Bailian / Alibaba Cloud DashScope realtime ASR provider, the raw transcript can contain repeated cumulative prefixes. The client appears to append multiple interim `result-generated` results as if each were a final text segment.

This is not an LLM polish issue: the duplication is already present in the raw ASR transcript before polishing.

## Why this appears to happen

The Alibaba Cloud Fun-ASR realtime WebSocket documentation describes `result-generated` events as carrying both interim and final sentence results. The documented finality flag is:

```text
payload.output.sentence.sentence_end
```

- `sentence_end: false` means the current sentence has not ended yet.
- `sentence_end: true` means the current sentence is final.

The official Python SDK examples similarly use `RecognitionResult.is_sentence_end(sentence)` before treating a sentence as ended.

OpenLess currently appears to use `end_time` presence as the finality check in `app/src-tauri/src/asr/bailian.rs`:

```rust
let is_sentence_final = sentence.get("end_time").is_some();

st.last_result_text = trimmed.to_string();
if is_sentence_final && st.final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
    st.final_segments.push(trimmed.to_string());
}
```

Then final output joins all collected segments:

```rust
st.final_segments.join("")
```

If DashScope sends cumulative/interim texts such as:

```text
我看一下
我看一下阿里云这个
我看一下阿里云这个模型会不会...
```

OpenLess can produce duplicated raw transcript text by appending all of them.
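
The suspected mechanism can be reproduced in isolation. The sketch below is a simplified model, not OpenLess code: `SentenceEvent`, `collect`, and `sample_events` are hypothetical names, and it assumes that interim events also carry an `end_time` key. If `sentence` is a `serde_json::Value`, `get("end_time").is_some()` is true whenever the key exists, even when its value is `null`, so that assumption only needs the key to be present.

```rust
/// Hypothetical, simplified stand-in for one parsed `result-generated` sentence.
struct SentenceEvent {
    text: &'static str,
    /// Assumption: the `end_time` key is present on interim events as well.
    has_end_time_key: bool,
    /// The documented finality flag.
    sentence_end: bool,
}

/// Mirrors the collection loop quoted above, with the finality check made pluggable.
fn collect(events: &[SentenceEvent], is_final: impl Fn(&SentenceEvent) -> bool) -> String {
    let mut final_segments: Vec<String> = Vec::new();
    for ev in events {
        let trimmed = ev.text.trim();
        // Only an exact repeat of the previous segment is skipped,
        // so a longer cumulative text is still pushed.
        if is_final(ev) && final_segments.last().map(|s| s.as_str()) != Some(trimmed) {
            final_segments.push(trimmed.to_string());
        }
    }
    final_segments.join("")
}

/// The three cumulative texts from the example above; only the last one is final.
fn sample_events() -> Vec<SentenceEvent> {
    vec![
        SentenceEvent { text: "我看一下", has_end_time_key: true, sentence_end: false },
        SentenceEvent { text: "我看一下阿里云这个", has_end_time_key: true, sentence_end: false },
        SentenceEvent { text: "我看一下阿里云这个模型会不会...", has_end_time_key: true, sentence_end: true },
    ]
}
```

Under these assumptions, `collect(&sample_events(), |e| e.has_end_time_key)` joins all three texts and reproduces the cumulative-prefix pattern, while `collect(&sample_events(), |e| e.sentence_end)` returns only the last text.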

## Example observed output

Short dictation using Bailian/DashScope realtime ASR produced raw transcript patterns like:

```text
那我试试看呗那我试试看呗，用阿里云的那我试试看呗，用阿里云的这个是不是可那我试试看呗，用阿里云的这个是不是可效果更那我试试看呗，用阿里云的这个是不是更效果更好一点？
```

Another example:

```text
我看一下我看一下把阿里云这个我看一下把阿里云这个模型会不会输...
```

These are cumulative prefix repetitions, not normal acoustic recognition errors.

## Expected behavior

Only final sentence text should be committed once. Interim results should update the current partial sentence, not be appended to final output.

## Suggested fix

1. In `record_result`, skip heartbeat events:

```rust
let is_heartbeat = sentence
    .get("heartbeat")
    .and_then(Value::as_bool)
    .unwrap_or(false);
if is_heartbeat {
    return;
}
```

2. Use the documented finality flag:

```rust
let is_sentence_final = sentence
    .get("sentence_end")
    .and_then(Value::as_bool)
    .unwrap_or(false);
```

3. Track text by `sentence_id` instead of pushing every final-looking event into a Vec. Suggested shape:

```rust
final_segments: BTreeMap<i64, String>,
partial_segments: BTreeMap<i64, String>,
```

4. For `sentence_end == false`, update the current partial segment only.
5. For `sentence_end == true`, commit that `sentence_id` once and remove the partial.
6. Keep a prefix/overlap merge guard to tolerate duplicate/replayed server events.
7. Add tests for multiple partial results, duplicate final events, heartbeat events, and multiple sentence IDs assembled in order.
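
Steps 3 to 6 could be sketched roughly as follows. This is an illustration under assumptions, not a drop-in patch: `TranscriptState`, `final_text`, and `push_with_prefix_guard` are hypothetical names, and `record_result` here takes already-parsed fields rather than the JSON payload. The prefix guard is deliberately simple and would also merge a genuinely new sentence that happens to start with the previous one, so the real heuristic may need to be stricter.

```rust
use std::collections::BTreeMap;

/// Hypothetical state holder; the field shapes follow step 3.
#[derive(Default)]
struct TranscriptState {
    final_segments: BTreeMap<i64, String>,
    partial_segments: BTreeMap<i64, String>,
}

impl TranscriptState {
    /// Record one parsed `result-generated` sentence.
    fn record_result(&mut self, sentence_id: i64, text: &str, sentence_end: bool, heartbeat: bool) {
        let trimmed = text.trim();
        // Step 1: heartbeats (and empty texts) carry nothing to record.
        if heartbeat || trimmed.is_empty() {
            return;
        }
        // Already committed: ignore duplicate finals and late partials.
        if self.final_segments.contains_key(&sentence_id) {
            return;
        }
        if sentence_end {
            // Step 5: commit once and drop the partial for this sentence.
            self.partial_segments.remove(&sentence_id);
            self.final_segments.insert(sentence_id, trimmed.to_string());
        } else {
            // Step 4: overwrite the partial instead of appending to it.
            self.partial_segments.insert(sentence_id, trimmed.to_string());
        }
    }

    /// Join committed sentences in ascending `sentence_id` order.
    fn final_text(&self) -> String {
        let mut ordered: Vec<String> = Vec::new();
        for text in self.final_segments.values() {
            push_with_prefix_guard(&mut ordered, text);
        }
        ordered.join("")
    }
}

/// Step 6: if `next` only extends the previous segment (or is a shorter
/// replay of it), keep the longer text instead of pushing a second copy.
fn push_with_prefix_guard(segments: &mut Vec<String>, next: &str) {
    if let Some(prev) = segments.last_mut() {
        if next.starts_with(prev.as_str()) {
            *prev = next.to_string();
            return;
        }
        if prev.starts_with(next) {
            return;
        }
    }
    segments.push(next.to_string());
}
```

Keying both maps by `sentence_id` means a replayed event overwrites or is ignored rather than growing the output, and `BTreeMap` iteration yields the sentences in ID order without a separate sort.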

## References

- Alibaba Cloud Fun-ASR server events: https://help.aliyun.com/zh/model-studio/fun-asr-server-events
- Alibaba Cloud Fun-ASR Python SDK example: https://help.aliyun.com/zh/model-studio/fun-asr-realtime-python-sdk
