Description
System Info
Environment
- CPU architecture: aarch64
- CPU/Host memory size: ~450GB (Estimated for GH200 systems)
- GPU properties:
  - GPU name: NVIDIA GH200 480GB
  - GPU memory size: 97871 MiB
- Libraries:
  - TensorRT-LLM branch or tag: release:1.2.0rc7
  - Versions:
    - PyTorch: 2.9.0a0+145a3a7bda.nv25.10
    - CUDA: 13.0
  - Container used: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc7
- NVIDIA driver version: 580.65.06
- OS: Linux (aarch64)
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to Reproduce
Minimal Reproduction Script
Run this script against a deployed trtllm-serve instance hosting a Harmony model (e.g., gpt-oss-20b).
```python
import openai

# Configuration
API_KEY = "EMPTY"  # Functionality doesn't require actual auth for local repro
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "openai/gpt-oss-20b"

client = openai.OpenAI(api_key=API_KEY, base_url=BASE_URL)

try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": "Use search function"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "search",
                "description": "Search function",
                "parameters": {
                    "type": "object",
                    "properties": {"q": {"type": "string"}}
                }
            }
        }],
        temperature=1.0,  # High temperature increases likelihood of hitting the bug
        max_tokens=100
    )
    content = response.choices[0].message.content
    print(f"Response Content: {content}")
    if "<|channel|>" in str(content) or "<|message|>" in str(content):
        print("FAIL: Leak detected! Internal tokens found in output.")
    else:
        print("PASS: Output looks clean (or bug not triggered this time).")
except Exception as e:
    print(f"Error: {e}")
```

Expected behavior
The model should generate the appropriate EOS tokens (e.g., <|call|>, <|return|>) even in tool-use scenarios and at high temperatures. stop_token_ids should be passed correctly to the generation engine so that:
- The output stream is terminated correctly at the end of a tool call or message.
- No internal raw tokens (like <|channel|>) are exposed to the parser or the user.
Actual behavior
Because stop_token_ids is conditionally set to [] (an empty list) in certain request paths (use_harmony=False), the model ignores stop conditions. This leads to:
- Uncontrolled generation: the model continues generating past the intended end of the message.
- Handling failures: the Harmony parser fails to interpret the malformed/extended sequence and falls back to raw text decoding.
- Content leakage: the user receives the raw internal representation, including <|channel|> tags and potential hallucinations/spam.
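For reference, the leak check from the repro script can be factored into a small standalone helper. The marker strings are Harmony-format control tokens; the helper name `leaked_internal_tokens` is my own, not part of any library:

```python
# Control-token markers from the Harmony response format that should
# never appear in user-visible output.
INTERNAL_MARKERS = ("<|channel|>", "<|message|>", "<|call|>", "<|return|>")

def leaked_internal_tokens(content):
    """Return True if raw Harmony control tokens appear in user-visible text."""
    text = "" if content is None else str(content)
    return any(marker in text for marker in INTERNAL_MARKERS)

print(leaked_internal_tokens("The capital of France is Paris."))   # False
print(leaked_internal_tokens("<|channel|>analysis<|message|>..."))  # True
```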
Additional notes
We identified the potential root cause in tensorrt_llm/serve/responses_utils.py.
The code conditionally sets stop_token_ids:
```python
sampling_params = request.to_sampling_params(
    default_sampling_params={
        "stop_token_ids":
            get_harmony_adapter().get_stop_tokens() if use_harmony else []
    })
```

When use_harmony is unexpectedly False (which occurs intermittently in certain request flows), stop_token_ids defaults to []. This leaves the model without explicit stop conditions.
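To illustrate the failure mode, here is a toy, self-contained model of decoder-side stop handling. The token strings and the generate() loop are illustrative only, not TensorRT-LLM internals: with the Harmony stop tokens present, decoding terminates at <|call|>; with an empty list (the use_harmony=False path), it runs on and leaks internal tokens until the max_tokens backstop:

```python
# Toy model of stop-token handling; not real engine code.
HARMONY_STOP_TOKENS = ["<|call|>", "<|return|>"]

def generate(model_stream, stop_token_ids, max_tokens=100):
    """Emit tokens until a stop token is hit or max_tokens is reached."""
    emitted = []
    for tok in model_stream:
        emitted.append(tok)
        if tok in stop_token_ids:
            break  # normal termination on a stop token
        if len(emitted) >= max_tokens:
            break  # only backstop left when the stop list is empty
    return emitted

# Hypothetical model output: a tool call followed by stray internal tokens.
stream = ["<|channel|>", "commentary", "<|message|>", '{"q": "x"}',
          "<|call|>", "<|channel|>", "spam", "spam"]

with_stops = generate(stream, HARMONY_STOP_TOKENS)
without_stops = generate(stream, [])  # the use_harmony=False path

print(with_stops)     # terminates at "<|call|>"
print(without_stops)  # runs past the tool call, leaking internal tokens
```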
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.