[BUG] Missing spaces between English words during streaming inference with Mistral/internlm3 models #398

---
### Title: [Bug] Missing spaces between English words during streaming inference with Mistral models
### Description
When generating text with `Mistral-7B-Instruct-v0.1` in InfiniLM, the spaces between English words are lost, so the streamed output runs together as one unbroken string.
**Steps to Reproduce:**
1. Load the `Mistral-7B-Instruct-v0.1` model.
2. Send the prompt: `<s> [INST] introduce yourself [/INST]`
3. Observe the generated output.
**Actual Output:**
```text
Hello!I'manAIlanguagemodelheretohelpyouwithanyquestionsortasksyoumighthave.HowcanIassistyoutoday?</s>
```
**Expected Output:**
```text
Hello! I'm an AI language model here to help you with any questions or tasks you might have. How can I assist you today?</s>
```
### Root Cause
This issue stems from the interaction between SentencePiece's space placeholder `▁` (U+2581) and the underlying mechanics of the Fast Tokenizer during incremental decoding.
1. **Incremental Decoding**: During LLM streaming inference, tokens are generated one at a time. InfiniLM's decoding logic (e.g., in `generation/utils.py` and `llm/llm.py`) calls `tokenizer.decode([token_id])` to retrieve the text for each new token.
2. **Rust Backend Trim Behavior**: Mistral uses `LlamaTokenizerFast`, which is backed by the Rust-based `tokenizers` library. When `decode()` is called on a standalone token (e.g., `▁world`), the Rust backend first converts `▁` to a space (` world`), then treats the result as the start of the text and **trims the leading space**, so the call returns `"world"`.
3. **Why Patching `convert_tokens_to_string` Fails**: For Fast Tokenizers, the `decode()` method directly invokes the Rust backend and returns the result, **completely bypassing `convert_tokens_to_string()`**. Therefore, patching `convert_tokens_to_string`—as done for ChatGLM—has no effect on Mistral.
**Comparison with ChatGLM:**
* **ChatGLM** uses the slow Python tokenizer. Its `decode()` method internally calls `convert_tokens_to_string()`, so patching that method successfully intercepts the flow.
* **Mistral** uses the Fast Rust tokenizer. Its `decode()` method bypasses `convert_tokens_to_string()`, meaning the patch must be applied directly to the `decode()` method itself.
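
The trimming effect can be illustrated without loading a model. The sketch below is a toy stand-in for the fast decoder, not the actual `tokenizers` code: `simulated_fast_decode` is a hypothetical helper that assumes the decoder replaces `▁` with a space and then drops one leading space, as described above.

```python
SPACE_MARK = "\u2581"  # SentencePiece space placeholder (▁)


def simulated_fast_decode(tokens):
    """Toy model of the behavior described above (an assumption, not library code).

    Joins the tokens, maps the placeholder to a space, and drops one leading space.
    """
    text = "".join(tokens).replace(SPACE_MARK, " ")
    return text[1:] if text.startswith(" ") else text


# Hand-written token strings, chosen for illustration only.
tokens = ["\u2581Hello", "!", "\u2581I", "'", "m", "\u2581an", "\u2581AI"]

# Decoding the whole sequence at once only loses the very first space.
whole = simulated_fast_decode(tokens)

# Decoding one token at a time loses the space in front of every word.
streamed = "".join(simulated_fast_decode([t]) for t in tokens)

print(repr(whole))
print(repr(streamed))
```

Under these assumptions, the per-token path reproduces the concatenated pattern shown in the actual output, while the whole-sequence path does not.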
### Proposed Fix
Introduce a `MistralProcessor` that patches `tokenizer.decode()` during instantiation. The patch bypasses the Rust trim logic by manually fetching raw token strings via `convert_ids_to_tokens` (which preserves the `▁` character), replacing `▁` with spaces, and handling SentencePiece byte fallback sequences.
**Implementation:** `python/infinilm/processors/mistral_processor.py`
```python
import re
import types

from .basic_llm_processor import BasicLLMProcessor
from .processor import register_processor


@register_processor("mistral")
class MistralProcessor(BasicLLMProcessor):
    def __init__(self, model_dir_path: str):
        super().__init__(model_dir_path)
        self._fix_tokenizer_decode(self.tokenizer)

    @staticmethod
    def _fix_tokenizer_decode(tokenizer):
        """Fix Mistral tokenizer incremental decoding space loss.

        LlamaTokenizerFast.decode() calls the Rust backend directly, which
        trims leading spaces derived from ▁ (U+2581) during single-token
        decoding, causing English words to concatenate.

        Fix: patch tokenizer.decode() to:
        1. Convert token IDs to raw token strings (preserving ▁)
        2. Manually replace ▁ → space and handle byte fallback
        """

        def patched_decode(self_tok, token_ids, skip_special_tokens=False, **kwargs):
            # 1. Get raw token strings (preserving ▁)
            if isinstance(token_ids, int):
                token_ids = [token_ids]
            tokens = self_tok.convert_ids_to_tokens(
                token_ids, skip_special_tokens=skip_special_tokens
            )
            if isinstance(tokens, str):
                tokens = [tokens]

            # 2. Remove special tokens if requested
            if skip_special_tokens:
                special = set(self_tok.all_special_tokens)
                tokens = [t for t in tokens if t not in special]

            # 3. Join + replace ▁ (U+2581) with space
            text = "".join(tokens).replace("\u2581", " ")

            # 4. Handle SentencePiece byte fallback: consecutive <0xHH> → UTF-8
            def byte_fallback_replace(match):
                hex_strs = re.findall(r"<0x([0-9A-Fa-f]{2})>", match.group(0))
                byte_values = bytes([int(h, 16) for h in hex_strs])
                return byte_values.decode("utf-8", errors="replace")

            text = re.sub(r"(<0x[0-9A-Fa-f]{2}>)+", byte_fallback_replace, text)
            return text

        tokenizer.decode = types.MethodType(patched_decode, tokenizer)
```
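
The string transformation inside `patched_decode` can be sanity-checked without a tokenizer by running it over hand-written token strings. The sketch below re-implements steps 3 and 4 as a standalone function; `tokens_to_text` and the sample tokens are hypothetical names chosen for illustration and are not part of InfiniLM.

```python
import re

# One or more consecutive SentencePiece byte-fallback tokens such as <0xE4>.
BYTE_RUN = re.compile(r"(?:<0x[0-9A-Fa-f]{2}>)+")


def tokens_to_text(tokens):
    """Standalone copy of steps 3 and 4 of patched_decode (illustrative only)."""
    text = "".join(tokens).replace("\u2581", " ")

    def decode_run(match):
        hex_strs = re.findall(r"<0x([0-9A-Fa-f]{2})>", match.group(0))
        return bytes(int(h, 16) for h in hex_strs).decode("utf-8", errors="replace")

    return BYTE_RUN.sub(decode_run, text)


# Per-token decoding now keeps the space in front of each word.
words = ["\u2581Hello", "!", "\u2581I", "'", "m"]
streamed = "".join(tokens_to_text([t]) for t in words)

# A multi-byte character decodes correctly when its byte tokens arrive together...
byte_tokens = ["<0xE4>", "<0xBD>", "<0xA0>"]  # UTF-8 bytes of U+4F60
together = tokens_to_text(byte_tokens)

# ...but not when each byte token is decoded in its own call.
piecewise = "".join(tokens_to_text([t]) for t in byte_tokens)

print(repr(streamed), repr(together), repr(piecewise))
```

Two things stand out in this sketch. The streamed text starts with a space, so a caller may want to strip it from the first chunk. Byte-fallback characters split across separate `decode()` calls come out as replacement characters, so if the streaming loop decodes strictly one token at a time, those tokens may need to be buffered until the sequence is complete.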
### Environment
* **Model**: Mistral-7B-Instruct-v0.1, Mistral-7B-Instruct-v0.2
* **InfiniLM**: main branch
* **Transformers**: 4.34.0.dev0
* **Tokenizer Class**: `LlamaTokenizerFast` (`is_fast=True`)

<img width="1475" height="452" alt="Image" src="https://github.com/user-attachments/assets/27edb5d2-c75b-45c9-8749-1c6d01cf45f8" />

<img width="1483" height="857" alt="Image" src="https://github.com/user-attachments/assets/b4fda714-acc8-4123-9f37-4b989b462e3c" />


<img width="1486" height="921" alt="Image" src="https://github.com/user-attachments/assets/c85fbb47-0e0a-4738-8ee7-db0c3835b1af" />

<img width="1470" height="752" alt="Image" src="https://github.com/user-attachments/assets/ff600a53-0eb3-4359-8e2b-75bcd741d708" />


