fix: align mimo tts style payload with official docs by RichardLiuda · Pull Request #6814 · AstrBotDevs/AstrBot

RichardLiuda · 2026-03-22T17:01:54Z

Align with the official MiMo speech synthesis documentation and fix how requests are constructed for MiMo TTS style control.

Resolves #6815

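Overview

This PR fixes how MiMo TTS style control is sent.

Previously, style-related content such as mimo-tts-style-prompt was sent as a separate user message, which does not match the usage recommended in the official MiMo speech synthesis documentation. According to the official documentation, style control should be applied by prepending a <style>...</style> tag directly to the beginning of the text to synthesize, inside the assistant-role message.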

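Changes

Adjust astrbot/core/provider/sources/mimo_tts_api_source.py
- Prepend style control to the beginning of the text to synthesize in assistant.content
- Merge mimo-tts-style-prompt and mimo-tts-dialect into a single leading <style>...</style> tag
- Keep mimo-tts-seed-text as an optional user message; it is no longer concatenated into the text to synthesize
- Special-case the "唱歌" (singing) style: emit only <style>唱歌</style> at the very beginning of the text
Update the MiMo TTS configuration hint text in the following files:
- astrbot/core/config/default.py
- dashboard/src/i18n/locales/zh-CN/features/config-metadata.json
- dashboard/src/i18n/locales/en-US/features/config-metadata.json
- dashboard/src/i18n/locales/ru-RU/features/config-metadata.json
Extend tests/test_mimo_api_sources.py
- Verify that the style tag is prepended to assistant.content
- Verify that seed_text is sent only as an optional user message
- Verify that plain text is still sent correctly when no style is set
- Verify the special handling of the "唱歌" (singing) style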

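Alignment with the official documentation

Reference documentation:

https://platform.xiaomimimo.com/#/docs/usage-guide/speech-synthesis

Key points aligned in this PR:

- The text to synthesize must be placed in an assistant-role message
- Style control must be added to the beginning of the target text via a <style>...</style> tag
- The user-role message is optional and serves only as auxiliary guidance
- When the "唱歌" (singing) style is used, only <style>唱歌</style> is kept at the beginning of the target text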
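
The key points above can be sketched as a small standalone payload builder. This is a minimal illustration, not the provider's actual code: `build_style_prefix` and `build_messages` are hypothetical free functions standing in for the provider methods touched by this PR, and the single-space separator between style and dialect is inferred from the expected value `<style>开心 四川话</style>hello` in the PR's tests.

```python
def build_style_prefix(style_prompt: str, dialect: str) -> str:
    """Combine style and dialect into one leading <style> tag ("" if both are empty)."""
    parts = [part.strip() for part in (style_prompt, dialect) if part.strip()]
    if not parts:
        return ""
    style_content = " ".join(parts)
    # Singing is special-cased: only the bare singing tag is emitted.
    if "唱歌" in style_content:
        return "<style>唱歌</style>"
    return f"<style>{style_content}</style>"


def build_messages(
    text: str,
    style_prompt: str = "",
    dialect: str = "",
    seed_text: str = "",
) -> list[dict[str, str]]:
    """Build the messages list: optional user hint, then the assistant text."""
    messages: list[dict[str, str]] = []
    seed = seed_text.strip()
    if seed:
        # The user message is optional and only carries auxiliary guidance.
        messages.append({"role": "user", "content": seed})
    # The text to synthesize always goes in the assistant message,
    # with any style tag prepended.
    messages.append(
        {"role": "assistant", "content": build_style_prefix(style_prompt, dialect) + text}
    )
    return messages


if __name__ == "__main__":
    print(build_messages("hello", style_prompt="开心", dialect="四川话"))
```

With no style, dialect, or seed text, the sketch yields a single assistant message containing only the plain text.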

Compatibility

This is NOT a breaking change.

Verification

The following checks were run locally:

uv run pytest tests/test_mimo_api_sources.py
uv run ruff check astrbot/core/provider/sources/mimo_tts_api_source.py tests/test_mimo_api_sources.py astrbot/core/config/default.py

Results:

tests/test_mimo_api_sources.py: all 10 tests passed
ruff check: passed

Summary by Sourcery

Align MiMo TTS request construction with the official speech synthesis specification by moving style control into assistant messages and treating seed text as an optional user hint.

Bug Fixes:

Correct MiMo TTS style and dialect handling so they are applied via a <style>...</style> tag at the beginning of the assistant text instead of as a separate user message.

Enhancements:

Refactor MiMo TTS payload building to separate seed text handling from style prefix composition and ensure assistant content is always the actual synthesis text with any style tags prepended.

Documentation:

Update MiMo TTS configuration hints in default settings and dashboard i18n metadata to describe the new style, dialect, and seed text behavior.

Tests:

Extend MiMo TTS tests to cover style tag prefixing, omission of user messages without seed text, singing-style special handling, plain-text behavior without style, and ensuring seed text is not prepended to assistant content.

gemini-code-assist · 2026-03-22T17:02:14Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

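This pull request fixes how MiMo TTS style control is sent so that it matches the official MiMo speech synthesis documentation. Previously, style-related content was sent as a separate user message. It is now prepended directly to the text to synthesize in the assistant-role message via a <style>...</style> tag, which ensures correct style control behavior and improves compatibility with the official API.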

Highlights

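Align MiMo TTS style control with the official documentation: The request construction for MiMo TTS style control now follows the usage recommended in the official documentation, prepending style control to the text to synthesize in assistant.content.
Merge style and dialect into a <style> tag: mimo-tts-style-prompt and mimo-tts-dialect are merged into a single <style>...</style> tag inserted directly at the beginning of the text to synthesize.
Change in seed text handling: mimo-tts-seed-text is sent as an optional user message that helps steer tone and style, and is no longer concatenated into the text to synthesize.
Special handling for the "唱歌" (singing) style: When this style is detected, only <style>唱歌</style> is generated at the very beginning of the text.
Update multilingual configuration hints: The hint text for MiMo TTS configuration items in the Chinese, English, and Russian configuration files is updated to reflect the new implementation.
Extend test coverage: New test cases verify style tag prefixing, seed text being sent as an optional user message, plain-text sending when no style is set, and the special handling of the "唱歌" (singing) style.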

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The singing-style detection in _build_style_prefix uses a loose substring check (if "唱歌" in style_content), which may unintentionally trigger for phrases that merely contain those characters; consider a stricter condition (e.g., tokenized match or startswith) to avoid accidental singing mode.
The special handling for the singing style is currently hard-coded to the literal string "唱歌"; if MiMo later adds more singing-related tags or localized variants, this might be brittle—consider centralizing these style keywords or making them configurable to keep the mapping easier to extend.
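
One way to act on both points is a token-based check against a central keyword set. The sketch below is only an illustration of that idea: `SINGING_KEYWORDS` and `is_singing_style` are hypothetical names, and the choice of separators (whitespace plus common ASCII and Chinese punctuation) is an assumption, not something taken from the MiMo documentation.

```python
import re

# Hypothetical central list of style keywords that switch on singing mode.
# Adding a variant later means touching only this set.
SINGING_KEYWORDS = frozenset({"唱歌"})

# Assumed separators between style words: whitespace, commas, semicolons,
# and their Chinese counterparts.
_STYLE_SEPARATORS = re.compile(r"[\s,，、;；]+")


def is_singing_style(style_content: str) -> bool:
    """Return True only when a whole token of the style string is a singing keyword."""
    tokens = _STYLE_SEPARATORS.split(style_content.strip())
    return any(token in SINGING_KEYWORDS for token in tokens)


if __name__ == "__main__":
    for sample in ("唱歌", "开心 唱歌", "不要唱歌"):
        print(sample, is_singing_style(sample))
```

The trade-off is that a free-form phrase that merely embeds the keyword no longer triggers singing mode, so a stricter match like this would only suit configurations where the style is written as separate keywords.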

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The singing-style detection in `_build_style_prefix` uses a loose substring check (`if "唱歌" in style_content`), which may unintentionally trigger for phrases that merely contain those characters; consider a stricter condition (e.g., tokenized match or startswith) to avoid accidental singing mode.
- The special handling for the singing style is currently hard-coded to the literal string `"唱歌"`; if MiMo later adds more singing-related tags or localized variants, this might be brittle—consider centralizing these style keywords or making them configurable to keep the mapping easier to extend.

## Individual Comments

### Comment 1
<location path="tests/test_mimo_api_sources.py" line_range="46-55" />
<code_context>
+def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen this test to fully validate the assistant message structure and messages length.

Currently this only asserts the full first `user` message and the `content` of the second `assistant` message. To make the regression test more robust for the new protocol, also assert that the second message has `role == "assistant"` and that `len(payload["messages"]) == 2` so future refactors can’t change the ordering or add extra messages unnoticed.

Suggested implementation:

```python
def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
    provider = _make_tts_provider(
        {
            "mimo-tts-style-prompt": "开心",
            "mimo-tts-dialect": "四川话",
            "mimo-tts-seed-text": "You are chatting with a close friend.",
        }
    )
    try:
        # Build the payload used to call the model
        payload = provider._build_payload()

        # Ensure we only ever send the expected two messages
        assert len(payload["messages"]) == 2

        # First message should still be the full user seed text as before
        user_msg = payload["messages"][0]
        assert user_msg["role"] == "user"
        assert user_msg["content"] == "You are chatting with a close friend."

        # Second message should be an assistant message whose content prefixes style and dialect
        assistant_msg = payload["messages"][1]
        assert assistant_msg["role"] == "assistant"
        assert "开心" in assistant_msg["content"]
        assert "四川话" in assistant_msg["content"]

    finally:
        # Always release the provider, even if an assertion above fails
        asyncio.run(provider.terminate())
```

Because I can only see part of the file, you may need to align the helper names and patterns:

1. Replace `provider._build_payload()` with the actual helper you use to construct the chat payload that currently produces `payload["messages"]` (for example, it might be named `_build_chat_request`, `_build_mimo_payload`, etc.).
2. If the test already builds `payload` and asserts on the first user message and the assistant content, you should:
   - Keep the existing `payload` construction.
   - Insert the following assertions around where you currently assert on `payload["messages"][1]["content"]`:
   ```python
   assert len(payload["messages"]) == 2
   assert payload["messages"][1]["role"] == "assistant"
   ```
   and, if not already present, factor out `user_msg`/`assistant_msg` as shown so the test is clearer and more robust.
3. Ensure the `asyncio.run(provider.terminate())` call is not duplicated if there is already a termination call in a `finally` block or elsewhere in this test.
</issue_to_address>


sourcery-ai · 2026-03-22T17:05:20Z

tests/test_mimo_api_sources.py

+def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
    provider = _make_tts_provider(
        {
-            "mimo-tts-style-prompt": "Please sound cheerful and lively.",
-            "mimo-tts-dialect": "Sichuan dialect",
+            "mimo-tts-style-prompt": "开心",
+            "mimo-tts-dialect": "四川话",
            "mimo-tts-seed-text": "You are chatting with a close friend.",
        }
    )
    try:
        payload = provider._build_payload("hello")



gemini-code-assist

Code Review

This pull request aligns the MiMo TTS style payload with the official documentation by adjusting the request construction to include style control within assistant messages and treating seed text as an optional user hint. The changes involve modifications to the MiMo TTS API source code, updates to configuration hints in default settings and dashboard i18n metadata, and extensions to MiMo TTS tests. The review focuses on correctness and maintainability, ensuring the changes adhere to the intended functionality and coding best practices. All comments provided are valid and do not contradict the repository's rules regarding asyncio atomicity.

gemini-code-assist · 2026-03-22T17:07:22Z

astrbot/core/provider/sources/mimo_tts_api_source.py


-    def _build_user_prompt(self) -> str:
-        prompt_parts: list[str] = []
+    def _build_user_prompt(self) -> str | None:


The function _build_user_prompt is now returning None in some cases. It would be helpful to add a docstring to explain the function's purpose, return value, and the conditions under which it returns None.

gemini-code-assist · 2026-03-22T17:07:22Z

astrbot/core/provider/sources/mimo_tts_api_source.py

-        prompt_parts: list[str] = []
+    def _build_user_prompt(self) -> str | None:
+        seed_text = self.seed_text.strip()
+        return seed_text or None


This line returns the seed_text if it's not empty, otherwise it returns None. Consider adding a comment to clarify this behavior, as it might not be immediately obvious to someone reading the code.

Suggested change

-        return seed_text or None
+        return seed_text or None  # Returns seed_text if not empty, otherwise None
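
The two comments on `_build_user_prompt` could also be addressed with a docstring instead of an inline comment. The sketch below is a standalone stand-in, assuming the seed text is passed as an argument rather than read from `self.seed_text`; `build_user_prompt` is a hypothetical name used only for illustration.

```python
from typing import Optional


def build_user_prompt(seed_text: str) -> Optional[str]:
    """Return the content for the optional user message.

    The seed text is stripped of surrounding whitespace. If nothing is left,
    None is returned, which a caller can use as the signal to omit the user
    message from the request entirely.
    """
    # An empty string is falsy, so "or None" maps it to None.
    return seed_text.strip() or None


if __name__ == "__main__":
    print(build_user_prompt("  reference text  "))
    print(build_user_prompt("   "))
```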

gemini-code-assist · 2026-03-22T17:07:22Z

astrbot/core/provider/sources/mimo_tts_api_source.py

+        seed_text = self.seed_text.strip()
+        return seed_text or None
+
+    def _build_style_prefix(self) -> str:


It would be helpful to add a docstring to explain the purpose of the _build_style_prefix function and what the returned string represents.

gemini-code-assist · 2026-03-22T17:07:22Z

astrbot/core/provider/sources/mimo_tts_api_source.py

+        if "唱歌" in style_content:
+            return "<style>唱歌</style>"


This logic handles the special case where the style includes "唱歌". It would be beneficial to add a comment explaining why this special handling is necessary, referencing the MiMo documentation if possible.

Suggested change

-        if "唱歌" in style_content:
-            return "<style>唱歌</style>"
+        # MiMo recommends using only the singing style tag at the very beginning.
+        if "唱歌" in style_content:
+            return "<style>唱歌</style>"  # Special case for singing style

gemini-code-assist · 2026-03-22T17:07:22Z

astrbot/core/provider/sources/mimo_tts_api_source.py


-        if self.seed_text.strip():
-            prompt_parts.append(self.seed_text.strip())
+        return f"<style>{style_content}</style>"


Consider adding a comment here to explain that the <style> tag is being constructed and that this is the standard way to apply styles according to the MiMo TTS API.

Suggested change

-        return f"<style>{style_content}</style>"
+        # Construct the style tag with the combined style content
+        return f"<style>{style_content}</style>"

gemini-code-assist · 2026-03-22T17:07:22Z

tests/test_mimo_api_sources.py

+        assert payload["messages"][0] == {
+            "role": "user",
+            "content": "You are chatting with a close friend.",
+        }
+        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"


The assertion payload["messages"][1]["content"] checks the content of the assistant message. It would be more robust to also assert the role to ensure the message is correctly assigned to the assistant.

Suggested change

-        assert payload["messages"][0] == {
-            "role": "user",
-            "content": "You are chatting with a close friend.",
-        }
-        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"
+        assert payload["messages"][0] == {
+            "role": "user",
+            "content": "You are chatting with a close friend.",
+        }
+        assert payload["messages"][1]["role"] == "assistant"
+        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"

gemini-code-assist · 2026-03-22T17:07:22Z

tests/test_mimo_api_sources.py

+        assert payload["messages"] == [
+            {
+                "role": "assistant",
+                "content": "<style>开心</style>hello",
+            }


The assertion payload["messages"] checks the entire messages array. It would be more robust to assert the role and content of the assistant message separately to ensure the message is correctly constructed.

        assert len(payload["messages"]) == 1
        assert payload["messages"][0]["role"] == "assistant"
        assert payload["messages"][0]["content"] == "<style>开心</style>hello"

gemini-code-assist · 2026-03-22T17:07:22Z

tests/test_mimo_api_sources.py

+    )
+    try:
+        payload = provider._build_payload("歌词")
+        assert payload["messages"][1]["content"] == "<style>唱歌</style>歌词"


The assertion payload["messages"][1]["content"] checks the content of the assistant message. It would be more robust to also assert the role to ensure the message is correctly assigned to the assistant.

        assert payload["messages"][1]["role"] == "assistant"
        assert payload["messages"][1]["content"] == "<style>唱歌</style>歌词"

gemini-code-assist · 2026-03-22T17:07:22Z

tests/test_mimo_api_sources.py

+        assert payload["messages"] == [
+            {
+                "role": "assistant",
+                "content": "hello",
+            }


The assertion payload["messages"] checks the entire messages array. It would be more robust to assert the role and content of the assistant message separately to ensure the message is correctly constructed.

        assert len(payload["messages"]) == 1
        assert payload["messages"][0]["role"] == "assistant"
        assert payload["messages"][0]["content"] == "hello"

gemini-code-assist · 2026-03-22T17:07:22Z

tests/test_mimo_api_sources.py

+        assert payload["messages"][0]["content"] == "reference text"
+        assert payload["messages"][1]["content"] == "<style>开心</style>明天就是周五了"
+        assert "reference text" not in payload["messages"][1]["content"]


These assertions check the content of the user and assistant messages. It would be more robust to also assert the role to ensure the messages are correctly assigned.

        assert payload["messages"][0]["role"] == "user"
        assert payload["messages"][0]["content"] == "reference text"
        assert payload["messages"][1]["role"] == "assistant"
        assert payload["messages"][1]["content"] == "<style>开心</style>明天就是周五了"
        assert "reference text" not in payload["messages"][1]["content"]
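
Since the same role-and-content checks recur across these review comments, they could be collected in one test helper. The following is a rough sketch under that assumption: `assert_messages` is a hypothetical helper that does not exist in the repository, and the payload here is a hand-written dict rather than the output of the real provider.

```python
def assert_messages(payload: dict, expected: list[tuple[str, str]]) -> None:
    """Check message count, order, roles, and contents in one place."""
    messages = payload["messages"]
    assert len(messages) == len(expected), (
        f"expected {len(expected)} messages, got {len(messages)}"
    )
    for index, (message, (role, content)) in enumerate(zip(messages, expected)):
        assert message["role"] == role, (
            f"message {index}: role {message['role']!r} != {role!r}"
        )
        assert message["content"] == content, (
            f"message {index}: content {message['content']!r} != {content!r}"
        )


if __name__ == "__main__":
    # Hand-written payload mirroring the shape asserted in the tests above.
    sample_payload = {
        "messages": [
            {"role": "user", "content": "reference text"},
            {"role": "assistant", "content": "<style>开心</style>明天就是周五了"},
        ]
    }
    assert_messages(
        sample_payload,
        [("user", "reference text"), ("assistant", "<style>开心</style>明天就是周五了")],
    )
    print("payload shape OK")
```

Comparing the whole `messages` list against a literal, as two of the existing tests do, already covers role, content, and length. A helper like this mainly adds per-message failure messages.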

fix: align mimo tts style payload with official docs

112ed9c

auto-assign bot requested review from LIghtJUNction and advent259141 on March 22, 2026 17:01

dosubot bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Mar 22, 2026

dosubot bot added the area:provider label (the bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner) on Mar 22, 2026

sourcery-ai bot reviewed on Mar 22, 2026

gemini-code-assist bot reviewed on Mar 22, 2026

RC-CHN approved these changes on Mar 23, 2026

dosubot bot added the lgtm label (this PR has been approved by a maintainer) on Mar 23, 2026

RC-CHN merged commit 04b7618 into AstrBotDevs:master on Mar 23, 2026
7 checks passed
