fix: align mimo tts style payload with official docs #6814
RC-CHN merged 1 commit into AstrBotDevs:master
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request fixes the sending logic for MiMo TTS style control so that it is consistent with the official MiMo speech synthesis documentation. Previously, style-related content was sent as a separate …
Highlights
Hey - I've found 1 issue, and left some high level feedback:
- The singing-style detection in `_build_style_prefix` uses a loose substring check (`if "唱歌" in style_content`), which may unintentionally trigger for phrases that merely contain those characters; consider a stricter condition (e.g., tokenized match or startswith) to avoid accidental singing mode.
- The special handling for the singing style is currently hard-coded to the literal string `"唱歌"`; if MiMo later adds more singing-related tags or localized variants, this might be brittle. Consider centralizing these style keywords or making them configurable to keep the mapping easier to extend.
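One way to tighten the check and centralize the keywords, as a sketch only. `SINGING_KEYWORDS` and `is_singing_style` are hypothetical names that do not appear in the PR, and the separator set is an assumption; whether whole-token matching is what MiMo expects is not confirmed by the source.

```python
import re

# Hypothetical central registry of style keywords that switch to singing mode.
SINGING_KEYWORDS = frozenset({"唱歌"})

# Assumed separators: whitespace plus common ASCII and CJK punctuation.
_SEPARATORS = re.compile(r"[\s,，、;；]+")


def is_singing_style(style_content: str) -> bool:
    """Return True only when a whole token equals a singing keyword."""
    tokens = _SEPARATORS.split(style_content.strip())
    return any(token in SINGING_KEYWORDS for token in tokens)
```

Unlike the substring check, a phrase such as `不要唱歌的语气` would not match here, because the keyword is embedded in a longer token rather than standing alone.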
## Individual Comments
### Comment 1
<location path="tests/test_mimo_api_sources.py" line_range="46-55" />
<code_context>
+def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen this test to fully validate the assistant message structure and messages length.
Currently this only asserts the full first `user` message and the `content` of the second `assistant` message. To make the regression test more robust for the new protocol, also assert that the second message has `role == "assistant"` and that `len(payload["messages"]) == 2` so future refactors can’t change the ordering or add extra messages unnoticed.
Suggested implementation:
```python
def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
    provider = _make_tts_provider(
        {
            "mimo-tts-style-prompt": "开心",
            "mimo-tts-dialect": "四川话",
            "mimo-tts-seed-text": "You are chatting with a close friend.",
        }
    )
    try:
        # Build the payload used to call the model
        payload = provider._build_payload()
        # Ensure we only ever send the expected two messages
        assert len(payload["messages"]) == 2
        # First message should still be the full user seed text as before
        user_msg = payload["messages"][0]
        assert user_msg["role"] == "user"
        assert user_msg["content"] == "You are chatting with a close friend."
        # Second message should be an assistant message whose content prefixes style and dialect
        assistant_msg = payload["messages"][1]
        assert assistant_msg["role"] == "assistant"
        assert "开心" in assistant_msg["content"]
        assert "四川话" in assistant_msg["content"]
    finally:
        # Always release the provider, even if an assertion fails
        asyncio.run(provider.terminate())
```
Because I can only see part of the file, you may need to align the helper names and patterns:
1. Replace `provider._build_payload()` with the actual helper you use to construct the chat payload that currently produces `payload["messages"]` (for example, it might be named `_build_chat_request`, `_build_mimo_payload`, etc.).
2. If the test already builds `payload` and asserts on the first user message and the assistant content, you should:
- Keep the existing `payload` construction.
- Insert the following assertions around where you currently assert on `payload["messages"][1]["content"]`:
```python
assert len(payload["messages"]) == 2
assert payload["messages"][1]["role"] == "assistant"
```
and, if not already present, factor out `user_msg`/`assistant_msg` as shown so the test is clearer and more robust.
3. Ensure the `asyncio.run(provider.terminate())` call is not duplicated if there is already a termination call in a `finally` block or elsewhere in this test.
</issue_to_address>
 def test_mimo_tts_assistant_content_prefixes_style_and_dialect():
     provider = _make_tts_provider(
         {
-            "mimo-tts-style-prompt": "Please sound cheerful and lively.",
-            "mimo-tts-dialect": "Sichuan dialect",
+            "mimo-tts-style-prompt": "开心",
+            "mimo-tts-dialect": "四川话",
             "mimo-tts-seed-text": "You are chatting with a close friend.",
         }
     )
     try:
         payload = provider._build_payload("hello")
Code Review
This pull request aligns the MiMo TTS style payload with the official documentation by adjusting the request construction to include style control within assistant messages and treating seed text as an optional user hint. The changes involve modifications to the MiMo TTS API source code, updates to configuration hints in default settings and dashboard i18n metadata, and extensions to MiMo TTS tests. The review focuses on correctness and maintainability, ensuring the changes adhere to the intended functionality and coding best practices. All comments provided are valid and do not contradict the repository's rules regarding asyncio atomicity.
-    def _build_user_prompt(self) -> str:
-        prompt_parts: list[str] = []
+    def _build_user_prompt(self) -> str | None:
+        seed_text = self.seed_text.strip()
+        return seed_text or None
This line returns the seed_text if it's not empty, otherwise it returns None. Consider adding a comment to clarify this behavior, as it might not be immediately obvious to someone reading the code.
-        return seed_text or None
+        return seed_text or None  # Returns seed_text if not empty, otherwise None
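The `or None` idiom discussed in this comment can be shown in isolation. This is a minimal sketch; `normalize_seed_text` is a hypothetical standalone helper for illustration, not a function from the PR.

```python
from __future__ import annotations


def normalize_seed_text(seed_text: str) -> str | None:
    """Strip surrounding whitespace and map an empty result to None."""
    stripped = seed_text.strip()
    # An empty string is falsy, so `or` falls through to None.
    return stripped or None
```

Returning `None` instead of `""` lets the caller decide with a plain truthiness or `is None` check whether to emit the optional message at all.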
        seed_text = self.seed_text.strip()
        return seed_text or None

    def _build_style_prefix(self) -> str:
        if "唱歌" in style_content:
            return "<style>唱歌</style>"
This logic handles the special case where the style includes "唱歌". It would be beneficial to add a comment explaining why this special handling is necessary, referencing the MiMo documentation if possible.
-        if "唱歌" in style_content:
-            return "<style>唱歌</style>"
+        # MiMo recommends using only the singing style tag at the very beginning.
+        if "唱歌" in style_content:
+            return "<style>唱歌</style>"  # Special case for singing style
-        if self.seed_text.strip():
-            prompt_parts.append(self.seed_text.strip())
+        return f"<style>{style_content}</style>"
Consider adding a comment here to explain that the <style> tag is being constructed and that this is the standard way to apply styles according to the MiMo TTS API.
-        return f"<style>{style_content}</style>"
+        # Construct the style tag with the combined style content
+        return f"<style>{style_content}</style>"
        assert payload["messages"][0] == {
            "role": "user",
            "content": "You are chatting with a close friend.",
        }
        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"
The assertion payload["messages"][1]["content"] checks the content of the assistant message. It would be more robust to also assert the role to ensure the message is correctly assigned to the assistant.
-        assert payload["messages"][0] == {
-            "role": "user",
-            "content": "You are chatting with a close friend.",
-        }
-        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"
+        assert payload["messages"][0] == {
+            "role": "user",
+            "content": "You are chatting with a close friend.",
+        }
+        assert payload["messages"][1]["role"] == "assistant"
+        assert payload["messages"][1]["content"] == "<style>开心 四川话</style>hello"
        assert payload["messages"] == [
            {
                "role": "assistant",
                "content": "<style>开心</style>hello",
            }
The assertion payload["messages"] checks the entire messages array. It would be more robust to assert the role and content of the assistant message separately to ensure the message is correctly constructed.
        assert len(payload["messages"]) == 1
        assert payload["messages"][0]["role"] == "assistant"
        assert payload["messages"][0]["content"] == "<style>开心</style>hello"

    )
    try:
        payload = provider._build_payload("歌词")
        assert payload["messages"][1]["content"] == "<style>唱歌</style>歌词"
The assertion payload["messages"][1]["content"] checks the content of the assistant message. It would be more robust to also assert the role to ensure the message is correctly assigned to the assistant.
        assert payload["messages"][1]["role"] == "assistant"
        assert payload["messages"][1]["content"] == "<style>唱歌</style>歌词"

        assert payload["messages"] == [
            {
                "role": "assistant",
                "content": "hello",
            }
The assertion payload["messages"] checks the entire messages array. It would be more robust to assert the role and content of the assistant message separately to ensure the message is correctly constructed.
        assert len(payload["messages"]) == 1
        assert payload["messages"][0]["role"] == "assistant"
        assert payload["messages"][0]["content"] == "hello"

        assert payload["messages"][0]["content"] == "reference text"
        assert payload["messages"][1]["content"] == "<style>开心</style>明天就是周五了"
        assert "reference text" not in payload["messages"][1]["content"]
These assertions check the content of the user and assistant messages. It would be more robust to also assert the role to ensure the messages are correctly assigned.
        assert payload["messages"][0]["role"] == "user"
        assert payload["messages"][0]["content"] == "reference text"
        assert payload["messages"][1]["role"] == "assistant"
        assert payload["messages"][1]["content"] == "<style>开心</style>明天就是周五了"
        assert "reference text" not in payload["messages"][1]["content"]
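Align the request construction for MiMo TTS style control with the official MiMo speech synthesis documentation.
Resolves #6815
Overview
This PR fixes the sending logic for MiMo TTS style control.
Previously, style-related content such as `mimo-tts-style-prompt` was sent as a separate `user` message, which does not match the usage recommended in the official MiMo speech synthesis documentation. According to the official documentation, style control should be applied by prepending a `<style>...</style>` tag directly to the beginning of the text to synthesize in the `assistant` role.
Changes
- `astrbot/core/provider/sources/mimo_tts_api_source.py`
  - … at the beginning of the text to synthesize in `assistant.content`
  - `mimo-tts-style-prompt` and `mimo-tts-dialect` are merged into a leading `<style>...</style>` tag
  - `mimo-tts-seed-text` is kept as an optional `user` message and is no longer concatenated into the text to synthesize
  - … `<style>唱歌</style>`
- `astrbot/core/config/default.py`
- `dashboard/src/i18n/locales/zh-CN/features/config-metadata.json`
- `dashboard/src/i18n/locales/en-US/features/config-metadata.json`
- `dashboard/src/i18n/locales/ru-RU/features/config-metadata.json`
- `tests/test_mimo_api_sources.py`
  - … `assistant.content`
  - `seed_text` is sent only as an optional `user` message
Alignment with the official documentation
Reference documentation:
The key points of this alignment include:
- … in the `assistant` role message
- The `<style>...</style>` tag is added to the beginning of the target text
- The `user` role message is an optional parameter, used only as auxiliary guidance
- … `<style>唱歌</style>`
Compatibility notes
Verification
The following checks were run locally:
Results:
- `tests/test_mimo_api_sources.py`: all 10 tests passed
- `ruff check`: passed
Summary by Sourcery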
Align MiMo TTS request construction with the official speech synthesis specification by moving style control into assistant messages and treating seed text as an optional user hint.
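The message layout summarized above can be sketched as follows. This is an illustrative reconstruction, not the PR's implementation: `build_style_prefix` and `build_messages` are hypothetical names, joining style and dialect with a single space is inferred from the test expectation `<style>开心 四川话</style>hello`, and the singing special case is deliberately left out.

```python
from __future__ import annotations


def build_style_prefix(style_prompt: str, dialect: str) -> str:
    """Merge the non-empty style parts into one leading <style> tag."""
    parts = [p.strip() for p in (style_prompt, dialect) if p.strip()]
    if not parts:
        return ""  # No style configured: send the text unprefixed.
    return f"<style>{' '.join(parts)}</style>"


def build_messages(
    text: str,
    style_prompt: str = "",
    dialect: str = "",
    seed_text: str = "",
) -> list[dict[str, str]]:
    """Build the chat messages: optional user hint, then the assistant text."""
    messages: list[dict[str, str]] = []
    seed = seed_text.strip()
    if seed:
        # The seed text is only an optional hint and never joins the spoken text.
        messages.append({"role": "user", "content": seed})
    messages.append(
        {
            "role": "assistant",
            "content": build_style_prefix(style_prompt, dialect) + text,
        }
    )
    return messages
```

With no seed text and no style, this yields a single `assistant` message containing only the text, which matches the one-message case asserted in the tests quoted earlier.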
Bug Fixes:
Enhancements:
Documentation:
Tests: