Basic Information - Models Used
minimax_tts-speech-2.8-hd
Basic Information - Scenario Description
TTS systems still struggle with Chinese news-style text
Is this badcase known and solvable?
Information about environment
git clone https://github.com/Jayden-X-L/cn-news-tts-bench.git
cd cn-news-tts-bench
python3 scripts/validate_dataset.py data/dev.jsonl
python3 scripts/validate_dataset.py data/test_public.jsonl
python3 scripts/score_submission.py \
  --dataset data/test_public.jsonl \
  --asr-results results/asr_results/public_test/volcengine_tts.asr.jsonl \
  --model-id volcengine_tts \
  --output-dir /tmp/cn-news-tts-repro
python3 scripts/aggregate_leaderboard.py \
  --per-model-dir results/per_model_public_test \
  --results-dir /tmp/cn-news-tts-leaderboard/results \
  --site-dir /tmp/cn-news-tts-leaderboard/site
shasum -a 256 -c release/v0.1_core_checksums.sha256
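On machines without `shasum`, the checksum step above could be approximated in Python. This is only a sketch: it assumes the manifest uses the usual `<hex digest>  <path>` layout that `shasum` writes, which has not been confirmed against the release file, and `find_mismatches` is a hypothetical helper name, not part of the repository.

```python
import hashlib
from typing import Callable


def find_mismatches(manifest_text: str, read_bytes: Callable[[str], bytes]) -> list[str]:
    """Return the paths whose SHA-256 digest differs from the manifest entry.

    Assumes one "<hex digest>  <path>" entry per line (assumed shasum layout).
    """
    mismatched = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, _, name = line.strip().partition(" ")
        # Drop the separator space and the optional '*' binary-mode marker.
        name = name.lstrip(" *")
        actual = hashlib.sha256(read_bytes(name)).hexdigest()
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

To run it from the repository root, pass the text of `release/v0.1_core_checksums.sha256` and a reader such as `lambda p: pathlib.Path(p).read_bytes()`. An empty result means every listed file matched.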
Call & Execution Information
- model_id: minimax_tts
- model name: speech-2.8-hd
- voice: configured Mandarin news voice
Description
Hi MiniMax team,
I am Shijun Luo from NetEase Cloud Music, where I work on AI news briefing / AI news podcast generation. In our workflow, we use and evaluate major TTS systems to generate spoken news and information podcast content.
During this work, we found that many current TTS systems still struggle with Chinese news-style text, especially compact expressions that frequently appear in real news. These errors are not just voice-quality issues; they can change the information heard by listeners.
For example:
- 苏-27 may be read as "苏负二十七" ("Su minus twenty-seven") instead of the intended aircraft model name.
- 96-91 may be read as a numeric range instead of a sports score.
- 620N·m may be read letter by letter or as symbol fragments instead of a torque unit.
- 3.5% may be read as "三点五百分号" ("three point five percent-sign") or confused with percentage points.
- AI / CEO may be expanded into "人工智能" ("artificial intelligence") / "首席执行官" ("chief executive officer") when the original abbreviation should be preserved.
This motivated us to release CN-NewsTTS Bench, a raw-input Chinese news TTS benchmark focused on real-world news reading cases such as dates, numbers, units, named entities, mixed-script text, and text normalization.
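As a rough illustration of how the misreadings listed above might be spot-checked in an ASR transcript of synthesized audio, here is a minimal sketch. It is not the benchmark's scoring method: `KNOWN_MISREADINGS` and `flag_misreadings` are hypothetical names, and the table only contains the wrong readings quoted in this report.

```python
# Raw-input tokens mapped to transcript fragments that signal a wrong reading.
# Entries are limited to the misreadings quoted in this report.
KNOWN_MISREADINGS = {
    "苏-27": ["苏负二十七"],
    "3.5%": ["三点五百分号"],
    "AI": ["人工智能"],
    "CEO": ["首席执行官"],
}


def flag_misreadings(raw_text: str, asr_transcript: str) -> list[tuple[str, str]]:
    """Return (token, wrong_reading) pairs found in the transcript.

    A reading is ignored when it already occurs in the raw text, since the
    source may legitimately contain the expanded form.
    """
    flagged = []
    for token, wrong_readings in KNOWN_MISREADINGS.items():
        if token not in raw_text:
            continue
        for wrong in wrong_readings:
            if wrong in asr_transcript and wrong not in raw_text:
                flagged.append((token, wrong))
    return flagged
```

Plain substring matching keeps the sketch short, but it only catches the exact wrong readings in the table. Cases such as 96-91 or 620N·m, where the report gives no quoted misreading, would need an expected-reading comparison instead.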
The current public leaderboard includes a MiniMax TTS entry:
- model_id: minimax_tts
- model name: speech-2.8-hd
- voice: configured Mandarin news voice
Repository:
https://github.com/Jayden-X-L/cn-news-tts-bench
We would like to invite the MiniMax team to:
- Confirm or correct the public model metadata.
- Submit an official result if the current configuration is not representative.
- Provide a system/model card if available.
Submission guide:
https://github.com/Jayden-X-L/cn-news-tts-bench/blob/main/SUBMIT.md
For questions or corrections, feel free to contact me:
xiaobiluo@gmail.com
Thanks!
Best,
Shijun Luo