MiniMaxSkills is a collection of AI agent skills powered by MiniMax multimodal models. These skills extend agent capabilities with voice synthesis, music generation, and more.
| Skill | Description | Key Features |
|---|---|---|
| mmVoice_Maker | Complex text-to-speech production skill powered by MiniMax Voice API and FFmpeg. | Supports multi-voice synthesis for audiobooks, podcasts, and similar productions. Also provides voice cloning (from 10 s to 5 min of audio), voice design (from a text prompt), and audio post-processing (merge, convert, normalize, trim). |
| mmEasyVoice | Text-to-speech skill based on MiniMax Speech model. | Quick, easy-to-use text-to-speech conversion that lets the agent "speak". |
| Skill | Description | Key Features |
|---|---|---|
| mmMusicMaker | Music generation skill powered by MiniMax Music API. | Supports standard songs with lyrics, pure instrumental music, melodic chanting/humming, structured prompt crafting, and multiple output formats (hex/url). |
Each skill has its own `SKILL.md` with detailed usage instructions and `reference/` docs. To get started:
- Navigate to the skill directory you want to use
- Read the `SKILL.md` for the complete workflow
- Set the required API key (`MINIMAX_VOICE_API_KEY` or `MINIMAX_MUSIC_API_KEY`), i.e. your MiniMax Pay-as-you-go API Key
- Follow the step-by-step guide
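
The API key step above can be sanity-checked before running a skill. The following is a minimal sketch, not part of the skills themselves: the helper `load_api_key` is a hypothetical name introduced here for illustration, and only the two environment variable names come from this README.

```python
import os

# Environment variable names as listed in the getting-started steps.
VOICE_KEY_VAR = "MINIMAX_VOICE_API_KEY"
MUSIC_KEY_VAR = "MINIMAX_MUSIC_API_KEY"


def load_api_key(var_name, env=None):
    """Return the key stored in var_name, or raise a clear error if it is unset.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            var_name + " is not set; export your MiniMax Pay-as-you-go API key first"
        )
    return key
```

Calling `load_api_key(VOICE_KEY_VAR)` before starting a voice skill fails early with a readable message instead of surfacing as an authentication error partway through a workflow.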
- Python 3.8+
- MiniMax Pay-as-you-go API Key (Get one here (overseas users), Get one here (Chinese users))
- FFmpeg (required for audio processing in mmVoice_Maker)
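
The requirements above can be verified with a short script. This is a sketch under stated assumptions: `check_requirements` is a hypothetical helper, and it assumes the FFmpeg executable is named `ffmpeg` and is discoverable on `PATH`.

```python
import shutil
import sys


def check_requirements(need_ffmpeg=True):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Python 3.8+ is listed as a requirement.
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found " + sys.version.split()[0])
    # FFmpeg is only needed for audio processing in mmVoice_Maker.
    if need_ffmpeg and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH (needed by mmVoice_Maker)")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing:", problem)
```

Pass `need_ffmpeg=False` when only mmEasyVoice or mmMusicMaker will be used, since FFmpeg is listed as a requirement for mmVoice_Maker only.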