MCP Server for Gemini Image and Audio generation using Google's Gemini AI models.
This MCP server provides tools to:
- Generate images from text using Gemini's Flash Image model
- Generate audio from text using Gemini 2.5 Flash Preview TTS model
Install with pip:
pip install gemini-gen-mcp
Or install from source:
git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp
pip install -e .
You need a Google Gemini API key to use this server. Get one from Google AI Studio.
| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | Yes | - | Your Google Gemini API key |
| GEMINI_DOWNLOAD_PATH | No | /tmp/gemini_gen_mcp | Directory where generated files are saved |
Set the environment variables:
export GEMINI_API_KEY='your-api-key-here'
export GEMINI_DOWNLOAD_PATH='/path/to/downloads' # optional
Generated files are organized by type and date:
- Images: $GEMINI_DOWNLOAD_PATH/images/YYYY-MM-DD/
- Audio: $GEMINI_DOWNLOAD_PATH/audios/YYYY-MM-DD/
Each generated file includes a companion .info.json file with generation metadata.
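The directory layout above can be mirrored in a few lines of Python, for example to locate a given day's output from a script. This is a minimal sketch, not the server's own code: the helper name `output_dir` is hypothetical, and it assumes the date folder is the local date in ISO `YYYY-MM-DD` form.

```python
import os
from datetime import date
from pathlib import Path
from typing import Optional

# Default taken from the environment variable table in this README.
DEFAULT_DOWNLOAD_PATH = "/tmp/gemini_gen_mcp"


def output_dir(kind: str, day: Optional[date] = None, base: Optional[str] = None) -> Path:
    """Return the dated folder for 'images' or 'audios' (hypothetical helper)."""
    if kind not in ("images", "audios"):
        raise ValueError(f"unknown output type: {kind!r}")
    # An explicit base wins, then GEMINI_DOWNLOAD_PATH, then the documented default.
    root = Path(base or os.environ.get("GEMINI_DOWNLOAD_PATH", DEFAULT_DOWNLOAD_PATH))
    # date.isoformat() yields YYYY-MM-DD, matching the folder names above.
    return root / kind / (day or date.today()).isoformat()
```

Calling `output_dir("images")` with no other arguments resolves today's image folder under the configured download path.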
Run the MCP server directly:
gemini-gen-mcp
Or as a Python module:
python -m gemini_gen_mcp.server
See CLAUDE_CONFIG.md for detailed instructions.
Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-gen": {
      "command": "gemini-gen-mcp",
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Generate images from text descriptions using Gemini's image generation models.
Parameters:
- prompt (string, required): Text description of the image to generate
- model (string, optional): Gemini model to use
  - gemini-2.5-flash-image (default)
  - gemini-3-pro-image-preview
- aspect_ratio (string, optional): Aspect ratio for the generated image (default: "1:1")
  - Supported: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- temperature (float, optional): Sampling temperature for image generation (default: 1.0)
- top_p (float, optional): Nucleus sampling parameter
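A client may want to check its arguments before calling the tool. The sketch below validates them against the values listed above. `build_image_args` is a hypothetical helper, not part of this package, and the server may validate differently.

```python
from typing import Any, Dict, Optional

# Values copied from the parameter list above.
IMAGE_MODELS = ("gemini-2.5-flash-image", "gemini-3-pro-image-preview")
ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9")


def build_image_args(
    prompt: str,
    model: str = "gemini-2.5-flash-image",
    aspect_ratio: str = "1:1",
    temperature: float = 1.0,
    top_p: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a tool-argument dict, rejecting values outside the documented sets."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model not in IMAGE_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    args: Dict[str, Any] = {
        "prompt": prompt,
        "model": model,
        "aspect_ratio": aspect_ratio,
        "temperature": temperature,
    }
    # top_p has no documented default, so it is sent only when set explicitly.
    if top_p is not None:
        args["top_p"] = top_p
    return args
```

Omitting `top_p` from the payload when it is unset leaves the choice of default to the server.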
Example:
{
  "prompt": "A serene mountain landscape at sunset with a lake",
  "model": "gemini-2.5-flash-image",
  "aspect_ratio": "16:9",
  "temperature": 1.0
}
Generate audio/speech from text using Gemini's TTS models. Output is saved as WAV format.
Parameters:
- text (string, required): Text to convert to speech
- model (string, optional): Gemini TTS model to use
  - gemini-2.5-flash-preview-tts (default)
  - gemini-2.5-pro-preview-tts
- voice (string, optional): Voice to use for speech generation (default: "Kore")
Available Voices:
| Voice | Style | Voice | Style | Voice | Style |
|---|---|---|---|---|---|
| Zephyr | Bright | Puck | Upbeat | Charon | Informative |
| Kore | Firm | Fenrir | Excitable | Leda | Youthful |
| Orus | Firm | Aoede | Breezy | Callirrhoe | Easy-going |
| Autonoe | Bright | Enceladus | Breathy | Iapetus | Clear |
| Umbriel | Easy-going | Algieba | Smooth | Despina | Smooth |
| Erinome | Clear | Algenib | Gravelly | Rasalgethi | Informative |
| Laomedeia | Upbeat | Achernar | Soft | Alnilam | Firm |
| Schedar | Even | Gacrux | Mature | Pulcherrima | Forward |
| Achird | Friendly | Zubenelgenubi | Casual | Vindemiatrix | Gentle |
| Sadachbia | Lively | Sadaltager | Knowledgeable | Sulafat | Warm |
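The tool saves speech as WAV. To illustrate what that involves, the sketch below wraps raw PCM samples in a WAV container using Python's standard `wave` module. It is not the server's code: `write_wav` is a hypothetical helper, and the 24 kHz, 16-bit, mono defaults are an assumption based on Google's published TTS examples, so check them against the audio you actually receive.

```python
import wave


def write_wav(
    path: str,
    pcm: bytes,
    rate: int = 24000,
    channels: int = 1,
    sample_width: int = 2,
) -> None:
    """Wrap raw PCM bytes in a WAV container (format defaults are assumptions)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wf.setframerate(rate)
        wf.writeframes(pcm)
```

For instance, `write_wav("silence.wav", b"\x00\x00" * 24000)` would write one second of silence at the assumed format.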
Example:
{
  "text": "Hello, this is a test of the Gemini text to speech system.",
  "model": "gemini-2.5-flash-preview-tts",
  "voice": "Kore"
}
# Clone the repository
git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp
# Install in editable mode with dependencies
pip install -e .
# Install test dependencies
pip install pytest pytest-asyncio
# Run tests
```bash
# uv run pytest tests -v
npm test
```
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions, please use the GitHub Issues page.
- Built with FastMCP
- Powered by Google Gemini AI