A comprehensive Python library for audio analysis and transformations, combining the best open source tools.
- Audio Analysis: Extract features, analyze spectrograms, detect beats and onsets
- Audio Effects: Apply professional audio effects (reverb, delay, compression, EQ, etc.)
- Source Separation: Isolate vocals, drums, bass, and other components
- Audio Transformations: Pitch shift, time stretch, resample
- AI Integration: Use the AudioProcessor agent to run complex audio tasks
- Audio Registry: Efficient in-memory management of audio assets with automatic file saving
- Python 3.8+
- Required libraries listed in
requirements.txt
# Clone the repository
git clone https://github.com/yourusername/audio_toolbox.git
cd audio_toolbox
# Install dependencies
pip install -r requirements.txtfrom audio_toolbox import AudioProcessor
# Load an audio file
processor = AudioProcessor('path/to/audio.wav')
# Get audio features
melspec = processor.analyze.mel_spectrogram()
# Apply an effect
processed_audio = processor.effects.reverb(room_size=0.8, damping=0.5)
# Separate vocals from the track
vocals, accompaniment = processor.separate.vocals()
# Save processed audio
processor.save('output.wav')The AudioProcessor agent provides a conversational interface for working with audio files. It allows you to perform complex audio operations through natural language commands.
- Set up the required environment variables:
# Create a .env file with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_api_key_here" > .env- Run the agent:
python agent.py- Interact with the agent using natural language:
> Load my audio file sample.mp3 and analyze its tempo
- Load and analyze audio files
- Apply audio effects and transformations
- Separate audio into stems (vocals, drums, bass, etc.)
- Generate visualizations
- Chain multiple operations together
Check out examples/test_prompts.md for example interactions with the agent.
This project includes an MCP (Model Context Protocol) server that enables the AudioProcessor agent to interface with the audio processing functionality.
To run the MCP server directly:
python mcp_server.pyThis server exposes all the audio toolbox functionality as tools that can be used by any MCP-compatible client.
The MCP server includes an Audio Registry system that efficiently manages audio data in memory using IDs instead of passing large base64 encoded strings. Key benefits:
- Token Efficiency: Small IDs instead of large encoded strings
- Intuitive References: Descriptive IDs for audio assets (e.g., "vocals", "drums")
- Automatic File Saving: All processed audio components are saved to files
- State Management: Easy tracking of audio processing history
See docs/audio_registry.md for complete details on using the registry system.
Several example scripts are provided in the examples directory:
basic_example.py: Simple demonstration of core audio toolbox functionalityeffects_example.py: Comprehensive examples of audio effects processingsource_separation_example.py: Examples of audio source separationtransformation_example.py: Examples of audio transformationsaudio_registry_example.py: Example of using the Audio Registry systemtest_prompts.md: Example prompts for interacting with the AudioProcessor agent
Here are some example prompts you can try with the AudioProcessor agent, from simple to more complex:
Load my song.mp3 and tell me its tempo, key, and duration.
Load guitar.wav and apply some reverb to make it sound like it's in a concert hall.
Load my song.mp3, separate the vocals, and save them as vocals.wav.
Load drums.wav, increase the tempo by 10%, add some compression to make them punchier, and save the result.
Load song.mp3, separate the vocals and drums, apply reverb to the vocals, boost the low-end of the drums, and then mix them back together at equal volumes.
Load guitar_solo.wav, apply a pitch shifter to transpose it up a fifth, add delay with feedback at 0.4, then apply a phaser effect with a slow rate and save the result as psychedelic_guitar.wav.
Load my band_recording.wav, analyze it to find the tempo and key. Then separate the drums, bass, and other instruments. For the drums, apply compression with a threshold of -20dB and a ratio of 4:1. For the bass, add a slight distortion and some EQ to boost around 80Hz. For the remaining instruments, add a room reverb with medium decay. Finally, mix all processed stems back together with the drums at 90% volume, bass at 100%, and other instruments at 85%, then normalize the mix and save it as processed_band.wav.
Load vocals.mp3, create a spectrogram visualization, analyze pitch contours, and identify sections where vibrato is used. Then apply a subtle chorus effect only to those sections.
Load track1.mp3 and track2.mp3. Analyze their tempos and keys. Time-stretch track2 to match track1's tempo, then create a 15-second crossfade transition between the end of track1 and the beginning of track2. Apply a high-pass filter sweep during the transition and save the result.
Load raw_vocals.wav and do the following production chain: first apply a high-pass filter at 100Hz, then add a compressor with threshold -20dB, ratio 3:1, and makeup gain of 2dB. Next add a plate reverb with 1.5 second decay but keep it 80% dry. Finally, apply a de-esser to smooth out any harsh sibilance and save the result as polished_vocals.wav.
├── agents/ # Agent system prompts
├── docs/ # Documentation
│ └── audio_registry.md # Audio registry documentation
├── examples/ # Example scripts
├── logs/ # Log files
├── output/ # Output directory for processed audio
├── src/ # Source code
│ └── audio_toolbox/ # Main package
│ ├── __init__.py # Package initialization
│ ├── processor.py # Main AudioProcessor class
│ ├── analyze.py # Audio analysis functionality
│ ├── effects.py # Audio effects processing
│ ├── separate.py # Audio source separation
│ └── transform.py # Audio transformations
├── tests/ # Unit tests
├── .well-known/ # Agent card for A2A protocol
├── agent.py # AudioProcessor agent
├── mcp_server.py # MCP server implementation
├── requirements.txt # Package dependencies
└── setup.py # Package setup script
This project combines several powerful open-source audio libraries:
- Librosa: Audio analysis and feature extraction
- Pedalboard: High-quality audio effects
- Demucs: State-of-the-art source separation
- AudioFlux: Advanced audio transformations and analysis
MIT