- Python 3.x
- FFmpeg: This tool is used for audio preprocessing. Please ensure it is installed and accessible in your system's PATH.
- On macOS: `brew install ffmpeg`
- On Linux (Debian/Ubuntu): `sudo apt-get install ffmpeg`
- On Windows: Download from ffmpeg.org and add to PATH.
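Before running the classifier, it can help to confirm that FFmpeg is actually reachable on your PATH. A minimal sketch (the function name is illustrative, not part of the repository):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the system PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found on PATH")
    else:
        print("FFmpeg not found; install it and/or update your PATH")
```

If this prints that FFmpeg is missing, revisit the installation steps above for your platform.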
- Clone the repository:

  ```sh
  git clone <repository_url>
  cd <repository_name>
  ```

- Install Python dependencies:

  ```sh
  pip install -r requirements.txt
  ```
To classify an audio file, run the `audio_classifier.py` script:

```sh
python src/audio_classifier.py path/to/your/audiofile.wav
```

Arguments:

- `input_audio_file`: (Required) Path to the input audio file (e.g., WAV, MP3, MP4, MOV).
- `--model_path`: (Optional) Path to a specific pyAudioAnalysis SVM model file (e.g., `path/to/your/svm_rbf_sm`). The script attempts to auto-detect the default speech/music model (`svm_rbf_sm`) included with your pyAudioAnalysis installation. If this detection fails, or if you want to use a custom model, you must provide this path.
- `--output_dir`: (Optional) Directory where the output CSV and `.dcsx` files will be saved. Defaults to the same directory as the input audio file.
Example:
Assuming you are in the project root directory:
```sh
python src/audio_classifier.py sample_audio/test_sample.wav --output_dir output_files
```

(Ensure the `output_files` directory exists first if you use this example, or point to an existing one. The `sample_audio/test_sample.wav` file is provided in the repository.)
The script generates two files in the specified output directory (or the input file's directory by default):
- `<input_filename>_logic_pro.csv`: A CSV file formatted for import as markers into Logic Pro X.
  - Columns: `Marker,Position,Name`
  - `Position` is the start timecode of the segment (HH:MM:SS.mmm).
  - `Name` is the classification label (e.g., "speech", "music").
- `<input_filename>.dcsx`: A JSON file containing structured information about the classified segments.
  - Includes the source audio filename and a list of cues, each with:
    - `CueID`
    - `StartTimecode`
    - `Label` (classification)
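To illustrate both output formats, here is a minimal sketch that writes a marker CSV and a `.dcsx` JSON for a couple of hypothetical segments. The cue fields (`CueID`, `StartTimecode`, `Label`) follow the description above; the `source_audio` key and the exact layout produced by `audio_classifier.py` are assumptions for illustration:

```python
import csv
import json

def seconds_to_timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, the Position format used in the CSV."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

# Hypothetical classified segments: (start time in seconds, label)
segments = [(0.0, "speech"), (12.5, "music")]

# Logic Pro marker CSV: Marker,Position,Name
with open("example_logic_pro.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Marker", "Position", "Name"])
    for i, (start, label) in enumerate(segments, 1):
        writer.writerow([i, seconds_to_timecode(start), label])

# .dcsx JSON: source audio filename plus a list of cues
# (the "source_audio" key name is an assumption, not taken from the script)
dcsx = {
    "source_audio": "example.wav",
    "cues": [
        {"CueID": i, "StartTimecode": seconds_to_timecode(start), "Label": label}
        for i, (start, label) in enumerate(segments, 1)
    ],
}
with open("example.dcsx", "w") as f:
    json.dump(dcsx, f, indent=2)
```

This produces markers such as `2,00:00:12.500,music`, which Logic Pro X can import directly.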
The project includes unit tests to verify basic functionality. Ensure you have installed dependencies first.
To run the tests, navigate to the project root directory and run:
```sh
python -m unittest discover -s tests
```

The script relies on pyAudioAnalysis and its pre-trained models (e.g., `svm_rbf_sm` for speech/music classification).
- Automatic Detection: The script first tries to automatically locate the default models from your pyAudioAnalysis installation.
- Manual Override: If the automatic detection fails (you'll see an error message like "Model file not found..."), or if you wish to use a different model, you must use the `--model_path` argument to provide the full path to the model file.
Finding your pyAudioAnalysis models:
The models are usually located within the pyAudioAnalysis/data directory of your installed pyAudioAnalysis package. The exact location can vary depending on your Python environment and installation method.
- You can try to find the pyAudioAnalysis installation path (specifically the directory containing the `data` folder) by running this Python snippet:

  ```python
  import os

  try:
      import pyAudioAnalysis.audioSegmentation
      # The 'data' directory is typically alongside audioSegmentation.py
      paa_module_dir = os.path.dirname(pyAudioAnalysis.audioSegmentation.__file__)
      print(f"pyAudioAnalysis module directory (contains 'data' folder): {paa_module_dir}")
      print(f"Potential model path: {os.path.join(paa_module_dir, 'data', 'svm_rbf_sm')}")
  except ImportError:
      print("pyAudioAnalysis.audioSegmentation not found. Is pyAudioAnalysis installed correctly?")
  ```

  Then, look for a `data` subdirectory within the printed path. The speech/music SVM model is typically named `svm_rbf_sm`.
- If you cloned the pyAudioAnalysis repository for development, the models are usually in `pyAudioAnalysis/pyAudioAnalysis/data/`.
Once you locate the `svm_rbf_sm` file (or your custom model file), provide its full path to the `--model_path` argument.
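As a quick sanity check before invoking the script, you can verify the model path yourself. The helper below is hypothetical (not part of the repository), and the companion `MEANS` file check reflects how pyAudioAnalysis usually packages its SVM models, which may vary by version:

```python
import os

def check_model_path(model_path: str) -> bool:
    """Return True if the model file and its companion MEANS file exist.

    pyAudioAnalysis SVM models typically ship alongside a normalization
    file named <model>MEANS (an assumption based on the library's usual
    packaging); both are needed to load the classifier.
    """
    return os.path.isfile(model_path) and os.path.isfile(model_path + "MEANS")

if __name__ == "__main__":
    path = "path/to/your/svm_rbf_sm"  # replace with the path you found above
    print("Model files present" if check_model_path(path) else "Model files missing")
```

If this reports missing files, re-run the detection snippet above or re-check the path before passing it to `--model_path`.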