PyGaborSTM is a Python library for extracting Rate-Scale-Frequency (RSF) representations from audio signals using bio-inspired auditory spectrograms and 2D Gabor filterbanks. Documentation can be found here.
pip install pygaborstm
For now, install from source (see below).
git clone https://github.com/JHU-LCAP/PyGaborSTM.git
cd PyGaborSTM
poetry install
For GPU acceleration, you need:
- NVIDIA GPU with CUDA support
- CUDA Toolkit installed on your system
# Check your CUDA version
nvidia-smi
Download and install the CUDA Toolkit from NVIDIA: https://developer.nvidia.com/cuda-toolkit
After installation, add to your ~/.bashrc or ~/.zshrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Verify installation:
nvcc --version
The library uses CuPy for GPU acceleration. Make sure your CuPy version matches your CUDA version:
- CUDA 11.x → cupy-cuda11x
- CUDA 12.x → cupy-cuda12x
- CUDA 13.x → cupy-cuda13x
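To pick the matching package without reading the version by eye, the mapping above can be sketched as a small helper. This is only an illustration: `cupy_package_for` is a hypothetical name, not part of PyGaborSTM or CuPy, and it assumes the `nvcc --version` output contains a `release X.Y` string.

```python
import re

# Wheel names as listed in this README, keyed by CUDA major version.
CUPY_WHEELS = {11: "cupy-cuda11x", 12: "cupy-cuda12x", 13: "cupy-cuda13x"}


def cupy_package_for(nvcc_output: str) -> str:
    """Return the CuPy wheel name for the CUDA release found in nvcc output.

    Hypothetical helper for illustration; not part of any library.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' string found in the nvcc output")
    major = int(match.group(1))
    if major not in CUPY_WHEELS:
        raise ValueError(f"no CuPy wheel listed here for CUDA {major}.x")
    return CUPY_WHEELS[major]


# Example line in the style printed by `nvcc --version`.
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(cupy_package_for(sample))
```

The returned name is the package to install into the project environment; versions outside the three listed above raise an error rather than guessing.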
import pygaborstm as stm
# Create model (CPU)
model = stm.PyGaborSTM()
# Create model (GPU)
model = stm.PyGaborSTM(config=stm.Config(use_gpu=True))
# Compute spectrogram and RSF (`audio` is your input waveform, loaded beforehand)
spec = model.spectrogram(audio)
rsf = model.rsf(spec)
# Visualization
stm.plot.plt_spectrogram(spec)
stm.plot.plt_rsf(rsf)
stm.plot.plt_rsf(rsf, fold=True) # Symmetric folding
See notebooks/example_usage.ipynb for more examples.
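The spectrogram defaults in the configuration that follows (`f_min=180.0`, `octaves=5.3`, `sample_rate=16000`) can be sanity-checked with a little arithmetic. The sketch below assumes the frequency channels span `octaves` octaves upward from `f_min` and are spread evenly on a log-frequency axis; the library's actual channel layout is not stated here, so treat this as a rough check. `band_summary` is a hypothetical helper, not part of the PyGaborSTM API.

```python
def band_summary(f_min: float, octaves: float, n_filters: int, sample_rate: int):
    """Return (f_max, nyquist, channels_per_octave) for a log-spaced band.

    ASSUMPTION: the band starts at f_min and covers `octaves` octaves,
    with n_filters channels spread evenly in log frequency.
    """
    f_max = f_min * 2.0 ** octaves        # top of the band in Hz
    nyquist = sample_rate / 2.0           # highest representable frequency
    channels_per_octave = n_filters / octaves
    return f_max, nyquist, channels_per_octave


# Default values copied from the Config in this README.
f_max, nyquist, cpo = band_summary(f_min=180.0, octaves=5.3,
                                   n_filters=128, sample_rate=16000)
print(f"band top: {f_max:.0f} Hz, Nyquist: {nyquist:.0f} Hz, "
      f"channels per octave: {cpo:.1f}")

if f_max >= nyquist:
    print("warning: the band would extend past Nyquist at this sample rate")
```

Under that assumption the default band tops out at roughly 7.1 kHz, below the 8 kHz Nyquist limit at 16 kHz sampling. If you lower `sample_rate` or raise `f_min` or `octaves`, the same check shows whether the band still fits.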
config = stm.Config(
# General
use_gpu=False, # Enable GPU acceleration
sample_rate=16000, # Audio sample rate
# Spectrogram
n_filters=128, # Number of frequency channels
f_min=180.0, # Minimum frequency (Hz)
octaves=5.3, # Frequency range in octaves
# RSF / Gabor
resolution="low", # "low", "medium", "high", "ultra", "max", "overkill"
)
PyGaborSTM/
├── pygaborstm/
│ ├── __init__.py # Public API
│ ├── config.py # Config dataclass
│ ├── structs.py # Spectrogram, RSF dataclasses
│ ├── spectrogram.py # AuditorySpectrogram
│ ├── gabor.py # GaborFilterbank
│ ├── core.py # PyGaborSTM class
│ ├── plot.py # Plotting functions
│ ├── analysis.py # MTF analysis helpers
│ ├── backend.py # NumPy/CuPy switching
│ └── gammatone_kernel.py # Custom CUDA SOS kernel
├── notebooks/
└── tests/
poetry install # Install all dependencies
poetry run jupyter notebook # Run notebooks
poetry run pytest -m "not gpu" # Run all tests excluding GPU kernel tests (used in CI/CD)
poetry run pytest -v # Run all tests including GPU kernel tests
poetry run ruff check --fix . # lint and fix
poetry run ruff format . # format code
poetry run mkdocs serve
Note: Please lint and format before pushing, as CI will fail otherwise.
Ensure your notebook uses the correct Poetry environment:
# Check Poetry env path
poetry env info --path
# Register kernel (if needed)
poetry run python -m ipykernel install --user --name pygaborstm
- Bellur, A., & Elhilali, M. (2017). Feedback-driven sensory mapping adaptation for robust speech activity detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(3), 481-492.