Python scripts accompanying the book "An Introduction to Audio Content Analysis". The source code shows example implementations of basic approaches, features, and algorithms for music audio content analysis.
All implementations are also available in:
The top-level functions are (alphabetical):
computeBeatHisto: calculates a simple beat histogram
computeChords: simple chord recognition
computeFeature: calculates instantaneous features
computeFingerprint: audio fingerprint extraction
computeKey: calculates a simple key estimate
computeMelSpectrogram: computes a mel spectrogram
computeNoveltyFunction: simple onset detection
computePitch: calculates a fundamental frequency estimate
computeSpectrogram: computes a magnitude spectrogram
The names of the additional functions follow these conventions:
Feature*: instantaneous features
Pitch*: pitch tracking approach
Novelty*: novelty function computation
Tool*: additional helper functions and basic algorithms such as
- Blocking of audio into overlapping blocks
- Pre-processing audio
- Conversion (freq2bark, freq2mel, freq2midi, mel2freq, midi2freq)
- Filterbank (Gammatone)
- Gaussian Mixture Model
- Principal Component Analysis
- Feature Selection
- Dynamic Time Warping
- K-Means Clustering
- K Nearest Neighbor classification
- Non-Negative Matrix Factorization
- Viterbi algorithm
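The conversion helpers listed above map between frequency in Hz and musical or perceptual scales. The following is a rough, standalone sketch of that idea and not the package's own code. The function names are hypothetical, the 440 Hz reference for MIDI pitch 69 is the usual tuning convention, and the mel formula shown is only one common variant, which may differ from the one the package uses.

```python
import math

# Hypothetical helper names for illustration; these are not pyACA functions.

def freq_to_midi(f_hz, f_a4=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI pitch.

    Assumes MIDI pitch 69 corresponds to f_a4 (440 Hz by default).
    """
    return 69.0 + 12.0 * math.log2(f_hz / f_a4)


def midi_to_freq(midi_pitch, f_a4=440.0):
    """Convert a (fractional) MIDI pitch back to a frequency in Hz."""
    return f_a4 * 2.0 ** ((midi_pitch - 69.0) / 12.0)


def freq_to_mel(f_hz):
    """Convert Hz to mel with one common formula: 2595 * log10(1 + f / 700).

    Other mel formulas exist; this variant is an assumption of the sketch.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_freq(mel):
    """Inverse of freq_to_mel for the same formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Usage: one octave above A4 is twelve semitones higher.
pitch_a4 = freq_to_midi(440.0)
freq_a5 = midi_to_freq(pitch_a4 + 12.0)
```

Keeping each conversion next to its inverse makes it easy to check that a round trip returns the input value.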
The latest full documentation of this package can be found at https://alexanderlerch.github.io/pyACA.
Please note that the provided code examples are only intended to showcase algorithmic principles – they are not entirely suitable for practical usage without parameter optimization and additional algorithmic tuning. Rather, they intend to show how to implement audio analysis solutions and to facilitate algorithmic understanding to enable the reader to design and implement their own analysis approaches.
The code has been designed with the following principles in mind:
- accessibility, i.e., clear algorithmic implementation from scratch without obfuscation by using third-party implementations,
- maintainability through independence of 3rd party code. This design choice brings, however, some limitations; for instance, reading of non-RIFF audio files is not supported and the machine learning models are very simple.
Consistent variable naming and formatting, as well as the choice of simple implementations, make the code easier to parse. This readability sometimes comes at the cost of lower performance.
All code is matched exactly with the Matlab implementations and the equations in the book. This also means that the Python code might violate typical Python style conventions in order to stay consistent.
related repositories and links
Other related repositories are:
- ACA-Slides: slide decks for teaching and learning audio content analysis
- ACA-Plots: Matlab scripts for generating all plots in the book and slides
The main entry point to all book-related information is AudioContentAnalysis.org
pip install pyACA
example 1: computation and plot of the Spectral Centroid
    import numpy as np
    import pyACA
    import matplotlib.pyplot as plt

    # file to analyze
    cPath = "c:/temp/test.wav"

    # extract feature
    [v, t] = pyACA.computeFeatureCl(cPath, "SpectralCentroid")

    # plot feature output
    plt.plot(t, np.squeeze(v))
    plt.show()
example 2: Computation of two features (here: Spectral Centroid and Spectral Flux)
    import pyACA

    # read audio file
    cPath = "c:/temp/test.wav"
    [f_s, afAudioData] = pyACA.ToolReadAudio(cPath)

    # compute feature
    [vsc, t] = pyACA.computeFeature("SpectralCentroid", afAudioData, f_s)
    [vsf, t] = pyACA.computeFeature("SpectralFlux", afAudioData, f_s)
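To give a feel for what a block-wise instantaneous feature involves, here is a minimal NumPy-only sketch of a spectral centroid computed from scratch. It is not pyACA's implementation and its values may differ from the package output. The function names are hypothetical, and the Hann window, the block length of 1024, the hop length of 512, the magnitude-weighted centroid, and the time stamps at block centers are all assumptions of this sketch.

```python
import numpy as np


def block_audio_sketch(x, block_length, hop_length):
    """Split a 1-D signal into overlapping blocks, zero-padding the tail."""
    num_blocks = int(np.ceil(len(x) / hop_length))
    padded = np.concatenate([x, np.zeros(block_length)])
    return np.stack([
        padded[i * hop_length: i * hop_length + block_length]
        for i in range(num_blocks)
    ])


def spectral_centroid_sketch(x, f_s, block_length=1024, hop_length=512):
    """Return (centroid in Hz per block, block-center time stamps in s)."""
    blocks = block_audio_sketch(x, block_length, hop_length)

    # magnitude spectrum of each Hann-windowed block
    window = np.hanning(block_length)
    X = np.abs(np.fft.rfft(blocks * window, axis=1))

    # frequency in Hz of each FFT bin
    f = np.fft.rfftfreq(block_length, d=1.0 / f_s)

    # magnitude-weighted mean frequency; avoid division by zero for silence
    norm = X.sum(axis=1)
    norm[norm == 0] = 1.0
    v_sc = (X @ f) / norm

    t = (np.arange(blocks.shape[0]) * hop_length + block_length / 2) / f_s
    return v_sc, t


# Usage: one second of a 1 kHz sine sampled at 8 kHz.
f_s = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(f_s) / f_s)
v_sc, t = spectral_centroid_sketch(x, f_s)
```

For a pure sine, the centroid of fully filled blocks should sit close to the sine frequency, which is a simple sanity check before moving on to real recordings.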