A machine learning system to classify music genres from audio. This code was written in 2014, so it is messy and out of date.
Reflection (Aug. 2019). Looking back on this, it's interesting that 1-D ConvNets / RNNs utterly crush this clever technique for the same classification task. As Rich Sutton noted in "The Bitter Lesson", compute triumphs over cleverness.
The pessimistic view is that clever researchers have low marginal utility. But the optimistic view suggests that deep learning is here to stay. Since compute will likely continue to accelerate, AI applications will gradually become more accessible to software engineers and the world more broadly.
This implementation uses mel-frequency cepstrum coefficients (MFCCs) fed into an SVM for classification. Broadly, the algorithm works as follows (from SciPy's feature toolbox):
- Take the Fourier transform of a signal.
- Transform the spectrum onto the mel scale (due to Stevens, Volkmann, and Newman, 1937).
- Take the logarithms of the powers at each of the mel frequencies.
- Take the discrete cosine transform of the list of mel log powers.
- The MFCCs are the amplitudes of the resulting spectrum.
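To make the steps above concrete, here is a minimal sketch of that pipeline in NumPy/SciPy. The sample rate, filter-bank size, and coefficient count are illustrative assumptions rather than the values this repository uses, and the `mfcc` function below is a stand-alone demonstration, not the project's actual implementation.

```python
# A minimal MFCC sketch following the steps listed above.
# Parameter choices (sample rate, filter count, frame length) are
# illustrative assumptions, not this repository's settings.
import numpy as np
from scipy.fftpack import dct


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(frame, sample_rate=22050, n_filters=26, n_coeffs=13):
    """Compute MFCCs for a single windowed audio frame (1-D array)."""
    # 1. Fourier transform of the signal -> power spectrum.
    power = np.abs(np.fft.rfft(frame)) ** 2

    # 2. Map the spectrum onto the mel scale with a triangular filter bank.
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    bins = np.floor((len(frame) + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    filter_bank = np.zeros((n_filters, len(power)))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            filter_bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            filter_bank[i - 1, k] = (right - k) / max(right - center, 1)

    mel_energies = filter_bank @ power

    # 3. Take logarithms of the powers at each mel frequency.
    log_energies = np.log(mel_energies + 1e-10)

    # 4. Discrete cosine transform of the mel log powers;
    #    the MFCCs are the amplitudes of the resulting spectrum.
    return dct(log_energies, type=2, norm='ortho')[:n_coeffs]


if __name__ == "__main__":
    # Toy example: one Hamming-windowed frame of a 440 Hz sine wave.
    t = np.arange(2048) / 22050.0
    frame = np.sin(2 * np.pi * 440.0 * t) * np.hamming(2048)
    print(mfcc(frame))
```

In a pipeline like the one described, per-frame MFCCs would typically be aggregated over a track (for example, by taking their means and variances) before being fed to the SVM classifier.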
Broader goals for this project:
- Try to understand the "essence" of what a musical genre means.
- Try to understand the math and physics behind music.
- Explore Fourier analysis to model sound.
- Explore the connections between Fourier analysis and ConvNets.
- Potentially: explore Haskell implementations of GLMs or deep learning models.
- Potentially: make this code nice.