davidmoten/audio-recognition

audio-recognition

Matches audio against a small vocabulary using Fast Fourier Transforms (FFTs) and Mel Frequency Cepstral Coefficients (MFCCs).

Status: pre-alpha

About the algorithm

A wave file is processed with the following techniques:

  • Pre-emphasis (emphasizes higher frequencies)
  • Framing (splitting the wave file's samples into frames of, say, 256 values with an overlap of 156)
  • Hamming windowing (tapers each frame towards its edges so that it looks closer to periodic and the FFT shows less spectral leakage)
  • Fast Fourier Transform (FFT)
  • Triangular Bandpass Filters using Mel frequencies
  • Discrete Cosine Transform (DCT)
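
The first three steps in the list above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names are hypothetical, the pre-emphasis coefficient of 0.97 is a commonly used value that is assumed here, and the overlap is interpreted as the number of values shared by consecutive frames.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], boosting higher frequencies.

    alpha=0.97 is an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def split_into_frames(samples, frame_size=256, overlap=156):
    """Split samples into frames; consecutive frames share `overlap` values.

    Trailing samples that do not fill a whole frame are dropped.
    """
    step = frame_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, step)]


def hamming_window(frame):
    """Multiply a frame by a Hamming window, tapering its edges."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


# Usage: 1000 samples give frames starting every 100 values.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1000)]
windowed = [hamming_window(f) for f in split_into_frames(pre_emphasis(signal))]
```

With a frame size of 256 and an overlap of 156, each frame starts 100 values after the previous one, so every sample away from the ends of the signal contributes to two or three frames.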

The processing above yields, for each frame of 256 values, a list of 13 decimal values (MFCCs). The first value is a function of the overall power of the signal during the frame; the rest describe the shape of the frequency spectrum.
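
To make the step from one frame to 13 values concrete, here is a rough sketch of the FFT, mel filter bank and DCT stages. It takes a frame that has already been pre-emphasised and windowed. Several details are assumptions rather than facts about this project: the 8000 Hz sample rate, the 26 triangular filters, the power spectrum scaling, the unnormalised DCT-II, and all function names.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two (e.g. 256)."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(top * i / (num_filters + 1))
                 for i in range(num_filters + 2)]
    bins = [int((fft_size + 1) * hz / sample_rate) for hz in hz_points]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (fft_size // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # peak and falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=8000, num_filters=26, num_coefficients=13):
    """Return `num_coefficients` MFCCs for one windowed frame."""
    n = len(frame)
    spectrum = fft(frame)
    power = [abs(c) ** 2 / n for c in spectrum[: n // 2 + 1]]
    # Log energy per filter, floored to avoid log(0) on silent input.
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
        for row in mel_filterbank(num_filters, n, sample_rate)
    ]
    # DCT-II of the log filter bank energies.
    return [
        sum(e * math.cos(math.pi * i * (j + 0.5) / num_filters)
            for j, e in enumerate(log_energies))
        for i in range(num_coefficients)
    ]
```

In this sketch the first coefficient is the sum of the log filter bank energies, which is why it tracks the overall power of the frame. Making the signal louder shifts that first value while leaving the other twelve essentially unchanged.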

To compare wave file A with wave file B, we calculate the MFCCs for each and then use FastDTW, an approximate dynamic time warping algorithm, to measure the distance between the two MFCC sequences after warping them in time.
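
The comparison can be illustrated with classic dynamic time warping. Note that this sketch is the exact O(n·m) algorithm, not FastDTW, which approximates the same distance in roughly linear time. The Euclidean frame distance, the function names and the `closest_word` helper are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Distance between two MFCC vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Exact DTW distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def closest_word(templates, query):
    """Hypothetical helper: pick the vocabulary word whose MFCC sequence
    has the smallest DTW distance to the query."""
    return min(templates, key=lambda word: dtw_distance(templates[word], query))
```

Because warping lets one frame align with several frames of the other sequence, the same word spoken more slowly can still score a small distance against its template.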

Development resources

  • Audio signal processing book
  • Sound pattern matching using Fast Fourier Transform in Windows Phone
  • Mel Frequency Cepstral Coefficient (MFCC) tutorial
