Amirsina Torfi edited this page Apr 8, 2017 · 9 revisions


This wiki aims to give better insight into how commonly used speech features are generated, of which MFCCs are the best known.

Why feature extraction?

In both speech and speaker recognition, it is necessary to extract the components of the audio that are relevant to the linguistic content (in the case of speech recognition) or to the speaker's vocal characteristics (in the case of speaker recognition), and to discard the non-informative parts of the audio stream. At a high level, the extracted features should represent the vocal tract as it shapes what is being said, so that different parts of the spoken audio can be distinguished. Mel Frequency Cepstral Coefficients (MFCCs), introduced by Davis and Mermelstein in 1980, have been widely used and hard to beat ever since.
