Home
This wiki aims to give better insight into the procedure for generating commonly used speech features, of which MFCCs are the most famous.
In speech and speaker recognition, it is necessary to extract the components of the audio that are relevant to the context and linguistic content (in the case of speech recognition) or to the speaker's vocal characteristics (in the case of speaker recognition), and to discard all the non-informative parts of the audio stream. At a high level, the extracted features should be able to represent the vocal tract characteristics behind what is being said, so that different parts of the spoken audio can be distinguished. Mel Frequency Cepstral Coefficients (MFCCs), introduced by Davis and Mermelstein in the 1980s, have been widely used and have been hard to beat ever since.