Mfcc Config

AI-TOOLKIT edited this page Feb 17, 2018 · 7 revisions

MFCC Features Calculation Configuration Options (mfcc.conf)

This configuration file is passed to the MakeMfcc() function in VoiceBridge. All available parameters are documented below together with their default values. In most cases the default values are adequate and you will not need to set or change these parameters.
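The usage examples on this page suggest a file with one `--name=value` option per line. As a rough illustration of that format, the Python sketch below reads such text into a dictionary. The `parse_conf` helper and the handling of `#` comments are assumptions made for this example; they are not part of VoiceBridge.

```python
def parse_conf(text):
    """Parse '--name=value' lines into a dict of strings (illustrative helper)."""
    options = {}
    for raw in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("--") or "=" not in line:
            raise ValueError(f"unrecognized line: {raw!r}")
        name, value = line[2:].split("=", 1)
        options[name.strip()] = value.strip()
    return options


example = """
--num-ceps=13
--use-energy=false
--sample-frequency=16000
"""
print(parse_conf(example))
```

Values are kept as strings here; converting them to int32, bool or BaseFloat would follow the types listed for each option.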

Direct MfccOptions

num-ceps : Number of cepstral coeffs in MFCC computation (including C0).

type: int32, default: 13, usage example: --num-ceps=13

use-energy : Use energy (not C0) in MFCC computation.

type: bool, default: true, usage example: --use-energy=false

energy-floor : Floor on energy (absolute, not relative) in MFCC computation. The value is not in log scale; use a small value, e.g. 1.0e-10.

type: BaseFloat, default: 0.0, usage example: --energy-floor=0.0

raw-energy : If true, compute energy before preemphasis and windowing.

type: bool, default: true, usage example: --raw-energy=true
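To make the interaction between the energy options concrete, here is a minimal Python sketch of a per-frame log energy with an absolute floor. The function name `log_energy` and the small `eps` guard against taking the log of zero are assumptions for illustration; the actual implementation may differ in detail.

```python
import math


def log_energy(frame, energy_floor=0.0, eps=1.0e-10):
    """Log of the summed squared samples, floored in the linear domain."""
    # eps is an assumed guard so that an all-zero frame does not give log(0).
    energy = max(sum(x * x for x in frame), eps)
    # The floor is absolute (not in log scale) and only active when > 0.
    if energy_floor > 0.0:
        energy = max(energy, energy_floor)
    return math.log(energy)


print(log_energy([0.5, -0.5, 0.25]))
print(log_energy([0.0, 0.0], energy_floor=1.0))
```

With raw-energy set to true, a function like this would be applied to the frame before preemphasis and windowing; otherwise it would be applied afterwards.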

MfccOptions for Frame Extraction (FrameExtractionOptions)

sample-frequency : Waveform data sample frequency (must match the waveform file, if specified there).

type: BaseFloat, default: 16000, usage example: --sample-frequency=16000

frame-length : Frame length in milliseconds.

type: BaseFloat, default: 25.0, usage example: --frame-length=25.0

frame-shift : Frame shift in milliseconds.

type: BaseFloat, default: 10.0, usage example: --frame-shift=10.0

preemphasis-coefficient : Coefficient for use in signal preemphasis.

type: BaseFloat, default: 0.97, usage example: --preemphasis-coefficient=0.97
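Preemphasis is commonly implemented as a first-order difference, y[n] = x[n] - c * x[n-1], where c is the preemphasis coefficient. The Python sketch below shows this on a single frame. The function name `preemphasize` and the treatment of the first sample (using the sample itself as its own predecessor) are assumptions for illustration.

```python
def preemphasize(frame, coeff=0.97):
    """Return a preemphasized copy of one frame: y[n] = x[n] - coeff * x[n-1]."""
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must be in [0, 1]")
    out = list(frame)
    # Walk backwards so out[i - 1] still holds the original sample.
    for i in range(len(out) - 1, 0, -1):
        out[i] -= coeff * out[i - 1]
    # Assumption: the first sample has no predecessor, so it is used for itself.
    if out:
        out[0] -= coeff * out[0]
    return out


print(preemphasize([1.0, 1.0, 1.0], coeff=0.5))
```

A coefficient of 0.0 leaves the frame unchanged, which is a quick way to check the effect of this step.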

remove-dc-offset : Subtract mean from waveform on each frame before FFT.

type: bool, default: true, usage example: --remove-dc-offset=true

dither : Amount of dithering, 0.0 means no dither.

type: BaseFloat, default: 1.0, usage example: --dither=1.0

window-type : Type of window: "hamming", "hanning", "povey", "rectangular", "blackman".

type: std::string, default: povey, usage example: --window-type=povey

NOTE: "povey" is a window made by Daniel Povey to be similar to Hamming but to go to zero at the edges; it is pow((0.5 - 0.5*cos(n/N*2*pi)), 0.85). According to Dan this is the best option to choose.
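The window described in the note can be sketched in a few lines of Python. The function name `povey_window` is hypothetical, and the use of N-1 in the denominator (so that the last sample also reaches zero) is an assumption of this sketch rather than something stated on this page.

```python
import math


def povey_window(num_samples):
    """Raised-cosine window taken to the power 0.85; zero at both edges."""
    if num_samples < 2:
        raise ValueError("need at least two samples")
    # Assumption: divide by (num_samples - 1) so the final sample is also zero.
    step = 2.0 * math.pi / (num_samples - 1)
    return [(0.5 - 0.5 * math.cos(step * n)) ** 0.85 for n in range(num_samples)]


window = povey_window(401)
print(window[0], window[200], window[-1])
```

Unlike a Hamming window, whose edge values stay slightly above zero, this window tapers fully to zero while keeping a similar overall shape.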

blackman-coeff : Constant coefficient for generalized Blackman window.

type: BaseFloat, default: 0.42, usage example: --blackman-coeff=0.42

round-to-power-of-two : If true, round window size to power of two by zero-padding input to FFT.

type: bool, default: true, usage example: --round-to-power-of-two=true
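As a small worked example, a 25 ms frame at 16000 Hz has 400 samples, and rounding up to a power of two gives an FFT input of 512 samples (the extra 112 being zeros). The Python sketch below computes this; the helper name `padded_window_size` is hypothetical.

```python
def padded_window_size(window_size):
    """Smallest power of two that is >= window_size."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    size = 1
    while size < window_size:
        size *= 2
    return size


frame_samples = 400  # 25 ms at 16000 Hz
print(padded_window_size(frame_samples))
```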

snip-edges : If true, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame-length. If false, the number of frames depends only on the frame-shift, and we reflect the data at the ends.

type: bool, default: true, usage example: --snip-edges=true
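The effect of snip-edges on the frame count can be sketched as follows. This is an illustration under stated assumptions: the function name `num_frames` is hypothetical, and the rounding used when snip-edges is false (nearest multiple of the frame shift) is assumed here rather than taken from this page.

```python
def num_frames(num_samples, sample_frequency=16000.0, frame_length_ms=25.0,
               frame_shift_ms=10.0, snip_edges=True):
    """Number of frames produced for a waveform of num_samples samples."""
    frame_length = int(round(sample_frequency * 0.001 * frame_length_ms))
    frame_shift = int(round(sample_frequency * 0.001 * frame_shift_ms))
    if snip_edges:
        # Only frames that fit completely inside the file are counted.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # Assumption: count depends only on the shift, rounded to the nearest frame.
    return (num_samples + frame_shift // 2) // frame_shift


one_second = 16000
print(num_frames(one_second, snip_edges=True))
print(num_frames(one_second, snip_edges=False))
```

With the default frame settings, one second of 16000 Hz audio gives 98 frames when edges are snipped and 100 frames under the assumed rounding when they are not.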

allow-downsample : If true, allow the input waveform to have a higher frequency than the specified --sample-frequency (and we'll downsample).

type: bool, default: false, usage example: --allow-downsample=false

MfccOptions for Mel Banks (MelBanksOptions)

num-mel-bins : Number of triangular mel-frequency bins.

type: int32, default: 23, usage example: --num-mel-bins=23

NOTE: the number of mel bins defaults to 23 for the MFCC computations. This seems to be common for 16 kHz-sampled data, but for 8 kHz-sampled data, 15 may be better.
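To give a feel for what the number of bins means, the Python sketch below places the centers of the triangular bins at equal spacing on the mel scale between two cutoff frequencies. The mel formula used (1127 * ln(1 + f/700)) is a common convention and is an assumption here, as are the helper names and the explicit 8000 Hz upper cutoff.

```python
import math


def hz_to_mel(freq_hz):
    """Common mel-scale convention (assumed): 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_bin_centers(num_bins=23, low_freq=20.0, high_freq=8000.0):
    """Center frequencies (Hz) of num_bins triangles, equally spaced in mel."""
    mel_low, mel_high = hz_to_mel(low_freq), hz_to_mel(high_freq)
    delta = (mel_high - mel_low) / (num_bins + 1)
    return [mel_to_hz(mel_low + (b + 1) * delta) for b in range(num_bins)]


centers = mel_bin_centers()
print([round(c) for c in centers])
```

The centers are dense at low frequencies and sparse at high frequencies, which is why a narrower band (for example 8 kHz-sampled data) can be covered with fewer bins.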

low-freq : Low cutoff frequency for mel bins.

type: BaseFloat, default: 20.0, usage example: --low-freq=20.0

high-freq : High cutoff frequency for mel bins. If 0, no cutoff is applied (the Nyquist frequency is used); if negative, the value is treated as an offset and added to the Nyquist frequency to get the cutoff.

type: BaseFloat, default: 0.0, usage example: --high-freq=0.0
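The rule for interpreting high-freq can be written as a short Python sketch. The function name `resolve_high_freq` is hypothetical; the logic follows the description above, with the Nyquist frequency taken as half the sample frequency.

```python
def resolve_high_freq(high_freq, sample_frequency=16000.0):
    """Turn the high-freq option into an absolute cutoff in Hz."""
    nyquist = 0.5 * sample_frequency
    if high_freq > 0.0:
        return high_freq  # positive values are used as given
    # Zero or negative: offset from Nyquist (0 means no cutoff below Nyquist).
    return nyquist + high_freq


print(resolve_high_freq(0.0))
print(resolve_high_freq(-400.0))
print(resolve_high_freq(7000.0))
```

For 16000 Hz audio these calls give cutoffs of 8000 Hz, 7600 Hz and 7000 Hz respectively.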

vtln-low : Low inflection point in piecewise linear VTLN warping function.

type: BaseFloat, default: 100.0, usage example: --vtln-low=100.0

vtln-high : High inflection point in piecewise linear VTLN warping function (if negative, offset from high-mel-freq). A negative value is added to the Nyquist frequency to get the inflection point.

type: BaseFloat, default: -500.0, usage example: --vtln-high=-500.0

debug-mel : Print out debugging information for mel bin computation.

type: bool, default: false, usage example: --debug-mel=false

NOTE: this information is based on Kaldi (http://kaldi-asr.org).

You may also visit the VoiceBridge official website for more info: VoiceBridge website.