added modified MFCC features based on DNN-c and fDNN-c features; it i… #2908

Open
wants to merge 5 commits into
base: master
from

Contributor

@pegahgh pegahgh commented Dec 12, 2018

…s activated using --modified option.

Contributor

RuABraun commented Dec 30, 2018

Any preprint available of the paper mentioned?

Contributor

@danpovey danpovey left a comment

The documentation seems to be out of date w.r.t. the code here.
Can you please let me know if this configuration is the one that you are currently recommending, or did you change it somehow since this?

@@ -48,14 +48,16 @@ struct MelBanksOptions {
  BaseFloat vtln_low;  // vtln lower cutoff of warping function.
  BaseFloat vtln_high;  // vtln upper cutoff of warping function: if negative, added
                        // to the Nyquist frequency to get the cutoff.
  bool modified;  // If true, use 'modified' MFCC, which uses a breakpoint of
                  // 900 instead of 700.

Changes to the documentation are needed here.

@@ -69,6 +71,13 @@ struct MelBanksOptions {
    opts->Register("vtln-high", &vtln_high,
                   "High inflection point in piecewise linear VTLN warping function"
                   " (if negative, offset from high-mel-freq");
    opts->Register("modified", &modified,
                   "Modified MFCCs, based on paper 'An alternative to MFCCs for ASR' "

Please update this documentation to be accurate and fix the typos. (The stuff about the 1st and 2nd formants isn't accurate any more, I believe.)

a lot of bins, their diameter is defined by a formula and it's a function of
the center frequency f of the bin:
diameter = 30 + 60 f / (f + 500).
so it increases from 30Hz to 90Hz with a knee around 500Hz.

This documentation seems a bit out of date.

// breakpoint_ is 700 for normal mel, or 900 for modified.
inline BaseFloat InverseMelScale(BaseFloat mel_freq) {
  if (sec_breakpoint_ > 0.0)
    return 3500.0 * (expf((expf(mel_freq) - breakpoint_) / 3500.0) - 1.0);

I imagine this should be sec_breakpoint_ instead of 3500.

// and for other purposes.
BaseFloat breakpoint_; // The breakpoint in the mel scale: 700 normally;
// 500 if opts.modified is true.
BaseFloat sec_breakpoint_; // The second breakpoint used in the modified

Please call this either second_breakpoint_ or breakpoint2_.

BaseFloat diameter_floor = (next_center - center_freq) * 1.1,
    diameter = 30.0 + 60.0 * (center_freq / (center_freq + breakpoint_));

diameter = pow(diameter * diameter + diameter_floor * diameter_floor, 0.5);

I think sqrt would be easier than pow(.., 0.5).

Contributor

kkm000 commented Mar 25, 2019

A tangential thought, unrelated to the contents of this MR: it pleases me that someone has at last had a look at the feature-engineering part of our overall business. MFCCs were invented to drop as much "irrelevant" information as possible, back when ASR was tiny and puny. With the DNN renaissance, our general approach has changed: just give the network all the information you have, and let it figure out what is really correlated. I am not at all sure that the currently "standard" features discard mostly useless information.

The field has mostly got rid of HMMs (hooray!), which make no sense for modeling speech signals: they decay exponentially, which speech obviously does not, ye-e-e-e-eah. My general feeling is that our features are another dinosaur that has outlived its time.


stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 19, 2020

stale bot commented Jul 19, 2020

This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.

@stale stale bot closed this Jul 19, 2020
@kkm000 kkm000 reopened this Jul 19, 2020
@stale stale bot removed the stale label Jul 19, 2020

stale bot commented Sep 17, 2020

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

@stale stale bot added the stale label Sep 17, 2020