MFCC CPU operator #1577

jantonguirao · 2019-12-13T18:07:09Z

Why we need this PR?

It adds a new feature: an MFCC (Mel Frequency Cepstral Coefficients) operator.

What happened in this PR?

Explain solution of the problem, new feature added.
Adds a new CPU operator, MFCC, that calculates the MFCCs from a mel spectrogram.
What was changed, added, removed?
What is most important part that reviewers should focus on?
Operator implementation
Was this PR tested? How?
Python operator tests
Were docs and examples updated, if necessary?
Doxygen, schema docstring, jupyter notebook
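
For readers unfamiliar with the computation, MFCCs are obtained by applying a discrete cosine transform along the mel axis of a (log-scaled) mel spectrogram. The following is only a minimal pure-Python sketch of a type II DCT with optional ortho-normalization, not the operator's implementation. The function name `dct_ii` is hypothetical, and the scaling convention (factor 2, as in SciPy's unnormalized DCT-II) is an assumption rather than something stated in this PR.

```python
import math


def dct_ii(x, n_coeffs, normalize=False):
    """Type II DCT of the sequence x, keeping the first n_coeffs outputs.

    Assumed convention (not confirmed by the PR):
    y[k] = 2 * sum_i x[i] * cos(pi * (i + 0.5) * k / n)
    """
    n = len(x)
    out = []
    for k in range(n_coeffs):
        acc = 2.0 * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        if normalize:
            # Ortho-normalization: a separate scale for the k == 0 term.
            scale = math.sqrt(1.0 / (4 * n)) if k == 0 else math.sqrt(1.0 / (2 * n))
            acc *= scale
        out.append(acc)
    return out


# One frame of a made-up log-mel spectrogram with 4 mel bands.
frame = [1.0, 1.0, 1.0, 1.0]
mfcc = dct_ii(frame, n_coeffs=2, normalize=True)
```

A constant frame has all of its energy in coefficient 0, which makes it a convenient sanity check for any reference implementation.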

JIRA TASK: [DALI-1186]

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao · 2019-12-19T14:17:08Z

!build

dali-automaton · 2019-12-19T14:20:17Z

CI MESSAGE: [1040371]: BUILD STARTED

dali-automaton · 2019-12-19T15:03:30Z

CI MESSAGE: [1040371]: BUILD FAILED

Signed-off-by: Joaquin Anton <janton@nvidia.com>

dali-automaton · 2019-12-19T16:23:10Z

CI MESSAGE: [1040371]: BUILD PASSED

klecki · 2019-12-19T16:28:50Z

dali/operators/audio/mfcc/mfcc.cc

+
+DALI_SCHEMA(MFCC)
+    .DocStr(R"code(Mel Frequency Cepstral Coefficiencs (MFCC).
+Computes MFCCs from a mel spectrogram)code")


Suggested change

Computes MFCCs from a mel spectrogram)code")

Computes MFCCs from a mel spectrogram.)code")

klecki · 2019-12-19T16:31:29Z

dali/operators/audio/mfcc/mfcc.cc

+      R"code(Cepstral filtering (also known as `liftering`) coefficient.
+If `lifter>0`, the MFCCs will be scaled according to the following formula::
+
+  MFFC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)


Arguments that appeared after that have somehow broken formatting. Can you check what is wrong here?

mzient · 2019-12-19T16:36:28Z

dali/operators/audio/mfcc/mfcc.cc

+      0)
+    .AddOptionalArg("lifter",
+      R"code(Cepstral filtering (also known as `liftering`) coefficient.
+If `lifter>0`, the MFCCs will be scaled according to the following formula::


Trailing double colon starts a pre-formatted block.

Suggested change

If `lifter>0`, the MFCCs will be scaled according to the following formula::

If `lifter>0`, the MFCCs will be scaled according to the following formula:

I know, that's what @klecki suggested I can use to display equations and such

Yes, I would stick with it.

JanuszL · 2019-12-19T17:03:26Z

dali/operators/audio/mfcc/mfcc.cc

+        auto &req = kmgr_.Setup<DctKernel>(i, ctx, in_view, args_);
+        output_desc[0].shape.set_tensor_shape(i, req.output_shapes[0][0].shape);
+
+        if (in_view.shape[args_.axis] > max_length) {


Do you check args_.axis is not out of the range?

JanuszL · 2019-12-19T17:04:09Z

dali/operators/audio/mfcc/mfcc.cc

+
+template <typename T, int Dims>
+void ApplyLifter(const kernels::OutTensorCPU<T, Dims> &inout, int axis, const T* lifter_coeffs) {
+  auto* data = inout.data;


Any validation of the axis value?

There is validation of it being >= 0 in the constructor but there is no check for the upper bound. I'll add that in SetupImpl
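
The missing upper-bound check discussed above could look roughly like the sketch below. It is written in Python for brevity, the helper name `check_axis` is hypothetical, and the real check would live in the operator's C++ setup code, which is not shown in this thread.

```python
def check_axis(axis, ndim):
    """Return axis if it indexes a valid dimension of an ndim-dimensional input."""
    if not 0 <= axis < ndim:
        raise ValueError(
            "Axis {} is out of range for an input with {} dimensions".format(axis, ndim)
        )
    return axis


# A 2D input (for example, mel bands x frames) accepts axis 0 or 1 only.
valid_axis = check_axis(1, ndim=2)
```

Checking against the actual number of dimensions has to happen once the input shape is known, which is why it cannot be done entirely in the constructor.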

JanuszL · 2019-12-19T17:07:33Z

dali/operators/audio/mfcc/mfcc.h

+  using Operator<Backend>::RunImpl;
+
+  void CalcLifterCoeffs(int64_t length) {
+    if (static_cast<int64_t>(lifter_coeffs_.size()) >= length || lifter_ == 0)


Suggested change

if (static_cast<int64_t>(lifter_coeffs_.size()) >= length || lifter_ == 0)

if (static_cast<int64_t>(lifter_coeffs_.size()) >= length || lifter_ == 0.0)

RunImpl already does that check. Do we need it here as well?
Also can we make CalcLifterCoeffs a free function and test it independently?
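
A free-function version is easy to test in isolation. The sketch below is Python rather than the operator's C++, the name `calc_lifter_coeffs` is hypothetical, and the coefficient expression follows the grouping of parentheses in the docstring as quoted in this PR; the operator's actual computation is not shown here.

```python
import math


def calc_lifter_coeffs(coeffs, lifter, length):
    """Extend coeffs in place so it holds at least `length` lifter coefficients.

    Does nothing when liftering is disabled (lifter == 0) or when enough
    coefficients are already cached.
    """
    if lifter == 0.0 or len(coeffs) >= length:
        return coeffs
    # Only the missing tail is computed; earlier entries are reused.
    for i in range(len(coeffs), length):
        coeffs.append((1.0 + math.sin(math.pi * (i + 1) / lifter)) * (lifter / 2.0))
    return coeffs


cache = []
calc_lifter_coeffs(cache, lifter=2.0, length=3)  # computes 3 coefficients
calc_lifter_coeffs(cache, lifter=2.0, length=2)  # no-op, the cache is long enough
```

Keeping the early-return inside the function means callers do not need to repeat the `lifter == 0` check, at the cost of one redundant comparison when the caller has already checked.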

JanuszL · 2019-12-19T18:05:54Z

dali/test/python/test_operator_mfcc.py

+                        yield check_operator_mfcc_vs_python, device, batch_size, shape, \
+                            axis, dct_type, lifter, n_mfcc, norm
+
+#check_operator_mfcc_vs_python(device='cpu', batch_size=3, input_shape=(17,1), axis=0,


leftover, will remove

JanuszL · 2019-12-19T18:07:22Z

dali/test/python/test_operator_mfcc.py

+            for dct_type in [1, 2, 3]:
+                for norm in [False] if dct_type == 1 else [True, False]:
+                    for axis, n_mfcc, lifter, shape in \
+                        [(0, 17, 0.0, (17, 1)),


Can we also have some tests for invalid arguments, to check that the operator fails as expected? I think we don't have many tests that trigger asserts.

Ok, I'll add some
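
A negative test of this kind could be shaped like the sketch below. It does not call DALI: `validate_mfcc_args` is a hypothetical stand-in that mirrors the one constraint visible in this PR (no ortho-normalization for DCT type I), and `expect_failure` is a hypothetical helper. In the real test suite the failing call would be building or running the pipeline.

```python
def validate_mfcc_args(dct_type, normalize):
    """Stand-in for the operator's argument checks (assumed, simplified)."""
    if normalize and dct_type == 1:
        raise ValueError("Ortho-normalization is not supported for DCT type I.")


def expect_failure(fn, *args, **kwargs):
    """Return True if calling fn raises ValueError, False if it succeeds."""
    try:
        fn(*args, **kwargs)
    except ValueError:
        return True
    return False


# The invalid combination must fail; a valid one must not.
invalid_rejected = expect_failure(validate_mfcc_args, dct_type=1, normalize=True)
valid_rejected = expect_failure(validate_mfcc_args, dct_type=2, normalize=True)
```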

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao · 2019-12-23T09:08:56Z

!build

dali-automaton · 2019-12-23T09:10:26Z

CI MESSAGE: [1044580]: BUILD STARTED

dali-automaton · 2019-12-23T10:48:09Z

CI MESSAGE: [1044580]: BUILD PASSED

klecki

In the notebook (I know some of these are not from this PR):
This is often call spectral leakage -> This is often called spectral leakage

We can calculate a mel spectrogram in decibels by using the following DALI pipeline. - I think it would be nice to describe how we do it in DALI, or rather what this pipeline does, that is, what the sequence of applied operators means with regard to what we have described above.

klecki · 2020-01-02T16:18:16Z

dali/operators/audio/mfcc/mfcc.h

+
+    args_.normalize = spec.GetArgument<bool>("normalize");
+    if (args_.normalize) {
+      DALI_ENFORCE(args_.dct_type != 1, "Ortho-normalization is not supported for DCT type I");


Just a nitpick :P

Suggested change

DALI_ENFORCE(args_.dct_type != 1, "Ortho-normalization is not supported for DCT type I");

DALI_ENFORCE(args_.dct_type != 1, "Ortho-normalization is not supported for DCT type I.");

klecki · 2020-01-02T17:29:29Z

dali/operators/audio/mfcc/mfcc.cc

+  auto* data = inout.data;
+  auto shape = inout.shape;
+  auto strides = kernels::GetStrides(shape);
+  kernels::ForAxis(


I feel like this can be optimized a bit, don't know if it's worth the effort, but will probably yield better memory access pattern.
You can have a variant of ForAxis for a case like Dims=5, axis=2 and do something like:

    for (x0 in Dim0)
      for (x1 in Dim1):
        // we're on our target axis, now we will use the same lifter_coefficient for neighbour
        // elements, so instead of invoking this lambda and iterating in Dim2 by jumping around
        // the data, calculate it in groups for coefficient 0, then 1, etc
        for (x2 in Dim2)
          multiply all Dim3 * Dim4 elements by lifter[x2]

I agree that in certain configurations this could be optimized for a better access pattern. However, ForAxis is rather a general utility and this would be an optimization that is specific for this particular case (because we can reuse the lifter coefficient). We could write something custom for a certain layout but I think it would be best to keep simplicity/generality here unless we know that there is a real performance problem here.
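
To make the trade-off concrete, the blocked traversal proposed in this thread can be sketched as follows. This is an illustrative Python version over a flat, row-major buffer, not DALI code; the function name `apply_lifter_blocked` is hypothetical. It collapses the dimensions before the target axis into `outer` and those after it into `inner`, so each coefficient is applied to one contiguous run of `inner` elements.

```python
import math


def apply_lifter_blocked(data, shape, axis, lifter_coeffs):
    """Scale a flat row-major buffer in place along `axis`.

    Element (o, k, j) of the collapsed (outer, n, inner) view is multiplied
    by lifter_coeffs[k]; memory is visited strictly in increasing order.
    """
    outer = math.prod(shape[:axis])      # product of dims before the axis
    n = shape[axis]                      # length of the target axis
    inner = math.prod(shape[axis + 1:])  # product of dims after the axis
    for o in range(outer):
        for k in range(n):
            coeff = lifter_coeffs[k]
            base = (o * n + k) * inner
            for j in range(base, base + inner):
                data[j] *= coeff
    return data


# Shape (2, 3, 2) with the lifter applied along axis 1.
buf = [1.0] * 12
apply_lifter_blocked(buf, (2, 3, 2), axis=1, lifter_coeffs=[1.0, 2.0, 3.0])
```

When the target axis is the innermost one, `inner` is 1 and the blocked loop degenerates into the same per-element traversal as a generic per-axis utility, so the benefit only appears for layouts where the axis is not last.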

klecki · 2020-01-02T17:29:59Z

dali/operators/audio/mfcc/mfcc.cc

+If `lifter>0`, the MFCCs will be scaled according to the following formula:
+
+  `MFFC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)`


Maybe use the :: here for the formula?

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao · 2020-01-03T14:58:14Z

!build

dali-automaton · 2020-01-03T15:00:06Z

CI MESSAGE: [1056217]: BUILD STARTED

dali-automaton · 2020-01-03T16:52:38Z

CI MESSAGE: [1056217]: BUILD PASSED

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao · 2020-01-07T13:58:45Z

!build

dali-automaton · 2020-01-07T14:00:10Z

CI MESSAGE: [1060544]: BUILD STARTED

dali-automaton · 2020-01-07T14:52:36Z

CI MESSAGE: [1060544]: BUILD PASSED

jantonguirao added 8 commits December 13, 2019 19:02

DCT kernel

0330834

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Linter fixes

db9d0fb

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Add 2D layout tests

5d2ba76

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Orthogonal normalization

93418ec

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Reuse cosine table

e21c094

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Fix lint

3952d38

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Code review fixes

2c94cea

Signed-off-by: Joaquin Anton <janton@nvidia.com>

[WIP] MFCC Operator

33e3ebf

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao force-pushed the mfcc_cpu_op branch from 8e50c34 to 33e3ebf Compare December 18, 2019 12:15

jantonguirao added 3 commits December 18, 2019 16:52

Merge remote-tracking branch 'upstream/master' into mfcc_cpu_op

071a92e

Liftering

53e41ff

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Add MFCC tests (comparing against librosa's implementation)

cdd7465

Signed-off-by: Joaquin Anton <janton@nvidia.com>

MFCC example in spectrogram notebook

b63f213

Signed-off-by: Joaquin Anton <janton@nvidia.com>

jantonguirao changed the title ~~[WIP] MFCC CPU operator~~ MFCC CPU operator Dec 19, 2019

jantonguirao requested a review from a team December 19, 2019 16:14

klecki reviewed Dec 19, 2019

mzient reviewed Dec 19, 2019

JanuszL reviewed Dec 19, 2019

jantonguirao added 2 commits December 20, 2019 11:30

Merge remote-tracking branch 'upstream/master' into mfcc_cpu_op

8ab2a04

Code review fixes

cf4104e

Signed-off-by: Joaquin Anton <janton@nvidia.com>

JanuszL approved these changes Dec 20, 2019

Code review fixes

8f57ecf

Signed-off-by: Joaquin Anton <janton@nvidia.com>

JanuszL approved these changes Dec 20, 2019

klecki requested changes Jan 2, 2020

Fixes from code review

0276418

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Fix formatting

eb54ec7

Signed-off-by: Joaquin Anton <janton@nvidia.com>

klecki approved these changes Jan 7, 2020

jantonguirao merged commit 7421e77 into NVIDIA:master Jan 7, 2020

