Resampling decoder #1582

mzient · 2019-12-17T20:18:23Z

Why we need this PR?

It adds new feature: resampling and downmixing in audio decoder
Refactoring to improve code reuse
It fixes: removes dangerous global variables used in the operator

What happened in this PR?

What solution was applied:
- Major rework of the operator!
Affected modules and functionalities:
- Audio decoder operator + auxiliary functions + decoder primitives
Key points relevant for the review:
- Look for various combinations of arguments - many ifs there...
Validation and testing:
- unit test
- end-to-end python test against ground truth
Documentation (including examples):
- Operator doc-strings

THIS PR SUPERSEDES #1574

JIRA TASK: * don't remember

JanuszL · 2019-12-18T01:02:22Z

dali/test/python/test_operator_audio_decoder.py

+        list.append(np.array(bytearray(f.read()), np.uint8))
+    self.feed_input(self.raw_file, list)
+
+def test_decoded_vs_generated():


I would still compare with librosa to be aware how close we are just in case we need to debug some discrepancies.

OK, I've checked. Librosa does a bad thing here - namely, they do this:

calculate output length as ceil(in_length * out_rate / in_rate)

resample at a slightly different effective rate: in_rate * out_length / in_length
So, they're changing the sampling rate instead of doing what is asked of them. At least it looks like it - I calculated the "corrected" (or rather librosa-like-distorted) rate and generated the new reference signal. Librosa matches this one much better than the original "ground truth".
I don't think it's our goal to reproduce this particular bug - because it is a bug; among other things, it precludes resampling in variable length chunks, because each output chunk will be resampled at a potentially different rate.

I can add some tests with generous epsilon - or abandon the weird rate of 12999 Hz, but I don't think it does much good.
Anyway, since there's a random playback speed distortion as an augmentation (and a ton of others), I don't think anyone will look into 0.01% change in sampling rate vs some other library.

OK, I've checked. Librosa does a bad thing here - namely, they do this:
calculate output length as ceil(in_length * out_rate / in_rate)

Doesn't they have an option to turn off this called fix ?

And maybe we should have a switch to support this. I believe they just wanted to make sure that the calculated number of output samples is an integer number.

we might want to leave the librosa comparing for just the typical conversions (22050 to 16000 and such)

I don't know what happens to the rate then, but the output length was different, so I couldn't compare the arrays directly.

mzient · 2019-12-18T02:26:57Z

!build

dali-automaton · 2019-12-18T02:56:46Z

CI MESSAGE: [1037901]: BUILD STARTED

dali-automaton · 2019-12-18T04:28:29Z

CI MESSAGE: [1037901]: BUILD PASSED

szalpal · 2019-12-18T09:31:38Z

include/dali/core/math_util.h

+DALI_HOST_DEV DALI_FORCEINLINE
+double sinc(double x) {
+  x *= M_PI;
+  if (std::abs(x) < 1e-10)
+    return 1 - x * x * 0.25;  // approximate by a parabola near the pole
+  return std::sin(x) / x;
+}


Two things here:

There should be a documentation specifying, what is x (deg or rad)

Out of curiosity, why this approximation?

What really deserves a comment is that the function calculates the so-called "normalized sinc". That definition already covers what x is.

For x == 0 there's a removable singularity, so we need a check anyway. For very small values of x calculating sin(x)/x may lose more precision than using an approximation.
By the way, the approximation should be 1 - x * x * (1.0f / 6) - and perhaps the limit 1e-10f set even closer to 0.

Checked: for double, around x = 1e-8 the approximation with 1-x^2/6 is more accurate than sin(x)/x. I used long double version as a reference.

szalpal · 2019-12-18T09:32:01Z

include/dali/core/math_util.h

+DALI_HOST_DEV DALI_FORCEINLINE
+float sinc(float x) {
+  x *= M_PI;
+  if (std::abs(x) < 1e-10f)
+    return 1 - x * x * 0.25f;  // approximate by a parabola near the pole
+  return std::sin(x) / x;
+}


As above, please specify what's x

szalpal · 2019-12-18T09:39:55Z

dali/operators/decoder/audio/audio_decoder.h

-  int length;
-  int sample_rate;  /// [Hz]
+  /// @brief Length, in samples, of the recording
+  int64_t length;


In DALI we tend to use "sample" to name a tensor in a batch. While here it's pretty obvious (from the context), that this is certainly not a tensor in a batch, I'd propose to name it "audio sample", to distinguish these two.

It you're talking about the comment, I'll replace it with "(multi-channel) samples", because that's what it is. In libsnd they are using frames for multi-channel samples, but that's not that common and even their own documenation uses "multi-channel samples" in some places.

szalpal · 2019-12-18T09:44:50Z

dali/kernels/signal/downmixing.h

+template <int static_channels = -1, typename Out, typename In>
+void DownmixChannels(
+    Out *out, const In *in, int64_t samples, int channels,
+    const float *weights, bool normalize_weights = false) {


What's the benefit of having channels statically specified in this context?

It helps the compiler unroll the loop over channels, opening new optimization opportunities. For 2D resamping this amounted to 20-30% speedup, I assumed it wouldn't be much different here.

szalpal · 2019-12-18T09:47:23Z

dali/kernels/signal/downmixing_test.cc

+  for (size_t i = 0; i < ref.size(); i++) {
+    EXPECT_NEAR(out[i], ref[i], 1e-6);
+  }


How about using EXPECT_FLOAT_EQ?

Should be OK, I guess. This one is a tad faster and still the dynamic range of values is small enough for a fixed epsilon.

szalpal · 2019-12-18T09:50:11Z

dali/kernels/signal/downmixing.h

+template <int static_channels = -1, typename Out, typename In>
+void DownmixChannels(
+    Out *out, const In *in, int64_t samples, int channels,
+    const float *weights, bool normalize_weights = false) {


I'd add a point in the doc, that this can be done in-place

Not if you want to vectorize the loop.

Can we add a restricted qualifier to in and out?

This is not a C++ qualifier, unfortunately. The __restrict keyword is in C and in CUDA.

GCC and Clang supports __restrict__, do you think we can use it or we want to stick to the standard?

Not if you want to vectorize the loop.

Then is Span_DefaultWeights incorrect? It does downmixing in-place

I'm not adding it here, but I will in resampling which is utterly impossible to do in-place.

OK, after a thought I'll add the in-place info - obviously with the caveat that Out and In should be of the same time - otherwise the compiler assumes no aliasing.

szalpal · 2019-12-18T09:52:18Z

dali/operators/decoder/audio/audio_decoder_op.cc


 namespace dali {

 DALI_SCHEMA(AudioDecoder)
-                .DocStr(R"code(Decode audio data.
+  .DocStr(R"code(Decode audio data.
 This operator is a generic way of handling encoded data in DALI.
 It supports most of well-known audio formats (wav, flac, ogg).

 This operator produces two outputs:

 * output[0]: batch of decoded data


Suggested change

* output[0]: batch of decoded data

* output[0]: batch of decoded data<br/>

szalpal · 2019-12-18T09:58:26Z

dali/operators/decoder/audio/audio_decoder_op.cc

+          "Resampling quality, 0 is lowest, 100 is highest.\n\n"
+          "0 corresponds to 3 lobes of the sinc filter;\n"
+          "50 gives 16 lobes and 100 gives 64 lobes.",


From my perspective, this:

0 corresponds to 3 lobes of the sinc filter 50 gives 16 lobes and 100 gives 64 lobes.

Is an implementation detail. I know, that "quality" doesn't mean too much, but neither the latter. Especially, that it can change in future, e.g. we can add another constrain to the "quality". I'd remove this and leave only relative measure 0..100.

Actually, librosa has some quality enum/string:
"kaiser_best" - 64-zero crossings
"kaiser_fast" - 16-zero crossings
I just followed that example.
Please note that I'm not specifying the envelope/window for the sinc, so we have some freedom here, still (e.g. we can swtich to Kaiser window when we find some implementation of Bessel functions in C++).

szalpal · 2019-12-18T10:08:41Z

dali/operators/decoder/audio/audio_decoder_op.h

+  }
+
+  std::vector<float> target_sample_rates_;
+  kernels::signal::resampling::Resampler resampler_;


Shouldn't the resampler_ ber per-sample?

Not unless per-sample quality is needed - and I have quite strong opinion that it won't and I wouldn't waste memory and cache locality on that assumption unless we see at least potential use case (i.e. if we find it in actual network or at least offline augmentation for that network).

szalpal · 2019-12-18T10:45:35Z

The doc right now doesn't look too pretty:

(look the paragraph-broken list)

szalpal · 2019-12-18T12:53:49Z

Also in the "quality" param

JanuszL · 2019-12-19T00:21:10Z

dali/kernels/signal/resampling.h

+   * The function can seamlessly resample the input and produce the result in chunks.
+   * To reuse memory and still simulate chunk processing, adjust the in/out pointers.
+   */
+  template <typename Out>


Have you considered the static definition of a number of channels? Would it help with the perf?

It would. I can do the VALUE_SWITCH trick here, no problem.

mzient · 2019-12-19T10:13:42Z

!build

dali-automaton · 2019-12-19T10:15:12Z

CI MESSAGE: [1040143]: BUILD STARTED

szalpal · 2019-12-19T11:29:12Z

Aside comment, to the tests. I recently found out, thet gtest discourages using underscore in test names:
https://github.com/google/googletest/blob/master/googletest/docs/faq.md#why-should-test-suite-names-and-test-names-not-contain-underscore
Could you change the names?

dali-automaton · 2019-12-19T12:54:40Z

CI MESSAGE: [1040143]: BUILD PASSED

jantonguirao · 2019-12-19T10:11:56Z

dali/kernels/signal/downmixing.h

+// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.


I'd rename this file to downmix.h

jantonguirao · 2019-12-19T10:13:13Z

dali/kernels/signal/downmixing.h

+    Out *out, const In *in, int64_t samples, int channels,
+    const float *weights, bool normalize_weights = false) {
+  SmallVector<float, 8> normalized_weights;  // 8 channels should be enough for 7.1 audio
+  int actual_channels = static_channels < 0 ? channels : static_channels;


how about the case when static_channels==0?

jantonguirao · 2019-12-19T10:16:09Z

dali/kernels/signal/downmixing.h

+    weights = normalized_weights.data();  // use this pointer now
+  }
+  for (int64_t o = 0, i = 0; o < samples; o++, i += channels) {
+    float sum = ConvertNorm<float>(in[i]) * weights[0];


how about

Suggested change

float sum = ConvertNorm<float>(in[i]) * weights[0];

float sum = 0;

and start the loot at c = 0 ?

That way you get one more addition. Given the simplicity of the inner loop, it may actually matter.

jantonguirao · 2019-12-19T10:17:02Z

dali/kernels/signal/downmixing.h

+}
+
+/**
+ * Downmix data to a single channel.


Suggested change

* Downmix data to a single channel.

* @brief Downmix data to a single channel.

jantonguirao · 2019-12-19T10:19:43Z

dali/kernels/signal/resampling.h

+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#ifndef DALI_KERNELS_SIGNAL_RESAMPLING_H_


why not resample.h?

Header / library names are usually nouns. Moreover, what's inside is the Resampler class, not a free function - though resampler.h also doesn't sound well.

jantonguirao · 2019-12-19T12:17:28Z

dali/operators/decoder/audio/audio_decoder_op.cc

+        audio.data[ofs] = ConvertSatNorm<OutputType>(tmp_buf[ofs]);
+      }
+    }
+  } else {


suggestion: I feel there are too many if/else levels here. We could at least get rid of the outer-most one by having this "decode-only" case at the top with an early return

I'd find it a bit confusing. Early returns are great for checks like "do we have to do anything?" or checking for error conditions - this is neither.
I thought about decoding and then postprocessing - but it's impossible, because we're potentially decoding to different buffers.

jantonguirao · 2019-12-19T12:57:22Z

dali/operators/decoder/audio/audio_decoder_op.h

+      double q = quality_;
+      DALI_ENFORCE(q >= 0 && q <= 100, "Resampling quality must be in [0..100] range");
+      // this should give 3 lobes for q = 0, 16 lobes for q = 50 and 64 lobes for q = 100
+      int lobes = std::round(0.007 * q * q - 0.09 * q + 3);


maybe a wikipedia reference here to know where the formula comes from?

It comes from nowhere - it's a solution of a linear system:

a * 0 + b * 0 + c = 3 a * 50^2 + b * 50 + c = 16 a * 100^2 + b * 100 + c = 64

It does exactly, what's written - gives you 3 at 0, 16 at 50 and 64 at 100.

so that quality concept is invented by us?

jantonguirao · 2019-12-19T12:59:27Z

dali/operators/decoder/audio/audio_decoder.h

@@ -53,13 +59,12 @@ class AudioDecoderBase {
 template<typename SampleType>
 class TypedAudioDecoderBase : public AudioDecoderBase {
 public:
-  void Decode(span<char> raw_output) override {
+  ptrdiff_t Decode(span<char> raw_output) override {


why not just int64_t?

That's the return type of span::size().

jantonguirao · 2019-12-19T13:01:45Z

dali/test/python/test_operator_audio_decoder.py

+        list.append(np.array(bytearray(f.read()), np.uint8))
+    self.feed_input(self.raw_file, list)
+
+def test_decoded_vs_generated():


we might want to leave the librosa comparing for just the typical conversions (22050 to 16000 and such)

jantonguirao · 2019-12-19T13:03:29Z

include/dali/core/math_util.h

@@ -167,6 +167,25 @@ constexpr double rad2deg(double rad) {
  return rad * r2d;
 }

+/// @brief Calculates normalized sinc i.e. `sin(pi * x) / (pi * x)`
+DALI_HOST_DEV DALI_FORCEINLINE


unnecessarily duplicated code. You can templatize this and replace 1.0 and 1.0f by T(1)

The threshold for transition to approximation is different - and putting
std::is_same<T, float>::value ? T(1e-5) : T(1e-8)
doesn't sound like a simplification to me.
Going further, it would require having some template magic for integer arguments or long double.

mzient · 2019-12-19T15:03:47Z

Aside comment, to the tests. I recently found out, thet gtest discourages using underscore in test names:
https://github.com/google/googletest/blob/master/googletest/docs/faq.md#why-should-test-suite-names-and-test-names-not-contain-underscore
Could you change the names?

In principle, I could, but they would be harder to read. When I type "Span_DefaultWeights" , it's pretty clear that it's about some "Span" and "DefaultWeights". "SpanDefaultWeights" sounds like something spans default weights (which makes no sense in this context) or "Span, Default, Weights", whatever is "Default", etc. SpanAndDefaultWeights is also so-so at best.
So, I'm willing to take the risk that it won't compile some time in the distant future, with some yet-to-be-seen version of gtest.

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

* Add multi-channel resampling. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

* Add output conversion to resampling. * Essentially rewrite the operator: * Add per-sample sampling rate * Add explicit downmixing option * Add multi-channel resampling * Remove dangerous global buffers (BUG!) Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Fix window function generator. Fix multi-channel resampling. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Fix bugs. Lint. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

* Add Python test against librosa resampling * Change sample center to sample zero Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

mzient · 2019-12-19T15:10:35Z

!build

dali-automaton · 2019-12-19T15:15:33Z

CI MESSAGE: [1040438]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

mzient · 2019-12-19T15:44:15Z

!build

dali-automaton · 2019-12-19T15:50:07Z

CI MESSAGE: [1040494]: BUILD STARTED

dali-automaton · 2019-12-19T15:50:25Z

CI MESSAGE: [1040494]: BUILD FAILED

dali-automaton · 2019-12-19T17:15:19Z

CI MESSAGE: [1040494]: BUILD PASSED

mzient force-pushed the resample_decoder branch 4 times, most recently from 3932c70 to efe0d7a Compare December 17, 2019 21:31

JanuszL reviewed Dec 18, 2019

View reviewed changes

mzient changed the title ~~[WIP] Resample decoder~~ Resampling decoder Dec 18, 2019

mzient requested review from szalpal, jantonguirao, klecki and awolant December 18, 2019 02:27

mzient mentioned this pull request Dec 18, 2019

Resample in AudioDecoder #1574

Closed

szalpal reviewed Dec 18, 2019

View reviewed changes

JanuszL reviewed Dec 19, 2019

View reviewed changes

jantonguirao reviewed Dec 19, 2019

View reviewed changes

szalpal added 6 commits December 19, 2019 16:04

wip

44bf740

wip

9968961

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

works

707cf7e

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

lint

ef75c1b

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

get the file back

337497f

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

j/w

379df4f

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

szalpal and others added 11 commits December 19, 2019 16:04

comment

235a2f9

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>

Refactoring

fd05452

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

* More refactoring.

8dc81b6

* Add multi-channel resampling. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Fix debug build.

9fa390f

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Add unit tests for resampling functions.

04bcee3

Fix window function generator. Fix multi-channel resampling. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Add python test.

4f1db59

Fix bugs. Lint. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Add configurable resampling quality.

03c214a

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

* Fix Downmixing test

71e322c

* Add Python test against librosa resampling * Change sample center to sample zero Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Fix review issues.

db31a6e

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Review issues fixed (?).

dab07a4

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

jantonguirao approved these changes Dec 19, 2019

View reviewed changes

JanuszL approved these changes Dec 19, 2019

View reviewed changes

mzient added 2 commits December 19, 2019 16:27

Check sampling rates when resampling.

526c6a9

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

More sampling rate checks and tests.

979f5fd

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

mzient force-pushed the resample_decoder branch from b489ef9 to 979f5fd Compare December 19, 2019 15:43

mzient merged commit ae6e0ee into NVIDIA:master Dec 19, 2019

	* output[0]: batch of decoded data
	* output[0]: batch of decoded data<br/>

	float sum = ConvertNorm<float>(in[i]) * weights[0];
	float sum = 0;

	* Downmix data to a single channel.
	* @brief Downmix data to a single channel.

Resampling decoder #1582

Resampling decoder #1582

Conversation

mzient commented Dec 17, 2019 • edited Loading

Why we need this PR?

What happened in this PR?

THIS PR SUPERSEDES #1574

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mzient commented Dec 18, 2019

dali-automaton commented Dec 18, 2019

dali-automaton commented Dec 18, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

szalpal commented Dec 18, 2019

szalpal commented Dec 18, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mzient commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

szalpal commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mzient Dec 19, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mzient Dec 19, 2019 • edited Loading

Choose a reason for hiding this comment

mzient commented Dec 19, 2019

mzient commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

mzient commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

dali-automaton commented Dec 19, 2019

mzient commented Dec 17, 2019 •

edited

Loading

mzient Dec 19, 2019 •

edited

Loading

mzient Dec 19, 2019 •

edited

Loading