Audio resampling operator #3840

Merged
merged 8 commits into from
Apr 29, 2022

Contributor

@mzient mzient commented Apr 22, 2022

Category:

New feature (non-breaking change which adds functionality)

Description:

Adds a standalone audio resampling operator.
The resampling can be parameterized with either:

  • input and output sampling rates (scale and length are calculated from these)
  • scale factor (the length is calculated from the scale)
  • new length (scale is the ratio of lengths)
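
As a rough illustration of how the three parameterizations relate, here is a minimal Python sketch. The function name `resolve_resampling`, the ceiling rounding of the output length, and the rule that `scale` and `out_length` are not combined are assumptions made for this example, not details taken from the PR.

```python
import math


def resolve_resampling(in_length, in_rate=None, out_rate=None,
                       scale=None, out_length=None):
    """Return (scale, out_length) for one of the three parameterizations.

    Hypothetical helper: the rounding mode (ceil) is an assumption.
    """
    if in_rate is not None or out_rate is not None:
        # Rates must come as a pair and exclude the other two options.
        if in_rate is None or out_rate is None:
            raise ValueError("in_rate and out_rate must be specified together")
        if scale is not None or out_length is not None:
            raise ValueError("rates cannot be combined with scale or out_length")
        # Multiply before dividing to limit floating-point error.
        return out_rate / in_rate, math.ceil(in_length * out_rate / in_rate)
    if scale is not None and out_length is not None:
        raise ValueError("scale and out_length are alternatives (assumed)")
    if scale is not None:
        # The length follows from the scale.
        return scale, math.ceil(in_length * scale)
    if out_length is not None:
        # The scale is the ratio of output to input length.
        if in_length <= 0:
            raise ValueError("in_length must be positive to derive a scale")
        return out_length / in_length, out_length
    raise ValueError("specify rates, scale, or out_length")


# One second of 44.1 kHz audio resampled to 16 kHz.
scale, length = resolve_resampling(44100, in_rate=44100, out_rate=16000)
print(length)  # 16000
```

Only the ratio of the two rates matters in this sketch, so they can be given in any unit as long as both use the same one.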

Additional information:

Affected modules and functionalities:

Audio decoder op (some minor refactoring).

Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: TODO

JIRA TASK: DALI-2750

@mzient mzient changed the title from "Audio resample op" to "Audio resampling operator" Apr 22, 2022
@dali-automaton
Collaborator

CI MESSAGE: [4650544]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [4650623]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [4650544]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [4650916]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [4650623]: BUILD PASSED

@dali-automaton
Collaborator

CI MESSAGE: [4650916]: BUILD PASSED


struct ResamplingParams {
int lobes = 16;
int lookup_size = 2048;
Contributor

Suggested change
- int lookup_size = 2048;
+ int lookup_size = 1025;

If we use lobes * 64 + 1 formula then for lobes = 16 we should provide 1025 by default, ... I guess.

Contributor Author

It's taken from the default parameters to resample function. I could change it in both places.
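
The arithmetic behind the suggestion in this thread can be checked with a short sketch. The function name `lookup_size_for` and the parameter name `entries_per_lobe` are made up for illustration; only the `lobes * 64 + 1` formula comes from the comment above.

```python
def lookup_size_for(lobes, entries_per_lobe=64):
    """Lookup table size from the `lobes * 64 + 1` formula quoted in the review."""
    return lobes * entries_per_lobe + 1


print(lookup_size_for(16))  # 1025
```

With the default of 16 lobes the formula gives 1025, which is the value proposed in the suggested change, not the 2048 in the struct.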

@@ -0,0 +1,68 @@
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor

Please add a test for this to test_dali_cpu_only.py and test_dali_variable_batch_size.py.

namespace dali {

DALI_SCHEMA(experimental__AudioResample)
.DocStr(R"(Resamples a signal with a different sampling rate.
Contributor

Suggested change
- .DocStr(R"(Resamples a signal with a different sampling rate.
+ .DocStr(R"(Resamples a signal with a given sampling rate.

Contributor Author

Actually, neither is true. "given" suggests an absolute value and that's not how this operator works - you don't even need to supply sampling rates directly (just the ratio). Also, I could add a fast path for scale==1 (so if the rate isn't really different, there would be no perf hit).

Contributor

Up to you. I just wanted to point out that the sampling rate doesn't necessarily need to be different.

Comment on lines 29 to 30
The resampling ratio can specified either directly or as a ratio of target to source sampling rate
or calculated from the ratio of requested output length to input length.
Contributor

Suggested change
- The resampling ratio can specified either directly or as a ratio of target to source sampling rate
- or calculated from the ratio of requested output length to input length.
+ The resampling ratio can specified either directly, or as a ratio of target to source sampling rate,
+ or calculated from the ratio of requested output length to input length.

Contributor Author

Actually, the rules for a comma before "or" are more complex: there should be one if an independent clause follows "or".

The resampling ratio can be specified directly, or it can be calculated as a ratio of input and output sampling rates.

Here, we have two independent clauses ("[ratio] can be specified..." and "it can be calculated...").
In the original text, however, the clauses are dependent - they both refer to "be specified": "directly or as a ratio...".

Having said all that, the sentence is indeed incorrect, because:

  • it's missing the verb
  • it uses "either" in front of multiple choice

I think it should be:

The resampling ratio can be specified directly or as a ratio of target to source sampling rate, or calculated...

but perhaps the second comma is not mandatory, as the last clause can be treated as dependent wrt "be" (specified or calculated).

Contributor

Up to you

The ``in_rate`` and ``out_rate`` parameters cannot be specified together with ``scale`` or
``out_length``.)",
nullptr, true)
.AddOptionalArg<float>("out_rate", R"(Input sampling rate.
Contributor

Suggested change
- .AddOptionalArg<float>("out_rate", R"(Input sampling rate.
+ .AddOptionalArg<float>("out_rate", R"(Output sampling rate.

nullptr, true)
.AddOptionalArg<float>("out_rate", R"(Input sampling rate.

The sampling rate of the input sample. This parameter must be specified together with ``out_rate``.
Contributor

@JanuszL JanuszL Apr 25, 2022

Suggested change
- The sampling rate of the input sample. This parameter must be specified together with ``out_rate``.
+ The sampling rate of the output sample. This parameter must be specified together with ``in_rate``.

.AddOptionalArg<float>("out_rate", R"(Input sampling rate.

The sampling rate of the input sample. This parameter must be specified together with ``out_rate``.
The value is relative to ``out_rate`` and doesn't need to use any specific unit as long as the
Contributor

Suggested change
- The value is relative to ``out_rate`` and doesn't need to use any specific unit as long as the
+ The value is relative to ``in_rate`` and doesn't need to use any specific unit as long as the


0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.)",
50.0f, false)
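
The three anchor points in the ``quality`` description above (0 gives 3 lobes, 50 gives 16, 100 gives 64) happen to lie on a single quadratic. The sketch below uses that quadratic purely as an illustration; the function name `lobes_for_quality` is hypothetical, and whether the operator computes the lobe count this way is an assumption, since the mapping itself is not shown in this thread.

```python
def lobes_for_quality(quality):
    """Map quality in [0, 100] to a sinc lobe count.

    Assumed mapping: the quadratic 0.007*q^2 - 0.09*q + 3, written with
    integer coefficients so the three documented points come out exact.
    """
    if not 0 <= quality <= 100:
        raise ValueError("quality out of range, valid range is [0..100]")
    return round((7 * quality * quality - 90 * quality + 3000) / 1000)


for q in (0, 50, 100):
    print(q, lobes_for_quality(q))
# 0 3
# 50 16
# 100 64
```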
.AddOptionalArg<DALIDataType>("dtype", R"(The ouput type.
Contributor

I'm not sure it's clear enough here that the input should be either in the [-1, 1] range for floats or full range for integers.

Contributor Author

Well, -1..1 for integers makes very little sense. I could add examples.

Contributor

Up to you. On the other hand we don't do that for other ops....
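
To make the range discussion in this thread concrete, here is a small sketch of one common convention for converting a float sample in [-1, 1] to an integer type. The merge commit message describes "0-centered signed and midrange-centered unsigned semantics"; the helper name `float_to_int_sample`, the scaling constant, the clamping, and the rounding mode below are assumptions for illustration and not the operator's actual conversion code.

```python
def float_to_int_sample(x, bits, signed=True):
    """Convert a float sample in [-1, 1] to a `bits`-wide integer sample.

    Assumed convention: signed types are centered at 0, unsigned types are
    centered at the middle of their range (for example 128 for 8 bits).
    """
    max_amplitude = 2 ** (bits - 1) - 1   # e.g. 32767 for 16 bits
    x = max(-1.0, min(1.0, x))            # clamp out-of-range input
    value = round(x * max_amplitude)
    if signed:
        return value
    return value + 2 ** (bits - 1)        # shift zero to mid-range


print(float_to_int_sample(1.0, 16))                # 32767
print(float_to_int_sample(0.0, 8, signed=False))   # 128
```

Under this convention, silence is 0 for signed outputs and the mid-range value for unsigned outputs.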

Comment on lines +159 to +180
kernels::signal::resampling::Resampler R;
std::vector<std::vector<float>> in_fp32;
Contributor

Suggested change
- kernels::signal::resampling::Resampler R;
- std::vector<std::vector<float>> in_fp32;
+ kernels::signal::resampling::Resampler R_;
+ std::vector<std::vector<float>> in_fp32_;

}

template <typename T>
InTensorCPU<float> ConvertInput(std::vector<float> &tmp, const InTensorCPU<T> &in) {
Contributor

I'm not sure if ConvertInput needs to be a member. All it uses is just dtype_, other parameters are passed as arguments.

DALI_ENFORCE(quality_ >= 0 && quality_ <= 100, make_string("``quality`` out of range: ",
quality_, "\nValid range is [0..100]."));
if (spec_.TryGetArgument(dtype_, "dtype")) {
// silence useless warning -----------------------------vvvvvvvvvvvvvvvv
Contributor

?

Contributor Author

Without this weird T x; (void)x; I get "unused local typedef" warning for T.

Comment on lines +45 to +46
TYPE_SWITCH(dtype_, type2id, T, (AUDIO_RESAMPLE_TYPES), (T x; (void)x;),
(DALI_FAIL(make_string("Unsupported output type: ", dtype_,
Contributor

Suggested change
- TYPE_SWITCH(dtype_, type2id, T, (AUDIO_RESAMPLE_TYPES), (T x; (void)x;),
- (DALI_FAIL(make_string("Unsupported output type: ", dtype_,
+ TYPE_SWITCH(dtype_, type2id, T, (AUDIO_RESAMPLE_TYPES),
+ (T x; (void)x;),
+ (DALI_FAIL(make_string("Unsupported output type: ", dtype_,

I think this reads better for the TYPE_SWITCH.

ArgValue<float> scale_{"scale", spec_};
ArgValue<int64_t> out_length_{"out_length", spec_};

std::vector<double> scales_;
Contributor

Easy to confuse with scale_.

@dali-automaton
Collaborator

CI MESSAGE: [4679776]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [4679776]: BUILD PASSED

@@ -1020,6 +1020,7 @@ def test_subscript_dim_check():
operator_fn=fn.subscript_dim_check, num_subscripts=1)



Contributor

I think you need to add fn.experimental.audio_resample to the below as well.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Add dynamic range conversion tests.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Collaborator

CI MESSAGE: [4689377]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [4689377]: BUILD PASSED

@mzient mzient merged commit d6493aa into NVIDIA:main Apr 29, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
Add standalone (CPU) audio resampling operator with tests
* The operator can work with input/output sample rates, scaling factor or target length.
* The resampling is performed with the same algorithm as in audio decoder.
* Type conversion is supported, with 0-centered signed and midrange-centered unsigned semantics

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
4 participants