-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable support for different layouts in the MelFilterBank GPU Op #2620
Enable support for different layouts in the MelFilterBank GPU Op #2620
Conversation
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
!build |
CI MESSAGE: [1991582]: BUILD STARTED |
the fft bin index and the window index, respectively. | ||
Expects an input with at least 2 dimensions. | ||
|
||
Please note that the CPU implementation supports only the layout with the last two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would be to rework CPU implementation as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should use a separate implementation of the actual processing loop, but it should some 30-50 lines or so (if the kenrel isn't totally botched).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to implement the CPU part. I can do it by contributing to this PR so that we don't introduce the note about the lack of support in the CPU version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a problem in nightly, IMO. What I would not like to see is this partial support making it into tagged release.
CI MESSAGE: [1991582]: BUILD PASSED |
Signed-off-by: Joaquin Anton <janton@nvidia.com>
int nfilter = args_.nfilter; | ||
|
||
std::memset(out, 0, sizeof(T) * nfilter * nwindows); | ||
for (int64_t fftbin = fftbin_start_; fftbin <= fftbin_end_; fftbin++) { | ||
auto *in_row_start = in + fftbin * in_stride; | ||
auto *in_row_start = in + fftbin * nwindows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Having strides had some potential for more generic usage.
int f2 = interval_ends_[m + 2]; | ||
for (; fftbin < f1; ++fftbin) { | ||
auto weight_up = T(1) - weights_down_[fftbin]; | ||
if (args_.normalize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move it after the loops as val *= norm_factors_[m];
@@ -23,8 +23,9 @@ DALI_SCHEMA(MelFilterBank) | |||
.DocStr(R"code(Converts a spectrogram to a mel spectrogram by applying a bank of | |||
triangular filters. | |||
|
|||
Expects an input with at least 2 dimensions where the last two dimensions correspond to | |||
the fft bin index and the window index, respectively. | |||
Expects an input with at least 2 dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be a requirement - I mean, you can just calculate a single Mel spectrum (not necessarily a spectrogram).
It can be worked around by adding artificial dimensions for use by the kernels.
If, on the other hand, there are more than 2 dimensions, all trailing and leading dimensions can be collapsed.
Can we even handle a case when f
is somewhere in the middle - think, AfB
layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU kernel supports any f axis
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
while (axis > 1) { | ||
in_shape = collapse_dim(in_shape, 0); | ||
out_shape = collapse_dim(out_shape, 0); | ||
axis--; | ||
} | ||
while (axis < in_shape.size() - 2) { | ||
in_shape = collapse_dim(in_shape, in_shape.size() - 2); | ||
out_shape = collapse_dim(out_shape, out_shape.size() - 2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor:
while (axis > 1) { | |
in_shape = collapse_dim(in_shape, 0); | |
out_shape = collapse_dim(out_shape, 0); | |
axis--; | |
} | |
while (axis < in_shape.size() - 2) { | |
in_shape = collapse_dim(in_shape, in_shape.size() - 2); | |
out_shape = collapse_dim(out_shape, out_shape.size() - 2); | |
} | |
if (axis > 1) { | |
in_shape = collapse_dims(in_shape, {std::make_pair(0, axis)}); | |
out_shape = collapse_dims(out_shape, {0, axis}); | |
axis = 1; | |
} | |
if (axis < in_shape.size() - 2) { | |
in_shape = collapse_dims(in_shape, {std::make_pair(axis+1, in_shape.size()-axis-1)}); | |
out_shape = collapse_dims(out_shape, {std::make_pair(axis+1, in_shape.size()-axis-1)}); | |
} |
in_shape = collapse_dim(in_shape, in_shape.size() - 2); | ||
out_shape = collapse_dim(out_shape, out_shape.size() - 2); | ||
} | ||
bool is_freq_last = axis == in_shape.size() - 1 || in_shape[in_shape.size() - 1] == 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -122,22 +122,28 @@ class MelFilterBankGpu<T, Dims>::Impl : public MelFilterImplBase<T, Dims> { | |||
} | |||
} | |||
|
|||
void Setup(ScratchpadEstimator &se, const TensorListShape<Dims> &in_shape) { | |||
void Setup(ScratchpadEstimator &se, TensorListShape<> in_shape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void Setup(ScratchpadEstimator &se, TensorListShape<> in_shape) { | |
void Setup(ScratchpadEstimator &se, const TensorListShape<> &in_shape) { |
It looks like you're not modifying it - no need to copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftovers from a previous change I removed finally. Thanks.
(128, 16000.0, 0.0, 8000.0, (10, 513, 100), 'Ctf'), | ||
(128, 48000.0, 4000.0, 24000.0, (513, 100), 'tf'), | ||
(128, 44100.0, 0.0, 22050.0, (513, 100), 'tf'), | ||
(128, 44100.0, 1000.0, 22050.0, (513, 100), 'tf')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test with 1D input.
auto ndim = in_shape.sample_dim(); | ||
args_.axis = layout.empty() ? std::max(0, ndim - 2) : layout.find('f'); | ||
DALI_ENFORCE(args_.axis >= 0 && args_.axis < ndim, | ||
make_string("'f' axis not present in the layout. Got: ", layout)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_string("'f' axis not present in the layout. Got: ", layout)); | |
make_string("'f' axis not present in the layout. Got: `", layout, "`")); |
auto ndim = in_shape.sample_dim(); | ||
args_.axis = layout.empty() ? std::max(0, ndim - 2) : layout.find('f'); | ||
DALI_ENFORCE(args_.axis >= 0 && args_.axis < ndim, | ||
make_string("'f' axis not present in the layout. Got: ", layout)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_string("'f' axis not present in the layout. Got: ", layout)); | |
make_string("'f' axis not present in the layout. Got: `", layout, "`")); |
interval_ends_.resize(nfilter + 2); | ||
interval_ends_[0] = fftbin_start_; | ||
interval_ends_[nfilter + 1] = fftbin_end_ + 1; | ||
for (int interval = 1; interval < nfilter + 1; interval++, mel += mel_delta_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this loop be merged with previous one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, but I see no reason to do it. know it's initialization only, but the inner loop would likely break the optimizations in the inner one (as the compiler would deem it less frequent).
Also, it would contain an ugly if (interval == 0)
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build |
CI MESSAGE: [2001716]: BUILD STARTED |
CI MESSAGE: [2001716]: BUILD PASSED |
Signed-off-by: Rafal Banas.Rafal97@gmail.com
Why we need this PR?
What happened in this PR?
The frequency axis is queried from the layout string
Operators impl.
docstring, tests
I've extended the existing tests to run on "tf" layout
Docstring updated
JIRA TASK: NA