-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU MFCC operator. #2423
GPU MFCC operator. #2423
Conversation
dali/kernels/signal/dct/dct_gpu.cu
Outdated
DALI_HOST_DEV | ||
float operator()(float val) { return val * coeff_; } | ||
|
||
const float coeff_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either:
const float coeff_; | |
const float coeff; |
or
const float coeff_; | |
private: | |
const float coeff_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go for the second as you don't want to access coeff_
directly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
, in_shape_(batch_size_, dims_) { | ||
if (lifter_) { | ||
FillLifter(); | ||
const int max_ndct = 40; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where this 40 comes from?
I see it repeated in L65, maybe extract it to a common place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
]: | ||
yield check_operator_mfcc_wrong_args, device, batch_size, shape, \ | ||
axis, dct_type, lifter, n_mfcc, norm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/kernels/signal/dct/dct_gpu.cu
Outdated
struct LiftersTable {}; | ||
|
||
template <> | ||
struct LiftersTable<true> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct LiftersTable<true> { | |
struct LifterTable<true> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this class is not necessary at all. Why do we need this abstraction over a simple pointer to an array of coefficients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mzient It's static optimization - to get rid of if
in case of no liftering. But as Joaquin suggested I can just use if
on the static parameter without all those classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
dali/kernels/signal/dct/dct_gpu.cu
Outdated
// The kernel processes data with the shape reduced to 3D. | ||
// Transform is applied over the middle axis. | ||
template <typename OutputType, typename InputType> | ||
template <typename OutputType, typename InputType, bool nonzero> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do either
1)
template <typename OutputType, typename InputType, bool nonzero> | |
template <typename OutputType, typename InputType, typename LifterTable> |
- or remove the zero version and rely on the bool directly:
sample.output[output_idx] = HasLifter ? coeff * out_val : out_val;
where HasLifter is your nonzero
template argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
used HasLifter param
|
||
auto lifter = 0.0f; | ||
coeffs.Calculate(10, lifter); | ||
ASSERT_TRUE(coeffs.empty()); | ||
|
||
lifter = 1.234f; | ||
coeffs.Calculate(10, lifter); | ||
check_lifter_coeffs(coeffs, lifter, 10); | ||
check_lifter_coeffs(span<const float>(coeffs.data(), coeffs.size()), lifter, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_cspan(coeffs)
should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even make_span<coeffs>
would - there's an implicit conversion to a span of const-qualified objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/kernels/signal/dct/dct_gpu.cu
Outdated
explicit Lifter(float coeff): coeff_(coeff) {} | ||
|
||
DALI_HOST_DEV | ||
float operator()(float val) { return val * coeff_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float operator()(float val) { return val * coeff_; } | |
constexpr float operator()(float val) const { return val * coeff_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
dali/kernels/signal/dct/dct_gpu.cu
Outdated
const float coeff_; | ||
}; | ||
|
||
struct IdLifter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use identity
from "core/util.h"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
dali/kernels/signal/dct/dct_gpu.cu
Outdated
template <> | ||
struct LiftersTable<false> { | ||
DALI_HOST_DEV | ||
IdLifter lifter(int) {return IdLifter{}; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IdLifter lifter(int) {return IdLifter{}; } | |
static identity lifter(int) {return {}; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove LifterTabble
dali/kernels/signal/dct/dct_gpu.cu
Outdated
__global__ void ApplyDct(const typename Dct1DGpu<OutputType, InputType>::SampleDesc *samples, | ||
const BlockDesc<3> *blocks) { | ||
const BlockDesc<3> *blocks, LiftersTable<nonzero> lifters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const BlockDesc<3> *blocks, LiftersTable<nonzero> lifters) { | |
const BlockDesc<3> *blocks, const float *lifter_coeffs = nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/kernels/signal/dct/dct_gpu.cu
Outdated
@@ -51,7 +90,7 @@ __global__ void ApplyDct(const typename Dct1DGpu<OutputType, InputType>::SampleD | |||
out_val += *input * cos_row[i]; | |||
input += in_stride[1]; | |||
} | |||
sample.output[output_idx] = out_val; | |||
sample.output[output_idx] = lifter(out_val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample.output[output_idx] = lifter(out_val); | |
if (lifter_coeffs) | |
out_val *= lifter_coeffs[y]; | |
sample.output[output_idx] = out_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used static parameter
dali/kernels/signal/dct/dct_gpu.cu
Outdated
span<const DctArgs> args, | ||
int axis) { | ||
span<const DctArgs> args, int axis, | ||
span<const float>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer pass unused arguments to setup. Please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/kernels/signal/dct/dct_gpu.cu
Outdated
@@ -120,7 +159,8 @@ template <typename OutputType, typename InputType> | |||
DLL_PUBLIC void Dct1DGpu<OutputType, InputType>::Run(KernelContext &ctx, | |||
const OutListGPU<OutputType> &out, | |||
const InListGPU<InputType> &in, | |||
span<const DctArgs>, int) { | |||
span<const DctArgs>, int, | |||
span<const float> lifter_coeffs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a device pointer? If so, it should be marked as such:
span<const float> lifter_coeffs) { | |
span<const float> lifter_coeffs_dev) { |
or just use
span<const float> lifter_coeffs) { | |
InTensorGPU<float, 1> lifter_coeffs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used InTensorGPU
dali/kernels/signal/dct/dct_gpu.h
Outdated
@@ -77,12 +77,14 @@ class DLL_PUBLIC Dct1DGpu { | |||
|
|||
DLL_PUBLIC KernelRequirements Setup(KernelContext &context, | |||
const InListGPU<InputType> &in, | |||
span<const DctArgs> args, int axis); | |||
span<const DctArgs> args, int axis, | |||
span<const float> lifter_coeffs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not used in Setup, don't add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/kernels/signal/dct/dct_gpu.h
Outdated
|
||
DLL_PUBLIC void Run(KernelContext &context, | ||
const OutListGPU<OutputType> &out, | ||
const InListGPU<InputType> &in, | ||
span<const DctArgs> args, int axis); | ||
span<const DctArgs> args, int axis, | ||
span<const float> lifter_coeffs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise - preferably use InTesnorGPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
2ebb7b5
to
a77641e
Compare
dali/operators/audio/mfcc/mfcc.cu
Outdated
int added_length = target_length - start_idx; | ||
coeffs_.resize(target_length, stream); | ||
int threads = std::min(added_length, 256); | ||
CalcLifterKernel<<<1, threads, 0, stream>>>(coeffs_.data(), start_idx, target_length, lifter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very small job - perhaps it'd be better to utilize more SMs and launch div_ceil(added_length, threads)
blocks and remove the loop from the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking another angle at it: since this is a very small job and it's done just once, I doubt there's any performance gain from calculating it on device - and maybe there's some value in calculating the coeffs on host and copying them to device, so they match exactly across backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved coefficients calculation to CPU.
dali/operators/audio/mfcc/mfcc.cu
Outdated
__global__ void CalcLifterKernel(float *coeffs, int64_t start_idx, int64_t target_length, | ||
float lifter) { | ||
float ampl_mult = lifter / 2; | ||
float phase_mult = M_PI / lifter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float phase_mult = M_PI / lifter; | |
float phase_mult = static_cast<float>(M_PI) / lifter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/operators/audio/mfcc/mfcc.cu
Outdated
float ampl_mult = lifter / 2; | ||
float phase_mult = M_PI / lifter; | ||
for (int64_t i = start_idx + threadIdx.x; i < target_length; i += blockDim.x) | ||
coeffs[i] = 1.0 + ampl_mult * sinf(phase_mult * (i + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coeffs[i] = 1.0 + ampl_mult * sinf(phase_mult * (i + 1)); | |
coeffs[i] = 1.0f + ampl_mult * sinf(phase_mult * (i + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const workspace_t<GPUBackend> &ws) { | ||
GetArguments(ws); | ||
auto &input = ws.InputRef<GPUBackend>(0); | ||
TYPE_SWITCH(input.type().id(), type2id, T, MFCC_SUPPORTED_TYPES, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot going on inside. I'd extract it to SetupTyped - this would give superior compiler diagnostics and precise run-time error traceback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added statically typed detail::SetupKernel
dali/operators/audio/mfcc/mfcc.cu
Outdated
using Kernel = kernels::signal::dct::Dct1DGpu<T>; | ||
auto in_view = view<const T>(input); | ||
auto out_view = view<T>(ws.OutputRef<GPUBackend>(0)); | ||
span<const float> lifter_span(lifter_coeffs_.data(), lifter_coeffs_.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, for GPU data use a tensor view:
span<const float> lifter_span(lifter_coeffs_.data(), lifter_coeffs_.size()); | |
auto lifter_coeffs = make_tensor_gpu<1>(lifter_coeffs_.data(), {lifter_coeffs_.size()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dali/operators/audio/mfcc/mfcc.cu
Outdated
int64_t max_ndct = 0; | ||
for (int i = 0; i < nsamples_; ++i) { | ||
int64_t ndct = output_desc[0].shape[i][axis_]; | ||
if (ndct > max_ndct) max_ndct = ndct; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We normally break line for if statements without braces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
!build |
CI MESSAGE: [1754530]: BUILD STARTED |
__global__ void ApplyDct(const typename Dct1DGpu<OutputType, InputType>::SampleDesc *samples, | ||
const BlockDesc<3> *blocks) { | ||
const BlockDesc<3> *blocks, const float *lifter_coeffs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (linter might complain?)
const BlockDesc<3> *blocks, const float *lifter_coeffs) { | |
const BlockDesc<3> *blocks, const float *lifter_coeffs) { |
CI MESSAGE: [1754530]: BUILD PASSED |
Extend GPU DCT kernel to support liftering and add MFCC operator for GPU. Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
Why we need this PR?
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
DCT kernel was extended to support lifter coefficients. The operator is a simple wrapper.
DCT GPU kernel, new MFCC GPU operator.
Changes in the kernel. Lifter coefficients calculation.
I've extended DCT kernel tests to support lifter coefficients and added GPU to MFCC python tests.
NA
JIRA TASK: DALI-1664