Coordinate Flip GPU operator #1895
dali/operators/coord/coord_flip.cu
Outdated
void CoordFlipGPU::RunImpl(workspace_t<GPUBackend> &ws) {
  const auto &input = ws.InputRef<GPUBackend>(0);
  DALI_ENFORCE(input.type().id() == DALI_FLOAT, "Input is expected to be float");
Please move to SetupImpl (same as CPU).
done
    cudaMemcpyAsync(sample_descs_gpu_, sample_descs_.data(), sz, cudaMemcpyHostToDevice, stream));

dim3 block(32, 32);
auto blocks_per_sample = std::max(32, 1024 / batch_size_);
What's the reason for this gridDim.x?
It's just a good-enough number. The second term, 1024 / batch_size_, is meant to reduce the total number of blocks when the batch contains many samples; the max with 32 keeps at least 32 blocks per sample.
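The effect of this heuristic can be sketched on the host. This is only an illustration, not the operator's code: the helper names `blocks_per_sample` and `total_blocks` are hypothetical, and the assumption that the grid launches one row of blocks per sample is mine. Only the expression `std::max(32, 1024 / batch_size)` comes from the diff.

```cpp
#include <algorithm>

// Hypothetical helper mirroring the expression from the diff.
// Assumes batch_size > 0 (a zero batch would divide by zero).
int blocks_per_sample(int batch_size) {
  return std::max(32, 1024 / batch_size);
}

// Assumption: the grid has blocks_per_sample blocks for each sample,
// so the total number of blocks is the product of the two.
int total_blocks(int batch_size) {
  return blocks_per_sample(batch_size) * batch_size;
}
```

Under these assumptions, a batch of 1 gets 1024 blocks, a batch of 16 gets 64 blocks per sample (still 1024 in total), and a batch of 256 hits the floor of 32 blocks per sample (8192 in total).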
dali/operators/coord/coord_flip.cu
Outdated
CUDA_CALL(
    cudaMemcpyAsync(sample_descs_gpu_, sample_descs_.data(), sz, cudaMemcpyHostToDevice, stream));

dim3 block(32, 32);
Is there any benefit in using 2D blocks instead of 1D of the same volume?
That would probably simplify the addressing a bit.
done
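To illustrate why a 1D block simplifies the addressing, here is a host-side emulation of a grid-stride loop over one sample using 1D blocks. It is a sketch under assumptions: the function `visit_counts` and its parameters are hypothetical, and the loops over `b` and `t` stand in for `blockIdx.x` and `threadIdx.x`. With a 1D block, the thread index is used directly, with no `threadIdx.y * blockDim.x + threadIdx.x` flattening.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-side emulation of a 1D grid-stride loop.
// Returns how many times each element index of the sample is visited.
std::vector<int> visit_counts(int64_t size, int blocks, int block_size) {
  std::vector<int> counts(size, 0);
  // Total number of threads cooperating on this sample.
  int64_t grid_size = static_cast<int64_t>(blocks) * block_size;
  for (int b = 0; b < blocks; b++) {        // plays the role of blockIdx.x
    for (int t = 0; t < block_size; t++) {  // plays the role of threadIdx.x
      int64_t first = static_cast<int64_t>(b) * block_size + t;
      for (int64_t idx = first; idx < size; idx += grid_size)
        counts[idx]++;
    }
  }
  return counts;
}
```

Every element should be visited exactly once regardless of how the sample size relates to the number of threads, which is the property the kernel's loop relies on.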
819bf25 to 57cee51
dali/operators/coord/coord_flip.cu
Outdated
int64_t tid = threadIdx.y * blockDim.x + threadIdx.x;
for (int64_t idx = offset + tid; idx < sample.size; idx += grid_size) {
  int d = idx % ndim;
  bool flip = static_cast<bool>(sample.flip_dim_mask & (1 << d));
I think an implicit conversion would do just fine, as it would in an if statement.
done
dali/operators/coord/coord_flip.cu
Outdated
for (int64_t idx = offset + tid; idx < sample.size; idx += grid_size) {
  int d = idx % ndim;
  bool flip = static_cast<bool>(sample.flip_dim_mask & (1 << d));
  sample.out[idx] = flip ? T(1) - sample.in[idx] : sample.in[idx];
flip center?
done
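One way to read the "flip center" suggestion is to mirror each coordinate about a configurable center instead of hard-coding `1 - x`. The host-side reference below is a sketch of that idea, not the merged implementation: the function name `flip_coords`, the `center` parameter, and the use of a single center for all dimensions are assumptions. It also relies on the implicit conversion to bool suggested earlier in the review.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the per-element flip.
// Coordinates are interleaved, so element idx belongs to dimension idx % ndim.
// `center` is an assumed parameter; center = 0.5 reproduces 1 - x.
std::vector<float> flip_coords(const std::vector<float> &in, int ndim,
                               unsigned flip_dim_mask, float center = 0.5f) {
  std::vector<float> out(in.size());
  for (std::size_t idx = 0; idx < in.size(); idx++) {
    int d = static_cast<int>(idx % ndim);
    // Implicit integer-to-bool conversion: nonzero means "flip this dimension".
    bool flip = flip_dim_mask & (1u << d);
    // Mirroring x about `center` gives 2 * center - x.
    out[idx] = flip ? 2 * center - in[idx] : in[idx];
  }
  return out;
}
```

For example, with `ndim = 2` and `flip_dim_mask = 1`, only the first coordinate of each point is mirrored, so the point (0.25, 0.75) becomes (0.75, 0.75).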
57cee51 to 41d28d1
Signed-off-by: Joaquin Anton <janton@nvidia.com>
09c81ee to 6c6834a
6c6834a to 561c55c
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [1291137]: BUILD STARTED
CI MESSAGE: [1291137]: BUILD PASSED
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Why we need this PR?
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
Added Coordinate Flip GPU operator
New operator
The operator implementation
Python operator tests added
NA
JIRA TASK: [DALI-1392]