Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU MultiPaste #2681

Merged
merged 34 commits into from
Mar 12, 2021
Merged

GPU MultiPaste #2681

merged 34 commits into from
Mar 12, 2021

Conversation

TheTimemaster
Copy link
Contributor

@TheTimemaster TheTimemaster commented Feb 16, 2021

Why we need this PR?

  • It adds a new feature needed to do MultiPaste on GPU

What happened in this PR?

  • What solution was applied:
    The workflow is, that images are divided into an unequal grid where each cell contains data from only one paste, that one which happened last over this area. This grid is counted in the operator. Then kernel just copies data, but before each pixel, it has to check if it should jump to the next input cell vertically or horizontally.

  • Affected modules and functionalities:
    GPU Implementation for Multipaste operator and kernel.

  • Key points relevant for the review:
    MultiPaste GPU kernel + multipaste.cu operator

  • Validation and testing:
    For the grid sweeping algorithm, it was implemented and tested outside dali: https://pastebin.com/0QSnu3Bw. Now automatic tests are added that run the same cases but on GPU.

  • Documentation (including examples):
    No new documentation added

JIRA TASK: NA

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
…d few TODO items

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
… output_size.

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
…tes.

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
…emoved intersection checking bug and memory alloc bug.

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
@TheTimemaster TheTimemaster changed the title [WIP] GPU MultiPaste GPU MultiPaste Feb 24, 2021
Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
@TheTimemaster
Copy link
Contributor Author

The GPU operator seems to be working now and passes the tests most of time.

Sometimes though, with a roughly 20% chance, there is a bug that produces random non-filled regions in the output:
image

I made non-pasted regions be gray, not black. That would produce the following output, that has a gray background, but still has black glitched regions:
image

Finally, I tried removing copying of the image completely, and instead, it would assign a gray shade depending on the grid cell. This should never output 0 to any pixel, also the values were not taken from any pointer. Yet the glitched black regions remained:
image

Also, the operation did not crash, so each thread should have gone through the whole for loop, yet it seems some of them sometimes do not.

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
const InListGPU<InputType, ndims> &in,
span<paste::MultiPasteSampleInput<ndims - 1>> samples
) {
assert(ndims == 3);
Copy link
Contributor

@mzient mzient Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work with rectangles that are out of output bounds?
For example:
output is 640x480
input starts at 600, 400 and is specified as 100x100.

We should either clip (preferable) or throw, but I don't see it happening here.
If we clip, then we should completely reject boxes that are totally outside the output area.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I added throwing exceptions from the operator if some box is outside in/out bounds.

@@ -232,7 +237,8 @@ def test_operator_multipaste():
[4, 2, (128, 128), (128, 128), False, False, False, False, False, types.UINT8],
[4, 2, (128, 128), (128, 128), False, False, False, False, False, types.INT16],
[4, 2, (128, 128), (128, 128), False, False, False, False, False, types.INT32],
[4, 2, (128, 128), (128, 128), False, False, False, False, False, types.FLOAT]
[4, 2, (128, 128), (128, 128), False, False, False, False, False, types.FLOAT],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests with out-of-bounds/clipping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added those tests now

Comment on lines 150 to 153
if use_gpu:
kwargs["device"] = "gpu"

pasted = fn.multi_paste(resized.gpu() if use_gpu else resized, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if use_gpu:
kwargs["device"] = "gpu"
pasted = fn.multi_paste(resized.gpu() if use_gpu else resized, **kwargs)
pasted = fn.multi_paste(resized.gpu() if use_gpu else resized)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove kwargs["device"], but there are other kwargs that still have to be passed

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
show_images(bs, r)
manual_verify(bs, input, r, in_idx_l, in_anchors_l, shapes_l, out_anchors_l, [out_size + (3,)] * bs, out_dtype)

try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nose has assert_raises for this kind of test. Maybe it would be good to split it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the assert_raises method, but it only asserts the type (RuntimeError) of the exception, and any DALI_FAIL results in this type

@awolant
Copy link
Contributor

awolant commented Mar 10, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2150773]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2150773]: BUILD FAILED

@JanuszL
Copy link
Contributor

JanuszL commented Mar 10, 2021

Clang complains about:

              ^
In file included from /opt/dali/dali/operators/image/paste/multipaste.cu:17:
/opt/dali/dali/kernels/imgproc/paste/paste_gpu.h:213:38: error: comparison of integers of different signs: 'unsigned int' and 'const int' [-Werror,-Wsign-compare]
  while (threadIdx.x + block.start.x >= my_patches[min_patch_x].patch_end[1]) min_patch_x++;
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/dali/dali/kernels/imgproc/paste/paste_gpu.h:286:12: note: in instantiation of function template specialization 'dali::kernels::paste::PasteKernel<unsigned char, int, 2>' requested here
    paste::PasteKernel<<<patch_dim, block_dim, 0, stream>>>(
           ^
/opt/dali/dali/kernels/kernel_manager.h:249:24: note: in instantiation of member function 'dali::kernels::PasteGPU<unsigned char, int, 3>::Run' requested here
    inst.get<Kernel>().Run(context, std::forward<OutInArgs>(out_in_args)...);
                       ^
/opt/dali/dali/kernels/kernel_manager.h:221:5: note: in instantiation of function template specialization 'dali::kernels::KernelManager::Run<dali::kernels::PasteGPU<unsigned char, int, 3>, dali::TensorListView<dali::StorageGPU, unsigned char, 3> &, dali::TensorListView<dali::StorageGPU, const int, 3> &, dali::span<dali::kernels::paste::MultiPasteSampleInput<2>, -1> >' requested here
    Run<Kernel>(sa, instance_idx, context, std::forward<OutInArgs>(out_in_args)...);
    ^
/opt/dali/dali/operators/image/paste/multipaste.cu:102:19: note: in instantiation of function template specialization 'dali::kernels::KernelManager::Run<dali::kernels::PasteGPU<unsigned char, int, 3>, dali::TensorListView<dali::StorageGPU, unsigned char, 3> &, dali::TensorListView<dali::StorageGPU, const int, 3> &, dali::span<dali::kernels::paste::MultiPasteSampleInput<2>, -1> >' requested here
  kernel_manager_.Run<Kernel>(ws.thread_idx(), 0, ctx, out_view, in_view, make_span(samples));
                  ^
/opt/dali/dali/operators/image/paste/multipaste.cu:110:15: note: in instantiation of function template specialization 'dali::MultiPasteGPU::RunImplExplicitlyTyped<int, unsigned char>' requested here
              RunImplExplicitlyTyped<InputType, OutputType>(ws);

@TheTimemaster - can you check it?

@JanuszL
Copy link
Contributor

JanuszL commented Mar 10, 2021

You can try EXTRA_CMAKE_OPTIONS="-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DDALI_CLANG_ONLY=ON" CUDA_VERSION=100 ./build.sh to run clang build inside the container.

Signed-off-by: Piotr Kowalewski <piotr.kowalewski.main@gmail.com>
@JanuszL
Copy link
Contributor

JanuszL commented Mar 11, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2157524]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2157524]: BUILD PASSED

@awolant awolant merged commit d4e3db0 into NVIDIA:master Mar 12, 2021
@JanuszL JanuszL mentioned this pull request May 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants