Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Stride to Subscript and Slice Kernel #5007

Merged
merged 25 commits into from
Aug 24, 2023

Conversation

5had3z
Copy link
Contributor

@5had3z 5had3z commented Aug 20, 2023

Category:

New Features

Description:

Added striding to subscript operator by implementing step in slice kernel and adding parameters.

Additional information:

Affected modules and functionalities:

Slice operator and kernels are the main changes as well as the test suite.

Key points relevant for the review:

Removed dimension flattening and pre-anchoring of slice kernel as this would futher complicate logic or result in more duplicated "specialised" kernels. I would conjecture that for a memory-bandwidth limited kenel such as rearranging/copying data, a little more interger logic should't have too much of an adverse effect.

Check if the test_constant_ranges has sufficient coverage of stride permutations, rather than focusing of the logic in subscript.h (maybe I should've looked at numpy's source code for their range conditions, but I seem to match it on final value).

All the "formatting" changes were done with clang-format and black.

Tests:

Added tests with different start : end : step in operator_2.test_subscript and checked for parity with numpy on both gpu and cpu mats. All operator_2 tests pass.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@5had3z
Copy link
Contributor Author

5had3z commented Aug 20, 2023

If there are performance concerns with the simplification of the gpu kernel, could you please instruct me how to efficiently use nsight to analyse kernels in this project?

@mzient
Copy link
Contributor

mzient commented Aug 21, 2023

Hello @5had3z
Thank you for your contribution!
I have one request regarding this PR - please separate the concerns and create a standalone PR with the .devcontainer. This will streamline the review process.

@5had3z
Copy link
Contributor Author

5had3z commented Aug 21, 2023

Done, reverted all changes not pertaining to striding/stepping, will open new one for devcontainer.

@5had3z 5had3z changed the title Added Stride to Subscript and Slice Kernel + .devcontainer Added Stride to Subscript and Slice Kernel Aug 21, 2023
@5had3z 5had3z mentioned this pull request Aug 21, 2023
18 tasks
Copy link
Contributor

@jantonguirao jantonguirao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you very much for your contribution!

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9465267]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9465267]: BUILD FAILED

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests fail due to incorrectly initialized step.

@mzient
Copy link
Contributor

mzient commented Aug 22, 2023

@5had3z Failing tests aside, could you also rebase your branch on latest main?

@jantonguirao
Copy link
Contributor

@5had3z to reproduce the test failures, you can run:

./dali/python/nvidia/dali/test/dali_kernel_test.bin --gtest_filter=*SliceGPU*

Thanks

… 1 slicekernel uses strides to iterate

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
… anchor and stride modification), fix regression subscript.cc bad types, fix hi/lo anchor logic for negative stride, passes operator_2 tests, temp ommit gpu indexing test

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
…work properly, all tests passing

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
@5had3z
Copy link
Contributor Author

5had3z commented Aug 22, 2023

@mzient I realised that UnitCubeShape wasn't actually a thing after I tried to pull and build so I added TensorShape<>::filled_shape (and changed ::empty_shape to call it with value=0). I don't want to break the remote by syncrhonising my local, so I'm re-cloning, making the changes and pushing again. Will take a while to rebuild and test.

@mzient
Copy link
Contributor

mzient commented Aug 22, 2023

@mzient I realised that UnitCubeShape wasn't actually a thing after I tried to pull and build so I added TensorShape<>::filled_shape (and changed ::empty_shape to call it with value=0). I don't want to break the remote by syncrhonising my local, so I'm re-cloning, making the changes and pushing again. Will take a while to rebuild and test.

If you don't have new local changes, you can get the latest remote by doing:

# assuming that your fork's remote name is 5had3z
git fetch 5had3z
# assuming that your current branch is feat/strided-slice
git reset --hard 5had3z/feat/strided-slice

It's easier than re-cloning and won't trigger quite as lengthy rebuilds.

@5had3z
Copy link
Contributor Author

5had3z commented Aug 22, 2023

@mzient I realised that UnitCubeShape wasn't actually a thing after I tried to pull and build so I added TensorShape<>::filled_shape (and changed ::empty_shape to call it with value=0). I don't want to break the remote by syncrhonising my local, so I'm re-cloning, making the changes and pushing again. Will take a while to rebuild and test.

If you don't have new local changes, you can get the latest remote by doing:

# assuming that your fork's remote name is 5had3z
git fetch 5had3z
# assuming that your current branch is feat/strided-slice
git reset --hard 5had3z/feat/strided-slice

It's easier than re-cloning and won't trigger quite as lengthy rebuilds.

Its alright, I'm trying to watch sopranos at the same time 😂. Build isn't too bad on 5950X if you restrict nvcc to sm86. Running tests now.

@5had3z
Copy link
Contributor Author

5had3z commented Aug 22, 2023

if you restrict nvcc to sm86

Speaking of, upping cmake to >1.24 to take advantage of easily switching between cuda arch "all" and "native" could be useful.
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html

@mzient
Copy link
Contributor

mzient commented Aug 22, 2023

if you restrict nvcc to sm86

Speaking of, upping cmake to >1.24 to take advantage of easily switching between cuda arch "all" and "native" could be useful. https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html

We already do have a CMake switch for it, e.g. -DCUDA_TARGET_ARCHS="75;80". AFAIK we populate a native CMake argument based on that list, with -real for all listed and -virtual for the oldest one.

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9467364]: BUILD FAILED

assert(dim > 0);
TensorShape<> result;
result.resize(dim);
// TODO: should use std::fill but lack .begin() and .end()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment. Firstly, TensorShape does have begin and end, as clearly indicated by the following ranged for. Secondly, at least in my opinion, assigning elements in a ranged for reads better than the very verbose call to std::fill (STL ranges change that but we're not there yet with the standard support).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, LSP complained it didn't exist and I wasn't thinking carefully (classic 1am moment).
I plan to play with trying to compile with cxx_std_23 later.

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@5had3z Sorry, you've touched a very fundamental header (tensor_shape.h) so the changes need to be made very carefully. On the bright side, I've discovered some long-standing bugs and we have an opportunity to fix them now.

…ary template for dynamic specialisation, remove wrong comment

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9479881]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9479881]: BUILD FAILED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9482171]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9482171]: BUILD PASSED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9484228]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9484655]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9484655]: BUILD PASSED

@jantonguirao jantonguirao merged commit bc133ec into NVIDIA:main Aug 24, 2023
2 of 3 checks passed
@5had3z 5had3z deleted the feat/strided-slice branch August 24, 2023 09:50
JanuszL pushed a commit to JanuszL/DALI that referenced this pull request Oct 13, 2023
Enables numpy style slicing with strides to tensor subscript operator by supporting a `steps` member to slice params.

Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Co-authored-by: Michal Zientkiewicz <michalz@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants