-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Stride to Subscript and Slice Kernel #5007
Conversation
If there are performance concerns with the simplification of the gpu kernel, could you please instruct me how to efficiently use nsight to analyse kernels in this project? |
Hello @5had3z |
Done, reverted all changes not pertaining to striding/stepping, will open new one for devcontainer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you very much for your contribution!
CI MESSAGE: [9465267]: BUILD STARTED |
CI MESSAGE: [9465267]: BUILD FAILED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests fail due to incorrectly initialized step.
@5had3z Failing tests aside, could you also rebase your branch on latest main? |
@5had3z to reproduce the test failures, you can run:
Thanks |
… 1 slicekernel uses strides to iterate Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
… anchor and stride modification), fix regression subscript.cc bad types, fix hi/lo anchor logic for negative stride, passes operator_2 tests, temp ommit gpu indexing test Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
…work properly, all tests passing Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
@mzient I realised that UnitCubeShape wasn't actually a thing after I tried to pull and build so I added TensorShape<>::filled_shape (and changed ::empty_shape to call it with value=0). I don't want to break the remote by syncrhonising my local, so I'm re-cloning, making the changes and pushing again. Will take a while to rebuild and test. |
If you don't have new local changes, you can get the latest remote by doing: # assuming that your fork's remote name is 5had3z
git fetch 5had3z
# assuming that your current branch is feat/strided-slice
git reset --hard 5had3z/feat/strided-slice It's easier than re-cloning and won't trigger quite as lengthy rebuilds. |
Its alright, I'm trying to watch sopranos at the same time 😂. Build isn't too bad on 5950X if you restrict nvcc to sm86. Running tests now. |
Speaking of, upping cmake to >1.24 to take advantage of easily switching between cuda arch "all" and "native" could be useful. |
We already do have a CMake switch for it, e.g. |
Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
CI MESSAGE: [9467364]: BUILD FAILED |
include/dali/core/tensor_shape.h
Outdated
assert(dim > 0); | ||
TensorShape<> result; | ||
result.resize(dim); | ||
// TODO: should use std::fill but lack .begin() and .end() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this comment. Firstly, TensorShape
does have begin
and end
, as clearly indicated by the following ranged for. Secondly, at least in my opinion, assigning elements in a ranged for
reads better than the very verbose call to std::fill (STL ranges change that but we're not there yet with the standard support).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, LSP complained it didn't exist and I wasn't thinking carefully (classic 1am moment).
I plan to play with trying to compile with cxx_std_23 later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@5had3z Sorry, you've touched a very fundamental header (tensor_shape.h) so the changes need to be made very carefully. On the bright side, I've discovered some long-standing bugs and we have an opportunity to fix them now.
…ary template for dynamic specialisation, remove wrong comment Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au>
CI MESSAGE: [9479881]: BUILD STARTED |
CI MESSAGE: [9479881]: BUILD FAILED |
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [9482171]: BUILD STARTED |
CI MESSAGE: [9482171]: BUILD PASSED |
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [9484228]: BUILD STARTED |
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [9484655]: BUILD STARTED |
CI MESSAGE: [9484655]: BUILD PASSED |
Enables numpy style slicing with strides to tensor subscript operator by supporting a `steps` member to slice params. Signed-off-by: Bryce Ferenczi <frenzi@hotmail.com.au> Co-authored-by: Michal Zientkiewicz <michalz@nvidia.com>
Category:
New Features
Description:
Added striding to subscript operator by implementing step in slice kernel and adding parameters.
Additional information:
Affected modules and functionalities:
Slice operator and kernels are the main changes as well as the test suite.
Key points relevant for the review:
Removed dimension flattening and pre-anchoring of slice kernel as this would futher complicate logic or result in more duplicated "specialised" kernels. I would conjecture that for a memory-bandwidth limited kenel such as rearranging/copying data, a little more interger logic should't have too much of an adverse effect.
Check if the test_constant_ranges has sufficient coverage of stride permutations, rather than focusing of the logic in subscript.h (maybe I should've looked at numpy's source code for their range conditions, but I seem to match it on final value).
All the "formatting" changes were done with clang-format and black.
Tests:
Added tests with different start : end : step in operator_2.test_subscript and checked for parity with numpy on both gpu and cpu mats. All operator_2 tests pass.
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A