Expose CopyRegions for tensor copy operations #329

Open. Wants to merge 2 commits into base: master

Crydsch (Contributor) commented Aug 17, 2023

Hey,
in my current project I need to copy only parts of a tensor to/from the device.
I also found your issue #24 regarding this.

So I had a go at it!

I extended kp::OpTensorSyncDevice with a vector of copyRegions
and added function overloads all the way up to Sequence.eval().
I made sure that the overloads only add functionality and still copy the entire tensor by default.
Additionally, there is now a test showcasing the approach.
All very experimental at this stage.
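
To illustrate the intended semantics (not the actual implementation in this PR), here is a minimal sketch using plain byte vectors as stand-ins for host and device memory. `ByteRegion` and `syncRegions` are hypothetical names; the three fields only mirror those of Vulkan's `VkBufferCopy`. An empty region list falls back to copying everything, which is the "still copy the entire tensor by default" behaviour described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical byte-based copy region; the field names mirror
// Vulkan's VkBufferCopy (srcOffset, dstOffset, size).
struct ByteRegion {
    std::size_t srcOffset; // offset into the source, in bytes
    std::size_t dstOffset; // offset into the destination, in bytes
    std::size_t size;      // number of bytes to copy
};

// Copies the given regions from src to dst. An empty region list means
// "copy the whole buffer", so existing callers keep the old behaviour.
inline void syncRegions(const std::vector<unsigned char>& src,
                        std::vector<unsigned char>& dst,
                        const std::vector<ByteRegion>& regions = {})
{
    // Assumption for this sketch: both sides have the same size.
    assert(src.size() == dst.size());

    if (regions.empty()) {
        std::copy(src.begin(), src.end(), dst.begin());
        return;
    }
    for (const ByteRegion& r : regions) {
        // This sketch rejects out-of-bounds regions instead of clamping.
        assert(r.srcOffset + r.size <= src.size());
        assert(r.dstOffset + r.size <= dst.size());
        std::copy_n(src.begin() + r.srcOffset, r.size,
                    dst.begin() + r.dstOffset);
    }
}
```

Calling `syncRegions(host, device)` copies everything, while passing a list of regions touches only those byte ranges and leaves the rest of the destination unchanged.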

Open questions are:

  • Should we add a new type kp::CopyRegion which takes offsets and size in number of elements instead of bytes?
    This would ease usage and fit nicely with kompute's higher-level functions.
  • How should we handle out-of-bounds regions?
    I suggest just clamping the region to the tensor size in the kp::OpTensorSyncDevice constructor,
    with a warning of course.
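
Both questions can be sketched together. The following is only an illustration under stated assumptions: `ElementRegion`, `ByteRegion`, `clampRegion` and `toBytes` are hypothetical names, not part of kompute, and the sketch assumes source and destination hold the same number of elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical element-based region (offsets and count in elements).
struct ElementRegion {
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t count;
};

// Hypothetical byte-based region, as a Vulkan buffer copy would need it.
struct ByteRegion {
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t size;
};

// Shrinks the region so that neither side runs past the end of a tensor
// with tensorElements elements, and prints a warning when it had to shrink.
inline ElementRegion clampRegion(ElementRegion r, std::size_t tensorElements)
{
    const std::size_t maxOffset = std::max(r.srcOffset, r.dstOffset);
    const std::size_t available =
        maxOffset >= tensorElements ? 0 : tensorElements - maxOffset;
    if (r.count > available) {
        std::cerr << "warning: copy region clamped from " << r.count
                  << " to " << available << " elements\n";
        r.count = available;
    }
    return r;
}

// Converts element units to byte units for a given element size.
inline ByteRegion toBytes(const ElementRegion& r, std::size_t elementSize)
{
    assert(elementSize > 0);
    return ByteRegion{ r.srcOffset * elementSize,
                       r.dstOffset * elementSize,
                       r.count * elementSize };
}
```

For a tensor of 8 floats, a region starting at element 6 with a count of 4 would be clamped to 2 elements before being converted to bytes with `toBytes(region, sizeof(float))`.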

I'd like to know what you think of this approach.
If you are interested, I can work on this and turn it into a PR.

Signed-off-by: crydsch <crydsch@lph.zone>
Signed-off-by: crydsch <crydsch@lph.zone>

Crydsch (Contributor, Author) commented Aug 26, 2023

Hey, I had some time and took another look at this.
I initially thought I had to add a special template function to pass the regions, but have since noticed that this is not really necessary (although much nicer).

But in principle the functionality could be added as a new operation class.
This could be done entirely in user code just by deriving from OpBase.

The only necessary change to enable this is to expose the copy region in the tensor class.
So I changed my approach for this PR and added the required function overloads.
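
A rough sketch of that idea, with loud caveats: everything below is a simplified stand-in, not kompute's API. `FakeCommandBuffer` replaces the Vulkan command buffer and merely logs copies, `TensorLike` models a tensor whose copy function gains a region overload, and `OpLike` is a cut-down placeholder for an operation base class such as kp::OpBase.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical byte-based copy region.
struct ByteRegion {
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t size;
};

// Stand-in for a command buffer: it only logs the requested copies.
struct FakeCommandBuffer {
    std::vector<ByteRegion> recordedCopies;
};

// Stand-in for a tensor. The overload taking a region is the kind of
// change described above; the existing call still copies everything.
struct TensorLike {
    std::size_t bytes;

    void recordCopy(FakeCommandBuffer& cmd) const {
        recordCopy(cmd, ByteRegion{0, 0, bytes});
    }
    void recordCopy(FakeCommandBuffer& cmd, const ByteRegion& region) const {
        cmd.recordedCopies.push_back(region);
    }
};

// Simplified placeholder for an operation base class.
class OpLike {
public:
    virtual ~OpLike() = default;
    virtual void record(FakeCommandBuffer& cmd) = 0;
};

// User-side operation: records one copy per region via the new overload.
class OpCopyRegions : public OpLike {
public:
    OpCopyRegions(const TensorLike& tensor, std::vector<ByteRegion> regions)
        : mTensor(tensor), mRegions(std::move(regions)) {}

    void record(FakeCommandBuffer& cmd) override {
        for (const ByteRegion& r : mRegions) {
            mTensor.recordCopy(cmd, r);
        }
    }

private:
    TensorLike mTensor;
    std::vector<ByteRegion> mRegions;
};
```

The point of the design is that the operation itself needs nothing special from the library: once the tensor accepts a region, a partial-copy operation can live entirely in user code.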

I'll push another PR then with the new operations.
PS: To see if the CI runs successfully with this change, I enabled TestOpTensorSync; I'm not sure why it was disabled.
If you want to keep it disabled, I'll remove the commit again.

@Crydsch Crydsch marked this pull request as ready for review August 26, 2023 09:59