Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Use IndexMap to transform NDArray #12949

Merged
merged 9 commits into from
Oct 5, 2022

Conversation

masahi
Copy link
Member

@masahi masahi commented Sep 30, 2022

I've hit a weird use case where I want to manually transform runtime::NDArray (attached to AllocateConst node) according to the index map used in transform_layout. This is needed to support AllocateConst node in Metaschedule RewriteLayout postproc.

I can define it as a free function in the file where it is actually used. Having it available as part of the IndexMap interface makes it convenient to expose this to python and unit-test it. Let me know if this is a reasonable API addition.

cc @vinx13 @Lunderberg @junrushao

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a fantastic addition to the API, and very useful functionality. My one question is whether we want to use DeviceAPI::CopyDataFromTo to avoid the round trip from device to host and back.

In addition to the AllocateConst usecase you mentioned, it would also be very useful in preparing input/output/expected buffers for unit tests, rather than needing to specify the transformation logic in both IndexMap and np.transpose calls.

@masahi
Copy link
Member Author

masahi commented Sep 30, 2022

My one question is whether we want to use DeviceAPI::CopyDataFromTo

@Lunderberg I've just looked into this possibility, I wish I could use DeviceAPI::CopyDataFromTo but it's a protected method so I cannot use it from IndexMap

virtual void CopyDataFromTo(const void* from, size_t from_offset, void* to, size_t to_offset,

Other variants of CopyDataFromTo all seem to take DLTensor as argument, so I cannot copy byte slices (element-by-element copy).

@Lunderberg
Copy link
Contributor

Rats. I figured it was worth a shot. Two additional questions coming to mind:

  1. Would it work to construct a DLTensor of shape [1], then iterate over DLTensor::byte_offset for each element? I think that would work within the public methods using DLTensor, but the per-element overhead might be large.
  2. If there the per-element overhead of a virtual function call is large, would it be worth using DetectIterMap to attempt to regions that are contiguous in both source and destination layout, in order to copy those entire regions?

@masahi
Copy link
Member Author

masahi commented Sep 30, 2022

I agree that (1) is possible, but making such small copies on the device side sounds very slow (e.g. GPU), which would defeat the purpose of removing the host - device round trip.

On (2), it sounds like it would add too much complexity to the otherwise very simple code. I'm not sure how such "contiguous region detection" is effective in practice, but if @vinx13 and @junrushao also think it's a good idea, I'm happy to explore this approach.

@junrushao
Copy link
Member

This is definitely cool Masa! Similar functionality existed in Relay's FoldConstant where we are explicitly doing post-hoc layout rewriting, where layout is inferred from TIR (ugly admittedly), and the difference is that it's actually compiling the transformation into multi-threaded TIR which is potentially faster

@junrushao
Copy link
Member

I'm not sure how such "contiguous region detection" is effective in practice

It sounds a bit more complicated than the PR is supposed to be. In light that we have FoldConstant in Relay, shall we consider it as future work and potentially merge those two functionalities together? I'm not super sure

@Lunderberg
Copy link
Contributor

Good points. I'd been thinking in terms of a small number of discontiguous breaks between largely contiguous regions, but that probably would be rather rare. Agreed that it isn't worth the extra complexity.

@masahi
Copy link
Member Author

masahi commented Oct 5, 2022

@Lunderberg @vinx13 @junrushao Can we merge this? I have another PR ready to be sent that depends on this.

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. LGTM!

@junrushao junrushao merged commit 9618e6a into apache:main Oct 5, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
I've hit a weird use case where I want to manually transform `runtime::NDArray` (attached to `AllocateConst` node) according to the index map used in `transform_layout`. This is needed to support `AllocateConst` node in Metaschedule `RewriteLayout` postproc.

I can define it as a free function in the file where it is actually used. Having it available as part of the `IndexMap` interface makes it convenient to expose this to python and unit-test it. Let me know if this is a reasonable API addition.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants