need efficient `reinterpret` for extracting array elements #31305
Comments
There's two unrelated problems here:
|
Are those problems realistically resolvable? If not, would it still make sense to define a new method in |
1 sounds similar to #29867 but it's not really scalable to add that "fix" to all array wrappers. |
It is legal for the optimizer to move the allocation into the error path. We should figure out how to teach it to do that. |
Would that be legal? It's an issue with PTX too, where these 1-byte loads are very costly (hence a custom |
Combining in IR should definitely be legal. And the backend knows whether unaligned access is supported by the hardware (yes on x86, no on ptx). Since |
The performance of `reinterpret` is impressive, however there is still one bit of functionality (which is sometimes needed for efficient random-access IO) which is a little slow. Extracting individual elements from a buffer using `reinterpret` is only as fast as using pointers if the size of the `eltype` of the buffer matches the size of the type being converted to.

See this example:
There are a few things going on here. First, `reinterpret` can only be called on an entire array, not just a slice of it, so a view of the array must be allocated before `reinterpret` can even be called. Then, the return value is itself an array, so if you only want a single value, you need to index it (I think what's going on is that, since the returned array is itself a view which must be allocated, the allocation of both of these views are the two allocations seen in the benchmark).

It would be really nice if we can get a version of `reinterpret` which would allow one to extract individual elements without allocating. It would also be nice to have a `reinterpret` that takes views from the middle of an array rather than the start of it, however the performance of that is a little less crucial since you'd only be eliminating 1 allocation of a view for an entire array.

I'm not sure what that function would best be called or how to structure its arguments. Were I to make a PR for this, I'd probably put in a function a bit like my `unsafe` method here (which ought to be safe if properly bounds checked).

Thoughts?
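To make the request concrete, here is a rough sketch of the two approaches discussed above. The names `load_at` and `view_load` are hypothetical and not part of Base; `load_at` is only an illustration of a bounds-checked pointer read, not a proposed final API. It assumes a `Vector{UInt8}` buffer and a bits type `T`.

```julia
# Hypothetical helper (not in Base): read one value of type T whose first byte
# is buf[i], without creating a view or a reinterpreted array.
function load_at(::Type{T}, buf::Vector{UInt8}, i::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("T must be a bits type"))
    # Bounds check so the raw pointer read below cannot run past the buffer.
    (1 <= i && i + sizeof(T) - 1 <= length(buf)) || throw(BoundsError(buf, i))
    # Keep buf rooted while its raw pointer is in use.
    GC.@preserve buf begin
        # pointer(buf, i) is the address of buf[i]; reading it as a T is
        # the pointer equivalent of reinterpret.
        unsafe_load(convert(Ptr{T}, pointer(buf, i)))
    end
end

# The view-based approach described above, for comparison: take a view of
# sizeof(T) bytes, reinterpret it, then index the single element.
view_load(::Type{T}, buf::Vector{UInt8}, i::Integer) where {T} =
    reinterpret(T, view(buf, i:(i + sizeof(T) - 1)))[1]

buf = UInt8.(1:16)
load_at(UInt32, buf, 5) == view_load(UInt32, buf, 5)
```

Both functions should return the same value; the difference is only in what has to be constructed along the way. Whether the view-based version allocates will depend on the Julia version and on what the optimizer manages to elide.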