performance issue with increment_stepper #1695
Comments
Hi, you might be interested in reading this part of the documentation:
Usually you try to optimize by avoiding stepper-based assignment whenever possible. That is only possible when the containers involved in the assignment are contiguous in memory, have the same layout, and no broadcasting is involved. Do you have small code examples that use the steppers and perform poorly?
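For readers following the thread, the three conditions above can be sketched as a small check. This is only an illustration in plain C++, not xtensor's actual logic: `array_desc`, `is_row_major_contiguous` and `can_assign_linearly` are hypothetical names, strides are counted in elements, and "same layout" is simplified to "both sides are dense row-major".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of an array: extents plus strides (in elements).
struct array_desc
{
    std::vector<std::size_t> shape;
    std::vector<std::size_t> strides;
};

// True when the strides match a dense row-major layout for the given shape.
inline bool is_row_major_contiguous(const array_desc& a)
{
    if (a.shape.size() != a.strides.size())
    {
        return false;
    }
    std::size_t expected = 1;
    for (std::size_t d = a.shape.size(); d-- > 0;)
    {
        // A dimension of extent 1 never moves, so its stride is irrelevant.
        if (a.shape[d] != 1 && a.strides[d] != expected)
        {
            return false;
        }
        expected *= a.shape[d];
    }
    return true;
}

// A flat loop over both buffers is only valid when the shapes are identical
// (no broadcasting) and both sides are dense with the same (row-major) layout.
inline bool can_assign_linearly(const array_desc& lhs, const array_desc& rhs)
{
    return lhs.shape == rhs.shape
        && is_row_major_contiguous(lhs)
        && is_row_major_contiguous(rhs);
}
```

With this model, a dense 2x3 array assigned from another dense 2x3 array passes the check, while a right-hand side with a zero stride (a broadcast dimension), a different shape, or column-major strides does not, and would need index-based (stepper-like) traversal.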
Hi Johan, thanks for your quick reply and detailed suggestion. My project is a bit complicated, as it has many layers and relies on multiple libraries. I would love to give you a small example, but it is hard for me to extract one. This function could be a starting point, though:
The source of the problem might be that z5 doesn't set a static layout in the
Thanks @wolfv for your quick response. I checked through your documentation here, and it seems that in const auto bufferView = xt::adapt(buffer, view.shape(), row_major); or
this will set a static layout in
Actually, it should be
On the other hand, I just saw that a strided view is being used somewhere. I have a PR to make them faster here: #1627. I still need to fix the last issue with it.
@wolfv may be right that the issue is z5 not specifying a layout type. However, I need to use strided views, which might complicate the matter a bit. I wrote up some context in the issue @weilewei opened about this in the z5 repo, see constantinpape/z5#118 (comment). I am happy to continue the discussion there for performance issues that are more z5-related.
Hi guys, I think I'm facing a situation that seems to be related to this issue and I would like to have your advice. More precisely, I have the following lines of code:
After compiling with
This is expected: as soon as non-linear broadcasting is involved, the assignment has no choice but to use the steppers. Some expressions allow the stepper to use SIMD intrinsics, but that is still slower than linear assignment. You can find more details about assignment on this page
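To make the point about broadcasting concrete, here is a minimal sketch in plain C++ (not xtensor code; `add_broadcast_row` is a hypothetical helper). It adds a row of length `cols` to every row of a dense row-major `rows` x `cols` matrix. The output and the matrix advance by one element per iteration, but the broadcast operand has to restart at every row, so a single flat index cannot drive all three operands.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: out(i, j) = a(i, j) + b(j), with `a` stored row-major.
// Assumes a.size() == rows * cols and b.size() == cols.
inline std::vector<double> add_broadcast_row(const std::vector<double>& a,
                                             std::size_t rows,
                                             std::size_t cols,
                                             const std::vector<double>& b)
{
    std::vector<double> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
    {
        for (std::size_t j = 0; j < cols; ++j)
        {
            // `a` and `out` use the flat offset i * cols + j,
            // while `b` only depends on j and is re-read for every row.
            out[i * cols + j] = a[i * cols + j] + b[j];
        }
    }
    return out;
}
```

A generic library cannot hard-code the two nested loops for every rank, which is why this kind of traversal ends up going through a per-dimension index update rather than a flat loop.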
Hi,
I am running software that relies on xtensor, and a performance tracking tool (Arm-forge Map) shows that the function increment_stepper is called many times and therefore costs a lot of time. I looked into the function body and did not quite understand what it does. When you have time: 1) could you please explain a bit about stepper_tools and its member function increment_stepper? 2) do you see any potential performance improvement in the for and/or while loop?
Any suggestion will be helpful! Thanks!
reference:
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xiterator.hpp#L145
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xassign.hpp#L480
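As a rough mental model for the first question, a row-major stepper increment behaves like an odometer: bump the last index, and when it overflows, rewind that dimension and carry into the previous one. The sketch below is a simplified stand-alone version in plain C++, not the code at the links above; `toy_stepper`, `increment` and `visit_offsets` are hypothetical names, strides are in elements, and every extent is assumed to be at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy stepper: tracks a flat offset into a buffer.
struct toy_stepper
{
    std::vector<std::size_t> shape;
    std::vector<std::size_t> strides;
    std::size_t offset;

    // Move one element forward along dimension `dim`.
    void step(std::size_t dim) { offset += strides[dim]; }
    // Rewind dimension `dim` from its last element back to its first.
    void reset(std::size_t dim) { offset -= strides[dim] * (shape[dim] - 1); }
};

// Odometer-style increment, last dimension fastest (row-major).
// Returns false once the traversal has wrapped past the last element.
inline bool increment(toy_stepper& s, std::vector<std::size_t>& index)
{
    std::size_t i = index.size();
    while (i != 0)
    {
        --i;
        if (index[i] + 1 != s.shape[i])
        {
            ++index[i];
            s.step(i);
            return true;
        }
        // This dimension is exhausted: rewind it and carry to the previous one.
        index[i] = 0;
        s.reset(i);
    }
    return false;
}

// Collects the offsets visited when traversing the whole shape.
inline std::vector<std::size_t> visit_offsets(const std::vector<std::size_t>& shape,
                                              const std::vector<std::size_t>& strides)
{
    toy_stepper s{shape, strides, 0};
    std::vector<std::size_t> index(shape.size(), 0);
    std::vector<std::size_t> visited;
    do
    {
        visited.push_back(s.offset);
    } while (increment(s, index));
    return visited;
}
```

For a dense 2x3 array (strides 3 and 1) this visits the offsets in plain linear order, which is why a flat loop can replace it in that case. With a zero stride on the first dimension (a broadcast), the same three offsets are visited twice. The per-element branch and index bookkeeping in `increment` is the kind of overhead a profiler would attribute to such a function.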