
performance issue with increment_stepper #1695

Open
weilewei opened this issue Jul 25, 2019 · 9 comments

@weilewei

Hi,

I am running software that relies on xtensor, and a performance tracking tool (Arm Forge MAP) shows that the function increment_stepper is called many times and accounts for a large share of the run time. I looked into the function body but did not quite understand what it does.

When you have time, 1) could you please explain a bit about stepper_tools and its member function increment_stepper? 2) do you see any potential performance improvement in the for and/or while loops?

Any suggestion would be helpful! Thanks!
References:
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xiterator.hpp#L145
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xassign.hpp#L480

@JohanMabille commented Jul 25, 2019

Hi,

You might be interested in reading this part of the documentation:

An iterator holds a stepper and a multi-dimensional index. A call to operator++ increases the index and calls the step method of the stepper accordingly. The way the index is increased depends on the layout used for iterating.

The increment_stepper is the bridge between the index increment in the iterator and the step of the stepper. I don't really see how we could improve the performance of these functions (but I am definitely open to any suggestion, and any PR that improves them would be very welcome).

Usually you try to optimize by avoiding the assignment with steppers when possible (which means that the containers involved in the assignment are contiguous in memory, have the same layout, and you don't have any broadcasting).
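As an illustrative sketch (not from the issue; the shapes and variable names are made up), the first assignment below involves two contiguous row-major tensors of identical shape and can take the fast linear path, while the second broadcasts a 1D operand against a 2D one and therefore has to fall back to the steppers:

#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor/xnoalias.hpp>

int main()
{
    // Same shape, same static (row-major) layout, no broadcasting:
    // the assignment can run as a linear loop over the underlying buffers.
    xt::xtensor<double, 2> a = xt::random::rand<double>({1000, 1000});
    xt::xtensor<double, 2> b = xt::random::rand<double>({1000, 1000});
    xt::xtensor<double, 2> fast(a.shape());
    xt::noalias(fast) = a + b;

    // The 1D operand is broadcast against the 2D one, so the assignment
    // iterates with steppers; this is where increment_stepper shows up.
    xt::xtensor<double, 1> row = xt::random::rand<double>({1000});
    xt::xtensor<double, 2> slow(a.shape());
    xt::noalias(slow) = a + row;

    return 0;
}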

Do you have small examples of code that use the steppers and are problematic regarding the performance?

@weilewei commented Jul 26, 2019

Hi Johan,

Thanks for your quick reply and detailed suggestion. My project is a bit complicated, as it has many layers and relies on multiple libraries, so it is hard for me to reproduce a small example, even though I would love to. But this function could be the starting point:
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_util.hxx#L129
Most of the time is spent in the view.dimension() == 1 branch, which eventually calls the steppers. From the function name, I can tell that it is copying data into the memory buffer. This function itself is also called many times.
Here is a screenshot from the performance evaluation; I hope it helps. Let me know if you have any suggestions.
[Screenshot: Arm Forge MAP profile, Jul 26, 2019]

@wolfv commented Jul 26, 2019

The source of the problem might be that z5 doesn't set a static layout in the xt::adapt call, which then falls back to a dynamic layout and uses the (slow) steppers.

@weilewei commented Jul 26, 2019

Thanks @wolfv for your quick response. I checked through your documentation here, and it seems that if, at
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_util.hxx#L136
, we change the call to

const auto bufferView = xt::adapt(buffer, view.shape(), row_major);

or

const auto bufferView = xt::adapt(buffer, view.shape(), column_major);

this will set a static layout in xt::adapt so that we get better performance? Let's also assume that such a modification does not impact what z5 intends to do. Let me know if you have any suggestions.

@wolfv commented Jul 27, 2019

Actually, it should be xt::adapt<xt::column_major>(buffer, view.shape()); I think.

On the other hand, I just saw that there is a strided view being used somewhere. I have a PR to make them faster here: #1627

I still need to fix the last issue with it.
Could you check whether that PR fixes your issue?
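For context, here is a minimal sketch of the suggested call, assuming the enum is spelled xt::layout_type::column_major as in current xtensor; the buffer and shape below are placeholders, not the actual z5 code:

#include <cstddef>
#include <vector>
#include <xtensor/xadapt.hpp>
#include <xtensor/xtensor.hpp>
#include <xtensor/xnoalias.hpp>

void adapt_with_static_layout()
{
    // Placeholder data standing in for z5's chunk buffer and view shape.
    std::vector<double> buffer(2 * 3 * 4);
    std::vector<std::size_t> shape = {2, 3, 4};

    // The layout is passed as a template parameter, so the adaptor carries
    // a static layout instead of falling back to a dynamic one.
    auto bufferView = xt::adapt<xt::layout_type::column_major>(buffer, shape);

    // Assigning to a container with the same static layout and shape can
    // then take the linear assignment path instead of the steppers.
    xt::xtensor<double, 3, xt::layout_type::column_major> chunk({2, 3, 4});
    xt::noalias(chunk) = bufferView;
}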

@constantinpape

@wolfv might be right that the issue is with z5 not specifying a layout type. However, I need to use strided views, which might complicate this matter a bit.

I wrote up some context in the issue @weilewei has opened about this in the z5 repo, see constantinpape/z5#118 (comment).

I am happy to continue the discussion there for performance issues that are more z5-related.

@weilewei commented Aug 8, 2019

Thanks for the updates @wolfv. @halehawk and I will look into this soon.

@anuesseler

Hi guys,

I think I'm facing a situation that seems to be related to this issue and I would like to have your advice. More precisely, I have the following lines of code:

size_t N = 10;

std::vector<size_t> shape{N,N,N,N};

xt::xtensor<double, 4> A = xt::random::rand<double>(shape);
xt::xtensor<double, 4> B = xt::random::rand<double>(shape);

auto C = xt::linalg::tensordot(A,B,{2},{1});

auto&& D = xt::eval(xt::reshape_view(xt::transpose(C, {0,3,1,4,2,5}), {N*N, N, N, N*N}));

xt::xtensor<double, 4> E(std::move(D));

After compiling with g++ -g and profiling with valgrind, I noticed that a significant amount of time is spent in increment_stepper (see the screenshot below). Is this the expected behavior, or are there any changes I could make to improve my code (apart from using optimization flags)?

[Screenshot: valgrind profile, profile_xtensor_dot_reshape]

@JohanMabille

This is expected: as soon as non-linear broadcasting is involved, the assignment has no choice but to use the steppers. Some expressions allow the stepper to use SIMD intrinsics, but that is still slower than linear assignment.

You can find more details about assignment on this page.
