
performance issue with increment_stepper #1695

Open
weilewei opened this issue Jul 25, 2019 · 9 comments

@weilewei

Hi,

I am running software that relies on xtensor, and a performance tracking tool (Arm Forge MAP) shows that the function increment_stepper is called many times and accounts for a large share of the run time. I looked into the function body but did not quite understand what it does.

When you have time, 1) could you please explain a bit about stepper_tools and its member function increment_stepper? 2) do you see any potential performance improvement in the for and/or while loops?

Any suggestion would be helpful! Thanks!
References:
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xiterator.hpp#L145
https://github.com/QuantStack/xtensor/blob/30c8d3dd0d8bbbf0e18de11e3357b61934bfcb67/include/xtensor/xassign.hpp#L480

@JohanMabille commented Jul 25, 2019

Hi,

You might be interested in reading this part of the documentation:

An iterator holds a stepper and a multi-dimensional index. A call to operator++ increases the index and calls the step method of the stepper accordingly. The way the index is increased depends on the layout used for iterating.

The increment_stepper is the bridge between the index increment in the iterator and the step of the stepper. I don't really see how we could improve the performance of these functions (but I am definitely open to any suggestion, and any PR that improves them would be very welcome).

Usually you try to optimize by avoiding the assignment with steppers when possible (which means that the containers involved in the assignment are contiguous in memory, have the same layout, and you don't have any broadcasting).
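As an illustrative sketch (not from the issue; the shapes and variable names are made up), the first assignment below involves two contiguous row-major tensors of identical shape and can take the fast linear path, while the second broadcasts a 1D operand against a 2D one and therefore has to fall back to the steppers:

#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor/xnoalias.hpp>

int main()
{
    // Same shape, same static (row-major) layout, no broadcasting:
    // the assignment can run as a linear loop over the underlying buffers.
    xt::xtensor<double, 2> a = xt::random::rand<double>({1000, 1000});
    xt::xtensor<double, 2> b = xt::random::rand<double>({1000, 1000});
    xt::xtensor<double, 2> fast(a.shape());
    xt::noalias(fast) = a + b;

    // The 1D operand is broadcast against the 2D one, so the assignment
    // iterates with steppers; this is where increment_stepper shows up.
    xt::xtensor<double, 1> row = xt::random::rand<double>({1000});
    xt::xtensor<double, 2> slow(a.shape());
    xt::noalias(slow) = a + row;

    return 0;
}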

Do you have small examples of code that use the steppers and are problematic regarding the performance?

@weilewei commented Jul 26, 2019

Hi Johan,

Thanks for your quick reply and detailed suggestion. My project is a bit complicated, as it has many layers and relies on multiple libraries, so it is hard for me to reproduce a small example, even though I would love to. But this function could be the starting point:
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_util.hxx#L129
Most of the time is spent in the view.dimension() == 1 branch, which eventually calls the steppers. From the function name, I can tell that it is copying data into the memory buffer. This function itself is also called many times.
Here is a screenshot from the performance evaluation; I hope it helps. Let me know if you have any suggestions.
[Screenshot: Arm Forge MAP profile, Jul 26, 2019]

@wolfv commented Jul 26, 2019

The source of the problem might be that z5 doesn't set a static layout in the xt::adapt call, which then falls back to a dynamic layout and uses the (slow) steppers.

@weilewei commented Jul 26, 2019

Thanks @wolfv for your quick response. I checked through your documentation here, and it seems that if, at
https://github.com/constantinpape/z5/blob/master/include/z5/multiarray/xtensor_util.hxx#L136
, we change the call to

const auto bufferView = xt::adapt(buffer, view.shape(), row_major);

or

const auto bufferView = xt::adapt(buffer, view.shape(), column_major);

this will set a static layout in xt::adapt so that we get better performance? Let's also assume that such a modification does not impact what z5 intends to do. Let me know if you have any suggestions.

@wolfv commented Jul 27, 2019

Actually, it should be xt::adapt<xt::column_major>(buffer, view.shape()); I think.

On the other hand, I just saw that there is a strided view being used somewhere. I have a PR to make them faster here: #1627

I still need to fix the last issue with it.
Could you check whether that PR fixes your issue?
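For context, here is a minimal sketch of the suggested call, assuming the enum is spelled xt::layout_type::column_major as in current xtensor; the buffer and shape below are placeholders, not the actual z5 code:

#include <cstddef>
#include <vector>
#include <xtensor/xadapt.hpp>
#include <xtensor/xtensor.hpp>
#include <xtensor/xnoalias.hpp>

void adapt_with_static_layout()
{
    // Placeholder data standing in for z5's chunk buffer and view shape.
    std::vector<double> buffer(2 * 3 * 4);
    std::vector<std::size_t> shape = {2, 3, 4};

    // The layout is passed as a template parameter, so the adaptor carries
    // a static layout instead of falling back to a dynamic one.
    auto bufferView = xt::adapt<xt::layout_type::column_major>(buffer, shape);

    // Assigning to a container with the same static layout and shape can
    // then take the linear assignment path instead of the steppers.
    xt::xtensor<double, 3, xt::layout_type::column_major> chunk({2, 3, 4});
    xt::noalias(chunk) = bufferView;
}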

@constantinpape

@wolfv might be right that the issue is with z5 not specifying a layout type. However, I need to use strided views, which might complicate this matter a bit.

I wrote up some context in the issue @weilewei has opened about this in the z5 repo, see constantinpape/z5#118 (comment).

I am happy to continue the discussion there for performance issues that are more z5-related.

@weilewei commented Aug 8, 2019

Thanks for the updates @wolfv. @halehawk and I will look into this soon.

@anuesseler

Hi guys,

I think I'm facing a situation that seems to be related to this issue and I would like to have your advice. More precisely, I have the following lines of code:

size_t N = 10;

std::vector<size_t> shape{N,N,N,N};

xt::xtensor<double, 4> A = xt::random::rand<double>(shape);
xt::xtensor<double, 4> B = xt::random::rand<double>(shape);

auto C = xt::linalg::tensordot(A,B,{2},{1});

auto&& D = xt::eval(xt::reshape_view(xt::transpose(C, {0,3,1,4,2,5}), {N*N, N, N, N*N}));

xt::xtensor<double, 4> E(std::move(D));

After compiling with g++ -g and profiling with valgrind, I noticed that a significant amount of time is spent in increment_stepper (see the screenshot below). Is this the expected behavior, or are there any changes I could make to improve my code (apart from using optimization flags)?

[Screenshot: valgrind profile, profile_xtensor_dot_reshape]

@JohanMabille

This is expected: as soon as non-linear broadcasting is involved, the assignment has no choice but to use the steppers. Some expressions allow the stepper to use SIMD intrinsics, but that is still slower than linear assignment.

You can find more details about assignment on this page.
