
To improve performance in cumsum #7342

Closed · cndesantana opened this issue Jun 20, 2014 · 11 comments

Labels: performance (Must go faster)
Milestone: 0.3

Comments

@cndesantana

Dear all,

While profiling a program we are developing, we noticed that the bottleneck seems to be a cumulative sum along a dimension of a matrix, for which we use the function cumsum.

We are doing something like this:

DI = rand(5,5);
Dc = cumsum(DI,2);

How can we improve the performance of cumsum? According to some comments in the forums, cumsum / cummax / cummin / cumprod currently have suboptimal performance and are roughly 20x slower than sum / prod.
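
A rough way to see the gap is to time both on a larger matrix (a hypothetical sketch; the exact factor varies by machine and Julia version):

A = rand(1000, 1000);
@time sum(A, 2);      # reduction along dim 2
@time cumsum(A, 2);   # cumulative sum along dim 2, reportedly much slower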

Best

@JeffBezanson (Member)

What version of Julia are you using?

@lindahua (Contributor)

The current implementation of cumsum isn't cache-friendly. cumsum and friends really need to be reimplemented (just as we did for sum and friends a while ago). Since they are not as widely used as sum, reimplementing these functions has not been a high priority.
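
For illustration, a minimal sketch (not the Base implementation) of a cache-friendly cumsum along dim 2 of a matrix: the inner loop runs down each column, which is contiguous in Julia's column-major layout.

function cumsum_dim2(A::AbstractMatrix)
    B = copy(A)
    m, n = size(A)
    for j = 2:n          # columns left to right
        for i = 1:m      # inner loop over rows: contiguous memory access
            B[i, j] += B[i, j-1]
        end
    end
    return B
end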

@cndesantana (Author)

Hi Jeff,

I am using a Debian version of Julia: 0.2.1 from 11-02-2014.

Best,

Charles



@ViralBShah (Member)

The code is cache-friendly for cumsum along columns, but not along rows, in the 2-d case. Even so, performance suffers because of the div check in every iteration.
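
To make the div cost concrete, here is a hypothetical sketch (not the actual Base code) of a linear-index loop for cumsum along dim 2 that has to recover the column with an integer division on every element:

function cumsum_dim2_div(A::AbstractMatrix)
    B = copy(A)
    m = size(A, 1)
    for k = 1:length(B)
        j = div(k - 1, m) + 1     # column of linear index k: one div per element
        if j > 1
            B[k] += B[k - m]      # add the running total from the previous column
        end
    end
    return B
end

Avoiding that per-element division is the point being made here.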

@timholy (Member) commented Jun 21, 2014

A version that, in 2d, runs roughly 10x faster is just a couple of lines:

using Base.Cartesian  # for @ngenerate, @nexprs, @nloops, @nref

@ngenerate N typeof(A) function mycumsum!{T,N}(A::AbstractArray{T,N}, axis::Integer)
    # isaxis_d is 1 along the accumulation axis, 0 for every other dimension
    @nexprs N d->(isaxis_d = axis == d)
    # start at index 2 along `axis`; j is the element one step back along `axis`
    @nloops N i d->((1+isaxis_d):size(A, d)) d->(j_d = i_d - isaxis_d) begin
        @nref(N, A, i) += @nref(N, A, j)
    end
    A
end

(Note this works in-place, so it's a slightly different API). However, it doesn't use pairwise summation. Are we worried about that?
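
A hypothetical call would copy first, since the function mutates its argument:

A = rand(5, 5)
B = mycumsum!(copy(A), 2)   # cumulative sum along dim 2; A itself is untouched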

@JeffBezanson (Member)

Wow, I thought @lindahua had already rewritten this one. Doing a div (and rem!) for every element is crazy bad. We're not doing pairwise summation for this now, so switching to this version would be a big improvement with no compromise. @timholy let's put this in Base!

@lindahua (Contributor)

Even sum(x, dim) is not doing pairwise summation when dim >= 2. We just need something similar to the sum implementation here.
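
For context, pairwise summation splits the range in half recursively and adds the two halves, which keeps worst-case roundoff growth at O(log n) instead of the O(n) of a plain left-to-right loop. A minimal sketch (not the Base implementation):

pairwise_sum(x::AbstractVector) = pairwise_sum(x, 1, length(x))

function pairwise_sum(x::AbstractVector, lo::Int, hi::Int)
    if hi - lo < 128                  # small block: plain sequential loop
        s = zero(eltype(x))
        for i = lo:hi
            s += x[i]
        end
        return s
    end
    m = (lo + hi) >>> 1               # split in half, sum each half, add
    return pairwise_sum(x, lo, m) + pairwise_sum(x, m+1, hi)
end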

@timholy (Member) commented Jun 21, 2014

@JeffBezanson, ah, I was looking at the wrong implementation. Yes, the div is the worst of it. Actually, much as I like Cartesian 😄, in that case there's an even easier solution:

if i == init_next
    B[i] = A[i]
    init_next += axis_stride
...

Should be basically as fast as the Cartesian version.

@lindahua, of course you're right. Good point.

@timholy (Member) commented Jun 21, 2014

Oh, wait, no, that's not right. I'll put the Cartesian version in.

timholy added a commit that referenced this issue Jun 21, 2014
ViralBShah added this to the 0.3 milestone Jun 22, 2014
timholy added a commit that referenced this issue Jun 22, 2014
@timholy (Member) commented Jun 22, 2014

@cndesantana, there's a significantly faster version in master now. To get it, you'll have to compile your own version of Julia from source, or wait for a nightly build package.

@cndesantana (Author)

Dear guys,

You are amazing! I am learning a lot from your messages! Thank you for your hard work. I will try to compile my own version and will let you know how it goes on the general discussion list.

Thank you very much! Long life to Julia!

Best,

Charles


