extrema() is missing [,dims] #15376

Closed
bjarthur opened this issue Mar 6, 2016 · 6 comments

Labels
good first issue: Indicates a good issue for first-time contributors to Julia

Comments

@bjarthur
Contributor

bjarthur commented Mar 6, 2016

maximum, minimum, unique, sum, etc. all have the option to reduce across dimensions, but not extrema. The fastest I've come up with is:

function extrema(a::AbstractArray, dim)
    dim = tuple(dim...)
    mi = squeeze(minimum(a,dim), dim)
    ma = squeeze(maximum(a,dim), dim)
    reshape(collect(zip(mi,ma)), size(mi))
end

I feel there must be a way to combine the loop repetition inside the calls to maximum and minimum, but I'm not even a white belt in Functor Fu. This, for example, is much slower:

extrema(a,dim) = squeeze( mapslices(extrema,a,dim), dim)

Anyone got any better ideas? Or should I submit a PR? Thanks.

@tkelman
Contributor

tkelman commented Mar 7, 2016

Are you comparing performance on master or 0.4?

@bjarthur
Contributor Author

bjarthur commented Mar 7, 2016

The differences between master and 0.4 are negligible:

julia> VERSION
v"0.4.3"

julia> foo=rand(100,101,102);

julia> function extrema1(a::AbstractArray, dim)
           dim = tuple(dim...)
           mi = squeeze(minimum(a,dim), dim)
           ma = squeeze(maximum(a,dim), dim)
           reshape(collect(zip(mi,ma)), size(mi))
       end
extrema1 (generic function with 1 method)

julia> @time extrema1(foo,1);
  0.530319 seconds (433.87 k allocations: 20.979 MB)

julia> @time extrema1(foo,1);
  0.003566 seconds (40 allocations: 323.516 KB)

julia> @time extrema1(foo,1);
  0.003971 seconds (40 allocations: 323.516 KB)

julia> @time extrema1(foo,1);
  0.004340 seconds (40 allocations: 323.516 KB)

julia> extrema2(a::AbstractArray,dim) = squeeze( mapslices(extrema,a,dim), dim)
extrema2 (generic function with 1 method)

julia> @time extrema2(foo,1);
  0.972634 seconds (1.10 M allocations: 53.656 MB, 1.17% gc time)

julia> @time extrema2(foo,1);
  0.097835 seconds (463.67 k allocations: 22.483 MB, 4.42% gc time)

julia> @time extrema2(foo,1);
  0.088196 seconds (463.67 k allocations: 22.483 MB, 3.66% gc time)

julia> @time extrema2(foo,1);
  0.094305 seconds (463.67 k allocations: 22.483 MB, 3.06% gc time)

julia> VERSION
v"0.5.0-dev+3038"

julia> foo=rand(100,101,102);

julia> function extrema1(a::AbstractArray, dim)
                  dim = tuple(dim...)
                  mi = squeeze(minimum(a,dim), dim)
                  ma = squeeze(maximum(a,dim), dim)
                  reshape(collect(zip(mi,ma)), size(mi))
              end
extrema1 (generic function with 1 method)

julia> @time extrema1(foo,1);
  0.732535 seconds (435.69 k allocations: 22.081 MB, 0.96% gc time)

julia> @time extrema1(foo,1);
  0.003764 seconds (34 allocations: 323.172 KB)

julia> @time extrema1(foo,1);
  0.005146 seconds (34 allocations: 323.172 KB)

julia> @time extrema1(foo,1);
  0.005158 seconds (34 allocations: 323.172 KB)

julia> extrema2(a::AbstractArray,dim) = squeeze( mapslices(extrema,a,dim), dim)
extrema2 (generic function with 1 method)

julia> @time extrema2(foo,1);
  1.082338 seconds (1.28 M allocations: 62.019 MB, 1.44% gc time)

julia> @time extrema2(foo,1);
  0.088977 seconds (494.56 k allocations: 23.426 MB, 4.14% gc time)

julia> @time extrema2(foo,1);
  0.080576 seconds (494.56 k allocations: 23.426 MB, 3.93% gc time)

julia> @time extrema2(foo,1);
  0.085413 seconds (494.56 k allocations: 23.426 MB, 3.21% gc time)

@timholy
Member

timholy commented Mar 7, 2016

If you want to try implementing it, this blog post should get you well on your way. See also the code in reducedims for an optimization when the first dimension is one that you're reducing along. (In that case, you don't have to store the result in the array; you can use a temporary variable.)
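
As a concrete illustration of that first-dimension special case, a minimal sketch might look like the following (a hypothetical extrema_dim1!, specialized to a 3-d array reduced along dimension 1; not code from Base):

function extrema_dim1!{T}(B::AbstractMatrix{Tuple{T,T}}, A::AbstractArray{T,3})
    # B is assumed to have size (size(A,2), size(A,3))
    for k in 1:size(A, 3), j in 1:size(A, 2)
        # keep the running pair in locals; write to B only once per column
        lo = hi = A[1, j, k]
        for i in 2:size(A, 1)
            v = A[i, j, k]
            v < lo && (lo = v)
            v > hi && (hi = v)
        end
        B[j, k] = (lo, hi)
    end
    B
end

Called, e.g., as extrema_dim1!(Array(Tuple{Float64,Float64}, 101, 102), foo) for the foo above.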

@vtjnash added the good first issue label Mar 8, 2016
@bjarthur
Contributor Author

I've made four versions of extrema with a dims argument, and the functor version I first posted is faster than those using mapslices, cartesian, and CartesianRange. See this gist.

I'm guessing CartesianRange is slow because of #9080.

Should I submit a PR for the functor version?

Note that this helps to resolve #3893.

@timholy
Member

timholy commented Mar 15, 2016

I'd focus on the CartesianRange version. The effect in #9080 is modest, but I bet you're seeing a big slowdown. In your gist, Istart is not a type-stable variable: it starts life as a Vector{Int} and then morphs into a CartesianIndex{N}. On its own that would kill performance, but here you can't just solve it by variable renaming because there's a second issue: N is not inferrable because Vectors do not encode their length in their type parameters. (If you haven't discovered it yet, @code_warntype is a lovely tool. Allocate appropriate A and B and then use @code_warntype extrema_cartesianrange!(B, A). Then look for red. Not all red is evil, but anything in the "variables" section marked in red indicates an almost-certain problem. See the docs for more detail.)

The easiest approach would be not to worry about repeating work on the i=1 components, so that Istart = first(RA) with RA = CartesianRange(sA). You could alternatively use Istart = CartesianIndex(ntuple(f, Val{N})) where N has been extracted as the dimensionality of A and B in the function declaration and f is a suitably-defined anonymous function. This will be type-stable on julia-0.5 (but not 0.4) thanks to Jeff's recent work.
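
A minimal sketch of that easiest approach, reusing the extrema_cartesianrange! name from above (the body here is illustrative, not the gist's code; B is assumed to have size 1 along every reduced dimension):

function extrema_cartesianrange!{T,N}(B::Array{Tuple{T,T},N}, A::Array{T,N})
    Bmax = CartesianIndex(size(B))
    # seed each output cell from an element of its own slice
    for J in CartesianRange(size(B))
        B[J] = (A[J], A[J])
    end
    for I in CartesianRange(size(A))
        J = min(Bmax, I)        # clamp the reduced dimensions back to 1
        lo, hi = B[J]
        v = A[I]
        v < lo && (B[J] = (v, hi))
        v > hi && (B[J] = (lo, v))
    end
    B
end

Revisiting the seed elements in the main loop is the harmless repeated work mentioned above; B can be allocated with, e.g., similar(A, Tuple{Float64,Float64}, (1, size(A,2), size(A,3))) when reducing along dimension 1.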

@bjarthur
Contributor Author

Thanks @timholy for the tip on type stability. Since the cartesian version was already stable, it was easiest to just add an @inbounds to it. It is now as fast as the functor version, and uses half the memory. I will submit a PR shortly.
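
For reference, a rough sketch of what a Base.Cartesian version with @inbounds could look like (a hypothetical extrema_nloops!, not necessarily what the gist or the PR contains; the j_d indices clamp reduced dimensions of B to 1):

using Base.Cartesian

@generated function extrema_nloops!{T,N}(B::Array{Tuple{T,T},N}, A::Array{T,N})
    quote
        sB = size(B)
        # seed each output cell from an element of its own slice
        for J in CartesianRange(sB)
            B[J] = (A[J], A[J])
        end
        @inbounds @nloops $N i A d->(j_d = sB[d] == 1 ? 1 : i_d) begin
            v = @nref $N A i
            lo, hi = (@nref $N B j)
            if v < lo
                (@nref $N B j) = (v, hi)
            elseif v > hi
                (@nref $N B j) = (lo, v)
            end
        end
        B
    end
end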
