Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further refactoring of reduce/mapreduce functions #7061

Merged
merged 16 commits into from
Jun 2, 2014
Merged

Conversation

lindahua
Copy link
Contributor

Originally, reduce and mapreduce functions are not implemented in a coherent way. Similar algorithms are repeatedly implemented for different functions, resulting in inconsistent behaviors, subtle bugs, and higher maintenance cost.

In this PR, we refactor the codebase for all these functions. The primary goal is to organize them into a coherent framework, without compromising flexibility and performance.

The core of this framework is the mapreduce function, defined as below:

function _mapreduce{T}(f, op, A::AbstractArray{T})
    n = length(A)
    if n == 0
        return mr_empty(f, op, T)
    elseif n == 1
        return r_promote(op, evaluate(f, A[1]))
    elseif n < 16
        @inbounds fx1 = evaluate(f, A[1])
        @inbounds fx2 = evaluate(f, A[2])
        s = evaluate(op, fx1, fx2)
        i = 2
        while i < n
            @inbounds fx = evaluate(f, A[i+=1])
            s = evaluate(op, s, fx)
        end
        return s
    else
        return mapreduce_impl(f, op, A, 1, n)
    end
end

mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, A)
mapreduce(f, op, a::Number) = evaluate(f, a)

function mapreduce(f, op::Function, A::AbstractArray)
    is(op, +) ? _mapreduce(f, AddFun(), A) :
    is(op, *) ? _mapreduce(f, MulFun(), A) :
    is(op, &) ? _mapreduce(f, AndFun(), A) :
    is(op, |) ? _mapreduce(f, OrFun(), A) :
    _mapreduce(f, op, A)
end

The mapreduce function dispatches the actual operation depending on the length of the input array:

  • n = 0: call mr_empty to handle empty array
  • n = 1: call r_promote to promote the value, in order to attain type stability
  • 2 <= n < 16: use a simple loop to compute the result (since the array is small, it is not worth the efforts to call more sophisticated functions)
  • n >= 16: call mapreduce_impl

For different types of map-function f and reduce-operation op, one can specialize mr_empty to handle empty arrays differently (it raises an error by default), and specialize mapreduce_impl to implement faster or more accurate algorithms for certain operations.

For example, when op is of type AddFun, mr_empty returns zero whenever appropriate, and mapreduce_impl uses pairwise summation with four-way unrolling. In another example, when op is AndFun or OrFun, the mapreduce_impl uses an algorithm that may terminate earlier without scanning the whole array.

Other functions are merely one-liner that delegate to mapreduce:

reduce(op, v0, itr) = mapreduce(IdFun(), op, v0, itr)
reduce(op, itr) = mapreduce(IdFun(), op, itr)
reduce(op, a::Number) = a

sum(f::Function, a) = mapreduce(f, AddFun(), a)
sum(a) = mapreduce(IdFun(), AddFun(), a)
sumabs(a) = mapreduce(AbsFun(), AddFun(), a)
sumabs2(a) = mapreduce(Abs2Fun(), AddFun(), a)

prod(f::Function, a) = mapreduce(f, MulFun(), a)
prod(a) = mapreduce(IdFun(), MulFun(), a)

# ......

In this way, consistent behavior is achieved. For example, the following statements actually invoke the same underlying algorithm:

sum(x)
reduce(+, x)
reduce(AddFun(), x)
mapreduce(IdFun(), +, x)
...

With this refactoring, the source file reduce.jl is cut from 696 lines to 444 lines (a net loss of over 250 lines).

The test script test/reduce.jl was also re-organized and expanded, which provides better coverage of these functions.

The refactoring pass all tests (including Steven's type stability tests).

if done(itr, s)
error("argument is empty")
end
done(itr, s) && error("argument is empty")
(v, s) = next(itr, s)
vmin = v
vmax = v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are supposed to come after the first while loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When itr is empty, there's no way to compute extremes (not even able to get v from the first element). In such cases, we have no other choices than raising an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only vmin = v and vmax = v need to move. Otherwise the function returns NaN if the first element of the array is NaN, because the loop below only touches v and not vmin/vmax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Thanks. I just made a commit that fixes this.

@simonster
Copy link
Member

This looks good to me. It's quite impressive that you've managed cut away a third of this file. It might be good to move some of your description of the structure of the file in the PR into the code itself.

@timholy
Copy link
Sponsor Member

timholy commented Jun 1, 2014

Unfortunately I can't take the time to review this now, but such consistency and brevity sound great!

@lindahua
Copy link
Contributor Author

lindahua commented Jun 1, 2014

There's no changes to API and this has been pretty well tested. If there is no objection, I will merge it tomorrow.

lindahua added a commit that referenced this pull request Jun 2, 2014
Further refactoring of reduce/mapreduce functions
@lindahua lindahua merged commit 4930f60 into master Jun 2, 2014
simonster added a commit that referenced this pull request Jun 3, 2014
This was broken by #7061. I have added a test to avoid future breakage.
@lindahua lindahua deleted the dh/reduce3 branch June 7, 2014 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants