findfirst(A) returns index of first non-zero element, or 0 #925

Merged
merged 5 commits into JuliaLang:master from HarlanH/master Jun 19, 2012

Member

HarlanH commented Jun 12, 2012

per mailing list discussion

Harlan Harris findfirst(A) gives index of first non-zero element
or 0 if none found. Per dev-list discussion.
7c38668
Member

pao commented Jun 12, 2012

Better MATLAB syntactic compatibility would be to have a two-argument find(), where the second argument is the number of elements to find. Whether we want that or not is up for discussion, of course, though I personally favor the two-argument approach since we have multiple dispatch.

http://www.mathworks.com/help/techdoc/ref/find.html

Member

pao commented Jun 12, 2012

Oh, just saw your post on -dev. Those are good points. Got to think about that.

Member

HarlanH commented Jun 13, 2012

Yeah, I see the appeal of having findfirst-like behavior in find(), but am not currently convinced it's a good idea. On a related note, findfirst() in this pull request always finds the first non-zero element. That always works, as you can do something like findfirst([1,2,3,4,5] .== 3), returning the index 3, but it'd be better to have a two-argument version of findfirst, like findfirst([1,2,3,4,5], 3) that does a true short-circuit for speed. I'm going to update this pull request to include that form as well...
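The difference between the two forms mentioned above can be sketched as follows. This is a minimal illustration in present-day Julia syntax, not the code from this pull request; the names `first_nonzero` and `first_equal` are hypothetical, and both return 0 when nothing is found, as the pull request title describes.

```julia
# Hypothetical helper names; not the code from this pull request.

# Index of the first non-zero element of A, or 0 if there is none.
function first_nonzero(A::AbstractVector)
    for i in eachindex(A)
        if A[i] != 0
            return i
        end
    end
    return 0
end

# Index of the first element equal to v, or 0 if there is none.
# Stops at the first match and allocates nothing.
function first_equal(A::AbstractVector, v)
    for i in eachindex(A)
        if A[i] == v
            return i
        end
    end
    return 0
end

A = [1, 2, 3, 4, 5]
via_mask = first_nonzero(A .== 3)   # builds a temporary Boolean array first
direct   = first_equal(A, 3)        # scans A directly and stops at the match
```

Both calls give the same index; the second avoids comparing every element and allocating the intermediate array before the search starts.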

Member

HarlanH commented Jun 13, 2012

also added findfirst(A, function) form, which tests until the function returns true. Note that this is syntactically the opposite of the map() and filter() functions, which might suggest that findfirst(function, A) would be better. If so, then perhaps findfirst(3, A) would be better than the current form. Consistency is an unobtainable virtue.

Owner

JeffBezanson commented Jun 14, 2012

Hate to rain on the parade, and admittedly this is an obscure case, but this is ambiguous in the case of an array of functions. Considering both that and the usual convention for higher-order functions, it's probably better to use findfirst(function, A). Then we can also get rid of all the Ts and {T}s.
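The ambiguity can be made concrete with a small sketch. It assumes present-day Julia syntax and a hypothetical function `first_match` with an array-first value form and an array-first predicate form; it is not the code under review.

```julia
# Hypothetical names, present-day Julia syntax.

# Value form: index of the first element equal to v, or 0.
function first_match(A::AbstractVector, v)
    for i in eachindex(A)
        A[i] == v && return i
    end
    return 0
end

# Predicate form, array first: index of the first element for which
# testf returns true, or 0.
function first_match(A::AbstractVector, testf::Function)
    for i in eachindex(A)
        testf(A[i]) && return i
    end
    return 0
end

fs = [sin, cos, tan]

# Intended meaning: "where is cos in fs?". Dispatch selects the predicate
# method because cos is a Function, so cos is called on sin and the call fails.
outcome = try
    first_match(fs, cos)
catch err
    err
end
```

With the array first, a function in the second position is always treated as a predicate, so searching an array of functions for a particular function cannot be expressed through the value form.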

Member

HarlanH commented Jun 14, 2012

Oh, so it is! OK, so you're suggesting the signatures be:

findfirst(testf::Function, A::StridedArray)
findfirst{T}(v::T, A::StridedArray{T})
findfirst(A::StridedArray)

And maybe also expanding find likewise, as:

find(testf::Function, A::StridedArray)
find{T}(v::T, A::StridedArray{T})
find(A::StridedArray)

(And findn too, but like I said, I don't understand the metaprogramming for that, so someone smarter than I will have to do the work...)

Owner

JeffBezanson commented Jun 16, 2012

If you flip them both, the ambiguity is still there. At the same time, having find(A, v) do something different than the same call in matlab is confusing. I would just keep the function-argument version for now.

Owner

ViralBShah commented Jun 17, 2012

I prefer using the 2 argument version of find as in matlab, and making the second argument be either a number or a function. The 3 argument form that matlab supports is a bit ugly, and perhaps we can do better.

Owner

JeffBezanson commented Jun 18, 2012

Cannot be merged; please rebase.

Harlan Harris added some commits Jun 19, 2012

Harlan Harris Merge branch 'master' of git://github.com/JuliaLang/julia
Conflicts:
	base/array.jl
b585dbc
Harlan Harris fix merge issue (again); 2-arg find()
Conflicts:

	base/array.jl
a670754
Member

HarlanH commented Jun 19, 2012

I dealt with the merge issue. Jeff, looks like you replaced the zero(T) with 0 in your earlier changes to find?

I also created two-argument forms of find(), to match findfirst(). I'm not sure if the algorithm I used is the most efficient or not. It makes only one pass over the source array, instead of two as the one-argument cases do, but it grows a target array from scratch before copying it (to get rid of the padding). I should probably do some timings to find out. I also probably didn't do the copy in quite the right way.

Owner

JeffBezanson commented Jun 19, 2012

In the timings I've done of this sort of thing, it's generally better to determine the result size first if doing so is reasonably cheap. An extra constant-space pass over the array is better than allocating extra space.

I replaced zero(T) with 0 just because it is nicer and probably not really different performance-wise.
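The two strategies being compared can be sketched roughly as below. This is an illustration in present-day Julia syntax under assumed, hypothetical names (`find_two_pass`, `find_one_pass`); it is not the code in base/array.jl, and the one-pass variant uses `push!` rather than the grow-then-copy approach described in the pull request.

```julia
# Hypothetical names, present-day Julia syntax; not the code in base/array.jl.

# Two passes: count the matches, allocate the exact result, then fill it.
function find_two_pass(A::AbstractVector, v)
    n = 0
    for x in A
        if x == v
            n += 1
        end
    end
    result = Vector{Int}(undef, n)
    k = 0
    for i in eachindex(A)
        if A[i] == v
            k += 1
            result[k] = i
        end
    end
    return result
end

# One pass: grow the result as matches turn up.
function find_one_pass(A::AbstractVector, v)
    result = Int[]
    for i in eachindex(A)
        if A[i] == v
            push!(result, i)
        end
    end
    return result
end

A = [0.0, 1.0, 0.0, 0.0, 1.0]
two = find_two_pass(A, 1.0)
one = find_one_pass(A, 1.0)
```

Both return the same indices. Which is faster depends on how cheap the counting pass is and how many matches there are, which is what the timings in this thread set out to measure.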

Member

HarlanH commented Jun 19, 2012

I did some simple performance testing. If the array you're iterating over is sparse (with respect to the test item), the one-pass method with a growing array is about twice as fast as the two-pass find(x). The functional method doesn't get inlined, so it's an order of magnitude slower. findfirst(x) is vastly faster than find(x)[1], of course.

# build a 10M array, then time various find operations on it
x = [zeros(10000), 1]
x = [x, x] # 20K
x = [x, x] # 40K
x = [x, x] # 80K
x = [x, x] # 160K
x = [x, x] # 320K
x = [x, x] # 640K
x = [x, x] # 1.2M
x = [x, x] # 2.4M
x = [x, x] # 4.8M
x = [x, x] # 9.6M

function f1(x)
    for i=1:10
        y = find(x)
    end
end
function f2(x)
    for i = 1:10
        y = find(x, 1.0)
    end
end
function f3(x)
    for i = 1:10
        y = find(x, y->y==1)
    end
end
function ff1(x)
    for i = 1:10
        y = findfirst(x)
    end
end
function ff2(x)
    for i = 1:10
        y = findfirst(x,1.0)
    end
end
function ff3(x)
    for i = 1:10
        y = findfirst(x,y->y==1)
    end
end
@time f1(x)
@time f1(x)
@time f2(x)
@time f2(x)
@time f3(x)
@time f3(x)
@time ff1(x)
@time ff1(x)
@time ff2(x)
@time ff2(x)
@time ff3(x)
@time ff3(x)
julia> @time f1(x)
elapsed time: 0.4995710849761963 seconds

julia> @time f1(x)
elapsed time: 0.49610280990600586 seconds

julia> @time f2(x)
elapsed time: 0.27991819381713867 seconds

julia> @time f2(x)
elapsed time: 0.2588460445404053 seconds

julia> @time f3(x)
elapsed time: 7.236483097076416 seconds

julia> @time f3(x)
elapsed time: 7.2504119873046875 seconds

julia> @time ff1(x)
elapsed time: 0.00019311904907226562 seconds

julia> @time ff1(x)
elapsed time: 0.000308990478515625 seconds

julia> @time ff2(x)
elapsed time: 0.008929967880249023 seconds

julia> @time ff2(x)
elapsed time: 0.00028395652770996094 seconds

julia> @time ff3(x)
elapsed time: 0.006864070892333984 seconds

julia> @time ff3(x)
elapsed time: 0.005472898483276367 seconds

JeffBezanson added a commit that referenced this pull request Jun 19, 2012

JeffBezanson Merge pull request #925 from HarlanH/master
findfirst(A) returns index of first non-zero element, or 0
7816b4a

JeffBezanson merged commit 7816b4a into JuliaLang:master Jun 19, 2012
