Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding number of elements to return for first and last #23969

Closed
bkamins opened this issue Oct 2, 2017 · 10 comments
Closed

Adding number of elements to return for first and last #23969

bkamins opened this issue Oct 2, 2017 · 10 comments

Comments

@bkamins
Copy link
Member

bkamins commented Oct 2, 2017

Following the discussion in #23765 regarding implementing first and last for general collections I wanted to clarify implementation assumptions before submitting a PR (or deciding against implementing it).

Current status: first is defined for general collections, but last is not, as it is only defined for cases where it is possible to retrieve last element in O(1), effectively when endof is defined.

So now the question is for what class of iterables we want to have first and last. Assume we want to get k elements from collection of size n.

  • what should be the acceptable cost of first: O(1) or O(k)? if it is O(1) - we are restricted to indeaxable types, where something like x[1:k] would work; if we accept O(k) for general iterables then we have IterTools.takestrict generates an appropriate object;
  • what should be the acceptable cost of last: O(1) or O(n)? - here the situation is even more complex (in the general case you have to use next iterating from the beginning and identify the location from which you have k positions till the end and additionally it could fail as some iterables can be infinite)

The case why first and last with nchar functions are useful for strings is simple: they are indexable but it was not simple to get the appropriate locations due to variable character width and both operations are executed in O(k) time.

But I am not so clear about the general case - either it is already relatively simple to get head and tail of the collection (like in arrays) or each case would have to be treated separately to ensure efficiency.
Additionally a general solution for first already exists in the form of IterTools.takestrict. It could be moved to Base as first. And a general solution for last is problematic as iterables may be infinite (e.g. IterTools.repeatedly).

@StefanKarpinski @stevengj @nalimilan: given the above what would you see as the appropriate approach as I do not see a clean general solution (but maybe I am missing something here)?

@stevengj
Copy link
Member

stevengj commented Oct 2, 2017

x[1:k] is O(k). I don't see how you can get O(1) for first(x, k) with any type if the return value is a copy of the data (as opposed to a view).

I think it would be nice to support last for general iterables, even though this is in general O(n).

@TotalVerb
Copy link
Contributor

There's an issue floating around somewhere for reverse iteration, which can be a dependency for last.

@bkamins
Copy link
Member Author

bkamins commented Oct 2, 2017

@stevengj If I understand you correctly then the output should be materialized, and a natural choice for collection col is to keep it in Vector{eltype(col)} (something similar to collect but restricting number of elements collected).

Then it could be something like:

function first(col, n)
    v = Vector{eltype(col)}(n)
    i = start(col)
    for j in 1:n
        done(col, i) && throw(ArgumentError("collection must have at least $n elements"))
        v[j], i = next(col, i)
    end
    v
end

function last(col, n)
    iteratorsize(col) == IsInfinite() && throw(ArgumentError("collection must be finite"))
    v = Vector{eltype(col)}(n)
    b = i = start(col)
    c = 0
    while !done(col, i) && c < n
        _, i = next(col, i)
        c += 1
    end
    c < n && throw(ArgumentError("collection must have at least $n elements"))
    while !done(col, i)
        _, i = next(col, i)
        _, b = next(col, b)
    end
    c = 0
    while !done(col, b)
        c +=1
        v[c], b = next(col, b)
    end
    v
end

@TotalVerb I could not locate this issue - could you provide the number or what to look for?

@stevengj
Copy link
Member

stevengj commented Oct 2, 2017

I can't find an open issue. See, for example #4590 and this comment. Would be good to open an issue for this.

Update: filed #23972.

@nalimilan
Copy link
Member

last could initially be implemented only for AbstractArray and AbstractString, we can always extend it later once the reverse iteration protocol is implemented.

@bkamins
Copy link
Member Author

bkamins commented Oct 9, 2017

Agreed - but I assume it should only be guaranteed to work for arrays that have IndexLinear. Correct?
However, I think that PR for AbstractArrays should be different from #23960 (which I believe is ready to merge) as AbstractArrays can be tricky to handle properly in non-standard cases.

@stevengj
Copy link
Member

@bkamins, I think reverse iteration could be made to work with general cartesian indexing.

@bkamins
Copy link
Member Author

bkamins commented Oct 10, 2017

@stevengj Thank you for pointing this. I earlier thought that start/next/done are guaranteed only for IndexLinear.
But I guess it is best to implement it when #23972 is decided (as then a general solution can be applied here).
Otherwise probably adding endof for CartesianRange and dec (as reverse of inc) in multidimensional.jl and building the rest on it should work.

@stevengj
Copy link
Member

stevengj commented Nov 6, 2017

Now that #24187 is merged, we can have last(itr, n) = first(Iterators.reverse(itr), n)

@sostock
Copy link
Contributor

sostock commented May 11, 2021

first(itr, n) and last(itr, n) were added in Julia 1.6.

@sostock sostock closed this as completed May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants