New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration protocol change #25261

Merged
merged 2 commits into from May 18, 2018

Conversation

@Keno
Member

Keno commented Dec 23, 2017

I imported @vtjnash's lowering change and then went ahead and rewrote the base iterator wrappers to use the new iteration protocol instead so people can see it in action. This passes the iterators test, but I haven't run other tests yet, so I suspect some fail.

The strategy here basically what is described in #18823, with the exception of using unions rather than nullables. Additionally, I kept done around as an optional addition (the semantic guarantee
is that if done is not an error, then it determines whether or not the next call to iterate will return true). This was necessary, because the isempty function was derived from the iteration protocol previously, which is no longer possible with the new code. However, after playing with this, it is not actually clear to me that any use of done other than to implement isempty is useful with the new iteration protocol, so maybe the correct thing here is just to have iterators optionally implement isempty just as they would length now (for iterators where this is determinable).

Couple of things on my todo list:

  • Fix other tests
  • Write documentation for new iteration protocol
  • Add lookahead iterator that adds a reliable done method to any iterator by prefetching an element and keeping it in the state (essentially the generic version of the trick people were using to implement done for such iterators before)

There was also some performance concerns needed to be looked at (and @JeffBezanson and @vtjnash wanted to help out with), so hopefully this branch can help with that.

Comments and feedback greatly appreciated (again though, WIP, not supposed to pass tests, etc, so there's probably bugs, but hopefully this is useful to get a feel for what this change does).

@Keno Keno requested review from vtjnash, JeffBezanson and timholy Dec 23, 2017

@Keno

This comment has been minimized.

Member

Keno commented Dec 23, 2017

Another thing to discuss is whether the absence of start in this proposal is an advantage or a disadvantage. In essence, this proposal uses the absence of a state argument to iterate as way to write the separate method based on the observation that for simple iterators, the initial state argument is often trivial:

iterate(x::Array, i=1) = i > length(x) ? nothing : (a[i], i+1)

However, it has the distinct disadvantage that it can't carry any state. I didn't appreciate this aspect before (since it's sort of implicit in our current iteration protocol), but there are certain iterators that want to do some pre-processing at the beginning of an iteration, e.g. imagine:

struct EachLineFile; path::String; end
start(f::File) = (itr = EachLine(open(f.path)); (itr, start(itr)))
iterate(f::File, state=start(f)) = iterate(state...)

which you can of course do just as above for the first argument, but I think it might validly be a problem for separation of concerns. More practical considerations might be something like #22467 (which may also be the case for the EachLineFile operator above depending how you implement it).

On the other hand, there may be iterators for which constructing a valid, type-stable state object is not possible without obtaining the element.

I think the best solution I can currently imagine is making the iteration protocol something like the following:

y = iterate(coll, start(coll))
while y !== nothing
    val, state = y
    # body
    y = iterate(coll, state)
end

this has the advantage that it's file if the return type of start is different from that of the regular state object, because they never get assigned to the same variable.

We could even do something like:

start(x::Any) = start
iterate(x::Any, ::typeof(start)) = iterate(x)

to preserve the nice one liner above for simple iterators, but I worry about whether that'll be too complicated.

@Keno

This comment has been minimized.

Member

Keno commented Dec 23, 2017

Another possibility:

it = prepare(coll)
y = iterate(it)
while y !== nothing
    val, state = y
    # body
    y = iterate(it, state)
end

with

prepare(x::Any) = x
@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Dec 23, 2017

Leaving breadcrumbs to ec91f63 from PR #23642, where I had to introduce a MaybeValue hack to ProductIterator to avoid the performance impact of returning a value equal to nothing when there's no value to return. It would be nice to be able to get rid of it. EDIT: I've just seen you had removed it, great! Nanosoldier covers this, so it should be easy to check whether or to what extent performance suffers.

Probably related: @davidanthoff discussed the case of iterators which need to allocate a resource on the first iteration, see #22466.

@davidanthoff

This comment has been minimized.

Contributor

davidanthoff commented Dec 24, 2017

I think the prepare idea might allow me to get rid of the whole getiterator mess in https://github.com/davidanthoff/IteratorInterfaceExtensions.jl for Query.jl and IterableTables.jl. I'm not sure, need to think about it a bit more, but right now I'm pretty positive. If that was the case, it would be a major, major benefit and simplification for that ecosystem. If we also could solve the resource management issue, boy, would that be wonderful.

I hope this PR will stay open for a while? With the holidays it might be tricky to find the time needed to think about this... (I plan to eat, eat and eat the next couple of days, and do nothing else).

@timholy

This comment has been minimized.

Member

timholy commented Dec 26, 2017

I had a brief look and it seemed pretty reasonable to me. It might be interesting to try to implement the CartesianIndices iteration "natively" (using iterate) rather than relying on its start/next/done methods. Given my interest in seeing #9080 put to bed, I can take a crack at that if you don't have time.

With regards to prepare, your strategy seems quite reasonable. I'd slightly favor the name iterator(x) over prepare(x), as with the latter it's not entirely clear what it's being prepared for. For example, if x is a file handle, I could imagine several different kinds of preparation depending on what you're planning to do with that file.

Thanks so much for doing this!

@nalimilan

This comment has been minimized.

Contributor

nalimilan commented Dec 26, 2017

I agree with @timholy it would make sense to use iterator instead of prepare. That's also really close to getiterator from IteratorInterfaceExtensions, which was indeed general enough to possibly warrant inclusion in Base when discussing the registration of the package (JuliaLang/METADATA.jl#11692). The poster child for this is calling iterator on a data frame to get an iterator encoding information on the column types even though the data frame itself doesn't. In that case, we don't really "prepare" anything, we just return an iterator which will ensure type stability of the iterations.

@Keno

This comment has been minimized.

Member

Keno commented Dec 26, 2017

I have more changes locally, and I'm working on finishing up the PR without either of the proposed extensions (since they're extensions, it's fairly easy to do either in addition on top). However, having spent some more time thinking about this, I'm not sure I'm convinced that it is useful or necessary. It seems always possible to encode whatever information into the state object. I guess the fundamental question is what can you do with the state other than pass it back to iterate. For some iterators, we generally consider the ability to copy the state and start iterating at the same point again. However, whether that works for a given iterator is somewhat ill specified at this point. I'll keep thinking about this and finish up the PR in the mean time, but it's not clear to me.

@timholy

This comment has been minimized.

Member

timholy commented Dec 26, 2017

I agree that it seems like it should be able to bundle everything into start. It's been ages since I've read the issue about "iterators that don't know they're done until they try," but it seems like it should be possible to try for the first object and stash it in state---the same union-splitting that makes this PR possible should also make that possible.

@Keno

This comment has been minimized.

Member

Keno commented Dec 27, 2017

On slack, @vtjnash and I came up with the following proposed lowering:

it = iterator(coll)
advance = iterate(it...)
while advance !== nothing
   f(head(advance))
   advance = iterate(tail(advance)...)
end

@vtjnash also proposes the following extension to handle resource cleanup:

it = iterator(x)
hascleanup = isapplicable(cleanup, it)
if hascleanup
    let isclean = Atomic(false)
        finalizer( it -> (atomic_set!(isclean, true) || cleanup(it)), it)
    end
end
while true
...
end
hascleanup && cleanup(it...) # auto-inserted for any break or return

There's a couple of things to like about this. With a default implementation of

iterator(x::Any) = (x,)

this reduces to the original proposal in this PR for simple iterators. An implementation like,

iterator(x::MyIt) = (x, 1)

could be used to give initial state if necessary (or that could still be put into the iterate function itself).
For an iterator that's self-mutating, we could just have

function iterate(x::SelfMutatingIterator)
    mutate!(x)
    eof(x) && return nothing
    (x.val, x)
end

which I think nicely expresses the notion that such an iterator doesn't really have state.

@JeffBezanson JeffBezanson added the triage label Dec 28, 2017

@Keno

This comment has been minimized.

Member

Keno commented Dec 28, 2017

Slight revision from triage:

it = iterator(coll)
advance = iterate(it...)
while advance !== nothing
   f(head(advance))
   advance = iterate(first(it), tail(advance)...)
end
@vtjnash

This comment has been minimized.

Member

vtjnash commented Dec 28, 2017

That seems to have a typo – what is first(it)? (probably my fault for writing the original wrong) Here's the fixed version:

it = iterate(iterator(coll)...) # NOTE: this perhaps should simplify to just `iterate(coll)`
while it !== nothing
   do(head(it))
   it = iterate(tail(it)...)
end
@Keno

This comment has been minimized.

Member

Keno commented Dec 29, 2017

No, it's not typo. It's the first element of the tuple returned by iterator.

@JeffBezanson

This comment has been minimized.

Member

JeffBezanson commented Dec 31, 2017

I notice a small readability issue: previously we could write value, state = next(itr, state), but that now becomes x = iterate(itr, x[2]); x[1], x[2]. Maybe we could return a named tuple, so that this can be written with x.value, x.state?

@@ -435,9 +424,9 @@ julia> collect(Iterators.rest([1,2,3,4], 2))
```
"""
rest(itr,state) = Rest(itr,state)
rest(itr) = itr

This comment has been minimized.

@JeffBezanson

JeffBezanson Dec 31, 2017

Member

This is a bit confusing since often the functions first and rest are synonyms for head and tail, i.e. rest(x) drops the first element.

This comment has been minimized.

@Keno

Keno Dec 31, 2017

Member

I'm happy to drop this method, though it is useful (see the use in _totuple).

@StefanKarpinski

This comment has been minimized.

Member

StefanKarpinski commented Dec 31, 2017

Maybe we could return a named tuple, so that this can be written with x.value, x.state?

Why not use a real struct type then? IterationPair or IterationValueState or something.

@vtjnash

This comment has been minimized.

Member

vtjnash commented Dec 31, 2017

The ergonomics of not using a Tuple are pretty terrible for the Method signature. I don't want to write:

(iterate(x, state)::IterationPair) = (x[state], state + 1)

Have multiple ways of accessing the return value (x[1], first(x), x.value) also likely means test coverage will be worse.

@vtjnash

This comment has been minimized.

Member

vtjnash commented Dec 31, 2017

If we want to recover / preserve the existing structure, we could provide something like:

val, state = @iterate (x, state) || return

But this is just an example of normal Union{T, Nothing} handling, so it'll benefit from whatever special syntax we eventually provide for that:

val, state = @nullable iterate(x, state) || return
val, state = iterate?(x, state) ||? return
etc., TBD, ...
i, state = next(destiter, state)
dest[i] = x
y == nothing &&
throw(ArgumentError(string("source has fewer elements than required")))

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

Multi line conditionals should probably use if

Also, why are you calling string on a String literal?

x2, state = next(a, state)
done(a, state) && return hash(x2, hash(x1, h))
y1 = iterate(a)
y1 == nothing && return h

This comment has been minimized.

@vtjnash
for acurr in a
if f(acurr)
a[i] = acurr
i, state = next(idx, state)
y = iterate(idx, state)
y == nothing && (i += 1; break)

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

Multiple statements should probably be on multiple lines, but in this case, just assert that the number of indexes and the number of elements are matched:

@assert (y::Tuple{Any, Any}) !== nothing
# promote_shape guarantees that A and B have the
# same iteration space
while ay !== nothing
@inbounds r[ri] = ($f)(ay[1], by[1])

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

Not sure this annotation is on the right expression. We just want it on the r[ri], right?

@@ -143,7 +143,7 @@ function Dict(kv)
try
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
catch e
if !applicable(start, kv) || !all(x->isa(x,Union{Tuple,Pair}),kv)
if (!applicable(start, kv) && !applicable(iterate, kv)) || !all(x->isa(x,Union{Tuple,Pair}),kv)

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

iterate is always applicable (fallback to calling start), so this will just have to be broken now perhaps (or improved to actually look at the error?)

while true
for c in completions
(i > endof(c) || c[i] != cc) && return ret
end
ret = string(ret, cc)
i >= endof(c1) && return ret
i = nexti
cc, nexti = next(c1, i)
cc, nexti = c1, i+1

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

i+1 isn’t a valid string index.

@@ -443,7 +443,7 @@ init_load_path(ccall(:jl_get_julia_bindir, Any, ()))
INCLUDE_STATE = 3 # include = include_relative
import Base64
#import Base64

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

Unrelated?

# @deprecate_binding Profile root_module(:Profile) true ", run `using Profile` instead"
# @deprecate_binding Dates root_module(:Dates) true ", run `using Dates` instead"
# @deprecate_binding Distributed root_module(:Distributed) true ", run `using Distributed` instead"
# end

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

Unrelated?

`(block
;; *** either this or force all for loop vars local
,.(map (lambda (r) `(local ,r))
(lhs-vars (cadr (car ranges))))

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

indent

@@ -2269,7 +2274,7 @@
(cdr (cadr e))
(list (cadr e))))
(first #t))
(expand-for (if first 'while 'inner-while)
(expand-for first

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

drop lowering code for inner-while also (a few lines above here)

strip_iter_union(X::Type{Union{Nothing, T}}) where {T} = T
strip_iter_union(::Type{Nothing}) = Nothing
strip_iter_union(::Type{Union{}}) = Union{}
strip_iter_union(T) = T

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

this'll be very slow (because of compilation), and possibly unreliable (because T = Union{Nothing, T} is also a valid solution), use typesubtract instead

head[i] = y[1]
while i < n
y = iterate(c, y[2])
y == nothing && return (resize!(head, i), ())

This comment has been minimized.

@vtjnash

vtjnash Dec 31, 2017

Member

are you making this intentionally type-unstable?

This comment has been minimized.

@Keno

Keno Dec 31, 2017

Member

Frankly I just wanted to get the tests to pass so I could start looking at the performance, though I don't think it matters too much since it's only used in pmap, which is already fairly complex.

@Keno Keno force-pushed the kf/iterate branch 2 times, most recently from 75eeebb to 9f5c04f Dec 31, 2017

@@ -1979,6 +1985,10 @@ function hash(a::AbstractArray{T}, h::UInt) where T
if isa(a, AbstractVector) && applicable(-, x2, x1)
n = 1
local step, laststep, laststate

This comment has been minimized.

@nalimilan

nalimilan Jan 2, 2018

Contributor

laststate isn't used anymore.

@Keno

This comment has been minimized.

Member

Keno commented Jan 5, 2018

After much triaging, it seems like the favored approach is to do this changed as originally proposed in this PR. As of yet unresolved is whether to add a canonical iterator hook, though given the absence of splatting in this iteration protocol, that wouldn't be allowed to specify an initial state.

@Keno Keno force-pushed the kf/iterate branch 3 times, most recently from 203402a to 50a5c7d May 15, 2018

@Keno Keno force-pushed the kf/iterate branch from 50a5c7d to 60dad55 May 17, 2018

@Keno

This comment has been minimized.

Member

Keno commented May 17, 2018

Planning to merge once CI passes.
One final go for good measure:

@nanosoldier runbenchmarks(ALL, vs=":master")

@vtjnash

This comment has been minimized.

Member

vtjnash commented May 17, 2018

I think ideally, we should deal with bug-fixes from merging SSAIR found by PkgDev (https://pkg.julialang.org/pulse.html) before we add a breaking change on top

@Keno

This comment has been minimized.

Member

Keno commented May 17, 2018

You'll have to convince people to delay tagging the alpha for that reason.

@Keno

This comment has been minimized.

Member

Keno commented May 17, 2018

Test failure is LLVM failing to vectorize because it doesn't like our loop new structure (early exit at the top). Shouldn't be too hard to fix. Will take a look in the morning. Alternatively, we may want to change the lowering after all to make the loop structure more obvious to LLVM.

@nanosoldier

This comment has been minimized.

nanosoldier commented May 17, 2018

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno Keno force-pushed the kf/iterate branch from 60dad55 to b4543c8 May 17, 2018

@Keno

This comment has been minimized.

Member

Keno commented May 17, 2018

Fixing LLVM was too hard, I updated this to use the loop inverted lowering, which should fix the test case.

@nanosoldier runbenchmarks(ALL, vs=":master")

Keno and others added some commits Dec 22, 2017

Change iteration protocol
This changes the iteration protocol from `start`/`next`/`done` to `iterate`.
The new lowering of a for loop is as follows:

```
for x in itr
    ...
end
```

becomes

```
next = iterate(itr)
while next !== nothing
    x, state = next::Tuple{Any, Any}
    ...
    next = iterate(itr, state)
end
```

The semantics are as apparent from the above lowering. `iterate` returns
either `nothing` or a tuple of value and state. The state is passed
to any subsequent operation. The first iteration is indicated, by not passing
the second, state argument to the `iterate` method.

Adaptors in both directions are provided to keep the legacy iteration
protocol working for now. However, performance of the legacy iteration
protocol will be severely pessimized.

As an optional add-on for mutable iterators, a new `isdone` function is
provided. This function is intended as an O(1) approximate query for
iterator completion, where such a calculation is possible without mutation
and/or is significantly faster than attempting to obtain the element itself.
The function makes use of 3-value logic. `missing` is always an acceptable
answer, in which case the caller should go ahead and attempt the iteration
to obtain a definite result. If the result is not `missing`, it must be
exact (i.e. if true, the next call to iterate must return `nothing`, if
false it must not return nothing).
Use loop inverted loop lowering
The primary idea of the new iteration protocol is that for
a function like:
```
function iterate(itr)
   done(itr) ? nothing : next(itr)
end
```
we can fuse the `done` comparison into the loop condition and
recover the same loop structure we had before (while retaining
the flexibility of not requiring the done function to be separate),
i.e. for
```
y = iterate(itr)
y === nothing && break
```
we want to have after inlining and early optimization:
```
done(itr) && break
y = next(itr)
```
LLVM performs this optimization in jump threading. However, we run
into a problem. At the top of the loop we have:
```
y = iterate
top:
%cond = y === nothing
br i1 %cond, %exit, %loop
....
```
We'd want to thread over the `top` block (this makes sense, since
by the discussion above, we need to merge our condition into the
loop exit condition). However, LLVM (quite sensibly) refuses to
thread over loop headers and since `top` is both a loop header
and a loop exit, we fail to perform the appropriate transformation.

However, there's a simple fix. Instead of emitting a foor loop as
```
y = iterate(itr)
while y !== nothing
    x, state = y
    ...
    y = iterate(itr, state)
end
```
we can emit it as
```
y = iterate(itr)
if y !== nothing
    while true
       x, state = y
       ...
       y = iterate(itr, state)
       y === nothing && break
    end
end
```
This transformation is known as `loop inversion` (or a special
case of `loop rotation`. In our case the primary benefit is
that we can fuse the condition contained in the initial `iterate`
call into the bypass if, which then lets LLVM understand our loop
structure.

Co-authored-by: Jeff Bezanson <jeff@juliacomputing.com>

@Keno Keno force-pushed the kf/iterate branch from b4543c8 to 62fbad2 May 18, 2018

@nanosoldier

This comment has been minimized.

nanosoldier commented May 18, 2018

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno Keno merged commit aa301aa into master May 18, 2018

4 of 5 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
ci/circleci: build-i686 Your tests passed on CircleCI!
Details
ci/circleci: build-x86_64 Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
julia freebsd ci Build done
Details

@martinholters martinholters deleted the kf/iterate branch May 18, 2018

@Sacha0

This comment has been minimized.

Member

Sacha0 commented May 18, 2018

Boom!

@JeffBezanson

This comment has been minimized.

Member

JeffBezanson commented May 18, 2018

Do we, uh, have a handle on those 600x regressions?

@Keno

This comment has been minimized.

Member

Keno commented May 18, 2018

Yes, they trigger the deprecation warning for the old iteration protocol.

maleadt added a commit to maleadt/LLVM.jl that referenced this pull request May 18, 2018

maleadt added a commit to JuliaGPU/CUDAdrv.jl that referenced this pull request May 18, 2018

martinholters added a commit to martinholters/DataStructures.jl that referenced this pull request May 18, 2018

LegacyIterationCompat{T, typeof(val), typeof(state)}(val, state)
end
function next(itr::I, state::LegacyIterationCompat{I,T,S}) where {I,T,S}

This comment has been minimized.

@martinholters

martinholters May 18, 2018

Member

This (and similar for done) are real ambiguity magnets for all iterators out there that define next(it::MyIterator, state). Of course, next should never be called with arguments that hit the ambiguity, but detect_ambiguities is not amused. But I suppose there is no easy way to prevent that?

@samoconnor

This comment has been minimized.

Contributor

samoconnor commented on base/parse.jl in 1a1d6b6 Oct 1, 2018

@Keno @StefanKarpinski
i <= ncodeunits(s) should be i <= endpos ?

When I do parse(Int, "0") where "0" is an AbstractString that has ncodeunits > lastindex, i get this:
TypeError: in parseint_preamble, in typeassert, expected Tuple{Char,Int64}, got Nothing

This comment has been minimized.

Contributor

samoconnor replied Oct 1, 2018

@KristofferC KristofferC removed the needs news label Nov 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment