Rename LinearFast etc. and define an indexing `enumerate(::IndexMethod, iter)` method #16378

timholy · 2016-05-15T13:08:36Z

enumerate lets you count along as you access elements of an iterator/array. This can be viewed as equivalent to key/value iteration, but this perspective basically assumes linear indexing. Sometimes you'd prefer it if the returned "key" is of the most efficient indexing type. Hence, ~~visit~~eachindexvalue.
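The sketch below illustrates the count-versus-index distinction. It uses current Julia syntax, which postdates this comment; `LinearSlow` is spelled `IndexCartesian` there. The array and the `view` are arbitrary choices. Dropping the last row produces an array that is not efficiently linearly indexable, so its natural index is a `CartesianIndex`, while `enumerate` still hands out a plain counter.

```julia
A = reshape(collect(1.0:12.0), 4, 3)   # 4×3 Matrix{Float64}, IndexLinear
B = view(A, 1:3, :)                    # drop the last row: IndexCartesian

counters = Int[]                       # what enumerate hands out
indices = CartesianIndex{2}[]          # what eachindex hands out for B
for ((i, b), I) in zip(enumerate(B), eachindex(B))
    push!(counters, i)
    push!(indices, I)
    @assert b == B[I]                  # the cartesian index recovers the same element
end

# counters runs 1, 2, 3, ...; indices runs CartesianIndex(1, 1), CartesianIndex(2, 1), ...
```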

I'm not wedded to the name, so let the bikeshedding begin. This was inspired by working on #16260 (which introduces another reason to be careful about the distinction between counting and indexing), but is sufficiently stand-alone that I thought it merited a separate PR.

I also noticed that enumerate doesn't inline next for LinearSlow arrays. On a quick benchmark, adding @inline gave a 3x speed improvement. In that case visit is still 2x faster (it was 6x faster), but I haven't yet tried to figure out the reason for the remaining gap.
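Here is a minimal sketch of the inlining point, written against today's `iterate` protocol (the 2016 code used `start`/`next`/`done`). `IndexValue` and `_step` are hypothetical names and are not part of Base. The iterator advances a single `eachindex` state and looks up each value with that index. All the methods are marked `@inline`, so the compiler is free to fold them into the caller's loop.

```julia
# Hypothetical wrapper type (not part of Base): iterates (index, value) tuples
struct IndexValue{T<:AbstractArray}
    array::T
end

# Length matches the wrapped array, so `collect` and comprehensions work
Base.length(iv::IndexValue) = length(iv.array)

# First step: obtain the efficient index iterator and start it
@inline function Base.iterate(iv::IndexValue)
    inds = eachindex(iv.array)
    return _step(iv, inds, iterate(inds))
end

# Later steps: advance the single index state carried between calls
@inline function Base.iterate(iv::IndexValue, state)
    inds, s = state
    return _step(iv, inds, iterate(inds, s))
end

# Hypothetical shared helper: stop when indices run out, else fetch the value at I
@inline function _step(iv::IndexValue, inds, r)
    r === nothing && return nothing
    I, s = r
    v = @inbounds iv.array[I]   # I comes from eachindex, so it is in bounds
    return (I, v), (inds, s)
end

# Usage on a cartesian-indexed view
B = view(reshape(collect(1.0:12.0), 4, 3), 1:3, :)
for (I, v) in IndexValue(B)
    @assert v == B[I]
end
```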

nalimilan · 2016-05-15T13:24:37Z

+1, but needs docs. You should probably mention visit in the docs about enumerate and vice-versa.

tkelman · 2016-05-15T16:34:53Z

base/exports.jl

```diff
@@ -981,6 +982,7 @@ export
     enumerate,
     next,
     start,
+    visit,
```


I don't think we need to export both uppercase and lowercase any more. Let's stick with just one of them exported. Maybe only one of them defined at all.

I basically agree with this. So the question is, do we want for (i, a) in enumerate(A) or for (i, a) in Enumerate(A)? I kind of prefer the lowercase version, and un-export the type.

timholy · 2016-05-15T20:50:46Z

As an alternative to a new exported name like visit, one potential model is checkbounds, which can be called as checkbounds(A, inds...) (throws if out-of-bounds) or checkbounds(Bool, A, inds...) (returns true/false). We could make it enumerate(T, A) to have enumerate return indices rather than a count.
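The `checkbounds` convention works like this: a leading `Bool` argument changes what the function returns, not what it checks. The sketch uses current Julia, and the array and indices are arbitrary. The proposed `enumerate(T, A)` would have used its first slot the same way.

```julia
A = zeros(3, 4)

# With Bool first: return true/false instead of throwing
inbounds_ok  = checkbounds(Bool, A, 2, 4)   # (2, 4) is inside 3×4
inbounds_bad = checkbounds(Bool, A, 4, 1)   # row 4 does not exist

# Without Bool: the same check throws a BoundsError when out of range
threw = try
    checkbounds(A, 4, 1)
    false
catch err
    err isa BoundsError
end
```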

However, here this alternative seems to have two problems:

- The very name enumerate seems to scream "count," which key/value iteration most definitely isn't (or at least, isn't for LinearSlow arrays or future 1d arrays with unconventional indices).
- The best option I can come up with for what would go in the first slot of enumerate is the function indices, i.e., for (I, a) in enumerate(indices, A). Reasonable, or weird?

Consequently, for the moment I suspect that exporting visit is still probably the best option.

kmsquire · 2016-05-15T21:45:07Z

Most iterators seem to have lowercase functions, and if needed, they're backed by CamelCase types which hold the iterator state. To me, this feels like a good convention to stick to, although it has the usual downside of polluting the namespace.

(As an aside, I'm liking golang's convention of qualifying most functions with their (short) package name. Numerical computing in Python seems to be going a similar direction (import numpy as np, import pandas as pd, etc.).)

timholy · 2016-05-15T22:34:30Z

I'm just wondering why we export the "backing" type at all. Seems like the user-friendly constructor is enough.

toivoh · 2016-05-16T04:47:34Z

The name visit seems to evoke the Visitor pattern to me, which is quite different as far as I understand. But it's not so easy to come up with a better name. label could be one possibility, since you're labeling elements with their indices, but the word might be too strongly associated with goto.
Python uses the name items for (key, value) pairs. Maybe eachitem to show the connection to eachindex?

Keno · 2016-05-16T05:33:07Z

I called it indenumerate in AbstractTrees.

kmsquire · 2016-05-16T06:58:12Z

In Scala, this is called zipWithIndex. zipwithindex doesn't really work (for me), but maybe zip_with_index?

timholy · 2016-05-16T11:05:20Z

keyvalue? EDIT: or indexvalue? keyvalue seems to imply that keys should work, which it doesn't now (but that could change).

JeffBezanson · 2016-05-16T13:36:11Z

Is this faster than zip(eachindex(A), A)? I'm guessing it might be, since the state is effectively reused for both iterators.

kmsquire · 2016-05-16T14:25:28Z

Even though there's a correlation between keys and indices, there's enough of a difference that, to me, they should remain separate concepts.

> keyvalue? EDIT: or indexvalue? keyvalue seems to imply that keys should work, which it doesn't now (but that could change).

Since we already have eachindex and each* seems to have become a pattern in Julia, eachindexvalue would seem logical, if a little long.

timholy · 2016-05-16T15:00:15Z

Yes, considerably faster:

```julia
julia> function itervisit(A)
           s = zero(eltype(A))
           n = 0
           for (I, a) in visit(A)
               s += a
               n += I[1]>2
           end
           s, n
       end
itervisit (generic function with 1 method)

julia> function iterzip(A)
           s = zero(eltype(A))
           n = 0
           for (I, a) in zip(eachindex(A), A)
               s += a
               n += I[1]>2
           end
           s, n
       end
iterzip (generic function with 1 method)

julia> A = rand(10000, 1000);

julia> B = sub(A, 1:size(A,1)-1, :);

julia> @benchmark itervisit(A)
================ Benchmark Results ========================
     Time per evaluation: 13.47 ms [9.70 ms, 17.24 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 1.67 s


julia> @benchmark iterzip(A)
================ Benchmark Results ========================
     Time per evaluation: 19.27 ms [19.22 ms, 19.31 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 2.22 s


julia> @benchmark itervisit(B)
================ Benchmark Results ========================
     Time per evaluation: 33.56 ms [32.94 ms, 34.17 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 48.00 bytes
   Number of allocations: 1 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 3.63 s


julia> @benchmark iterzip(B)
================ Benchmark Results ========================
     Time per evaluation: 52.60 ms [51.96 ms, 53.25 ms]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 5.57 s
```

Interesting, though, that there's an allocation in itervisit for the LinearSlow case. I hadn't noticed that before, worth investigating. Might get even faster.
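To rerun this comparison today, two substitutions are needed: `visit` never shipped under that name, and `sub` has since become `view`. The sketch keeps the two loop bodies from above but swaps in `pairs(IndexStyle(A), A)` for `visit(A)`. That call is assumed here to be the closest current-Julia equivalent; it is not the code from this PR. The array sizes are reduced so the sketch runs quickly, and it only checks that both loops agree. For timings, BenchmarkTools.jl's `@btime` (an external package, not used here) is the usual tool.

```julia
# Same reduction as itervisit above, with pairs(IndexStyle(A), A) in place of visit(A)
function iterpairs(A)
    s = zero(eltype(A))
    n = 0
    for (I, a) in pairs(IndexStyle(A), A)
        s += a
        n += I[1] > 2   # I is an Int (I[1] returns it) or a CartesianIndex
    end
    s, n
end

# Unchanged zip-based version for comparison
function iterzip(A)
    s = zero(eltype(A))
    n = 0
    for (I, a) in zip(eachindex(A), A)
        s += a
        n += I[1] > 2
    end
    s, n
end

A = rand(1000, 100)
B = view(A, 1:size(A, 1)-1, :)   # the IndexCartesian (formerly LinearSlow) case
```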

timholy · 2016-05-19T11:20:02Z

I went with @kmsquire's suggestion of eachindexvalue and updated the docs. Should be ready to go once CI passes (it does locally).

timholy · 2016-05-19T15:05:47Z

The AppVeyor failure looks new and very exciting (but unrelated). Anyone seen that before?

lstagner · 2016-05-20T00:24:45Z

None of my business, but I thought I would suggest traverse instead of eachindexvalue.

kmsquire · 2016-05-20T04:20:19Z

> None of my business but I thought I would suggest traverse instead of eachindexvalue

@lstagner you should definitely feel free to make suggestions like this!

I'm personally not wed to eachindexvalue; I just think it's the most consistent with eachindex. I would actually prefer indices and indexvalues. To me, traverse doesn't evoke the idea of returning an index and a value for each element in an array... but anyway, we'll see where this goes.

oxinabox · 2017-02-15T06:51:12Z

Is there intent to rebase this for 0.6, or 1.0?

lstagner · 2017-02-15T07:20:54Z

Since this got brought up again, one last bit of bikeshedding.

I still think this function should be called traversal or traverse.

In the most general sense, an array (or any iterable collection) can be thought of as a type of graph where each vertex has a label (index, key, ...) and a value. This iterator then returns the label/value pairs of this graph in the most efficient way. Since the pairs have a special order, in this case efficient memory order, the collection of pairs is a traversal of the graph.

Here is the definition of traversal:

> In computer science, graph traversal refers to the process of visiting (checking and/or updating) each vertex in a graph/tree. Such traversals are classified by the order in which the vertices are visited.

In principle this function could be generalized to any iterable collection.

timholy · 2017-02-15T10:32:59Z

To me the fundamental obstacle is having both enumerate and whatever we decide to call this. Maybe I'm being too much of a purist, but I really don't like having two things that do almost the same thing. If you can figure out how to resolve that problem satisfactorily, I'd be excited to revise this and merge it. Until then, less so.

nalimilan · 2017-02-15T12:52:56Z

This is a pattern we have in other cases too, like for find (vs. findn), findfirst, findmax, etc., where one could want either a linear or a cartesian index (or any kind of custom index). Cf. the Find & Search Julep.
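Current Julia's `findall`, the successor to `find`, shows the linear-versus-cartesian choice concretely. On a matrix it returns cartesian indices, and `LinearIndices` converts them when linear positions are wanted. The matrix is an arbitrary example.

```julia
M = [0 3; 5 0]

# On a matrix, findall returns CartesianIndex values (in column-major order)
cart = findall(x -> x > 0, M)

# LinearIndices translates them to linear positions when those are wanted
lin = [LinearIndices(M)[I] for I in cart]

# On the 1-d vec(M), the same search yields plain Int indices directly
lin2 = findall(x -> x > 0, vec(M))
```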

I suggest using a general mechanism like enumerate(LinearIndex, x) vs. enumerate(CartesianIndex, x) vs. enumerate(FastestIndex, x). The latter could use a better name, but the idea is that you would call it when you don't care about the kind of index and want either linear or cartesian depending on whether the array is LinearFast or LinearSlow. Of course enumerate(x) would keep its current meaning.
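In current Julia, the closest analogue to this proposal is `pairs`, which takes a trait instance (`IndexLinear()`, `IndexCartesian()`, or `IndexStyle(A)`) as its first argument. A short sketch follows; the arrays are arbitrary.

```julia
A = reshape(collect(10:10:60), 2, 3)   # Matrix: IndexStyle(A) == IndexLinear()
B = view(A, 1:1, :)                    # non-contiguous view: IndexCartesian()

# Explicitly request linear or cartesian indices
p_lin  = first(pairs(IndexLinear(), A))      # 1 => 10
p_cart = first(pairs(IndexCartesian(), A))   # CartesianIndex(1, 1) => 10

# Let the array choose its fastest style ("FastestIndex" in the comment above)
for (I, v) in pairs(IndexStyle(B), B)
    @assert B[I] == v
end
```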

timholy · 2017-02-15T13:50:55Z

Hmm, maybe in conjunction with a renaming (#20175 (comment)) this might be viable. Putting this up for consideration for 0.6. I saw that tomorrow is the stated deadline; I can't do it today, but I probably could tackle it by the end of the day tomorrow. I'll mark this now, and folks coordinating the 0.6 release can comment on whether they think it's important enough to squeeze in.

timholy · 2017-02-16T18:15:51Z

One big problem, though, if we return instances rather than types:

```julia
julia> CartesianIndex()
CartesianIndex{0}(())
```

But we don't actually want a 0-dimensional cartesian index, we want it to match the dimensionality of the input array. We have this:

```julia
julia> CartesianIndex{3}()
CartesianIndex{3}((1, 1, 1))
```

but now I worry that this is less obviously a "trait" than it is a value.
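In current Julia, the matching dimensionality can be taken from the array itself, either as a type or as a value. The sketch below illustrates the type-versus-value distinction under discussion; the 3-d array is arbitrary.

```julia
A = zeros(2, 3, 4)

# Type level: the CartesianIndex type whose dimensionality matches A
T = CartesianIndex{ndims(A)}

# Value level: the first valid cartesian index of A
I0 = first(CartesianIndices(A))
```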

timholy · 2017-02-16T18:53:19Z

OK, review comments addressed with the exception of #16378 (comment). If we really want to close the feature/deprecation window today, I think it's too risky to try to fix something like this now, and we should just go with the improvement we have.

It passes locally, but perhaps better to let it get through CI.

tkelman · 2017-02-16T18:56:06Z

shouldn't harm anything, but just in case @nanosoldier runbenchmarks(ALL, vs = ":master")

mbauman · 2017-02-16T18:59:10Z

Yes, I think we'd have to move to a type-based system in order for CartesianIndex to make sense as a trait. But even then, there are multiple possibilities for what a cartesian array should return as its index style — does it just return CartesianIndex? Or CartesianIndex{ndims(A)}? Could a three-dimensional array specify that it's most efficient to index it as a CartesianIndex{2}? (This isn't all that esoteric; it's how it'd be best to index a 3-d array that uses a compressed-column storage like SparseMatrixCSC.) We'd also have to decide on if we do const LinearIndex = Int for symmetry or if we introduce a new index type. We'd also lose the abstract trait tree, and could no longer dispatch on the general ::IndexStyle. And, really, I think it'd make us hammer down a definitive answer as to what a linear index means sooner than we're ready to.
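Here is what dispatching on the abstract trait tree looks like with trait instances, as kept by this PR. `describe_index` is a hypothetical helper, not a Base function. The methods specialize on `IndexLinear` and `IndexCartesian`, and a method on the abstract `IndexStyle` acts as the general fallback.

```julia
# Hypothetical helper (not part of Base): dispatch on the trait instance
describe_index(A::AbstractArray) = describe_index(IndexStyle(A), A)

describe_index(::IndexLinear, A::AbstractArray) = "linear"
describe_index(::IndexCartesian, A::AbstractArray) = "cartesian, $(ndims(A)) dims"

# Fallback on the abstract supertype: catches any other IndexStyle subtype
describe_index(::IndexStyle, A::AbstractArray) = "other"

describe_index(zeros(2, 3))                 # "linear"
describe_index(view(zeros(4, 3), 1:3, :))   # "cartesian, 2 dims"
```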

I think it's too late in the release cycle to open all those cans of worms. This rename is already a huge win. I think that the difference in names can be attributed to the difference between how you index and what you index with… and I think that is defensible.

nanosoldier · 2017-02-16T19:26:04Z

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @jrevels

tkelman · 2017-02-16T19:32:54Z

deprecation maybe not working as it should?
```
ERROR: LoadError: indexing not defined for BaseBenchmarks.ArrayBenchmarks.ArrayLF{Int32,2}
```

nalimilan · 2017-02-16T20:03:24Z

> But even then, there are multiple possibilities for what a cartesian array should return as its index style — does it just return CartesianIndex? Or CartesianIndex{ndims(A)}? Could a three-dimensional array specify that it's most efficient to index it as a CartesianIndex{2}? (This isn't all that esoteric; it's how it'd be best to index a 3-d array that uses a compressed-column storage like SparseMatrixCSC.)

I would say it's completely orthogonal to the choice of merging IndexCartesian and CartesianIndex. In both cases, we could support returning the dimensionality as a type parameter later, but that would be an extension of the system anyway.

> We'd also have to decide on if we do const LinearIndex = Int for symmetry or if we introduce a new index type.

Well, Int is ambiguous as it could mean 1-based indices or any other kind of offset indices; or it could be row or column-major. So that really needs to be a separate LinearIndex type (after defining what it means).

I admit that passing CartesianIndex() to choose the return type sounds like an abuse (even if it probably wouldn't create any practical problems). I guess it's OK to go with the new names for now, but I hope we can find something better for 1.0...

timholy · 2017-02-16T22:03:23Z

Good catch, @tkelman. Should be fixed now. @nanosoldier runbenchmarks(ALL, vs = ":master").

mbauman · 2017-02-16T22:29:25Z

@nalimilan Yes, sorry, I didn't make it clear my questions were rhetorical. Point is simply that I don't think the use of CartesianIndex would be particularly obvious, even if we go up to the type domain.

mbauman · 2017-02-16T22:34:19Z

base/deprecated.jl

```diff
@@ -1272,6 +1272,11 @@ for f in (:airyai, :airyaiprime, :airybi, :airybiprime, :airyaix, :airyaiprimex,
    end
 end

+@deprecate_binding LinearIndexing IndexStyle
```


This macro will export the old bindings. The @deprecate macro has an optional third argument to disable this behavior — perhaps you could add that to @deprecate_binding, too.
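The third argument of `@deprecate` mentioned here is `export_old`. Passing `false` keeps the deprecated name reachable as `Module.name` without exporting it. This sketch uses hypothetical module and function names, and it shows the public `@deprecate`, not the internal `@deprecate_binding` that this PR extends.

```julia
module Demo
newname(x) = 2x
export newname

# export_old = false: define the deprecated method but do not export it
@deprecate oldname(x) newname(x) false
end

using .Demo

Demo.oldname(3)   # forwards to newname(3); warns only when deprecation warnings are enabled
```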

Good catch. Fixed (I hope).

nanosoldier · 2017-02-17T01:59:21Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels


malmaud · 2017-02-17T17:59:09Z

Should we think about exporting a short form for enumerate(IndexStyle(S), S)? That's probably both a common case and quite verbose to write out.

timholy · 2017-02-17T18:22:56Z

See discussion above. There seems to be a shortage of good names in the English language.


timholy force-pushed the teh/visit branch from 8873710 to f833c36 on May 15, 2016 13:15

timholy added the needs docs label May 15, 2016

tkelman reviewed May 15, 2016

timholy force-pushed the teh/visit branch from f833c36 to 9d24b83 on May 19, 2016 11:02

timholy removed the needs docs label May 19, 2016

timholy mentioned this pull request Jul 15, 2016

Safe non-traditional array indexing #16973

Closed

timholy added the decision label Jul 19, 2016

timholy mentioned this pull request Aug 11, 2016

Expand docs on enumerate's relationship with indexing #17978

Merged

KristofferC mentioned this pull request Feb 15, 2017

inline iterator functions for enumerate #20616

Merged

timholy added this to the 0.6.0 milestone Feb 15, 2017

timholy force-pushed the teh/visit branch from 8edf880 to 981604a on February 16, 2017 18:51

timholy force-pushed the teh/visit branch from 981604a to b9fab65 on February 16, 2017 22:03

mbauman reviewed Feb 16, 2017


timholy added 3 commits February 16, 2017 20:20

Extend atsign-deprecate_binding to skip export of old name

2ebc7c4

Rename LinearIndexing->IndexStyle, LinearFast->IndexLinear, etc.

22cd6b0

Add enumerate(::IndexStyle, A) for index/value iteration

fb17155

`enumerate(A)` doesn't guarantee that the counter corresponds to the index; so when you need an index, call this method.

timholy force-pushed the teh/visit branch from b9fab65 to fb17155 on February 17, 2017 02:45

tkelman merged commit cb50dee into master Feb 17, 2017

tkelman deleted the teh/visit branch February 17, 2017 08:22

maleadt added a commit to JuliaGPU/CUDAnative.jl that referenced this pull request Feb 17, 2017

Fix deprecated indexing traits.

1aaad47

Ref JuliaLang/julia#16378

timholy mentioned this pull request Feb 17, 2017

Support IndexStyle, IndexLinear, IndexCartesian JuliaLang/Compat.jl#329

Merged

malmaud mentioned this pull request Feb 19, 2017

CartesianIndex version of find/findnz? #20684

Closed

Sacha0 added the deprecation and needs news labels May 20, 2017

Sacha0 added a commit to Sacha0/julia that referenced this pull request May 20, 2017

Add NEWS.md entry for enumerate(::IndexStyle, itr) addition and index…

46bf9cf

…ing traits rename (JuliaLang#16378).

tkelman pushed a commit that referenced this pull request May 24, 2017

Add NEWS.md entry for enumerate(::IndexStyle, itr) addition and index…

18d7af8

…ing traits rename (#16378). (#21997)

Sacha0 removed the needs news label May 25, 2017

tkelman pushed a commit that referenced this pull request Jun 3, 2017

Add NEWS.md entry for enumerate(::IndexStyle, itr) addition and index…

6b7f68b

…ing traits rename (#16378). (#21997) (cherry picked from commit 18d7af8)
