the rangepocalypse #5585

StefanKarpinski · 2014-01-27T22:43:56Z

I know @JeffBezanson hates umbrella issues, but there are so many range-related problems cropping up that I wanted to have a single place to consolidate the various issues. Here are some of the relevant issues:

ranges terribly sensitive to floating point issues – ranges terribly sensitive to floating point #2333
integer range overflows – Integer range overflows #3079 and RFC: Basic BigRange support available #3087
problem with rand(typemin(Int32):typemax(Int32)) – Problem with rand(typemin(Int32):typemax(Int32)) #5550 and Fix overflow in #5550 #5555
performance / special case lowering – 0860767 and regression in performance of counting loops #5469

Less immediate, but related:

first and last on empty ranges – first() and last() on empty ranges #2537
Uint8 ranges – make [a, b] not concatenate #3737 (edit: more about cat than ranges)
float ranges: mismatch between length and bounds error – float ranges: mismatch between length and bounds error #329 (closed)
fix for last(t::Range) Fix for last{T}(t::Range{T}) #2734 (closed)
fix float range behavior – Fix float range behavior #2959 (closed)
repair floating point ranges with start==stop Repair floating point ranges with start==stop. #3961 (closed)
infinite loop for Uint8 range – Infinite loop for Uint8 range #5483 (closed)

We need a more coherent story for ranges. I suspect that ranges of ordinal types, including integers and chars need to be handled rather differently from floating-point ranges and such.

The text was updated successfully, but these errors were encountered:

jiahao · 2014-01-27T22:46:36Z

what a wide range of problems.

StefanKarpinski · 2014-01-27T22:57:12Z

JeffBezanson · 2014-01-28T05:38:53Z

We have a totally coherent story: ranges are a more compact representation of a certain class of dense vectors.

The only real problem is that the size of a dense vector is obviously limited by Int, which on a 64-bit system is totally reasonable. But with the range representation a much bigger "vector" becomes possible.

The tricky part is factoring and naming the types. We can probably keep Ranges. Range1 was never very useful for floating point, so it can probably become an alias for OrdinalRange1. I'd guess the best fate for Range is to make it an alias for Ranges, and after #1470 have constructors for it that make an appropriate concrete range. Or it could just become a function. If we switch floats to start-stop-len representation though, there will not be any real use for Range in any backwards-compatible sense.

StefanKarpinski · 2014-01-28T18:09:59Z

What they represent has never been an issue. How they are represented and implemented, however, is clearly a problem or there wouldn't be ten billion issues related to ranges. I think the way this needs to work is that we separate out "ordinal ranges" from "linspace ranges".

The "ordinal ranges" include integer, bigint, character, etc. ranges, and store the start, stop and step for those. Since using a step of 1 is so common, that can be factored out as its own case. The types should probably look like this:

abstract Ranges{T}
abstract OrdinalRange{T,S<:Integer} <: Ranges{T}

immutable UnitRange{T} <: OrdinalRange{T,Int}
    start::T
    stop::T
end

immutable StepRange{T,S<:Integer} <: OrdinalRange{T,S}
    start::T
    stop::T
    step::S
end

When you write 1:10 it would produce a UnitRange{Int} object, while 1:2:10 would be a StepRange{Int,Int} object. The type parameter on the .step field allows large steps to be used for things like -big(2)^1000:big(2)^100:big(2)^1000.

The "linspace ranges" need a different internal representation: start, stop and length. Each value is computed by taking a linear combination of the start and the stop. For the greatest accuracy, we might even want to allow downscaling: i.e. start, stop, length, and divisor, and ref is computed something like this:

function ref(r::LinearRange, k::Int)
    k -= 1; (0 <= k <= r.length) || error(BoundsError)
    ((r.length-k)*r.start + k*r.stop)/r.divisor
end

The type should probably look like this:

immutable LinearRange{T,S<:Integer} <: Ranges{T}
    start::T
    stop::T
    length::S
    divisor::T
end

I'm not sure about the divisor part since that's only necessary to improve a few very nasty corner cases. It would be nice to reclaim Range as the name of the abstract type instead of Ranges, which has always been a bit awkward.

quinnj · 2014-01-28T18:15:37Z

Can I vote against setting the S<:Integer parameter as an Integer subtype? That would disallow date ranges from easily plugging in to this since the step is a Period type. Just leaving it S doesn't hurt, since these are mostly internal, right?

StefanKarpinski · 2014-01-28T18:17:33Z

Yes, that's entirely fair and making sure this works well with dates and times a crucial design consideration. What kinds of constructions do you need / want for dates and times? Can you list them with examples here?

lindahua · 2014-01-28T18:18:15Z

I generally agree with @StefanKarpinski's proposal.

Just have to emphasize that a:b and a:s:b always need to be different types. view(a:b, :) and view(a:s:b, :) are very different things, for the former you may directly apply a BLAS function without making a copy.

StefanKarpinski · 2014-01-28T18:20:40Z

The type of the .length field in the LinearRange type is a bit problematic, actually. Should it be of the same type as .start and .stop? That may make the most sense since otherwise if it's say Int when the start/stop are Float64, there are lots of perfectly reasonable ranges that we can't represent because Int doesn't have enough range. I know you can't materialize ranges that are that big, but we should still be able to use them. If .length is the same types as .start and .stop then it will allow all integral lengths up to 2^53 to be represented exactly, and then only even-length ranges up to 2^54, and so on. We could allow it to be a type parameter still, however, so that if you want to directly construct one, you can explicitly choose the type of the length.

mschauer · 2014-01-28T18:24:16Z

Will the set of UnitRange start and stop be restricted to those who have a (Int-) length?

JeffBezanson · 2014-01-28T18:35:19Z

My current idea there would be to allow any start and stop, but compute the length with overflow checking.

StefanKarpinski · 2014-01-28T18:35:52Z

@mschauer, no, that's the whole point – the length representation is lousy for ordinal range types. This way you don't need to pick a length type since it's not needed for iterating the object. Computing the length and indexing into it need some thought. For length, I think that we could just do this:

length(r::OrdinalRange) = r.stop - r.start + 1

StefanKarpinski · 2014-01-28T18:36:47Z

Computing the length with overflow checking is a good idea.

JeffBezanson · 2014-01-28T18:37:55Z

The divisor representation has the problem that the intermediate result k*r.stop can overflow.

kmsquire · 2014-01-28T18:49:35Z

view(a:b, :) and view(a:s:b, :) are very different things, for the former you may directly apply a BLAS function without making a copy.

It's been a while since I've attempted interacting directly with BLAS, but I was under the impression that for some functions, at least, one could provide a stride for both x and y direction of matrices, which in theory could allow direct operations on (some) views.

lindahua · 2014-01-28T19:03:50Z

BLAS level 2 & 3 functions only work with matrices with (contiguous rank 1), i.e. each column needs to be contiguous. But this is not related to the discussion here though.

quinnj · 2014-01-28T19:29:41Z

I don't think there's much that's particularly fancy about date ranges. Something along the following

# Creates Range{Date} with default step of Day(1)
daterange = Date(2013,1,1):Date(2014,1,1)
# Date range with step of Week(1) from 2013-01-01 to 2014-01-01
daterange = Date(2013,1,1):Week(1):Date(2014,1,1)
# Datetime works similarly, but can also have TimePeriod steps
# Datetime range for every second of the day 2014-01-28
datetimerange = Datetime(2014,1,28):Second(1):Datetime(2014,1,29)

I think it's pretty simple and straightforward and provides the expected functionality. I originally thought of trying to incorporate the recur function into ranges, but as Jeff and I discussed, you're not really looking for a range type in that case, but the actual resulting array of dates. But for creating regular-stepped date ranges, this is simple and convenient.

@milktrader may have more ideas on anything else that we may want consider with date ranges specifically.

StefanKarpinski · 2014-01-28T20:22:35Z

I don't think they're crazy exotic or anything, but they are just exotic enough that they make a great example that we need to make sure we can support. It would be nice not to have to create a new, separate set of range types every time someone wants to make something rangelike with their own custom types.

EyeOfPython · 2014-01-30T13:02:53Z

Are we already supporting multidimensional Ranges ? Range in cartesian space could be useful, representin e.g. AABBs. At least it should be easy to adapt behaving like Range.

simonster · 2014-01-30T16:50:02Z

@EyeOfPython There is an NDRange example included with Julia, although it's broken and could be improved considerably with the new language features introduced since it was last updated 2 years ago.

lindahua · 2014-01-30T16:54:08Z

NDRange could be a great idea, but best discussed in a separate thread. The pressing problem here is to resolve the issues related to current 1D range systems as listed above.

JeffBezanson · 2014-01-30T19:18:18Z

I started prototyping the candidate new Range1 representation. The trickiest thing is that if we want to allow ranges like typemin(Int):typemax(Int), iterating becomes difficult since there is no invalid value available to mean "end". Therefore we need more state. I tried the following:

start{T<:Integer}(r::Range1{T}) = (false, r.start)
next{T<:Integer}(r::Range1{T}, i) = (tupleref(i,2), (true, oftype(T, tupleref(i,2)+1)))
done{T<:Integer}(r::Range1{T}, i) = (r.stop<r.start) | (tupleref(i,1) & (tupleref(i,2)==oftype(T, r.stop+1)))

That is kind of crazy, but amazingly enough LLVM is able to optimize everything away and we get loops that are sometimes even faster than before.

ref #5585

JeffBezanson · 2014-01-31T02:35:36Z

This seems to be CPU-dependent. On some machines that implementation is indeed significantly slower.

StefanKarpinski · 2014-01-31T03:04:36Z

Man, that's a real bummer. I considered starting the state at the value before the first one:

start{T<:Integer}(r::Range1{T}) = oftype(T,r.start-1)
next{T<:Integer}(r::Range1{T}, i) = (oftype(T,i+1), oftype(T,i+1))
done{T<:Integer}(r::Range1{T}, i) = i == r.stop

Of course, that considers typemin(Int):typemax(Int) to be empty, so that's no good.

simonster · 2014-01-31T04:36:52Z

This is a problem for 32-bit ranges, but on a 64-bit system I think you'd be dead before you could finish iterating over typemin(Int):typemax(Int).

lindahua · 2014-01-31T12:33:33Z

What about we simply reject (i.e. throw an error with clear message) upon the construction of a:b when b is typmax(Int)?

Why should we need to support a use case which is almost never used in real practice when supporting it requires disproportionate amount of efforts?

milktrader · 2014-01-31T12:48:34Z

I would second @karbarcca that Datetime ranges are sufficient as currently implemented. I haven't run into any issues yet.

julia> dates[1:2]
2-element Array{Date{ISOCalendar},1}:
 1980-01-03
 1980-01-04

julia> length(dates)
505

julia> tuesdays = x->dayofweek(x)==Tuesday
(anonymous function)

julia> tue = dates[1]:tuesdays:dates[505]
1980-01-08:(anonymous function):1981-12-29

julia> typeof(tue)
DateRange1{ISOCalendar} (constructor with 1 method)

julia> tue.step
(anonymous function)

julia> length(tue)
104

julia> everythirdweek = [tue][1]:weeks(3):[tue][104]
1980-01-08:3 weeks:1981-12-22

julia> length(everythirdweek)
35

julia> [everythirdweek][1:2]
2-element Array{Date{ISOCalendar},1}:
 1980-01-08
 1980-01-29

added note: the dates array above is just something I had lying around. To make something similar you would do something like this:

julia> dates = [date(1980,1,1):days(1):date(1981,12,31)];

StefanKarpinski · 2014-02-01T01:19:30Z

That's not actually what I interpreted @karbarcca as saying. The date/time range functionality is there but it's a shares almost no code with Base's range implementations. You shouldn't need to rewrite all the tricky functionality and logic for ranges because you have a new type of thing that you want to make ranges of. That's a complete failure of generic programming and composability.

milktrader · 2014-02-01T01:51:14Z

Aha, I figured I was missing some important point.

StefanKarpinski · 2014-02-01T02:09:37Z

See #5636 – a mind-reading float-range implementation.

JeffBezanson · 2014-03-18T20:19:33Z

Take a look at my latest attempt. Not done yet, but the basic type definitions can be examined and debated.
Boy implementing ranges is a pain in the ass!

StefanKarpinski · 2014-03-18T20:25:16Z

Yeah, I've been following that – looks pretty good. How do you feel about merging soon?

JeffBezanson · 2014-03-18T20:36:36Z

I'd like to as soon as we can get the rest of ranges.jl implemented.

Division is a minor wrinkle --- datetimes and periods support /, but maybe ranges should be defined in terms of div.

JeffBezanson · 2014-03-18T23:24:32Z

A big question is what to do with Range. Probably best to just reclaim it as the abstract type name since that's what we really want anyway.

Also maybe StepRange should be called StridedRange, to be more evocative to people used to arrays.

StefanKarpinski · 2014-03-19T14:50:22Z

I rather like StepRange, but I guess StridedRange would be ok. The idea of StridedArray is that it stored in memory with fixed strides, which doesn't really make sense for a range, which isn't stored. I agree that we should just reclaim the Range name.

JeffBezanson · 2014-03-27T01:56:24Z

I looked at a few other libraries and languages, and people tend to either accept a length argument, or accept a value one beyond the last actual value. This cleverly sidesteps the typemin(Int):typemax(Int) problem by making it impossible to express. I'm not sure there is any other solution. Throwing an error requires, at least, an extra branch at the start of every loop.

JeffBezanson · 2014-03-28T05:27:52Z

An interesting option would be to use the lifted (Bool,Int) state for StepRange since that is less performance-critical.

simonster · 2014-03-28T06:40:15Z

So, I tried this:

start(r::Range1{Int,Int}) = uint(0)
next(r::Range1{Int,Int}, i) = (int(r.start+i), box(Uint64,add_int_nuw(unbox(Uint64,i),unbox(Uint64,1))))
done(r::Range1{Int,Int}, i) = (r.sentinel-1 < r.start) | (i > (uint(r.sentinel)-1)-r.start)

where add_int_nuw is a new no unsigned wrap add intrinsic. i eventually wraps and gives an infinite loop for typemin(Int):typemax(Int), but if Int == Int64 that takes several centuries, so I think we can safely ignore that. (Not sure what to do for Int == Int32, where this behavior is observable and potentially problematic.) This seems to work with the loop vectorizer and generate reasonable looking IR, and the performance difference from master is within the margin of error on my system, although I can't promise this is true elsewhere. I'm sure I've missed an edge case, though...

simonster · 2014-03-28T15:03:34Z

Unfortunately this only seems to partly work with the loop vectorizer. The first function I define in a session gets vectorized, but not subsequent functions. There is no problem with the old iterator.

JeffBezanson · 2014-03-28T16:53:54Z

This is quite interesting.

That vectorizer behavior sounds like a bug; I don't see why it should be stateful like that.

JeffBezanson · 2014-03-28T17:05:18Z

Another idea I want to discuss: in some cases it is much more convenient to construct a range by length. We could add a function range(start, [step], length) for doing that. Then range and colon would be the two ways to make ranges, nice because neither depends on the implementation (e.g. we can change what types they return in the future).

StefanKarpinski · 2014-03-28T17:17:53Z

I like that idea a lot. Those are the two standard ways to do this. One wrinkle is that while range(0.3, 1.1, 9) is very similar to linspace(0.3, 1.1, 9), since the former constructs a FloatRange object, it gives this:

Base.showcompact(io::IO, x::Float64) = show(io,x)

julia> [0.3:0.1:1.1]
9-element Array{Float64,1}:
 0.3
 0.4
 0.5
 0.6
 0.7
 0.8
 0.9
 1.0
 1.1

while linspace currently doesn't:

julia> linspace(0.3,1.1,9)
9-element Array{Float64,1}:
 0.3
 0.4
 0.5
 0.6000000000000001
 0.7000000000000001
 0.8
 0.9
 1.0000000000000002
 1.1

This is probably an argument for linspace doing the same endpoint lifting as FloatRange.

simonster · 2014-03-28T17:24:07Z

I realized I was benchmarking with a function where I wasn't actually using the iteration variable. If I use the iteration variable, then I get an extra add in the loop (two extra adds for summing the yielded values) and this is much slower, so it's not really viable.

StefanKarpinski · 2014-03-28T17:44:59Z

Ah, that makes sense. Tricksy compilers.

…t,len this Range1 could be the UnitRange of #5585, with Range1 deprecated also intended to address #5469 (performance)

JeffBezanson added a commit that referenced this issue Jan 30, 2014

change Range1 to start-stop representation instead of start-len

6b3f756

ref #5585

JeffBezanson added a commit that referenced this issue Mar 18, 2014

save checkin. at attempt to start on ordinal ranges (#5585)

af26eb7

JeffBezanson added this to the 0.3 milestone Mar 20, 2014

JeffBezanson mentioned this issue Mar 26, 2014

0x00:0xff is empty #6262

Closed

StefanKarpinski modified the milestones: 0.3-rc1, 0.3 Mar 28, 2014

JeffBezanson added a commit that referenced this issue Mar 31, 2014

alternate implementation of Range1 storing start,stop instead of star…

5d12393

…t,len this Range1 could be the UnitRange of #5585, with Range1 deprecated also intended to address #5469 (performance)

JeffBezanson added a commit that referenced this issue Mar 31, 2014

save checkin. at attempt to start on ordinal ranges (#5585)

d53e093

JeffBezanson added a commit that referenced this issue Apr 1, 2014

alternate implementation of Range1 storing start,stop instead of star…

ef3c641

…t,len this Range1 could be the UnitRange of #5585, with Range1 deprecated also intended to address #5469 (performance)

JeffBezanson added a commit that referenced this issue Apr 1, 2014

save checkin. at attempt to start on ordinal ranges (#5585)

5ba25ea

JeffBezanson closed this as completed in c402ce9 Apr 1, 2014

quinnj mentioned this issue Aug 21, 2014

performance of creating an array from a range #4784

Closed

mauro3 mentioned this issue Jun 18, 2015

Iteration over typemin(Uint64):typemax(Uint64) stops immediately and silently. #11742

Closed

gajomi mentioned this issue Jan 12, 2016

We need more unique and related methods #14649

Closed

4 tasks

pablosanjose mentioned this issue Oct 27, 2017

Sequences aren’t computed properly #24357

Closed

the rangepocalypse #5585

the rangepocalypse #5585

Comments

StefanKarpinski commented Jan 27, 2014

jiahao commented Jan 27, 2014

StefanKarpinski commented Jan 27, 2014

JeffBezanson commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

quinnj commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

lindahua commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

mschauer commented Jan 28, 2014

JeffBezanson commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

JeffBezanson commented Jan 28, 2014

kmsquire commented Jan 28, 2014

lindahua commented Jan 28, 2014

quinnj commented Jan 28, 2014

StefanKarpinski commented Jan 28, 2014

EyeOfPython commented Jan 30, 2014

simonster commented Jan 30, 2014

lindahua commented Jan 30, 2014

JeffBezanson commented Jan 30, 2014

JeffBezanson commented Jan 31, 2014

StefanKarpinski commented Jan 31, 2014

simonster commented Jan 31, 2014

lindahua commented Jan 31, 2014

milktrader commented Jan 31, 2014

StefanKarpinski commented Feb 1, 2014

milktrader commented Feb 1, 2014

StefanKarpinski commented Feb 1, 2014

JeffBezanson commented Mar 18, 2014

StefanKarpinski commented Mar 18, 2014

JeffBezanson commented Mar 18, 2014

JeffBezanson commented Mar 18, 2014

StefanKarpinski commented Mar 19, 2014

JeffBezanson commented Mar 27, 2014

JeffBezanson commented Mar 28, 2014

simonster commented Mar 28, 2014

simonster commented Mar 28, 2014

JeffBezanson commented Mar 28, 2014

JeffBezanson commented Mar 28, 2014

StefanKarpinski commented Mar 28, 2014

simonster commented Mar 28, 2014

StefanKarpinski commented Mar 28, 2014