Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we have an operator for expressing ranges as start/length #99

Closed
gavinking opened this issue Dec 11, 2011 · 57 comments
Closed

Should we have an operator for expressing ranges as start/length #99

gavinking opened this issue Dec 11, 2011 · 57 comments

Comments

@gavinking
Copy link
Member

We have the syntax start..end for expressing inclusive ranges (what I've been calling "spans"). This makes an appearance in two different operators, for example:

for (i in 0..end) { ... }
for (x in seq[0..end]) { ... }

In this thread:

http://groups.google.com/group/ceylon-dev/browse_thread/thread/82daa1e0750f133a?hl=en

The idea of having an operator for expressing ranges by start and length (what I've been calling "segments") came up. The syntax I like the most would be a :, but I think that ..: or ../ could also potentially work.

for (i in 0:length) { ... }
for (x in seq[0:length]) { ... }

Now, : is a potentially very useful symbol, so I'm a little loathe to "waste" it on something non-critical like this. So a different possibility would be something like ..: or ../.

Anyway, this is not going to be for M1, it's just an idea that is out there to see if it would be popular with other folks.

@quintesse
Copy link
Member

PS: How about exclusive spans? for (i in start..>end) , for (i in start<..end) and even for (i in start<..>end)

@ikasiuk
Copy link
Member

ikasiuk commented Dec 11, 2011

An example why this would be useful:

Element[] s2 = s1[start:length];

Using .. instead of : this becomes (assuming length>=0)

Element[] s2;
if (length > 0) {
    s2 = s1[start..start+length-1];
} else {
    s2 = {};
}

...though you could write it a bit shorter using the operators proposed by #88.
The if is necessary because start..start-1 has two elements, not zero (x..y always has at least one element, in contrast to x:y).

@gavinking
Copy link
Member Author

Well, a totally different approach occurs to me, that ties in with the proposal for comprehensions in #82. Instead of .. and : operators, we could adopt the notation used in mathematics. This would not be an operator as such, but rather an extension to the syntax of for:

loops:

for (0<=i<length) { .... }
for (start<=i<=end) { ... }

This is clearly more flexible, and no more verbose than .. and :. It does look slightly noisier.

comprehensions:

value cubes = { for (1<=i<=10) i**3 };

Again, more flexible, and no more verbose than .. and :.

subranges:

String world = "hello world"[for (6<=i<=11) i];
value subseq = sequence[for (0<=n<length) n];

Well, this is obviously a lot more verbose, but it's also waay more powerful. Basically this:

sequence[for (0<=n<=end) n]

would be more or less a shortcut way of writing:

{ for (0<=n<=end) if (n<sequence.size) sequence[n] }

I'm kinda liking this concept!

@gavinking
Copy link
Member Author

Note that we could likely even support an abbreviation for the very common case of 0 being the inclusive lower bound:

for (i<length) { .... }
value cubes = { for (i<=10) i**3 };
value subseq = sequence[for (n<length) n];

That winds up even cleaner and less verbose that .. and :.

@gavinking
Copy link
Member Author

Tako points out that step size is a problem with this syntax. (Currently we can write (start..end).by(step).) That's a definite downside. It can be worked around with the following idiom:

for (i<length/step) {  
    dosomethingwith(start+i*step); }
}

But that's clearly a little unnatural.

@quintesse
Copy link
Member

of course for (i<=10, i+=2) might be a possibility as well (I know you don't like it, but just putting it out there for discussion)

@gavinking
Copy link
Member Author

Tako points out that step size is a problem with this syntax.

Well, actually not so much. Here's a solution:

for ( x in { for(i<10) if(i%2==0) i } ) { ... }

Clearly not as clean as:

for (x in (0:10).by(2)) { ... }

but then I have very rarely used step sizes in Java, and this kind of thing is going to be even way less common in Ceylon where you simply don't iterate over array (i.e. sequence or List) indexes. It's always the wrong way to do it.

@ikasiuk
Copy link
Member

ikasiuk commented Dec 11, 2011

Or:

for (x in { for (i<5) i*2 }) { ... }

But I also rarely need this.

A question concerning subranges: does s[for (i<=n) i] allow an equally efficient implementation as s[0..n]? You can't easily map it to a simple method call like s.span(0, n).

@gavinking
Copy link
Member Author

So I'm convinced step sizes and reverse ranges, etc, are not an issue. Here's my reasons:

  1. If you really want a C-style for, it's really easy to emulate one with a while. It's not even more verbose.
  2. If we find we want the functionality of .. and :, we can easily provide segment() and span() functions. The reason for providing the operators was to abbreviate the common case, not to make it easier to write rare complex stuff like for (i in span(start,end).by(step)).
  3. Comprehensions are a much more flexible way to create sequences of numbers than .. and : are, though they're certainly optimized for more complex cases rather than the simpler cases of segments and spans.
  4. The proposal for enhancing the syntax of for enhances the more "ceylonic", more declarative approach, which is what the language is really designed for.
  5. It's arguably much more natural and simpler to optimize for (i<length) to a Java for than it is to optimize a Range instantiation. I'm not saying that you can't optimize for (i in 0:length), you most certainly can, it's just that there's bigger conceptual gap between the conceptual form and the optimized form.

Most of all, I simply adore the immediate readability of:

for (i<length) { ... }
for (xmin<=x<=xmax) { .... }
{ for (x<=max) x**3 }

That's the opposite of cryptic.

@gavinking
Copy link
Member Author

A question concerning subranges: does s[for (i<=n) i] allow an equally efficient implementation as s[0..n]? You can't easily map it to a simple method call like s.span(0, n).

That's a good question. When I characterized it as an abbreviation for a comprehension, that's not precisely right because the thing about the span() and segment() declared by Ranged is the covariant return type. So "hello world"[0..4] returns the string "hello".

But the way I'm now trying to conceptualize [for (...) ...] is that it maps to the following operation of Ranged:

interface Ranged<Index,Subrange> {
    shared formal Subrange subrange(Index... indices);
}

According to the functionality proposed by #82, this means that this operation can be called like this:

String hello = "hello world".subrange(for (i<=4) i);

which we would then let you abbreviate to:

String hello = "hello world" [for (i<=4) i];

Does that make sense?

@ikasiuk
Copy link
Member

ikasiuk commented Dec 11, 2011

Does that make sense?

If I understand it correctly, this means that in most cases s[for (i<n) i] is likely to be less efficient than s.segment(0, n)? Then I don't see why you would want to write the former - it's not even shorter, nor more readable. Not sure if it's even a good idea to allow this abbreviation for subrange then.

Apart from that: a readable syntax that fits well into the language and can replace both .. and :. Not bad.

@gavinking
Copy link
Member Author

If I understand it correctly, this means that in most cases s[for (i<n) i] is likely to be less efficient than s.segment(0, n)?

I think it could be perfectly efficient. I'm not going to go off and eagerly allocate an array under the covers. At most I'm going to create some sort of iterator.

@gavinking
Copy link
Member Author

P.S. I would not cry very much if we simply dropped the [0..end] / [for (i<=end) i] operator. I like it and think it's a very common thing to do, but it can be expressed using a comprehension.

@ikasiuk
Copy link
Member

ikasiuk commented Dec 12, 2011

I think it could be perfectly efficient. I'm not going to go off and eagerly allocate an array under the covers. At most I'm going to create some sort of iterator.

Sounds good. Something I still don't understand is the following...

shared formal Subrange subrange(Index... indices);

Any of the indices can be out of range, so wouldn't that make Subrange something like "Element?[]"? How do you get from there to

String hello = "hello world" [for (i<=4) i];

without sacrificing efficiency, especially considering that indexed access is potentially slow for String (or is it?) ?

@RossTate
Copy link
Member

I don't like the [for (i<=4) i] syntax. It's rather verbose and doesn't read very naturally for me. It's also _much_ less efficient.

Also anything with end-1 is buggy with overflow.

My favorite proposals are still start..end and start.<end, possibly with special syntax for for loops.

As for the original "start/length" question, what about start.*length?

Any of the indices can be out of range, so wouldn't that make Subrange something like "Element?[]"?

This is not a problem if you don't guarantee that the length of the subrange is the same as the number of indices supplied. That way you can skip bad indices.

@ikasiuk
Copy link
Member

ikasiuk commented Dec 13, 2011

Also anything with end-1 is buggy with overflow.

Not sure what you mean. Couldn't you just write "for (i<end) i"?

This is not a problem if you don't guarantee that the length of the subrange is the same as the number of indices supplied. That way you can skip bad indices.

Sure, but then you can't return anything that satisfies Sized, or you are forced to calculate the complete resulting list in memory before returning it.

I think the x<i<y syntax is much better for for loops and comprehensions, as Gavin has convincingly argued above. For subranges, I could live with just calling segment or span directly, and there is also items (defined in Correspondence) , which should work nicely with comprehensions.

@RossTate
Copy link
Member

Also anything with end-1 is buggy with overflow.

Not sure what you mean. Couldn't you just write for (i<end) i?

I'm saying that your suggestion is preferred over for (i in 0..end-1) because yours doesn't have problems with overflow whereas (if we were using unsigned 32-bit integers) end-1 would evaluate to 2**32-1 if end is 0.

@gavinking
Copy link
Member Author

I'm saying that your suggestion is preferred over for (i in 0..end-1) because yours doesn't have problems with overflow whereas (if we were using unsigned 32-bit integers) end-1 would evaluate to 2**32-1 if end is 0.

But the JVM doesn't have unsigned integers, so this is never possible.

@danberindei
Copy link

I'm saying that your suggestion is preferred over for (i in 0..end-1) because yours doesn't have problems with overflow whereas (if we were using unsigned 32-bit integers) end-1 would evaluate to 2**32-1 if end is 0.

But the JVM doesn't have unsigned integers, so this is never possible.

Yes, but the resulting range will be {-1, 0} (or is it {0}?), and that is almost certainly not what the user wanted.

I don't know what the intended use case for the inclusive range construct is, but I wouldn't expect it to be the default. Especially after reading Djikstra's Why numbering should start at zero.

So I would expect start..end to mean an inclusive start, exclusive end range, and the start <= i <= end syntax looks nice for the cases where you really need an inclusive range. But when you omit the start it looks just like a normal comparison and it looks a bit confusing.

Also, I believe having a way to define a range by (start, end) and by (start, length) will only serve to confuse newbies, since they are equivalent in the most common case (start == 0).

@gavinking
Copy link
Member Author

Djikstra's little argument is interesting, but to me ultimately unconvincing, and, frankly, seems to come close to assuming its conclusion in its premises.

A much better approach, I think, is to ask what in practice we generally get as inputs when we construct a range. My experience is that it's almost always the case that we receive either:

  1. two inclusive endpoints, or
  2. an inclusive start and a length.

I find the case of an exclusive endpoint to be something that simply doesn't naturally occur very often. Now, the advantage of a "Djikstra span" is that both of the above can be re-expressed using addition (and no subtraction) in terms of an inclusive start and exclusive end, so if you're hung up on trying to find a single best way to write a range, that's a reasonable compromise.

But we don't have to accept that there should be exactly one way to write down a range. If there's more than one way, then we can make both 1 and 2 expressible directly, without even the use of addition.

http://groups.google.com/group/ceylon-dev/msg/c7b6e7a38e21d3b8?hl=en

@danberindei
Copy link

A much better approach, I think, is to ask what in practice we generally get as inputs when we construct a range. My experience is that it's almost always the case that we receive either:

two inclusive endpoints, or
an inclusive start and a length.

An inclusive range on both ends does seem to to be the standard in functional languages, so maybe it really is more common. I concede that for i in 1..5 does look better for people who are not used to count starting from 0, but array/list indexes start from 0 anyway, so sooner or later they will have to deal with exclusive ranges.

I often find myself iterating on a list and splitting it in consecutive segments, so with inclusive ranges I would need to subtract 1 from the end to ensure that they don't overlap. I would also need subtraction if I were to use start:count ranges.

But we don't have to accept that there should be exactly one way to write down a range. If there's more than one way, then we can make both 1 and 2 expressible directly, without even the use of addition.

Actually that's what I'm trying to argue, that having more than one way to write a range will only lead to confusion. To quote Djikstra:

The programming language Mesa, developed at Xerox PARC, has special notations for intervals of integers in all four conventions. Extensive experience with Mesa has shown that the use of the other three conventions has been a constant source of clumsiness and mistakes, and on account of that experience Mesa programmers are now strongly advised not to use the latter three available features.

Ruby actually has both start..end (inclusive) and start...end (exclusive), but searching for arguments for one over the other I only found a guy who didn't know about the exclusive variant (http://programmers.stackexchange.com/a/65032).

@gavinking
Copy link
Member Author

Actually that's what I'm trying to argue, that having more than one way to write a range will only lead to confusion.

That's possible, but to me it doesn't seem likely. I might be wrong.

To quote Djikstra:

I definitely don't want to get into an argument with Djikstra, especially since he's much smarter than me and sadly not here to defend himself, but I don't find second-hand reports about experience with a 30-year-old language very convincing. Remember that back in those days almost every usage of the range construct was for iterating an array. That's exactly the opposite situation from Ceylon, where we never use ranges for iterating collections!

FTR, Mesa used the notation [0..10], (0..10), (0..10], [0..10) to represent inclusive and exclusive ranges.

@danberindei
Copy link

but I don't find second-hand reports about experience with a 30-year-old language very convincing.

I would have liked to link to a more recent report, but I couldn't find any. The only language I saw that offers both inclusive and exclusive ranges is Ruby, and the exclusive variant seems a bit unpopular there.

That's exactly the opposite situation from Ceylon, where we never use ranges for iterating collections!

I would say it's also the other way around: you never use ranges for iterating collections in Ceylon because it only has inclusive ranges, and inclusive ranges are never appropriate for working with collections!

And it's not just about iterating collections: even splitting a range in two is more awkward with inclusive ranges.

@gavinking
Copy link
Member Author

For the record, one way to rescue the current .. inclusive range syntax and repurpose it to encompass exclusive ranges would be to allow a null index to indicate an empty range, and provide a utility lastIndex() function. Then you could write:

for (i in 0..lastIndex(length)) { ... }

cells[0..lastIndex(length)]

The lastIndex() function would be defined like this:

Integer? lastIndex(Integer length) {
    return length>0 then length-1;
}

I think this is probably a competitive alternative to the : operator.

@gavinking
Copy link
Member Author

Another idea to make the mathematics-style syntax more palatable. I think we could probably eliminate the need for the for keyword in the comprehension syntax. (Don't hold me to this, because I still need to really think that through.)

Then you would be able to write comprehensions like:

{ (i<len) cells[i] }

{ (x in xs) x**2 }

And subranges like:

cells[(i<len) i]

I'm not sure if that's better or worse. But it does address the complaint about verbosity.

@quintesse
Copy link
Member

0..lastIndex(length) doesn't seem like a good idea to me. Because it would mean I can do 1..1 to get a sequence of just one element, but somehow 0..0 is special and gets me nothing. Don't like it.

It would be much easier if we just turn m..n into an empty sequence whenever n < m. That would mean we can't use that syntax to have sequences that count downward (unless someone can come up with a nice syntax to define a step size, which could be negative).

In general I do want to vote to keep this syntax, regardless if you want to implement the math-style syntax or not. This is really nice syntax, easy to understand and most people love it when you show it to them. The math-style could be used for more advanced situations if you want.

@quintesse
Copy link
Member

One way to define the step size, which I've seen used in other language is to use the syntax: 1,3..10, which would count from 1 to 10 with a step size of 2. Of course this can lead to several ugly-ish situations, like for example 10 not being part of the output even though we explicitly mention it. And 1,3..1 would still only give one element even though we mention the number 3. Besides the fact that it might be very difficult to parse because we don't actually use anything to delimit a range (the other languages I've seen this normally use [], eg. [1,3..9]

@ikasiuk
Copy link
Member

ikasiuk commented Feb 2, 2012

It would be much easier if we just turn m..n into an empty sequence whenever n < m. That would mean we can't use that syntax to have sequences that count downward (unless someone can come up with a nice syntax to define a step size, which could be negative).

In general I do want to vote to keep this syntax, regardless if you want to implement the math-style syntax or not. This is really nice syntax, easy to understand and most people love it when you show it to them. The math-style could be used for more advanced situations if you want.

Very true. The m..n syntax is really nice and user-friendly but the backward iteration for n<m breaks it by introducing pathological behavior. So the logical consequence is to only let it iterate forward. If we have the mathematical syntax for the more complex cases then the restrictions of the simple m..n syntax shouldn't be a problem.

Actually, the mathematical syntax could even allow backward iteration in a straight-forward way: if 0<i<10 iterates forward, then 10>i>0 could iterate backward. That's much better than backward iteration with the m..n syntax because it doesn't happen implicitly.

And for custom step sizes I can easily do without syntactic sugar.

@ghost ghost assigned gavinking Nov 17, 2012
@tombentley
Copy link
Member

@gavinking SegmentOp isn't yet in the spec. I assume it support negative lengths?

@chochos
Copy link
Member

chochos commented Nov 22, 2012

Tom, There's no SegmentOp; x[a:b] shows up as an IndexOp IIRC (I already implemented it in js), but it has a different attribute when used with colon instead of periods.

@tombentley
Copy link
Member

@chochos there is! The typechecker lets me can say this

value seq = 0:3;

to create the sequence {0, 1, 2}.

@chochos
Copy link
Member

chochos commented Nov 22, 2012

oooooooh that's another thing entirely... but shouldn't that be a Range? Are you going to transform this at compile time into a sequence with all the elements you need, or should we create some kind of RangeSequence or something like that in the language module?

@gavinking
Copy link
Member Author

It's a Range for length>0, an Empty otherwise.

@chochos
Copy link
Member

chochos commented Nov 22, 2012

since this is not in the spec yet... what types does this operator accept? only Integers, or Ordinals in general?

@FroMage
Copy link
Member

FroMage commented Nov 22, 2012

Gavin: I believe this is your cue to update the spec ;)

@tombentley
Copy link
Member

Lhs is Ordinal and rhs is integer atm, but I don't think that works, which I'll explain after I've eaten...

@chochos
Copy link
Member

chochos commented Nov 22, 2012

That's what I was afraid of... so to get the end of the Range you have to calculate the Range end by looping rhs times incrementing the lhs...

@ikasiuk
Copy link
Member

ikasiuk commented Nov 22, 2012

@chochos We actually already mentioned a solution to this problem a few months ago:
ceylon/ceylon.language#92 (comment)

@chochos
Copy link
Member

chochos commented Nov 22, 2012

Right... but it hasn't been added yet and I don't know if it will. Meanwhile, calculating the end of the Range with the loop works; and having that method in Ordinal that returns another Ordinal at a certain distance will just be implemented with the same loop.

@ikasiuk
Copy link
Member

ikasiuk commented Nov 22, 2012

The idea is the same as with the distanceFrom method, i.e. implement it in an efficient way in the concrete class which satisfies Ordinal instead of using a loop.

A loop is not ok because we don't know how many iterations it will have, so it can too easily result in a almost-endless loop.

@tombentley
Copy link
Member

The problem I was thinking of was this:

maxInteger:2

Presumably the result should be {maxInteger, minInteger}, but I don't think we can't use Range to express this. Using a Range you'd have the sequence {maxIndex, maxIndex-1, ... minIndex+1, minIndex}. This means (start:length).size != length and also that the elements are not strictly increasing. With the {maxInteger, minInteger} interpretation we get the unintuitive result that the elements are not contiguous (though the difference between the first and last element is still 1)

All this is for the JVM, of course.

Maybe reconsidering ceylon/ceylon.language#50 (i.e. detecting the overflow and throwing) would help.

@tombentley
Copy link
Member

Corrected: I meant maxInteger in the example, not maxIndex

@tombentley
Copy link
Member

but I don't think we can't use Range

'can use Range', dammit! I really must learn English at some point.

@ikasiuk
Copy link
Member

ikasiuk commented Nov 22, 2012

It would be ok with me if maxInteger:2 just gives me {maxInteger} or {maxInteger, maxInteger}. Or the : operator could throw an exception (general overflow detection for arithmetic operators is not an option IMO because it makes integer arithmetic too inefficient).

@tombentley
Copy link
Member

I agree that detecting overflow in general too inefficient.

I admit I hadn't considered building knowledge of maxInteger (also maxCharacter) specifically into the generated code we use to generate the Range, but it seems like a specialy-casey hack to me. I suppose we could have some interface with an attribute to determine whether an instance was the maximum (or minimum) value. That would also allow a bit more abstraction over numbers than is possible currently, but it also feels like the top of a slippery slope of trying to describe the properties of number systems (e.g. signed zero, infinity, NaN, etc.).

@ikasiuk
Copy link
Member

ikasiuk commented Nov 22, 2012

I guess you're right. So apparently the only easy solution is to let the method in Ordinal (which we haven't defined yet) return maxInteger in this case so that maxInteger:2 results in {maxInteger}. I think I could live with that: it seems reasonably sane and at least not worse than the alternatives.

@tombentley
Copy link
Member

I think the problem stems from trying to use a Range to represent something which can't necessarily be represented as a Range. If you think of the case of iterating over a segment:

for (i in start:length) { ... }

And compare it with the Java equivalent:

for (int i = start; i < start + length; i++) { ... }

In Java it's left to the programmer to worry (or not) about what happens if start + length overflows. By defining a segment in terms of a Range we're forced to take this decision for the programmer, and there's no right answer. If we had a Segment class we could let the arithmetic overflow and get {maxInteger, minInteger} which is at least consistent with the 2s-complement behaviour of Integer arithmetic on the JVM, consistent with the length of the segment being the same as the size of the sequence and we get to leave it to the programmer to worry (or not) about the possibility of overflow.

@tombentley
Copy link
Member

@gavinking what do you think we should do for this problem with overflow?

@tombentley
Copy link
Member

Reopening to get @gavinking's attention and opinion about this overflow problem.

@tombentley tombentley reopened this Dec 4, 2012
@gavinking
Copy link
Member Author

@tombentley if you have a new related problem, please open a new issue for it. I don't want to discuss it here.

@gavinking
Copy link
Member Author

@tombentley P.S. That would be an issue for ceylon.language, of course. I resent the leakage of language module issues into the issue tracker for ceylon-spec ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants