Skip to content

Discussion: Range operator #198

HaloFour asked this question in General
Discussion: Range operator #198
3y ago · 83 answers

[Editor's note: this is a discussion regarding championed proposal #185. -@gafter]

Summary

I propose a general purpose range operator which can be used to obtain a range a integral values. This range can then be used in loops, patterns or slices:

var slice = source[5..10];
foreach (var i in 5..10) {
    Console.WriteLine(i);
}

Questions

  1. Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
    var range = 5..10;  // int[]?  IEnumerable<int>?  IReadOnlyList<int>?  Range<int>?  None of the above?
  2. What exact syntax should be supported? There are numerous examples found in other languages.
    var perl = source[5..10];
    var swift = source[5...10];
    var basic = source[5 to 10];
    var python = source[5:10];
  3. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
    var range1 = 5...10;  // 5, 6, 7, 8, 9, 10
    var swift = 5..<10;  // 5, 6, 7, 8, 9
    var perl = 5^..10; // 6, 7, 8, 9, 10
  4. Should the syntax include the ability to specify a step or a stride?
    var stride = 5:10:2; // 5, 7, 9
  5. Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
    var source = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var slice1 = source[-3:-1]; // { 8, 9, 10 }
  6. Should the range syntax be supported in pattern matching against integral (or comparable) types?
    int value = CalculateSomeValue();
    if (value is 10..50) {
        Console.WriteLine("The value is somewhere between 10 and 50!");
    }
    Point p = CalculateSomePoint();
    switch (p) {
        case (x, -25:-25):
            Console.WriteLine("The point is near the central axis!");
            break;
    }
  7. Should variables be permitted for boundaries of the range?
    var slice = source[start...end];
  8. Should the syntax allow for open-ended ranges, e.g. omitting either the start or the end boundary which indicates that the range would extend from the start or end of an enumerable? Can such a range be assigned to a variable or enumerated?
    var beginning = source[:10]; // grabs 0-10
    var end = source[10:]; // grabs 10-last
  9. What happens if the start value is greater than the end value? Should that be different if the values are literals v. variables?
    var source = new[] { 1, 2, 3, 4, 5 };
    var slice1 = source[4..2];  // compiler error? empty array? { 4, 3, 2 }?
    var start = 4;
    var end = 2;
    var slice2 = source[start..end]; // ?

Updates:

  1. Fixed foreach loop example
  2. Added questions 7 and 8.
  3. Added question 9 from #198 (comment)

Replies

83 comments

In my opinion:

  1. Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
    Yes, Range<T> : IEnumerable<T>

  2. What exact syntax should be supported? There are numerous examples found in other languages.
    e1..e2

  3. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
    Inclusive. ..< can be the exclusive operator.

  4. Should the syntax include the ability to specify a step or a stride?
    No.

  5. Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
    Yes, if negative indexing is added in general.

  6. Should the range syntax be supported in pattern matching against integral (or comparable) types?
    Yes.

And my own suggestion: range operator should be applicable to all types that support IComparable<T> (or SComparable<T>) and ++/-- operators (SSuccessive<T>?).

0 replies

@svick svick 3y ago
Collaborator

Previous discussion: dotnet/roslyn#7632.

for (var i in 5..10)

Should this have been foreach?

0 replies

Taking a page from Rust, there could be also a syntax for "open" ranges, ..5, ,5.., etc.

0 replies

There is very nice notation in mathematics for open and closed intervals.

[3..5) 
[6, 12]
(8, 12)
(9..11]

Unfortunately square and round brackets are not suitable for C# syntax.

0 replies

@svick svick 3y ago
Collaborator

@alrz What happens when you iterate left-open range? If I understand it correctly, in Rust, both kinds of open ranges are separate types and left-open range is not iterable.

0 replies

@svick I think so. That's particularly useful in slices and patterns.

0 replies

@svick

Yes, thanks for catching that. Will fix.

@orthoxerox

I tend to agree. What's more this Range struct can implement the enumerable pattern with a struct enumerator so that enumerating it should be more efficient. The compiler could take that a step further and unroll loops, turning an iterator variable into a constant.

0 replies

Do not think this is worthy feature. There is low demand for fixed integer ranges, and also for other types. Also, expanding such synetax for even less popular cases, is even more unjustified. Simply, too much resources for too small problems.

0 replies

@vbcodec

A range operator of sorts is likely given the championed proposal to add slicing to the language. Given such it is worthwhile to explore the syntax and capabilities of that operator both inside and outside of the context of slices.

Fixed ranges of integers are still pretty common, and using a range operator to describe them makes sense. Apple even completely removed the classic C for loop from their Swift language in favor of enumerating over a range.

0 replies

For the case of variables (7), what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

0 replies

@vladd

Good question. I added it to the list of questions above in the discussion.

0 replies

I would argue that we don't need it to be operator. Just extension method is enough

public static class Ext
{
    public IEnumerable<int> To(this int start,int end)
    {
        var sign = Math.Sign(end - start);
        while(start != end)
        {
             yield return start;
             start += sign;
        }

        yield return start;
    }
}

foreach(var i in 3.To(5)) // (3..5)
{
}

(ported from previous issue)

Also we could easily make similar function for exclusive

3.ExTo(5) // [3..5)
3.ToEx(5) // (3..5]
3.Until(5) // [3..5]

0 replies

what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

An empty range, and a warning if it's known at the compile time i.e. via constant bounds.

0 replies

@alrz @vladd Why shouldn't we iterate in reverse order?

0 replies

Because the default step is 1, and in slices it doesn't make sense.

0 replies

@jcouv The operator >= is not the expected compliment to < (to the average user, yes) but weirdly on a language implementation details level it's >.

0 replies

@jcouv jcouv 3y ago
Collaborator

@ufcpp Fantastic data points. Thanks! Glad to see the binary search approach does work. The code is harder to read than I expected though. I feel like code generation is probably a good fit in case the switch needs to be updated (I suspect that's what you did to test it out).

Yes, such optimizations are more suitable for roslyn repo. Please do file an issue with the experiments you collected.
Another data point that would be awesome is the typical size of switches (how often do people need massive switches like the one you describe?). I'll ask around, as I think we can analyze github repos as examples.

As far as planning, there is a significant rework for patterns (to support recursive patterns) which is starting. Optimizations based on completeness and exclusivity would probably come after that.

0 replies

@AdamSpeight2008

build script

Debug (/o-) build:

Length Name
74240 BinarySearchIf.dll
54272 LinearIf.dll
140800 Switch.dll
48128 SwitchWhen.dll

Release (/o+) build:

Length Name
31232 BinarySearchIf.dll
24064 LinearIf.dll
140800 Switch.dll
38912 SwitchWhen.dll
0 replies

@jcouv
I'll post an issue to Roslyn in a few days.

Yes, these codes are code-generated. The generator is here: https://github.com/ufcpp/GraphemeSplitter/blob/master/GraphemeBreakPropertyCodeGenerator/Program.cs

0 replies

In addition, it requires 50 seconds to compile the 20 thousand-line switch-case. (My PC is Core i7-4790 3.60GHz.)

0 replies

My vote... I pretty much agree with @orthoxerox.

1.Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
Yes, Range : IEnumerable

Agree: Let's not reuse the LINQ.Range. This is limiting the range ;) of application of the operator.
And I fear performance impact on the length calculation:
a..b --> new Range(a, b - a + 1)

2.What exact syntax should be supported? There are numerous examples found in other languages.
e1..e2

Agree again: Perl-style is clear and doesn't conflict with any existing C# notation.

3.Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
Inclusive. ..< can be the exclusive operator.

Partly agree: I'm not too enthousiastic about the exclusive operator. Maybe again something hard to handle in optimization.

4.Should the syntax include the ability to specify a step or a stride?
No.

Agree: That makes the syntax more fishy...but we can make this Range constructor accept a step or stride:
var even = new Range(0, 10, 2);

5.Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
Yes, if negative indexing is added in general.

This time: Not for me. Again a performance issue. In case the range is not constant, we must check if the boundaries values are positive, yield perfomance impact.

6.Should the range syntax be supported in pattern matching against integral (or comparable) types?
Yes.

This syntax would be nice:

// exit the awkward C-stype for-loop:
foreach (int i in 0..10) { ... }

// Let's keep it consistent in if:
if (myvar in 0..10) { ... }

// Let make it in switch/case:
switch (myvar) {
    case 0..10: {...}
        break;
}

7.Should variables be permitted for boundaries of the range?

Yes. definitly

8.Should the syntax allow for open-ended ranges

Would be nice. Although I don't think we could map it to the same Range type then as defined as in 1. (performance, again).

9.What happens if the start value is greater than the end value?

This should result in an empty range. Again, this choice is driven for a performance concern.
I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Why do I care so much about performance ? I really want the lazy/readable/sweet code to be as efficient as the verbose ancestral way to do it. Otherwise... It will just be a feature for dummies...

IMHO, I think the ultimate goal of this is to be able to write such a code, with the help of Span and this[Range] overload on ICollection, wihtout penalty on performance:

var myvar = new mytype[10];
...
myvar[2..4] = othervar[1..3];
0 replies

@Starwer

I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Should this be true even for end == int.MaxValue? That is, should

foreach (int i in (int.MaxValue-1)..int.MaxValue) { ... }

be an infinite loop?

0 replies

@vladd: I agree it's not ideal, but I still prefer that solution than adding a boundary-safe condition that would grow with a O(n) complexity with the loop, just in case we hit the int.MaxValue.

We already accept that the arithmetic around the corners requires user-awareness for the even simpler expressions like: var i = int.MaxValue; i++;
I consider this is still a scarce corner case, compared to the empty-range which is a frequent case.

You seem to have this awareness. So honestly, if you would know that the use of foreach/Range is more expensive than a simple for-loop because it is safer, would you ever use it ?

0 replies
  1. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?

The best way is to have upper bound exclusive.
examples:
exclusive

var a = new int[] {/*1,2,3,4,5*/}
var len = a.Length
var al = a[ .. len/2]
var ah = a[ len/2 .. ] // al.Concat(ah) = a
var aWithoutLast = a [ .. len-1 ]

inclusive (you need to substract 1 all the time)

var a = new int[] {  }
var len = a.Length
var al = a[ .. len/2-1 ]
var ah = a[ len/2 .. len-1 ] // al.Concat(ah) = a
var aWithoutLast = a [..len-2]
0 replies

Regarding using ranges in patterns, I think I'd prefer an explicit 'in' pattern:

var input = 9;
switch (input) {
    case in 0..10:
        // code
}

All our other patterns (const pattern, type pattern, var pattern) are all things which are the same type as the input, and we're doing an 'is' check on those. Using a bare range makes it look superficially like a const pattern, but it acts like a completely new type of pattern (doing an 'in' check instead of an 'is' check).

(I think this would also be clearer if we ever wanted to extend the switch pattern in future, to use any (non-const) collection object, instead of just a range literal:

switch (userId) {
    case in currentlyAssignedIds:
        // code

whereas I think just case currentlyAssignedIds: would require a bit extra mental overhead to interpret correctly.)

Also, using a bare range could make switching on a range confusing:

var range = 0..10;
switch (range) {
    case 0..10:
        // code
}

since there presumably the case pattern should be interpreted as a const pattern, rather than a range pattern?

Seeing how patterns can be used in if statements too, this would give us if (input is in 0..10), rather than if (input in 0..10). This is slightly more verbose, but only by three characters, and to me it feels a bit more consistent with the fact that all other pattern-based if statements use 'is'.

0 replies

I understand the benefits that go with the upper bound being exclusive but the syntax throws me off every time. Expanding, e.g, 3..6 in my head leads to 3, 4, 5, 6.

0 replies

I'll never understand the decision to make the range exclusive of the upper bound... I mean, the off-by-one issue is the very reason for introducing a "from end" index.

I think it's a pretty narrow view to look at ranges as applying to slicing only (which, let's be honest, is the only place where exclusive ranges make sense).

What other language out there has only exclusive ranges?

0 replies

@Richiban

My understanding is that the x:y range syntax will be more or less specific to slicing. There is a separate proposal for range/comparison patterns which will use a different syntax and will be inclusive.

0 replies

@HaloFour If it would be a language feature, especially syntax, then it should be generalize and synergize with similar purpose. Or at least the same pattern. So we would not have similar feature with very difference syntax

0 replies

I mean, the off-by-one issue is the very reason for introducing a "from end" index.

And they still messed this up, IMO. 🙄

0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
#️⃣
General
Converted from issue
Beta
You can’t perform that action at this time.