New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Range operator #198

Open
HaloFour opened this Issue Feb 27, 2017 · 78 comments

Comments

Projects
None yet
@HaloFour
Contributor

HaloFour commented Feb 27, 2017

[Editor's note: this is a discussion regarding championed proposal #185. -@gafter]

Summary

I propose a general purpose range operator which can be used to obtain a range a integral values. This range can then be used in loops, patterns or slices:

var slice = source[5..10];
foreach (var i in 5..10) {
    Console.WriteLine(i);
}

Questions

  1. Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
    var range = 5..10;  // int[]?  IEnumerable<int>?  IReadOnlyList<int>?  Range<int>?  None of the above?
  2. What exact syntax should be supported? There are numerous examples found in other languages.
    var perl = source[5..10];
    var swift = source[5...10];
    var basic = source[5 to 10];
    var python = source[5:10];
  3. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
    var range1 = 5...10;  // 5, 6, 7, 8, 9, 10
    var swift = 5..<10;  // 5, 6, 7, 8, 9
    var perl = 5^..10; // 6, 7, 8, 9, 10
  4. Should the syntax include the ability to specify a step or a stride?
    var stride = 5:10:2; // 5, 7, 9
  5. Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
    var source = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var slice1 = source[-3:-1]; // { 8, 9, 10 }
  6. Should the range syntax be supported in pattern matching against integral (or comparable) types?
    int value = CalculateSomeValue();
    if (value is 10..50) {
        Console.WriteLine("The value is somewhere between 10 and 50!");
    }
    Point p = CalculateSomePoint();
    switch (p) {
        case (x, -25:-25):
            Console.WriteLine("The point is near the central axis!");
            break;
    }
  7. Should variables be permitted for boundaries of the range?
    var slice = source[start...end];
  8. Should the syntax allow for open-ended ranges, e.g. omitting either the start or the end boundary which indicates that the range would extend from the start or end of an enumerable? Can such a range be assigned to a variable or enumerated?
    var beginning = source[:10]; // grabs 0-10
    var end = source[10:]; // grabs 10-last
  9. What happens if the start value is greater than the end value? Should that be different if the values are literals v. variables?
    var source = new[] { 1, 2, 3, 4, 5 };
    var slice1 = source[4..2];  // compiler error? empty array? { 4, 3, 2 }?
    var start = 4;
    var end = 2;
    var slice2 = source[start..end]; // ?

Updates:

  1. Fixed foreach loop example
  2. Added questions 7 and 8.
  3. Added question 9 from #198 (comment)

@HaloFour HaloFour referenced this issue Feb 27, 2017

Open

Champion "slicing" / Range #185

1 of 5 tasks complete
@orthoxerox

This comment has been minimized.

Show comment
Hide comment
@orthoxerox

orthoxerox Feb 27, 2017

In my opinion:

  1. Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
    Yes, Range<T> : IEnumerable<T>

  2. What exact syntax should be supported? There are numerous examples found in other languages.
    e1..e2

  3. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
    Inclusive. ..< can be the exclusive operator.

  4. Should the syntax include the ability to specify a step or a stride?
    No.

  5. Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
    Yes, if negative indexing is added in general.

  6. Should the range syntax be supported in pattern matching against integral (or comparable) types?
    Yes.

And my own suggestion: range operator should be applicable to all types that support IComparable<T> (or SComparable<T>) and ++/-- operators (SSuccessive<T>?).

orthoxerox commented Feb 27, 2017

In my opinion:

  1. Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
    Yes, Range<T> : IEnumerable<T>

  2. What exact syntax should be supported? There are numerous examples found in other languages.
    e1..e2

  3. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
    Inclusive. ..< can be the exclusive operator.

  4. Should the syntax include the ability to specify a step or a stride?
    No.

  5. Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
    Yes, if negative indexing is added in general.

  6. Should the range syntax be supported in pattern matching against integral (or comparable) types?
    Yes.

And my own suggestion: range operator should be applicable to all types that support IComparable<T> (or SComparable<T>) and ++/-- operators (SSuccessive<T>?).

@svick

This comment has been minimized.

Show comment
Hide comment
@svick

svick Feb 27, 2017

Contributor

Previous discussion: dotnet/roslyn#7632.

for (var i in 5..10)

Should this have been foreach?

Contributor

svick commented Feb 27, 2017

Previous discussion: dotnet/roslyn#7632.

for (var i in 5..10)

Should this have been foreach?

@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 27, 2017

Contributor

Taking a page from Rust, there could be also a syntax for "open" ranges, ..5, ,5.., etc.

Contributor

alrz commented Feb 27, 2017

Taking a page from Rust, there could be also a syntax for "open" ranges, ..5, ,5.., etc.

@gafter gafter added the Discussion label Feb 27, 2017

@gordanr

This comment has been minimized.

Show comment
Hide comment
@gordanr

gordanr Feb 27, 2017

There is very nice notation in mathematics for open and closed intervals.

[3..5) 
[6, 12]
(8, 12)
(9..11]

Unfortunately square and round brackets are not suitable for C# syntax.

gordanr commented Feb 27, 2017

There is very nice notation in mathematics for open and closed intervals.

[3..5) 
[6, 12]
(8, 12)
(9..11]

Unfortunately square and round brackets are not suitable for C# syntax.

@svick

This comment has been minimized.

Show comment
Hide comment
@svick

svick Feb 27, 2017

Contributor

@alrz What happens when you iterate left-open range? If I understand it correctly, in Rust, both kinds of open ranges are separate types and left-open range is not iterable.

Contributor

svick commented Feb 27, 2017

@alrz What happens when you iterate left-open range? If I understand it correctly, in Rust, both kinds of open ranges are separate types and left-open range is not iterable.

@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 27, 2017

Contributor

@svick I think so. That's particularly useful in slices and patterns.

Contributor

alrz commented Feb 27, 2017

@svick I think so. That's particularly useful in slices and patterns.

@HaloFour

This comment has been minimized.

Show comment
Hide comment
@HaloFour

HaloFour Feb 27, 2017

Contributor

@svick

Yes, thanks for catching that. Will fix.

@orthoxerox

I tend to agree. What's more this Range struct can implement the enumerable pattern with a struct enumerator so that enumerating it should be more efficient. The compiler could take that a step further and unroll loops, turning an iterator variable into a constant.

Contributor

HaloFour commented Feb 27, 2017

@svick

Yes, thanks for catching that. Will fix.

@orthoxerox

I tend to agree. What's more this Range struct can implement the enumerable pattern with a struct enumerator so that enumerating it should be more efficient. The compiler could take that a step further and unroll loops, turning an iterator variable into a constant.

@vbcodec

This comment has been minimized.

Show comment
Hide comment
@vbcodec

vbcodec Feb 27, 2017

Do not think this is worthy feature. There is low demand for fixed integer ranges, and also for other types. Also, expanding such synetax for even less popular cases, is even more unjustified. Simply, too much resources for too small problems.

vbcodec commented Feb 27, 2017

Do not think this is worthy feature. There is low demand for fixed integer ranges, and also for other types. Also, expanding such synetax for even less popular cases, is even more unjustified. Simply, too much resources for too small problems.

@HaloFour

This comment has been minimized.

Show comment
Hide comment
@HaloFour

HaloFour Feb 27, 2017

Contributor

@vbcodec

A range operator of sorts is likely given the championed proposal to add slicing to the language. Given such it is worthwhile to explore the syntax and capabilities of that operator both inside and outside of the context of slices.

Fixed ranges of integers are still pretty common, and using a range operator to describe them makes sense. Apple even completely removed the classic C for loop from their Swift language in favor of enumerating over a range.

Contributor

HaloFour commented Feb 27, 2017

@vbcodec

A range operator of sorts is likely given the championed proposal to add slicing to the language. Given such it is worthwhile to explore the syntax and capabilities of that operator both inside and outside of the context of slices.

Fixed ranges of integers are still pretty common, and using a range operator to describe them makes sense. Apple even completely removed the classic C for loop from their Swift language in favor of enumerating over a range.

@vladd

This comment has been minimized.

Show comment
Hide comment
@vladd

vladd Feb 27, 2017

For the case of variables (7), what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

vladd commented Feb 27, 2017

For the case of variables (7), what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

@HaloFour

This comment has been minimized.

Show comment
Hide comment
@HaloFour

HaloFour Feb 28, 2017

Contributor

@vladd

Good question. I added it to the list of questions above in the discussion.

Contributor

HaloFour commented Feb 28, 2017

@vladd

Good question. I added it to the list of questions above in the discussion.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

I would argue that we don't need it to be operator. Just extension method is enough

public static class Ext
{
    public IEnumerable<int> To(this int start,int end)
    {
        var sign = Math.Sign(end - start);
        while(start != end)
        {
             yield return start;
             start += sign;
        }

        yield return start;
    }
}

foreach(var i in 3.To(5)) // (3..5)
{
}

(ported from previous issue)

Also we could easily make similar function for exclusive

3.ExTo(5) // [3..5)
3.ToEx(5) // (3..5]
3.Until(5) // [3..5]

Thaina commented Feb 28, 2017

I would argue that we don't need it to be operator. Just extension method is enough

public static class Ext
{
    public IEnumerable<int> To(this int start,int end)
    {
        var sign = Math.Sign(end - start);
        while(start != end)
        {
             yield return start;
             start += sign;
        }

        yield return start;
    }
}

foreach(var i in 3.To(5)) // (3..5)
{
}

(ported from previous issue)

Also we could easily make similar function for exclusive

3.ExTo(5) // [3..5)
3.ToEx(5) // (3..5]
3.Until(5) // [3..5]

@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 28, 2017

Contributor

what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

An empty range, and a warning if it's known at the compile time i.e. via constant bounds.

Contributor

alrz commented Feb 28, 2017

what should happen if begin is greater than end? Should this silently produce an empty enumeration or throw an exception?

An empty range, and a warning if it's known at the compile time i.e. via constant bounds.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

@alrz @vladd Why shouldn't we iterate in reverse order?

Thaina commented Feb 28, 2017

@alrz @vladd Why shouldn't we iterate in reverse order?

@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 28, 2017

Contributor

Because the default step is 1, and in slices it doesn't make sense.

Contributor

alrz commented Feb 28, 2017

Because the default step is 1, and in slices it doesn't make sense.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

@alrz I don't feel it weird to step 1 in any direction

5 to 3 mean 5 4 3 is also 1 step

Thaina commented Feb 28, 2017

@alrz I don't feel it weird to step 1 in any direction

5 to 3 mean 5 4 3 is also 1 step

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Feb 28, 2017

There are other types that are IComparable<T>

var atoz = 'a'..'z' // char
var daysinyear = JanFirst..DecThirtyFirst // DateTime

AdamSpeight2008 commented Feb 28, 2017

There are other types that are IComparable<T>

var atoz = 'a'..'z' // char
var daysinyear = JanFirst..DecThirtyFirst // DateTime
@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

@AdamSpeight2008 Actually it should work for all integer type. char included. But not for all IComparable<T> because float would too ambiguous

What would 0f.To(0.5f) will do?

Thaina commented Feb 28, 2017

@AdamSpeight2008 Actually it should work for all integer type. char included. But not for all IComparable<T> because float would too ambiguous

What would 0f.To(0.5f) will do?

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Feb 28, 2017

What would 0f.To(0.5f) will do?

yield 0f then done.

AdamSpeight2008 commented Feb 28, 2017

What would 0f.To(0.5f) will do?

yield 0f then done.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

I'm quite sure that if that is inclusive range we would want both 0 and 0.5f

Thaina commented Feb 28, 2017

I'm quite sure that if that is inclusive range we would want both 0 and 0.5f

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Feb 28, 2017

See VB LINQ Range for other concerning issues.

AdamSpeight2008 commented Feb 28, 2017

See VB LINQ Range for other concerning issues.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

@alrz Also reverse slice is not uncommon I think. Sometimes I made round robin algorithm it involve slice first half match with second half in backward

Normally I use LINQ

var users = new User[14];

var left = users.Take(7);
var right = users.Skip(7).Reverse();
var matches = left.Zip(right);

///Actually I do one line
var matches = users.Take(7).Zip(users.Skip(7).Reverse());

If it use slice

var users = new User[14];

var left = users.Slice(0.To(7));
var right = users.Slice(13.To(8)); // users index 0=13,1=12,2=11,3=10,4=9,5=8,6=7
var matches = left.Zip(right);

///
var matches = users[0.To(7)].Zip(users[13.To(8)]);

Thaina commented Feb 28, 2017

@alrz Also reverse slice is not uncommon I think. Sometimes I made round robin algorithm it involve slice first half match with second half in backward

Normally I use LINQ

var users = new User[14];

var left = users.Take(7);
var right = users.Skip(7).Reverse();
var matches = left.Zip(right);

///Actually I do one line
var matches = users.Take(7).Zip(users.Skip(7).Reverse());

If it use slice

var users = new User[14];

var left = users.Slice(0.To(7));
var right = users.Slice(13.To(8)); // users index 0=13,1=12,2=11,3=10,4=9,5=8,6=7
var matches = left.Zip(right);

///
var matches = users[0.To(7)].Zip(users[13.To(8)]);
@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 28, 2017

Contributor

I didn't say reverse range is not useful, but it should not go backwards if left-bound happen to be grather. It should be explicit with a -1 step value.

Contributor

alrz commented Feb 28, 2017

I didn't say reverse range is not useful, but it should not go backwards if left-bound happen to be grather. It should be explicit with a -1 step value.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Feb 28, 2017

@alrz Maybe we would look at what mathematicians do

If they write something like 9..3 are they assume negative step?

partial sum sum (i=9->1) i / 3 something like that?

Thaina commented Feb 28, 2017

@alrz Maybe we would look at what mathematicians do

If they write something like 9..3 are they assume negative step?

partial sum sum (i=9->1) i / 3 something like that?

@alrz

This comment has been minimized.

Show comment
Hide comment
@alrz

alrz Feb 28, 2017

Contributor

When your range bounds are not constant, it will cause bugs and you'd have no idea what's going on e.g. left..right should this be conditionally go backwards?

PS: In programming there is no such thing as "assumption" unless you Assert.

Contributor

alrz commented Feb 28, 2017

When your range bounds are not constant, it will cause bugs and you'd have no idea what's going on e.g. left..right should this be conditionally go backwards?

PS: In programming there is no such thing as "assumption" unless you Assert.

@GeirGrusom

This comment has been minimized.

Show comment
Hide comment
@GeirGrusom

GeirGrusom Feb 28, 2017

@eyalsk

Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?

Should be exclusive and have an inclusive operator.

Why? Why not just pick one and be done with it? Exclusive range: YAGNI.

GeirGrusom commented Feb 28, 2017

@eyalsk

Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?

Should be exclusive and have an inclusive operator.

Why? Why not just pick one and be done with it? Exclusive range: YAGNI.

@eyalsk

This comment has been minimized.

Show comment
Hide comment
@eyalsk

eyalsk Feb 28, 2017

Contributor

@GeirGrusom Sure, it can probably be added later iff it's necessary.

Contributor

eyalsk commented Feb 28, 2017

@GeirGrusom Sure, it can probably be added later iff it's necessary.

@vladd

This comment has been minimized.

Show comment
Hide comment
@vladd

vladd Feb 28, 2017

@GeirGrusom

Exclusive range: YAGNI.

Well, if both ends are included, then the simple iterating on the whole array would need explicit subtracting one from the length: [0..arr.length-1].
And given that having end < start might throw (we didn't reach the decision on this yet), inclusive range might need explicit checks for the case when length == 0. This would make the feature cumbersome to use.

vladd commented Feb 28, 2017

@GeirGrusom

Exclusive range: YAGNI.

Well, if both ends are included, then the simple iterating on the whole array would need explicit subtracting one from the length: [0..arr.length-1].
And given that having end < start might throw (we didn't reach the decision on this yet), inclusive range might need explicit checks for the case when length == 0. This would make the feature cumbersome to use.

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Jun 7, 2017

In VB For i = 0 To 10 is equivalent to For i = 0 To 10 Step 1'. a step value of one (1) is likely what most want. If they want something other the default they include it For i = 0 To 10 Step 0.1'.

AdamSpeight2008 commented Jun 7, 2017

In VB For i = 0 To 10 is equivalent to For i = 0 To 10 Step 1'. a step value of one (1) is likely what most want. If they want something other the default they include it For i = 0 To 10 Step 0.1'.

@quinmars

This comment has been minimized.

Show comment
Hide comment
@quinmars

quinmars Jun 13, 2017

I'd prefer if the .. syntax produce an interval of the value type Range<T> where T : IComparable<T>. In general an interval is inumerable. For special types like int there could be an extension method GetEnumerator(). Other types such as float would have explicit extension methods to generate an IEnumerable<T>, types like string wouldn't have any at all:

foreach (var i in 5..10)
{
    // do something
}

foreach (var f in 1.0 .. 3.14 .StepWidth(0.3))
{
    // do something
}

An interval can be very useful in many cases. Operations such as IntersectsWith, IntersectionWith could be useful for non numeric datatypes like 2D or 3D Points maybe even for strings:

var r = "A" .. "Henry" .IntersectionWith("Anton" .. "Z"); // r == "Anton" .. "Henry";

In my opinion, it would be a mistake to reduce ranges to enumerable sequences.

quinmars commented Jun 13, 2017

I'd prefer if the .. syntax produce an interval of the value type Range<T> where T : IComparable<T>. In general an interval is inumerable. For special types like int there could be an extension method GetEnumerator(). Other types such as float would have explicit extension methods to generate an IEnumerable<T>, types like string wouldn't have any at all:

foreach (var i in 5..10)
{
    // do something
}

foreach (var f in 1.0 .. 3.14 .StepWidth(0.3))
{
    // do something
}

An interval can be very useful in many cases. Operations such as IntersectsWith, IntersectionWith could be useful for non numeric datatypes like 2D or 3D Points maybe even for strings:

var r = "A" .. "Henry" .IntersectionWith("Anton" .. "Z"); // r == "Anton" .. "Henry";

In my opinion, it would be a mistake to reduce ranges to enumerable sequences.

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Jun 13, 2017

Really, I think we should satisfied with just extension method

Span<int> oneToTen = 1.To(10);
Span<float> floatStep = 0.1f.To(3.14f,0.3f);

Thaina commented Jun 13, 2017

Really, I think we should satisfied with just extension method

Span<int> oneToTen = 1.To(10);
Span<float> floatStep = 0.1f.To(3.14f,0.3f);
@Richiban

This comment has been minimized.

Show comment
Hide comment
@Richiban

Richiban Oct 9, 2017

@Thaina I'm not sure that's going to work too well, since you'd have to create loads of extension methods (or create an extension method on object or T which is not a good idea for the BCL).

Introducing a new struct Range<T> to the BCL with special syntax support (a..b => new Range<_>(a, b)) is very reasonable. Then changes & upgrades can be made to the Range class (such as when Shapes come along) and no one will have to change any code.

The new Range type makes it easy to write your own type that accepts a range (e.g. in an indexer).

Richiban commented Oct 9, 2017

@Thaina I'm not sure that's going to work too well, since you'd have to create loads of extension methods (or create an extension method on object or T which is not a good idea for the BCL).

Introducing a new struct Range<T> to the BCL with special syntax support (a..b => new Range<_>(a, b)) is very reasonable. Then changes & upgrades can be made to the Range class (such as when Shapes come along) and no one will have to change any code.

The new Range type makes it easy to write your own type that accepts a range (e.g. in an indexer).

@ufcpp

This comment has been minimized.

Show comment
Hide comment
@ufcpp

ufcpp Oct 29, 2017

I want "range pattern" especial in case statements.

switch (x)
{
    case 1..2:
   // ...
}

I recently implemented a Unicode grapheme breaking algorithm. In this library, a switch statement generated from GraphemeBreakProperty table has about 20 thousand lines.
A if statement version of that is much shorter but 25 times slower than the switch.

If C# have "range pattern", this switch becomes much shorter.

// without range
            switch (codePoint)
            {
                case 1536:
                case 1537:
                case 1538:
                case 1539:
                case 1540:
                case 1541:
                // ...

// with range
            switch (codePoint)
            {
                case 1536..1541:
                // ...

ufcpp commented Oct 29, 2017

I want "range pattern" especial in case statements.

switch (x)
{
    case 1..2:
   // ...
}

I recently implemented a Unicode grapheme breaking algorithm. In this library, a switch statement generated from GraphemeBreakProperty table has about 20 thousand lines.
A if statement version of that is much shorter but 25 times slower than the switch.

If C# have "range pattern", this switch becomes much shorter.

// without range
            switch (codePoint)
            {
                case 1536:
                case 1537:
                case 1538:
                case 1539:
                case 1540:
                case 1541:
                // ...

// with range
            switch (codePoint)
            {
                case 1536..1541:
                // ...
@jcouv

This comment has been minimized.

Show comment
Hide comment
@jcouv

jcouv Oct 30, 2017

Member

@ufcpp This code can already be simplified with when filters:

        switch (x)
        {
            case int x1 when x1 >= 0 && x1 < 10:
                break;
            case int x2 when x2 >=10 && x2 < 20:
                break;
        }
Member

jcouv commented Oct 30, 2017

@ufcpp This code can already be simplified with when filters:

        switch (x)
        {
            case int x1 when x1 >= 0 && x1 < 10:
                break;
            case int x2 when x2 >=10 && x2 < 20:
                break;
        }
@orthoxerox

This comment has been minimized.

Show comment
Hide comment
@orthoxerox

orthoxerox Oct 30, 2017

@jcouv but will it be compiled into a switch jump table?

orthoxerox commented Oct 30, 2017

@jcouv but will it be compiled into a switch jump table?

@jcouv

This comment has been minimized.

Show comment
Hide comment
@jcouv

jcouv Oct 30, 2017

Member

@orthoxerox I don't think that is a language design concern (the language spec does not specify such things), but rather an implementation concern.

You can see the current decompiled for using the when filters here.
It is straightforward (and probably not the most performance for @ufcpp's case). It could be optimized further in the future when the binding for switch learns that x < 10 and x >= 10 are exclusive.

I had missed the benchmark for a linear if version from @ufcpp. For that scenario, manually implementing a binary-search if version would be faster than a linear if and while less verbose than fully expanded switch.

Member

jcouv commented Oct 30, 2017

@orthoxerox I don't think that is a language design concern (the language spec does not specify such things), but rather an implementation concern.

You can see the current decompiled for using the when filters here.
It is straightforward (and probably not the most performance for @ufcpp's case). It could be optimized further in the future when the binding for switch learns that x < 10 and x >= 10 are exclusive.

I had missed the benchmark for a linear if version from @ufcpp. For that scenario, manually implementing a binary-search if version would be faster than a linear if and while less verbose than fully expanded switch.

@ufcpp

This comment has been minimized.

Show comment
Hide comment
@ufcpp

ufcpp Oct 31, 2017

@jcouv
I've tried it.

Method Mean Error StdDev
ByBinarySearch 4.642 ms 0.0468 ms 0.0415 ms
BySwitch 3.825 ms 0.0473 ms 0.0442 ms
BySwitchWhen 115.310 ms 2.7970 ms 2.9928 ms
ByIf 142.658 ms 1.3744 ms 1.1477 ms
ByBinaryIf 2.502 ms 0.0147 ms 0.0137 ms

As you said, I should use a binary-search.
Any way, do you have any plant to optimize this kind of case when clause? (Is this roslyn's matter rather than cshaplang's?)

ufcpp commented Oct 31, 2017

@jcouv
I've tried it.

Method Mean Error StdDev
ByBinarySearch 4.642 ms 0.0468 ms 0.0415 ms
BySwitch 3.825 ms 0.0473 ms 0.0442 ms
BySwitchWhen 115.310 ms 2.7970 ms 2.9928 ms
ByIf 142.658 ms 1.3744 ms 1.1477 ms
ByBinaryIf 2.502 ms 0.0147 ms 0.0137 ms

As you said, I should use a binary-search.
Any way, do you have any plant to optimize this kind of case when clause? (Is this roslyn's matter rather than cshaplang's?)

@Thaina

This comment has been minimized.

Show comment
Hide comment
@Thaina

Thaina Oct 31, 2017

Great experiment! I wish in the future the compiler would convert all switch to if-else-if

Thaina commented Oct 31, 2017

Great experiment! I wish in the future the compiler would convert all switch to if-else-if

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Oct 31, 2017

@ufcpp Could you also get the IL size for each of those implementations?

AdamSpeight2008 commented Oct 31, 2017

@ufcpp Could you also get the IL size for each of those implementations?

@AdamSpeight2008

This comment has been minimized.

Show comment
Hide comment
@AdamSpeight2008

AdamSpeight2008 Oct 31, 2017

@jcouv The operator >= is not the expected compliment to < (to the average user, yes) but weirdly on a language implementation details level it's >.

AdamSpeight2008 commented Oct 31, 2017

@jcouv The operator >= is not the expected compliment to < (to the average user, yes) but weirdly on a language implementation details level it's >.

@jcouv

This comment has been minimized.

Show comment
Hide comment
@jcouv

jcouv Oct 31, 2017

Member

@ufcpp Fantastic data points. Thanks! Glad to see the binary search approach does work. The code is harder to read than I expected though. I feel like code generation is probably a good fit in case the switch needs to be updated (I suspect that's what you did to test it out).

Yes, such optimizations are more suitable for roslyn repo. Please do file an issue with the experiments you collected.
Another data point that would be awesome is the typical size of switches (how often do people need massive switches like the one you describe?). I'll ask around, as I think we can analyze github repos as examples.

As far as planning, there is a significant rework for patterns (to support recursive patterns) which is starting. Optimizations based on completeness and exclusivity would probably come after that.

Member

jcouv commented Oct 31, 2017

@ufcpp Fantastic data points. Thanks! Glad to see the binary search approach does work. The code is harder to read than I expected though. I feel like code generation is probably a good fit in case the switch needs to be updated (I suspect that's what you did to test it out).

Yes, such optimizations are more suitable for roslyn repo. Please do file an issue with the experiments you collected.
Another data point that would be awesome is the typical size of switches (how often do people need massive switches like the one you describe?). I'll ask around, as I think we can analyze github repos as examples.

As far as planning, there is a significant rework for patterns (to support recursive patterns) which is starting. Optimizations based on completeness and exclusivity would probably come after that.

@ufcpp

This comment has been minimized.

Show comment
Hide comment
@ufcpp

ufcpp Oct 31, 2017

@AdamSpeight2008

build script

Debug (/o-) build:

Length Name
74240 BinarySearchIf.dll
54272 LinearIf.dll
140800 Switch.dll
48128 SwitchWhen.dll

Release (/o+) build:

Length Name
31232 BinarySearchIf.dll
24064 LinearIf.dll
140800 Switch.dll
38912 SwitchWhen.dll

ufcpp commented Oct 31, 2017

@AdamSpeight2008

build script

Debug (/o-) build:

Length Name
74240 BinarySearchIf.dll
54272 LinearIf.dll
140800 Switch.dll
48128 SwitchWhen.dll

Release (/o+) build:

Length Name
31232 BinarySearchIf.dll
24064 LinearIf.dll
140800 Switch.dll
38912 SwitchWhen.dll
@ufcpp

This comment has been minimized.

Show comment
Hide comment
@ufcpp

ufcpp Oct 31, 2017

@jcouv
I'll post an issue to Roslyn in a few days.

Yes, these codes are code-generated. The generator is here: https://github.com/ufcpp/GraphemeSplitter/blob/master/GraphemeBreakPropertyCodeGenerator/Program.cs

ufcpp commented Oct 31, 2017

@jcouv
I'll post an issue to Roslyn in a few days.

Yes, these codes are code-generated. The generator is here: https://github.com/ufcpp/GraphemeSplitter/blob/master/GraphemeBreakPropertyCodeGenerator/Program.cs

@ufcpp

This comment has been minimized.

Show comment
Hide comment
@ufcpp

ufcpp Oct 31, 2017

In addition, it requires 50 seconds to compile the 20 thousand-line switch-case. (My PC is Core i7-4790 3.60GHz.)

ufcpp commented Oct 31, 2017

In addition, it requires 50 seconds to compile the 20 thousand-line switch-case. (My PC is Core i7-4790 3.60GHz.)

@Starwer

This comment has been minimized.

Show comment
Hide comment
@Starwer

Starwer Apr 3, 2018

My vote... I pretty much agree with @orthoxerox.

1.Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
Yes, Range : IEnumerable

Agree: Let's not reuse the LINQ.Range. This is limiting the range ;) of application of the operator.
And I fear performance impact on the length calculation:
a..b --> new Range(a, b - a + 1)

2.What exact syntax should be supported? There are numerous examples found in other languages.
e1..e2

Agree again: Perl-style is clear and doesn't conflict with any existing C# notation.

3.Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
Inclusive. ..< can be the exclusive operator.

Partly agree: I'm not too enthousiastic about the exclusive operator. Maybe again something hard to handle in optimization.

4.Should the syntax include the ability to specify a step or a stride?
No.

Agree: That makes the syntax more fishy...but we can make this Range constructor accept a step or stride:
var even = new Range(0, 10, 2);

5.Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
Yes, if negative indexing is added in general.

This time: Not for me. Again a performance issue. In case the range is not constant, we must check if the boundaries values are positive, yield perfomance impact.

6.Should the range syntax be supported in pattern matching against integral (or comparable) types?
Yes.

This syntax would be nice:

// exit the awkward C-stype for-loop:
foreach (int i in 0..10) { ... }

// Let's keep it consistent in if:
if (myvar in 0..10) { ... }

// Let make it in switch/case:
switch (myvar) {
    case 0..10: {...}
        break;
}

7.Should variables be permitted for boundaries of the range?

Yes. definitly

8.Should the syntax allow for open-ended ranges

Would be nice. Although I don't think we could map it to the same Range type then as defined as in 1. (performance, again).

9.What happens if the start value is greater than the end value?

This should result in an empty range. Again, this choice is driven for a performance concern.
I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Why do I care so much about performance ? I really want the lazy/readable/sweet code to be as efficient as the verbose ancestral way to do it. Otherwise... It will just be a feature for dummies...

IMHO, I think the ultimate goal of this is to be able to write such a code, with the help of Span and this[Range] overload on ICollection, wihtout penalty on performance:

var myvar = new mytype[10];
...
myvar[2..4] = othervar[1..3];

Starwer commented Apr 3, 2018

My vote... I pretty much agree with @orthoxerox.

1.Can a range expression be assigned directly to a variable? If so, what is the type of that expression?
Yes, Range : IEnumerable

Agree: Let's not reuse the LINQ.Range. This is limiting the range ;) of application of the operator.
And I fear performance impact on the length calculation:
a..b --> new Range(a, b - a + 1)

2.What exact syntax should be supported? There are numerous examples found in other languages.
e1..e2

Agree again: Perl-style is clear and doesn't conflict with any existing C# notation.

3.Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?
Inclusive. ..< can be the exclusive operator.

Partly agree: I'm not too enthousiastic about the exclusive operator. Maybe again something hard to handle in optimization.

4.Should the syntax include the ability to specify a step or a stride?
No.

Agree: That makes the syntax more fishy...but we can make this Range constructor accept a step or stride:
var even = new Range(0, 10, 2);

5.Should the syntax include specifying a range relative to an anchor, such as the end of an array/collection?
Yes, if negative indexing is added in general.

This time: Not for me. Again a performance issue. In case the range is not constant, we must check if the boundaries values are positive, yield perfomance impact.

6.Should the range syntax be supported in pattern matching against integral (or comparable) types?
Yes.

This syntax would be nice:

// exit the awkward C-stype for-loop:
foreach (int i in 0..10) { ... }

// Let's keep it consistent in if:
if (myvar in 0..10) { ... }

// Let make it in switch/case:
switch (myvar) {
    case 0..10: {...}
        break;
}

7.Should variables be permitted for boundaries of the range?

Yes. definitly

8.Should the syntax allow for open-ended ranges

Would be nice. Although I don't think we could map it to the same Range type then as defined as in 1. (performance, again).

9.What happens if the start value is greater than the end value?

This should result in an empty range. Again, this choice is driven for a performance concern.
I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Why do I care so much about performance ? I really want the lazy/readable/sweet code to be as efficient as the verbose ancestral way to do it. Otherwise... It will just be a feature for dummies...

IMHO, I think the ultimate goal of this is to be able to write such a code, with the help of Span and this[Range] overload on ICollection, wihtout penalty on performance:

var myvar = new mytype[10];
...
myvar[2..4] = othervar[1..3];
@vladd

This comment has been minimized.

Show comment
Hide comment
@vladd

vladd Apr 3, 2018

@Starwer

I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Should this be true even for end == int.MaxValue? That is, should

foreach (int i in (int.MaxValue-1)..int.MaxValue) { ... }

be an infinite loop?

vladd commented Apr 3, 2018

@Starwer

I believe those two expressions shall be equivalent, also performance-wise, no matter what (strange) value the variables start and end may contain:

for (int i=start; i <= end; i++) { ... }
foreach (int i in start..end) { ... }

Should this be true even for end == int.MaxValue? That is, should

foreach (int i in (int.MaxValue-1)..int.MaxValue) { ... }

be an infinite loop?

@Starwer

This comment has been minimized.

Show comment
Hide comment
@Starwer

Starwer Apr 4, 2018

@vladd: I agree it's not ideal, but I still prefer that solution than adding a boundary-safe condition that would grow with a O(n) complexity with the loop, just in case we hit the int.MaxValue.

We already accept that the arithmetic around the corners requires user-awareness for the even simpler expressions like: var i = int.MaxValue; i++;
I consider this is still a scarce corner case, compared to the empty-range which is a frequent case.

You seem to have this awareness. So honestly, if you would know that the use of foreach/Range is more expensive than a simple for-loop because it is safer, would you ever use it ?

Starwer commented Apr 4, 2018

@vladd: I agree it's not ideal, but I still prefer that solution than adding a boundary-safe condition that would grow with a O(n) complexity with the loop, just in case we hit the int.MaxValue.

We already accept that the arithmetic around the corners requires user-awareness for the even simpler expressions like: var i = int.MaxValue; i++;
I consider this is still a scarce corner case, compared to the empty-range which is a frequent case.

You seem to have this awareness. So honestly, if you would know that the use of foreach/Range is more expensive than a simple for-loop because it is safer, would you ever use it ?

@TheRWe

This comment has been minimized.

Show comment
Hide comment
@TheRWe

TheRWe Jun 5, 2018

  1. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?

The best way is to have upper bound exclusive.
examples:
exclusive

var a = new int[] {/*1,2,3,4,5*/}
var len = a.Length
var al = a[ .. len/2]
var ah = a[ len/2 .. ] // al.Concat(ah) = a
var aWithoutLast = a [ .. len-1 ]

inclusive (you need to substract 1 all the time)

var a = new int[] {  }
var len = a.Length
var al = a[ .. len/2-1 ]
var ah = a[ len/2 .. len-1 ] // al.Concat(ah) = a
var aWithoutLast = a [..len-2]

TheRWe commented Jun 5, 2018

  1. Should the upper boundary be inclusive or exclusive? Should the syntax of the operator support explicit inclusivity?

The best way is to have upper bound exclusive.
examples:
exclusive

var a = new int[] {/*1,2,3,4,5*/}
var len = a.Length
var al = a[ .. len/2]
var ah = a[ len/2 .. ] // al.Concat(ah) = a
var aWithoutLast = a [ .. len-1 ]

inclusive (you need to substract 1 all the time)

var a = new int[] {  }
var len = a.Length
var al = a[ .. len/2-1 ]
var ah = a[ len/2 .. len-1 ] // al.Concat(ah) = a
var aWithoutLast = a [..len-2]
@james-uffindell-granta

This comment has been minimized.

Show comment
Hide comment
@james-uffindell-granta

james-uffindell-granta Jun 25, 2018

Regarding using ranges in patterns, I think I'd prefer an explicit 'in' pattern:

var input = 9;
switch (input) {
    case in 0..10:
        // code
}

All our other patterns (const pattern, type pattern, var pattern) are all things which are the same type as the input, and we're doing an 'is' check on those. Using a bare range makes it look superficially like a const pattern, but it acts like a completely new type of pattern (doing an 'in' check instead of an 'is' check).

(I think this would also be clearer if we ever wanted to extend the switch pattern in future, to use any (non-const) collection object, instead of just a range literal:

switch (userId) {
    case in currentlyAssignedIds:
        // code

whereas I think just case currentlyAssignedIds: would require a bit extra mental overhead to interpret correctly.)

Also, using a bare range could make switching on a range confusing:

var range = 0..10;
switch (range) {
    case 0..10:
        // code
}

since there presumably the case pattern should be interpreted as a const pattern, rather than a range pattern?

Seeing how patterns can be used in if statements too, this would give us if (input is in 0..10), rather than if (input in 0..10). This is slightly more verbose, but only by three characters, and to me it feels a bit more consistent with the fact that all other pattern-based if statements use 'is'.

james-uffindell-granta commented Jun 25, 2018

Regarding using ranges in patterns, I think I'd prefer an explicit 'in' pattern:

var input = 9;
switch (input) {
    case in 0..10:
        // code
}

All our other patterns (const pattern, type pattern, var pattern) are all things which are the same type as the input, and we're doing an 'is' check on those. Using a bare range makes it look superficially like a const pattern, but it acts like a completely new type of pattern (doing an 'in' check instead of an 'is' check).

(I think this would also be clearer if we ever wanted to extend the switch pattern in future, to use any (non-const) collection object, instead of just a range literal:

switch (userId) {
    case in currentlyAssignedIds:
        // code

whereas I think just case currentlyAssignedIds: would require a bit extra mental overhead to interpret correctly.)

Also, using a bare range could make switching on a range confusing:

var range = 0..10;
switch (range) {
    case 0..10:
        // code
}

since there presumably the case pattern should be interpreted as a const pattern, rather than a range pattern?

Seeing how patterns can be used in if statements too, this would give us if (input is in 0..10), rather than if (input in 0..10). This is slightly more verbose, but only by three characters, and to me it feels a bit more consistent with the fact that all other pattern-based if statements use 'is'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment