Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add completeness checking to pattern matching draft specification #188

Closed
agocke opened this issue Jan 31, 2015 · 39 comments
Closed

Comments

@agocke
Copy link
Member

agocke commented Jan 31, 2015

Background

As noted in issue #180, many modern programming programs are data-focused, especially distributed applications which tend to store, manipulate, and move sets of data between different storage and computation points. One solution proposed to deal with this issue is a combination of records and pattern matching. Records provide a simple way to declare and structure the data types and pattern matching provides a way to destructure and manipulate the data.

Problem

Records provide a great way to represent the data and pattern matching provides a great way to manipulate the data, but there is currently no mechanism in the #180 proposal to ensure that the data and the logic remain in sync. The nature of records and pattern matching is that the data declaration code is often far from the data consumption code. In a distributed system it's even more likely that a single data structure will be consumed and manipulated in various parts of the code base. If the data structure is ever modified, there is currently no mechanism in the draft to alert the programmer that all instances of manipulation logic must be updated.

Solution

Add completeness checking to certain switch statements on certain record types. The core of this proposal is to provide a warning when a switch statement does not handle every possible match on a type hierarchy. This proposal features two possible designs for this idea, presented in order of increasingly intrusive modification to the language.

Design 1

This design actually features no new syntax or semantics beyond that of proposal #180. The suggestion is to create a C# type heirarchy which can be guaranteed 'complete' with existing language features. In this case, complete means that it is not possible for a new subclass of the root member of the type hierarchy, so the compiler can be sure that any and all subclasses of the chosen switching type are visible in the current compilation.

We can construct this type hierarchy in existing C# with the following rules:

  1. All subclasses of the root type must be sealed, preventing any subclassing of any existing leaf types in the hierarchy.
  2. All constructors of the root type must be private, prevent any subclassing of the root type.
    • As a consequence, all subclasses must be inner classes and thus must be in source in the current compilation

Here's an example of the structure of this type hierarchy:

abstract class C
{
  private C() {}
  public sealed class C_1 : C {}
  public sealed class C_2 : C {}
      ...
  public sealed class C_n : C {}
}

This guarantees that switching on an instance of type C which explicitly matches C_1...C_n has matched against every possible instance of C. The only thing which changes about the language specification is a requirement that the compiler produce a warning when not all cases are matched.

Design 2

There are a few disadvantages to Design 1:

  1. The mandated structure is complicated and brittle. Forgetting to mark any of the subclasses as sealed or adding any public constructors won't produce a compiler error or warning, but the compiler will now silently skip the completeness check.
  2. The structure is verbose -- most of the sealed or private markers are mostly part of the 'incantation' of completeness and are not directly related to the task at hand.
  3. The relevant record instances are all nested classes, so all references require an extra layer of naming indirection.

Design 2 attempts to fix these problems by replacing much of the boiler plate with a new combination of modifiers on a type -- abstract + sealed. Under Design 2, marking the root type of a hierarchy as abstract sealed will cause the structure from Design 1 to be generated by the compiler in lowering. The following example demonstrates what the structure from Design 1 looks like with an abstract sealed type:

abstract sealed class C {}
public class C_1 : C {}
public class C_2 : C {}
   ...
public class C_n : C {}

In this case, most of the problems with Design 1 are solved, but new semantics are required to be added to the language.

@gafter gafter self-assigned this Jul 24, 2015
@gafter gafter changed the title Proposal: Add completeness checking to pattern matching draft specification Proposal: Add completeness checking (algebraic data types) to pattern matching draft specification Sep 10, 2015
@alrz
Copy link
Contributor

alrz commented Sep 30, 2015

This can be another option as well (based on Design 2):

public abstract case class C {}
public case class C_1 : C {}
public case class C_2 : C {}
   ...
public case class C_n : C {}

So case classes can only inherit from other case classes. Also, this helps to distinguish between case classes and regular ones, if they were defined in different files.

@gafter
Copy link
Member

gafter commented Sep 30, 2015

@alrz I don't get it. Would it be an error to extend C_1 in another assembly? Or is there some other rule that would allow the compiler to know that it sees all of the cases?

@alrz
Copy link
Contributor

alrz commented Sep 30, 2015

@gafter Quoting yourself, "It's a closed hierarchy of types" (like discriminated unions in F# but more flexible since you can inherit from other cases) so why should it be extendable in another assembly?

@gafter
Copy link
Member

gafter commented Sep 30, 2015

@alrz so case would mean "not extendable in another assembly"? Sort of a strange choice of keyword for that meaning.

@alrz
Copy link
Contributor

alrz commented Sep 30, 2015

@gafter Same keyword is used in Scala for exactly this purpose — implying that each case class corresponds to a case statement, I guess.

@gafter
Copy link
Member

gafter commented Sep 30, 2015

@alrz no, case in Scala does not restrict inheritance.

@alrz
Copy link
Contributor

alrz commented Sep 30, 2015

@gafter Yes, case has nothing to do with inheritance restriction in Scala, but my suggestion is not exactly what Scala is offering. Actually it's something in between of its syntax:

sealed abstract class C
case object C_1 extends C
case object C_2 extends C

and the proposed one:

sealed abstract class C {}
class C_1 : C {}
class C_2 : C {}

I'm trying to say that I think this is a more expressive syntax compared to above examples, for this specific use case:

abstract case class C {} // or sealed abstract case class?
case class C_1 : C {}
case class C_2 : C {}

Marking all classes with a unified keyword, indicating that these classes belong to a closed hierarchy.

PS: Although, this is just another option to consider. Except for this little concern, sealed abstract looks good to me.

@orthoxerox
Copy link
Contributor

Will this support multiple levels of inheritance?

abstract sealed class C {}
abstract sealed class D : C {}
public class C_1 : C {}
public class C_2 : C {}
public class D_1 : D {}
public class D_2 : D {}

So now the compiler will look for either C, (C_1 + C_2 + D) or (C_1 + C_2 + D_1 + D_2) when checking for completeness.

@gafter
Copy link
Member

gafter commented Oct 6, 2015

@orthoxerox Yes, we do not plan to make one level of inheritance any more special that a second level of inheritance. The compiler will have to build a decision tree and ensure that every path reachable from the root is handled.

@orthoxerox
Copy link
Contributor

@gafter great, I missed that in F#.

@gafter gafter added this to the C# 7 and VB 15 milestone Oct 21, 2015
@gafter gafter mentioned this issue Oct 27, 2015
49 tasks
@gafter
Copy link
Member

gafter commented Nov 3, 2015

See the excellent comparison of difference languages by @jonschoning in #5154 (comment)

@gafter gafter assigned agocke and gafter and unassigned gafter and agocke Nov 20, 2015
@gafter gafter removed this from the C# 7 and VB 15 milestone Nov 21, 2015
@dsaf
Copy link

dsaf commented Nov 29, 2015

Would C#7/8 promote having more of multiple types per file or the IDE will be capable of grouping them automatically?

@DavidArno
Copy link

As I understand this proposal, it would allow the creation of discriminated unions, such as

public abstract sealed class Option<T>
{
    class None();
    class Some<T>(T value);
}

And, because the compiler knows this is a complete hierarchy, I could pattern match as follows:

public string SomeFunction(Option<int> optionalValue) =>
    optionalValue switch
    {
        case None(): "It's a None!",
        case Some(var value): $"It's a {value}"
    };

In other words, the default (or * if we stick with that notation) case wouldn't be required.

Have I understood this correctly?

@gafter
Copy link
Member

gafter commented Feb 10, 2016

@DavidArno Yes, that is precisely correct.

@DavidArno
Copy link

@gafter,

Is this functionality implemented in any of the feature branches yet?

@KalitaAlexey
Copy link

I suggest ADT with the following syntax:

void Process(int | List<int> v)
{
    switch (v)
    {
        case int i:
            ProcessInt(I);
            break;
        case List<int> l:
            ProcessList(l);
    }
}

And to make ADT like type the following syntax:

// For structs
struct Type = int | bool;
// For classes or both structs and classes
class Type = int | List<int>;

Roslyn may enforce a user to write class when one of specified types is not a struct.

@svick
Copy link
Contributor

svick commented Apr 3, 2016

@KalitaAlexey I don't think you can all that ADT, it's more like union type. In ADTs, the cases have names and can be recursive (I can't tell if your proposal would allow that or not).

@KalitaAlexey
Copy link

I like how it is done in Rust. I think we could inherit their enum ADT.
In Haskell

data Expression = Number Int
                | Add Expression Expression
                | Minus Expression
                | Mult Expression Expression
                | Divide Expression Expression

I'd like to have In C#

enum Expression {
    Number(int Number),
    Add(Expression LeftOperand, Expression RightOperand),
    Minus(Expression Expression),
    Mult(Expression LeftOperand, Expression RightOperand),
    Divide(Expression LeftOperand, Expression RightOperand),
}

And pattern matching like

void ProcessExpression(Expression expression)
{
    switch (expression)
    {
    case Expression::Number n:
        ProcessNumber(n);
        break;
    case Expression::Add a:
        ProcessAdd(a);
        break;
    case Expression::Minus m:
        ProcessMinus(m);
        break;
    case Expression::Mult m:
        ProcessMult(m);
        break;
    }
    case Expression::Divide d:
        ProcessDivide(d);
        break;
}

@HaloFour
Copy link

HaloFour commented Apr 3, 2016

@KalitaAlexey #6739

@KalitaAlexey
Copy link

@HaloFour Thanks. What's the difference then?

@wekempf
Copy link

wekempf commented Apr 12, 2016

The discussion around this concept is scattered everywhere, so forgive me if I just repeat something said elsewhere. Option 1 has serious problems but is on the right track. Option 2 I don't care for because it scatters the declarations and muddles the concept. What we're modeling here is a discriminated union. @KalitaAlexey gets close to the syntax I'd prefer, but just using "enum" is at least confusing, if it doesn't actually cause parsing problem. I'd suggest (and saw others do so as well in other threads) "enum class".

public enum class Expression {
    Number(int Number),
    Add(Expression LeftOperand, Expression RightOperand),
    Minus(Expression Expression),
    Mult(Expression LeftOperand, Expression RightOperand),
    Divide(Expression LeftOperand, Expression RightOperand),
}

There's still lots of open questions after deciding on rough syntax like this, however. For instance, in this thread it was suggested a DU could have a type that's also a DU. I'm not sure that makes sense and would suggest not allowing that, knowing you can always add this feature in the future if that turns out to be the wrong decision but you can't remove a feature, ever. I just don't know when such a feature would actually be useful, and without a compelling use case it seems best to err on the conservative side.

I've seen other posts where the syntax is very similar to the above but "abstract sealed" is used instead of "enum class". Frankly, I think "abstract sealed" is highly confusing and gives no indication that one is building a DU, while "enum class" is intuitive.

@gafter gafter modified the milestones: 2.0 (RTM), 2.0 (RC), 3.0 Jun 26, 2016
@agocke agocke changed the title Proposal: Add completeness checking (algebraic data types) to pattern matching draft specification Proposal: Add completeness checking to pattern matching draft specification Feb 21, 2017
@gafter gafter closed this as completed Apr 21, 2017
@gafter
Copy link
Member

gafter commented Apr 21, 2017

Issue moved to dotnet/csharplang #486 via ZenHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment