Discriminated union types #9163

Merged: 6 commits, merged Jun 17, 2016

@ahejlsberg (Member) commented Jun 14, 2016

This PR implements support for discriminated union types, inspired by suggestions in #186 and #1003. Specifically, we now support type guards that narrow union types based on tests of a discriminant property and furthermore extend that capability to switch statements. Some examples:

interface Square {
    kind: "square";
    size: number;
}

interface Rectangle {
    kind: "rectangle";
    width: number;
    height: number;
}

interface Circle {
    kind: "circle";
    radius: number;
}

type Shape = Square | Rectangle | Circle;

function area(s: Shape) {
    // In the following switch statement, the type of s is narrowed in each case clause
    // according to the value of the discriminant property, thus allowing the other properties
    // of that variant to be accessed without a type assertion.
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
}

function test1(s: Shape) {
    if (s.kind === "square") {
        s;  // Square
    }
    else {
        s;  // Rectangle | Circle
    }
}

function test2(s: Shape) {
    if (s.kind === "square" || s.kind === "rectangle") {
        return;
    }
    s;  // Circle
}

A discriminant property type guard is an expression of the form x.p == v, x.p === v, x.p != v, or x.p !== v, where p is a property name and v is an expression of a string literal type or a union of string literal types. The discriminant property type guard narrows the type of x to those constituent types of x that have a discriminant property p with one of the possible values of v. For != and !==, the same narrowing applies in the false branch.

Note that we currently only support discriminant properties of string literal types. We intend to later add support for boolean and numeric literal types.
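As a rough illustration of the guard forms listed above, here is a sketch that redeclares the `Shape` types from the description so it stands alone (`describeShape` is a hypothetical helper). It shows that `!==` narrows the true branch to the other constituents, and that the loose `==` form narrows the same way as `===`:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

// Hypothetical helper: every property access below relies on discriminant narrowing.
function describeShape(s: Shape): string {
    if (s.kind !== "circle") {
        // s: Square | Rectangle
        if (s.kind == "square") {
            // s: Square (the loose == form narrows too)
            return `square ${s.size}`;
        }
        // s: Rectangle
        return `rectangle ${s.width}x${s.height}`;
    }
    // s: Circle (false branch of the !== guard)
    return `circle ${s.radius}`;
}
```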

if (!hasNonEmptyDefault) {
addAntecedent(postSwitchLabel, preSwitchCaseFlow);
const hasDefault = forEach(node.caseBlock.clauses, c => c.kind === SyntaxKind.DefaultClause);
// We mark a switch statement as possibly exhaustive if it has no default clause and if all

Member:

Why is a switch/case non-exhaustive if it has a default? I feel like if it does, it is definitely exhaustive (because it accounts for all cases the user hasn't explicitly accounted for). Can you also document the answer in a comment here?

Member Author:

If it has a default clause it is definitely exhaustive and not just possibly exhaustive. We already handle the definitely exhaustive case through normal control flow analysis (i.e. if all branches exit the post-switch label will have no antecedents).
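To make the distinction concrete, here is a sketch (same `Shape` types as in the description) of a switch whose last clause is a `default`. Every branch returns, so ordinary control flow analysis already sees that the end of the function is unreachable, and inside `default` the type of `s` is narrowed to the cases not handled above, here `Circle`:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        // Definitely exhaustive: the default clause catches whatever remains.
        default: return Math.PI * s.radius * s.radius; // s: Circle
    }
    // No code here: every branch above returns, so this point is unreachable.
}
```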

@zpdDG4gta8XKpMCd

what is the officially recommended way to do exhaustive checks?

@ahejlsberg (Member Author)

@Aleksey-Bykov To check for exhaustiveness, you can add a default clause where you pass the narrowed object to a function that requires an argument of type never. This produces a compile-time error if you're missing one or more cases. For example:

function assertNever(x: never): never {
    throw new Error("Unexpected object: " + x);
}

function area(s: Shape) {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
        default: return assertNever(s);  // Error here if there are missing cases
    }
}
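The `assertNever` approach also helps at run time: if a value that the types rule out arrives anyway, the call throws. In the sketch below, the `bogus` object is a hypothetical stand-in for unchecked input (for example, parsed JSON) and is forced through with a cast:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function assertNever(x: never): never {
    throw new Error("Unexpected object: " + x);
}

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
        default: return assertNever(s); // compile error here if a case is missing
    }
}

// Hypothetical input that bypassed the type system; the cast hides the bad kind.
const bogus = { kind: "triangle" } as unknown as Shape;
```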

@ahejlsberg (Member Author)

@Aleksey-Bykov Or you can just put the return assertNever(s) call after the switch statement:

function assertNever(x: never): never {
    throw new Error("Unexpected object: " + x);
}

function area(s: Shape) {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    return assertNever(s);  // Error here if there are missing cases
}

@zpdDG4gta8XKpMCd commented Jun 14, 2016

wait a second

function test1(s: Shape) {
    if (s.kind === "square") { // <--- huh??
        s;  // Square
    }
    else {
        s;  // Rectangle | Circle
    }
}

does this mean #7447?

@ahejlsberg (Member Author) commented Jun 15, 2016

@Aleksey-Bykov No, #7447 is a separate issue, but they're sort of related. The type guard s.kind === "square" has the effect of narrowing s to type Square. That in turn means that s.kind has type "square" within the guarded block (because that's the type of the kind property in Square). So, effectively the kind property is narrowed as well.
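A small sketch of that effect, using a hypothetical `squareKind` helper and the `Shape` types from the description: the annotation on `k` only type-checks because narrowing `s` to `Square` also gives `s.kind` the type `"square"`.

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function squareKind(s: Shape): "square" | undefined {
    if (s.kind === "square") {
        // s is narrowed to Square, so s.kind has the literal type "square" here.
        const k: "square" = s.kind;
        return k;
    }
    return undefined;
}
```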

return type;
}
const clauseTypes = switchTypes.slice(clauseStart, clauseEnd);
const hasDefaultClause = clauseStart === clauseEnd || contains(clauseTypes, undefined);

Member:

So this is if it implicitly has a default, or clauseTypes implicitly encodes an explicit default through undefined in place of a type?

Member Author:

Yes. We use clauseStart === clauseEnd for the implicit default that falls out the bottom of the switch statement, and we use undefined to mark explicit default clauses.

@ivogabe (Contributor) commented Jun 15, 2016

@ahejlsberg What do you think of this idea: If x is of a union type, and property x.y is narrowed, then the type of x is filtered to the union parts based on the narrowed type of x.y. Then we can use type guards on properties to narrow the containing object. That would then require #7447 for the use case of this PR. That would allow such cases:

class Foo { foo: any; }
class Bar { bar: any; }
interface HasFoo {
  y: Foo;
  a: number;
}
interface HasBar {
  y: Bar;
  b: string;
}
const x: HasFoo | HasBar = ...;
if (x.y instanceof Foo) {
  x.a; // x: HasFoo
} else {
  x.b; // x: HasBar
}

@ahejlsberg (Member Author)

@ivogabe That's an interesting idea. Other than the Foo and Bar example, are there scenarios for which this is particularly compelling? I can't think of any offhand, but I might be missing something.

One concern is how this would affect performance. It seems that for every reference to x in a control flow graph we would now have to examine every type guard that has x as a base name in a dotted name. In other words, in order to know the type of x we'd have to look at all type guards for properties of x. That has the potential to generate a lot of work.

@ivogabe (Contributor) commented Jun 15, 2016

@ahejlsberg I didn't have a specific use case in mind; the idea came to me while I was working on my thesis. In theory it is a good idea to reuse the same logic for such cases, as it gives the most accurate results. I'm not sure how this would affect the performance of the compiler. I expect the impact to be small, given that this is limited to union types, but it would take an implementation to know for sure.

@danquirk (Member)

@ivogabe perhaps see #1260

@jesseschalken (Contributor)

To check for exhaustiveness you can add a default clause where you pass the narrowed object to a method that requires an argument of type never.

If the return type excludes undefined, can that serve as an exhaustiveness check? If the switch is non-exhaustive, control flow may reach the end of the function and return undefined.

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    // unreachable, since s has type never
}

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
    }
    // reachable, s has type Circle, function returns undefined, which is not compatible with number
}

@zpdDG4gta8XKpMCd

@jesseschalken make sure you compile with --strictNullChecks, it should do it
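For reference, this sketch spells out what the non-exhaustive variant does at run time. The return type is written as `number | undefined` and a trailing `return undefined;` makes the otherwise implicit return explicit, so it compiles under `--strictNullChecks`; `partialArea` is a hypothetical name:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function partialArea(s: Shape): number | undefined {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
    }
    // Reachable: s has type Circle here. This is the undefined that would
    // otherwise be returned implicitly; with a plain `number` return type and
    // --strictNullChecks, the compiler rejects the function instead.
    return undefined;
}
```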

@ahejlsberg (Member Author)

@jesseschalken There are a surprising number of interconnected issues in the reachability, control flow, and exhaustiveness topics. I will try to explain in the following.

Our reachability analysis is based purely on the structure of your code, not on the higher level type analysis. For example, if you have code immediately following a return or throw statement, we know from the structure of your code that it is unreachable. But in this example

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    // Unreachable?
}

we can't conclude from the structure of the code that the end point is unreachable. Indeed, someone might pass you a shape with an invalid kind, and you may want to write code to guard against that:

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    fail("Invalid shape");
}

But even if fail never returns (i.e. if it returns never), we still can't tell from the structure of the code that the end point is unreachable. However, once you return the value of fail

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    return fail("Invalid shape");
}

we now know from the structure of the code that the end point is unreachable. Thus, there is no implicit return of undefined. Furthermore, because fail returns never and because never is ignored in combination with other types (i.e. number | never is just number), everything works out.
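A runnable version of the `fail` variant might look like the following. `fail` is not defined in the thread, so this sketch assumes it simply throws an `Error` with the given message and declares a `never` return type:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

// Assumed helper: the `never` return type says the call never completes normally.
function fail(message: string): never {
    throw new Error(message);
}

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    // `return` makes the end point structurally unreachable, and since
    // number | never is just number, no implicit undefined is added.
    return fail("Invalid shape");
}
```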

To get exhaustiveness checks, we use a slight twist on the above and pass the guarded object as an argument to a never returning function that expects a never parameter:

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
    return assertNever(s);  // Error unless type of s is never
}

Now, returning to the original example:

function area(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
}

You'd think the above would be an error because the structure of the code does not preclude the implicit return of undefined at the bottom of the function. The reason you don't get an error here is that when a switch statement is the last statement of a function, and when that switch statement has an exhaustive set of cases for the type of the switch expression, we suppress the undefined that would have come from the implicit return at the end of the function. This happens in the type checking phase as part of return type analysis and is one of the parts of this pull request.
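As a sketch of that suppression (hypothetical `perimeter` function, `Shape` types from the description): the switch is the last statement and covers every `kind`, so with `--strictNullChecks` on, the `number` annotation is accepted even though there is no trailing `return`:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function perimeter(s: Shape): number {
    // Exhaustive switch as the last statement: the implicit undefined
    // from falling off the end of the function is suppressed.
    switch (s.kind) {
        case "square": return 4 * s.size;
        case "rectangle": return 2 * (s.width + s.height);
        case "circle": return 2 * Math.PI * s.radius;
    }
}
```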

@jesseschalken (Contributor)

@bluemmc It's likely because TypeScript is not just a language that happens to support JavaScript as a compilation target (like some other languages do), but rather it is intended to be JavaScript with type safety added. In other words rather than asking "How do we support ADTs?" the question is "How do we add type safety to the ADTs people already have?"

Nonetheless a dedicated ADT syntax would be nice. :)

@yortus (Contributor) commented Jun 22, 2016

@bluemmc the Mozilla Parser API is a good (and pretty widely used) example of how string-typed discriminants are used in real-world JavaScript. TypeScript now adds a great deal of value for statically checking code that uses this kind of API.
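For a feel of that style, here is a hand-written, heavily simplified subset of Parser API-style expression nodes, where the `type` string acts as the discriminant. The interfaces and the `printExpr` function are illustrative assumptions, not the real specification, which defines many more node kinds and fields:

```typescript
// Simplified, illustrative node types; the `type` field is the discriminant.
interface Identifier { type: "Identifier"; name: string; }
interface Literal { type: "Literal"; value: string | number | boolean | null; }
interface BinaryExpression {
    type: "BinaryExpression";
    operator: string;
    left: Expression;
    right: Expression;
}
type Expression = Identifier | Literal | BinaryExpression;

// Hypothetical printer: each case sees the narrowed node type.
function printExpr(e: Expression): string {
    switch (e.type) {
        case "Identifier": return e.name;
        case "Literal": return JSON.stringify(e.value);
        case "BinaryExpression":
            return `${printExpr(e.left)} ${e.operator} ${printExpr(e.right)}`;
    }
}

const ast: Expression = {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Identifier", name: "a" },
    right: { type: "Literal", value: 1 },
};
```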

@ghost commented Jun 22, 2016

There is a lot of both good and bad in the JavaScript language and in common JavaScript practices. I do not believe that TypeScript should repeat past mistakes. Also remember that the reason strings are often used in plain JavaScript is that there is no better way of doing things. In this case, there is.

We should strive for great language design that solves problems without introducing new problems (or reintroducing old JavaScript problems). In addition, using types instead of strings allows more powerful future switch constructions (look at F# for examples).

@DanielRosenwasser (Member)

I'm not sure what mistakes are being repeated. We are modeling canonical JavaScript as it's used today. There are definitely ways we can expand on the current design that do not involve strings themselves, and even with what's been implemented so far, you can create constants and type aliases that refer to string literal types if you want to deliver clearer semantics over what a tag means.

We could invent a completely new syntax that would encompass exactly what you're talking about (#9241), but that would diverge from ECMAScript in a significant way, and we are not keen on doing so.
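One way to read the suggestion about constants and type aliases, as a sketch with hypothetical names (`SquareTag`, `CircleTag`): each tag gets a type alias and a same-named constant, and `case` clauses that use the constants still narrow, because each constant has a string literal type:

```typescript
// Hypothetical tag names: a type alias plus a constant of that literal type.
type SquareTag = "square";
const SquareTag: SquareTag = "square";
type CircleTag = "circle";
const CircleTag: CircleTag = "circle";

interface Square { kind: SquareTag; size: number; }
interface Circle { kind: CircleTag; radius: number; }
type Shape = Square | Circle;

function area(s: Shape): number {
    switch (s.kind) {
        case SquareTag: return s.size * s.size;              // s: Square
        case CircleTag: return Math.PI * s.radius * s.radius; // s: Circle
    }
}
```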

@kitsonk (Contributor) commented Jun 22, 2016

Also remember that the reason strings are often used in plain JavaScript is that there is no better way of doing things. In this case, there is.

And TypeScript enables you to use them, although it has to live with the undeniable sadness of reality that a significant number of JavaScript patterns cannot be modelled safely in TypeScript without this feature.

@CyrusNajmabadi (Contributor)

We should strive for great language design that solves problems

Agreed. And that's why I really love TypeScript: it really tries to solve the problem of "How do I work with existing JavaScript and JavaScript libraries in a way that helps me find problems faster and makes me more productive in general?"

TypeScript exists in a JavaScript world. One of the things that makes it great is that it embraces the code that is already out there and does not dictate that one needs to move away from it to get great experiences. That is absolutely solving a real problem, albeit maybe not exactly the one you want it to be solving :)

The good thing is that, as mentioned above, you don't ever need to use this feature if you don't want to. If it's not how you write your own types, then that's fine. If you don't interface with existing libraries that work in this manner, then it won't ever affect you.

However, I have a few apps I've written that talk to web services that use precisely this pattern to model their results. Prior to this feature I would model the types I expected, but I had to write lots of code to check kinds and manually cast all over the place. It was ugly, verbose, and very redundant. This feature allows me to greatly simplify my code while still giving me the great error checking and productivity gains that I love about TS.

@ghost commented Jun 22, 2016

@CyrusNajmabadi I want to be able to use discriminated union types in a typesafe manner, like in F#. With such a design, there is nothing that stops people from explicitly switching on strings instead, using the string returned by xxxx.kind().

@DanielRosenwasser You are essentially arguing that discriminated union checks should not be typesafe because JavaScript is not. I thought the idea of TypeScript was to make things typesafe and maintainable and to enable tool support. This proposal is none of these things because of this flaw. Thanks for the link to the much better alternative proposal, though.

@jesseschalken (Contributor)

I thought the idea of TypeScript was to make things typesafe and maintainable and to enable tool support. This proposal is none of these things because of this flaw.

Yes, it is type safe. That's the entire point of this issue. The area(s:Shape) function will not accept anything that doesn't precisely fit the Shape type, the switch statement inside area cannot access properties of any of the concrete shapes without being guarded by an appropriate kind check, and the switch is checked for exhaustiveness as mentioned previously. That's all the features you get with dedicated algebraic data types.

The only thing that's wrong is it's a bit ugly and doesn't hide the type tag from you. But if you mistype one of the type tags you will still get an error, just as though you mistyped the name of a constructor for an ADT.

@ghost commented Jun 22, 2016

@jesseschalken Unless this function gives a compiler error, then the proposal is NOT typesafe:

function area(s: Shape) {
    switch (s.kind) {
        case "squara": return 42;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
    }
}

@jesseschalken (Contributor) commented Jun 22, 2016

@bluemmc If I understand @ahejlsberg's comment correctly, it would. The switch is not exhaustive, which means that the implied return undefined at the bottom of the function is not suppressed, and the return type is inferred as number|undefined. If you add :number to the signature, that will show as an error; otherwise you'll still see an error where the function is called, assuming the caller does something with the result that can't be done with undefined.

If the switch is not the last statement in the function (e.g., the cases set a variable instead of returning), or all of the call sites in the same compilation unit happen to do something with the result that is permitted with undefined and you don't want to add a return type to the function, you can still check exhaustiveness by adding default: return assertNever(s);.
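A sketch of that first situation, assuming the `Shape` types and the `assertNever` helper from earlier in the thread (`describeArea` is hypothetical): the cases assign to a variable instead of returning, so the `default` clause is what provides the exhaustiveness check:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function assertNever(x: never): never {
    throw new Error("Unexpected object: " + x);
}

function describeArea(s: Shape): string {
    let area: number;
    switch (s.kind) {
        case "square": area = s.size * s.size; break;
        case "rectangle": area = s.width * s.height; break;
        case "circle": area = Math.PI * s.radius * s.radius; break;
        // The switch is not the last statement, so this default clause
        // is what flags a missing case at compile time.
        default: return assertNever(s);
    }
    return `${s.kind}: ${area.toFixed(2)}`;
}
```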

If in the future the compiler considered code for which a local variable had type never as unreachable, the return 42; would also be flagged as unreachable.

@zpdDG4gta8XKpMCd commented Jun 22, 2016

@bluemmc the way unions are (non-discriminated yet) in TypeScript is way better than the sum types of Haskell or F#, because in TypeScript:

  • types can be mixed arbitrarily without having to be wrapped into constructors and declared under a certain data type
  • they can be discriminated based on your own logic, written the way you want it

Let me explain. In Haskell you can't pass Just a or Nothing alone; they can only be seen:

  1. as constructors
  2. as cases of Maybe a (only) while pattern matched

Meaning they are not real types because you cannot declare a value of type Nothing, you will have to make it of type Maybe a.

Why is that? Because the internal mechanism of sum types requires these cases to be parts of the Maybe type. Only then will it be able to discriminate one case from another.

Now in TypeScript you can declare Some<a> and None as completely separate interfaces that have nothing to do with each other and they are 100% real types without any limitations:

interface Some<a> { some: a; }
interface None { none: void; }

And later you can:

  1. Use them alone: const someEmptyString : Some<string> = { some: '' }; or...
  2. Use them together, mixed under a new type: type Optional<a> = Some<a> | None, or...
  3. Recombine them into a completely new type: type Uncertain<a> = Some<a> | None | Dunno

So it gives you a greater degree of freedom at the cost of... having to come up with your own way to destructure them into possible cases. Scared? Fear not, because thanks to type guards and narrowing switch statements you are given all the tools you could possibly need:

interface Dunno { hm: void; }
const none: None = { none: void 0 }; // single case value declared alone, can't be done in Haskell
const some: Some<string> = { some: 'hey' }; // single case value declared alone, can't be done in Haskell
const dunno: Dunno = { hm: void 0 }; // single case value declared alone, can't be done in Haskell

let huh: Optional<string> = Math.random() > 0.5 ? some : none;
let meh: Uncertain<string> = Math.random() > 0.3 ? some : Math.random() > 0.5 ? none : dunno;
function isSome<a, b extends object>(value: Some<a> | b): value is Some<a> {
    return 'some' in value; // one of many possible ways to discriminate Some out of an arbitrary union
}
meh = huh; // works! without having to transform Optional to Uncertain, can't be done in Haskell
if (isSome(meh)) {
    // works! using one function to exclusively pattern match only the Some case out of an arbitrary type that might have it
    // can't be done in Haskell
    alert(meh.some);
}
if (isSome(huh)) {
    // works again! using the very same function to pattern match only the Some case of a completely different type
    // can't be done in Haskell either
    alert(huh.some);
}

All in all, the union types in TypeScript, together with various narrowing facilities, give you ultimate freedom to design your ADTs the way you always wanted.

It's not a shortcoming, as you think; it's a flipping blessing sent to us from the gods of programming above. Enjoy it.

@basarat (Contributor) commented Jun 25, 2016

I am going with a simple const _exhaustiveCheck: never = s; for exhaustive checks. I also added a section here: https://basarat.gitbooks.io/typescript/content/docs/types/discriminated-unions.html PS: I made a release of alm with all this amazing work pulled in 🌹

@zpdDG4gta8XKpMCd commented Jun 25, 2016

@basarat such a check saves you at compile time (assuming your code model is solid and consistent), but it doesn't save you at runtime when an unexpected, compile-time-impossible case comes in. Your check would then silently swallow it as if nothing happened, as opposed to a function that throws, which would crash you fast and loud 👊
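To illustrate the difference discussed here, the sketch below compares the two styles on a hypothetical unexpected input. Both fail to compile if a case is missing, but only the `assertNever` version reacts at run time; the constant-based version hands back the unexpected object, typed as `number`, without complaint. The function names are made up for the comparison:

```typescript
interface Square { kind: "square"; size: number; }
interface Rectangle { kind: "rectangle"; width: number; height: number; }
interface Circle { kind: "circle"; radius: number; }
type Shape = Square | Rectangle | Circle;

function assertNever(x: never): never {
    throw new Error("Unexpected object: " + JSON.stringify(x));
}

// Compile-time-only check: emits no runtime test at all.
function areaConstCheck(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
        default: {
            const _exhaustiveCheck: never = s; // error here if a case is missing
            return _exhaustiveCheck;           // at run time: returns the object as-is
        }
    }
}

// Compile-time check plus a loud runtime failure.
function areaAssertNever(s: Shape): number {
    switch (s.kind) {
        case "square": return s.size * s.size;
        case "rectangle": return s.width * s.height;
        case "circle": return Math.PI * s.radius * s.radius;
        default: return assertNever(s);
    }
}

// Hypothetical new case added by a backend without updating the types.
const unexpected = { kind: "triangle", base: 3, height: 4 } as unknown as Shape;
```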

@basarat (Contributor) commented Jun 25, 2016

such check saves you at compile time (assuming your code model is solid and consistent), but it doesn't save you at runtime when an unexpected compile-time-impossible case comes in

Having that throw in there feels a lot like the mandatory throw, which is not something I'd put in TypeScript 🌹, unless there is an example in TypeScript where it makes sense.

Also, I made a snippet in alm :)

@zpdDG4gta8XKpMCd commented Jun 25, 2016

it's not mandatory; it's just a question of whether you trust your data or not

say you expect a shape with 3 cases

then suddenly your colleagues from a backend team add one more case without letting you know

with your refusal to throw, you will find out about it after days or weeks

with a throw you will know much earlier

your choice

@basarat (Contributor) commented Jun 25, 2016

your choice

Agreed. I'd rather focus on codegen than on adding a throw. We've done codegen on backend code to make sure that we get type defs that don't go out of sync. Without that, even a simple matter of foo.bar can become an error that isn't noticed for weeks 🌹

@zpdDG4gta8XKpMCd commented Jun 25, 2016

thinking of the advice in your book, you'd be better off suggesting to throw, because not everyone is

  • capable of codegen
  • permitted to codegen
  • cares enough

@shelby3 commented Sep 10, 2016

@bluemmc wrote:

Yes, the language should hide this for me. It is a design smell to encode types as magic strings.

That is only possible if you want nominal typing, but TypeScript's interface doesn't support nominal typing.

For example, it makes refactoring more difficult because one can't assume that all stringified instances of a type are really references to said type rather than something else.

Structured typing doesn't aid refactoring in the way nominal typing does.

@Aleksey-Bykov wrote:

the way unions are (non-discriminated yet) in TypeScript is way better than the sum types of Haskell or F#, because in TypeScript:

  • types can be mixed arbitrarily without having to be wrapped into constructors and declared under a certain data type

... Fear not, because thanks to type guards and narrowing switch statements you are given all the tools you could possibly need

Afaics, these structural sum types don't support compile-time extensibility of _existing_ classes (without editing their dependent code, e.g. a function returning a type of existing class), which afaics only typeclasses with unions could do.

Let me explain. In Haskell you can't pass Just a or Nothing alone; they can only ...

Afaik this is because Haskell can't support first-class unions without breaking the decidability of global type inference. When you give up global type inference, the union for a sum type can be declared orthogonally to the data types that are members of the sum type. Although I think Haskell does support nominal (not anonymous, not first-class) unions (aka enums) in an extension. By first-class, we mean they can interact with other higher-order typing constructs such as functions and subtyping. In general, global type inference is undecidable in the lambda cube away from its origin.

@shelby3 commented Sep 10, 2016

@Aleksey-Bykov wrote:

thinking of the advice in your book, you'd be better off suggesting to throw, because not everyone is

  • capable of codegen
  • permitted to codegen
  • cares enough

All good reasons to hope Google's SoundScript becomes a compelling reality, eventually a standard, and that it supports the good parts from TypeScript.

We'd still be able to use codegen with SoundScript as a target for extra features and experimentation. So TypeScript's raison d'être wouldn't necessarily end.
