New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trade-offs in Control Flow Analysis #9998

Open
RyanCavanaugh opened this Issue Jul 28, 2016 · 44 comments

Comments

Projects
None yet
@RyanCavanaugh
Copy link
Member

RyanCavanaugh commented Jul 28, 2016

Some rough notes from a conversation @ahejlsberg and I had earlier about trade-offs in the control flow analysis work based on running the real-world code (RWC) tests. For obvious reasons I'll be comparing what Flow does with similar examples to compare and contrast possible outcomes.

The primary question is: When a function is invoked, what should we assume its side effects are?

One option is to be pessimistic and reset all narrowings, assuming that any function might mutate any object it could possibly get its hands on. Another option is to be optimistic and assume the function doesn't modify any state. Both of these seem to be bad.

This problem spans both locals (which might be subject to some "closed over or not" analysis) and object fields.

Optimistic: Bad behavior on locals

The TypeScript compiler has code like this:

enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
    token = Token.Beta;
}
function maybeNextToken() {
    if (... something ...) {
        nextToken();
    }
}

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
    }
    // is this possible?
    if (token === Token.Alpha) {
        // something happens
    }
}

Optimistically assuming token isn't modified by maybeNextToken incorrectly flags token === Token.Alpha as an impossibility. However, in other cases, this is a good check to! See later examples.

Optimistic: Bad behavior on fields

The RWC suite picked up a "bug" that looked like this:

// Function somewhere else
declare function tryDoSomething(x: string, result: { success: boolean; value: number; }): void;

function myFunc(x: string) {
    let result = { success: false, value: 0 };

    tryDoSomething(x, result);
    if (result.success === true) { // %%
        return result.value;
    }

    tryDoSomething(x.trim(), result);
    if (result.success === true) { // ??
        return result.value;
    }
    return -1;
}

The ?? line here is not a bug in the user code, but we thought it was, because after the %% block runs, the only remaining value in result.success's domain is false.

Pessimistic: Bad behavior on locals

We found actual bugs (several!) in partner code that looked like this:

enum Kind { Good, Bad, Ugly }
let kind: Kind = ...;
function f() {
    if (kind) {
        log('Doing some work');
        switch (kind) {
            case Kind.Good:
                // unreachable!
        }
    }
}

Here, we detected the bug that Kind.Good (which has the falsy value 0) is not in the domain of kind at the point of the case label. However, if we were fully pessimistic, we couldn't know that the global function log doesn't modify the global variable kind, thus incorrectly allowing this broken code.

Pessimistic: Bad behavior on fields, example 1

A question on flowtype SO is a good example of this

A smaller example that demonstrates the behavior:

function fn(arg: { x: string | null }) {
    if (arg.x !== null) {
        alert('All is OK!');
        // Flow: Not OK, arg.x could be null
        console.log(arg.x.substr(3));
    }
}

The problem here is that, pessimistically, something like this might be happening:

let a = { x: 'ok' };
function alert() {
    a.x = null;
}
fn(a);

Pessimistic: Bad behavior on fields, example 2

The TS compiler has code that looks like this (simplified):

function visitChildren(node: Node, visit: (node: Node) => void) {
    switch(node.kind) {
        case SyntaxKind.BinaryExpression:
            visit(node.left);
            visit(node.right); // Unsafe?
            break;
        case SyntaxKind.PropertyAccessExpression:
            visit(node.expr);
            visit(node.name); // Unsafe?
            break;
    }
}

Here, we discriminated the Node union type by its kind. A pessimistic behavior would say that the second invocations are unsafe, because the call to visit may have mutated node.kind through a secondary reference and invalidated the discrimination.

Mitigating with (shallow) inlining / analysis

Flow does some assignment analysis to improve the quality of these errors, but it's obviously short of a full inlining solution, which wouldn't be even remotely practical. Some examples of how to defeat the analysis:

// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
    function check1() {
        x = 'still OK';
    }

    if (x !== null) {
        check1();
        // Flow: Error, x could be null
        console.log(x.substr(0));
    }
}
// Inlining is only one level deep
function fn(x: string | null) {
    function check1() {
        check2();
    }
    function check2() {
        x = null;
    }

    if (x !== null) {
        check1();
        // Flow: No error
        console.log(x.substr(0)); // crashes
    }
}

Mitigating with const parameters

A low-hanging piece of fruit is to allow a const modifier on parameters. This would allow a much faster fix for code that looks like this:

function fn(const x: string | number) {
  if (typeof x === 'string') {
    thisFunctionCannotMutateX();
    x.substr(0); // ok
  }
}

Mitigating with readonly fields

The visitChildren example above might be mitigated by saying that readonly fields retain their narrowing effects even in the presence of intervening function calls. This is technically unsound as you may have both a readonly and non-readonly alias to the same property, but in practice this is probably very rare.

Mitigating with other options

Random ideas that got thrown out (will add to this list) but are probably bad?

  • pure modifier on functions that says this function doesn't modify anything. This is a bit impractical as we'd realistically want this on the vast majority of all functions, and it doesn't really solve the problem since lots of functions only modify one thing so you'd really want to say "pure except for m"
  • volatile property modifier that says this "this property will change without notice". We're not C++ and it's perhaps unclear where you'd apply this and where you wouldn't.
@aleksey-bykov

This comment has been minimized.

Copy link

aleksey-bykov commented Jul 28, 2016

pure is a bit impractical

with persistent data structures it makes perfect practical sense, amortized time/memory consumption for a typical application would be comparable, it take's some discipline and support from the language ( #6230, #2319, #1213), but it's worth the trouble

with readonly for immutables ans pure for functions it will be blazing fast compared to inlining, because modifiers only need to be checked once for each function/interface

please no more half measures based on assumptions for a couple of typical cases and endless exceptions that only worsen soundness

@ahejlsberg

This comment has been minimized.

Copy link
Member

ahejlsberg commented Jul 28, 2016

Looking through the RWC tests, I see only one case where the additional narrowing performed by #9407 causes unwanted errors. However, there were scores of real bugs and inconsistencies being caught, such as checking for the same value twice, assuming that enum members with the value 0 pass truthiness checks, dead branches in if and switch statements, etc. In aggregate, I think our optimistic assumption that type guards are unaffected by intervening function calls is the best compromise.

BTW, the compiler itself relies on side effecting changes to a token variable in the parser. For example:

if (token === SyntaxKind.ExportKeyword) {
    nextToken();
    if (token === SyntaxKind.DefaultKeyword) {
        // We have "export default"
    }
    ...
}

This becomes an error with #9407 because the compiler continues to think token has the value SyntaxKind.ExportKeyword following the call to nextToken (and thus reports an error when token is compared to SyntaxKind.DefaultKeyword).

We will instead be using a function to obtain the current token:

if (token() === SyntaxKind.ExportKeyword) {
    nextToken();
    if (token() === SyntaxKind.DefaultKeyword) {
        // We have "export default"
    }
    ...
}

where the function is simply:

function token(): SyntaxKind {
    return currentToken;
}

Since all modern JavaScript VMs inline such simple functions there is no performance penalty.

I think this pattern of suppressing type narrowing by accessing mutable state using a function is a reasonable one.

@gauravk92

This comment has been minimized.

Copy link

gauravk92 commented Jul 29, 2016

is there any possibility of a const on an argument?

function fn(const x: string | number)

In swift/objc they added the "in", "out", and "inout" meta attributes to arguments so you could see if they meant for a value to be modified. i feel like this could be useful for annotating whether a function will modify a variable it received or not. Not sure if I'm missing something.

@kitsonk

This comment has been minimized.

Copy link
Contributor

kitsonk commented Jul 29, 2016

While pure is all the "rage" these days, I think many JavaScript developers don't understand what it means and could easily lead to surprises and or large scale frustration. Because the inherit universal mutability of JavaScript I think pure would generally be shackles that would potentially go unused because of the over strictness of the concept... It just seems an "all or nothing" concept, when most of JavaScript is shades of grey.

I think a argument by argument modifier makes the most sense. Of course const and readonly only imply disallowing reassignment. Personally, I don't find it confusing of expanding those to inform the flow control that property mutations are "unsafe" as that would be a design time only construct.

Is there any benefit in considering a deeply immutable design time keyword to avoid any confusion between const and readonly? Potentially even immutable (though that is jargoney).

@aleksey-bykov

This comment has been minimized.

Copy link

aleksey-bykov commented Jul 29, 2016

in my stubborn defense of pure:

  • it is easy to implement (at least in my naive view of it)
  • it is fast with almost no impact on current performance level (naive again)
  • it will do what it's been asked for
  • it will do it 100% correct no exceptions ever
  • it will cover 30% cases out of the box without having to learn anything: [1, 2, 3].map(x => x + 1 /* <-- hi, i am pure function */)
  • and it won't hurt to have it in addition to what will work best for the mainstream JavaScript audience

i really wish it was it was considered

other options may work out too and should be considered, just not a big fan of half-baked solutions

@kitsonk

This comment has been minimized.

Copy link
Contributor

kitsonk commented Jul 29, 2016

I understand your argument, although I think what you would cover with "obviously" pure functions are not ones that suffer from CFA challenges. You example of [1, 2, 3].map(x => x + 1) does not suffer from CFA function boundary issues, and while it is logically pure, there is not benefit in denoting this at design time.

I think your argument might have more merit if you could think of some examples where there are current CFA issues that would be solved by a pure notation that benefitted the developer. My argument isn't against the concept of writing pure functions, it is more of the argument that an "all or nothing" solution likely won't be practical.

One of the biggest current issues with CFA in my opinion is in Promise callbacks. There are many situations where those might not be pure. That is why I think an argument by argument mutability notation would solve significantly more use cases and do so in a more end-developer controllable way.

@aleksey-bykov

This comment has been minimized.

Copy link

aleksey-bykov commented Jul 29, 2016

although you are right that the example doesn't target CFA issues, the answer is rather evident:

  • if a callback meets all constraints that make it pure, then it is safe to say that all assertions that were made outside of it are still valid inside of it (including all CFA reasonings deduced before) because it doesn't change anything

this is one merit of the pureness, among many others

let me say it again, if a promise callback is pure, all reasoning that was done outside of it's scope or time is still valid inside of it

you might wonder why i am so certain, let me explain, there is no notion of time in math that stands behind the idea of pureness, time doesn't exist there, it's absolutely static, everything existed forever and forever will and cannot ever change (because changes need time) to the point where a pure function can be thrown away and replaced with its result (what's called referential transparency) which for the any given arguments will always be the same (hence a question why not just use a hash-table instead to map arguments to results), so you don't really need a function in the first place, armed with such super static equivalence between results that were calculated vs hardcoded the only merit in using a function is to save some memory that otherwise would be taken by a huge static table mapping all possible arguments to all results, since math is absolute abstraction we only care when it comes to program it

this is what a pure function is, this is why it will always work

i agree that it's rather 'all or nothing', but is the price you have to pay for the peace of mind because in its strictness it solves the CFA problems and many others simply by giving them no place to exist

@aleksey-bykov

This comment has been minimized.

Copy link

aleksey-bykov commented Jul 29, 2016

think about it, a "pure" (aka persistent) data structure, say, a linked list, can be deeply copied just by... being assigned:

const one = toList();
const absolutelyAnotherListThatWeCanDoWhateverWeWant = one;

how cool is that?

in my humble opinion "pure" functions and data structures can easily cover 40% of typical everyday needs without even feeling restricted

as for the rest 60% (since we don't appreciate any sort of extremes) we can use good old mutable impure JavaScript

but mind those 40% percent of no problem ever! that's a darn bargain, just let it happen and enjoy it

@sledorze

This comment has been minimized.

Copy link

sledorze commented Jul 29, 2016

totally agree with @aleksey-bykov arguments in favor of having "pure" introduced into the type checker.

@gauravk92

This comment has been minimized.

Copy link

gauravk92 commented Jul 30, 2016

Just my .02, I think "pure" would be too abstract and too confusing to understand.

I'm actually in favor of apple style semantics, "in", "out", "inout"

Also I think const could lead to confusion since they would be similar in usage but different concepts.

It might be easier for beginners to pick up and understand but the intent simply wouldn't be as clear.

@maiermic

This comment has been minimized.

Copy link

maiermic commented Jul 31, 2016

@RyanCavanaugh The compiler should already be able to answer the question in the example of Optimistic: Bad behavior on locals already with yes. If doSomething is called when token is Token.Alpha, the condition token !== Token.Alpha of the first if statement is false. Hence, the body is skipped and nextToken is never called. token is Token.Alpha still holds when the second if statement is reached. Ergo the answer is yes.

We can make this question insteresting again by changing doSomething like this:

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
        // something happens
        }
    }
}

Now it is only possible if maybeNextToken changes token to Token.Alpha. If you are optimistic you would answer the question with no, since you assume that functions have no side effects like changing variables. However, if you change token in nextToken to Token.Alpha instead of Token.Beta

function nextToken() {
    token = Token.Alpha;
}

it would be possible if and only if ... something ... is true in maybeNextToken. If you are optimistic you would still answer the question with no even though the correct answer is yes, maybe.

@maiermic

This comment has been minimized.

Copy link

maiermic commented Jul 31, 2016

Infer constraints about functions

How about tracking variable changes of functions?

The basic idea is that we know a constraint before a function call and that we can reason about a constraint after a function call.

Note: I write a new introduced term bold in the following when it is mentioned the first time. Jump back to this position to see the definition of this term.

if statements

This idea is quite similar to control flow base type analysis. So let's take a look at the if statement in the example of control flow base type analysis first (before we discover constraints introduced by functions):

function foo(x: string | number | boolean) {
    if (typeof x === "string") {
        x; // type of x is string here
        x = 1;
        x; // type of x is number here
    }
    x; // type of x is number | boolean here
}

I rewrite the constraint annotations like this

function foo(x: string | number | boolean) {
    // x :: string | number | boolean
    if (typeof x === "string") {
        // x :: string
        x = 1;
        // x :: number
    }
    // x :: number | boolean
}

As usual, a single colon like in x: string | number | boolean means variable x has type string | number | boolean.

A double colon like in x :: string | number | boolean describes a value-typed-binding, which means that the current value of x is of type string | number | boolean. The type of the value may be any subtype of the type of the variable. You can for example conclude from the check typeof x === "string" is true that we have x :: string (the current value of x is a string).

The constraints of a simple if statement can be described in more general as

// cBefore
if (condition) {
    // c1
    ...
    // c2
}
// cAfter

Here constraint is abbreviated as c (cBefore stands for constraint before). I name our constraints in the example above like this:

function foo(x: string | number | boolean) {
  // cBefore := (x :: string | number | boolean)
  if (typeof x === "string") {
    // c1 := (x :: string)
    x = 1;
    // c2 := (x :: number)
  }
  // cAfter := (x :: number | boolean)
}

I write c := e to assign a constraint expression e to the constraint called c.

In the example, x = 1; is a side effect that causes a change on the constraint we have before. Before x = 1; we have c1 := (x :: string) and afterwards we have c2 := (x :: number).

I write c with c' for the constraint where all appearances of value-type-bindings x :: T of c are overridden with value-type-bindings x :: T' of c'. In this example c1 with c2 is (x :: string) with (x :: number) which results in (x :: number), since the old constraint value-type-binding (x :: string) in c1 is overridden by (x :: number) of t1.

I describe a side effect s as a constraint transition c -> c'. A constraint transition takes constraint c (that holds before side effect s) and gives you a new constraint c' (that holds after side effect s). Given a constraint transition t: c -> e where e is a constraint expression, I use function like notation t(cp) to express constraint c' that is equal to e where all appearances of c are replaced by cp.

Using this notation, the example from above looks like this:

// c1 := (x :: string)
x = 1; // t1: c -> c with (x :: number)
// c2 := t1(c1)

You can describe reasoning about constraints of a simple if statement in more general like this

// cBefore
if (condition) {
    // c1 := cBefore & condition
    ... // t1
    // c2 := t1(c1)
    //     = t1(cBefore & condition)
}
// cAfter := t1(cBefore & condition) | (cBefore & !condition)

I write c & condition to describe the case that condition is true. Otherwise, I write c & !condition to describe the case that condition is false. In both cases the constraint c had been given before the check. From the check of condition you can often reason about a new constraint c' that holds. If condition is true implies that c' holds, you can rewrite c & condition as c with c'. Similarly, if condition is false implies that c' holds, you can rewrite c & !condition as c with c'. If no constraint follows, you just write c instead.

In the example, condition is typeof x === "string". If condition is fulfilled the body is executed. Thereby, c1 is cBefore & condition. From fulfilled condition you know typeof x === "string" holds. From that, you can infer the constraint transition x :: string. As a result, cBefore & condition equals cBefore with (x :: string). Since cBefore is x :: string | number | boolean you get (x :: string | number | boolean) with (x :: string) that is reduced to x :: string.

After the if statement you know that

  • either condition had been fulfilled and it's body has been executed that causes side effects described by t1
    • which would result in t1(cBefore & condition) and can be reduced to x :: string (like described above)
  • or that condition had been unfulfilled
    • which would result in cBefore & !condition
      • where !condition is !(typeof x === "string") which is typeof x !== "string"
      • so you can infer (x :: string | number | boolean) with !(x :: string) which can be reduced to x :: number | boolean

From that you can conclude that cAfter is t1(cBefore & condition) | (cBefore & !condition) which is equal to (x :: string) | (x :: number | boolean) and can be reduced to x :: number | boolean.

Constraints introduced by functions

Now we have warmed up, we take a look at constraints that are introduced by functions.

(User-Defined) Type Guards

A type guard is a function whose return type is a type predicate:

function isFish(pet: Fish | Bird): pet is Fish {
    return (<Fish>pet).swim !== undefined;
}

Those type predicates are constraints that are introduced by a function (here isFish) and are already part of TypeScript:

let pet: Fish | Bird

if (isFish(pet)) {
    pet.swim();
}
else {
    pet.fly();
}

Let's add constraints as comments:

let pet: Fish | Bird
// cBefore := (pet :: Fish | Bird)
if (isFish(pet)) {
    // c1 := cBefore & isFish(pet)
    //     = (pet :: Fish | Bird) & (pet :: Fish)
    //     = (pet :: Fish)
    pet.swim();
}
else {
    // c2 := cBefore & !isFish(pet)
    //     = (pet :: Fish | Bird) & !(pet :: Fish)
    //     = (pet :: Bird)
    pet.fly();
}

Before the if statement you know that pet is Fish | Bird. If isFish(pet) is fulfilled, the type predicate pet is Fish of isFish holds for the value of our local variable pet. That means that it adds a constraint pet :: Fish to c1. Thus, it's ok to call pet.swim();. In the else branch you know that the type predicate pet is Fish is unfulfilled and you can add !(pet :: Fish) to c2, which results in pet :: Bird.

Functions in general

In a similar way to type predicates of type guards I introduce constraints that can be automatically inferred from expressions and statements. I will add constraints to the functions one after another in my modified example (see comment above) of the example behavior on locals (see first post) to answer the question is this possible? in the comment in the function doSomething:

enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
    token = Token.Alpha;
}
function maybeNextToken() {
    if (condition) {
        nextToken();
    }
}

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
          // something happens
        }
    }
}

nextToken

function nextToken() {
    token = Token.Alpha;
}

nextToken assigns Token.Alpha to the shared local variable token. Like the assignment x = 1; in the if statement example, token = Token.Alpha is a side effect that can be described as a constraint transition:

function nextToken() {
    // cBefore
    token = Token.Alpha; // t1: c -> c with (token :: Token.Alpha)
    // cAfter := t1(cBefore)
    //         = cBefore with (token :: Token.Alpha)
}

Here cBefore describes the constraints that we have before calling nextToken and cAfter describes the constraints that we have after calling nextToken. Like an assignment, a function can be described as a constraint transition. Since nextToken is doing just the same as the assignment token = Token.Alpha; it describes the same constraint transition as t1.

maybeNextToken

Let's have a look at the caller maybeNextToken:

function maybeNextToken() {
    if (condition) {
        nextToken();
    }
}

First you add the constraint transitions (here t1 for nextToken) and afterwards the constraints of the if statement:

function maybeNextToken() {
    // cBefore
    if (condition) {
        // cBefore & condition
        nextToken(); // t1: c -> c with (token :: Token.Alpha)
        // t1(cBefore & condition)
    }
    // cAfter := t1(cBefore & condition) | (cBefore & !condition)
}

Since condition is not specified in the original example you don't know its result. I assume that it is an expression without any side effect. To relax this assumption even further I consider condition is a shared variable of type boolean. Thereby, you can derive condition :: true if condition is fulfilled and condition :: false otherwise. Thereof, cAfter is

((cBefore with (condition :: true)) with (token :: Token.Alpha)) | (cBefore with (condition :: false))

Despite that simplification maybeNextToken is described as the constraint transition cBefore -> cAfter.

doSomething

function doSomething() {
    if (token !== Token.Alpha) {
        maybeNextToken();
        // is this possible?
        if (token === Token.Alpha) {
          // something happens
        }
    }
}

Like in maybeNextToken you first add constraint transitions and afterwards the constraints of the if statement:

function doSomething() {
    // cBefore
    if (token !== Token.Alpha) {
        // c1
        maybeNextToken(); // t1: c -> ((c with (condition :: true)) with (token :: Token.Alpha)) | (c with (condition :: false))
        // c2
        // is this possible?
        if (token === Token.Alpha) {
          // c3
          // something happens
          // c4
        }
        // c5
    }
    // cAfter
}

Our goal is to answer the question if it is possible that token === Token.Alpha is fulfilled in the condition of the second if statement. This is only possible if c2 contains a constraint token :: T where T is a type that contains Token.Alpha. Hence, you only look at c1 and c2:

  • c1 is cBefore & token !== Token.Alpha. Since token is not Token.Alpha it only can be Token.Beta or Token.Gamma. This results in cBefore with (token :: Token.Beta | Token.Gamma).

  • c2 is derived by replacing c in t1 with c1:

    c2 := ((c with (condition :: true)) with (token :: Token.Alpha))
          | (c with (condition :: false))
        = (((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: true)) with (token :: Token.Alpha))
          | ((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: false))

    Since the type of the value of token in cBefore is overridden by cBefore with (token :: ...) operations, you can leave cBefore out:

        = (((token :: Token.Beta | Token.Gamma) with (condition :: true)) with (token :: Token.Alpha))
          | ((token :: Token.Beta | Token.Gamma) with (condition :: false))

    I'm only interested in token to answer the question. Hence, you can remove all with operations of other variables (here condition):

        = ((token :: Token.Beta | Token.Gamma) with (token :: Token.Alpha))
          | (token :: Token.Beta | Token.Gamma)
        = (token :: Token.Alpha)
          | (token :: Token.Beta | Token.Gamma)
        = token :: Token.Alpha | Token.Beta | Token.Gamma
        = token :: Token

From c2 = token :: Token you conclude that the answer of the question is:
Yes, token might be Token.Alpha in the condition of the second/inner if statement.

@yortus

This comment has been minimized.

Copy link
Contributor

yortus commented Aug 1, 2016

@maiermic your post was pretty complex, so I may have misunderstood it, but it looks like you are describing what control flow anaylsis already does, plus extending it to 'look into' the functions being called within guarded blocks as well. If so, how is this different to the inlining approach mentioned by @RyanCavanaugh in the OP?

@maiermic

This comment has been minimized.

Copy link

maiermic commented Aug 1, 2016

@yortus That's right, I describe what control flow analysis already does to introduce the notation I use afterwards to explain my idea. However, I'd like to clarify that I don't 'look into' a function every time it is called (if that wasn't clear so far). Instead I infer the constraint transition of a function when I look at it's definition. Therefore, I have to look into the function, but I only have to do this once, since I can carry the constraint transition in the type of the function. When I look at the function call, I can use the constraint transition that I inferred before to calculate the constraint that holds after the function call from the constraint that holds before the function call.

Mitigating with (shallow) inlining / analysis

I don't know how (shallow) inlining/analysis works and @RyanCavanaugh doesn't explain it. Nevertheless, he shows two examples.

1. Example

// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
    function check1() {
        x = 'still OK';
    }

    if (x !== null) {
        check1();
        // Flow: Error, x could be null
        console.log(x.substr(0));
    }
}

Flow claims that x could be null after the call of check1. Flow doesn't look into check1. Otherwise, it would know that x has type string and cann't be null.

In my approach, the constraint transition of check1 is inferred as t: c -> (x :: string). Since we checked that x !== null is true before and the type of the variable x is string | null we have x :: string as constraint that holds before the call of check1. Hence, we pass x :: string to t to get the constraint that holds after the call of check1. t(x :: string) results in x :: string. As a consequence, x cann't be null in console.log(x.substr(0));.

Note: You can even infer x :: 'still OK' in check1 in that case. Thereby, you could even determine the type of the return value of x.substr(0) which is 's'. However, this would require further knowledge about the build-in method substr. I didn't cover such an approach in my previous post.

2. Example

// Inlining is only one level deep
function fn(x: string | null) {
    function check1() {
        check2();
    }
    function check2() {
        x = null;
    }

    if (x !== null) {
        check1();
        // Flow: No error
        console.log(x.substr(0)); // crashes
    }
}

To come straight to the point, the constraint transition

  • of check2 is t2: c -> (x :: null) and
  • of check1 is t1 = t2

Before the call of check1 we have x :: string. Afterwards we have t1(x :: string), which results in x :: null. Thus, my approach detects the error and flow doesn't.

@yahiko00

This comment has been minimized.

Copy link

yahiko00 commented Aug 1, 2016

Quite long to fully read and understand, but this seems very interesting. It makes me think of a kind of propositional logic solver like the Prolog language. Correct me if I'm wrong.
Not sure it would be easy to implement such a thing, but, this could be an evolution target for the TypeScript's Type Inference and Control Flow analysis.

@yortus

This comment has been minimized.

Copy link
Contributor

yortus commented Aug 2, 2016

@maiermic hmm yes interesting. I suppose two issues would be:

a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?

b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?

@aleksey-bykov

This comment has been minimized.

Copy link

aleksey-bykov commented Aug 2, 2016

might be related #8545 (comment)

@maiermic

This comment has been minimized.

Copy link

maiermic commented Aug 4, 2016

@aleksey-bykov #8545 (comment) looks like a similar approach for manually annotating constraints, even though you only refer to a specific case (object initializers)

@maiermic

This comment has been minimized.

Copy link

maiermic commented Aug 4, 2016

@yortus good questions

a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?

@RyanCavanaugh May you please explain performace issues of a full inlining solution?

Without thinking too much about it, I guess deep constraint analysis might be doable in linear or linearithmic time complexity. Gathered constraints are seralizable and can be cached so that you can speed up analysis time. Manuall constraint annotations can further improve analysis time.

b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?

If you don't know the code, you cann't do flow analysis. Without .d.ts declarations you couldn't even do any type analysis. You either include constraint annotaions in .d.ts files or you have to fall back to another approach. You might choose an optimistic approach if no constraint annotaions exist, since you can add constraint annotations to .d.ts if a function has a relevant side effect.

@ivogabe

This comment has been minimized.

Copy link
Contributor

ivogabe commented Sep 4, 2016

Initially I thought that the current unsound strategy would work, though in the previous months 1) narrowing works on more locations (with CFA) 2) narrowing works on more types (enums, numbers) and 3) narrowing works with more type guards. This has demonstrated several issues with the current strategy.

A solution that has been discussed was that the compiler should track where a variable might be modified through a function call. Since JavaScript is a higher-order language (functions can be passed as arguments, stored in variables and so on), it's hard/impossible to track where a function might be called. A reassignment of a variable can also happen without a function call, for instance by a getter or setter function, or by arithmetic that calls .valueOf() or .indexOf(). For a language that is not higher-order, the analysis can be done easily, since a function call will directly refer to the declaration of a function. Using the same analysis in a higher-order language will still be unsound; this will probably cause issues when an assignment happens in .forEach(() => { ... }) for instance.

My suggestion is basically that a variable may only be narrowed if its assignments can be tracked completely. A variable can only be narrowed if all reassignments (that is, assignments except the initializer) are inside the control-flow graph of the current function. That means that a variable declared with const or one that has no reassignments can be narrowed in any function. A variable with reassignments in different functions can never be narrowed. The later is restrictive; a user should copy a value of such variable into a const variable to narrow it.

To reduce this restriction, the control-flow graph of the function could be extended to an inter-procedural control flow graph. The analysis stays sound, but becomes more accurate. The graph can relatively easily be extended at call expression that directly refer to a function declaration. Assignments in functions that are not used in higher-order constructs can now be tracked too. It would be interesting to extend this to higher-order functions too, but that would be more complex. It would be necessary to annotate function definitions with the control flow of their arguments some way.

I'm not sure whether an abstraction over the control flow graph is needed. The graph could become a lot bigger with this inter-procedural analysis. However, the compiler already uses a compressed control flow graph, which basically contains only branch nodes and assignments. It might help to track which variables are referenced by a function to skip the whole function if the analyzed variable is not referenced there.

Thoughts? Let me know if something is not clear.

@felixfbecker

This comment has been minimized.

Copy link

felixfbecker commented Sep 17, 2016

How is this code possible with strictNullChecks on? I want to use strict null checks, but this control flow checking makes it impossible to write this kind of code efficiently:

let myValue: string
[1, 2, 3].forEach(x => {
  if (x == 2) {
    myValue = 'found it'
  }
})
console.log(myValue) // error: myValue used before assignment

Since the callback is not guaranteed to be executed TS sees that not all code paths assign the value. The same would be true when using a for of loop, because the array is not guaranteed to be non-empty. Even though I as a developer can guarantee for sure that [1, 2, 3] is not empty and forEach will be called, there is no way for me to tell that TS, no /* ts ignore no-use-before-assignment */ or something like that. I have to disable strict null checks completely.

@Arnavion

This comment has been minimized.

Copy link
Contributor

Arnavion commented Sep 17, 2016

@felixfbecker

Either initialize it with a sentinel:

let myValue: string = ''
// Dangerous if myValue is never actually set to a real value

Or, if you're worried about the sentinel never getting overwritten, with undefined. This more closely mimics the underlying JS but requires ! assertion on use:

let myValue: string | undefined = undefined;
...
console.log(myValue!);
// Still dangerous if myValue is never actually set to a real value, but less so since undefined is likely to fail earlier / harder

Or, have a check to narrow the type:

let myValue: string | undefined = undefined;
...
if (myValue === undefined) {
    throw new Error();
}
console.log(myValue); // Guaranteed safety
@felixfbecker

This comment has been minimized.

Copy link

felixfbecker commented Sep 17, 2016

@Arnavion thanks!

@bjouhier

This comment has been minimized.

Copy link

bjouhier commented Oct 6, 2016

@RyanCavanaugh Following up here, from #11393, with a suggestion.

I would be nice to have an indication of whether the optimistic heuristic was used or not when reporting an error. Something like Operator '===' cannot be applied to types 'false' (likely) and 'true'.

This will allow the developer to quickly figure out if the error is certain, or if it may just be the consequence of an optimistic assumption made by the compiler. It may also reduce the flow of non-bugs reported here 😄.

@basarat

This comment has been minimized.

Copy link
Contributor

basarat commented Oct 6, 2016

optimistic heuristic

Just wanted to point out that TypeScript has no concept of this at the moment 🌹

@no2chem

This comment has been minimized.

Copy link

no2chem commented Jul 14, 2018

Look, I get that there is a debate on how much CFA typescript should do. But the CFA errors are cryptic.

I just spent 20 minutes debugging this:

  run() {
        this.state = State.RUNNING;
        while (this.state === State.RUNNING) {
            const nextOp = this.code[this.pc];
            if (nextOp === undefined) {
                break;
            } 
            nextOp.operation(this);
        }

        if (this.state === State.EXCEPTION) {
            throw new Error(this.exceptionReason);
        } 
}

Which results in

Operator '===' cannot be applied to types 'State.RUNNING' and 'State.EXCEPTION'.

Presumably because CFA thinks that this.state must always === state.RUNNING. But nextOp.operation(this) can change this.state which CFA cannot deduce at compile time.

Of course, if you remove the break, all of a sudden the code works! 😖, because the CFA is no longer over-eager about determining the value of this.state. (IMO this is a bug, but it seems that there is a discussion about this instead)...

Either way, I believe most programmers would be scratching their heads wondering what is going on. It would be helpful if the error message said something to the effect of: Error: CFA predicts that this path is not possible, try refactoring your code.

@masaeedu

This comment has been minimized.

Copy link
Contributor

masaeedu commented Jul 15, 2018

I think it might be a little bit easier to reason about this sort of thing if the treatment of effects (side effects like local mutation) and "coeffects" (requirements that certain effects have occurred) was formally specified. Then it would be clearer which weird behaviors are down to PEBKAC, which ones are due to bugs in the type checker implementation, and which ones are bugs in the specification.

@donaldpipowitch

This comment has been minimized.

Copy link
Contributor

donaldpipowitch commented Nov 8, 2018

Hi there! Sorry, but I lost the overview what this issue currently tracks and if there is a discussion to fix some flow related issues?

I came here to ask this question, because I saw two unrelated people stumbled over a destructing issue related to union types and they couldn't figure out the problem. I know why this happens, but is there some discussion to maybe fix this behaviour to make some code more ergonomic?

type AorB = { type: 'a'; value: string } | { type: 'b'; value: number };

function fun({ type, value }: AorB) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
      return value;
  }
}
@Bnaya

This comment has been minimized.

Copy link

Bnaya commented Nov 8, 2018

@donaldpipowitch to solve this kind of issue, there's a need in somekind of depended type expression, that the control flow analysis & destructuring will create on the fly
I've typed to make one explicitly with conditional mapped types, but its not working:

function fun<T extends 'a' | 'b'>(type: T, value: T extends 'a' ? string : number) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
        return value.toExponential();
  }
}

function alsoNotLikeThat<T extends 'a' | 'b'>(type: T, value: typeof type extends 'a' ? string : number) {
  switch (type) {
    case 'a':
      return value.toLowerCase(); // throws: TS thinks it could be `number`
    case 'b':
        return value.toExponential();
  }
}

If would be nice if my code above would have worked

@pimterry

This comment has been minimized.

Copy link
Contributor

pimterry commented Nov 17, 2018

I see there's few examples with synchronous side effects in functions here, but lots of linked issues around how this causes big problems with async/await, and no direct discussion of that that I can see, so it's worth highlighting. Here's an example of some async code where this issue causes problems:

async function test(p: Promise<any>) {
    let x: 'a' | 'b' = 'a';

    setTimeout(() => {
        x = 'b';
    }, 500);

    await p;

    // x is inferred as 'a' here, so this isn't allowed, but it could be 'b'
    if (x === 'b') {
    }
}

(Playground link)

Note that the inference is wrong here despite explicit types of x. This issue makes it very awkward to write a function using await to wait for a side effect, which I think is a fairly common case.

Meanwhile in the equivalent promise-based code the types work totally fine:

function test(p: Promise<any>) {
    let x: 'a' | 'b' = 'a';

    setTimeout(() => {
        x = 'b';
    }, 500);

    p.then(() => {
		// x is 'a' | 'b' here, correctly
        if (x === 'b') {
        }
    });
}
@bobmoff

This comment has been minimized.

Copy link

bobmoff commented Dec 23, 2018

I just just experienced the exact problem that @pimterry explained in (#9998 (comment)) and was a bit confused as why it complained to me that this wasn't allowed

@Amareis

This comment has been minimized.

Copy link

Amareis commented Jan 14, 2019

+1 for #9998 (comment), this is really stupid, await is definitely not a pure statement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment