Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite pattern matcher #4940

Closed

Conversation

@Simn
Copy link
Member

commented Mar 14, 2016

I've been working on this for quite a while now and would like to get it into the 3.3 release. There are some things to mention about this though:

Pattern parsing

This is largely the same as before with the two main differences being that

  1. we better utilize expression typing instead of doing things on our own when it comes to plain identifiers and
  2. we do not unify against the expected type as much as before, which might lead to slightly different (but more consistent) unification behavior.

Match compilation

This part is much dumber now and doesn't try to figure out compatible types and whatnot, which has been delegated to a later stage. It now really only translates the patterns into a decision tree and expands where necessary.

Extractors

I have completely revised how these are dealt with. Instead of awkwardly breaking up pattern matching which lead to a multitude of problems, extractors now create additional columns in the pattern matrix. You can picture it like this:

switch (subject) {
    case pattern1: result1;
    case _.expr => pattern2: result2;
    case pattern3: result3;

Becomes:

switch [subject, subject.expr] {
    case [pattern1, _]: result1;
    case [_, pattern2]: result2;
    case [pattern3, _]: result3;
}

As a consequence extractors may now be evaluated more eagerly at the benefit of avoiding exponential code-size.

Exhaustiveness checks

These are now done when transformation the decision tree to a typed expression. This is good because at that point we know the exact type for every switch subject and can filter incompatible constructors in the GADT case.

Uselessness checks

Oh boy, this is a big one. After a lot of trying and reading I finally decided to bite the bullet and implement this as a separate operation. I closely followed the OCaml implementation which was a bit of a challenge to get right and adapt to our situation.

My tests so far are encouraging, but I decided to turn unused-pattern errors into warnings for now. I might also have to measure the performance of this because we now essentially do duplicate work on the pattern matrix. However, given the nature of these checks I don't think that could be avoided.

Diffs

As much as I'd like to get some diffs for this, there's a big case ordering difference coming from the fact that I changed an internal data structure from a PMap to a Hashtbl. That's a bit unfortunate, but I don't think there's much I could do about it.

Please test your codebases against this PR/branch and tell me if you notice any behavior that is different from before!

closes #2508
closes #3621
closes #3816
closes #3937
closes #4247
closes #4677
closes #4689
closes #4745
closes #4846
closes #4907
@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Ooh, that's huge. I'll check it right away. Too bad case ordering will be messed up. I wonder if there can be any performance issues with that (e.g. someone matched shorter strings before longer etc.)

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Got a regression with detecting common base type:

enum Kind {
    KA;
    KB;
}

class Base {
    public function new() {}
}

class A extends Base {}
class B extends Base {}

class Main {
    static function main() {
        var kind = KA;
        var base = switch (kind) {
            case KA: new A();
            case KB: new B();
        }
    }
}

Main.hx:18: characters 12-16 : B should be A

DocType = Type.createEnum(XmlType,"__");
ProcessingInstruction = Type.createEnum(XmlType,"__");
Document = Type.createEnum(XmlType,"__");
PCData = Type.createEnum(cast XmlType,"__");

This comment has been minimized.

Copy link
@nadako

nadako Mar 14, 2016

Member

Wow, how does that work? O_o

This comment has been minimized.

Copy link
@Simn

Simn Mar 14, 2016

Author Member

It probably doesn't, I wanted to ask @hughsando about what I'm supposed to do with this. This empty XmlType enum is pretty terrible to handle in general.

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

Should be fixed!

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Compiles! It's quite different to check logs because of different ordering, but it mostly looks just ordering.

Here's one interesting change tho:

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

I still have some concerns about reordering, for example this one:

Checking for 0 first could be intended here, for example if it's the most common case.

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

Well, the previous matcher didn't retain the case ordering from syntax either, did it?

Also you shouldn't be thinking in terms of what is matched first there, that's not how switches work in practice. Then again, it might have to be considered for String switches because these turn to if-else chains on several targets.

Regarding the other diff: Were all these cases in the same case in the original?

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Well, the previous matcher didn't retain the case ordering from syntax either, did it?

Not in general, but often :)

Also you shouldn't be thinking in terms of what is matched first there, that's not how switches work in practice.

I'm not, actually, but someone might be. It's a bit hard to explain why haxe messes up ordering even for straightforward switches, but it's not a big deal, I guess.

Then again, it might have to be considered for String switches because these turn to if-else chains on several targets.

You mean ordering them alphabetically or retaining order?

Regarding the other diff: Were all these cases in the same case in the original?

Yep, it was an OR case.

@back2dos

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Very interesting! Could you elaborate on this:

we do not unify against the expected type as much as before

Does that introduce type errors in switch statements that previously were able to match?

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Anyway, looks great, I love how single case switches are turned into if(/else)s!
Now I'll try finding and testing a much older (and uglier) codebase that has actual unit tests :)

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Oh, HaxeQuake doesn't compile :(

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Okay, I fixed most cases in HaxeQuake (nadako/HaxeQuake@32893ce). I guess it was working before because variables were monomorphs and weren't bound to an enum abstract.

However there's one weird case left that we already encountered before, @Simn could you take a look at the latest HaxeQuake version?

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Thanks to @markknol, now we know that nekovm.org doesn't build with this branch. Something in tink_core, it seems:

F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:105: characters 4-8 : tink.core.Outcome<withSameError.Out, withSameError.In -> tink.core.Outcome<withSameError.Out, withSameError.Error>> should be tink.core.Outcome<withSameError.Out, withSameError.Error>
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:105: characters 4-8 : Type parameters are invariant
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:105: characters 4-8 : withSameError.In -> tink.core.Outcome<withSameError.Out, withSameError.Error> should be withSameError.Error
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:105: characters 4-8 : For function argument 'f'
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:102: lines 102-107 : tink.core._Outcome.OutcomeMapper<withSameError.In, withSameError.In -> tink.core.Outcome<withSameError.Out, withSameError.Error>, withSameError.Out, withSameError.Error> should be tink.core.Outcome<withSameError.Out, withSameError.Error>
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:111: lines 111-120 : tink.core._Outcome.OutcomeMapper<withEitherError.DIn, withEitherError.DIn -> tink.core.Outcome<withEitherError.DOut, withEitherError.FOut>, withEitherError.DOut, tink.core.Either<withEitherError.DIn -> tink.core.Outcome<withEitherError.DOut, withEitherError.FOut>, withEitherError.FOut>> should be tink.core._Outcome.OutcomeMapper<withEitherError.DIn, withEitherError.FIn, withEitherError.DOut, tink.core.Either<withEitherError.FIn, withEitherError.FOut>>
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:111: lines 111-120 : Type parameters are invariant
F:\Code\haxe\lib\tink_core/1,0,0-rc,11/src/tink/core/Outcome.hx:111: lines 111-120 : withEitherError.DIn -> tink.core.Outcome<withEitherError.DOut, withEitherError.FOut> should be withEitherError.FIn
F:\Code\haxe\lib\ufront-mvc/1,1,0/src/ufront/web/Controller.hx:156: characters 2-11 : Build failure
@Gama11

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

I'm getting some compiler errors in a project using hxargs with this:

C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:76: characters 60-68 : { expr : haxe.macro.ExprDef } has no field pos
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : Array<{ values : Array<{ expr : haxe.macro.ExprDef }>, guard : Null<Null<haxe.macro.Expr>>, expr : haxe.macro.Expr }> should be Array<haxe.macro.Case>
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : Type parameters are invariant
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : { values : Array<{ expr : haxe.macro.ExprDef }>, guard : Null<Null<haxe.macro.Expr>>, expr : haxe.macro.Expr } should be haxe.macro.Case
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : { values : Array<{ expr : haxe.macro.ExprDef }>, guard : Null<Null<haxe.macro.Expr>>, expr : haxe.macro.Expr } should be { values : Array<haxe.macro.Expr>, ?guard : Null<Null<haxe.macro.Expr>>, expr : Null<haxe.macro.Expr> }
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : Invalid type for field values :
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : { expr : haxe.macro.ExprDef } should be haxe.macro.Expr
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : { expr : haxe.macro.ExprDef } should be { pos : haxe.macro.Position, expr : haxe.macro.ExprDef }
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : Inconsistent setter for field expr : null should be default
C:\HaxeToolkit\haxe\lib\hxargs/3,0,0/hxargs/Args.hx:135: characters 42-47 : For function argument 'cases'
Build halted with errors.
@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

However there's one weird case left that we already encountered before, @Simn could you take a look at the latest HaxeQuake version?

I knew I should have added a test for that... gonna fix it again.

Does that introduce type errors in switch statements that previously were able to match?

It influences inference in some cases, for example:

I'm getting some compiler errors in a project using hxargs with this:

This one. I have already fixed this on hxargs git. The problem comes from this piece of code:

import haxe.macro.Expr;

class Main {
    static function main() { }

    static function infer(cmds) {
        cmds = switch (cmds.expr) {
            case EMeta(_, e1): e1;
            case _: cmds;
        }
        $type(cmds); // { expr : haxe.macro.ExprDef }
    }
}

This now uses unify_min on the cases which means it finds a common sub-type, which here is {expr: ExprDef}. The result of this used to depend on the case order which I think is worse. The current behavior is also consistent with how if-else would work.

Something in tink_core, it seems:

I would appreciate if that could be reduced to a small example. It does look a bit scary...

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

I would appreciate if that could be reduced to a small example. It does look a bit scary...

Well, this is somewhat reduced, but still mind-melting.
It's funny that:

  • if you rename the fun argument from f to e.g. f2, error goes away
  • if you rename captured var of Failure(f) to e.g. f2, error goes away as well
  • (i also had some case where type params became Unknowns in the error message, but I can't reproduce it now)
enum Outcome<Data, Failure> {
    Success(data:Data);
    Failure(failure:Failure);
}

abstract OutcomeMapper<DIn, FIn, DOut, FOut>(Outcome<DIn, FIn>->Outcome<DOut, FOut>) {
    function new(f) this = f;

    static function fun<In, Out, Error>(f:In->Outcome<Out, Error>):OutcomeMapper<In, Error, Out, Error> {
        return new OutcomeMapper(function (o)
            return switch o {
                case Success(d): f(d);
                case Failure(f): Failure(f);
            }
        );
    }
}
Main.hx:13: characters 16-20 : Outcome<fun.Out, fun.In -> Outcome<fun.Out, fun.Error>> should be Outcome<fun.Out, fun.Error>
Main.hx:13: characters 16-20 : Type parameters are invariant
Main.hx:13: characters 16-20 : fun.In -> Outcome<fun.Out, fun.Error> should be fun.Error
Main.hx:13: characters 16-20 : For function argument 'f'
Main.hx:10: lines 10-15 : OutcomeMapper<fun.In, fun.In -> Outcome<fun.Out, fun.Error>, fun.Out, fun.Error> should be Outcome<fun.Out, fun.Error>
@Gama11

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Ah, should've checked the hxargs repo. :)

This also causes a number of compiler errors in OpenFL on non-flash targets (all the same type of issue). Here's a somewhat reduced example:

@:enum abstract Endian(Int) {
    public var BIG_ENDIAN = 0;

    @:to private static function toString(value:Int):String {
        return switch (value) {
            case Endian.BIG_ENDIAN: "bigEndian";
            default: null;
        }
    }
}
Main.hx:14: characters 8-25 : Endian should be Int
@Gama11

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

There's also this issue in flixel-ui:

package;

class Main {
    static function main() {}

    var ID:Int;

    function sortWidgets(method:SortMethod):Void {
        switch (method) {
            case XY:
            case ID:
        }
    }
}

enum SortMethod {
    XY;
    ID;
}
Main.hx:10: characters 8-10 : Capture variables must be lower-case
@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

The OpenFL issue is interesting because it's technically correct: It matches Int against Endian and the types are not compatible. But yes I don't think I want to break that in a dot release.

The other issues all seem to have come from the same resolution order problem that HaxeQuake had. Try again now!

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

haxequake and nekovm.org compile now!

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

Actually I'm not really sure how to avoid that OpenFL issue...

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2016

I have added a hack for that problem for now. Not sure yet where I want to go with that.

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 14, 2016

Maybe emit a warning about that? This seems really wrong.

Well maybe not that obviously wrong when you use it INSIDE abstract method, but still confusing.

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

BTW, HaxeQuake compiled JS seems to be bigger by 7KB. I'm not sure whether this is a reason, but here's some more case separation:

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

No, this is totally the reason it got bigger:

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

Right, we group cases by comparing them to the next case and with the case order essentially being random now this could easily happen. I'll see what I can do about this.

Simn added 2 commits Mar 15, 2016
@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

Try again please, cases are now grouped across the list. I've also added case sorting so we don't get so many random diff changes when adding a case somewhere.

@skial skial referenced this pull request Mar 15, 2016
@Gama11

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

OpenFL and Flixel now compile! There's some interesting "unused pattern" warnings in OpenFL's Keyboard.hx leftover though - here's one of them reduced:

package;

class Main {
    static function main() {}
}

class Keyboard {
    public static inline var QUOTE = 222;
    public static inline var RIGHT = 39;

    private static function convertKeyCode(key:KeyCode):Int {
        return switch (key) {
            case SINGLE_QUOTE: Keyboard.QUOTE;
            case RIGHT: Keyboard.RIGHT;
            default: cast key;
        }
    }
}

@:enum abstract KeyCode(Int) from Int to Int {
    var SINGLE_QUOTE = 0x27;
    var RIGHT = 0x4000004F;
}
Main.hx:14: characters 8-13 : Warning : This pattern is unused
@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

That seems correct to me. Resolution order picks up the inline var RIGHT = 39, which happens to be the same value as 0x27 which is already matched by the first pattern.

I can see that this used to be different though. This is a bit concerning because it could introduce some hard to detect regressions. On the other hand the new behavior is in line with the documented resolution order:

6. If the current class has a static field named i, resolve to it and halt.
7. If an enum constructor named i is declared on an imported enum, resolve to it and halt.

Any opinions?

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

I think current resolution order makes sense when we're resolving some random identifier, but here we dealing with the known KeyCode type, so I think it should be modified to "if enum(abstract) value is expected, first check its constructor names".

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

Try again please, cases are now grouped across the list. I've also added case sorting so we don't get so many random diff changes when adding a case somewhere.

Nice, no duplication anymore! One little thing is that they seem to be grouped first and then sorted, so inside a group, the order is still random, I guess.

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

Good point, we sort within the cases as well now.

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

Regarding the resolution order, it would be pretty weird if the following didn't match:

@:enum abstract E(Int) from Int to Int {
    var C = 2;
}

class Main {
    static inline var C = 1;

    static function main() {
        var x:E = C;
        switch (x) {
            case C: trace("ok");
        }
    }
}

Prioritizing the constructor in the pattern would resolve it to 2, whereas the value is resolved to the inline variable and thus 1.

@nadako

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

Meh, this is a weird ambigous case... I don't know then.

@hughsando

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

As for the Xml.Element, I would not at all be sad for that to go.
The @:enum abstract XmlType(Int) sounds like a much better way to handle it.

@hughsando

This comment has been minimized.

Copy link
Member

commented Mar 15, 2016

I put the abstract enum in, and it seemed to work.
This file is a strange mix of native an inefficient code. I think I need to rewrite it using some of the newer extern stuff, with an eye on very low GC memory usage, which can otherwise be a problem with big xml files.

Conflicts:
	std/cpp/NativeXml.hx
@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 15, 2016

Ah great, thanks!

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 16, 2016

@jgranick: Could you take a look at the OpenFL problem and tell me what you think about that? I don't want to willfully break OpenFL but to me it really does look like this relied on a bad resolution order.

@ncannasse: Do you have an opinion on that? Also, can you test your codebase with this branch?

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 17, 2016

If there are no further complaints I'll go ahead and merge this later today.

@Simn

This comment has been minimized.

Copy link
Member Author

commented Mar 17, 2016

Done!

@Simn Simn closed this Mar 17, 2016
@Simn Simn deleted the Simn:make_pattern_matching_great_again branch Mar 17, 2016
Gama11 added a commit to Gama11/haxe-checkstyle that referenced this pull request Mar 17, 2016
See HaxeFoundation/haxe#4940 (comment) - need a version of hxargs which has the fix.
Also made the buildTelemetry.hxml consistent with build.hxml (no more haxelib install commands).
Gama11 added a commit to openfl/openfl that referenced this pull request Mar 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.