Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse whitespace more precisely #1483

Merged
merged 1 commit into from
Nov 3, 2019
Merged

Conversation

sjakobi
Copy link
Collaborator

@sjakobi sjakobi commented Oct 28, 2019

This is preparatory work for #1454.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Oct 28, 2019

So far, the parser is still quite broken. For example:

      ./dhall-lang/tests/parser/success/unit/MergeXYZ:                        FAIL
        Exception: 
        Error: Invalid input
        
        1:9:
          |
        1 | merge x y z
          |         ^^
        unexpected "y "
        expecting operator or whitespace
      ./dhall-lang/tests/parser/success/unit/MergeParenAnnotation:            FAIL
        Exception: 
        Error: Invalid input
        
        1:10:
          |
        1 | (merge x y) : t
          |          ^^
        unexpected "y)"
        expecting operator or whitespace
      ./dhall-lang/tests/parser/success/unit/Merge:                           FAIL
        Exception: 
        Error: Invalid input
        
        1:9:
          |
        1 | merge x y
          |         ^^
        unexpected "y<newline>"
        expecting operator or whitespace
      ./dhall-lang/tests/parser/success/unit/ListLitEmptyPrecedence:          FAIL
        Exception: 
        Error: Invalid input
        
        1:11:
          |
        1 | [] : List T U
          |           ^
        unexpected 'T'
        expecting ->, :, end of input, operator, or whitespace
      ./dhall-lang/tests/parser/success/unit/ListLitNonEmptyAnnotated:        FAIL
        Exception: 
        Error: Invalid input
        
        1:15:
          |
        1 | [x, y] : List T
          |               ^
        unexpected 'T'
        expecting ->, :, end of input, operator, or whitespace

@Gabriella439
Copy link
Collaborator

@sjakobi: I pushed a change with some fixes that illustrates what went wrong. Many of the remaining fixes are due to the import parser needing to be fixed to not consume trailing whitespace

@sjakobi
Copy link
Collaborator Author

sjakobi commented Oct 31, 2019

Thanks a lot @Gabriel439! :)

This is the last parser test failure:

      ./dhall-lang/tests/parser/success/largeExpression:                                     FAIL
        Exception: 
        Error: Invalid input
        
        67:3:
           |
        67 |   (   λ ( x
           |   ^
        unexpected '('
        expecting end of input or whitespace

There are a bunch of other test failures remaining though.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Oct 31, 2019

CI should turn green now. I'll do some cleanup.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Oct 31, 2019

A few test failures in dhall-lsp-server:

  Completion
    Dhall.Completion
      suggests user defined types:     FAIL (0.03s)
        expected: "Config"
         but got: "toMap"
      suggests user defined functions: FAIL (0.03s)
        expected: "makeUser"
         but got: "toMap"
      suggests user defined bindings:  FAIL (0.03s)
        expected: "bob"
         but got: "toMap"
      suggests functions from imports: FAIL (0.03s)
        uncaught exception: ErrorCall
        Prelude.head: empty list
  Hovering
    Dhall.Hover
      reports types on hover:          FAIL (0.03s)
        expected: "{ home : Text, name : Text }"
         but got: "Type"

_arrow
whitespace
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should add aliases whsp and whsp1 to reduce the noise a bit.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 1, 2019

@EggBaconAndSpam Do you have some advice on how to best update dhall-lsp-server? Should I attempt to fix the parsers in Dhall.LSP.Backend.Parsing in analogy to the changes in Dhall.Parser.Expression or could I potentially completely get rid of them now?

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 1, 2019

It took me way too long to figure out why my changes to dhall-lsp-server wouldn't change the test results from stack test dhall-lsp-server:tests:

  1. dhall-lsp-server:tests is an integration test suite that depends on the executable dhall-lsp-server.

  2. While stack happily reports

    Installing executable dhall-lsp-server in <path>

    every time I change the library, it will only re-compile the executable if it's one of the targets!

So you have to include the executable in the targets and run

stack test dhall-lsp-server:tests dhall-lsp-server:dhall-lsp-server

😱 😱 😱

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 1, 2019

I've modified dhall-lsp-server's just enough now that the tests pass. dhall-lsp-server is very likely still broken for other inputs though.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 2, 2019

While looking at dhall-lsp-server's tests I was surprised by this expectation:

getValue functionContent `shouldBe` "{ home : Text, name : Text }"

My understanding is that this should be the type of mkUser here:

let mkUser =
λ(_isAdmin : Bool)
if _isAdmin
then { name = "admin", home = "/home/admin" }
else { name = "default", home = "/home/user" }

Shouldn't the type be

Bool -> { home : Text, name : Text }

then?

(CC @mujx)

@Gabriella439
Copy link
Collaborator

@sjakobi: It's a bug in dhall-lsp-server. I can reproduce:

Screen Shot 2019-11-02 at 11 44 48 AM

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 3, 2019

@Gabriel439 I have made a new issue for this: #1510

Would you mind giving this a final review, so we can get this merged?

parseDirectory </> "failure/unit/ImportEnvWrongEscape.dhall"

-- Other spacing related unexpected successes:
, parseDirectory </> "failure/spacing/AnnotationNoSpace.dhall"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that these are finally fixed! 🙂

This is preparatory work for #1454.

This also fixes some cases where dhall would previously accept
malformatted inputs.

The changes to dhall-lsp-server are mostly untested. See #1510.

Co-authored-by: Gabriel Gonzalez <Gabriel439@gmail.com>
@sjakobi sjakobi changed the title WIP: Parse whitespace more precisely Parse whitespace more precisely Nov 3, 2019
@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 3, 2019

I remembered that we should look at the performance impact:

I've benchmarked dhall resolve --immediate-dependencies for cpkg's pkg-set.dhall:

master

$ bench "dhall resolve --immediate-dependencies --file pkgs/pkg-set.dhall"
benchmarking dhall resolve --immediate-dependencies --file pkgs/pkg-set.dhall
time                 154.7 ms   (151.9 ms .. 156.1 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 155.9 ms   (154.9 ms .. 156.7 ms)
std dev              1.217 ms   (701.7 μs .. 1.891 ms)
variance introduced by outliers: 12% (moderately inflated)

This branch

$ bench "dhall resolve --immediate-dependencies --file pkgs/pkg-set.dhall"
benchmarking dhall resolve --immediate-dependencies --file pkgs/pkg-set.dhall
time                 232.9 ms   (226.7 ms .. 246.0 ms)
                     0.999 R²   (0.993 R² .. 1.000 R²)
mean                 228.1 ms   (225.2 ms .. 233.7 ms)
std dev              4.848 ms   (1.233 ms .. 6.609 ms)
variance introduced by outliers: 14% (moderately inflated)

This is an increase of 50.5%!

In the micro-benchmarks I've noticed particular increases for the comment parsing:

, benchExprFromText "Line comment" ("x -- " <> T.replicate 1000000 " ")
, benchExprFromText "Block comment" ("x {- " <> T.replicate 1000000 " " <> "-}")

master

benchmarked Line comment
time                 11.86 ms   (11.69 ms .. 11.98 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 11.84 ms   (11.79 ms .. 11.89 ms)
std dev              129.4 μs   (107.2 μs .. 164.1 μs)

benchmarked Block comment
time                 13.20 ms   (13.00 ms .. 13.41 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 13.59 ms   (13.41 ms .. 13.94 ms)
std dev              600.0 μs   (142.2 μs .. 953.7 μs)
variance introduced by outliers: 15% (moderately inflated)

This branch

benchmarked Line comment
time                 228.6 ms   (225.6 ms .. 231.1 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 230.3 ms   (228.9 ms .. 233.8 ms)
std dev              3.631 ms   (1.668 ms .. 5.820 ms)

benchmarking Block comment ... took 14.31 s, total 56 iterations
benchmarked Block comment
time                 260.9 ms   (234.1 ms .. 298.2 ms)
                     0.976 R²   (0.940 R² .. 0.997 R²)
mean                 255.3 ms   (243.2 ms .. 273.0 ms)
std dev              25.54 ms   (17.55 ms .. 35.68 ms)
variance introduced by outliers: 29% (moderately inflated)

This got about 20x slower!

@Gabriella439
Copy link
Collaborator

Gabriella439 commented Nov 3, 2019

@sjakobi: The reason it is slower is because of backtracking. After it parses the x there are actually several candidate whitespace parsers guarded by try (I'm guessing roughly 20 of them!) so it has to try all 20 to know which one to commit to before it gives up and tries the final whitespace after completeExpression

This can be fixed by left-factoring the parsers so that they no longer need to wrap the whitespaces in trys, although it will be a little tricky to do so since there isn't really a systematic way to do so.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Nov 3, 2019

Right. Is this good to merge anyways or do you want to try left-factoring it first? I suspect that I won't be of much help with the parser performance. :/

@Gabriella439
Copy link
Collaborator

@sjakobi: You can go ahead and merge. I can work on the parsing performance

@mergify mergify bot merged commit 7eec31d into master Nov 3, 2019
@mergify mergify bot deleted the sjakobi/precise-whitespace branch November 3, 2019 19:43
Gabriella439 added a commit that referenced this pull request Nov 4, 2019
This undoes some of the performance regression introduced
in #1483

Before #1483:

```
benchmarked Line comment
time                 11.86 ms   (11.69 ms .. 11.98 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 11.84 ms   (11.79 ms .. 11.89 ms)
std dev              129.4 μs   (107.2 μs .. 164.1 μs)

benchmarked Block comment
time                 13.20 ms   (13.00 ms .. 13.41 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 13.59 ms   (13.41 ms .. 13.94 ms)
std dev              600.0 μs   (142.2 μs .. 953.7 μs)
```

After #1483:

```
benchmarked Line comment
time                 288.7 ms   (282.8 ms .. 294.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 292.3 ms   (290.8 ms .. 294.6 ms)
std dev              3.156 ms   (2.216 ms .. 4.546 ms)

benchmarked Block comment
time                 286.2 ms   (280.9 ms .. 292.6 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 290.6 ms   (288.3 ms .. 292.9 ms)
std dev              3.875 ms   (2.866 ms .. 5.500 ms)
```

After this change:

```
benchmarked Line comment
time                 61.44 ms   (60.37 ms .. 63.03 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 61.41 ms   (60.74 ms .. 62.25 ms)
std dev              1.341 ms   (945.0 μs .. 1.901 ms)

benchmarked Block comment
time                 61.83 ms   (60.97 ms .. 63.14 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 61.16 ms   (60.33 ms .. 61.85 ms)
std dev              1.396 ms   (1.011 ms .. 1.907 ms)
```
mergify bot pushed a commit that referenced this pull request Nov 4, 2019
* Partially fix whitespace parsing performance regression

This undoes some of the performance regression introduced
in #1483

Before #1483:

```
benchmarked Line comment
time                 11.86 ms   (11.69 ms .. 11.98 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 11.84 ms   (11.79 ms .. 11.89 ms)
std dev              129.4 μs   (107.2 μs .. 164.1 μs)

benchmarked Block comment
time                 13.20 ms   (13.00 ms .. 13.41 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 13.59 ms   (13.41 ms .. 13.94 ms)
std dev              600.0 μs   (142.2 μs .. 953.7 μs)
```

After #1483:

```
benchmarked Line comment
time                 288.7 ms   (282.8 ms .. 294.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 292.3 ms   (290.8 ms .. 294.6 ms)
std dev              3.156 ms   (2.216 ms .. 4.546 ms)

benchmarked Block comment
time                 286.2 ms   (280.9 ms .. 292.6 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 290.6 ms   (288.3 ms .. 292.9 ms)
std dev              3.875 ms   (2.866 ms .. 5.500 ms)
```

After this change:

```
benchmarked Line comment
time                 61.44 ms   (60.37 ms .. 63.03 ms)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 61.41 ms   (60.74 ms .. 62.25 ms)
std dev              1.341 ms   (945.0 μs .. 1.901 ms)

benchmarked Block comment
time                 61.83 ms   (60.97 ms .. 63.14 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 61.16 ms   (60.33 ms .. 61.85 ms)
std dev              1.396 ms   (1.011 ms .. 1.907 ms)
```

* Correctly parse `https://example.com usingBla`

... as caught by @sjakobi
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants