Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite 'dhall format' to be a Vonderhaar style pretty-printer #1496

Open
neongreen opened this issue Oct 30, 2019 · 26 comments
Open

Rewrite 'dhall format' to be a Vonderhaar style pretty-printer #1496

neongreen opened this issue Oct 30, 2019 · 26 comments
Labels

Comments

@neongreen
Copy link

@neongreen neongreen commented Oct 30, 2019

This has already been discussed a bit at #244, but I want to bring this up again.

I'd like it if dhall format preserved my intent to lay out an expression one a single line / on several lines.

I am making a small HTML templating system with Dhall – it's supposed to power monadfix.com, which is currently done with Mustache. The syntax is quite a bit more verbose than Mustache, but I can live with it if I get Dhall's benefits in return.

Both this time – and pretty much every time I tried Dhall before that – I kept wishing that dhall format would not transform multiline strings/lists into single-line strings/lists depending on how short/long they are.

Dhall-ised HTML is already putting a strain on me, so I want trees to at least stay trees when I write them as trees. Same with multiline strings – I growl every time I write a multiline string and it turns into something like \n\nLorem ipsum\n\n because it's not long enough.

@sjakobi sjakobi added the formatting label Oct 30, 2019
@Gabriel439

This comment has been minimized.

Copy link
Collaborator

@Gabriel439 Gabriel439 commented Oct 31, 2019

So I don't know about lists, but it seems like we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Oct 31, 2019

I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Oct 31, 2019

we could easily have the formatter always prefer multi-line strings if the string has at least one newline in it

How about at least one newline and at least one non-newline character? That way "\n" and "\n\n" are still formatted as a single-line strings.

Here is the relevant code BTW:

-- | > TextLit (Chunks [(t1, e1), (t2, e2)] t3) ~ "t1${e1}t2${e2}t3"
| TextLit (Chunks s a)

prettyChunks :: Pretty a => Chunks Src a -> Doc Ann
prettyChunks (Chunks a b) =
if any (\(builder, _) -> hasNewLine builder) a || hasNewLine b
then Pretty.flatAlt long short
else short
where
long =
Pretty.align
( literal ("''" <> Pretty.hardline)
<> Pretty.align
(foldMap prettyMultilineChunk a <> prettyMultilineBuilder b)
<> literal "''"
)
short =
literal "\"" <> foldMap prettyChunk a <> literal (prettyText b <> "\"")
hasNewLine = Text.any (== '\n')
prettyMultilineChunk (c, d) =
prettyMultilineBuilder c
<> dollar
<> lbrace
<> prettyExpression d
<> rbrace
prettyMultilineBuilder builder = literal (mconcat docs)
where
lazyLines = Text.splitOn "\n" (escapeSingleQuotedText builder)
docs =
Data.List.intersperse Pretty.hardline (fmap Pretty.pretty lazyLines)
prettyChunk (c, d) =
prettyText c
<> syntax "${"
<> prettyExpression d
<> syntax rbrace
prettyText t = literal (Pretty.pretty (escapeText_ t))

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Oct 31, 2019

NB:

even then that's not the way that the prettyprinter library works (that dhall-format is based on). prettyprinter just tries to format the expression in single-line form regardless of what the user originally tried to do and if the single-line form exceeds 80 columns then it uses multi-line form.

I would go as far as claim that getting rid of prettyprinter is a good thing.

The current behavior re/ line-wrapping is a bit too much of "spooky action at a distance". Here is a demonstration. Take a snippet that barely fits into 80 chars:

  λ(page : Page/Type)
   node "html"
   attrs (toMap { lang = "en" })
   content
      [   node "head"
         content
            [ node "meta"  attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ(link : Text)
                       stylesheet
                          "non-breakable line barely fitting into 80 characters"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]

Adding one more character to the non-fitting line causes many other lines to be rewrapped for no good reason:

  λ(page : Page/Type)
   node
      "html"
   attrs (toMap { lang = "en" })
   content
      [   node
            "head"
         content
            [   node
                  "meta"
               attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ ( link
                          : Text
                          )
                       stylesheet
                          "non-breakable line barely fitting into 80 characters!"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]

I, for one, would (probably!) not miss line-wrapping if it was gone, and getting rid of it would mean (I think?) that we can get by without prettyprinter.

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Oct 31, 2019

@neongreen The extra line rewrapping is due to the same bug you discovered in #1400. With my patch from quchen/prettyprinter#89, your example is formatted like this:

  λ(page : Page/Type)
   node "html"
   attrs (toMap { lang = "en" })
   content
      [   node "head"
         content
            [ node "meta"  attrs (toMap { charset = "UTF-8" })
            , Html.nodes
                ( List/map
                    Text
                    Html.Node
                    (   λ(link : Text)
                       stylesheet
                          "non-breakable line barely fitting into 80 characters!"
                    )
                    page.extraStyles
                )
            , Html.nodes page.extraHead
            ]
      , node "body"
      ]
@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Oct 31, 2019

@sjakobi oh! This is good to know.

@joneshf

This comment has been minimized.

Copy link
Collaborator

@joneshf joneshf commented Oct 31, 2019

I'll throw another vote in for dropping Wadler-style in exchange for Vonderhaar-style.

Vonderhaar-style is where a construct that is written over a single line stays a single line and a construct that is written over multiple lines stays multiple lines. It was popularized by the creator of elm-format, influenced purty, and now ormolu. It's a nice way to format computer languages nowadays.

I suspect this would be equally troublesome, just in the other direction. Don't have a good example though.

I agree 100%. Dealing with dhall format and Text literals is incredibly annoying. Changing the algorithm to prefer one way of reformatting Text literals over the other won't fix the problem.

@ocharles

This comment has been minimized.

Copy link
Member

@ocharles ocharles commented Oct 31, 2019

I'm a huge fan of the Vonderhaar style, thanks for teaching me the name of it!

@Gabriel439

This comment has been minimized.

Copy link
Collaborator

@Gabriel439 Gabriel439 commented Nov 2, 2019

I want to add another reason to possibly get rid of prettyprinter: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Nov 2, 2019

I want to add another reason to possibly get rid of prettyprinter: it's currently becoming a performance bottleneck now that we've made a lot of progress optimizing on other fronts

Oh! I thought I had seen it take quite a bit of time formatting the normalized pkg-set.dhall from cpkg but I thought that that was a rather extreme case and an unusual thing to do. Do you have "real-world" expressions that take too long to format?

@Gabriel439

This comment has been minimized.

Copy link
Collaborator

@Gabriel439 Gabriel439 commented Nov 2, 2019

@sjakobi: Here's an example:

https://github.com/Gabriel439/dhall-kubernetes-charts/blob/aed07023b032f301b03e2acc1356c0f2144f574a/stable/jenkins/index.dhall

If you add a trivial type error to that module and try to type-check it with dhall-1.27.0 it spends quite a lot of time just rendering the (truncated) context due to pretty-printing being the bottleneck. This is what motivated the following change:

#1482

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Nov 2, 2019

Wow! That's quite bad indeed. Incidentally quchen/prettyprinter#89 would speed this up a bit, maybe 15-20%.

What really impressed me was how long the first type-checking took. I didn't time it, but I'm pretty sure it was more than 10 minutes. I suspect that most of that time was spent on import resolution, possibly downloading one file of dhall-kubernetes after another?! Has there been a discussion of possible speedups there before?

@Gabriel439

This comment has been minimized.

Copy link
Collaborator

@Gabriel439 Gabriel439 commented Nov 2, 2019

@sjakobi: It depends on the specific bottleneck but if the root cause is indeed downloading too many files then one solution would be to create a globally available caching forward proxy that can pre-resolve and pre-encode imports (plus language support for importing from a binary content type). Then users could add http_proxy=https://proxy.dhall-lang.org as an environment variable to take advantage of this shared proxy

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Nov 3, 2019

@Gabriel439 I've made a new issue for this: #1511.

sjakobi added a commit that referenced this issue Nov 4, 2019
This causes text literals to be formatted as multi-line strings
whenever they contain at least one newline and at least one non-newline
character. "Spacers" like `"\n\n"` continue be formatted as single-line
strings. If the heuristic turns out to be too eager to choose a
multi-line layout, we can refine it later.

This partially addresses #1496.

Also

* update some variable names

* use 80-column "smart" layout consistently
mergify bot added a commit that referenced this issue Nov 4, 2019
This causes text literals to be formatted as multi-line strings
whenever they contain at least one newline and at least one non-newline
character. "Spacers" like `"\n\n"` continue be formatted as single-line
strings. If the heuristic turns out to be too eager to choose a
multi-line layout, we can refine it later.

This partially addresses #1496.

Also

* update some variable names

* use 80-column "smart" layout consistently
@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 3, 2019

Another example, illustrating the problem better. The formatter makes this snippet look much messier than it could be:

             content
                [   Html.node "li"
                   content
                      [ text "Language questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content
                      [ text "Ecosystem questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content [ text "Tooling support ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content
                      [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Architecture review" ]
                      ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Performance review" ]
                      ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Security review" ] ]
                ]

Original:

             content
                [   Html.node "li"
                   content [ text "Language questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content [ text "Ecosystem questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content [ text "Tooling support ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                   content [ node "strong"  content [ text "+ Architecture review" ] ]
                ,   Html.node "li"
                   content [ node "strong"  content [ text "+ Performance review" ] ]
                ,   Html.node "li"
                   content [ node "strong"  content [ text "+ Security review" ] ]
                ]

When working with moderately nested structures, formatting consistency takes a huge hit as it approaches the 80-column boundary.

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Dec 3, 2019

@neongreen That because some lines in the original are wider than 80 characters, right?!

Here's a reproducible version:

[ [ [ [ [ [   x
             content
                [   Html.node "li"
                   content
                      [ text "Language questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content
                      [ text "Ecosystem questions ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content [ text "Tooling support ", rightSideQuantity "" ]
                ,   Html.node "li"
                   content
                      [ text "Code review ", rightSideQuantity "20 hours" ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Architecture review" ]
                      ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Performance review" ]
                      ]
                ,   Html.node "li"
                   content
                      [ node "strong"  content [ text "+ Security review" ] ]
                ]
          ]
        ]
      ]
    ]
  ]
]

We could make the page width configurable if that would be useful for you. So you could use dhall format --page-width 100 to format with a page-width of 100 columns instead of 80.

BTW is that HTML DSL open source? Looks neat! :)

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 3, 2019

100 chars will make this example better at the expense of making other things worse (stuff that I want to stay multi-line will become single-line).

My point is that there's no --page-width I could choose that wouldn't lead to inconsistent formatting.

The HTML DSL is here but it needs a renderer: https://gist.github.com/neongreen/9b0845789c62301e0cc978019556cbb2. I want to release it at some point but currently don't have time for it.

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 3, 2019

(Node could be a recursive type but I cheated and used JSON instead because type checking was slow.)

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Dec 4, 2019

@neongreen Regarding rendering: In dhall-lang/dhall-lang#200 there's some discussion toward a generic pretty-printing library that could be useful for your renderer. Not sure how usable @matheus23's dhall-blocks is today…

(Node could be a recursive type but I cheated and used JSON instead because type checking was slow.)

I'm very interested in improving dhall's performance – so please do report such issues if you can! :)

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Dec 4, 2019

Regarding the apparent demand for a Vonderhaar style pretty-printer: I agree that it could do a better job at formatting source code than the current pretty-printer. But I suspect that it might be difficult to get it to format generated code, such as the output of dhall type, well. That means we might end up with two pretty-printers, and I worry that that would take too much effort to maintain.

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Dec 4, 2019

I should clarify: Building a Vonderhaar-style formatter for Dhall might still be a worthwhile effort. But if it should replace the existing pretty-printer, it needs to work well for generated code too. So maybe it makes sense to develop it as a separate tool.

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 4, 2019

Good objection.

Do you have examples of generated code it has to work with?

Something as simple as a flag to "always use multi-line formatting for records and strings with newlines" might do the trick, but I don't know.

@sjakobi

This comment has been minimized.

Copy link
Collaborator

@sjakobi sjakobi commented Dec 4, 2019

Do you have examples of generated code it has to work with?

AFAIK nearly all code in https://github.com/dhall-lang/dhall-kubernetes is generated.

@neongreen neongreen changed the title 'dhall format' could preserve single-line/multi-line layout Rewrite 'dhall format' to be a Vonderhaar style pretty-printer Dec 4, 2019
@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 4, 2019

Regarding the apparent demand for a Vonderhaar style pretty-printer

Changed the issue title to make the demand more explicit 🙂

There are many small problems with dhall format that could be solved without rewriting it, but making it a Vonderhaar style pretty-printer solves a problem that is hard to describe without literally saying "I want a Vonderhaar style pretty-printer". Something along the lines of: I want similar pieces of code to be formatted consistently, except that I can't precisely define 'similar', and also sometimes I want to opt out anyway.

@Gabriel439

This comment has been minimized.

Copy link
Collaborator

@Gabriel439 Gabriel439 commented Dec 4, 2019

The concern about generated code also applies to normalized source code, too, since the normal form might not correspond one-to-one with the original source code.

@neongreen

This comment has been minimized.

Copy link
Author

@neongreen neongreen commented Dec 4, 2019

Looking at dhall-kubernetes, I think even a Vonderhaar style pretty-printer will work nicely there – they already generate well-formatted code, it just needs parentheses to be stripped.

I don't have a good answer for normalized code, though. Counting columns might indeed be the best approach there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.