Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dhall-format removes comments #145

Open
psibi opened this issue Sep 29, 2017 · 40 comments
Open

dhall-format removes comments #145

psibi opened this issue Sep 29, 2017 · 40 comments

Comments

@psibi
Copy link
Contributor

psibi commented Sep 29, 2017

Example file:

--- This is dhall file
{ foo = "jax" }

The Output it gives:

{ foo = "jax" }
@psibi
Copy link
Contributor Author

psibi commented Sep 29, 2017

Going through the code, it seems a known issue. :) Anyway keeping this open.

@Gabriella439
Copy link
Collaborator

One thing I can do as an immediate workaround is to have the formatter preserve leading whitespace and comments at the beginning of the file. Then if you need comments for a subexpression you can at least have the workaround of splitting it out into another file

Gabriella439 added a commit that referenced this issue Sep 29, 2017
Partial fix for #145

Before this change `dhall-format` would get rid of all comments when formatting
code.  After this change `dhall-format` will preserve any leading comments and
whitespace so that users can at least add top-level comment headers to their
files
Gabriella439 added a commit that referenced this issue Sep 29, 2017
Partial fix for #145

Before this change `dhall-format` would get rid of all comments when formatting
code.  After this change `dhall-format` will preserve any leading comments and
whitespace so that users can at least add top-level comment headers to their
files
Gabriella439 added a commit that referenced this issue Sep 29, 2017
Partial fix for #145

Before this change `dhall-format` would get rid of all comments when formatting
code.  After this change `dhall-format` will preserve any leading comments and
whitespace so that users can at least add top-level comment headers to their
files
Gabriella439 added a commit that referenced this issue Sep 30, 2017
Partial fix for #145

Before this change `dhall-format` would get rid of all comments when formatting
code.  After this change `dhall-format` will preserve any leading comments and
whitespace so that users can at least add top-level comment headers to their
files
@f-f
Copy link
Member

f-f commented Nov 1, 2018

@Gabriel439 What's the main challenge to tackle in order to fix this? As far as I remember the main problem is that it'd be hard to associate comment fragments to other AST nodes, right? (as in: a comment line that comes before some code goes with the next node, but what to do in case of a comment fragment on the same line? Also how about the alignments? And does this mean that comments would slow down all operations, since it's a whole lot of packing-unpacking dummy AST comment nodes?)

@ocharles
Copy link
Member

ocharles commented Nov 1, 2018 via email

@Gabriella439
Copy link
Collaborator

There is also a third solution, which is a "low-tech" formatter that doesn't actually parse the AST but rather just scans the text and formats as it goes. I believe this is how go fmt works.

However, I think the simplest and most robust solution is the second one that @ocharles mentioned which is just to preserve parsed whitespace in the syntax tree using the Noted constructors.

That still doesn't get us all the way, though, because intelligently preserving comments when formatting code is still difficult even when your AST preserves whitespace and comments. Here are some pathological cases to consider when designing a comment-preserving formatting algorithm:

  • The formatter wants to format an expression as one line since it's less than 80 characters, but there is a line comment in the way:

    { x = 1  -- !
    , y = 2
    }
  • Same as the previous example, but now there is a multi-line comment in the way:

    { x = 1  {- !
                !
             -}
    , y = 2
    }
  • The formatter wants to reindent a value but doing so might break the indentation of a multi-line comment:

    [ 1
    , 2
    ...
    ,     99 {- !
                !
             -}
    ]
  • Same as the previous example, except that te user wrote the multi-line comment using multiple single-line comments (i.e. -- ), so the formatter doesn't know that they are supposed to be indented together:

    [ 1
    , 2
    ...
    ,     99 -- !
             -- !
    ]
  • User inserts comments preceding syntactic elements that we want to horizontally align:

    [ 1
    , 2
    ...
    {- ! -} , 99
    ]
  • User adds a comment that goes beyond the desired column limit:

    1 -- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  • User adds whitespace before a comment pushing it beyond the desired column limit:

    1                                                                                          -- !

@macalinao
Copy link

I'd like to have comments on record fields if possible. This doesn't work right now.

let Block
	: Type
	= { id :
		  Text
		  -- Used in radio selects, checkbox selects
	  , label :
		  Text
	  , fieldType :
		  Text
	  , options :
		  List { id : Text, label : Text }
	  }

gets nuked.

@feliksik
Copy link

feliksik commented Jul 6, 2019

If we compare different styles and techniques of formatting: one aspect of gofmt is that is preserves line breaks. So it may compact many linebraks to /n/n, but other than that, the user has some freedom on formatting. gofmt then mainly manages indentation, not breaks.

I think this is useful. The fact that some dhall maps are formatted in one line and some are not (based on line lengths) is not something I enjoy.

This may warrant a separate issue, but if you want to preserve linebreaks and want to go for AST style formatting and include comments, linebreaks would also have to be added to AST.

@Gabriella439
Copy link
Collaborator

@feliksik: The Haskell AST does preserve the original source code for all nodes, including whitespace/newlines, but it's still not clear to me if only dealing with indentation is enough.

For example, one difference between Dhall and Go is that Dhall is less whitespace-sensitive than Go is. For example, in Go, this is syntactically valid:

func main() {
	fmt.Println("Hello, world!")
}

... but this is not valid

func main()
{
	fmt.Println("Hello, world!")
}

If the latter were valid then Go would have to delete the newline in between main() and the opening brace to format the code according to the standard style

Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.

@feliksik
Copy link

feliksik commented Jul 7, 2019

@Gabriel439 thank you for your response.

Because Dhall is not as restrictive as Go, it would have to accommodate all sorts of weirdness if it took greater care to preserve the original newlines.

This is a good point! Indeed, only dealing with indentation may not be enough to make it great. If I correctly understand, you would need some rules (that are eventually arbitrary), and these would need to be a bit more sophisticated than with go.

I tried to get this straight for myself, so I can just as well share the examples I made:

This would be fine (it's current formatting):

λ(x : Text) → { a = x, b = x }

As would possibly:

λ(x : Text) →
  { a = x
  , b = x
  }

but this would probably be considered too weird:

λ(
  x : Text) →
  {
   a = x,
   b = x
  }

But you may consider such rules are undesirable (apart from the existing 80-char-line rewrite logic).

Note that go also implements such heuristic, a bit: The following is gofmt'ed, but one newline would be removed if the comment is removed:

package main

import "fmt"

func main() {
        a := map[string]string{"k1": "v1", "k2": "v2"}

        b := map[string]string{
                "k1": "v1", "k2": "v2",
        }

        c := map[string]string{
                "k1": "v1",
                "k2": "v2",
        }
        d := map[string]string{
                "k1": "v1", "k2": // note that this newline WOULD be removed if you remove this comment
                "v2",
        }
        e := map[string]string{
                "k1": "v1",

                "k2": "v2", // extra line before
        }

        fmt.Println(
                "Hello, world!", a, b,

                c,
                d, e,
        )
}

So to summarize: if I get it right, there is 2 taking home points here:

  • line rewrite heuristics can be formulated, and it's arbitrary how far you want to take this
  • the main point of this ticket (preserving comments) could possibly be implemented (and made easier?) by only respecting newlines that are preceded by a comment (--).

@Gabriella439
Copy link
Collaborator

@feliksik: Is there a document or code describing the go fmt algorithm? It would help a lot if I could crib from that when working on the dhall format equivalent

@feliksik
Copy link

feliksik commented Jul 7, 2019

Hmm I am not aware of such a document.
The code is not too long, though: https://golang.org/src/cmd/gofmt/gofmt.go . I didn't study it yet, though.

I did find http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/ , which elaborates how hard this problem can become, based on the requirements. But this aims for much more sophisticated results.

By all means, don't let this spoil the work that can be done on preserving comments in the shorter term. It will be of great value, and much more important than the newlines.

@Profpatsch
Copy link
Member

Profpatsch commented Jul 9, 2019

Personally, I do like the current dhall fmt behaviour, especially since dhall is by design a line-break and leading whitespace insensitive language.

But I think this discussion is derailing the original issue too much and should move to a different issue.

@lorenzo
Copy link

lorenzo commented Jul 9, 2019

What about starting with simple rules for preserving comments such as "only comments in their own lines are preserved"?

I think this would go a long way, at least for documenting functions and types at the top level

@Gabriella439
Copy link
Collaborator

I really liked the idea of preserving newlines that follow a comment. I'm currently trying to see how easy it would be to implement

@Gabriella439
Copy link
Collaborator

Gabriella439 commented Jul 9, 2019

Alright, here's the rough idea I have for how to preserve one trailing comment per AST node.

First, some background: every node in the AST currently keeps track of its source code, including trailing whitespace (but not leading whitespace).

So what we can do for each node is to check the trailing whitespace and behave differently under the following three conditions:

  • No trailing comment

    In this case, don't preserve the trailing whitespace for that node at all, which is the current behavior

  • Trailing comment beginning on the same line

    Preserve the comment and align it against the right-hand side of the expression (indenting/dedenting the comment as necessary if it's a multi-line comment)

    For example, this expression:

    let x =   1   {- Foo
                     Bar
                  -}
    
    
    
    in  x

    ... would be formatted as:

    let x =
          1 {- Foo
               Bar
            -}
    
    in  x
  • Trailing comment beginning on another line

    Preserve the comment and align it below the expression to the left-hand side (indent/dedenting the comment as necessary)

    For example, this expression:

    let x = 1
    
    
    {- Foo
       Bar
    -}
    
    in  x

    ... would be formatted as:

    let x =
          1
          {- Foo
             Bar
          -}
    
    in  x

We could also add special cases for preserving comments right before a let binding or a record/union key since we can preserve their interior whitespace, too.

Gabriella439 added a commit that referenced this issue Sep 2, 2019
Related to #145

Note that this also refactors `Let` to use `Binding` in order
to avoid having to duplicate `Src`-related fields in two
places.
@Gabriella439
Copy link
Collaborator

Alright, I made some progress on this here by fixing dhall format to at least preserve comments inside of a let binding (but not immediately before):

#1273

So, for example, it will preserve the comment if you do this:

let x = 1

let {- Example
       comment
    -}
    y = 1

in  x + y

... but it will not preserve the comment if you do this:

let x = 1

{- Example
   comment
-}
let y = 1

in  x + y

To provide a quick update: the general tack I suggested in my previous comment did not work. The issue is that the same trailing whitespace would be preserved by multiple nodes in the syntax tree, so it was difficult to get a preserved comment to show up exactly once (i.e. there were many errors with comments that were duplicated or missing).

However, the trick that does work (which is partially implemented in the above pull request) is just adding Src spans directly to the Expr constructors everywhere that there is whitespace within the grammar. The above pull request only implements it for Let but I can slowly add support for preserving comments to more constructors as time goes on.

For now, I'm only doing this for Let to minimize disruption to downstream libraries and I'll try to add support for a few more constructors with each release to ease the migration process for people using the Expr API. Once I've covered all constructors then we can close out this issue.

Gabriella439 added a commit that referenced this issue Sep 5, 2019
Related to #145

Note that this also refactors `Let` to use `Binding` in order
to avoid having to duplicate `Src`-related fields in two
places.
@Gabriella439
Copy link
Collaborator

@ari-becker: Another work-around that dhall lint will not remove is this:

{ field =
    let -- foo/bar is for bazzes
        result = "value"

    in  result
}

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 18, 2020

Now that #1908 is merged, the parser output also contains the comments that precede labels in records and record types. e.g. the A and B comments in these examples:

{ {- A -} x = 0, {- B -} y = 1 }
{ {- A -}
  x : T
, {- B -}
  y : U
}

However the pretty-printer still needs to be updated to output these comments. @german1608's initial work (including tests) on this is on this branch, mixed with parts of #1908 and #1926.

If there are no other volunteers, I'd take a stab at finishing the the pretty-printer update. :)

It should also be also be fairly simple to retain the comments between labels and the field value or type now, e.g.

{ x {- A -} = {- B -} 0 }

and

{ x {- A -} : {- B -} T }

@PierreR
Copy link
Contributor

PierreR commented Aug 4, 2020

@sjakobi this issue hasn't been resolved in1.34, has it ? I was just very exited believing record comments would not be removed by the linter anymore. My hope has been high on this one because some of our users have expressed their disappointment when they realize their comments were garbaged on record fields in a military way.

@german1608
Copy link
Collaborator

german1608 commented Aug 4, 2020

@PierreR dhall format doesn't preserve those comments on 1.34.0. The work to record those on the AST is done, but the formatter needs an update to not remove them.

@PierreR
Copy link
Contributor

PierreR commented Aug 4, 2020

@german1608 thanks for the info. Will look forward to it. Amazing work !

@sjakobi
Copy link
Collaborator

sjakobi commented Aug 4, 2020

@PierreR A lot of preparatory work has been done, most importantly #1465.

I hope to get to the pretty-printer next week. But I also wouldn't mind if someone else would take over.

@sjakobi
Copy link
Collaborator

sjakobi commented Sep 10, 2020

Now that #2021 has been merged, dhall format can preserve comments in several positions in record literals and record types. The expression below demonstrates all the positions where comments are now preserved:

{- Header
-}
let {- A -} letBinding
    {- B -} : {- C -} T
    = {- D -} x

let recordType =
      { {- E -}
        x
        {- F -}
        : {- G -}
          T
      }

let pun = x

let record =
      { {- H -}
        simple
        {- I -}
        = {- J -}
          z
      , {- K -}
        pun
        {- L -}
      ,   {- M -}
          dotted
          {- N -}
        . {- O -}
          fields
          {- P -}
        = {- R -}
          x
      }

in  {=}

I assume that these changes will be included in v1.35.0 (#2016).

As @TristanCacqueray has pointed out in #2021, it would be relatively easy to preserve comments in union types now. E.g.

< {- A -} X {- B -} : {- C -} Y >

Also, @german1608 has already laid the ground work for preserving comments on

  • functions: λ({- A -} a {- B -} : {- C -} T) -> e
  • function types: ∀({- A -} a {- B -} : {- C -} T) -> e
  • field selections: e . {- A -} x {- B -}

@Gabriella439
Copy link
Collaborator

@sjakobi: Yes, I plan to cut the 1.35.0 branch tonight, which will incorporate the formatting improvements

@hanshoglund
Copy link

hanshoglund commented Oct 26, 2020

EDIT: Retracting my original comment as it was badly formulated and sounded like a complaint which I didn't intend.

What I meant to say was "It would be great to get this fully resolved to facilitate greater adoption." 😄

@Gabriella439
Copy link
Collaborator

@hanshoglund: I think my main question is: which places do you still need to preserve comments? The most recent release preserves comments on let bindings and record value/type fields

@hanshoglund-da
Copy link

@Gabriel439 That is a great improvement, but to minimize negative first-user reactions I think it is still crucial to ensure that all comments accepted by the parser are also preserved by the formatter. Maybe the solution is to make the parser accept fewer comments?

I'm sure this is being worked on, just wanted to add another data point :D.

@JohannesRudolph
Copy link

I can for sure say that it has also caused some weird reactions on our team and sth. we had to explicitly educate everyone about. To the point that we had long discussions on figuring out the best "tricks" like e.g. artificial let bindings for how we can use comments to document our dhall types. This was before the most recent improvements in 1.35+ and introduction of the dhall docs (which is still very rough).

Anyhow, I'm all for a "canonical formatter" and the benefits of this hugely outweigh the cons. In particular I also appreciate that the formatter enforces e.g. comment whitespace/alignment to some degree.

I like the idea proposed by @hanshoglund-da to think about how the "write comment & format file" experience can be improved to yield less surprising results.

@Gabriella439
Copy link
Collaborator

@hanshoglund: I don't think disallowing other comments would work, mainly because that would be a highly breaking change to the standard and the standard isn't intended to track details specific to the Haskell implementation (such as the formatter).

@hanshoglund-da
Copy link

@Gabriel439 Yes, avoiding breaking changes to the standard is very reasonable indeed :D.

@PierreR
Copy link
Contributor

PierreR commented Oct 29, 2020

@Gabriel439 it is puzzling to see that the evolution of the Dhall language won't meet such a sensible request knowing how young it is.

Maybe the formatter part should also be part of the standard ?

Quoting Guy Steele:

I should not design a small language, and I should not design a large one. I need to design a language that can grow.

When you say "breaking changes" do you mean you expect this change to be too big of a burden for the language binding implementers ? If so, beeing not in charge of maintaining a languague binding for Dhall, I will respect the concern very much.

As a educated Dhall user I don't mind about this issue. On the other hand as a Dhall lover/promoter I've witnessed more than once the "negative first-user reaction".

Of course another reason "not to do this for now" is a question of ROI/priority/available resources ...

@Gabriella439
Copy link
Collaborator

@PierreR: What I meant was that disallowing comments that the Haskell formatter does not support would break a lot of users' Dhall code in the wild

@PierreR
Copy link
Contributor

PierreR commented Nov 2, 2020

What I find surprising it the fact that comments are removed when added after the term.

For instance:

let letBinding
    = x {- A -}

let record =
      { simple = z {- B -}
      }

The fact that the Dhall Language Support vscode extension is not highlighting correctly many of the suggested options by @sjakobi does not help.

@PierreR
Copy link
Contributor

PierreR commented Nov 2, 2020

FWIW another nitpick that might be worth considering: the line after the comment is indented for let while it is not so within record.
dhall-comment-highlight

@Gabriella439
Copy link
Collaborator

@PierreR: Issues with the VSCode highlighting should be reported at https://github.com/dhall-lang/vscode-language-dhall

@PierreR
Copy link
Contributor

PierreR commented Nov 2, 2020

Done here: dhall-lang/vscode-language-dhall#9. Thanks for pointing this out.

U-cauda-elongata added a commit to U-cauda-elongata/KF_pipitor-resources that referenced this issue Jan 9, 2022
Well, sort of. We left `rule` and `topic` untouched for now because they
have a bunch of comments in `List Topic` literals, which `dhall format`
removes.

Related: <dhall-lang/dhall-haskell#145>
@GandelXIV
Copy link

Pardon my ignorance, but could there be a CLI option to disable any comment manipulations when using dhall format until this bug gets fixed?

@Gabriella439
Copy link
Collaborator

There isn't a good way to disable comment manipulations using the current implementation (that we're aware of). If there were such an option we would have already implemented it because we're not trying to intentionally delete comments if we can help it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests