Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in formatting tool #309

Closed
sferik opened this issue Dec 9, 2014 · 21 comments
Closed

Built-in formatting tool #309

sferik opened this issue Dec 9, 2014 · 21 comments

Comments

@sferik
Copy link
Contributor

sferik commented Dec 9, 2014

Following the discussion in #308, I realized it would be nice if Crystal had a built-in formatting command that would apply some basic style to the code (two-space indentation, etc.)

The Go language handles this quite nicely. Quoting from Effective Go:

Formatting issues are the most contentious but the least consequential. People can adapt to different formatting styles but it's better if they don't have to, and less time is devoted to the topic if everyone adheres to the same style. The problem is how to approach this Utopia without a long prescriptive style guide.

With Go we take an unusual approach and let the machine take care of most formatting issues. The gofmt program (also available as go fmt, which operates at the package level rather than source file level) reads a Go program and emits the source in a standard style of indentation and vertical alignment, retaining and if necessary reformatting comments.

@sferik sferik changed the title Built-in formatter tool Built-in formatting tool Dec 9, 2014
@asterite
Copy link
Member

asterite commented Dec 9, 2014

I think this would be nice to have. Many people are obsesses with correctly formatted syntax... including me :-P

More cases that come to my mind are:

  • No spaces between operators: 1+2*3.
  • Extra newlines between definitions:
class Foo

  def initialize
  end


  def bar
  end

end

I once wrote a D plugin for Eclispe that copied most of JDT's code. I had to adapt the Java formatter to D's language. There I learned how they do it. First they parse the file and get an AST from it. Then they create a lexer for that same file. Then they traverse the AST and lex the source at the same time, while applying formatting rules. That way you can deal with each token while knowing where you are in the AST. Very smart.

We could do the same, but right now the parser will create nodes that don't exactly reflect what was written. For example when you write this:

foo.try &.bar

The parser will actually emit something like this:

foo.try { |_arg0| _arg0.bar }

We would have to define AST nodes for those cases and later unsugar them, like we do with other things (case, &&, ||).

@vendethiel
Copy link
Contributor

@asterite Isn't it easier with a CST rather than a AST?

@asterite asterite mentioned this issue May 3, 2015
@sdogruyol
Copy link
Member

@asterite is there any improvement on this 🙏

I really like crystalfmt 👍 to be there 😄

@asterite
Copy link
Member

No progress yet. A formatter is something that will come after 1.0, unless someone else wants to tackle this. But it's not something trivial to do, you would need to extend the Parser and as you parse the code you'd need to format it.

In Go it's much easier because the language is pretty simple in terms of grammar.

@tristil
Copy link
Contributor

tristil commented Sep 18, 2015

One idea would be to rip off a lot of the work that's gone into Rubocop. That could involve converting Rubocop itself and the underlying AST parsing gems (https://github.com/whitequark/parser and https://github.com/yujinakayama/astrolabe) or it could mean building a tool in Crystal that could reuse the cops. Some of the Rubocop cops wouldn't be relevant for Crystal, some would be wrong, but some could be used as is.

@ozra
Copy link
Contributor

ozra commented Sep 29, 2015

Why not implement this in the ast.to_s code? If "standard compile" flag - it renders code any way it wants (for macro use etc.). If "stylize", it looks at all necessary rules passed (indent=4, rowspace_after_def=0,rowspace_after_classdef=1, space_around_ops=1, etc. whatever)
Via "location", original code can be looked up for the "keep as is"-cases, without storing "original token" in AST (where there are more ways to the same AST.. [unless, etc.]).

The amount of options will of course grow daily from requests ;-) )

@refi64
Copy link
Contributor

refi64 commented Sep 29, 2015

@ozra The problem with that is that to_s will become cluttered quickly. See http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/, an article I found on Reddit by the author of dartfmt, Dart's auto formatter. This is really something large-scale enough to belong outside the compiler.

@ozra
Copy link
Contributor

ozra commented Sep 29, 2015

@kirbyfan64 - Yeah I pondered the alternatives, but came to the conclusion that every node's to_s would have an if-statement - one branch being the regular, the other being the "messier" one. So the plain version would still be first in every visitor. I just figured it would keep the stylizer always up to date (not implementing the super-options-renderer part still gives compiler-default formatted and keeps working).
Well, definitely a good point - it would be much clearer keeping it out. I'll take that into consideration for what I'm writing atm.
Thanks for reminding me of the line-breaking issue!!

@asterite
Copy link
Member

If you want a built-in formatting tool, either open an issue describing exactly what the code formatter will do, or either submit a PR with the solution. Speaking of a hypothetical code formatter without knowing what it will do is useless.

Alternatively, I think a formatter can perfectly be done as 3rd party library.

@asterite
Copy link
Member

asterite commented Oct 3, 2015

I changed my mind :-)

I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.

@sdogruyol
Copy link
Member

Hey @asterite i'm curious why you changed your mind 😄

@asterite
Copy link
Member

asterite commented Oct 3, 2015

Some reasons:

  1. It's fun and I like challenges 😉
  2. @waj likes it too, and obviously many others too
  3. I did it in the past in the Descent plugin for Eclipse (well, I copied the code from JDT, but I learned a lot doing it), so it should be pretty easy to do
  4. We have a coding style document but applying it manually to existing code can be tedious. It would be nice if a tool could do it for you.
  5. It can be integrated automatically into editors so you don't have to waste time aligning small spaces, adding/removing lines, etc.
  6. Discussions about indentation and spaces will (hopefully) disappear

But note that the formatter I have in mind will mostly affects spaces and newlines: it won't change names or literals, etc. And gofmt works that way too. It would be really bad if the formatter somehow got in your way in a bad way.

The only "worry" I have is for big arrays, like this. But there are some heuristics that could be applied there.

@will
Copy link
Contributor

will commented Oct 12, 2015

Tried out the formatter as of 757bd1c . Mostly great changes.

Just a few changes jumped out to me that I didn't accept:

It'd be nice to keep the right aligned numbers

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)

The right alignment of the *Decoder.new is maybe too much, but again the right aligned integers would be nice

     # https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h
-    register_decoder    BoolDecoder.new,   16 # bool
-    register_decoder   ByteaDecoder.new,   17 # bytea
-    register_decoder    Int8Decoder.new,   20 # int8 (bigint)
-    register_decoder    Int2Decoder.new,   21 # int2 (smallint)
-    register_decoder     IntDecoder.new,   23 # int4 (integer)
-    register_decoder DefaultDecoder.new,   25 # text
-    register_decoder    JsonDecoder.new,  114 # json
-    register_decoder   JsonbDecoder.new, 3802 # jsonb
-    register_decoder Float32Decoder.new,  700 # float4
-    register_decoder Float64Decoder.new,  701 # float8
-    register_decoder DefaultDecoder.new,  705 # unknown
-    register_decoder    DateDecoder.new, 1082 # date
-    register_decoder    TimeDecoder.new, 1114 # timestamp
-    register_decoder    TimeDecoder.new, 1184 # timestamptz
-    register_decoder    UuidDecoder.new, 2950 # uuid
+    register_decoder BoolDecoder.new, 16     # bool
+    register_decoder ByteaDecoder.new, 17    # bytea
+    register_decoder Int8Decoder.new, 20     # int8 (bigint)
+    register_decoder Int2Decoder.new, 21     # int2 (smallint)
+    register_decoder IntDecoder.new, 23      # int4 (integer)
+    register_decoder DefaultDecoder.new, 25  # text
+    register_decoder JsonDecoder.new, 114    # json
+    register_decoder JsonbDecoder.new, 3802  # jsonb
+    register_decoder Float32Decoder.new, 700 # float4
+    register_decoder Float64Decoder.new, 701 # float8
+    register_decoder DefaultDecoder.new, 705 # unknown
+    register_decoder DateDecoder.new, 1082   # date
+    register_decoder TimeDecoder.new, 1114   # timestamp
+    register_decoder TimeDecoder.new, 1184   # timestamptz
+    register_decoder UuidDecoder.new, 2950   # uuid
   end

My formatting for this is weird, so I don't know if it could make it into a tool

-        (((((((((((((((( 0_u64
-          ) | ptr[0] ) << 8
-          ) | ptr[1] ) << 8
-          ) | ptr[2] ) << 8
-          ) | ptr[3] ) << 8
-          ) | ptr[4] ) << 8
-          ) | ptr[5] ) << 8
-          ) | ptr[6] ) << 8
-          ) | ptr[7] )
+        ((((((((((((((((0_u64) | ptr[0]) << 8) | ptr[1]) << 8) | ptr[2]) << 8) | ptr[3]) << 8) | ptr[4]) << 8) | ptr[5]) << 8) | ptr[6]) << 8) | ptr[7])

@asterite
Copy link
Member

I don't understand the first suggestion, spaces will be stripped.

The second suggestion seems pretty arbitrary and kind of depends on the semantic you give the lines. The rule would be "align the last argument of a run of calls if they are number literals"? Mmm... I algo see the names are aligned, kind of really hard to guess what you want there.

The last case is a bug, the formatter should preserve newlines. I'll try to fix it.

By the way, the idea of the formatter is that it gives some kind of beauty and consistency to the code, but there will inevitably be cases you won't like. The idea is to stop worrying about these small cases and learn to live with the formatter.

@will
Copy link
Contributor

will commented Oct 12, 2015

I don't understand the first suggestion, spaces will be stripped.

That's because I left out a line of the diff that showed what was going on. Before the ones place in the digits and commas were aligned with the 10 line at the end

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)
           bytes[10, 6].hexstring(buffer + 24)

The idea is to stop worrying about these small cases and learn to live with the formatter.

Yes, once the formatter lands I'll go with whatever it does, so giving early feedback :)

@ozra
Copy link
Contributor

ozra commented Oct 12, 2015

These alignments, and especially the register_decoder example, are somewhat akin to the elastic tabstops #1682 (in a formatter, not a realtime elastic way ofc.). Alignments are a real nice feature of formatting, however of course quite a bit more complex to implement than construct-by-construct formatting.

@asterite
Copy link
Member

@will I "fixed" the formatting of your long parenthesised expression, though now it ends up like this:

((((((((((((((((0_u64
               ) | ptr[0]) << 8
             ) | ptr[1]) << 8
           ) | ptr[2]) << 8
         ) | ptr[3]) << 8
       ) | ptr[4]) << 8
     ) | ptr[5]) << 8
   ) | ptr[6]) << 8
 ) | ptr[7])

The parentheses are aligned. I could try to make it nicer, but maybe this won't happen very often. I also think you might want to use a loop there, LLVM might optimize it away.

The alignment of other things like numbers is hard, it's really hard for the formatter to recognize these patterns and also to decide where to put spaces. In some obvious cases the formatter will align things, like in when and hash literals, but not in every place.

@will
Copy link
Contributor

will commented Oct 12, 2015

@asterite thanks. regarding optimizations, I did a lot of benchmarking with various approaches, and it seemed that llvm was able to figure out that it was byte swapping in every approach.

@asterite
Copy link
Member

@will after a big change to the algorithm used by the formatter, your code with many parentheses remains the same as how you originally wrote it :-)

Yeah, LLVM is a beast.

@will
Copy link
Contributor

will commented Oct 13, 2015

Amazing

@waj
Copy link
Member

waj commented Oct 15, 2015

You can try the "formatter" branch of the Sublime plugin to have integration with the formatter 😉 https://github.com/manastech/sublime-crystal/tree/formatter

Because the current release doesn't have the formatter, you need to compile one and specify the path in the settings of the plugin, with something like:

{
  "auto_format": true,
  "crystal_cmd": "/path/to/crystal/bin/crystal"
}

@sferik sferik closed this as completed Oct 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants