Built-in formatting tool #309

sferik · 2014-12-09T11:10:37Z

Following the discussion in #308, I realized it would be nice if Crystal had a built-in formatting command that would apply some basic style to the code (two-space indentation, etc.)

The Go language handles this quite nicely. Quoting from Effective Go:

Formatting issues are the most contentious but the least consequential. People can adapt to different formatting styles but it's better if they don't have to, and less time is devoted to the topic if everyone adheres to the same style. The problem is how to approach this Utopia without a long prescriptive style guide.

With Go we take an unusual approach and let the machine take care of most formatting issues. The gofmt program (also available as go fmt, which operates at the package level rather than source file level) reads a Go program and emits the source in a standard style of indentation and vertical alignment, retaining and if necessary reformatting comments.

The text was updated successfully, but these errors were encountered:

asterite · 2014-12-09T14:03:24Z

I think this would be nice to have. Many people are obsesses with correctly formatted syntax... including me :-P

More cases that come to my mind are:

No spaces between operators: 1+2*3.
Extra newlines between definitions:

class Foo

  def initialize
  end


  def bar
  end

end

I once wrote a D plugin for Eclispe that copied most of JDT's code. I had to adapt the Java formatter to D's language. There I learned how they do it. First they parse the file and get an AST from it. Then they create a lexer for that same file. Then they traverse the AST and lex the source at the same time, while applying formatting rules. That way you can deal with each token while knowing where you are in the AST. Very smart.

We could do the same, but right now the parser will create nodes that don't exactly reflect what was written. For example when you write this:

foo.try &.bar

The parser will actually emit something like this:

foo.try { |_arg0| _arg0.bar }

We would have to define AST nodes for those cases and later unsugar them, like we do with other things (case, &&, ||).

vendethiel · 2014-12-09T14:06:57Z

@asterite Isn't it easier with a CST rather than a AST?

sdogruyol · 2015-09-15T18:30:01Z

@asterite is there any improvement on this 🙏

I really like crystalfmt 👍 to be there 😄

asterite · 2015-09-15T18:37:46Z

No progress yet. A formatter is something that will come after 1.0, unless someone else wants to tackle this. But it's not something trivial to do, you would need to extend the Parser and as you parse the code you'd need to format it.

In Go it's much easier because the language is pretty simple in terms of grammar.

tristil · 2015-09-18T02:09:46Z

One idea would be to rip off a lot of the work that's gone into Rubocop. That could involve converting Rubocop itself and the underlying AST parsing gems (https://github.com/whitequark/parser and https://github.com/yujinakayama/astrolabe) or it could mean building a tool in Crystal that could reuse the cops. Some of the Rubocop cops wouldn't be relevant for Crystal, some would be wrong, but some could be used as is.

ozra · 2015-09-29T10:35:11Z

Why not implement this in the ast.to_s code? If "standard compile" flag - it renders code any way it wants (for macro use etc.). If "stylize", it looks at all necessary rules passed (indent=4, rowspace_after_def=0,rowspace_after_classdef=1, space_around_ops=1, etc. whatever)
Via "location", original code can be looked up for the "keep as is"-cases, without storing "original token" in AST (where there are more ways to the same AST.. [unless, etc.]).

The amount of options will of course grow daily from requests ;-) )

refi64 · 2015-09-29T14:11:59Z

@ozra The problem with that is that to_s will become cluttered quickly. See http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/, an article I found on Reddit by the author of dartfmt, Dart's auto formatter. This is really something large-scale enough to belong outside the compiler.

ozra · 2015-09-29T16:48:59Z

@kirbyfan64 - Yeah I pondered the alternatives, but came to the conclusion that every node's to_s would have an if-statement - one branch being the regular, the other being the "messier" one. So the plain version would still be first in every visitor. I just figured it would keep the stylizer always up to date (not implementing the super-options-renderer part still gives compiler-default formatted and keeps working).
Well, definitely a good point - it would be much clearer keeping it out. I'll take that into consideration for what I'm writing atm.
Thanks for reminding me of the line-breaking issue!!

asterite · 2015-09-30T14:22:25Z

If you want a built-in formatting tool, either open an issue describing exactly what the code formatter will do, or either submit a PR with the solution. Speaking of a hypothetical code formatter without knowing what it will do is useless.

Alternatively, I think a formatter can perfectly be done as 3rd party library.

asterite · 2015-10-03T04:57:13Z

I changed my mind :-)

I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes.

sdogruyol · 2015-10-03T09:33:39Z

Hey @asterite i'm curious why you changed your mind 😄

asterite · 2015-10-03T13:08:16Z

Some reasons:

It's fun and I like challenges 😉
@waj likes it too, and obviously many others too
I did it in the past in the Descent plugin for Eclipse (well, I copied the code from JDT, but I learned a lot doing it), so it should be pretty easy to do
We have a coding style document but applying it manually to existing code can be tedious. It would be nice if a tool could do it for you.
It can be integrated automatically into editors so you don't have to waste time aligning small spaces, adding/removing lines, etc.
Discussions about indentation and spaces will (hopefully) disappear

But note that the formatter I have in mind will mostly affects spaces and newlines: it won't change names or literals, etc. And gofmt works that way too. It would be really bad if the formatter somehow got in your way in a bad way.

The only "worry" I have is for big arrays, like this. But there are some heuristics that could be applied there.

will · 2015-10-12T02:24:09Z

Tried out the formatter as of 757bd1c . Mostly great changes.

Just a few changes jumped out to me that I didn't accept:

It'd be nice to keep the right aligned numbers

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)

The right alignment of the *Decoder.new is maybe too much, but again the right aligned integers would be nice

     # https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h
-    register_decoder    BoolDecoder.new,   16 # bool
-    register_decoder   ByteaDecoder.new,   17 # bytea
-    register_decoder    Int8Decoder.new,   20 # int8 (bigint)
-    register_decoder    Int2Decoder.new,   21 # int2 (smallint)
-    register_decoder     IntDecoder.new,   23 # int4 (integer)
-    register_decoder DefaultDecoder.new,   25 # text
-    register_decoder    JsonDecoder.new,  114 # json
-    register_decoder   JsonbDecoder.new, 3802 # jsonb
-    register_decoder Float32Decoder.new,  700 # float4
-    register_decoder Float64Decoder.new,  701 # float8
-    register_decoder DefaultDecoder.new,  705 # unknown
-    register_decoder    DateDecoder.new, 1082 # date
-    register_decoder    TimeDecoder.new, 1114 # timestamp
-    register_decoder    TimeDecoder.new, 1184 # timestamptz
-    register_decoder    UuidDecoder.new, 2950 # uuid
+    register_decoder BoolDecoder.new, 16     # bool
+    register_decoder ByteaDecoder.new, 17    # bytea
+    register_decoder Int8Decoder.new, 20     # int8 (bigint)
+    register_decoder Int2Decoder.new, 21     # int2 (smallint)
+    register_decoder IntDecoder.new, 23      # int4 (integer)
+    register_decoder DefaultDecoder.new, 25  # text
+    register_decoder JsonDecoder.new, 114    # json
+    register_decoder JsonbDecoder.new, 3802  # jsonb
+    register_decoder Float32Decoder.new, 700 # float4
+    register_decoder Float64Decoder.new, 701 # float8
+    register_decoder DefaultDecoder.new, 705 # unknown
+    register_decoder DateDecoder.new, 1082   # date
+    register_decoder TimeDecoder.new, 1114   # timestamp
+    register_decoder TimeDecoder.new, 1184   # timestamptz
+    register_decoder UuidDecoder.new, 2950   # uuid
   end

My formatting for this is weird, so I don't know if it could make it into a tool

-        (((((((((((((((( 0_u64
-          ) | ptr[0] ) << 8
-          ) | ptr[1] ) << 8
-          ) | ptr[2] ) << 8
-          ) | ptr[3] ) << 8
-          ) | ptr[4] ) << 8
-          ) | ptr[5] ) << 8
-          ) | ptr[6] ) << 8
-          ) | ptr[7] )
+        ((((((((((((((((0_u64) | ptr[0]) << 8) | ptr[1]) << 8) | ptr[2]) << 8) | ptr[3]) << 8) | ptr[4]) << 8) | ptr[5]) << 8) | ptr[6]) << 8) | ptr[7])

asterite · 2015-10-12T02:40:25Z

I don't understand the first suggestion, spaces will be stripped.

The second suggestion seems pretty arbitrary and kind of depends on the semantic you give the lines. The rule would be "align the last argument of a run of calls if they are number literals"? Mmm... I algo see the names are aligned, kind of really hard to guess what you want there.

The last case is a bug, the formatter should preserve newlines. I'll try to fix it.

By the way, the idea of the formatter is that it gives some kind of beauty and consistency to the code, but there will inevitably be cases you won't like. The idea is to stop worrying about these small cases and learn to live with the formatter.

will · 2015-10-12T03:13:31Z

I don't understand the first suggestion, spaces will be stripped.

That's because I left out a line of the diff that showed what was going on. Before the ones place in the digits and commas were aligned with the 10 line at the end

-          bytes[ 0, 4].hexstring(buffer + 0)
-          bytes[ 4, 2].hexstring(buffer + 9)
-          bytes[ 6, 2].hexstring(buffer + 14)
-          bytes[ 8, 2].hexstring(buffer + 19)
+          bytes[0, 4].hexstring(buffer + 0)
+          bytes[4, 2].hexstring(buffer + 9)
+          bytes[6, 2].hexstring(buffer + 14)
+          bytes[8, 2].hexstring(buffer + 19)
           bytes[10, 6].hexstring(buffer + 24)

The idea is to stop worrying about these small cases and learn to live with the formatter.

Yes, once the formatter lands I'll go with whatever it does, so giving early feedback :)

ozra · 2015-10-12T13:21:53Z

These alignments, and especially the register_decoder example, are somewhat akin to the elastic tabstops #1682 (in a formatter, not a realtime elastic way ofc.). Alignments are a real nice feature of formatting, however of course quite a bit more complex to implement than construct-by-construct formatting.

asterite · 2015-10-12T14:44:54Z

@will I "fixed" the formatting of your long parenthesised expression, though now it ends up like this:

((((((((((((((((0_u64
               ) | ptr[0]) << 8
             ) | ptr[1]) << 8
           ) | ptr[2]) << 8
         ) | ptr[3]) << 8
       ) | ptr[4]) << 8
     ) | ptr[5]) << 8
   ) | ptr[6]) << 8
 ) | ptr[7])

The parentheses are aligned. I could try to make it nicer, but maybe this won't happen very often. I also think you might want to use a loop there, LLVM might optimize it away.

The alignment of other things like numbers is hard, it's really hard for the formatter to recognize these patterns and also to decide where to put spaces. In some obvious cases the formatter will align things, like in when and hash literals, but not in every place.

will · 2015-10-12T17:59:12Z

@asterite thanks. regarding optimizations, I did a lot of benchmarking with various approaches, and it seemed that llvm was able to figure out that it was byte swapping in every approach.

asterite · 2015-10-13T21:43:56Z

@will after a big change to the algorithm used by the formatter, your code with many parentheses remains the same as how you originally wrote it :-)

Yeah, LLVM is a beast.

will · 2015-10-13T23:54:59Z

Amazing

waj · 2015-10-15T02:06:08Z

You can try the "formatter" branch of the Sublime plugin to have integration with the formatter 😉 https://github.com/manastech/sublime-crystal/tree/formatter

Because the current release doesn't have the formatter, you need to compile one and specify the path in the settings of the plugin, with something like:

{
  "auto_format": true,
  "crystal_cmd": "/path/to/crystal/bin/crystal"
}

sferik changed the title ~~Built-in formatter tool~~ Built-in formatting tool Dec 9, 2014

asterite added the kind:feature label Dec 9, 2014

asterite mentioned this issue May 3, 2015

Crystal fmt #606

Closed

sdogruyol mentioned this issue Sep 27, 2015

Feature request: Enforce code convention. #1616

Closed

asterite closed this as completed Sep 30, 2015

rmosolgo mentioned this issue Oct 1, 2015

Add simple style guide to docs #1667

Merged

asterite reopened this Oct 3, 2015

technorama mentioned this issue Oct 3, 2015

[RFC] Adopt elastic tabstops in the style guide and code. #1682

Closed

rhysd mentioned this issue Oct 5, 2015

Automatic formatting vim-crystal/vim-crystal#21

Closed

sferik closed this as completed Oct 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Built-in formatting tool #309

Built-in formatting tool #309

sferik commented Dec 9, 2014

asterite commented Dec 9, 2014

vendethiel commented Dec 9, 2014

sdogruyol commented Sep 15, 2015

asterite commented Sep 15, 2015

tristil commented Sep 18, 2015

ozra commented Sep 29, 2015

refi64 commented Sep 29, 2015

ozra commented Sep 29, 2015

asterite commented Sep 30, 2015

asterite commented Oct 3, 2015

sdogruyol commented Oct 3, 2015

asterite commented Oct 3, 2015

will commented Oct 12, 2015

asterite commented Oct 12, 2015

will commented Oct 12, 2015

ozra commented Oct 12, 2015

asterite commented Oct 12, 2015

will commented Oct 12, 2015

asterite commented Oct 13, 2015

will commented Oct 13, 2015

waj commented Oct 15, 2015

Built-in formatting tool #309

Built-in formatting tool #309

Comments

sferik commented Dec 9, 2014

asterite commented Dec 9, 2014

vendethiel commented Dec 9, 2014

sdogruyol commented Sep 15, 2015

asterite commented Sep 15, 2015

tristil commented Sep 18, 2015

ozra commented Sep 29, 2015

refi64 commented Sep 29, 2015

ozra commented Sep 29, 2015

asterite commented Sep 30, 2015

asterite commented Oct 3, 2015

sdogruyol commented Oct 3, 2015

asterite commented Oct 3, 2015

will commented Oct 12, 2015

asterite commented Oct 12, 2015

will commented Oct 12, 2015

ozra commented Oct 12, 2015

asterite commented Oct 12, 2015

will commented Oct 12, 2015

asterite commented Oct 13, 2015

will commented Oct 13, 2015

waj commented Oct 15, 2015