New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Built-in formatting tool #309
Comments
I think this would be nice to have. Many people are obsesses with correctly formatted syntax... including me :-P More cases that come to my mind are:
class Foo
def initialize
end
def bar
end
end I once wrote a D plugin for Eclispe that copied most of JDT's code. I had to adapt the Java formatter to D's language. There I learned how they do it. First they parse the file and get an AST from it. Then they create a lexer for that same file. Then they traverse the AST and lex the source at the same time, while applying formatting rules. That way you can deal with each token while knowing where you are in the AST. Very smart. We could do the same, but right now the parser will create nodes that don't exactly reflect what was written. For example when you write this: foo.try &.bar The parser will actually emit something like this: foo.try { |_arg0| _arg0.bar } We would have to define AST nodes for those cases and later unsugar them, like we do with other things ( |
@asterite Isn't it easier with a CST rather than a AST? |
@asterite is there any improvement on this 🙏 I really like crystalfmt 👍 to be there 😄 |
No progress yet. A formatter is something that will come after 1.0, unless someone else wants to tackle this. But it's not something trivial to do, you would need to extend the Parser and as you parse the code you'd need to format it. In Go it's much easier because the language is pretty simple in terms of grammar. |
One idea would be to rip off a lot of the work that's gone into Rubocop. That could involve converting Rubocop itself and the underlying AST parsing gems (https://github.com/whitequark/parser and https://github.com/yujinakayama/astrolabe) or it could mean building a tool in Crystal that could reuse the cops. Some of the Rubocop cops wouldn't be relevant for Crystal, some would be wrong, but some could be used as is. |
Why not implement this in the ast.to_s code? If "standard compile" flag - it renders code any way it wants (for macro use etc.). If "stylize", it looks at all necessary rules passed (indent=4, rowspace_after_def=0,rowspace_after_classdef=1, space_around_ops=1, etc. whatever) The amount of options will of course grow daily from requests ;-) ) |
@ozra The problem with that is that |
@kirbyfan64 - Yeah I pondered the alternatives, but came to the conclusion that every node's to_s would have an if-statement - one branch being the regular, the other being the "messier" one. So the plain version would still be first in every visitor. I just figured it would keep the stylizer always up to date (not implementing the super-options-renderer part still gives compiler-default formatted and keeps working). |
If you want a built-in formatting tool, either open an issue describing exactly what the code formatter will do, or either submit a PR with the solution. Speaking of a hypothetical code formatter without knowing what it will do is useless. Alternatively, I think a formatter can perfectly be done as 3rd party library. |
I changed my mind :-) I started a formatter branch. Right now it only works with some AST nodes (nil, chars, numbers, if, array literals, begin/end, parenthesis) but I'll eventually add support for all nodes. |
Hey @asterite i'm curious why you changed your mind 😄 |
Some reasons:
But note that the formatter I have in mind will mostly affects spaces and newlines: it won't change names or literals, etc. And gofmt works that way too. It would be really bad if the formatter somehow got in your way in a bad way. The only "worry" I have is for big arrays, like this. But there are some heuristics that could be applied there. |
Tried out the formatter as of 757bd1c . Mostly great changes. Just a few changes jumped out to me that I didn't accept: It'd be nice to keep the right aligned numbers - bytes[ 0, 4].hexstring(buffer + 0)
- bytes[ 4, 2].hexstring(buffer + 9)
- bytes[ 6, 2].hexstring(buffer + 14)
- bytes[ 8, 2].hexstring(buffer + 19)
+ bytes[0, 4].hexstring(buffer + 0)
+ bytes[4, 2].hexstring(buffer + 9)
+ bytes[6, 2].hexstring(buffer + 14)
+ bytes[8, 2].hexstring(buffer + 19) The right alignment of the *Decoder.new is maybe too much, but again the right aligned integers would be nice # https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h
- register_decoder BoolDecoder.new, 16 # bool
- register_decoder ByteaDecoder.new, 17 # bytea
- register_decoder Int8Decoder.new, 20 # int8 (bigint)
- register_decoder Int2Decoder.new, 21 # int2 (smallint)
- register_decoder IntDecoder.new, 23 # int4 (integer)
- register_decoder DefaultDecoder.new, 25 # text
- register_decoder JsonDecoder.new, 114 # json
- register_decoder JsonbDecoder.new, 3802 # jsonb
- register_decoder Float32Decoder.new, 700 # float4
- register_decoder Float64Decoder.new, 701 # float8
- register_decoder DefaultDecoder.new, 705 # unknown
- register_decoder DateDecoder.new, 1082 # date
- register_decoder TimeDecoder.new, 1114 # timestamp
- register_decoder TimeDecoder.new, 1184 # timestamptz
- register_decoder UuidDecoder.new, 2950 # uuid
+ register_decoder BoolDecoder.new, 16 # bool
+ register_decoder ByteaDecoder.new, 17 # bytea
+ register_decoder Int8Decoder.new, 20 # int8 (bigint)
+ register_decoder Int2Decoder.new, 21 # int2 (smallint)
+ register_decoder IntDecoder.new, 23 # int4 (integer)
+ register_decoder DefaultDecoder.new, 25 # text
+ register_decoder JsonDecoder.new, 114 # json
+ register_decoder JsonbDecoder.new, 3802 # jsonb
+ register_decoder Float32Decoder.new, 700 # float4
+ register_decoder Float64Decoder.new, 701 # float8
+ register_decoder DefaultDecoder.new, 705 # unknown
+ register_decoder DateDecoder.new, 1082 # date
+ register_decoder TimeDecoder.new, 1114 # timestamp
+ register_decoder TimeDecoder.new, 1184 # timestamptz
+ register_decoder UuidDecoder.new, 2950 # uuid
end My formatting for this is weird, so I don't know if it could make it into a tool - (((((((((((((((( 0_u64
- ) | ptr[0] ) << 8
- ) | ptr[1] ) << 8
- ) | ptr[2] ) << 8
- ) | ptr[3] ) << 8
- ) | ptr[4] ) << 8
- ) | ptr[5] ) << 8
- ) | ptr[6] ) << 8
- ) | ptr[7] )
+ ((((((((((((((((0_u64) | ptr[0]) << 8) | ptr[1]) << 8) | ptr[2]) << 8) | ptr[3]) << 8) | ptr[4]) << 8) | ptr[5]) << 8) | ptr[6]) << 8) | ptr[7]) |
I don't understand the first suggestion, spaces will be stripped. The second suggestion seems pretty arbitrary and kind of depends on the semantic you give the lines. The rule would be "align the last argument of a run of calls if they are number literals"? Mmm... I algo see the names are aligned, kind of really hard to guess what you want there. The last case is a bug, the formatter should preserve newlines. I'll try to fix it. By the way, the idea of the formatter is that it gives some kind of beauty and consistency to the code, but there will inevitably be cases you won't like. The idea is to stop worrying about these small cases and learn to live with the formatter. |
That's because I left out a line of the diff that showed what was going on. Before the ones place in the digits and commas were aligned with the - bytes[ 0, 4].hexstring(buffer + 0)
- bytes[ 4, 2].hexstring(buffer + 9)
- bytes[ 6, 2].hexstring(buffer + 14)
- bytes[ 8, 2].hexstring(buffer + 19)
+ bytes[0, 4].hexstring(buffer + 0)
+ bytes[4, 2].hexstring(buffer + 9)
+ bytes[6, 2].hexstring(buffer + 14)
+ bytes[8, 2].hexstring(buffer + 19)
bytes[10, 6].hexstring(buffer + 24)
Yes, once the formatter lands I'll go with whatever it does, so giving early feedback :) |
These alignments, and especially the |
@will I "fixed" the formatting of your long parenthesised expression, though now it ends up like this: ((((((((((((((((0_u64
) | ptr[0]) << 8
) | ptr[1]) << 8
) | ptr[2]) << 8
) | ptr[3]) << 8
) | ptr[4]) << 8
) | ptr[5]) << 8
) | ptr[6]) << 8
) | ptr[7]) The parentheses are aligned. I could try to make it nicer, but maybe this won't happen very often. I also think you might want to use a loop there, LLVM might optimize it away. The alignment of other things like numbers is hard, it's really hard for the formatter to recognize these patterns and also to decide where to put spaces. In some obvious cases the formatter will align things, like in |
@asterite thanks. regarding optimizations, I did a lot of benchmarking with various approaches, and it seemed that llvm was able to figure out that it was byte swapping in every approach. |
@will after a big change to the algorithm used by the formatter, your code with many parentheses remains the same as how you originally wrote it :-) Yeah, LLVM is a beast. |
Amazing |
You can try the "formatter" branch of the Sublime plugin to have integration with the formatter 😉 https://github.com/manastech/sublime-crystal/tree/formatter Because the current release doesn't have the formatter, you need to compile one and specify the path in the settings of the plugin, with something like: {
"auto_format": true,
"crystal_cmd": "/path/to/crystal/bin/crystal"
} |
Following the discussion in #308, I realized it would be nice if Crystal had a built-in formatting command that would apply some basic style to the code (two-space indentation, etc.)
The Go language handles this quite nicely. Quoting from Effective Go:
The text was updated successfully, but these errors were encountered: