A tale of two pretty printers and one executable #97
`PrintAsCanonical` will pretty-print an AST using Swift's preferred, canonical regex literal syntax. The parser accepts many things, but Swift can have an opinion on which is the preferred way to spell something. `PrintAsPattern` will print as a result builder DSL. It is parameterized over a maximum number of top-down levels to convert (everything below that will be a canonical regex literal) as well as a minimum-tree-height value (everything below _that_ will also be printed as a canonical regex literal). Also included is an executable to drive these APIs.
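To make the two limits concrete, here is a minimal sketch on a toy AST. Everything in it (`Node`, `renderAsPattern`, the parameter names, and the convention that a leaf has height 1) is a hypothetical stand-in for illustration, not the API in this PR. The DSL spellings are borrowed from the sample output elsewhere in this conversation.

```swift
// Hypothetical toy AST; the real AST in this PR is much richer.
indirect enum Node {
    case literal(String)   // already-canonical regex text, e.g. #"\w"#
    case concat([Node])
    case capture(Node)
    case oneOrMore(Node)   // assumes the child is a single atom or group

    /// Height of the subtree. Assumption: a leaf has height 1.
    var height: Int {
        switch self {
        case .literal:
            return 1
        case .concat(let children):
            return 1 + (children.map(\.height).max() ?? 0)
        case .capture(let child), .oneOrMore(let child):
            return 1 + child.height
        }
    }

    /// Regex-literal spelling of the subtree.
    var canonical: String {
        switch self {
        case .literal(let s): return s
        case .concat(let children): return children.map(\.canonical).joined()
        case .capture(let child): return "(\(child.canonical))"
        case .oneOrMore(let child): return "\(child.canonical)+"
        }
    }
}

/// Converts at most `maxTopDownLevels` levels to DSL blocks, and leaves any
/// subtree of height <= `minTreeHeight` as a regex literal.
func renderAsPattern(
    _ node: Node, maxTopDownLevels: Int, minTreeHeight: Int,
    depth: Int = 0, indent: String = ""
) -> String {
    // Too deep, or too short: fall back to a literal.
    if depth >= maxTopDownLevels || node.height <= minTreeHeight {
        return "\(indent)'/\(node.canonical)/'\n"
    }
    func block(_ name: String, _ children: [Node]) -> String {
        let body = children.map {
            renderAsPattern(
                $0, maxTopDownLevels: maxTopDownLevels,
                minTreeHeight: minTreeHeight,
                depth: depth + 1, indent: indent + "  ")
        }.joined()
        return "\(indent)\(name) {\n\(body)\(indent)}\n"
    }
    switch node {
    case .literal(let s): return "\(indent)'/\(s)/'\n"
    case .concat(let children): return block("Concatenation", children)
    case .capture(let child): return block("Group(.capture)", [child])
    case .oneOrMore(let child): return block("OneOrMore(.eager)", [child])
    }
}

let tree: Node = .concat([
    .capture(.oneOrMore(.literal("[0-9A-F]"))),
    .literal(";"),
    .capture(.oneOrMore(.literal(#"\w"#))),
])
// Two levels become DSL blocks; everything deeper stays a literal.
print(renderAsPattern(tree, maxTopDownLevels: 2, minTreeHeight: 0), terminator: "")
```

Raising `minTreeHeight` to 3 in this sketch keeps the outer `Concatenation` (height 4) but collapses each capture group (height 3) into a single literal.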
@hamishknight, we will want to start deciding what our preferred spellings for things are. E.g. we prefer |
@natecook1000, @rxwei, @kylemacomber, whoever is driving the DSL API at the time, could you take over ownership of the print-as-pattern pretty printer? I just picked some spellings quickly, but ideally we'd use this tool to help work through spellings, especially since it will do the conversion from literal examples for us.
AST's `renderFoo` will take options and output a String. AST value's `_canonicalBase` produces a string of that value in Swift's canonical regex form. PrettyPrinter's `output` will accumulate the given contents without indentation, newlines, and internal state updates. PrettyPrinter's `print` will indent, insert newlines, and update internal state.
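The `output` vs. `print` split can be sketched roughly as below. `MiniPrettyPrinter` and `printBlock` are hypothetical names invented for this illustration; the PR's actual `PrettyPrinter` may track more state and differ in detail.

```swift
// Hypothetical miniature of the `output`/`print` naming split.
struct MiniPrettyPrinter {
    private(set) var result = ""
    private var indentLevel = 0
    let indentWidth = 2

    /// Accumulates `s` verbatim: no indentation, no newline, no state update.
    mutating func output(_ s: String) {
        result += s
    }

    /// Emits `s` as its own line: indents first, then terminates the line.
    mutating func print(_ s: String) {
        output(String(repeating: " ", count: indentLevel * indentWidth))
        output(s)
        output("\n")
    }

    /// Prints `header {`, runs `body` one indentation level deeper,
    /// then prints the closing brace at the original level.
    mutating func printBlock(
        _ header: String, _ body: (inout MiniPrettyPrinter) -> Void
    ) {
        print("\(header) {")
        indentLevel += 1
        body(&self)
        indentLevel -= 1
        print("}")
    }
}

var printer = MiniPrettyPrinter()
printer.printBlock("Concatenation") { p in
    p.printBlock("Group(.capture)") { q in
        q.print("'/[0-9 A-F]+/'")
    }
    p.print(".whitespace")
}
print(printer.result, terminator: "")
```

Keeping `output` free of side effects means callers can build a line piecewise, while `print` remains the only place where layout decisions are made.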
Also, `--experimental-syntax` works, such that:
% swift run PatternConverter --top-down-conversion-limit=2 --experimental-syntax --show-canonical '([0-9 A-F]+) (_: ".." ([0-9 A-F]+))? \s* ";" \s (\w+)'
[0/0] Build complete!
NOTE: This tool is experimental and its output is not
necessarily compilable.
Converting '|([0-9 A-F]+) (_: ".." ([0-9 A-F]+))? \s* ";" \s (\w+)|'
Canonical:
([0-9 A-F]+)(?:\Q..\E([0-9 A-F]+))?\s*\Q;\E\s(\w+)
Concatenation {
Group(.capture) {
'/[0-9 A-F]+/'
}
ZeroOrOne(.eager) {
'/(?:\Q..\E([0-9 A-F]+))/'
}
ZeroOrMore(.eager) {
'/\s/'
}
#";"#
.whitespace
Group(.capture) {
'/\w+/'
}
}
Where "canonical" will use traditional-style quotes like `\Q..\E`.
col.formIndex(after: &idx)
}
return result.isEmpty ? nil : result._quoted
}
@timvermeulen is this a standard algorithm?
// parseTest(#"[\0]"#, charClass("\u{0}"))
// parseTest(#"[\01]"#, charClass("\u{1}"))
// parseTest(#"[\070]"#, charClass("\u{38}"))
@hamishknight I changed the AST dump functions to print this out as `\u{0}` instead of inserting a literal null byte, so some tests fail. AFAICT, this is just a testing infra concern and not an observable behavioral change.
Ah, I think these will need to be e.g. `charClass(scalar_m("\u{0}"))` instead of `charClass("\u{0}")`.
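For readers following the dump change discussed in this thread, here is a rough sketch of printing control scalars in escaped form rather than raw. `dumpEscaped` is a hypothetical helper written for illustration, not the PR's dump function, and its choice of which scalars to escape (C0 controls and DEL) is an assumption.

```swift
/// Hypothetical helper: spells control scalars as `\u{...}` so that a dump
/// never contains a raw null byte or other invisible control character.
func dumpEscaped(_ s: String) -> String {
    var out = ""
    for scalar in s.unicodeScalars {
        if scalar.value < 0x20 || scalar.value == 0x7F {
            // Assumption: escape only C0 controls and DEL.
            let hex = String(scalar.value, radix: 16, uppercase: true)
            out += "\\u{\(hex)}"
        } else {
            out.unicodeScalars.append(scalar)
        }
    }
    return out
}

print(dumpEscaped("\u{0}"))    // the text \u{0}, not a null byte
print(dumpEscaped("a\u{1}b"))
```

A test expectation then has to compare against the escaped text, which is why expectations written against the raw scalar start failing after such a change.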
@swift-ci please test linux platform
Merging to unblock or tidy up some potential conflict places
Canonical literal and pattern DSL pretty-printers

`PrintAsCanonical` will pretty-print an AST using Swift's preferred, canonical regex literal syntax. The parser accepts many things, but Swift can have an opinion on which is the preferred way to spell something.

`PrintAsPattern` will print as a result builder DSL. It is parameterized over a maximum number of top-down levels to convert (everything below that will be a canonical regex literal) as well as a minimum-tree-height value (everything below _that_ will also be printed as a canonical regex literal).

Also included is an executable to drive these APIs.

Naming scheme: AST's `renderFoo` will take options and output a String. AST value's `_canonicalBase` produces a string of that value in Swift's canonical regex form. PrettyPrinter's `output` will accumulate the given contents without indentation, newlines, and internal state updates. PrettyPrinter's `print` will indent, insert newlines, and update internal state.
This is better described as "inverted top down depth filter", not a "bottom up depth filter". It's unclear how bottom-up would get printed at all.
This includes two inter-connected pretty-printers and a command-line executable.

`PrintAsCanonical` will pretty-print an AST using Swift's preferred, canonical regex literal syntax. The parser accepts many things, but Swift can have an opinion on which is the preferred way to spell something. What precisely is a compiler fix-it vs. warning vs. error is TBD. I have enough hooked up to pretty-print many of our examples.

`PrintAsPattern` will print as a result builder DSL (I made up some syntax for this). It is parameterized over a maximum number of top-down levels to convert (everything below that will be a canonical regex literal) as well as a minimum-tree-height value (everything below that will also be printed as a canonical regex literal). I have enough hooked up to convert many of our examples.

If you run `swift run PatternConverter '([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+)'`, you will see:

If you pass `--render-source-ranges`, you'll also see:

If you pass `--show-canonical`, you'll also see:

If you pass `--top-down-conversion-limit=3` (better names welcome), you'll see that at most 3 levels of AST are converted, the rest is a canonical literal:

If you pass `--bottom-up-conversion-limit=3` (better names welcome), you'll see that anything of height <= 3 is put into a canonical literal: