A tale of two pretty printers and one executable #97

Merged: 6 commits, Jan 4, 2022

milseman

This includes two inter-connected pretty-printers and a command-line executable.

PrintAsCanonical will pretty-print an AST using Swift's preferred, canonical regex literal syntax. The parser accepts many things, but Swift can have an opinion on which is the preferred way to spell something. Precisely what becomes a compiler fix-it versus a warning versus an error is TBD. I have enough hooked up to pretty-print many of our examples.

PrintAsPattern will print as a result builder DSL (I made up some syntax for this). It is parameterized over a maximum number of top-down levels to convert (everything below that will be a canonical regex literal) as well as a minimum tree-height value (everything below that will also be printed as a canonical regex literal). I have enough hooked up to convert many of our examples.

If you run `swift run PatternConverter '([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+)'`, you will see:

Concatenation {
  Group(.capture) {
    OneOrMore(.eager) {
      CharacterClass() {
        "0"..."9"
        "A"..."F"
      }
    }
  }
  ZeroOrOne(.eager) {
    Group() {
      Concatenation {
        ".."
        Group(.capture) {
          OneOrMore(.eager) {
            CharacterClass() {
              "0"..."9"
              "A"..."F"
            }
          }
        }
      }
    }
  }
  ZeroOrMore(.eager) {
    .whitespace
  }
  ";"
  .whitespace
  Group(.capture) {
    OneOrMore(.eager) {
      .wordCharacter
    }
  }
}

If you pass --render-source-ranges, you'll also see:


([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+)
 -------^     -^-^ -------^    -^ ^-^ -^  
 --------^         --------^   --^    --^ 
----------^       ----------^        ----^
              --------------^             
           ------------------^            
           -------------------^           
-----------------------------------------^
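The rendering above appears to mark each AST node's source range with dashes and a caret under the node's last character. A minimal sketch of that convention, one range per line, might look like the following. The `underline` function and the integer column ranges are assumptions made for illustration; the tool itself packs several ranges onto each line, and its implementation is not shown here.

```swift
/// Hypothetical helper: underline a half-open column range with dashes,
/// placing a caret under the last covered character.
func underline(_ range: Range<Int>) -> String {
  precondition(!range.isEmpty, "a node must cover at least one character")
  let lead = String(repeating: " ", count: range.lowerBound)
  let dashes = String(repeating: "-", count: range.count - 1)
  return lead + dashes + "^"
}

let input = "([0-9A-F]+)"
print(input)
// Ranges for `[0-9A-F]`, `[0-9A-F]+`, and the whole capture group.
for r in [1..<9, 1..<10, 0..<11] {
  print(underline(r))
}
```

Applied to the first capture group, these three ranges reproduce the first three marks on the left of the rendering above.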

If you pass --show-canonical, you'll also see:

Canonical:

([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+)


If you pass --top-down-conversion-limit=3 (better names welcome), you'll see that at most 3 levels of the AST are converted; everything below that is printed as a canonical literal:

Concatenation {
  Group(.capture) {
    OneOrMore(.eager) {
      '/[0-9A-F]/'
    }
  }
  ZeroOrOne(.eager) {
    Group() {
      '/\.\.([0-9A-F]+)/'
    }
  }
  ZeroOrMore(.eager) {
    .whitespace
  }
  ";"
  .whitespace
  Group(.capture) {
    OneOrMore(.eager) {
      '/\w/'
    }
  }
}

If you pass --bottom-up-conversion-limit=3 (better names welcome), you'll see that anything of height <= 3 is put into a canonical literal:

Concatenation {
  '/([0-9A-F]+)/'
  ZeroOrOne(.eager) {
    Group() {
      Concatenation {
        ".."
        '/([0-9A-F]+)/'
      }
    }
  }
  '/\s*/'
  ";"
  '/\s/'
  '/(\w+)/'
}
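The two limits can be sketched on a toy AST. This is only an illustration under stated assumptions: `Node`, `literal`, `height`, `render`, `maxDepth`, and `minHeight` are hypothetical names, leaves are assumed to have height 1, and the real printer handles many more node kinds (and prints some atoms, such as `";"`, differently).

```swift
// Toy AST for illustration; not the AST used by the PR.
indirect enum Node {
  case concat([Node])
  case capture(Node)
  case oneOrMore(Node)
  case atom(String)  // an already-canonical leaf, e.g. "[0-9A-F]" or "\\w"

  /// Regex-literal spelling of this subtree.
  var literal: String {
    switch self {
    case .concat(let children): return children.map(\.literal).joined()
    case .capture(let child): return "(\(child.literal))"
    case .oneOrMore(let child): return "\(child.literal)+"
    case .atom(let s): return s
    }
  }

  /// Tree height, with leaves counted as height 1 (an assumption).
  var height: Int {
    switch self {
    case .concat(let children): return 1 + (children.map(\.height).max() ?? 0)
    case .capture(let child), .oneOrMore(let child): return 1 + child.height
    case .atom: return 1
    }
  }
}

/// Render as DSL-like lines. Nodes deeper than `maxDepth`, or of height
/// at most `minHeight`, are emitted as a quoted regex literal instead.
func render(_ node: Node, depth: Int = 1,
            maxDepth: Int = .max, minHeight: Int = 0) -> [String] {
  if depth > maxDepth || node.height <= minHeight {
    return ["'/\(node.literal)/'"]
  }
  func nested(_ name: String, _ children: [Node]) -> [String] {
    let body = children.flatMap {
      render($0, depth: depth + 1, maxDepth: maxDepth, minHeight: minHeight)
    }
    return ["\(name) {"] + body.map { "  " + $0 } + ["}"]
  }
  switch node {
  case .concat(let children): return nested("Concatenation", children)
  case .capture(let child): return nested("Group(.capture)", [child])
  case .oneOrMore(let child): return nested("OneOrMore(.eager)", [child])
  case .atom(let s): return ["'/\(s)/'"]
  }
}

let tree: Node = .concat([
  .capture(.oneOrMore(.atom("[0-9A-F]"))),
  .atom(";"),
  .capture(.oneOrMore(.atom("\\w"))),
])
print(render(tree, maxDepth: 3).joined(separator: "\n"))
print(render(tree, minHeight: 3).joined(separator: "\n"))
```

In this sketch the top-down limit counts levels from the root, while the height limit is measured from the leaves, which is why the whole `([0-9A-F]+)` group collapses into one literal in the second call.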

`PrintAsCanonical` will pretty-print an AST using Swift's preferred, canonical regex literal syntax. The parser accepts many things, but Swift can have an opinion on which is the preferred way to spell something.

`PrintAsPattern` will print as a result builder DSL. It is parameterized over a maximum number of top-down levels to convert (everything below that will be a canonical regex literal) as well as a minimum tree-height value (everything below _that_ will also be printed as a canonical regex literal).

Also included is an executable to drive these APIs.
@milseman milseman requested review from rxwei and hamishknight and removed request for rxwei January 3, 2022 13:48

milseman commented Jan 3, 2022

@hamishknight, we will want to start deciding what our preferred spellings for things are. E.g. we prefer \u{12} over \x12, we'll want to pick some named capture syntax, etc. The best kind of specification is executable, and this is hooked up to the included executable under the --show-canonical flag. Could you take over ownership of this print-as-canonical pretty-printer and use it to help drive future syntax discussions and decisions?

milseman commented Jan 3, 2022

@natecook1000, @rxwei, @kylemacomber, or whoever is driving the DSL API at the time: could you take over ownership of the print-as-pattern pretty-printer? I just picked some spellings quickly, but ideally we'd use this tool to help work through spellings, especially since it will do the conversion from literal examples for us.

Naming scheme:

AST's `renderFoo` will take options and output a String.

AST value's `_canonicalBase` produces a string of that value in Swift's canonical regex form.

PrettyPrinter's `output` will accumulate the given contents without indentation, newlines, and internal state updates.

PrettyPrinter's `print` will indent, insert newlines, and update internal state.
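The split between `output` and `print` can be sketched as below. Only the two method names come from the naming scheme above; the `indentLevel` field, the two-space indent, and the `printBlock` helper are assumptions added for illustration and are not taken from the PR's implementation.

```swift
// Hypothetical minimal printer illustrating the output/print split.
struct PrettyPrinter {
  private(set) var result = ""
  var indentLevel = 0  // internal state that `print` and `printBlock` rely on

  /// Accumulate text verbatim: no indentation, no newline.
  mutating func output(_ s: String) {
    result += s
  }

  /// Indent to the current level, append the text, and end the line.
  mutating func print(_ s: String) {
    output(String(repeating: "  ", count: indentLevel))
    output(s)
    output("\n")
  }

  /// Assumed helper: print `header {`, run `body` one level deeper, print `}`.
  mutating func printBlock(_ header: String,
                           _ body: (inout PrettyPrinter) -> Void) {
    print("\(header) {")
    indentLevel += 1
    body(&self)
    indentLevel -= 1
    print("}")
  }
}

var pp = PrettyPrinter()
pp.printBlock("OneOrMore(.eager)") { p in
  p.print(".wordCharacter")
}
pp.output("// trailing text, appended as-is")
print(pp.result)
```

Keeping `output` free of layout decisions lets callers splice pre-rendered text, such as a canonical literal, into the result without disturbing the indentation bookkeeping.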

@milseman milseman left a comment

Also, --experimental-syntax works, such that:

% swift run PatternConverter --top-down-conversion-limit=2 --experimental-syntax --show-canonical   '([0-9 A-F]+) (_: ".." ([0-9 A-F]+))? \s* ";" \s (\w+)'
[0/0] Build complete!

NOTE: This tool is experimental and its output is not
      necessarily compilable.

Converting '|([0-9 A-F]+) (_: ".." ([0-9 A-F]+))? \s* ";" \s (\w+)|'
Canonical:

([0-9 A-F]+)(?:\Q..\E([0-9 A-F]+))?\s*\Q;\E\s(\w+)



Concatenation {
  Group(.capture) {
    '/[0-9 A-F]+/'
  }
  ZeroOrOne(.eager) {
    '/(?:\Q..\E([0-9 A-F]+))/'
  }
  ZeroOrMore(.eager) {
    '/\s/'
  }
  #";"#
  .whitespace
  Group(.capture) {
    '/\w+/'
  }
}

Here, the "canonical" form uses traditional-style quoting such as \Q..\E.

col.formIndex(after: &idx)
}
return result.isEmpty ? nil : result._quoted
}
Member Author

@timvermeulen is this a standard algorithm?

// parseTest(#"[\0]"#, charClass("\u{0}"))
// parseTest(#"[\01]"#, charClass("\u{1}"))
// parseTest(#"[\070]"#, charClass("\u{38}"))

Member Author

@hamishknight I changed the AST dump functions to print this out as \u{0} instead of inserting a literal null byte, so some tests fail. AFAICT, this is just a testing infra concern and not an observable behavioral change.

Contributor

Ah, I think these will need to be, e.g., charClass(scalar_m("\u{0}")) instead of charClass("\u{0}")
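For reference, the scalar values expected by the commented-out tests in this thread can be checked with a small sketch: the octal digits are parsed into a scalar, and the scalar is then spelled as `\u{...}` rather than embedded as a raw byte. Both function names are hypothetical, and the uppercase, unpadded hex format is an assumption; the PR's dump functions may differ.

```swift
/// Hypothetical: parse the digits of an octal escape such as `\070`
/// (digits only, without the backslash) into a Unicode scalar.
func scalar(fromOctalDigits digits: String) -> Unicode.Scalar? {
  guard let value = UInt32(digits, radix: 8) else { return nil }
  return Unicode.Scalar(value)
}

/// Hypothetical: spell a scalar as `\u{...}` so that a dump never
/// contains the raw character (for example, a literal null byte).
func dumpSpelling(_ scalar: Unicode.Scalar) -> String {
  "\\u{" + String(scalar.value, radix: 16, uppercase: true) + "}"
}

for digits in ["0", "01", "070"] {
  if let s = scalar(fromOctalDigits: digits) {
    print("\\\(digits) dumps as \(dumpSpelling(s))")
  }
}
```

Octal 070 is decimal 56, which is hexadecimal 38, matching the `\u{38}` in the third test.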

milseman commented Jan 4, 2022

@swift-ci please test linux platform

milseman commented Jan 4, 2022

Merging to unblock or tidy up some potential conflict places

@milseman milseman merged commit e9e8e08 into swiftlang:main Jan 4, 2022
@milseman milseman deleted the daguerreotype branch January 4, 2022 14:52
milseman added a commit to milseman/swift-experimental-string-processing that referenced this pull request Jan 11, 2022
Canonical literal and pattern DSL pretty-printers


riking commented Jan 13, 2022

If you pass --bottom-up-conversion-limit=3 (better names welcome),

This is better described as "inverted top down depth filter", not a "bottom up depth filter". It's unclear how bottom-up would get printed at all.
