WIP - spike out a tiny JS parser #10

Open
wants to merge 19 commits into
base: master
from

piki commented Jan 26, 2019

We've wanted a JavaScript parser for the .workflow format for a while. Several entirely sane, don't-repeat-yourself approaches didn't pan out: the Go parser binary is too big (4.5MB+) to run in wasm, the ANTLR-generated JS parser is about 500KB, and gopherjs also generated too big a file.

I figured a language this simple could be parsed by a much smaller JavaScript file. This PR introduces such a file, implementing a hand-rolled recursive-descent parser. At the moment, it parses syntax but does not validate things like key names, value types, the contents of event names and `uses` values, or the dependency graph. It just turns a .workflow file into an AST.

The good news is it's absolutely tiny. The human-readable JS is about 7KB, and minified+gzipped, it's currently 1467 bytes.
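To give a feel for the approach, here is a minimal sketch of a recursive-descent parser for a cut-down version of the format. It is not the code in this PR: the names `tokenize` and `parse`, the AST shape, and the grammar (blocks containing only string and string-array attributes, no heredocs or maps, escapes left unprocessed) are all assumptions made for illustration.

```javascript
// Hypothetical sketch: names, grammar subset, and AST shape are assumptions.
function tokenize(src) {
  const tokens = [];
  // One alternative per token kind; the sticky flag anchors each match at lastIndex.
  const re = /[\t\n\r ]+|#.*|([A-Za-z_]\w*)|"((?:[^"\\]|\\.)*)"|([{}\[\]=,])/y;
  let pos = 0;
  while (pos < src.length) {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (!m) throw new SyntaxError(`illegal character at offset ${pos}`);
    if (m[1] !== undefined) tokens.push({ type: "ident", value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: "string", value: m[2] });
    else if (m[3] !== undefined) tokens.push({ type: m[3] });
    // Whitespace and comments produce no token.
    pos = re.lastIndex;
  }
  return tokens;
}

function parse(src) {
  const tokens = tokenize(src);
  let i = 0;
  const peek = () => tokens[i];

  // Consume the next token if it has the given type, otherwise fail.
  function expect(type) {
    const tok = tokens[i];
    if (!tok || tok.type !== type) {
      throw new SyntaxError(`expected ${type}, got ${tok ? tok.type : "EOF"}`);
    }
    i++;
    return tok;
  }

  // value := string | "[" [ string { "," string } ] "]"
  function parseValue() {
    if (peek() && peek().type === "[") {
      i++;
      const items = [];
      while (peek() && peek().type !== "]") {
        items.push(expect("string").value);
        if (peek() && peek().type === ",") i++;
        else break;
      }
      expect("]");
      return items;
    }
    return expect("string").value;
  }

  // block := ident string "{" { ident "=" value } "}"
  function parseBlock() {
    const kind = expect("ident").value;
    const name = expect("string").value;
    expect("{");
    const attrs = {};
    while (peek() && peek().type !== "}") {
      const key = expect("ident").value;
      expect("=");
      attrs[key] = parseValue();
    }
    expect("}");
    return { kind, name, attrs };
  }

  const blocks = [];
  while (i < tokens.length) blocks.push(parseBlock());
  return blocks;
}

const example = parse('workflow "ci" {\n  on = "push"\n  resolves = ["test"]\n}\n');
console.log(JSON.stringify(example));
```

Each grammar rule maps to one function, which is most of why a parser like this stays small. Like the PR, the sketch checks syntax only and does no validation of key names or values.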

piki added some commits Jan 26, 2019

Use str.substring instead of str.substr
str.substr's second argument is the length of the substring, not the final
index.  `str.substr(..., str.length)` ends up the same, because an over-long
length is clamped to the end of the string rather than raising an
index-out-of-bounds error, but `str.substring(..., str.length)` feels
tidier.
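A quick illustration of the difference, using an arbitrary sample string (the variable names are just for this example):

```javascript
const str = "workflow";

// substr(start, length): 6 is a length, clamped to the end of the string.
const bySubstr = str.substr(4, 6);
// substring(start, end): 6 is an exclusive end index.
const bySubstring = str.substring(4, 6);

// With str.length as the second argument, both return the same tail.
const tailA = str.substr(4, str.length);
const tailB = str.substring(4, str.length);

console.log(bySubstr, bySubstring, tailA === tailB); // flow fl true
```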
Keep lexing after bad escapes
We still terminate lexing after an unterminated heredoc (which by
definition eats to EOF) and after an illegal character (which often is the
beginning of a longer bad sequence).
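The recover-or-stop distinction might look roughly like the sketch below. It is a hypothetical string-only lexer, not the lexer in this PR: `lexStrings`, the accepted escape set, and the error objects are assumptions, and heredocs are left out entirely.

```javascript
// Hypothetical sketch: function name, escape set, and error shape are assumptions.
function lexStrings(src) {
  const tokens = [];
  const errors = [];
  let pos = 0;
  while (pos < src.length) {
    const ch = src[pos];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") {
      pos++;
      continue;
    }
    if (ch !== '"') {
      // Fatal: an illegal character often starts a longer bad sequence.
      errors.push({ pos, message: "illegal character" });
      break;
    }
    let value = "";
    let j = pos + 1;
    while (j < src.length && src[j] !== '"') {
      if (src[j] === "\\") {
        const esc = src[j + 1];
        if (esc === "n") value += "\n";
        else if (esc === '"' || esc === "\\") value += esc;
        else errors.push({ pos: j, message: "bad escape" }); // recoverable: keep going
        j += 2;
      } else {
        value += src[j++];
      }
    }
    if (j >= src.length) {
      // Fatal: an unterminated string has eaten the rest of the input.
      errors.push({ pos, message: "unterminated string" });
      break;
    }
    tokens.push({ type: "string", value });
    pos = j + 1;
  }
  return { tokens, errors };
}
```

With input `"a\qb" "c"`, this records one bad-escape error and still returns both string tokens, so later errors in the same file can be reported in one pass.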

piki referenced this pull request Jan 29, 2019

Merged

Disallow control characters #12

piki added some commits Jan 29, 2019

Disallow some control characters
`\s` allows `[\t\n\v\f\r ]` (plus, in JavaScript, some Unicode space
characters).  Now we allow only the common whitespace characters,
`[\t\n\r ]`.  That means we exclude all control characters except `\n`,
`\r`, and `\t`.
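A small comparison of the two character classes. The regex and variable names here are illustrative, not the ones in the lexer:

```javascript
const anyWhitespace = /^\s+$/;           // JavaScript's \s, for comparison
const commonWhitespace = /^[\t\n\r ]+$/; // the narrower class described above

// Space/tab/CR/LF, vertical tab, form feed, and a no-break space.
const samples = [" \t\r\n", "\v", "\f", "\u00a0"];
const results = samples.map((s) => ({
  broad: anyWhitespace.test(s),
  narrow: commonWhitespace.test(s),
}));

console.log(results);
```

Only the first sample passes the narrow class; `\s` accepts all four.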
Simplify comment regex
The previous regex (roughly `/#.*\n/`) rejected any line like `#\r\n` with
`#` as an illegal character, because the `\r` doesn't match `.`, so the
regex never finds its `\n`.  If we don't look for the `\n`, we get the
right result.
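The behavior is easy to reproduce in isolation. The patterns below are approximations of the lexer's regexes (the commit message itself says "roughly"), and the variable names are made up for the example:

```javascript
const oldComment = /#.*\n/; // roughly the previous pattern
const newComment = /#.*/;   // the same pattern without the trailing \n

const crlfLine = "#\r\n";
// `.` does not match \r, so `.*` stops before it and the required \n never follows.
const oldMatch = oldComment.exec(crlfLine);
// Without the \n requirement, the comment is just "#"; \r\n is left as whitespace.
const newMatch = newComment.exec(crlfLine);

console.log(oldMatch, newMatch && newMatch[0]); // null #
```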

shawnbot commented Feb 7, 2019

Dumb question: Since the workflow format is HCL-compatible, would it be possible to use something that already exists, like tf-hcl (made with nearley)?

piki commented Feb 8, 2019

tf-hcl would be a fine replacement for this branch. Since it's a vanilla HCL parser, we'd need to fork it and make our changes to ignore string interpolations and prohibit control characters.

For comparison's sake, the minified-and-gzipped JS from tf-hcl is around 5KB. Bigger than this parser, but not enough to matter. It's an impressive result for a generated parser.
