Component: Parsing, Pretty-Printing #5259

StachuDotNet · 2024-01-14T20:02:52Z

This Issue exists to collect many items that relate to Dark's parser(s), pretty-printer(s), name resolution, etc.

Here's our current state:

In dark-classic, we didn't have a parser for user code.
That said, we did have a hacky parser used internally to run the many tests stored in .dark test files.
That parser was a simple wrapper around F#'s parser, so our syntax was somewhat limited by what the 'upper' parser could handle.

These are tasks currently available to be worked on:

Once the tree-sitter grammar and parser have 'caught up' with our full language:

throw away the F#-wrapper parser entirely

Once that is done, we can tackle the fun stuff:

add ! ? to language, to assist with ergonomic error-handling
refer to package items with a @paul.module1.module2-like syntax, rather than PACKAGE.Paul.Module1.Module2
prevent conflicts of type names
- e.g. users shouldn't be allowed to define a List type
- in addition to preventing conflicts with existing types, prevent conflicts with keywords and other reserved words as well (e.g. Set)
- potentially something in the name resolver
- or maybe we allow users to use whatever type names they want, and deal with things closer to how Unison does
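As a starting point for that discussion, the "prevent conflicts" option could look roughly like the check below. It is only a sketch in Python, not Dark code: the function name `check_type_name` and both name sets are hypothetical, and the real lists of built-in types and reserved words would come from the language itself.

```python
from typing import Optional, Set

# Illustrative assumptions only: these are NOT Dark's actual lists.
BUILTIN_TYPE_NAMES = frozenset({"List", "Option", "Result", "String", "Int"})
RESERVED_WORDS = frozenset({"Set", "let", "type", "match", "if"})


def check_type_name(name: str, existing_user_types: Set[str]) -> Optional[str]:
    """Return an error message if `name` may not be defined, else None."""
    if name in BUILTIN_TYPE_NAMES:
        return f"'{name}' conflicts with a built-in type"
    if name in RESERVED_WORDS:
        return f"'{name}' is a reserved word"
    if name in existing_user_types:
        return f"'{name}' is already defined"
    return None
```

A name resolver could run a check like this when a type definition is first seen, e.g. `check_type_name("List", set())` returns an error message while `check_type_name("Invoice", set())` returns `None`. The Unison-like alternative would skip this check entirely and disambiguate at reference time instead.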

All of these tasks are worth some discussion, either here or in Discord, before starting.


StachuDotNet · 2024-04-03T20:17:20Z

Copying this from some thoughts I posted on Discord recently:

tl;dr: is tree-sitter really the best tool for our parser, or should we reconsider writing a parser combinator thing in Darklang?

The way we're currently set up for the new/tree-sitter parser is:
A. write Darklang source code
B. use tree-sitter and tree-sitter-darklang to parse to tree-sitter's internal representation of the syntax tree
C. map that to a Dark type "ParsedNode" via a built-in function.

The type (dark/packages/darklang/languageTools/parser.dark, lines 11 to 27 in a68b808):

    type ParsedNode =
      {
        // e.g., a node of `typ` `let_expression` has a child node with a `body` field name
        fieldName: Stdlib.Option.Option<String>

        /// e.g. `source_file`, `fn_decl`, `expression`, `let_expression`
        typ: String

        /// The text of this node as it was in the unparsed source code
        text: String

        /// Where in the source code is this node written/contained
        /// i.e. Line 1, Column 2 to Line 1, Column 5
        sourceRange: Range

        children: List<ParsedNode>
      }

The builtin fn (dark/backend/src/BuiltinExecution/Libs/Parser.fs, line 37 in a68b808):

    [ { name = fn "parserParseToSimplifiedTree" 0

D. map ParsedNode to WrittenTypes

those WrittenTypes are used:

to map to ProgramTypes, where relevant

to map to semantic tokens, for VS Code syntax highlighting
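To make step D concrete, here is a rough sketch of what mapping a ParsedNode to a WrittenTypes-like value involves. It is written in Python as a stand-in and is not the actual parser.dark code. The `ParsedNode` class mirrors the fields quoted above (with `sourceRange` left out for brevity); the `ELet`, `EInt`, and `EVariable` classes, the node types other than `let_expression`, and the field names other than `body` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedNode:
    # Mirrors the Dark ParsedNode type; sourceRange is omitted in this sketch.
    typ: str
    text: str
    field_name: Optional[str] = None
    children: List["ParsedNode"] = field(default_factory=list)


# Hypothetical stand-ins for WrittenTypes expressions.
@dataclass
class EInt:
    value: int


@dataclass
class EVariable:
    name: str


@dataclass
class ELet:
    name: str
    value: object
    body: object


def find_field(node: ParsedNode, name: str) -> ParsedNode:
    """Find the child carrying the given field name, or fail loudly."""
    for child in node.children:
        if child.field_name == name:
            return child
    raise ValueError(f"{node.typ} is missing field '{name}'")


def to_written_expr(node: ParsedNode):
    """Map one ParsedNode to a WrittenTypes-like expression."""
    if node.typ == "let_expression":
        return ELet(
            name=find_field(node, "identifier").text,  # assumed field name
            value=to_written_expr(find_field(node, "expr")),  # assumed field name
            body=to_written_expr(find_field(node, "body")),
        )
    if node.typ == "int_literal":  # assumed node type
        return EInt(int(node.text))
    if node.typ == "variable":  # assumed node type
        return EVariable(node.text)
    raise ValueError(f"unsupported node type: {node.typ}")
```

Every node type needs its own branch, and each branch depends on string field names that have to stay in sync with the grammar, which is where much of the fragility in this step tends to come from.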

I've been questioning whether depending on tree-sitter for all of our parsing is a good idea.

An alternative would be to write the parser in Darklang instead, potentially as a wrapper around (or an equivalent to) Farkle or FParsec, via minimal Builtins.
(relevant links:

https://github.com/stephan-tolksdorf/fparsec

https://teo-tsirpanis.github.io/Farkle (seems to be better for us than FParsec, per https://teo-tsirpanis.github.io/Farkle/choosing-a-parser.html)

https://www.youtube.com/watch?v=RDalzi7mhdY not expecting you to watch this, but a good talk on the subject.)

Here are some potential trade-offs to consider:

the current A->B step:

requires us to build tree-sitter as an .so, as well as our grammar's .so. This is all set up now, but takes a few seconds of time, esp CI time.

requires our cli app to be ~1MB larger, to package those .sos along with our exe

requires a fancy extract-and-load setup to use both of those at run-time

the current C->D step:

is pretty complicated, and involves some fragile code. there might be abstractions available here we haven't yet discovered, but it's a bit rough.

see https://github.com/darklang/dark/blob/main/packages/darklang/languageTools/parser.dark

we're broadly missing out on immediate feedback throughout the process. We wait for the parser to be built, and have to follow each of those changes with ParsedNode -> WrittenTypes functions. And every grammar upgrade depends on a full build/release cycle, waiting for CI etc, to get things to users.

I've no clear path forward on versioning the parser with our language in a reasonably seamless way, as opposed to an in-Dark solution that would allow us to properly version the parser fns like anything else in the package manager.

our current setup provides only one big parser for a 'file', but what if we want to allow/disallow different parseable things if we're parsing a Canvas, vs parsing a Script, etc. I've been hoping we'd figure out a proper solution for that eventually, but everything I've come up with so far feels like a hack (i.e. passing a 'header' to the tree-sitter grammar where we). I think the composability of a parser combinator would prepare us for these scenarios much better.

broadly, it feels like we're doing (more than) double-work: we're writing the grammar.js, which builds into a parser, and writing a bunch of "parser.dark code" to map that back to WrittenTypes.

I suspect we'd still need a tree-sitter parser around, for highlighting and such in contexts outside of our VS Code plugin.
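The composability point above (one parser for a Canvas, another for a Script) can be illustrated with a toy parser-combinator sketch. It is in Python rather than Dark, and everything in it is hypothetical: the combinator names, the line-based "declarations", and the `handler` keyword are placeholders, not a proposal for Dark's real syntax.

```python
from typing import Callable, List, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def line_starting_with(word: str, label: str) -> Parser:
    """Parse one line that begins with `word`, tagging it with `label`."""
    def parse(text, pos):
        if not text.startswith(word + " ", pos):
            return None
        newline = text.find("\n", pos)
        if newline == -1:
            return (label, text[pos:]), len(text)
        return (label, text[pos:newline]), newline + 1
    return parse


def choice(*parsers: Parser) -> Parser:
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return parse


def whole_file(decl: Parser) -> Callable[[str], Optional[List[object]]]:
    """Apply `decl` repeatedly; succeed only if all input is consumed."""
    def parse(text):
        items, pos = [], 0
        while pos < len(text):
            result = decl(text, pos)
            if result is None:
                return None
            item, pos = result
            items.append(item)
        return items
    return parse


# Hypothetical declaration forms, for illustration only.
type_decl = line_starting_with("type", "type_decl")
fn_decl = line_starting_with("let", "fn_decl")
handler_decl = line_starting_with("handler", "handler_decl")

# The two "file kinds" reuse the same pieces but allow different things.
script_parser = whole_file(choice(type_decl, fn_decl))
canvas_parser = whole_file(choice(type_decl, fn_decl, handler_decl))
```

With these definitions, a source containing a `handler ...` line parses under `canvas_parser` but is rejected by `script_parser`, without any 'header' trick: the difference between the two is just which pieces are passed to `choice`.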

Am I forgetting a big reason why we chose tree-sitter rather than exploring writing a parser in Dark/F#?
Or maybe we've just learned more since and it makes sense to reconsider?
Maybe we're making ParsedNode -> WrittenTypes more complicated than it needs to be?

Paul's response:

As I recall, the reasons to use tree sitter:

performance

ability to adapt to use in existing syntax highlighting frameworks and therefore reuse the definition

I would add that parser combinator frameworks are, afaik, possibly not powerful enough for real programming languages. But I could be wrong on that note

I don't think there's anything to do here right now, and we're close enough to a successful use of tree-sitter that we'll be able to abandon our old F#-based parser. Still, I think it's worth reflecting more on whether we're doing the right thing fundamentally.

StachuDotNet mentioned this issue Jan 14, 2024

Reference DBs, UserTypes, and UserFunctions in ASTs by ID rather than Name #3964

Closed

StachuDotNet added and removed the needs-review label ("I plan on going through each of the issues and clarifying them -- this is to mark remaining issues") Feb 14, 2024

StachuDotNet mentioned this issue Feb 19, 2024

Prevent conflicts of type names #4783

Closed

StachuDotNet added the ready-for-contribs There's work here that's relatively approachable, if you're interested! label Mar 5, 2024
