Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preliminary implementation of a tree-sitter backend #471

Closed
wants to merge 6 commits into from

Conversation

chaserhkj
Copy link
Contributor

Generates grammar.js file that could be used with tree-sitter to generate parsers with many different languages. Most of the functionalities of BNFC should be already implemented and working.

There are two remaining issues/features that are not implemented yet:

  1. tree-sitter does not distinguish between a lexer and a parser, this created issues in their keyword recognition (documented here). They use a keyword extraction mechanism to list token rules that could accept a keyword explicitly.
    The only way to do this cleanly from the BNFC side is to run all user-defined token patterns against all keywords in the input grammar to determine which patterns should be listed to do keyword extraction. Unfortunately, there are no regex facilities in BNFC to do this currently and an external regex engine is needed.
    This branch currently just puts all user-defined token patterns into the keyword extraction list for safety. The performance impact of such practice is unknown. How should BNFC approach this problem should be discussed further.
  2. Like many other lexers, the regular expression used by tree-sitter does not support general set differences. However, tree-sitter supports a mechanism known as external scanners that could be used to implement this. This is not implemented yet.

These are documented in the source code using TODO comments as well.

Should help #193 and #394

@chaserhkj
Copy link
Contributor Author

Rev: Removed usage of BlockArguments to be compatible with older GHC

@wenkokke
Copy link

I think the primary thing that appears to be missing is support for layouting, for which we'd have to generate custom scanners.

@chaserhkj
Copy link
Contributor Author

Yes, layouting features would require a custom scanner as well. I think that would be an easier starting point for introducing custom scanners for this backend. I would look into it in my spare time.

That being said, the current state of the branch already satisfies my current needs thus it would probably be a lower priority item on my list.

Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

There is end-to-end testing for the other backends in the testing project (unfortunately not run by CI), how would you go about integrating the tree-sitter backend there?

The matrix for these system tests is set up here:

parameters :: [TestParameters]
parameters = concat
[ []
-- OCaml/Menhir
, [ ocaml { tpName = "OCaml/Menhir"
, tpBnfcOptions = ["--ocaml", "--menhir"] }
]
-- OCaml
, [ ocaml ]
-- Functor (Haskell & Agda)
, [ haskellAgdaFunctorParameters]
-- C++ (extras)

source/src/BNFC/Backend/TreeSitter/RegToJSReg.hs Outdated Show resolved Hide resolved
@chaserhkj
Copy link
Contributor Author

I just added the parameters for testing tree-sitter backend to the testing parameter matrix.

Currently, the backend is only generating grammar.js and no test code is generated together with the grammar. So in testing I just called tree-sitter parse to use the grammar file to parse input directly.

Unfortunately, there are a number of test failures. I will use next week to try to fix these failures. Before that, I will mark this PR as draft.

@chaserhkj chaserhkj marked this pull request as draft December 10, 2023 08:05
@chaserhkj
Copy link
Contributor Author

I am closing this PR since the main commits of it has already been merged (Ref: f1c5101)

A few issues are emerging and related larger changes are made to the tree-sitter backend when I am working on the tests. I will include them in a separate PR.

@chaserhkj chaserhkj closed this Feb 2, 2024
@andreasabel
Copy link
Member

I am closing this PR since the main commits of it has already been merged (Ref: f1c5101)

I have not recollection of merging the branch but maybe I did it by accident when I checkout out your PR. :$

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants