Highlight (some) regular expressions using another grammar #11

Open
sogaiu opened this issue May 29, 2023 · 7 comments

Comments

@sogaiu

sogaiu commented May 29, 2023

I saw the following bit in the emacs-devel archives:

some files may consist of several parts requiring different tree-sitter
grammars. For example, a JavaScript file may have its documentation
written with jsdoc: JavaScript and jsdoc have a tree-sitter grammar
each.

Is there a way to use a tree-sitter grammar in parts of the file and
another one in other parts? There could be a main grammar and secondary
grammars would be activated on some kinds of nodes of the main one.

Yes, it should be possible, AFAIU. See the node "Multiple Languages"
in the ELisp manual, I believe it explains how to do what you want.

As an idea for "somewhere down the line", perhaps it would be interesting to consider the following...

Since tree-sitter-clojure can recognize regex literals, maybe one could apply an appropriate regular expression grammar to highlight the portions within the double quotes.

I don't know how close this grammar is to Clojure's flavor of regex, but maybe it, or some appropriate modification of it (or something that inherits from it), could be used for the task.

For reference, the part of the manual being referred to in the quote above can be seen in .texi form here. I didn't manage to find an HTML version. If you've got a recent enough Emacs from the emacs-29 branch, the info may be viewable from within Emacs. Worked for me anyway...


Ah sorry. Maybe I should have made this in the Discussions area?

@dannyfreeman
Contributor

Ah sorry. Maybe I should have made this in the Discussions area?

No, an issue is fine. I don't even get notifications from discussions lol.

This is a good idea. Clojure uses Java-flavored regular expressions. I'm not sure how much they differ from what that grammar accepts. If the dialects differ enough, it might be worth forking it and calling the fork tree-sitter-java-regex.

@dannyfreeman dannyfreeman added the enhancement New feature or request label May 29, 2023
@dannyfreeman dannyfreeman self-assigned this May 29, 2023
@sogaiu
Author

sogaiu commented May 29, 2023

I don't have the various flavors loaded into my head lately [1].

If I had to guess without looking too closely, I think this is likely to be some JavaScript flavor (or subset of one).

I also don't know / recall whether the various Clojure dialects all support the same regex syntax.

Perhaps this might come in handy eventually.


[1] Mostly working with PEGs in another language ;)

@sogaiu
Author

sogaiu commented Jun 20, 2023

Came across this content among Lapce's files:

((regex_lit) @injection.content
 (#set! injection.language "regex"))
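The query above marks every regex_lit node as content for a "regex" parser. As a rough, editor-agnostic sketch of what such an injection amounts to, the Python below takes the ranges a host parse has marked and reports which language owns a given position. The Range type, the language_at helper, and the hard-coded range are hypothetical illustrations, not part of Lapce, Emacs, or tree-sitter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    """Half-open character range [start, end) in the buffer (hypothetical type)."""
    start: int
    end: int


def language_at(pos: int, injected: List[Range],
                host: str = "clojure", embedded: str = "regex") -> str:
    """Return the language that owns position `pos`."""
    for r in injected:
        if r.start <= pos < r.end:
            return embedded
    # Anything outside an injected range belongs to the host language.
    return host


# Hard-coded for illustration: the span a host parse might report for the
# regex_lit node in the form below (it includes the #" and closing ").
source = r'(re-find #"\d+" s)'
regex_lits = [Range(9, 15)]
```

Calling language_at(12, regex_lits) lands inside the literal, while language_at(3, regex_lits) falls back to the host language. Note that the marked span still contains the delimiters, which is the wrinkle the later commit message describes.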

dannyfreeman added a commit that referenced this issue Aug 24, 2023
This grammar is bundled in NixOS by default and seems good enough for
Java regular expressions (the grammar probably supports more features
than Java, idk).

Should address issue #11
dannyfreeman added a commit that referenced this issue Aug 24, 2023
This grammar is bundled in NixOS by default and seems good enough for
Java regular expressions. It is also maintained under the tree-sitter
GitHub org so is "official".

In order to properly identify the #" and closing " characters we have to
parse them with the Clojure grammar (in case the regex grammar is not
available) and again with the regex grammar as part of the actual
pattern. This could be avoided if either the Clojure grammar captured a
node for the inner contents of the regex literal, or the
treesit-range-settings supported some kind of offset argument like the
Neovim tree-sitter mechanisms do.

Should address issue #11
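
The offset idea mentioned in the commit message comes down to a little arithmetic: if the host grammar only reports the whole #"..." literal, an offset pair can trim the #" prefix and the closing " so that the embedded parser sees just the pattern. A minimal Python sketch of that arithmetic follows; inner_range and its default offsets are hypothetical names chosen for illustration, not an Emacs or Neovim API, and the closing-quote search is deliberately naive (it ignores escaped quotes).

```python
from typing import Tuple


def inner_range(start: int, end: int,
                offsets: Tuple[int, int] = (2, -1)) -> Tuple[int, int]:
    """Shrink a half-open [start, end) literal range to its inner pattern.

    The default offsets skip the two-character #" prefix and the
    one-character closing quote.
    """
    new_start = start + offsets[0]
    new_end = end + offsets[1]
    if new_start > new_end:
        raise ValueError("offsets exceed the literal's length")
    return new_start, new_end


source = r'(re-find #"\d+" s)'
# Locate the literal by hand; a real setup would take this range from the
# host grammar's regex literal node instead.
literal_start = source.index('#"')
literal_end = source.index('"', literal_start + 2) + 1  # naive: no escapes

start, end = inner_range(literal_start, literal_end)
pattern = source[start:end]  # the text handed to the regex parser
```

With ranges trimmed this way, the delimiters would stay with the host grammar and would not need to be parsed a second time as part of the pattern.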
@dannyfreeman
Contributor

@sogaiu check this out 855cddd

Seems useful for other languages as well. Maybe even belongs in emacs core.

@sogaiu
Author

sogaiu commented Aug 25, 2023

Thanks for the heads up!

Hope to take a look soon.

@sogaiu
Author

sogaiu commented Aug 25, 2023

Ok, I gave it a try.

I see what you mean about capturing #" and ":

[Screenshot: clojure-ts-mode-with-regex]

@sogaiu
Author

sogaiu commented Aug 25, 2023

On a side note, maybe it's worth requesting that tree-sitter-regex get added to tree-sitter-module?

dannyfreeman added a commit that referenced this issue Aug 27, 2023
This grammar is bundled in NixOS by default and seems good enough for
Java regular expressions. It is also maintained under the tree-sitter
GitHub org so is "official".

In order to properly identify the #" and closing " characters we have to
parse them with the Clojure grammar (in case the regex grammar is not
available) and again with the regex grammar as part of the actual
pattern. This could be avoided if either the Clojure grammar captured a
node for the inner contents of the regex literal, or the
treesit-range-settings supported some kind of offset argument like the
Neovim tree-sitter mechanisms do.

Should address issue #11

I think that multiple parsers per buffer may be too buggy to use right
now. There are situations where no regex is present in a buffer, but
the entire buffer is highlighted as a regular expression. This
functionality probably needs upstream work in Emacs before we can merge
this into the main branch of clojure-ts-mode.