Feature: Clojure-mode tree sitter support in Emacs v29 #640

Closed
jasonjckn opened this issue Nov 27, 2022 · 22 comments

Comments

@jasonjckn

Feature Request: Clojure-mode tree sitter support in Emacs v29

@dannyfreeman
Contributor

dannyfreeman commented Dec 2, 2022

The current wisdom from the Emacs mailing list is to introduce tree-sitter support in a new distinct mode, in this case clojure-ts-mode. Integrating tree-sitter into the existing clojure-mode would make it extremely complicated because it would need to maintain backwards compatibility with older versions of emacs without tree-sitter.

With that in mind, I really do think creating a separate clojure-ts-mode would be the best move. I recently found myself with a LOT more free time and would be happy to work on it. I think this would also be a good opportunity for Emacs to support clojure files out of the box by having clojure-ts-mode development done in the Emacs source tree instead of an independent package.

I don't really want to do that though without talking to the maintainers of this package first. What are your thoughts? Were there already plans underway or in the making to do this?

@dakra
Contributor

dakra commented Dec 2, 2022

> The current wisdom from the Emacs mailing list is to introduce tree-sitter support in a new distinct mode, in this case clojure-ts-mode. Integrating tree-sitter into the existing clojure-mode would make it extremely complicated because it would need to maintain backwards compatibility with older versions of emacs without tree-sitter.

While they always introduced a new mode, they only introduced completely new files when it made sense.
E.g. python.el, where python-mode comes from, now additionally has python-ts-mode.
Both modes derive from a common python-base-mode. See https://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/progmodes/python.el?h=64044f545add60e045ff16a9891b06f429ac935f#n6617
I don't know clojure-mode well enough to say whether a completely new file or simply extending clojure-mode.el is better, but just writing this here as an option.
This would avoid duplicating all the extra features like refactoring etc. in both files.
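
A rough sketch of that shared-parent layout, for illustration only. The `my-clojure-*` mode names are hypothetical and are not clojure-mode's actual code, and the `treesit-*` calls assume Emacs 29 or later with a Clojure grammar installed.

```elisp
;; Hypothetical names throughout; this is not clojure-mode's real code.
(define-derived-mode my-clojure-base-mode prog-mode "Clojure"
  "Parent mode holding settings shared by both variants."
  (setq-local comment-start ";"))

(define-derived-mode my-clojure-classic-mode my-clojure-base-mode "Clojure"
  "Variant that would keep the existing regexp-based font-lock and indentation.")

(define-derived-mode my-clojure-ts-mode my-clojure-base-mode "Clojure[TS]"
  "Variant backed by tree-sitter (assumes Emacs 29+ and a Clojure grammar)."
  ;; `treesit-ready-p' checks that tree-sitter and the grammar are usable.
  (when (and (fboundp 'treesit-ready-p) (treesit-ready-p 'clojure))
    (treesit-parser-create 'clojure)
    (treesit-major-mode-setup)))
```

Anything placed in the parent mode (keymaps, refactoring commands, comment settings) would then be inherited by both variants, which is the duplication this option tries to avoid.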

@dannyfreeman
Contributor

dannyfreeman commented Dec 2, 2022

> E.g. python.el, where python-mode comes from, now additionally has python-ts-mode.
> Both modes derive from a common python-base-mode. See https://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/progmodes/python.el?h=64044f545add60e045ff16a9891b06f429ac935f#n6617

Keeping that code in python.el itself is a lot easier when the file is part of Emacs, though, since it doesn't need to work with Emacs 28 and below. Even if a clojure-ts-mode is written here, I think there is a strong argument for keeping it in a separate file / package.

> I don't know clojure-mode well enough to say whether a completely new file or simply extending clojure-mode.el is better, but just writing this here as an option.

What else is in clojure-mode besides some refactoring tools, font-locking, and indentation (all I personally rely on is font-locking and indent)? I believe tree sitter will take over the last two.

> This would avoid duplicating all the extra features like refactoring etc in both files.

I'd imagine a lot of the refactoring code could greatly benefit from tree-sitter, and rewriting them to use tree-sitter would simplify the code and make it more performant. That's mostly speculation on my part though.

@bbatsov
Member

bbatsov commented Dec 2, 2022

> I think this would also be a good opportunity for Emacs to support clojure files out of the box by having clojure-ts-mode development done in the Emacs source tree instead of an independent package.
>
> I don't really want to do that though without talking to the maintainers of this package first. What are your thoughts? Were there already plans underway or in the making to do this?

I'd welcome help with the introduction of tree-sitter support, but I think it'd be a huge mistake to push for something to be included in Emacs. We'd gain almost nothing, but we'd limit who can contribute to the package. Not to mention that now essentially the same people maintain clojure-mode, inf-clojure, CIDER, etc, and this allows us to push changes quickly across all packages. That will be gone if something is bundled with Emacs.

Even today clojure-mode is part of NonGNU ELPA, so it's trivial to install out-of-the-box. I honestly don't get people's obsession with pushing more things into Emacs's core, with all the implications of this.

@bbatsov
Member

bbatsov commented Dec 2, 2022

> What else is in clojure-mode besides some refactoring tools, font-locking, and indentation (all I personally rely on is font-locking and indent)? I believe tree sitter will take over the last two.

Probably you're right. I'm not particularly familiar with TreeSitter, so I can't say much on the topic. I'd certainly welcome simpler font-locking and indentation logic (much of the indentation logic we've borrowed from lisp-mode). There's the practical matter that people will actually need Emacs 29 for this to work, that all Clojure dev tools depend on clojure-mode (e.g. to identify the boundaries of expressions or to font-lock results) and that we have to be careful for the sake of people who can't easily upgrade their Emacs (e.g. people constrained by some company policies). That'd be a tricky problem to solve.

My suggestion would be to start the new mode that you propose here, alongside the existing mode, and to figure out how to integrate with the broader ecosystem as we go. As noted above, I don't see much point in trying to have a Clojure mode built into Emacs. If it were up to me - I'd move 2/3 of Emacs into packages - slim down the core and make it easier to release updates. And get rid of the damn contributor agreement.

@bbatsov
Member

bbatsov commented Dec 2, 2022

> The current wisdom from the Emacs mailing list is to introduce tree-sitter support in a new distinct mode, in this case clojure-ts-mode. Integrating tree-sitter into this clojure-mode would make it extremely complicated because it would need to maintain backwards compatibility with older versions of emacs without tree-sitter.

I skipped over this part, so clearly you're aware of the main problem with the introduction of the tree-sitter support. Again, I'm not against creating a separate package. I think somewhere in the issues here you'll find some plans to completely decouple the current package from lisp-mode, as there's plenty of legacy coming directly from it. Perhaps that's the right opportunity for a clean start. My only preference is to keep developing all Clojure tools for Emacs under the same umbrella, as we've done for the past decade. (I brought all the main Clojure projects for Emacs together in a single organization in early 2013 if I recall correctly)

@dannyfreeman
Contributor

> My only preference is to keep developing all Clojure tools for Emacs under the same umbrella, as we've done for the past decade. (I brought all the main Clojure projects for Emacs together in a single organization in early 2013 if I recall correctly)

Keeping it here is fine by me; adding something to Emacs was just a thought I wanted to float. I didn't realize clojure-mode was in NonGNU ELPA. Having it available in one of the built-in package repositories is the next best thing. I won't touch your other comments about the Emacs contribution process 😉

> I'm not against creating a separate package. I think somewhere in the issues here you'll find some plans to completely decouple the current package from lisp-mode, as there's plenty of legacy coming directly from it. Perhaps that's the right opportunity for a clean start.

How about this for a start: I'll see what it looks like to make a separate clojure-ts-mode.el in a personal fork of this repo. As I make progress I can report back here. Once there is a clearer picture of what a clojure-ts-mode looks like, we can see what the next move is. Maybe it lives in this repo, maybe another repo under the clojure-emacs org.

@bbatsov
Member

bbatsov commented Dec 2, 2022

@dannyfreeman Sounds like a great plan to me!

@dannyfreeman
Contributor

dannyfreeman commented Dec 4, 2022

Here is a POC with tree sitter parsing clojure files: https://github.com/dannyfreeman/clojure-mode/blob/master/clojure-ts-mode.el

Something interesting we will need to figure out is how to distribute the tree-sitter grammars, which are compiled dynamic libraries. Do we precompile them for major OS/architecture combinations, or do we include some setup script for users to compile them on their machines?

Edit: right now it's not clear to me how the grammars for the built-in tree sitter modes, like those for python and javascript, are going to be distributed. I'll dig into that and look through the mailing list to see if the question of external packages adding tree sitter grammars has come up yet.
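
For the "compile on the user's machine" option, one route that exists in Emacs 29.1 is `treesit-install-language-grammar`. The snippet below is a minimal sketch, assuming Emacs 29.1 or later built with tree-sitter, plus `git` and a C compiler on the PATH; the repository URL is the grammar mentioned elsewhere in this thread.

```elisp
;; Sketch only: assumes Emacs 29.1+, git, and a C compiler are available.
(require 'treesit)

;; Tell Emacs where the Clojure grammar's source lives.
(add-to-list 'treesit-language-source-alist
             '(clojure "https://github.com/sogaiu/tree-sitter-clojure"))

;; Clone, compile, and install the grammar into the "tree-sitter"
;; subdirectory of `user-emacs-directory'.
(treesit-install-language-grammar 'clojure)
```

This keeps the package itself free of binaries, at the cost of requiring a toolchain on each user's machine.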

@bbatsov
Member

bbatsov commented Dec 4, 2022

> Here is a POC with tree sitter parsing clojure files: https://github.com/dannyfreeman/clojure-mode/blob/master/clojure-ts-mode.el

Nice! Do functions like forward-sexp, beginning-of-defun, etc work with this? I assume some Clojure grammar for TS already exists and we don't actually need to do anything on this front, right?

> Something interesting we will need to figure out is how to distribute the tree-sitter grammars, which are compiled dynamic libraries. Do we precompile them for major OS/architecture combinations, or do we include some setup script for users to compile them on their machines?

I think most modes that require modules to be compiled do some combination of both - you can compile the module locally or download it precompiled. The only one I've recently used was vterm, so maybe it can provide some inspiration.

> Edit: right now it's not clear to me how the grammars for the built-in tree sitter modes, like those for python and javascript, are going to be distributed. I'll dig into that and look through the mailing list to see if the question of external packages adding tree sitter grammars has come up yet.

Roger that! Btw, are there also some official docs for the tree sitter support in Emacs that we can peruse? I'm curious about everything it has to offer.

@kommen
Contributor

kommen commented Dec 4, 2022

> Roger that! Btw, are there also some official docs for the tree sitter support in Emacs that we can peruse? I'm curious about everything it has to offer.

Here are the tree-sitter notes in the emacs mirror repo, including a starter guide for how to add support for major modes: https://github.com/emacs-mirror/emacs/tree/master/admin/notes/tree-sitter

@dannyfreeman
Contributor

> Nice! Do functions like forward-sexp, beginning-of-defun, etc work with this? I assume some Clojure grammar for TS already exists and we don't actually need to do anything on this front, right?

I'm using an existing grammar, I think the same one used by neovim: https://github.com/sogaiu/tree-sitter-clojure
S-expression operations like forward-sexp work fine on lists/vectors. They do not work well on keywords and symbols with periods in them, and probably some other characters too.

Here's an example, where | is the cursor

:mxdCase/seg.mnt|
C-M-b
:mxdCase/seg.|mnt
C-M-b
:|mxdCase/seg.mnt

I'm not sure what the problem is, because the tree sitter grammar correctly identifies the keyword as 1 unit, a kwd_lit.

beginning-of-defun and end-of-defun have no problem.

> Roger that! Btw, are there also some official docs for the tree sitter support in Emacs that we can peruse? I'm curious about everything it has to offer.

I've been using the same thing @kommen posted. I've also been looking in the master branch's python-mode. Some pre-compiled docs can be found here: https://github.com/emacs-mirror/emacs/tree/master/admin/notes/tree-sitter/html-manual (these might get removed at some point, leaving only the source)

@dannyfreeman
Contributor

dannyfreeman commented Dec 4, 2022

image

Looking very promising. Something that is stumping me right now though is highlighting namespaced keywords like clojure-mode does, where the namespace part and name part have different faces. I've asked in the grammar repo and on the mailing list about it today:
https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg00112.html
sogaiu/tree-sitter-clojure#28

This grammar is really pleasant to work with. It identifies metadata easily, so we can apply font locking to metadata and type hints without any trouble. I added a special rule that highlights the @ deref sugar with the font-lock-warning-face. It's just so easy once you figure out the tree sitter query syntax.
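
To give a feel for that query syntax, here is a minimal sketch of two font-lock rules using Emacs 29's `treesit-font-lock-rules`. It is not the code from the POC: the variable name is hypothetical, `kwd_lit` is the node type mentioned earlier in this thread, and the `derefing_lit` node with its `marker` field is an assumption about the tree-sitter-clojure grammar that should be checked against its node types.

```elisp
;; Sketch only: assumes Emacs 29+ built with tree-sitter support.
(require 'treesit)

(defvar my-clojure-ts--font-lock-settings   ; hypothetical name
  (treesit-font-lock-rules
   ;; Whole keywords such as :foo/bar get a single face.
   :language 'clojure
   :feature 'keyword
   '((kwd_lit) @font-lock-constant-face)

   ;; The "@" of a deref form; node and field names are assumptions.
   :language 'clojure
   :feature 'deref
   '((derefing_lit marker: "@" @font-lock-warning-face)))
  "Illustrative tree-sitter font-lock rules for Clojure.")
```

A major mode would typically assign such a value to `treesit-font-lock-settings`, list the feature names in `treesit-font-lock-feature-list`, and then call `treesit-major-mode-setup`.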

@jasonjckn
Author

jasonjckn commented Dec 6, 2022

@dannyfreeman

> Something interesting we will need to figure out is how to distribute the tree-sitter grammars, which are compiled dynamic libraries. Do we precompile them for major OS/architecture combinations, or do we include some setup script for users to compile them on their machines?

Some package managers, like Nix, are going to have built-in support for grammars and should easily be able to include Clojure.

Outside of this, we may be able to petition emacs-tree-sitter/tree-sitter-langs#143 to get Clojure added as one of the supported languages; that project ships prebuilt binaries, as you're saying. I downloaded https://github.com/emacs-tree-sitter/tree-sitter-langs/releases/download/0.12.8/tree-sitter-grammars-macos-0.12.8.tar.gz and it has virtually every language but Clojure.

As a final fallback, the vterm approach sounds viable: the compiled library can be placed in the (concat user-emacs-directory "/tree-sitter/") directory under the filename libtree-sitter-clojure.{dylib,so} and should get auto-detected. There's also treesit-extra-load-path if needed.
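
A minimal sketch of that fallback, assuming Emacs 29 or later built with tree-sitter and a grammar already compiled to `libtree-sitter-clojure.so` (or `.dylib`). The `~/src/grammars/` directory is a hypothetical location used only for illustration.

```elisp
;; Sketch only: the directory below is a hypothetical example location.
(when (and (fboundp 'treesit-available-p) (treesit-available-p))
  ;; Emacs already searches the "tree-sitter" subdirectory of
  ;; `user-emacs-directory'; additional directories are listed here.
  (add-to-list 'treesit-extra-load-path
               (expand-file-name "~/src/grammars/"))
  ;; Report whether libtree-sitter-clojure can be found and loaded.
  (message "Clojure grammar available: %s"
           (treesit-language-available-p 'clojure)))
```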

> Here is a POC with tree sitter parsing clojure files: https://github.com/dannyfreeman/clojure-mode/blob/master/clojure-ts-mode.el

Amazing work! I will try out your POC when I get a chance.

@jasonjckn
Author

jasonjckn commented Dec 6, 2022

If anyone is looking for a build of emacs with tree sitter support and clojure grammar, this should work on all platforms

nix build -L --no-write-lock-file github:jasonjckn/emacs-overlay/clojure\#emacsGit
./result/bin/emacs 

@dannyfreeman
Contributor

> Outside of this, we may be able to petition emacs-tree-sitter/tree-sitter-langs#143 to get Clojure added as one of the supported languages; that project ships prebuilt binaries, as you're saying.

That could be useful. It should be noted that I'm not using the emacs-tree-sitter package. I'm using the one built into Emacs 29. They have different APIs, and the one in Emacs core is much faster.

@jasonjckn
Author

@dannyfreeman

> It should be noted that I'm not using the emacs-tree-sitter package. I'm using the one built into Emacs 29. They have different APIs, and the one in Emacs core is much faster.

No doubt, but the parser binaries in tree-sitter-langs should still work with the new emacs master 'treesit', afaik.

@dannyfreeman
Contributor

I made a draft PR for this to start gathering feedback: #644

@dannyfreeman
Contributor

@bbatsov I'm starting to think that the development of a clojure-ts-mode might be better done in a different repository rather than in this clojure-mode repository.

Users could opt into installing it if they are interested and can use new functionality in Emacs like major-mode-remap-alist to prefer loading it over clojure-mode. Meanwhile, clojure-mode specific functions should continue to work while a separate clojure-ts-mode is active.
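
As a concrete illustration of that opt-in, a user's init file could contain something like the following. This is a minimal sketch, assuming Emacs 29 or later and that a package providing `clojure-ts-mode` is installed.

```elisp
;; Sketch only: `major-mode-remap-alist' exists from Emacs 29 onwards.
(when (boundp 'major-mode-remap-alist)
  ;; Wherever Emacs would enable `clojure-mode', enable `clojure-ts-mode' instead.
  (add-to-list 'major-mode-remap-alist '(clojure-mode . clojure-ts-mode)))
```

Removing the entry restores the original behaviour, so users can switch back without uninstalling anything.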

It's going to remain in some kind of "beta" stage for a long time, I suspect. There are some outstanding bugs with tree-sitter that we have discovered that may not be fixed for a while: sogaiu/tree-sitter-clojure#32. These bugs affect basic functions like beginning-of-defun and end-of-defun when tree-sitter is enabled. While clojure-ts-mode is in that state, it would be easier for me to develop it in a separate repository. It could be distributed independently of clojure-mode, gather feedback and bugs from users, that kind of thing.

What do you think about that? Would it be possible to create a separate repo under the clojure-emacs org for this? I would happily maintain it if given access. When it stabilizes and Emacs 29 is more widespread we could consider merging it back in here. I think that would be a year or more down the line though.

@bbatsov
Member

bbatsov commented Dec 20, 2022

> @bbatsov I'm starting to think that the development of a clojure-ts-mode might be better done in a different repository rather than in this clojure-mode repository.

I've been thinking about this myself and I completely agree, so here you go: https://github.com/clojure-emacs/clojure-ts-mode
You'll have admin access to that repo and the original clojure-mode repo once you accept the invite I sent you.

I think a separate repo will simplify distribution and documentation a lot anyways, and I really want to publish your work very soon to MELPA (at least), so it'd be easier for people to play with it.

@dannyfreeman
Contributor

Thank you! I'll get to work on that repo soon. I can handle getting it on MELPA and NonGNU ELPA, and can try to get it published there a little after Christmas. I'll also close down my PR on this repo.

@bbatsov
Member

bbatsov commented Dec 20, 2022

@dannyfreeman Sounds like a plan to me! Let's close this ticket and have all the conversations about clojure-ts-mode on its issue tracker going forward.

I'll add a mention to clojure-mode's README about it, so it's easier to discover.

5 participants