Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emacs crashes due to SIGABRT when using new scanner #38

Closed
erik-overdahl opened this issue Oct 14, 2023 · 25 comments
Closed

Emacs crashes due to SIGABRT when using new scanner #38

erik-overdahl opened this issue Oct 14, 2023 · 25 comments

Comments

@erik-overdahl
Copy link

erik-overdahl commented Oct 14, 2023

When building the grammar from HEAD (currently b553906) and attempting to use the grammar to highlight an HCL file in Emacs 29.1, the grammar causes Emacs to crash with a munmap_chunk(): invalid pointer error.

I am unsure of whether this bug resides in the grammar or within Emacs. It only occurs when Emacs is configured with the flag --with-pgtk, which tends to be finicky. However, it does not occur with the v1.1.0 release from this repo, and so may be a problem with the rewritten scanner. I have also submitted a report to the GNU Emacs bug tracker.

I've written up a full reproduction of the crash here: https://github.com/erik-overdahl/emacs-29-pgtk-ts-crash-bugreport. Please let me know if you need more information.

@MichaHoffmann
Copy link
Collaborator

Hey thanks for the report! It's weird that this doesn't happen during CI since that example.hcl is also parsed there i think! I'll give it a look tomorrow!

@MichaHoffmann
Copy link
Collaborator

Thanks, your repro script worked fine!

Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
Using grammar at main with pgtk. This should FAIL.
Cloning repository
Compiling library
Library installed to ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so
munmap_chunk(): invalid pointer
Fatal error 6: Aborted
Backtrace:
emacs[0x513632]
emacs[0x4f211f]
emacs[0x511c6a]
emacs[0x511cc2]
/lib64/libc.so.6(+0x3dbb0)[0x7f4d5a1babb0]
/lib64/libc.so.6(+0x8e884)[0x7f4d5a20b884]
/lib64/libc.so.6(gsignal+0x1e)[0x7f4d5a1baafe]
/lib64/libc.so.6(abort+0xdf)[0x7f4d5a1a387f]
/lib64/libc.so.6(+0x2760f)[0x7f4d5a1a460f]
/lib64/libc.so.6(+0x987b5)[0x7f4d5a2157b5]
/lib64/libc.so.6(+0x98a4c)[0x7f4d5a215a4c]
/lib64/libc.so.6(__libc_free+0xca)[0x7f4d5a21a25a]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(+0x41076)[0x7f4d590db076]
/root/.emacs.d/tree-sitter/libtree-sitter-hcl.so(tree_sitter_hcl_external_scanner_deserialize+0x31)[0x7f4d590dc422]
/lib64/libtree-sitter.so.0(ts_parser_parse+0x630)[0x7f4d5a688250]
emacs[0x5ea49d]
emacs[0x578fd6]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b850]
emacs[0x57a478]
emacs[0x57a9f9]
emacs[0x57a198]
emacs[0x57c1f0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x5682c0]
emacs[0x57a198]
emacs[0x57a9f9]
emacs[0x57bc88]
emacs[0x57a198]
emacs[0x59e2a9]
emacs[0x5a5743]
emacs[0x5a6651]
emacs[0x579036]
emacs[0x5b9151]
emacs[0x57abab]
emacs[0x57b13b]
emacs[0x576bc5]
emacs[0x5a5ff6]
emacs[0x579036]

@MichaHoffmann
Copy link
Collaborator

MichaHoffmann commented Oct 15, 2023

When i provide it the library that gets generated by tree-sitter when you do tree-sitter parse it just works

docker run -it -v /var/home/mhoffm/git/emacs-29-pgtk-ts-crash-bugreport/tmp:/root/.emacs.d emacs-29-bugreport-pgtk:latest emacs --batch -l minimal-reproduce.el
 fedora  ~  git  emacs-29-pgtk-ts-crash-bugreport   main 

@erik-overdahl
Copy link
Author

An Emacs maintainer has also found that the grammar works fine if it is built manually: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#11

I imagine that means the problem is in the way Emacs builds the grammar? Here is the relevant code from Emacs' treesit-install-language-grammar function:

          ;; We need to go into the source directory because some
          ;; header files use relative path (#include "../xxx").
          ;; cd "${sourcedir}"
          (setq default-directory source-dir)
          (message "Compiling library")
          ;; cc -fPIC -c -I. parser.c
          (treesit--call-process-signal
           cc nil t nil "-fPIC" "-c" "-I." "parser.c")
          ;; cc -fPIC -c -I. scanner.c
          (when (file-exists-p "scanner.c")
            (treesit--call-process-signal
             cc nil t nil "-fPIC" "-c" "-I." "scanner.c"))
          ;; c++ -fPIC -I. -c scanner.cc
          (when (file-exists-p "scanner.cc")
            (treesit--call-process-signal
             c++ nil t nil "-fPIC" "-c" "-I." "scanner.cc"))
          ;; cc/c++ -fPIC -shared *.o -o "libtree-sitter-${lang}.${soext}"
          (apply #'treesit--call-process-signal
                 (if (file-exists-p "scanner.cc") c++ cc)
                 nil t nil
                 `("-fPIC" "-shared"
                   ,@(directory-files
                      default-directory nil
                      (rx bos (+ anychar) ".o" eos))
                   "-o" ,lib-name))
          ;; Copy out.

So it looks like the shared object is built with

git clone --depth=1 https://github.com/MichaHoffmann/tree-sitter-hcl.git
cd tree-sitter-hcl/src
cc -fPIC -I. -c parser.c
cc -fPIC -I. -c scanner.c
cc -fPIC -shared *.o -o libtree-sitter-hcl.so

If I run this and then mount the resulting object into the container at /root/.emacs.d/tree-sitter/libtree-sitter-hcl.so, I still get the crash.

I don't see any shared object produced when running tree-sitter parse. I am misunderstanding? How did you produce it?

@MichaHoffmann
Copy link
Collaborator

MichaHoffmann commented Oct 15, 2023

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.

Cc @amaanq do you know if any other parser had similar issues?

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

@erik-overdahl
Copy link
Author

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

I did, yes; that's the most recent commit in my repo.

I can confirm that the .so object created by tree-sitter parse at ~/.cache/tree-sitter/lib/hcl.so works just fine :) This is an acceptable workaround for me, but I will keep the bug open with the Emacs maintainers since most people will install these with the Emacs function treesit-install-language-grammar, and the result of that function still produces this crash.

The shared object produced by tree-sitter parse is around 20% larger than the one produced by Emacs, but looking into the binaries with something like diff <(readelf -h -S ~/.cache/tree-sitter/lib/hcl.so) <(readelf -h -S ~/.emacs.d/tree-sitter/libtree-sitter-hcl.so) seems to show that most of the difference is debug info, e.g.:

77,98c77,82
<   [26] .debug_aranges    PROGBITS         0000000000000000  0003f1b8
<        0000000000000070  0000000000000000           0     0     1
<   [27] .debug_info       PROGBITS         0000000000000000  0003f228
<        000000000000355b  0000000000000000           0     0     1
<   [28] .debug_abbrev     PROGBITS         0000000000000000  00042783
<        00000000000006ea  0000000000000000           0     0     1
<   [29] .debug_line       PROGBITS         0000000000000000  00042e6d
<        0000000000017bfc  0000000000000000           0     0     1
<   [30] .debug_str        PROGBITS         0000000000000000  0005aa69
<        00000000000013e9  0000000000000001  MS       0     0     1
<   [31] .debug_line_str   PROGBITS         0000000000000000  0005be52
<        000000000000024c  0000000000000001  MS       0     0     1
<   [32] .debug_loclists   PROGBITS         0000000000000000  0005c09e
<        000000000000bc65  0000000000000000           0     0     1
<   [33] .debug_rnglists   PROGBITS         0000000000000000  00067d03
<        00000000000002f2  0000000000000000           0     0     1
<   [34] .symtab           SYMTAB           0000000000000000  00067ff8
<        00000000000006f0  0000000000000018          35    52     8
<   [35] .strtab           STRTAB           0000000000000000  000686e8
<        00000000000005f9  0000000000000000           0     0     1
<   [36] .shstrtab         STRTAB           0000000000000000  00068ce1
<        000000000000018d  0000000000000000           0     0     1
---
>   [26] .symtab           SYMTAB           0000000000000000  000551b8
>        0000000000000828  0000000000000018          27    65     8
>   [27] .strtab           STRTAB           0000000000000000  000559e0
>        000000000000069b  0000000000000000           0     0     1
>   [28] .shstrtab         STRTAB           0000000000000000  0005607b
>        000000000000011d  0000000000000000           0     0     1

Anyway, thanks for looking at this bug. I'm now more convinced that the problem is on Emacs's side. I'll leave it up to you whether or not to keep this issue open.

@amaanq
Copy link
Member

amaanq commented Oct 15, 2023

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.

Cc @amaanq do you know if any other parser had similar issues?

edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

No, it'd be nice to know how the parser is compiled in emacs but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though

Edit - didn't read enough

@mhoffm-aiven
Copy link
Collaborator

mhoffm-aiven commented Oct 16, 2023

It's in $HOME/.cache/tree-sitter for me but I also compiled with the emacs commands and it worked on my machine. I'll try tomorrow again, maybe I made a mistake. I also patched emacs to compile the parser with -g and during serialization and deserialization the state of the scanner looks pretty wrong. If I compile outside of the container in my checkout of this repo it works and the state looks fine though.
Cc @amaanq do you know if any other parser had similar issues?
edit: did you also change the minimal-reproduce.el so that it not compiles and overwrites your mounted file?

No, it'd be nice to know how the parser is compiled in emacs but if it's not reproducible w/ the tree-sitter compiled binary then I'd say it's an emacs issue, still would be nice to figure out why though

Edit - didn't read enough

Thank you @amaanq !

@mhoffm-aiven
Copy link
Collaborator

@erik-overdahl I'll keep this open. It feels like the library should not fail with invalid memory access, regardless of compilation flags or compiler there. Also the way emacs compiles it does not even look offensive at all so i would really like to know whats happening before i'll close the issue. I'm happy that we found a workaround for you though!

@erik-overdahl
Copy link
Author

Unfortunately, I have discovered that using the object file produced by tree-sitter parse does not actually work in Emacs. It does not crash Emacs outright, but the parse tree produced contains only a small handful of ERROR tokens, while the same file fully parses the same file when used via tree-sitter parse. The new scanner is unusable in Emacs.

@erik-overdahl
Copy link
Author

erik-overdahl commented Oct 18, 2023

The Emacs maintainers have decided that this isn't a problem on their side https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66549#35

Unless you want to keep digging further, the easiest "fix" is probably to put a note in the readme that HEAD doesn't work with Emacs 29.1 builtin treesit mode.

If you look at the last few messages in the Emacs bugreport thread, you'll see that the crash only happens when compiled at -O0 (which Emacs's treesit-install-language-grammar function does), but that I never was able to produce the correct parse tree in Emacs.

@MichaHoffmann
Copy link
Collaborator

Hey sorry for not responding but I had no time last days. Let's dog a bit more over the weekend, it's definitely something I want to understand

@MichaHoffmann
Copy link
Collaborator

@amaanq do you have an idea? I have not been able to reproduce this outside of emacs, even fuzzing for a while. Because i have no idea how to fix forward I start leaning towards rolling back the move to C, would that be acceptable from your side?

@amaanq
Copy link
Member

amaanq commented Oct 20, 2023

I have an idea but let me get to a computer first in a couple hours and I'll send a patch as to what I think a fix could be, but if you'd like to implement my idea now replace the serialized byte lengths that are casted to char/uint8_t with a memcpy of all 4 bytes instead, or just write each byte properly with shifts

@MichaHoffmann
Copy link
Collaborator

Sorry @amaanq ; are you able to send a PR please?

@amaanq
Copy link
Member

amaanq commented Nov 7, 2023

Sorry, yes let me do that

#39

It'd be nice @erik-overdahl if you can try those changes out and see if the crashing is fixed

@MichaHoffmann
Copy link
Collaborator

Hey @erik-overdahl ; latest main solves it for me with your repro instructions! Can you check it out please?

@erik-overdahl
Copy link
Author

erik-overdahl commented Nov 13, 2023 via email

@MichaHoffmann
Copy link
Collaborator

Can you help me reproduce that? it should not contain errors. Is there a way to dump the parse tree from emacs? [ sorry no emacs user here ]

@MichaHoffmann
Copy link
Collaborator

But not crashing is a good first step !

@MichaHoffmann
Copy link
Collaborator

i managed with

(with-temp-buffer
  (insert-file-contents "example.hcl")
  ;; `treesit-parser-create' creates a parser for the buffer that is
  ;; then invoked lazily. Using `treesit-parse-string' to force parse.
  (append-to-file (treesit-node-string (treesit-parse-string (buffer-string) 'hcl)) nil "/dev/stdout"))

@MichaHoffmann
Copy link
Collaborator

yeah strings look pretty messed up; let me try to fix it!

@MichaHoffmann
Copy link
Collaborator

@erik-overdahl i have a version that parses successfully! ill push in a minute

@MichaHoffmann
Copy link
Collaborator

@erik-overdahl can you try locally from mhoffm-fix-emacs-parse-errors ?

@MichaHoffmann
Copy link
Collaborator

Since I was able to reproduce initially but not on latest main, ill close this. @erik-overdahl if you still have the issue please reopen! If you can confirm that this is also solved for you it would be cool to know too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants