complete the grammar #1
OK, this is a huge mix of things I really want in and things I won't accept at all, so let me try to separate them here first. Before the list: for me, tree-sitter is not only about syntax highlighting. I want this parser to be mostly correct about what it parses, not to simply match as much wrong code as possible. I'm not OK with oversimplifications for no reason, as an ERROR token should signal that something is in fact wrong with your code. I don't want the tree to be the simplest possible; I want it to be mostly correct. I already changed my README yesterday to reflect that (you didn't git pull; the "fuzzyfication" you did I had already done).
So, I think those are all the changes to the code itself (I'm OK with the extra files you added). Point it out if I overlooked something, and tell me whether you want to adapt the pull request so you get properly credited in the commit history, or whether you'd rather not go to the trouble, in which case I'll commit the Love part myself, mentioning you in the README and leaving your name in the header you added.
Oh, I was busy reading the changes and forgot to say thank you for the pull request 🙈 So, thank you for your work, even if I don't agree with everything.
To be perfectly frank, that is not the stance of Neovim (and nvim-treesitter). For us, tree-sitter is a tool for working with code in a text editor, not a general-purpose parser and definitely not a linter. This:
is very much a non-goal. So
(We had major issues with parsers that have objectively unacceptable parsing speed, which makes them unusable for our purpose.) Of course, if your goals are different and don't align, that is fine -- there are many valid applications of tree-sitter besides ours -- but it may mean that we just cannot use your parser in Neovim. (Although I hope that it doesn't come to that and both goals can be met satisfactorily.)
I understand that we have some different goals, but I don't think that they are necessarily conflicting yet. There is one question: is this parser with all the modifications OBJECTIVELY more performant than the parser without the ones I'm rejecting? If yes, then maybe we should clone this repo into a
I don't plan to go overboard with the parser either; don't expect me to all of a sudden come back changing things unless it is to fix corner cases or add missing/new features of MATLAB. No matter the result of this discussion, I'll implement the "Love" part and then the parser will be considered stable/release, so I won't be modifying it without need. For me, looking at it from your perspective of nvim-treesitter's goals, the things in the "Hate" list are premature optimizations. If they don't really make a meaningful difference in performance, then there is no reason for that kind of simplification. And about the
I don't think we are necessarily in conflict, either; I just wanted to make clear our stance after you made clear yours, so everybody knows where the other person is coming from. And the question is less about optimal performance and more about unacceptable performance. If (and only if) your goal of a strict parser leads to the latter, we'll simply stick with the current parser until someone else writes and proposes one that is a better fit for us (no need to change the name; you didn't feel the need to rename yours to distinguish it from the previous one). As regards the specific changes, I have no personal opinion either way (having left Matlab years ago and never looked back; I'm not even recommending it to my students anymore).
It's good to know where we stand, I agree. I just created a file of about 1000 lines by concatenating some of my code (lots of functions one after the other), and my parser (without any modification) took about 3 ms to parse it, the same as this one. Then I copied the file into itself 10 times to get a file of about 10000 lines. Both parsers took 31 ms. The changes don't seem to make any meaningful difference in performance. Way of measuring: Afterwards I thought about using only the parts that differ (multivar assignment and strings with formatting) to see how badly it goes. Same thing again: the same time for both. And yeah, soft power made me stick to MATLAB. "Manda quem pode, obedece quem tem juízo" (those who can, give the orders; those with good sense obey).
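The measurement described above (scale a sample file up by repetition, then time the parser on each size) could be sketched roughly as follows. This is only an illustration, not the script actually used in the thread: the function names `repeat_file` and `time_parse` and the file names are made up here, and it assumes the `tree-sitter` CLI is on `PATH`, is run from the grammar's directory, and supports the `--quiet` and `--time` options of `tree-sitter parse` (flags may differ between CLI versions).

```sh
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.

# repeat_file SRC COUNT DEST: write COUNT back-to-back copies of SRC into DEST.
repeat_file() {
    src=$1
    count=$2
    dest=$3
    : >"$dest"                      # start from an empty output file
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$src" >>"$dest"
        i=$((i + 1))
    done
}

# time_parse FILE: parse FILE without printing the tree and let the CLI
# report how long the parse took.
# Assumption: tree-sitter is installed and we are inside the grammar repo.
time_parse() {
    tree-sitter parse --quiet --time "$1"
}

# Example usage (hypothetical file names):
#   repeat_file sample.m 10 sample_x10.m
#   time_parse sample.m
#   time_parse sample_x10.m
```

Running the same two `time_parse` calls against each grammar checkout gives a like-for-like comparison on identical input, which is the comparison being argued about here.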
Oh, I skipped that. Actually, tree-sitter's documentation asks for projects to be named this way; that's why I named it so. https://tree-sitter.github.io/tree-sitter/creating-parsers#project-setup
No, that's fine, that's why I'm saying different parsers can (and need to) share the same name. Anyway, the point is simply this: as long as the parser is objectively more correct and not significantly less performant, we will switch. And the (non)existence of a hypothetical better parser does not factor into this decision. (Of course, if someone later comes with a better parser -- say, just as correct but noticeably faster -- we have no qualms about switching to that from yours...)
I haven't touched MATLAB in a while, but creating complex rules to wiggle out a couple of items in order to be a strict syntax checker is not really what tree-sitter is about. It makes sense if, say, the state count is massively inflated or it causes conflicts that resolve incorrectly everywhere, but in this case I would think it's not worth it.
Well, the format verbs are only valid in the context of a printing function, no? Otherwise, a
Well, isn't that an LSP's job? This is one change I could add back, but it was causing some annoying conflicts, and I thought solving them in 5 seconds by removing it was easier.
I don't think it'd make it simpler, and all it does for the purposes of the CST is pollute the tree. For a formatter, you'd still check the actual operator either way, I'd think.
Well, I saw
I'm not giving up correctness for no reason. If it is not causing a real problem nor creating a performance drawback then it's not going away.
Escapes are also only valid in that context, but there is no way for us to know, since the user can just be doing:
fmt = "%.2f"
sprintf(fmt, 2)
I'm not giving up correctness for no reason. I think that at this point it has already been established that I would rather not be in nvim-treesitter than accept those 3 specific changes.
Not for a formatter: you would probably only care what kind of operation you have (binary, boolean, post, pre). The actual operator doesn't matter. If you know where it actually is, you just copy the string without having to do any extra check.
At first I thought I had missed it too, but it's not an operator but a function. And there is nothing to be changed; command arguments accept those arguments just fine. I'm sorry, but you won't change my mind on the 3 rejections; I really dislike those 3 changes. If I had seen a difference when running the performance tests I would change my mind, but there is no gain at all. Truly none. So I'm not changing my stance on those. This talk about complexity reminds me of people in my field talking about the conservativeness of equations: really nice on paper, with zero improvement in real life, so what's even the point? I'm not giving up on that correctness for an imaginary improvement.
It's your parser; you can do whatever you like with it. And as I said, as long as it's better in some regards and not worse in every other (by a non-negligible margin), I am happy to switch to it. (But I can assure you that conservativeness of schemes does matter, practically ;))
@clason at this point I'm just waiting for @amaanq to remove the three changes from the request or tell me to apply the changes myself, so I can fix the tests and other queries and be done with it. Going off-topic, but conservativeness of schemes? Are you talking IT or math here? What I'm talking about is this kind of conservativeness: you have that |
That is not a use of the word I was familiar with ;) (And that sounds like a fairly standard semidefinite optimization problem, for which there exists a whole zoo of solvers?)
This is quite a common meaning of the word in the field of Electrical Engineering, more specifically Control Theory. I believe mathematicians use it too. And yes, there are plenty of LINEAR solvers. But the problems are usually born non-linear, and that's where you start to have this kind of problem. You cannot give
I apologize if I came off as trying to strong-arm you into accepting my changes as absolute. I chose what was simpler and passed all tests nicely, while ensuring syntax highlighting and related features worked fine. I added back what you wanted back for the most part, and fixed a couple of scanner bugs detected in test suites, though whether some of these were actually bugs is subjective. For example, is a standalone % a format verb? "Hey, % is a %". I'd assume not, but it was being parsed as one. Another was double quotes only surrounded by whitespace, e.g. One of the fuzzing failures, interestingly, is a bug with tree-sitter and not this repo (it's mentioned here if you're interested). Let me know if there's anything else to discuss, and if not I can update all the tests then.
That was exactly the impression I was under. Apologies accepted. I'm now in accordance with the changes, except The file a.b(3) + 2 is parsed by these changes as
and it should be something like
So I'll just accept the pull request and fix that and the tests myself. Waiting for you to do that would be nitpicking. Thank you very much for the changes, especially for catching those bugs. On the infinite loop you found: was it when running tests? I didn't come across any on my machines.
It's from fuzzing. I can send the script to test it; it usually catches bugs in external scanners pretty quickly. Actually, it's in ts-questions here now: https://github.com/sogaiu/ts-questions/blob/master/questions/failed-fuzzing/script/test-fuzzing.sh
A simpler version:
#!/bin/sh
set -eu
ROOT_DIR="fuzzer"
LANG=$1
TIME=$2
CPP=$3
# if scanner = scanner.cc then XFLAG = c++ else XFLAG = c
if [ "$CPP" = "cpp" ]; then
SCANNER="scanner.cc"
XFLAG="c++"
else
SCANNER="scanner.c"
XFLAG="c"
fi
shift 3
export PATH="/root/.cargo/bin:$PATH"
export CFLAGS="$(pkg-config --cflags --libs tree-sitter) -O0 -g -Wall"
JQ_FILTER='.. | if .type? == "STRING" or (.type? == "ALIAS" and .named? == false) then .value else null end'
build_dict() {
jq "$JQ_FILTER" <src/grammar.json |
grep -v "\\\\" | grep -v null >"$ROOT_DIR/dict"
}
build_fuzzer() {
cat <<END | clang -fsanitize=fuzzer,address $CFLAGS -lstdc++ -g -x $XFLAG - src/$SCANNER src/parser.c $@ -o $ROOT_DIR/fuzzer
#include <stdio.h>
#include <stdlib.h>
#include <tree_sitter/api.h>
#ifdef __cplusplus
extern "C"
#endif
TSLanguage *tree_sitter_$LANG();
#ifdef __cplusplus
extern "C"
#endif
int LLVMFuzzerTestOneInput(const uint8_t * data, const size_t len) {
// Create a parser.
TSParser *parser = ts_parser_new();
// Set the parser's language.
ts_parser_set_language(parser, tree_sitter_$LANG());
// Build a syntax tree based on source code stored in a string.
TSTree *tree = ts_parser_parse_string(
parser,
NULL,
(const char *)data,
len
);
// Free all of the heap-allocated memory.
ts_tree_delete(tree);
ts_parser_delete(parser);
return 0;
}
END
}
generate_fuzzer() {
tree-sitter generate
}
makedirs() {
mkdir -p "$ROOT_DIR"
mkdir -p "$ROOT_DIR/out"
}
makedirs
generate_fuzzer
build_dict
build_fuzzer "$@"
cd "$ROOT_DIR"
./fuzzer -dict=dict -timeout=2 -max_total_time=$TIME out/
and then just run ./fuzz.sh matlab 5 c
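To check whether a hang like the infinite loop mentioned above reproduces on another machine, the inputs saved by the fuzzer can be replayed one at a time. The following is a rough sketch, not part of the script above: the helper names `list_artifacts` and `replay_artifacts` are made up, and it assumes libFuzzer's default behaviour of writing failing inputs as `crash-*`, `timeout-*`, `oom-*` or `leak-*` files into the directory the fuzzer was started from (here `fuzzer/`, since the script does `cd "$ROOT_DIR"` first), and of running a single input once when given a file path as an argument.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical.

# list_artifacts DIR: print the libFuzzer artifact files found in DIR,
# one path per line.
list_artifacts() {
    for f in "$1"/crash-* "$1"/timeout-* "$1"/oom-* "$1"/leak-*; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# replay_artifacts FUZZER DIR: rerun the fuzzer binary on each saved input.
# The same 2-second timeout as in the fuzzing run is used, so an input that
# loops forever is reported instead of hanging the replay.
replay_artifacts() {
    list_artifacts "$2" | while IFS= read -r f; do
        echo "== $f"
        "$1" -timeout=2 "$f" || echo "reproduced a failure on $f"
    done
}

# Example usage, from the grammar repo after a fuzzing run:
#   replay_artifacts ./fuzzer/fuzzer fuzzer
```

A `timeout-*` artifact that fails again on replay would point at a genuine non-terminating parse rather than a one-off slowdown on the fuzzing machine.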
from nvim-treesitter/nvim-treesitter#4944
There are a lot of changes; frankly, I was too busy writing code to segment my commits, so I hope you'll just review it all together as it is.
Main updates:
Todo:
Thanks!