Stop parsing on first syntax error. #20168
I'm somewhat leery of this. I wrote a Lisp interpreter in Snobol for a school assignment. Snobol stops at the first error. This was using punch cards, and the turnaround time was 8-12 hours during the day, dropping to 0.5 hr at 3am. It was awful. I have hated compilers that don't try to recover ever since. I can see giving up on the current section of code if there are several errors in a few adjacent lines. But why not then skip ahead a bit, looking for a semicolon immediately followed by a newline, and continue trying from there?
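The recovery strategy proposed here (commonly called panic-mode recovery) can be sketched on raw source text. This is a minimal, hypothetical Python illustration, not how perl's parser works; the function names and the sample input are invented. It shows where a "semicolon followed by newline" resync would land.

```python
from typing import List, Optional


def resync_points(source: str) -> List[int]:
    """Offsets just past every ';' that is immediately followed by a newline."""
    points = []
    for i in range(len(source) - 1):
        if source[i] == ";" and source[i + 1] == "\n":
            points.append(i + 2)  # resume at the start of the following line
    return points


def recover_after(source: str, error_offset: int) -> Optional[int]:
    """First resync point strictly after the error, or None if there is none."""
    for point in resync_points(source):
        if point > error_offset:
            return point
    return None


# Made-up input: line 1 has a syntax error, line 2 has ";\n" inside a string.
src = 'my $x = ;\nmy $s = "a;\nb";\nprint $s;\n'
print(recover_after(src, 8))          # 10: start of line 2, a sensible restart
print(src[recover_after(src, 12):])   # restarts at 'b";', inside the string literal
```

The second call shows the weakness: a text-level scan cannot tell that a `;` followed by a newline sits inside a string literal, so it resumes in the middle of one. Multi-line quotes and heredocs make this harder still in Perl, which is the concern raised in the replies.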
Hey @khwilliamson, can I ask you to try it before you pass judgement? My experience is that the storm of error messages from perl getting confused is really unhelpful, and results in old-timers teaching newcomers to "go through it and look for the lowest line number" and such things. I have been using it for some hacking and I have been quite pleased with it. At first it was a little jarring: I'm used to having to sift through a heap of ridiculous and meaningless errors to find the one that is relevant, and with the patch it "throws your eyeballs" a bit not having all that crap there. But I quickly got used to it, and when I go back to the old rules it throws me the other way now. :-) If you have thoughts on how to safely restart parsing at a semicolon, then I think you could do a follow-up patch, but given the vagaries of parsing perl, IMO that might not be quite as useful as you think: consider code like:
Also, I am happy to make this a configurable option with whatever default we want. If people like the storm of hallucinatory errors that perl produces from common syntax errors, then they are welcome to build with them, so long as I am welcome to build without them. :-)
Also note that this patch includes a revert of a patch from #16300 which caused breakage with Module::Install. We need to decide what to do about that.
I have added a workaround patch for the issue in Module::Install::DSL. We convert INIT blocks from that namespace into BEGIN blocks. I thought about the added restriction of "INIT blocks in an eval", but it didn't seem necessary. With that, I think in theory this PR should be "ok" to go and not cause havoc in the CPAN river.
This converts INIT {} blocks from the Module::Install::DSL namespace into BEGIN blocks. This works around the bug reported in GH Issue #16300 (hopefully; it is not fully tested yet), which in turn should allow us to close the bug in #2754. See also PR #20168 and Issue #20161, both of which are blocked by this.
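The rule in this commit message can be modelled in a few lines. This is a hypothetical Python sketch, not perl's C implementation, and it assumes "from the Module::Install::DSL namespace" means the current package is exactly `Module::Install::DSL` (sub-packages are not matched).

```python
# Package whose INIT blocks are compiled as BEGIN blocks (workaround for #16300).
WORKAROUND_PACKAGE = "Module::Install::DSL"


def effective_block_name(block_name: str, current_package: str) -> str:
    """Return the special-block type the compiler should actually use.

    Assumption: only an exact package-name match triggers the rewrite.
    """
    if block_name == "INIT" and current_package == WORKAROUND_PACKAGE:
        return "BEGIN"  # run as soon as it is compiled
    return block_name


print(effective_block_name("INIT", "Module::Install::DSL"))  # BEGIN
print(effective_block_name("INIT", "main"))                  # INIT
```

In Perl, a BEGIN block runs as soon as it has been compiled, whereas an INIT block runs just before the main program starts; the rewrite trades one for the other only in the affected package. As the comment above notes, also restricting it to INIT blocks inside an eval was considered and judged unnecessary.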
@khw note that "stop on first error" has for quite a while been advocated by @iabyn, if I remember correctly: we're not good at ensuring everything is restored to a valid state after an error, and the attempt to continue after errors has been the source of numerous security issues in the past. (That said, I think they were all rejected as security issues, because they needed code from an untrusted source to exploit, but they also cost us a lot of effort to analyse.) For me it is second nature to use the strategy @demerphq mentions, scanning a screed of garbage on the screen for the lowest-mentioned line number, but I'm always aware when doing so that a) I'm making up for perl's failings, and b) someone new to perl probably won't know about that strategy.
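The "lowest line number" triage described here is mechanical enough to sketch. This hypothetical Python helper assumes perl's usual "at FILE line N" suffix on diagnostics and a file name without whitespace; the function name and the sample messages (listed in arbitrary order) are made up for illustration.

```python
import re
from typing import Optional

# Perl diagnostics usually contain "at FILE line N"; assumes no spaces in FILE.
LOCATION = re.compile(r"\bat (\S+) line (\d+)\b")


def lowest_reported_line(stderr_text: str) -> Optional[int]:
    """Return the smallest line number mentioned in a batch of perl errors."""
    line_numbers = [int(m.group(2)) for m in LOCATION.finditer(stderr_text)]
    return min(line_numbers) if line_numbers else None


# Made-up cascade of messages; the file name and line numbers are invented.
sample = (
    'Global symbol "$y" requires explicit package name '
    '(did you forget to declare "my $y"?) at t.pl line 9.\n'
    'syntax error at t.pl line 4, near "){"\n'
    "Missing right curly or square bracket at t.pl line 12, at end of line\n"
)
print(lowest_reported_line(sample))  # 4
```

The patch under discussion aims to make this kind of triage unnecessary for syntax errors, since only the first one is reported.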
@leonerd you expressed some interest in this; it is now out of draft and ready for merge.
Squashed them down to a single patch now.
We try to keep parsing after many types of errors, up to a (current) maximum of 10 errors. Continuing after a semantic error (like undeclared variables) can be helpful, for instance by showing a set of common errors, but continuing after a syntax error isn't helpful most of the time, as the internal state of the parser can get confused and is not reliably restored between attempts. This can sometimes produce completely bizarre errors which just obscure the true error, and has resulted in security tickets being filed in the past.
This patch makes the parser stop after the first syntax error, while preserving the current behavior for other errors. An error is considered a syntax error if the error message from our internals is the literal text "syntax error". This may not be a complete list of true syntax errors; we can iterate on that in the future.
This fixes the segfaults reported in Issues #17397 and #16944, and likely fixes other "segfault due to compiler continuation after syntax error" bugs that we have on record, which have been a recurring issue over the years.
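A hypothetical Python model of the policy in this commit message: non-syntax errors accumulate up to the limit, while an error whose message is exactly "syntax error" ends the parse immediately. The class and exception names are invented; perl makes this check in C, and whether the limit triggers on the 10th error or after it is an assumption here.

```python
from typing import List

MAX_ERRORS = 10  # the "(current) maximum of 10 errors"


class CompilationAborted(Exception):
    """Raised when the parser should give up on the current compilation."""


class ErrorLog:
    def __init__(self) -> None:
        self.messages: List[str] = []

    def report(self, message: str) -> None:
        self.messages.append(message)
        # Classification by exact message text, as the commit message describes.
        if message == "syntax error":
            raise CompilationAborted("stopping at first syntax error")
        # Other (e.g. semantic) errors keep the parse going until the limit.
        if len(self.messages) >= MAX_ERRORS:
            raise CompilationAborted("too many errors")


log = ErrorLog()
log.report('Global symbol "$x" requires explicit package name')  # parse continues
try:
    log.report("syntax error")
except CompilationAborted as exc:
    print(exc, len(log.messages))  # stopping at first syntax error 2
```

Matching on the exact text keeps the change small, but any syntax-type error worded differently still lets the parse continue, which is why the list is described as possibly incomplete.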
This fixes Issue #16057: prototypes on BEGIN blocks cause segfaults. This patch warns about the use of either.
Overall seems a reasonable direction. I'm not hugely a fan of special-casing the exception message "syntax error"; but I gather this is just a first-step in the direction of having a better mechanism - such as a dedicated abort-the-parse function.
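For contrast, a hypothetical sketch of the "dedicated abort-the-parse function" idea: whether parsing stops depends on which function the parser calls, not on the wording of the message. `queue_error` and `abort_parse` are invented names, not existing perl internals.

```python
from typing import List


class ParseAborted(Exception):
    """Signals that parsing must stop; carries the final message."""


def queue_error(errors: List[str], message: str, limit: int = 10) -> None:
    """Record a recoverable error; stop only when the limit is reached."""
    errors.append(message)
    if len(errors) >= limit:
        raise ParseAborted("too many errors")


def abort_parse(errors: List[str], message: str) -> None:
    """Record an unrecoverable error and stop, whatever the message says."""
    errors.append(message)
    raise ParseAborted(message)


errors: List[str] = []
queue_error(errors, 'Global symbol "$x" requires explicit package name')
try:
    abort_parse(errors, "Unmatched right curly bracket")
except ParseAborted as exc:
    print(exc)        # Unmatched right curly bracket
print(len(errors))    # 2
```

With this shape, a new unrecoverable condition only has to call the abort function; nothing downstream needs to keep a list of magic strings in sync.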
Yes, indeed, that will come once I start getting some feedback. Please let me know if you encounter something that should stop the parse but doesn't. I will merge this!
On Sun, 28 Aug 2022 at 20:05, Karl Williamson wrote:
I'm somewhat leery of this.
I wrote a Lisp interpreter in Snobol for a school assignment. Snobol stops
at the first error. This was using punch cards and the turnaround time was
8-12 hours during the day, dropping to .5 hr at 3am. It was awful. I have
hated compilers that don't try to recover ever since.
I can see giving up the current section of code if there are several errors
in a few adjacent lines. But why not then skip ahead some looking for a
semi colon immediately followed by a new line, and continue trying from
there?
I think you misunderstand the patch. It doesn't stop on the first error; it stops on the first *syntax* error, which is where the compiler gets totally confused. For other errors the old behavior is preserved.
Personally I *strongly* disagree with continuing after a syntax error. Restarting the parse buries the true error in a storm of bogus, hallucinatory rubbish. I don't think trying to find the first semicolon is reliable; Perl syntax is too messed up. At my old job I had to do a lot of hand-holding of people new to perl, and one of the things they complained about was the rubbish errors. It's a bit embarrassing when you have to train people to ignore most of the messages and root through them to find the gem hidden in a pile of dung.
Why don't you try the branch and see what you think? So far I have found it quite nice. When I mess up and make a syntax error there is one error message, and it's always correct. I don't have to stare at 9 other errors which are figments of perl's imagination. In fact I found it a little confusing at first: I am so used to ignoring most of the error messages Perl produces that when it gave me a single error that was correct, it threw me a bit (in a good way): "Where did all the garbage go?"
I think until you have tried the branch you should reserve judgement; I don't think extrapolating from Snobol is reasonable. Perl is a very different language.
Anyway, if you have ideas on how to sanely restart the parser somewhere else, then go for it.
Yves
The PR also includes a fix for another segfault/assert (Issue #16057) related to prototypes on BEGIN blocks. It is in this PR because it originally looked related to the stop-on-first-error problem, and given that it *is* related to "stopping segfaults during compilation", it seems reasonable to save some work and keep it in this PR.
Sorry for the weird wrapping of this ticket.