Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop parsing on first syntax error. #20168

Merged
merged 2 commits into from Sep 9, 2022
Merged

Stop parsing on first syntax error. #20168

merged 2 commits into from Sep 9, 2022

Conversation

demerphq
Copy link
Collaborator

@demerphq demerphq commented Aug 26, 2022

We try to keep parsing after many types of errors, up to a (current)
maximum of 10 errors. Continuing after a semantic error (like
undeclared variables) can be helpful, for instance showing a set of
common errors, but continuing after a syntax error isn't helpful
most of the time as the internal state of the parser can get confused
and is not reliably restored in between attempts. This can produce
sometimes completely bizarre errors which just obscure the true error,
and has resulted in security tickets being filed in the past.

This patch makes the parser stop after the first syntax error, while
preserving the current behavior for other errors. An error is considered
a syntax error if the error message from our internals is the literal
text "syntax error". This may not be a complete list of true syntax
errors, we can iterate on that in the future.

This fixes the segfaults reported in Issue #17397, and #16944 and
likely fixes other "segfault due to compiler continuation after syntax
error" bugs that we have on record, which has been a recurring issue
over the years.

The PR also includes a fix to another segfault/assert (Issue #16057)
related prototypes on BEGIN blocks, which is in this PR because it
originally looked related to the stop on first error problem, and given it
/is/ related to "stopping segfaults during compilation" it seems reasonable
to save some work and keep it in this PR.

Sorry for the weird wrapping of this ticket.

@demerphq demerphq marked this pull request as draft August 26, 2022 17:20
pp_ctl.c Outdated Show resolved Hide resolved
t/comp/retainedlines.t Outdated Show resolved Hide resolved
t/comp/retainedlines.t Outdated Show resolved Hide resolved
t/comp/retainedlines.t Outdated Show resolved Hide resolved
t/comp/retainedlines.t Outdated Show resolved Hide resolved
@demerphq demerphq force-pushed the yves/stop_first_error branch 2 times, most recently from dcfc218 to 1502965 Compare August 27, 2022 08:51
@khwilliamson
Copy link
Contributor

I'm somewhat leery of this.

I wrote a Lisp interpreter in Snobol for a school assignment. Snobol stops at the first error. This was using punch cards and the turnaround time was 8-12 hours during the day, dropping to .5 hr at 3am. It was awful. I have hated compilers that don't try to recover ever since.

I can see giving up the current section of code if there are several errors in a few adjacent lines. But why not then skip ahead some looking for a semi colon immediately followed by a new line, and continue trying from there?

@demerphq demerphq force-pushed the yves/stop_first_error branch 3 times, most recently from 1e18ad6 to d34aa2c Compare August 29, 2022 12:12
@demerphq
Copy link
Collaborator Author

Hey @khwilliamson can I ask you to try it before you pass judgement? My experience is that the storm of error messages from perl getting confused is really unhelpful, and results in old-times teaching new-comers to "go through it and look for the lowest line number" and such things. I have been using it for some hacking and i have been quite pleased with it. At first it was a little jarring, I'm used to having to sift through a heap of ridiculous and meaningless errors to find the one that is relevant and with the patch it "throws your eyeballs" a bit not having all that crap there, but I got quickly used to it and when i go back to the old rules it throws me the other way now. :-)

If you have thoughts how to safely restart parsing at a semicolon then i think you could do a follow up patch, but given the vagaries of parsing perl IMO that might not be quite as useful as you think: consider code like: if if (/;/) { ... }. It seems like something that is pretty darn hard to do well in perl, so we shouldnt try and stop giving users bogus errors.

Also, I am happy to make this a configurable option with whatever default we want. If people like the storm of hallucinatory errors that perl produces from common syntax errors then they are welcome to build with them, so long as I am welcome to build without them. :-)

@demerphq
Copy link
Collaborator Author

Also note that this patch includes a revert of a patch from #16300 which caused breakage with Module::Install. We need to decide what to do about that.

@demerphq
Copy link
Collaborator Author

I have added a workaround patch for the issue in Module::Install::DSL. We convert INIT blocks from that namespace to be BEGIN blocks. I thought about the added restriction of "INIT blocks in an eval", but it didnt seem necessary. With that i think in theory this PR should be "ok" to go and not cause havok in the CPAN river.

demerphq added a commit that referenced this pull request Aug 29, 2022
This converts INIT {} blocks from the Module::Install::DSL
namespace into BEGIN blocks. This works around the bug reported in
GH Issue #16300. (Hopefully, not fully tested yet.) Which in turn
should allow us to close the bug in #2754.

See also PR: #20168 and Issue: #20161 both of which are blocked by
this.
@hvds
Copy link
Contributor

hvds commented Aug 29, 2022

I'm somewhat leery of this.
[...]
I can see giving up the current section of code if there are several errors in a few adjacent lines. But why not then skip ahead some looking for a semi colon immediately followed by a new line, and continue trying from there?

@khw note that "stop on first error" has for quite a while been advocated by @iabyn, if I remember correctly - we're not good at ensuring everything is restored to a valid state after an error, and the attempt to continue after errors has been the source of numerous security issues in the past. (That said, I think they were all rejected as security issues, because they needed code from an untrusted source to exploit - but they also cost us a lot of effort to analyse.)

For me it is second nature to use the strategy @demerphq mentions - to scan a screed of garbage on the screen for the lowest-mentioned line number - but I'm always aware when doing so that a) I'm making up for perl's failings in doing so, and b) that someone new to perl probably won't know about that strategy.

@demerphq
Copy link
Collaborator Author

@khw note that "stop on first error" has for quite a while been advocated by @iabyn, if I remember correctly

Thanks @hvds - he actually pointed me at the code that started me down this rabbit hole. This task has lead to a bunch of bugs and old issues being identified and fixed.

@demerphq
Copy link
Collaborator Author

FWIW #20181 and #20182 need to be applied in that order, and then the code from this can be applied on top. Ill rebase as things get merged.

@demerphq demerphq force-pushed the yves/stop_first_error branch 3 times, most recently from d07c6aa to a9d2da5 Compare August 29, 2022 20:12
demerphq added a commit that referenced this pull request Sep 3, 2022
This converts INIT {} blocks from the Module::Install::DSL
namespace into BEGIN blocks. This works around the bug reported in
GH Issue #16300. (Hopefully, not fully tested yet.) Which in turn
should allow us to close the bug in #2754.

See also PR: #20168 and Issue: #20161 both of which are blocked by
this.
@demerphq demerphq changed the title DRAFT: Stop parsing on first syntax error. Stop parsing on first syntax error. Sep 5, 2022
@demerphq demerphq marked this pull request as ready for review September 5, 2022 06:14
@demerphq
Copy link
Collaborator Author

demerphq commented Sep 5, 2022

@leonerd you expressed some interest in this, it is now out of draft and ready for merge.

toke.c Outdated Show resolved Hide resolved
@demerphq
Copy link
Collaborator Author

demerphq commented Sep 6, 2022

squashed them down to a single patch now.

We try to keep parsing after many types of errors, up to a (current)
maximum of 10 errors. Continuing after a semantic error (like
undeclared variables) can be helpful, for instance showing a set of
common errors, but continuing after a syntax error isn't helpful
most of the time as the internal state of the parser can get confused
and is not reliably restored in between attempts. This can produce
sometimes completely bizarre errors which just obscure the true error,
and has resulted in security tickets being filed in the past.

This patch makes the parser stop after the first syntax error, while
preserving the current behavior for other errors. An error is considered
a syntax error if the error message from our internals is the literal
text "syntax error". This may not be a complete list of true syntax
errors, we can iterate on that in the future.

This fixes the segfaults reported in Issue #17397, and #16944 and
likely fixes other "segfault due to compiler continuation after syntax
error" bugs that we have on record, which has been a recurring issue
over the years.
This fixes Issue #16057, prototypes on BEGIN blocks cause
segfaults. This patch warns about the use of either.
Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall seems a reasonable direction. I'm not hugely a fan of special-casing the exception message "syntax error"; but I gather this is just a first-step in the direction of having a better mechanism - such as a dedicated abort-the-parse function.

@demerphq
Copy link
Collaborator Author

demerphq commented Sep 9, 2022

but I gather this is just a first-step in the direction of having a better mechanism - such as a dedicated abort-the-parse function.

Yes, indeed, that will come once I start getting some feedback. Please let me know if you encounter something that should stop the parse but doesnt.

I will merge this!

@demerphq demerphq merged commit e5d2a2e into blead Sep 9, 2022
@demerphq demerphq deleted the yves/stop_first_error branch September 9, 2022 16:48
@demerphq
Copy link
Collaborator Author

demerphq commented Oct 11, 2022 via email

scottchiefbaker pushed a commit to scottchiefbaker/perl5 that referenced this pull request Nov 3, 2022
This converts INIT {} blocks from the Module::Install::DSL
namespace into BEGIN blocks. This works around the bug reported in
GH Issue Perl#16300. (Hopefully, not fully tested yet.) Which in turn
should allow us to close the bug in Perl#2754.

See also PR: Perl#20168 and Issue: Perl#20161 both of which are blocked by
this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants