Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move-compiler] Parser resilience #16673

Merged
merged 5 commits into from
Apr 9, 2024
Merged

[move-compiler] Parser resilience #16673

merged 5 commits into from
Apr 9, 2024

Conversation

awelc
Copy link
Contributor

@awelc awelc commented Mar 14, 2024

Description

This PR adds resilience to parser errors to the compiler. The high-level idea is to (eventually) always return a correctly (though potentially only partially) parsed AST node at every level. For example, fun foo would be parsed as a correct function even though it does not have parameter list or a body.

Test Plan

Tests need to be adjusted to check that everything works.


If your changes are not user-facing and do not break anything, you can skip the following section. Otherwise, please briefly describe what has changed under the Release Notes section.

Type of Change (Check all that apply)

  • protocol change
  • user-visible impact
  • breaking change for a client SDKs
  • breaking change for FNs (FN binary must upgrade)
  • breaking change for validators or node operators (must upgrade binaries)
  • breaking change for on-chain data layout
  • necessitate either a data wipe or data migration

Release notes

Developers might see more compiler diagnostics as selected parsing errors no longer prevent compilation and diagnostics from the compiler reaching later compilation stages where additional diagnostics may be generated.

@awelc awelc requested a review from cgswords March 14, 2024 22:52
Copy link

vercel bot commented Mar 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
mysten-ui ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 9, 2024 6:32pm
sui-core ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 9, 2024 6:32pm
4 Ignored Deployments
Name Status Preview Comments Updated (UTC)
explorer ⬜️ Ignored (Inspect) Visit Preview Apr 9, 2024 6:32pm
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Apr 9, 2024 6:32pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Apr 9, 2024 6:32pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Apr 9, 2024 6:32pm

@awelc awelc requested a review from tnowacki March 14, 2024 22:52
@awelc awelc self-assigned this Mar 14, 2024
@@ -1,4 +1,4 @@
module a::m {
macro fun foo<$T>($x: $T): $T { $x }
fun bar(): u64 { foo<u64>!(42) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sor tof change seems rather suspect to me. What happened to cause this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an error in the body of the function which causes the returned value to be incorrect. Previously, the parser error caused compilation to stop but now additional typing error would be reported (wrong return type). I made this type of change in other places as well to make it more clear which compilation error we are testing for.

│ ^
│ │
│ Unexpected '+'
│ Expected ','
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit on these: we can pretty precisely tell the user which things we expected: anything in the start set plus the end token. This gets too large sometimes to bother reporting (the exp start set, for example), but here we could say "Expected , or )". I added a macro format_oxford_list under shared/mod.rs that I wrote originally precisely for these sorts of errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump on this, seems helpful to say , or )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

error[E01002]: unexpected token
┌─ tests/move_2024/parser/macro_identifier_invalid_no_following_characters.move:3:1
3 │
Copy link
Contributor

@cgswords cgswords Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to suppress these errors when we find EOF so that we don't get these weird ghost errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not much we can do here (and in some other places) without an additional machinery of some kind as EOF is the only part of the stop set that can be a match here.

One idea is ignore "unexpected EOF if another error has already been recorded. We could do this (as already suggested in the other comment by @cgswords) simply checking if there are any diags in the compilation environment already (should we suppress other errors the same way?). What do you think, @cgswords and @tnowacki ?

Comment on lines +2 to +8
┌─ tests/move_2024/parser/labeled_lambda_body_invalid.move:7:9
7 │ call!(|x| -> u64 'a: 0); // parsing error needs a block
│ ^^^^^^^^^^^^^^^^^^^^^^^
│ │ │
│ │ Found 0 argument(s) here
│ Invalid call of 'a::m::call'. The call expected 1 argument(s) but got 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth it to build some machinery to avoid reporting these sorts of "double errors"? We already have an error, since we messed up parsing. Is it useful to report this second one, too?

If we decide not to, a few idea of how to avoid it:

(1) We keep track of bad-parse locations and, when reporting errors, compare the errors to see if it's the same problem (specifically look through the env diagnostics and see if we had a parsing error, and decide not to reporting a naming error at the same location). This could prove to be rather finnicky.

(2) We could mark the arguments as having errors (vec[E::UnexpectedError]) and check for that sort of thing when deciding to report errors later:

if args.length() != fn_args.length() && args.iter().any(|arg| arg.is_error) {
  .. report arity error
}

This would require hand-addressing these in a bunch of spots in the compiler.

(3) When reporting diagnostics, only report the first one we receive for each location. This would actually still produce this error, but not the one below it. That might be an improvement.

Alternatively, maybe we should just live with all these errors. We have, however, gotten some feedback that errors can be too noisy to find the root cause and I'm afraid this is going to make the problem worse.

This is possibly worth discussing as a group (cc @tnowacki )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an idea generally of avoiding a lot of double errors. And it mostly involves changing UnresovledError to UnresolvedError(Vec<Exp>). This lets us drop expressions less often, and also let's us know when we might need to suppress errors (since we can wrap things as Errors more often without fearing dropping them)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, we could mark it possibly as call!(UnresolvedError(vec![])) conceptually, which gives us an easy spot to suppress arity errors and the like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we even need to go that far, we just need to do a bit of error tuning. For this case, we could see if any of the call args contains an error and avoid reporting arity errors in that case.

Comment on lines 12 to 5
6 │ S { mut f } = s1;
│ ^^^^^^^^^^^ Missing assignment for field 'f' in 'a::m::S'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this error should not be reported. Why aren't we grabbing f as a field?

Comment on lines 4 to 6
public fun foo(_s: S) {
let f = 0;
S { mut f } = s1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public fun foo(_s: S) {
let f = 0;
S { mut f } = s1;
public fun foo(s: S) {
let f = 0;
S { mut f } = s;

@@ -1439,6 +1439,7 @@ fn parse_sequence(context: &mut Context) -> Result<Sequence, Box<Diagnostic>> {
let mut seq: Vec<SequenceItem> = vec![];
let mut last_semicolon_loc = None;
let mut eopt = None;
let mut parsing_error = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood here but it feels a bit strange. For the errors we had in question, I would have imagined parse_sequence_item to have returned an UnresolvedError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have imagined parse_sequence_item to have returned an UnresolvedError?

This would indeed be preferred, but I'd rather not go into this rabbit hole just yet. The reason it's a bit complicated is that on failure in parsing SequenceItem, we need to fast forward the parser to the end of the item (so that parsing at Sequence level does not produce a bunch of weird errors trying to restart parsing). The stop set for sequence item set at the Sequence level would include ; and } but we can't simply stop at } if we are parsing something like this:

    fun test00(): S<u64> {
        0x42<u64>::m::S { u: 0 }
    }

If we do, we will stop at } finishing pack rather than at the } finishing sequence. What we'd need to do is to count opening and closing braces, but at the level we need to do it, we technically don't know that it should be done (as we only have a stop set as the source of information).

Currently we handle this by having an error returned from SequenceItem parsing and doing the correct fast-forwarding at SequenceLevel where we know how SequenceItem can end and what tokens to look for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just have match parse_sequence_item(context) { ... | Err(diag) => { seq.push(UnresolvedError); ...? Is more my point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh... Now that I tried it, though, I realized that it wouldn't quite work. The reason for it is that in order to suppress the typing error for non-parsable function body, there can only be one sequence element and this element must be UnresolvedError. Unfortunately, if we place UnresolvedError in seq here and the last no-semicolon statement is empty, expansion's translation pass will insert another element to the sequence for the non-semicolon statement.

I can still insert UnresolvedError into the sequence and at the end move it to no-semicolon placeholder if the placeholder is empty and there is only one list element, but this arguably wouldn't be much prettier...

Added a comment along those lines, but open to additional suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on the second thought, perhaps translation to expansion should be changed to not produce a final unit-typed expression if the only sequence member is unresolved. I made a change along those lines, please let me know what you think.

Copy link
Contributor

@tnowacki tnowacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things are looking good! Approving if we need to get it in quickly, but I think @cgswords has been following this much more thoroughly than me. So I think it would benefit getting his eyes before landing

@cgswords cgswords dismissed their stale review April 9, 2024 20:12

Dang, some of these matching error changes are a bit unfortunate -- i wonder if we want to add => to the stop set for patterns. I'm also still curious if we can get rid of EOFs in some cases by checking if we are standing at the } that matches the module start. That said, stamping this so that it doesn't stall IDE work. We can come back and tune those later in smaller PRs.

@awelc awelc merged commit b80a6ae into main Apr 9, 2024
43 of 44 checks passed
@awelc awelc deleted the aw/parser-resilience branch April 9, 2024 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants