[move-compiler] Parser resilience #16673

awelc · 2024-03-14T22:52:20Z

Description

This PR makes the compiler resilient to parser errors. The high-level idea is to (eventually) always return a correctly (though potentially only partially) parsed AST node at every level. For example, fun foo would be parsed as a correct function even though it has neither a parameter list nor a body.
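To illustrate the idea, here is a minimal sketch of a parser that records a diagnostic for each missing piece but still returns a node. The `PartialFunction` type, the `tokenize` helper, and the diagnostic strings are all hypothetical and much simpler than the real move-compiler AST and parser.

```rust
/// Hypothetical, simplified AST node (not the actual move-compiler type).
/// Missing pieces are recorded as `None`/`false` instead of aborting the parse.
#[derive(Debug, PartialEq)]
pub struct PartialFunction {
    pub name: Option<String>,
    pub params: Option<Vec<String>>,
    pub has_body: bool,
}

/// Splits on whitespace and treats `(`, `)`, `{`, `}`, `,` as separate tokens.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in src.chars() {
        if c.is_whitespace() || "(){},".contains(c) {
            if !word.is_empty() {
                tokens.push(std::mem::take(&mut word));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        } else {
            word.push(c);
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}

/// Always returns a node; every problem becomes a diagnostic string.
pub fn parse_function(src: &str) -> (PartialFunction, Vec<String>) {
    let tokens = tokenize(src);
    let mut pos = 0;
    let mut diags = Vec::new();
    let mut func = PartialFunction { name: None, params: None, has_body: false };

    if tokens.get(pos).map(String::as_str) == Some("fun") {
        pos += 1;
    } else {
        diags.push("expected 'fun'".to_string());
    }
    match tokens.get(pos) {
        Some(t) if t.chars().all(|c| c.is_alphanumeric() || c == '_') => {
            func.name = Some(t.clone());
            pos += 1;
        }
        _ => diags.push("expected function name".to_string()),
    }
    if tokens.get(pos).map(String::as_str) == Some("(") {
        pos += 1;
        let mut params = Vec::new();
        while let Some(t) = tokens.get(pos) {
            pos += 1;
            match t.as_str() {
                ")" => break,
                "," => {}
                name => params.push(name.to_string()),
            }
        }
        func.params = Some(params);
    } else {
        diags.push("expected '(' to start the parameter list".to_string());
    }
    if tokens.get(pos).map(String::as_str) == Some("{") {
        func.has_body = true;
    } else {
        diags.push("expected '{' to start the function body".to_string());
    }
    (func, diags)
}
```

With this sketch, `parse_function("fun foo")` yields a node named `foo` with no parameter list and no body, plus two diagnostics, so later stages still have something to work with.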

Test Plan

Tests need to be adjusted to check that everything works.

Type of Change (Check all that apply)

protocol change
user-visible impact
breaking change for client SDKs
breaking change for FNs (FN binary must upgrade)
breaking change for validators or node operators (must upgrade binaries)
breaking change for on-chain data layout
necessitate either a data wipe or data migration

Release notes

Developers might see more compiler diagnostics, because selected parsing errors no longer stop compilation: the compiler now reaches later compilation stages, where additional diagnostics may be generated.

cgswords · 2024-03-17T21:38:00Z

external-crates/move/crates/move-compiler/tests/move_2024/parser/invalid_macro_bang.move

@@ -1,4 +1,4 @@
 module a::m {
    macro fun foo<$T>($x: $T): $T { $x }
-    fun bar(): u64 { foo<u64>!(42) }


This sort of change seems rather suspect to me. What happened to cause this?

There is an error in the body of the function which causes the returned value to be incorrect. Previously, the parser error caused compilation to stop, but now an additional typing error would be reported (wrong return type). I made this type of change in other places as well to make it clearer which compilation error we are testing for.

cgswords · 2024-03-17T21:40:01Z

...rnal-crates/move/crates/move-compiler/tests/move_2024/parser/labeled_lambda_body_invalid.exp

+   │                                              ^
+   │                                              │
+   │                                              Unexpected '+'
+   │                                              Expected ','


A nit on these: we can pretty precisely tell the user which things we expected: anything in the start set plus the end token. This gets too large sometimes to bother reporting (the exp start set, for example), but here we could say "Expected , or )". I added a macro format_oxford_list under shared/mod.rs that I wrote originally precisely for these sorts of errors.

Bump on this, seems helpful to say , or )
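As a rough illustration of the suggested message, the sketch below joins the expected tokens into an Oxford-style list. The `oxford_list` and `unexpected_token_message` functions are hypothetical stand-ins written for this example; the signature of the actual `format_oxford_list` macro in shared/mod.rs is not shown in this thread and may differ.

```rust
/// Hypothetical helper: joins items as "a", "a or b", or "a, b, or c".
pub fn oxford_list(items: &[&str], conjunction: &str) -> String {
    match items {
        [] => String::new(),
        [only] => only.to_string(),
        [first, second] => format!("{first} {conjunction} {second}"),
        [init @ .., last] => format!("{}, {conjunction} {last}", init.join(", ")),
    }
}

/// Hypothetical message builder: quotes each expected token and lists them all.
pub fn unexpected_token_message(found: &str, expected: &[&str]) -> String {
    let quoted: Vec<String> = expected.iter().map(|t| format!("'{t}'")).collect();
    let refs: Vec<&str> = quoted.iter().map(String::as_str).collect();
    format!("Unexpected '{found}'. Expected {}", oxford_list(&refs, "or"))
}
```

For the case above, `unexpected_token_message("+", &[",", ")"])` would produce a message ending in `Expected ',' or ')'`.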

cgswords · 2024-03-17T21:41:41Z

...es/move-compiler/tests/move_2024/parser/macro_identifier_invalid_no_following_characters.exp

+error[E01002]: unexpected token
+  ┌─ tests/move_2024/parser/macro_identifier_invalid_no_following_characters.move:3:1
+  │
+3 │ 


We need to suppress these errors when we find EOF so that we don't get these weird ghost errors

There is not much we can do here (and in some other places) without an additional machinery of some kind as EOF is the only part of the stop set that can be a match here.

One idea is to ignore "unexpected EOF" if another error has already been recorded. We could do this (as already suggested in the other comment by @cgswords) by simply checking whether there are already any diags in the compilation environment (should we suppress other errors the same way?). What do you think, @cgswords and @tnowacki?
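A minimal sketch of that idea follows, assuming a simplified diagnostics collection. `DiagKind`, `Diag`, and `Diagnostics` are hypothetical types for illustration only, and this is the proposal as discussed here, not necessarily what was merged.

```rust
/// Hypothetical diagnostic kinds (the real compiler has many more).
#[derive(Debug, Clone, PartialEq)]
pub enum DiagKind {
    UnexpectedToken,
    UnexpectedEof,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Diag {
    pub kind: DiagKind,
    pub message: String,
}

/// Hypothetical stand-in for the diagnostics held by the compilation environment.
#[derive(Default)]
pub struct Diagnostics {
    diags: Vec<Diag>,
}

impl Diagnostics {
    /// Records a diagnostic, but drops "unexpected EOF" when an earlier
    /// error has already been recorded (it is likely a follow-on error).
    pub fn add(&mut self, diag: Diag) {
        if diag.kind == DiagKind::UnexpectedEof && !self.diags.is_empty() {
            return;
        }
        self.diags.push(diag);
    }

    pub fn all(&self) -> &[Diag] {
        &self.diags
    }
}
```

An EOF error that is the only problem in the file is still reported; only EOF errors that follow another diagnostic are dropped.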

cgswords · 2024-03-17T21:50:44Z

...rnal-crates/move/crates/move-compiler/tests/move_2024/parser/labeled_lambda_body_invalid.exp

+  ┌─ tests/move_2024/parser/labeled_lambda_body_invalid.move:7:9
+  │
+7 │         call!(|x| -> u64 'a: 0); // parsing error needs a block
+  │         ^^^^^^^^^^^^^^^^^^^^^^^
+  │         │    │
+  │         │    Found 0 argument(s) here
+  │         Invalid call of 'a::m::call'. The call expected 1 argument(s) but got 0


Would it be worth it to build some machinery to avoid reporting these sorts of "double errors"? We already have an error, since we messed up parsing. Is it useful to report this second one, too?

If we decide not to, here are a few ideas for how to avoid it:

(1) We keep track of bad-parse locations and, when reporting errors, compare the errors to see if it's the same problem (specifically, look through the env diagnostics to see if we had a parsing error, and decide not to report a naming error at the same location). This could prove to be rather finicky.

(2) We could mark the arguments as having errors (vec[E::UnexpectedError]) and check for that sort of thing when deciding to report errors later:

if args.len() != fn_args.len() && !args.iter().any(|arg| arg.is_error) { /* report arity error */ }

This would require hand-addressing these in a bunch of spots in the compiler.

(3) When reporting diagnostics, only report the first one we receive for each location. This would actually still produce this error, but not the one below it. That might be an improvement.
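Option (3) above could be sketched roughly as follows. The `Loc` and `Diag` types are hypothetical simplifications, and the sketch assumes diagnostics arrive in the order they were generated.

```rust
use std::collections::HashSet;

/// Hypothetical source location: file name plus byte range.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Loc {
    pub file: String,
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Diag {
    pub loc: Loc,
    pub message: String,
}

/// Keeps only the first diagnostic reported for each exact location,
/// preserving the original order of the survivors.
pub fn first_per_location(diags: Vec<Diag>) -> Vec<Diag> {
    let mut seen = HashSet::new();
    diags
        .into_iter()
        // `insert` returns false when the location was already seen.
        .filter(|d| seen.insert(d.loc.clone()))
        .collect()
}
```

Because the comparison is on exact locations, a follow-on error that covers a slightly different span would still be reported, which matches the caveat in (3).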

Alternatively, maybe we should just live with all these errors. We have, however, gotten some feedback that errors can be too noisy to find the root cause and I'm afraid this is going to make the problem worse.

This is possibly worth discussing as a group (cc @tnowacki )

I have a general idea for avoiding a lot of double errors. It mostly involves changing UnresolvedError to UnresolvedError(Vec<Exp>). This lets us drop expressions less often, and also lets us know when we might need to suppress errors (since we can wrap things as errors more often without fear of dropping them).

In this example, we could mark it possibly as call!(UnresolvedError(vec![])) conceptually, which gives us an easy spot to suppress arity errors and the like

I don't think we even need to go that far, we just need to do a bit of error tuning. For this case, we could see if any of the call args contains an error and avoid reporting arity errors in that case.
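The error-tuning approach could look something like the sketch below. The `Exp` enum here is a hypothetical two-variant stand-in for the compiler's expression type, and `check_arity` is an illustrative function, not an existing API.

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Exp {
    Value(u64),
    /// Wraps whatever was salvaged from an expression that failed to parse.
    UnresolvedError(Vec<Exp>),
}

/// Returns an arity error message, unless an argument already carries an
/// error, in which case the mismatch is assumed to be a follow-on problem.
pub fn check_arity(fn_name: &str, expected: usize, args: &[Exp]) -> Option<String> {
    let has_error = args.iter().any(|a| matches!(a, Exp::UnresolvedError(_)));
    if args.len() != expected && !has_error {
        Some(format!(
            "Invalid call of '{fn_name}'. The call expected {expected} argument(s) but got {}",
            args.len()
        ))
    } else {
        None
    }
}
```

Note that this only helps if the failed argument survives as an `UnresolvedError` in the argument list; if the parser drops it entirely, the call still appears to have zero arguments and the arity error is reported.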

cgswords · 2024-03-17T21:54:56Z

...nal-crates/move/crates/move-compiler/tests/move_2024/parser/mut_field_pun_invalid_assign.exp

+6 │         S { mut f } = s1;
+  │         ^^^^^^^^^^^ Missing assignment for field 'f' in 'a::m::S'


IMO this error should not be reported. Why aren't we grabbing f as a field?

cgswords · 2024-03-17T21:55:16Z

...al-crates/move/crates/move-compiler/tests/move_2024/parser/mut_field_pun_invalid_assign.move

+    public fun foo(_s: S) {
        let f = 0;
        S { mut f } = s1;


Suggested change

-    public fun foo(_s: S) {
-        let f = 0;
-        S { mut f } = s1;
+    public fun foo(s: S) {
+        let f = 0;
+        S { mut f } = s;

tnowacki · 2024-04-03T21:20:24Z

external-crates/move/crates/move-compiler/src/parser/syntax.rs

@@ -1439,6 +1439,7 @@ fn parse_sequence(context: &mut Context) -> Result<Sequence, Box<Diagnostic>> {
    let mut seq: Vec<SequenceItem> = vec![];
    let mut last_semicolon_loc = None;
    let mut eopt = None;
+    let mut parsing_error = false;


Understood here but it feels a bit strange. For the errors we had in question, I would have imagined parse_sequence_item to have returned an UnresolvedError?

I would have imagined parse_sequence_item to have returned an UnresolvedError?

This would indeed be preferred, but I'd rather not go down this rabbit hole just yet. The reason it's a bit complicated is that on a failure in parsing SequenceItem, we need to fast-forward the parser to the end of the item (so that parsing at the Sequence level does not produce a bunch of weird errors when trying to restart parsing). The stop set for a sequence item, set at the Sequence level, would include ; and } but we can't simply stop at } if we are parsing something like this:

fun test00(): S<u64> { 0x42<u64>::m::S { u: 0 } }

If we do, we will stop at } finishing pack rather than at the } finishing sequence. What we'd need to do is to count opening and closing braces, but at the level we need to do it, we technically don't know that it should be done (as we only have a stop set as the source of information).

Currently we handle this by returning an error from SequenceItem parsing and doing the correct fast-forwarding at the Sequence level, where we know how a SequenceItem can end and what tokens to look for.
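The brace-counting fast-forward described above might be sketched like this. The `Tok` enum and `skip_to_item_end` function are hypothetical; the real lexer and recovery code are more involved.

```rust
/// Hypothetical token kinds; everything that is not a brace or `;` is `Other`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    LBrace,
    RBrace,
    Semicolon,
    Other,
}

/// Starting inside a sequence, returns the index of the token that ends the
/// current item: a `;` at nesting depth 0, or the `}` that closes the
/// enclosing sequence. Returns `tokens.len()` if neither is found.
pub fn skip_to_item_end(tokens: &[Tok], start: usize) -> usize {
    let mut depth = 0usize;
    let mut pos = start;
    while pos < tokens.len() {
        match tokens[pos] {
            Tok::LBrace => depth += 1,
            // An unmatched `}` closes the sequence itself.
            Tok::RBrace if depth == 0 => return pos,
            // A matched `}` closes a nested block such as a pack.
            Tok::RBrace => depth -= 1,
            Tok::Semicolon if depth == 0 => return pos,
            _ => {}
        }
        pos += 1;
    }
    pos
}
```

For the body of `test00` above, the tokens after the opening brace are roughly `Other, LBrace, Other, Other, RBrace, RBrace`; the sketch skips the first `}` (which ends the pack) and stops at the second one (index 5), which ends the sequence.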

Why can't we just have match parse_sequence_item(context) { ... | Err(diag) => { seq.push(UnresolvedError); ...? Is more my point

Doh... Now that I tried it, though, I realized that it wouldn't quite work. The reason is that, in order to suppress the typing error for a non-parsable function body, there can only be one sequence element and this element must be UnresolvedError. Unfortunately, if we place UnresolvedError in seq here and the last no-semicolon statement is empty, expansion's translation pass will insert another element into the sequence for the no-semicolon statement.

I can still insert UnresolvedError into the sequence and at the end move it to no-semicolon placeholder if the placeholder is empty and there is only one list element, but this arguably wouldn't be much prettier...

Added a comment along those lines, but open to additional suggestions.

Actually, on second thought, perhaps translation to expansion should be changed to not produce a final unit-typed expression if the only sequence member is unresolved. I made a change along those lines, please let me know what you think.
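A rough sketch of that change, under the assumption that expansion appends an implicit unit expression when a sequence has no trailing expression. The `Item` enum and `expand_sequence` function are hypothetical and only model the decision, not the real expansion pass.

```rust
/// Hypothetical sequence items after expansion.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Exp(String),
    Unit,
    UnresolvedError,
}

/// Appends the trailing expression, or an implicit unit when there is none,
/// except when the sequence consists solely of an unresolved error.
pub fn expand_sequence(mut items: Vec<Item>, trailing: Option<Item>) -> Vec<Item> {
    match trailing {
        Some(exp) => items.push(exp),
        None => {
            let only_error =
                items.len() == 1 && matches!(items[0], Item::UnresolvedError);
            if !only_error {
                items.push(Item::Unit);
            }
        }
    }
    items
}
```

Leaving the lone `UnresolvedError` as the only element is what would let later passes recognize a non-parsable body and skip the return-type error.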

tnowacki

Things are looking good! Approving if we need to get it in quickly, but I think @cgswords has been following this much more thoroughly than me. So I think it would benefit getting his eyes before landing

Dang, some of these matching error changes are a bit unfortunate. I wonder if we want to add => to the stop set for patterns. I'm also still curious if we can get rid of EOFs in some cases by checking if we are standing at the } that matches the module start. That said, stamping this so that it doesn't stall IDE work. We can come back and tune those later in smaller PRs.

awelc requested a review from cgswords March 14, 2024 22:52

awelc requested a review from tnowacki March 14, 2024 22:52

awelc self-assigned this Mar 14, 2024

awelc force-pushed the aw/parser-resilience branch from 10fedc7 to 77b4a74 Compare March 16, 2024 15:26

cgswords reviewed Mar 17, 2024

cgswords reviewed Mar 17, 2024

awelc force-pushed the aw/parser-resilience branch from b455200 to 45da52e Compare April 3, 2024 21:10

tnowacki reviewed Apr 3, 2024

awelc force-pushed the aw/parser-resilience branch from 45da52e to 9cc054a Compare April 3, 2024 22:44

awelc requested a review from tnowacki April 3, 2024 23:32

awelc force-pushed the aw/parser-resilience branch from 12930f9 to c8dacc3 Compare April 7, 2024 01:49

awelc force-pushed the aw/parser-resilience branch from c8dacc3 to 796a5b5 Compare April 8, 2024 22:34

tnowacki approved these changes Apr 9, 2024

awelc added 5 commits April 9, 2024 11:30

[move-compiler] Parser resilience

2a33167

Removed EOF diag filter

4445969

Regenerated test output

c634488

Do not insert unit expr into sequence in case of error

b93a661

Post-rebase fixes and test regenaration

0899e7e

awelc force-pushed the aw/parser-resilience branch from d1d3193 to 0899e7e Compare April 9, 2024 18:31

awelc merged commit b80a6ae into main Apr 9, 2024
43 of 44 checks passed

awelc deleted the aw/parser-resilience branch April 9, 2024 20:41

This was referenced Apr 10, 2024

[move-compiler] Added dot chain parsing resilience #17106

Merged

[move-ide] Added support for init function auto-completion #16140

Closed
