Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move-compiler] Added dot chain parsing resilience #17106

Merged
merged 8 commits into from
Apr 20, 2024

Conversation

awelc
Copy link
Contributor

@awelc awelc commented Apr 10, 2024

Description

This PR adds parsing resilience to dot chains (e.g., some_struct.some_field). It's a pre-requisite to adding auto-completion for struct fields and struct functions (aka methods) to the IDE.

Implementation-wise, I was also trying making parse_identifier parser-resilient (have it always return a value instead of Result). This does not quite work, though, since in parse_dot_or_index_chain we need to know that if an identifier fails to parse, we should stop parsing the chain itself. If the parse_identifier never returns an error, we would simply continue parsing the chain and generate weird errors.

Test plan

An additional compiler test has been added. Also a symbolicator test has been added to verify that prefixes of partially parsed dot chains are parsed correctly.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI: Dot chain compiler diagnostics are more prevalent (for example, some_struct.some_field) as parsing errors no longer prevent compilation. Consequently, the compiler can reach later compilation stages where it might generate additional diagnostics.
  • Rust SDK:

@awelc awelc requested review from tnowacki and cgswords April 10, 2024 02:13
Copy link

vercel bot commented Apr 10, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-core ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 20, 2024 3:12am
3 Ignored Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Apr 20, 2024 3:12am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Apr 20, 2024 3:12am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Apr 20, 2024 3:12am

@awelc awelc requested review from cgswords and tzakian and removed request for cgswords April 10, 2024 02:14
@awelc awelc self-assigned this Apr 10, 2024
@awelc awelc requested a review from dariorussi April 10, 2024 02:14
Copy link
Contributor

@tnowacki tnowacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems rather subtle and I don't quite follow all of it, so just some initial questions :)

@@ -117,12 +117,6 @@ fn unexpected_token_error_(
))
}

fn add_type_args_ambiguity_label(loc: Loc, mut diag: Box<Diagnostic>) -> Box<Diagnostic> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this go away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was no longer used after the "main" parser resilience PR (#16673) landed. This was due to the fact that generating this additional label was predicated on parse_comma_list returning an error which it no longer does. It just got more explicit in the current PR as parse_optional_type_args (wrapping parse_comma_list) now also no longer returns an error.

context.add_diag(*diag);
let end_loc = context.tokens.previous_end_loc();
let loc = make_loc(context.tokens.file_hash(), start_loc, end_loc);
sp(loc, Symbol::from(""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this symbol here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we are trying to do here is to parse a piece of code like this:

fun foo(s: SomeStruct) {
    s.
}

This is happening in parse_dot_or_index_chain which contains a loop parsing a potentially multi-segment dot chain (e.g., s.foo().f).

Considering the example above, we would first (before the loop even starts) parse s, then encounter a dot and try to parse whatever comes after the dot. In our example, this would fail as due to } being encountered. Still, we want the partial access chain to be available for auto-completion so that we can determine the type of s, so we "complete" the chain with a fake (empty) identifier, and keep going with the parsing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we will end up with a really strange error message Unbound field '' in 'a::m::AnotherStruct'

Which feels like something you maybe want to suppress? Could we add a new node or some sort of error node that the IDE can latch onto?

let n = match parse_identifier(context) {
Ok(id) => id,
Err(diag) => {
done_parsing = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you consider an example I tried to put together in the other response, if we break here (instead of completing construction of Exp_::Dot) we will not accomplish our goal as the AST node for the s. statement will not be created.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just return then?

let n = match parse_identifier(context) {
Ok(id) => id,
Err(diag) => {
done_parsing = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just return then?

Comment on lines 2545 to 2548
loop {
if done_parsing {
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a while loop :)

context.add_diag(*diag);
let end_loc = context.tokens.previous_end_loc();
let loc = make_loc(context.tokens.file_hash(), start_loc, end_loc);
sp(loc, Symbol::from(""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we will end up with a really strange error message Unbound field '' in 'a::m::AnotherStruct'

Which feels like something you maybe want to suppress? Could we add a new node or some sort of error node that the IDE can latch onto?

Copy link
Contributor

@tnowacki tnowacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, and I think we would benefit on getting some thoughts from @cgswords.
Otherwise looks good!

Comment on lines 3356 to 3406
ExpDottedAccess::UnresolvedError => {
assert!(idx == num_accessors - 1);
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it will not be enough for the IDE. I would imagine you will need something like UnannotatedExp_::InvalidAccess(Box<Exp>, Option<Name>) which will let you float access to invalid fields and invalid methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the limitation here and was hoping you could you give an example when this is not going to be sufficient.

I was going for the IDE to be able to understand what all valid prefixes of an otherwise only partially parsed chain are. As per added test it should understand what s (in s.;) and s.another_field (in s.another_field.;) are, even if parsing fails after the final dot (as for auto-completion we don't really care what "wrong thing" comes after the final dot)

    fun foo(s: AnotherStruct) {
        let _tmp1 = s.;                // incomplete with `;` (next line should parse)
        let _tmp2 = s.another_field.;  // incomplete with `;` (next line should parse)
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have s.foo_ in the IDE, you want all fields and functions that are prefixed by foo_

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me personally, I never do s. and wait, I always think I know the name of the function or field, and then am sadly mistaken. But autocorrect picks up about halfway through me typing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. From what I understand, though, integration with the IDE will work in a slightly different way. There is only one trigger event for auto-completion, and it's the appearance of the . character. Whatever you type afterwards does not trigger further auto-completion events. This means, that the LSP implementation needs to provide all possible completions upon just seeing . (as otherwise it will not have another chance). It may appear as if it's the LSP that does the filtering (in your example, show only functions prefixed by foo_) but it's the "client" side that actually does it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can include IDE integration in this PR and make sure that everything works as expected but per @cgswords's previous request I am trying to keep the compiler PRs and IDE PRs separate.

Copy link
Contributor Author

@awelc awelc Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I take it back :-) I thought it over and now I am understanding much better what you had in mind! My argument still partially stands, but there are cases when we'd need the "invalid" names. More changes (likely) coming!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked things a bit. We indeed need new UnannotatedExp_::InvalidAccess to support a dot-chain with no proper identifier after the dot (e.g., tmp.;), even if the dot is on a separate line (that's why we need to preserve dot's location in conjunction with the type of expression that precedes it, so that the symbolicator can get the preceding expression type having only location of the dot at its disposal).

I don't think we need anything new to represent a dot chain with an "invalid" field or method name name (e.g., tmp.invalid where invalid does not represent a field or a method name). This will still parse as a Exp_::Dot (rather than Exp_::DotUnresolved) but with UnresolvedType instead of some real one, and we just have to make sure that we process these correctly in the symbolicator.

_ => match parse_identifier(context) {
Err(diag) => {
context.add_diag(*diag);
Exp_::DotUnresolved(Box::new(lhs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this turned out relatively cleanly!

@@ -1734,6 +1735,10 @@ impl AstDebug for ExpDotted_ {
w.comma(&rhs.value, |w, e| e.ast_debug(w));
w.write("]");
}
D::DotUnresolved(_, e) => {
e.ast_debug(w);
w.write("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: have this print something more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was supposed to be (fixed now):

                e.ast_debug(w);
                w.write("")

Exp_::DotUnresolved(loc, Box::new(lhs))
}
Ok(n) => {
if is_start_of_call_after_function_name(context, &n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check this either way so that we support s.foo.(x, y) ? I can imagine trying to get IDE help by hvaing s.foo.bar(x, y) and removing bar to try to get autocomplete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work without any changes as s.foo.(x, y) will parse to the DotUnresolved case and auto-completion will work correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to symbolicator tests, but I am also planning to have a more comprehensive auto-completion tests once auto-completion features start landing (i.e., have unit tests that actually check what would auto-completion actually insert in these partially-parsed cases).

Comment on lines 2598 to 2610
let is_macro = if let Tok::Exclaim = context.tokens.peek() {
let loc = current_token_loc(context.tokens);
context.advance();
Some(loc)
} else {
None
};
let mut tys = None;
if context.tokens.peek() == Tok::Less
&& n.loc.end() as usize == call_start
{
tys = parse_optional_type_args(context);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to just call parse_macro_opt_and_tyargs_opt here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can see. Previously, I just used what was already there, but now I rewrote it to use parse_macro_opt_and_tyargs_opt.

Copy link
Contributor

@cgswords cgswords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small follow-ups.

One broader question is: why do we maintaing DotResolved so deep into the compiler? We don't appear to be using it anywhere yet. Can you say what the plan is?

@awelc
Copy link
Contributor Author

awelc commented Apr 17, 2024

A few small follow-ups.

One broader question is: why do we maintaing DotResolved so deep into the compiler? We don't appear to be using it anywhere yet. Can you say what the plan is?

For an incomplete dot chain that does not have a parsable identifier after dot (e.g., tmp.;), we need to carry the location of the dot and the type of the dot's prefix to the typing AST so that the symbolicator can retrieve the right type of information for dot auto-completion. I have been implementing this in the symbolicator on the side to see if we have all wee need and it seems to be working as expected.

let _tmp2 = s.another_field.; // incomplete with `;` (next line should parse)
let _tmp3 = s.another_field. // incomplete without `;` (unexpected `let`)
let _tmp4 = s;
let _tmp = s. // incomplete without `;` (unexpected `}`)
Copy link
Contributor

@cgswords cgswords Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is a bit weird because it just says it's expected an identifier, when a number is also allowed. Maybe a last small thing to fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This was the original behavior, though, and I simply kept it. That being said, it definitely makes sense to be more precise here so I changed it to generate a diagnostic according to your suggestion.

Copy link
Contributor

@cgswords cgswords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One last little request.

@awelc awelc force-pushed the aw/compiler-dot-resilience branch from 4d29768 to 61466e4 Compare April 20, 2024 03:11
@awelc awelc enabled auto-merge (squash) April 20, 2024 03:25
@awelc awelc merged commit 9a202f1 into main Apr 20, 2024
47 checks passed
@awelc awelc deleted the aw/compiler-dot-resilience branch April 20, 2024 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants