Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse more c++ expression alternatives #296

Merged

Conversation

i-garrison
Copy link
Contributor

If parsing expression part for an alternative terminates with BacktrackException, selectFallback() would short-circuit to the longest remaining variant. If that happens to successfully complete parsing till the end of expression token sequence, all of remaining variants are discarded, including the first found alternative which was to parse identifier as template name.

This causes expression() to only consider one branchpoint out of all possible variants. Allow it to find more variants by scanning through all branchpoints looking for the alternative with leftmost parsed boundary.

To make it work add these operators to possible end-of-template-id markers due to variants with start of unary expression: + - & * for unary ops and && for unary op with label reference extension.

This is probably still not ideal but fixes this common std library construct:

  template <typename T>
  inline constexpr bool X = true;
  template <typename T>
  inline constexpr bool Y = true;

  template<typename T> bool predicate() {
    return X<T> && Y<T>; // CDT finds this one: (X) < (T) > (&&Y<T>)
                         // Fix it to also consider (X<T>) && (Y<T>)
  }

Bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=497931

If parsing expression part for an alternative terminates with BacktrackException,
selectFallback() would short-circuit to the longest remaining variant. If that
happens to successfully complete parsing till the end of expression token
sequence, all of remaining variants are discarded, including the first found
alternative which was to parse identifier as template name.

This causes expression() to only consider one branchpoint out of all possible
variants. Allow it to find more variants by scanning through all branchpoints
looking for the alternative with leftmost parsed boundary.

This is probably still not ideal but fixes this common std library construct:

  template <typename T>
  inline constexpr bool X = true;
  template <typename T>
  inline constexpr bool Y = true;

  template<typename T> bool predicate() {
    return X<T> && Y<T>; // CDT finds this one: (X) < (T) > (&&Y<T>)
                         // Fix it to also consider (X<T>) && (Y<T>)
  }

Bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=497931
@github-actions
Copy link

Test Results

   540 files     540 suites   17m 42s ⏱️
9 657 tests 9 635 ✔️ 22 💤 0
9 679 runs  9 657 ✔️ 22 💤 0

Results for commit e076caa.

@jonahgraham
Copy link
Member

Thanks @i-garrison This change seems reasonable to me, but I would appreciate someone with better understanding of the C++ to have a review if possible. @ddscharfe would you be able to give this a once over? I appreciate your help with these reviews.

@jonahgraham jonahgraham added the language C/C++ Language Support label Feb 23, 2023
@jonahgraham jonahgraham added this to the 11.1.0 milestone Feb 23, 2023
@i-garrison
Copy link
Contributor Author

Adding more color to this change, looks like I partially restored the original implementation which is 9a2e978#diff-c367560215f7d4e4d837a7567bd7b46c0f6f3ee18df384c8b63bceb189db92d5R157 note it scans through all of available variants too.

The later change in 73968cb#diff-c367560215f7d4e4d837a7567bd7b46c0f6f3ee18df384c8b63bceb189db92d5R143 seems to have caused the limiting behavior of recovery from parse failure that I found to be problematic for the test cases in this PR.

While this PR does not cause any of existing tests to fail, I believe there will be a need for further improvements. To my mind it should be possible to go down the tree of all possible alternatives at ambiguity resolution time (when class parsing is complete and type info is supposedly complete as well) and pick the path step of variant that fits available name type info. At the moment I do not see how existing implementation was supposed to do that unless there is a bug on exit path with building expression trees at the end of expression() before buildExpression() is called, will try to look into that separately.

@ddscharfe
Copy link
Contributor

Thanks @i-garrison This change seems reasonable to me, but I would appreciate someone with better understanding of the C++ to have a review if possible. @ddscharfe would you be able to give this a once over? I appreciate your help with these reviews.

+1, thanks @i-garrison, I will review this PR next week.

Copy link
Contributor

@ddscharfe ddscharfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, considering all branch points in selectFallback() seems logical to me.

While this PR does not cause any of existing tests to fail, I believe there will be a need for further improvements. To my mind it should be possible to go down the tree of all possible alternatives at ambiguity resolution time (when class parsing is complete and type info is supposedly complete as well) and pick the path step of variant that fits available name type info. At the moment I do not see how existing implementation was supposed to do that unless there is a bug on exit path with building expression trees at the end of expression() before buildExpression() is called, will try to look into that separately.

However there will be cases where the disambiguation won't be possible due to the architecture of CDT, according to https://bugs.eclipse.org/bugs/show_bug.cgi?id=497931#c8, right?

@i-garrison
Copy link
Contributor Author

However there will be cases where the disambiguation won't be possible due to the architecture of CDT, according to https://bugs.eclipse.org/bugs/show_bug.cgi?id=497931#c8, right?

This is partially correct, in that CDT's semantic analysis needs template declarations to be in the right place (e.g. global template to be in the list of TU declarations) already at the time semantic call is made; unfortunately at the moment template will only be added to the tree after it is completely parsed together with all of it's contents.

On the other hand, the preparation code leading to selectFallback() does not need any semantic analysis which requires templates to be accessible via expected locations - that merely creates a tree of all possible variants which is disambiguated later after tree is already complete (this is done during resolveAmbiguities() call after tree is parsed and TU is structurally complete, just before returning from parse() call.)

To my mind it should be absolutely possible to place declarations in the right places as we go, and I have partially working change to that extent which also allows you to perform semantic queries to bind names to template declarations right during parsing. Basic idea is to just add the declaration to the tree before proceeding with parsing contents of definition, and deal with caches of bindings in the process. Not sure if I will be able to make that change fully operational before we see LSP-based editor though; if that would work we will not need the name-vs-template disambiguations at all.

@jonahgraham jonahgraham merged commit 57a32fc into eclipse-cdt:main Feb 27, 2023
@jonahgraham
Copy link
Member

Thanks @i-garrison for the change and @ddscharfe for reviewing.

@i-garrison i-garrison deleted the pr/c++-expression-more-alternatives branch February 27, 2023 19:17
@ddscharfe
Copy link
Contributor

This is partially correct, in that CDT's semantic analysis needs template declarations to be in the right place (e.g. global template to be in the list of TU declarations) already at the time semantic call is made; unfortunately at the moment template will only be added to the tree after it is completely parsed together with all of it's contents.

On the other hand, the preparation code leading to selectFallback() does not need any semantic analysis which requires templates to be accessible via expected locations - that merely creates a tree of all possible variants which is disambiguated later after tree is already complete (this is done during resolveAmbiguities() call after tree is parsed and TU is structurally complete, just before returning from parse() call.)

To my mind it should be absolutely possible to place declarations in the right places as we go, and I have partially working change to that extent which also allows you to perform semantic queries to bind names to template declarations right during parsing. Basic idea is to just add the declaration to the tree before proceeding with parsing contents of definition, and deal with caches of bindings in the process. Not sure if I will be able to make that change fully operational before we see LSP-based editor though; if that would work we will not need the name-vs-template disambiguations at all.

Thanks @i-garrison for explaining your ideas.
I think in the long run moving to LSP will be the way to go, however having fixes like this provides a lot of value now, so thanks again for your PR :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language C/C++ Language Support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants