Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent caching outdated AST in CoreASTProvider #2714

Merged
merged 1 commit into from Jul 20, 2023

Conversation

0dinD
Copy link
Contributor

@0dinD 0dinD commented Jun 19, 2023

Related to #1918
Related to redhat-developer/vscode-java#2176

This PR fixes (only partially, see discussion below) #1918 via "Option C" described in this comment: #1918 (comment)

This PR is an alternative to #2709. I believe there are some problems with that workaround, see #2709 (comment)

To my knowledge, this PR should fix #1918 completely, without any edge cases. This was not the case, see review comments in the discussion below.

@testforstephen
Copy link
Contributor

Great, this seems a cheap way to fix the issue. I'll let @jdneo to double check it and also verify whether it has performance impact on the didChange update.

// We must call clearReconcilation here to prevent getAST calls on other threads
// from caching outdated AST after we just called disposeAST.
// See also: https://github.com/eclipse/eclipse.jdt.ls/issues/1918
sharedASTProvider.clearReconciliation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the upstream implementation, looks like clearReconciliation() and reconciled() are synchronized using fReconcileLock object.

If a thread calls reconciled() with an out-of-date AST before the clearReconciliation() being called from document handler, then fReconcileLock will be occupied by that thread, and the out-of-date AST will be cached.

I tested this patch, the chance of getting outdated AST decreases quite a lot, but still can met the issue after several more tries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some new findings, just for reference:

  1. I tried sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST(), but still had some chance to get the wrong AST.

  2. sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation(). Seems works well on my machine, tried a lot of times and didn't observe wrong highlighting.

To understand what's happening, I turned JavaManipulationPlugin.DEBUG_AST_PROVIDER on and set -Djdt.ls.debug=true to see the logs. But after enabling the log, and use sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST(), I cannot reproduce the issue on my machine anymore. 🤔 Maybe logging has some impact on the execution sequence and lower the chance of getting wrong AST?

So far, I do not have any concrete conclusion about what's going on. Just recorded all I observed and hope they can bring some inspiration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a thread calls reconciled() with an out-of-date AST before the clearReconciliation() being called from document handler, then fReconcileLock will be occupied by that thread, and the out-of-date AST will be cached.

You're right, good catch! I was a little too optimistic to think that clearReconciliation would solve the problem. It seemed to work, because it works most of the time, and because the method I previously used to trigger the race condition using the debugger gives me a false positive in this case.

I hadn't thought too much about how the different locks in CoreASTProvider interact with each other and how the different methods in CoreASTProvider are synchronized. I'm still not 100% sure how it all works honestly, it's quite complicated.


Some new findings, just for reference:

  1. I tried sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST(), but still had some chance to get the wrong AST.
  1. sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation(). Seems works well on my machine, tried a lot of times and didn't observe wrong highlighting.

Do we know whether that's just "luck" or if there's a reason why it works better? Like, what effect does the second clearReconciliation call have on the race condition?


I thought some more about it, and I'm still not convinced that (1) or (2) would prevent the outdated AST from being cached. It seems to reduce the chances of it happening (?), but then again, I'm having a hard time reproducing the race condition at all on my machine with any of the solutions involving clearReconciliation, so I can't verify this. Because it's a race condition, the timing of the text edits (and the computation time spent on each request) matters a lot when you're trying to reproduce the issue. That's why I've been trying to reproduce the issue using the debugger instead, by pasuing/unpausing threads in a more "surgical" way.

So why am I not convinced that (1) or (2) would prevent the outdated AST from being cached? Consider the following:

  1. Thread A has just created the outdated AST, but has not yet cached it. For example, it could be here: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L206

  2. The BaseDocumentLifeCycleHandler#handleChanged thread calls disposeAST and clearReconciliation (maybe even multiple times). Note that Thread A does not hold any of the CoreASTProvider locks, so disposeAST and clearReconciliation does not have to wait.

  3. Another thread (let's name it Thread B) calls getAST, which in turn calls aboutToBeReconciled: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L163

This call to aboutToBeReconciled will set fReconcilingJavaElement to something non-null, which will essentially "undo" the effect of our clearReconciliation call(s).

  1. Before Thread B is done with getAST (but after the aboutToBeReconciled call), Thread A suddenly calls reconciled with the outdated AST.

Now, because aboutToBeReconciled was called after clearReconciliation (i.e. fReconcilingJavaElement is non-null), Thread A will still cache the outdated AST: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L328-L340

This should be able to happen, at least in theory (I think?). And I would not be surprised if there are more cases like this.


To me, this seems like a hard problem to solve in JDT LS. How can we ever guarantee that clearReconciliation is not "undone" by aboutToBeReconciled? It seems like there is no reliable way to prevent Thread A from caching the outdated AST.

Of course, we could introduce an additional lock (or some other synchronization mechanism) in JDT LS to achieve mutual exclusion between the getAST and disposeAST calls. But maybe such a lock would be a bit too course-grained, and have an impact on performance? If we want something more fine-grained, I suspect we may want to investigate "Option A" or "Option B" (see #1918 (comment)), but I don't really have time to help with that (especially because I'm not familiar with the inner workings of JDT UI/Core).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this seems like a hard problem to solve in JDT LS. How can we ever guarantee that clearReconciliation is not "undone" by aboutToBeReconciled? It seems like there is no reliable way to prevent Thread A from caching the outdated AST.

Yes, I think if JDT.Core can introduce a new property to AST, sth like timestamp or document version, then an AST can be checked if it's out-of-date before being cached. And caller can also use that to do similar checks.

Considering document updating is a very fundamental part and might have a large impact on perf if we introduce additional locks on it, I don't tend to use this approach.

For now, to mitigate the issue, I think using sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() or #2709 can largely reduce the chance of the issue. Or even we can use both two of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think if JDT.Core can introduce a new property to AST, sth like timestamp or document version, then an AST can be checked if it's out-of-date before being cached. And caller can also use that to do similar checks.

Considering document updating is a very fundamental part and might have a large impact on perf if we introduce additional locks on it, I don't tend to use this approach.

Yeah, I agree.

For now, to mitigate the issue, I think using sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() or #2709 can largely reduce the chance of the issue. Or even we can use both two of them.

That is probably the best short-term solution, I can't think of anything better. I can update this PR to add the additional clearReconciliation call. The problem with #2709 is that it doesn't cover the case where the AST has changed but the document length is the same (which can happen in real-word scenarios, as I showed). It's also more complicated, and I feel like it could have a larger performance impact, since it may dispose the AST again and recompute it. But I'm not necessarily against #2709, just thought there are some potential problems to consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm a bit confused now: I noticed that the suggestion from @testforstephen was originally:

sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()

But was then edited to be:

sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()

@jdneo Just to double-check, which variant did you try? Should I still go with sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation(), like you wrote in your comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation().

But I think there should be not too much difference between two of them. Maybe you can verify sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to verify it, but I couldn't really reproduce #1918 with either solution, I probably have the wrong timings. I agree that there probably isn't much of a difference between them, and neither of them would completely fix #1918 anyway, as I noted above.

But since you had the best chances of preventing the race condition with sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation(), I have updated the PR to use that implementation.

Signed-off-by: Odin Dahlström <zerodind@gmail.com>
@0dinD

This comment was marked as resolved.

Copy link
Contributor

@jdneo jdneo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This should largely mitigate the wrong highlighting issue before the upstream JDT.Core fix it entirely.

@testforstephen testforstephen merged commit c22b229 into eclipse-jdtls:master Jul 20, 2023
6 checks passed
@testforstephen testforstephen added this to the End July 2023 milestone Jul 20, 2023
@testforstephen
Copy link
Contributor

LGTM. This should largely mitigate the wrong highlighting issue before the upstream JDT.Core fix it entirely.

Let's merge first and monitor the situation closely. We can explore other options if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants