New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent caching outdated AST in CoreASTProvider #2714
Prevent caching outdated AST in CoreASTProvider #2714
Conversation
Great, this seems a cheap way to fix the issue. I'll let @jdneo to double check it and also verify whether it has performance impact on the didChange update. |
// We must call clearReconcilation here to prevent getAST calls on other threads | ||
// from caching outdated AST after we just called disposeAST. | ||
// See also: https://github.com/eclipse/eclipse.jdt.ls/issues/1918 | ||
sharedASTProvider.clearReconciliation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the upstream implementation, looks like clearReconciliation()
and reconciled()
are synchronized using fReconcileLock
object.
If a thread calls reconciled()
with an out-of-date AST before the clearReconciliation()
being called from document handler, then fReconcileLock
will be occupied by that thread, and the out-of-date AST will be cached.
I tested this patch, the chance of getting outdated AST decreases quite a lot, but still can met the issue after several more tries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some new findings, just for reference:
-
I tried
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
, but still had some chance to get the wrong AST. -
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
. Seems works well on my machine, tried a lot of times and didn't observe wrong highlighting.
To understand what's happening, I turned JavaManipulationPlugin.DEBUG_AST_PROVIDER
on and set -Djdt.ls.debug=true
to see the logs. But after enabling the log, and use sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
, I cannot reproduce the issue on my machine anymore. 🤔 Maybe logging has some impact on the execution sequence and lower the chance of getting wrong AST?
So far, I do not have any concrete conclusion about what's going on. Just recorded all I observed and hope they can bring some inspiration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a thread calls
reconciled()
with an out-of-date AST before theclearReconciliation()
being called from document handler, thenfReconcileLock
will be occupied by that thread, and the out-of-date AST will be cached.
You're right, good catch! I was a little too optimistic to think that clearReconciliation
would solve the problem. It seemed to work, because it works most of the time, and because the method I previously used to trigger the race condition using the debugger gives me a false positive in this case.
I hadn't thought too much about how the different locks in CoreASTProvider
interact with each other and how the different methods in CoreASTProvider
are synchronized. I'm still not 100% sure how it all works honestly, it's quite complicated.
Some new findings, just for reference:
- I tried
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
, but still had some chance to get the wrong AST.
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
. Seems works well on my machine, tried a lot of times and didn't observe wrong highlighting.
Do we know whether that's just "luck" or if there's a reason why it works better? Like, what effect does the second clearReconciliation
call have on the race condition?
I thought some more about it, and I'm still not convinced that (1) or (2) would prevent the outdated AST from being cached. It seems to reduce the chances of it happening (?), but then again, I'm having a hard time reproducing the race condition at all on my machine with any of the solutions involving clearReconciliation
, so I can't verify this. Because it's a race condition, the timing of the text edits (and the computation time spent on each request) matters a lot when you're trying to reproduce the issue. That's why I've been trying to reproduce the issue using the debugger instead, by pasuing/unpausing threads in a more "surgical" way.
So why am I not convinced that (1) or (2) would prevent the outdated AST from being cached? Consider the following:
-
Thread A has just created the outdated AST, but has not yet cached it. For example, it could be here: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L206
-
The
BaseDocumentLifeCycleHandler#handleChanged
thread callsdisposeAST
andclearReconciliation
(maybe even multiple times). Note that Thread A does not hold any of theCoreASTProvider
locks, sodisposeAST
andclearReconciliation
does not have to wait. -
Another thread (let's name it Thread B) calls
getAST
, which in turn callsaboutToBeReconciled
: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L163
This call to aboutToBeReconciled
will set fReconcilingJavaElement
to something non-null, which will essentially "undo" the effect of our clearReconciliation
call(s).
- Before Thread B is done with
getAST
(but after theaboutToBeReconciled
call), Thread A suddenly callsreconciled
with the outdated AST.
Now, because aboutToBeReconciled
was called after clearReconciliation
(i.e. fReconcilingJavaElement
is non-null), Thread A will still cache the outdated AST: https://github.com/eclipse-jdt/eclipse.jdt.ui/blob/5e02b306fbe22247843b1e09fcf4212b48f2d7f6/org.eclipse.jdt.core.manipulation/common/org/eclipse/jdt/core/manipulation/CoreASTProvider.java#L328-L340
This should be able to happen, at least in theory (I think?). And I would not be surprised if there are more cases like this.
To me, this seems like a hard problem to solve in JDT LS. How can we ever guarantee that clearReconciliation
is not "undone" by aboutToBeReconciled
? It seems like there is no reliable way to prevent Thread A from caching the outdated AST.
Of course, we could introduce an additional lock (or some other synchronization mechanism) in JDT LS to achieve mutual exclusion between the getAST
and disposeAST
calls. But maybe such a lock would be a bit too course-grained, and have an impact on performance? If we want something more fine-grained, I suspect we may want to investigate "Option A" or "Option B" (see #1918 (comment)), but I don't really have time to help with that (especially because I'm not familiar with the inner workings of JDT UI/Core).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this seems like a hard problem to solve in JDT LS. How can we ever guarantee that clearReconciliation is not "undone" by aboutToBeReconciled? It seems like there is no reliable way to prevent Thread A from caching the outdated AST.
Yes, I think if JDT.Core can introduce a new property to AST, sth like timestamp or document version, then an AST can be checked if it's out-of-date before being cached. And caller can also use that to do similar checks.
Considering document updating is a very fundamental part and might have a large impact on perf if we introduce additional locks on it, I don't tend to use this approach.
For now, to mitigate the issue, I think using sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
or #2709 can largely reduce the chance of the issue. Or even we can use both two of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think if JDT.Core can introduce a new property to AST, sth like timestamp or document version, then an AST can be checked if it's out-of-date before being cached. And caller can also use that to do similar checks.
Considering document updating is a very fundamental part and might have a large impact on perf if we introduce additional locks on it, I don't tend to use this approach.
Yeah, I agree.
For now, to mitigate the issue, I think using
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
or #2709 can largely reduce the chance of the issue. Or even we can use both two of them.
That is probably the best short-term solution, I can't think of anything better. I can update this PR to add the additional clearReconciliation
call. The problem with #2709 is that it doesn't cover the case where the AST has changed but the document length is the same (which can happen in real-word scenarios, as I showed). It's also more complicated, and I feel like it could have a larger performance impact, since it may dispose the AST again and recompute it. But I'm not necessarily against #2709, just thought there are some potential problems to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I'm a bit confused now: I noticed that the suggestion from @testforstephen was originally:
sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
But was then edited to be:
sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
@jdneo Just to double-check, which variant did you try? Should I still go with sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
, like you wrote in your comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
.
But I think there should be not too much difference between two of them. Maybe you can verify sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST()
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to verify it, but I couldn't really reproduce #1918 with either solution, I probably have the wrong timings. I agree that there probably isn't much of a difference between them, and neither of them would completely fix #1918 anyway, as I noted above.
But since you had the best chances of preventing the race condition with sharedASTProvider.clearReconciliation() -> sharedASTProvider.disposeAST() -> sharedASTProvider.clearReconciliation()
, I have updated the PR to use that implementation.
Signed-off-by: Odin Dahlström <zerodind@gmail.com>
0cb8681
to
4b19237
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This should largely mitigate the wrong highlighting issue before the upstream JDT.Core fix it entirely.
Let's merge first and monitor the situation closely. We can explore other options if necessary. |
Related to #1918
Related to redhat-developer/vscode-java#2176
This PR fixes (only partially, see discussion below) #1918 via "Option C" described in this comment: #1918 (comment)
This PR is an alternative to #2709. I believe there are some problems with that workaround, see #2709 (comment)
To my knowledge, this PR should fix
#1918
completely,without any edge cases.This was not the case, see review comments in the discussion below.