Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression - deadlock on undo (#15) #16

Merged
merged 1 commit into from
Apr 19, 2022

Conversation

iloveeclipse
Copy link
Member

Reverted commit d610ff2 as it caused
deadlock during undo if the undo operation also wants to use workspace
rule.

This fixes issue
#15

@laeubi laeubi requested a review from jukzi April 19, 2022 07:27
@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

What I always wondered: The Job APi seems to use quite low level synchronization primitives. Was there any attempt in using one of the more higher level synchronization primitives (Executors, Queues, Semaphores, Barriers and alike)? As we are now at Java 11 there should not be a problem with backward-compat to older JVMs...

Reverted commit d610ff2 as it caused
deadlock during undo if the undo operation also wants to use workspace
rule.

This fixes issue
eclipse-platform#15
@iloveeclipse
Copy link
Member Author

@laeubi : a the lock is a lock, independently which API is used, at the end there is an exclusive resource that needs to be guarded. The problem here is the workspace lock that was added at the beginning of the undo action, so the actual task that also wants take the same workspace lock has to wait now forever. Not sure how any new API may help here.

Not saying new API's are useless in general, if you have something concrete in mind, feel free to propose.

jukzi
jukzi previously requested changes Apr 19, 2022
Copy link
Contributor

@jukzi jukzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to reproduce the issue?

@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

a the lock is a lock, independently which API is used

Still there are techniques that are prone to errors and deadlocks, then there are synchronization techniques that don't need locks at all and so on...

The problem here is the workspace lock that was added at the beginning of the undo action, so the actual task that also wants take the same workspace lock has to wait now forever.

And that sounds like the problem is that instead on locking at the thread/object level it should more be that if i have a lock on the workspace and want other tasks perform on behalf of my lock.

if you have something concrete in mind, feel free to propose

Nothing specific, I just noticed that Job api is suing a lot of wait/notify/synchronized (also in this case) and this generally is hard to do right. But I'm not very familiar with the internals to decide if it could be done better (e.g. lock free mutual exclusion) instead.

@iloveeclipse
Copy link
Member Author

@jukzi : please remove your minus vote. There is nothing to do here, original code must be restored.

@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

You can request to review again in the upper right by clicking the "refresh" icon near the user name of the reviewer.

@iloveeclipse iloveeclipse requested a review from jukzi April 19, 2022 08:21
@jukzi
Copy link
Contributor

jukzi commented Apr 19, 2022

It does not sound any useful to nest progressService within an undo operation as the undo framework will show a progress window after some time anyway. You should not fork join from swt thread. Just run the operation and the beginRule will succeed as you already have the lock.
It's totally expected that the undo operation itself requires a rule - it is just not expected nor good to fork join that into another thread.

Instead of revert i would prefer to add an API to the Operation (like "IAdvancedUndoableOperation3") that it can tell if it wants OperationHistoryActionHandler.run() to aquire a ISchedulingRule in advance or not.... and make the old behaviour default. But use the new pattern for eclipses own Operations to prevent UI freezes.

@mickaelistria
Copy link
Contributor

+1 for a revert. And as @iloveeclipse noticed, there should be no dependency from workbench to resources.

@mickaelistria mickaelistria removed the request for review from jukzi April 19, 2022 08:27
@mickaelistria mickaelistria dismissed jukzi’s stale review April 19, 2022 08:27

We need to revert anyway

@mickaelistria
Copy link
Contributor

@iloveeclipse I could "dismiss" the negative review by clicking the ... besides it on the summary box. You should now be able to merge.

@iloveeclipse
Copy link
Member Author

It does not sound any useful to nest progressService within an undo operation as the undo framework will show a progress window after some time anyway. You should not fork join from swt thread. Just run the operation and the beginRule will succeed as you already have the lock. It's totally expected that the undo operation itself requires a rule - it is just not expected nor good to fork join that into another thread.

@jukzi : Eclipse is about extensions, and the change breaks that. I've already pointed out, that introducing a lock in undo operation is not OK, same with the dependency on resources bundle.

Instead of revert i would prefer to add an API to the Operation (like "IAdvancedUndoableOperation3") that it can tell if it wants OperationHistoryActionHandler.run() to aquire a ISchedulingRule in advance or not.... and make the old behaviour default. But use the new pattern for eclipses own Operations to prevent UI freezes.

This will not help much as long as the lock & dependency from workbench code on resources bundle remains.
Please rethink the original change and for now let us revert that.

@iloveeclipse iloveeclipse merged commit 7c144e6 into eclipse-platform:master Apr 19, 2022
@iloveeclipse iloveeclipse deleted the issue_15 branch April 19, 2022 08:29
@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

I could "dismiss" the negative review by clicking the ... besides it on the summary box

Every comitter can do so, but using this feature makes the whole point of branch protection useless, we should then simply remove that branch protection rule if we just override this if there is a disagreement.

@mickaelistria
Copy link
Contributor

using this feature makes the whole point of branch protection useless, we should then simply remove that branch protection rule if we just override this if there is a disagreement.

It's not a disagreement, there are things that are technically wrong in their context and there is no room to agree or disagree with it. The former patch was wrong, the revert is right.
The agreement/disagreement here is not on whether code is good, it's more on what's the context of this code. The current context is code is meant to work with downstream clients without dependency on resource model. We can discuss whether to change the context (eg target only Eclipse IDE), and then, upon agreement, consider changing things accordingly.
But here, the context remains the same and the technical aspect is clearly that reverting is better, for the moment.

@jukzi
Copy link
Contributor

jukzi commented Apr 19, 2022

I dislike a dismiss war. If we revert every commit that causes an error we end up with an empty repository. You could simply wait some time for a fix or agreement.

@iloveeclipse
Copy link
Member Author

It's totally expected that the undo operation itself requires a rule - it is just not expected nor good to fork join that into another thread.

Another note regarding that: what our code does is of course much more as the snippet I've added.

@jukzi : one can't assume anything about client code needs beside the contract / constraints declared in the javadoc on the IUndoableOperation class and org.eclipse.core.commands.operations.IUndoableOperation.undo(IProgressMonitor, IAdaptable). There is nothing about restrictions how undo is supposed to run, in which particular thread or using which locks. This is fully open and up to the concrete implementation.

So as long as it is not specified in the API contract, undo implementation can do whatever it wants in any thread it wants.
That worked (for years) and used legal API (no internal hacks etc), and if we want improve something, we should take care that existing valid implementations shouldn't be broken.

@mickaelistria
Copy link
Contributor

You could simply wait some time for a fix or agreement.

Sorry, but adding a dependency from workbench to resoures is a major issue that qualifies for an immediate revert as it breaks some important design. The main issue is that It was unfortunately not identified during review, so it was merged/dismissed instead of prevented. It's indeed less nice, but the net result on the code is the same and that's what matters.
Sorry if it seems unfriendly, but keep in mind there is nothing personal here; and also it's not really an individual issue but also a collective one (in our failure to properly review such important change such as changing dependency tree that hard).

@jukzi
Copy link
Contributor

jukzi commented Apr 19, 2022

It's indeed less nice, but the net result on the code is the same and that's what matters.

committers don't matter? think twice.
And you just recreated a UI freeze...

@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

I agree with @jukzi that just reverting things because they cause issues somewhere in a unknown consumer is bad, and always leave a bad impression. We should always try to move forward, so instead of reverting a better approach would be to try fixing the issues, what should include also consider:

  • why was it not discovered through the review?
  • are there probably some implications that are not documented?
  • more important: Why was it not discovered through the automatic checks in advance?

And I disagree with @iloveeclipse that one can't assume nothing especially in the context of SWT, the workbench and on other places we actually require some restrictions to prevent deadlocks, so if actually an implementation needs to be lock free we either should clearly document this, the same for the consumer if its not ok to fork/join then it might needs to be better expressed here.

Beside that using join() is always bad and prone to dead-locks, probably we should (if not already there) have CompletableFuture support on the job api to allow for a notify model instead of a blocking model.

@iloveeclipse
Copy link
Member Author

iloveeclipse commented Apr 19, 2022

@laeubi : I absolutely agree that join() is bad etc, but that's not the point here.
Our implementation was deadlock free before the change in question, that added extra lock in context of SWT, see https://github.com/eclipse-platform/eclipse.platform.ui/pull/6/files#r852668795.

We can't add a restriction on "undo" operation like: "you are not supposed to take any workspace locks". That is API contract breaking.

If the "improved" undo can only work in context of a workspace root lock taken, that would surely require new API and explicit dependency added from the workbench to resources, which is another point here.

@mickaelistria mickaelistria linked an issue Apr 19, 2022 that may be closed by this pull request
@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

That is API contract breaking.

Sure I just wondering if it is really documented in the API that it will be lock free? Anyways I think the problem is not necessarily a lock here but the BusyIndicator and progressDialog.run both need to be handled with care and might not behave as expsected as when not running in the background require that the caller make sure to drive the eventqueue!

@iloveeclipse
Copy link
Member Author

Sure I just wondering if it is really documented in the API that it will be lock free?

No, it is not, but it is also wasn't implemented with any locks, so there was nothing to document.

Otherwise, if a public API relies on some locks/resources, it must be documented that client code running in this specific API context is exposed to this and shouldn't use locks/resources in question, or isn't supposed to execute tasks in other threads etc.

@laeubi
Copy link
Contributor

laeubi commented Apr 19, 2022

You can easily dead-lock any SWT component using async-exec + join, that is nowhere documented, so is SWT broken and should be fixed?

@iloveeclipse
Copy link
Member Author

@laeubi : not sure what do you want to say. It is possible to deadlock any API that uses locks or dedicated threadas to run operations on.

SWT is single threaded, so anyone who uses SWT should at least know that.

In context of this bug we are talking about API that wasn't limited in the sense of which thread or scheduling rules the contributions could use.

@jukzi
Copy link
Contributor

jukzi commented Apr 19, 2022

Documentation? "Truth can only be found in one place: the code."
See also Hyrum's Law
Anyway can we please focus on finding a solution (eclipse-platform/eclipse.platform.runtime#30)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock on undo
4 participants