Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix add-on race condition on client #4038

Merged
merged 2 commits into from
May 9, 2023

Conversation

cwisniew
Copy link
Member

@cwisniew cwisniew commented May 7, 2023

Identify the Bug or Feature request

Fixes #4032

Description of the Change

Removed a race condition in the processing of onInit in add-ons. This race condition would only occur for clients as their add-on import happens on the connection thread. I know the bug report has onFirstInit but the freeze was happening in onInit (clients don't process onFirstInit, its only run first-time add-on is added to the campaign, the removal of onFirstInit causing it to work I guess can be put down to the vagaries of outcomes when it comes to race conditions).

Possible Drawbacks

There should be none.

Documentation Notes

Fixes a secnaio where clients connecting to a server with an add-on that has onInit could encounter a MapTool freeze.

Release Notes

  • Fixes a secnaio where clients connecting to a server with an add-on that has onInit could encounter a MapTool freeze.

This change is Reviewable

@emmebi
Copy link
Collaborator

emmebi commented May 7, 2023

Is it possible that #3906 is related to this?

@cwisniew cwisniew changed the title Bump log4j Fix add-on race condition on client May 7, 2023
@kwvanderlinde
Copy link
Collaborator

@cwisniew I think you meant to reference #4032, is that right?

@cwisniew
Copy link
Member Author

cwisniew commented May 8, 2023

@cwisniew I think you meant to reference #4032, is that right?

I did indeed, sometimes autocomplete on GitHub issues is not my friend... Anyways I have update it.

@kwvanderlinde
Copy link
Collaborator

Cool, thanks!

Looking at the code change, I have to admit I don't see how this fixing things. If there's a race condition, removing the wait isn't going to resolve it, if anything it increases the possibility of races. Seeing that this is a freezing issue, I guess "race condition" is really meant to read "deadlock"?

Assuming deadlock was meant, what is actually being deadlocked here? It doesn't look to me like the receive thread is being held up, since it makes a non-blocking call to the asset manager to load the library asset. Though the asset loader thread is waiting on the initialization to finish, and the initialization is waiting on the MTScript to finish running on the Swing thread. So is the Swing thread somehow also waiting on the asset loader thread, or ... ?

One step further, the initialization code also joins() on the JS results. Is that okay or can it similarly hold things up? Is MTScript only a problem because its being run on the Swing thread?

@cwisniew
Copy link
Member Author

cwisniew commented May 8, 2023

Cool, thanks!

Looking at the code change, I have to admit I don't see how this fixing things. If there's a race condition, removing the wait isn't going to resolve it, if anything it increases the possibility of races. Seeing that this is a freezing issue, I guess "race condition" is really meant to read "deadlock"?

Assuming deadlock was meant, what is actually being deadlocked here? It doesn't look to me like the receive thread is being held up, since it makes a non-blocking call to the asset manager to load the library asset. Though the asset loader thread is waiting on the initialization to finish, and the initialization is waiting on the MTScript to finish running on the Swing thread. So is the Swing thread somehow also waiting on the asset loader thread, or ... ?

One step further, the initialization code also joins() on the JS results. Is that okay or can it similarly hold things up? Is MTScript only a problem because its being run on the Swing thread?

image

At the point it's hanging, the swing thread is waiting on the thread initializing the add-on, and the add-on initialization is waiting on the swing thread. So yes, there is a deadlock, but also set up by a race condition because if the initialization of the add-on occurs at the "correct time", the deadlock never happens. This is why my attempts to reproduce didn't work and why it works for some of the reporter's players but not others (when I got the campaign, I saw something similar it works if one of my computers is the client but not if the other is). Removing the wait on the swing thread removes the lock (so and the resource the two threads were racing for, so no race and no deadlock). Both are true, but I thought a race condition better described why it happens sometimes and not others.

For the runJS() call, it's not waiting on the Swing thread (and the thread spawned is only waiting on IO) so there should be no issues. Watch now that I have said this guaranteed someone will run into a problem in a few days...

@cwisniew cwisniew added the bug label May 9, 2023
@cwisniew cwisniew merged commit 041b9cc into RPTools:release-1.13 May 9, 2023
@cwisniew cwisniew deleted the fix-addon-freeze branch May 9, 2023 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants