Fix multiple issues related to reconnect #1367
Merged
This PR fixes several issues related to reconnecting.
Previously, repeatedly reconnecting caused a massive thread leak; this no longer happens (tested manually with the VisualVM profiler). It also fixes race conditions that caused null pointer exceptions in multiple places.
I found no exceptions, leaked threads, deadlocks, or other problems after running 100,000 iterations of reconnecting with various delays in between. There are no new test failures (I think one existing failure might be fixed, but I'm not sure yet).
To be honest, this code is a multithreading nightmare, but the general idea of this PR is to make sure that at most one read/write thread exists per WebSocketClient, and that it can never be leaked, thanks to the "interrupt+join" combination performed before each reassignment.
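The interrupt+join idea can be sketched with a simplified, hypothetical model. The names `SingleWorkerHolder`, `restart`, `stop`, and `runWorker` are illustrative only, not the actual WebSocketClient fields or methods. Before a new worker thread is assigned, the previous one is interrupted and then joined, so at most one worker is ever alive. The sketch assumes the worker never takes the same lock; that is what keeps `join()` under the lock from deadlocking, and in the real client that assumption may not hold.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "one worker thread per client", not the real WebSocketClient code.
public class SingleWorkerHolder {
    // Number of worker threads currently inside runWorker().
    static final AtomicInteger RUNNING = new AtomicInteger();
    // Highest value RUNNING has ever reached; stays at 1 if threads never overlap.
    static final AtomicInteger MAX_SEEN = new AtomicInteger();

    private final Object lock = new Object();
    // Only read or reassigned while holding 'lock'; never set back to null.
    private Thread worker;

    // Stops the previous worker (if any), then starts a fresh one.
    public void restart() throws InterruptedException {
        synchronized (lock) {
            stopWorkerLocked();
            worker = new Thread(SingleWorkerHolder::runWorker, "worker");
            worker.start();
        }
    }

    // Stops the current worker and waits for it to exit.
    public void stop() throws InterruptedException {
        synchronized (lock) {
            stopWorkerLocked();
        }
    }

    // Caller must hold 'lock'. Safe only because runWorker() never takes 'lock';
    // otherwise join() could deadlock.
    private void stopWorkerLocked() throws InterruptedException {
        Thread old = worker;
        if (old != null) {
            old.interrupt(); // ask the old worker to stop
            old.join();      // wait until it has really exited
        }
    }

    private static void runWorker() {
        int now = RUNNING.incrementAndGet();
        MAX_SEEN.accumulateAndGet(now, Math::max);
        try {
            // Stand-in for blocking read/write work; exits once interrupted.
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            // Interrupted during sleep: fall through and exit.
        } finally {
            RUNNING.decrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleWorkerHolder holder = new SingleWorkerHolder();
        for (int i = 0; i < 1000; i++) {
            holder.restart(); // simulate repeated reconnects
        }
        holder.stop();
        // prints "running=0, maxSeen=1"
        System.out.println("running=" + RUNNING.get() + ", maxSeen=" + MAX_SEEN.get());
    }
}
```

Because `stop()` leaves the field pointing at the terminated thread instead of setting it to null, calling it twice is harmless: interrupting and joining a thread that has already exited returns immediately.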
I have also removed some unneeded assignments to `null`, as they can cause a race condition where `thread` is set to `null` between `if(thread != null)` and `thread.interrupt()`. This problem cannot be solved simply by locking, because we could end up in a deadlock.

Please review and run your own test programs with this version, @Xander-Polishchuk.
Meanwhile, I will also continue manual testing and may leave a stability test running for several days.
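To illustrate the race between `if(thread != null)` and `thread.interrupt()` described above, here is a hypothetical sketch (the class and method names are made up, not taken from Java-WebSocket). It puts the racy check-then-act pattern next to one common alternative fix: reading the field once into a local variable. This PR took a different route, removing the unneeded `null` assignments, so the snippet only shows why the two-read version can throw.

```java
// Hypothetical illustration of a check-then-act race on a shared field.
public class NullCheckRace {
    // Shared between threads; another thread may write null at any time.
    private volatile Thread thread;

    NullCheckRace(Thread t) {
        this.thread = t;
    }

    // Racy: 'thread' is read twice, so another thread can set it to null
    // between the check and the call, causing a NullPointerException.
    void interruptRacy() {
        if (thread != null) {
            thread.interrupt();
        }
    }

    // Alternative fix: read the field exactly once, then check and use the local copy.
    void interruptSafe() {
        Thread t = thread;
        if (t != null) {
            t.interrupt();
        }
    }

    // The kind of null assignment that opens the race window in interruptRacy().
    void clear() {
        thread = null;
    }

    public static void main(String[] args) {
        NullCheckRace r = new NullCheckRace(Thread.currentThread());
        r.interruptSafe();
        System.out.println("interrupted=" + Thread.interrupted()); // also clears the flag
        r.clear();
        r.interruptSafe(); // no NullPointerException: the null is seen via the local copy
    }
}
```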
Fixes #1364