
Don't allow idle threads to live forever #612

Merged · 1 commit into apache:main on May 2, 2023
Conversation

@mjgp2 (Contributor) commented Apr 27, 2023

From https://bz.apache.org/bugzilla/show_bug.cgi?id=66581

Currently, the ExecutorService used for this is created as follows:

```java
ExecutorService executorService = new ThreadPoolExecutor(
        0, Integer.MAX_VALUE,
        Long.MAX_VALUE, TimeUnit.MILLISECONDS,
        new SynchronousQueue<>(),
        new AsyncIOThreadFactory());
```

This means a new thread is spawned whenever no idle thread is available, and those threads never terminate. For a server under a spike of load, this spawns many threads which never exit, consuming system resources indefinitely.

I would suggest using a saner idle timeout than Long.MAX_VALUE, such as 60 seconds, unless there is a good reason I am unaware of why such threads should never die.
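
For illustration, a minimal sketch of the change I have in mind (the 60-second value is just the suggestion above, not a tuned default):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Same configuration as today, except idle threads now terminate
// after 60 seconds instead of living forever.
ExecutorService executorService = new ThreadPoolExecutor(
        0,                    // corePoolSize: keep no threads when idle
        Integer.MAX_VALUE,    // maximumPoolSize: unbounded, as before
        60, TimeUnit.SECONDS, // keepAliveTime: the suggested idle timeout
        new SynchronousQueue<>(),
        new AsyncIOThreadFactory());
```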

@rmaucher (Contributor)
This seems like a good idea (although I have no idea about the best default idle value). It has been like this since the code was initially added.

@markt-asf (Contributor)
My reading of the code is that the AsynchronousChannelProvider submits tasks that never end to the executor, so this patch has no effect. The purpose of the executor is to replace any threads that fail. Local testing appears to confirm this. Unless I've missed something (always possible), I think this PR (and the associated BZ issue) needs to be closed without being merged.
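
To illustrate: keepAliveTime only reclaims threads that are idle waiting for work, so a thread executing a never-returning task is unaffected. A self-contained sketch (the endless task is hypothetical, standing in for the provider's event-handler loop):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 1, TimeUnit.SECONDS,
                new SynchronousQueue<>());

        // A never-returning task, standing in for the event-handler
        // loop the channel provider submits to the group's executor.
        pool.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread.sleep(5_000);
        // The 1-second keep-alive expired long ago, but the thread is
        // still busy running the task, so it is never reclaimed.
        System.out.println("active threads: " + pool.getActiveCount()); // 1
        pool.shutdownNow(); // interrupt the task so the JVM can exit
    }
}
```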

@mjgp2 (Contributor, Author) commented Apr 27, 2023

I'll have a further look at this tomorrow. The situation I have is hundreds of idle threads. Potentially a configurable solution using virtual threads would mitigate the issue.

@markt-asf (Contributor)
A thread dump would be helpful.

@rmaucher (Contributor)
Hmm, this is the WebSocket client and it uses NIO2, so that executor is used internally for everything that runs asynchronously for I/O, such as calling completion handlers and so on.

@mjgp2 (Contributor, Author) commented Apr 28, 2023

I've done a smaller thread dump (attached).

I found this documentation on the AsynchronousChannelGroup.withCachedThreadPool call made within this WebSocket code:

•  An asynchronous channel group associated with a cached thread pool submits events to the thread pool that simply invoke the user's completion handler.
•  The kernel's internal I/O operations are handled by one or more internal threads that are not visible to the user application.
•  That means you have one hidden thread pool that dispatches events to a cached thread pool, which in turn invokes the completion handler.
•  The probability of suffering the hang problem seen with the FixedThreadPool is lower.
•  It still might grow infinitely (those infinite thread pools should never have existed anyway!).
•  At least you guarantee that the kernel will be able to complete its I/O operations (like reading bytes).
•  Oops... a CachedThreadPool must support unbounded queueing to work properly (grrr, not sure I like that!!!).
•  So you can possibly lock up all the threads and feed the queue forever until an OOM happens. Great!
•  Of course, nobody will ever do that!

So in short, this thread pool is only used for completion callbacks.

It seems the decision was made to have a synchronous queue with an unbounded number of threads, rather than a bounded number of threads with an unbounded queue. To me this seems the wrong way round; interested to hear thoughts.
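
For concreteness, the two shapes look something like this (the pool size of 16 is purely illustrative, not a proposed value):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Current shape: no queueing; every hand-off that finds no idle
// thread spawns a new one. Thread count is unbounded.
ThreadPoolExecutor unboundedThreads = new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
        new SynchronousQueue<>());

// Alternative shape: cap the threads and queue the excess. With an
// unbounded queue, memory rather than thread count becomes the limit.
ThreadPoolExecutor boundedThreads = new ThreadPoolExecutor(
        16, 16, 60, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());
boundedThreads.allowCoreThreadTimeOut(true); // let even core threads idle out
```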

Based on the documentation, and on other implementations that call the withCachedThreadPool method, I can see no good reason to keep the threads running forever.

These threads are globally shared; however, looking at the dump, two SecureIO threads are created for every WebSocket session. This seems very resource-hungry if you have hundreds of WebSocket connections. I imagine this could be solved by virtual threads, or by refactoring to use a shared thread pool.
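
A minimal sketch of the virtual-thread idea, assuming Java 21+ (just the shape of it, not a drop-in for the existing code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Java 21+: each submitted task runs on its own cheap virtual thread,
// so hundreds of mostly-idle sessions no longer pin OS threads.
ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor();
```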

Thanks all

threaddump.txt

@markt-asf (Contributor)

The thread dump confirms the earlier analysis. The sun.nio.ch.KQueuePort$EventHandlerTask never exits, so the proposed change will not have the desired impact.
It would be worth exploring whether the default group configuration has changed since the code was first written, since the intention was to mimic it, with the added benefits that multiple WsWebSocketContainer instances would share a single pool and that the pool would be destroyed when no longer required.

@markt-asf (Contributor)

The default AsynchronousChannelGroup has adopted a keep-alive of 60s at some point but, since the task never exits, it makes no difference in this case.
We could make a similar change in Tomcat but, until sun.nio.ch.KQueuePort$EventHandlerTask changes, it will have no effect.
`KQueuePort` appears to be macOS-specific (at least in the latest source code), so there may be benefits from this change on other platforms.
Overall, I think I have convinced myself this change makes sense, even if it doesn't solve the issue the OP is seeing.

@markt-asf merged commit cd25be5 into apache:main on May 2, 2023