ARTEMIS-485: Use unbounded client thread-pool by default#468
bgutjahr wants to merge 3 commits into apache:master
Conversation
@bgutjahr Hi. The issue with unbounded thread pools is that they can eventually exhaust the system's threads. We have seen this happen, and the whole server grinds to a halt. We introduced a fixed pool size to avoid this scenario (though it is still possible to exhaust system memory, since we essentially have an unbounded job queue). It is possible to add a setting to time out threads, which would solve your issue with the fixed thread pool creating and keeping lots of threads. We've also recently changed the default thread pool size to something more reasonable. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#allowCoreThreadTimeOut(boolean) Your changes around the API and injecting pools sound good. I'll comment in the commit. Thanks
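The `allowCoreThreadTimeOut` setting referenced above can be combined with a fixed-size pool so that idle threads are reclaimed instead of being kept alive forever. A minimal JDK sketch (not Artemis code; the pool size 8 and the 60-second timeout are arbitrary example values):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    public static void main(String[] args) {
        // Fixed-size pool (core == max) whose idle threads are reclaimed
        // after 60 seconds instead of being kept alive forever.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // also let core threads time out
        System.out.println(pool.allowsCoreThreadTimeOut()); // prints "true"
        pool.shutdown();
    }
}
```

With this flag set, a fixed pool under low load shrinks back toward zero threads after the keep-alive period, which addresses the "creates and keeps all configured threads" concern without making the pool unbounded.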
@bgutjahr This all looks good to me, except the unbounded default pool setting :D, for the reasons I mentioned above. We have had people complaining that it grows out of control under high load, which, as I mentioned, is the original reason for the change. How about we add the setting to time out threads in the pool, as shown in the link above, but keep the default the same? Using a multiple of the number of cores is a reasonably standard way to calculate defaults, and it's usually a reflection of the system resources, but we could easily change this if there is a lot of objection. This is also how Netty instantiates some of its thread pools. I wonder if there is a thread pool policy that could prevent the pool from growing too large without behaving like a fixed pool. Ideally we'd like the pool to grow organically, up to a certain number. I was unable to find an appropriate policy using ThreadPoolExecutor in the JDK (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html). But perhaps there's another way to handle this? Any suggestions?
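The missing "grow organically up to a cap" policy can be demonstrated with a standalone JDK sketch (pool sizes and sleep times are arbitrary example values): with an unbounded queue the pool never grows past `corePoolSize`, and with a `SynchronousQueue` it grows to `maximumPoolSize` but then rejects work instead of queueing it.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class GrowthDemo {
    public static void main(String[] args) {
        // With an unbounded work queue, ThreadPoolExecutor never creates
        // more than corePoolSize threads -- extra tasks just queue up.
        ThreadPoolExecutor queued = new ThreadPoolExecutor(
                2, 100, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 10; i++) {
            queued.execute(() -> sleep(200));
        }
        System.out.println("threads with unbounded queue: " + queued.getPoolSize()); // 2
        queued.shutdownNow();

        // With a SynchronousQueue the pool grows up to maximumPoolSize,
        // but once saturated it rejects tasks instead of queueing them.
        ThreadPoolExecutor direct = new ThreadPoolExecutor(
                0, 4, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
        int rejected = 0;
        for (int i = 0; i < 10; i++) {
            try {
                direct.execute(() -> sleep(200));
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        System.out.println("threads with SynchronousQueue: " + direct.getPoolSize()); // 4
        System.out.println("rejected: " + rejected); // 6
        direct.shutdownNow();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```

Neither stock configuration gives "grow beyond core size up to a cap, queueing the rest", which is the gap the discussion above identifies.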
@mtaylor I can understand the issue of an unbounded number of threads, so leaving the default as a fixed thread pool is not an issue for me, as long as we can change it back to unbounded. In that case, I would suggest also adapting the user documentation accordingly. I was under the - obviously wrong - impression that changing the default to a fixed size was just a side effect of the useful enhancement to configure the global thread pool. In our application, I have seen each client thread end up with ~500MB of memory allocated. With the default of 500, this consumed 250GB of heap space. On my 32-core system, with the new default of 8 * cores, this would still be ~128GB. In the end, this caused our process to go into constant garbage collection and resulted in a big performance impact. As we use JMS for a relatively low message volume, we never have more than a few client threads actively running, so we have never seen an issue with the unbounded thread pool. I had also looked at the JDK ThreadPoolExecutor. It looks like it cannot be made to behave like a cached pool with a limited number of threads. One option would be to implement a new thread pool that combines these two paradigms.
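One way such a combined pool could be built on top of the JDK is the "eager growth" trick: give `ThreadPoolExecutor` a queue whose `offer()` always fails (forcing thread creation up to the cap) and a rejection handler that re-queues saturated work. A hypothetical sketch under those assumptions; the class and method names are illustrative, not part of the Artemis API:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a pool combining the two paradigms: it grows like
// a cached pool up to a cap, then queues overflow work instead of rejecting it.
public class BoundedCachedPool {

    public static ThreadPoolExecutor create(int maxThreads) {
        // A queue whose offer() always fails, forcing the executor to spawn
        // new threads up to maxThreads before queueing anything.
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>() {
            @Override
            public boolean offer(Runnable r) {
                return false;
            }
        };
        // The rejection handler re-queues saturated work via put(), which we
        // did not override, so callers never see RejectedExecutionException.
        return new ThreadPoolExecutor(0, maxThreads, 60L, TimeUnit.SECONDS, queue,
                (r, executor) -> {
                    try {
                        executor.getQueue().put(r);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(2);
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            });
        }
        System.out.println(pool.getPoolSize()); // grows only up to the cap: 2
        pool.shutdown();
    }
}
```

This keeps the cached pool's idle-thread reclamation (core size 0, 60-second keep-alive) while still bounding the total thread count; whether it performs well enough for the Artemis client is exactly what would need measuring.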
@bgutjahr Thanks for the reply. Yes, you are right, we should get the documentation updated accordingly and explain the pros and cons of each approach, fixed vs. unbounded. With regards to heap size, I am trying to figure out how each thread manages to increase the heap by 500MB. This seems huge; if it is the case, we need to figure out what is causing it. Perhaps sending lots of large messages concurrently could cause this? Though if that is the case, an unbounded pool probably wouldn't help... Might you mean stack space by any chance? If so, this does seem like a very high number and I'd recommend exploring why it is set so high. Yes, I see the issue with the timeout. What you described was my original concern the first time round, which is why I left it out. If you decrease the timeout further you'll likely get a performance hit, as you'll be constantly creating new threads (which, with a large stack size, would be even more costly). I think what we are missing with regards to defaults is some performance numbers; really these are just guesses, based mostly on what Netty does. I'm interested in your use case, particularly in understanding why the heap grows so large, because I think this is something that needs addressing. Do you have any more info on this?
I've created a couple of JIRAs to track what we have discussed: https://issues.apache.org/jira/browse/ARTEMIS-492
Usually each thread takes 1MB of memory for its stack on a 64-bit machine. AFAIK this should be separate from heap space, but on HP-UX with the HP-UX JDK 1.8 we saw that it takes memory from the heap as well. @bgutjahr what platform do you use?
We use Linux and Windows, and we have seen consistent memory allocation. It's not stack memory. So far, I have found out that the memory is allocated in io.netty.buffer.PoolThreadCache instances. I have one instance per thread, and each has allocated 216232 bytes in my test. I don't know yet if this is something I could influence in my application code.
@bgutjahr Could you please update your PR to reflect what we've discussed here (essentially keeping the default as a fixed pool)? Also, some of the commits have the same higher-level goal of protecting accessors and adding validation; could you squash these please? This keeps it nice and neat. We can talk more about your use case on the mailing list, particularly the memory usage you are seeing. Regardless of the number of threads, you shouldn't be seeing an increase in heap of 500MB per thread; this seems crazy. Thanks again!
…d pool Adapted the code to handle -1 correctly to configure an unbounded thread pool. In addition, I removed the capability to reconfigure the max pool size of existing thread pools, because the global thread pool can either be an unbounded cached pool or a bounded fixed-size pool. These two kinds of pool also differ in the blocking queue they use and therefore cannot be converted into each other.
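The incompatibility the commit describes can be seen directly in the JDK factory methods: a cached pool hands tasks off through a `SynchronousQueue`, while a fixed pool buffers them in an unbounded `LinkedBlockingQueue`, so resizing one cannot turn it into the other. A standalone sketch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolQueueDemo {
    public static void main(String[] args) {
        // Executors.newCachedThreadPool() is backed by a SynchronousQueue,
        // Executors.newFixedThreadPool(n) by an unbounded LinkedBlockingQueue.
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(8);
        System.out.println(cached.getQueue().getClass().getSimpleName()); // SynchronousQueue
        System.out.println(fixed.getQueue().getClass().getSimpleName());  // LinkedBlockingQueue
        // Shrinking the cached pool via setMaximumPoolSize() would not make
        // it behave like a fixed pool: the handoff queue never buffers work,
        // so the two kinds of pool cannot be converted into each other.
        cached.shutdown();
        fixed.shutdown();
    }
}
```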
1. Changed public fields in ActiveMQClient to private and added getters. Exposing the thread pool size fields allows modifying them in undesired ways. I made these fields private and added corresponding getter methods. In addition, I renamed the field 'globalThreadMaxPoolSize' to 'globalThreadPoolSize' to be more consistent with the 'globalScheduledThreadPoolSize' field name. I also adapted some tests to always call clearThreadPools after the thread pool size configuration has been changed. 2. Protect against injecting null as thread pools. ActiveMQClient.injectPools allowed null as injected thread pools. The effect was that internal thread pools were created, but not shut down correctly.
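The accessor change in point 1 can be sketched as follows. This is an illustrative outline only: the field names follow the commit text, but the default values, the setter, and the validation rule (positive or -1 for unbounded, matching the other commit) are assumptions, not the actual Artemis code.

```java
// Illustrative sketch of replacing public writable fields with accessors.
public class ClientPoolConfig {
    // Placeholder defaults; -1 is used here to mean "unbounded".
    private static int globalThreadPoolSize = -1;
    private static int globalScheduledThreadPoolSize = 8;

    public static int getGlobalThreadPoolSize() {
        return globalThreadPoolSize;
    }

    public static int getGlobalScheduledThreadPoolSize() {
        return globalScheduledThreadPoolSize;
    }

    public static void setGlobalThreadPoolSize(int size) {
        // Validation a public field could never enforce.
        if (size < -1 || size == 0) {
            throw new IllegalArgumentException(
                "pool size must be positive, or -1 for unbounded");
        }
        globalThreadPoolSize = size;
    }
}
```

Making the fields private means every change goes through a method that can validate the value, which a directly writable public field cannot do.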
…ementation Changed the ActiveMQClient interface to expose the global thread pools through the ExecutorService and ScheduledExecutorService interfaces. This is necessary to allow injecting thread pool implementations that are not based on ThreadPoolExecutor or ScheduledThreadPoolExecutor.
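The null-check from point 2 combined with the interface-typed injection point might look like this minimal sketch. The method name follows `ActiveMQClient.injectPools` from the discussion; the class name and body are illustrative assumptions:

```java
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hedged sketch: interface-typed injection with explicit null validation,
// so a null argument fails fast instead of silently leaving internally
// created pools that are never shut down.
public final class PoolInjection {
    private static ExecutorService globalThreadPool;
    private static ScheduledExecutorService globalScheduledThreadPool;

    public static synchronized void injectPools(ExecutorService pool,
                                                ScheduledExecutorService scheduled) {
        Objects.requireNonNull(pool, "injected thread pool must not be null");
        Objects.requireNonNull(scheduled, "injected scheduled pool must not be null");
        globalThreadPool = pool;
        globalScheduledThreadPool = scheduled;
    }

    public static synchronized ExecutorService getGlobalThreadPool() {
        return globalThreadPool;
    }

    public static synchronized ScheduledExecutorService getGlobalScheduledThreadPool() {
        return globalScheduledThreadPool;
    }
}
```

Because the parameters are typed as `ExecutorService` and `ScheduledExecutorService` rather than the concrete `ThreadPoolExecutor` classes, any compliant executor implementation can be injected.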
@mtaylor I squashed my changes into 3 commits and reverted the default thread pool size. About the memory usage: it looks like Netty is putting pooled byte buffers into thread-local caches, so with every new client thread, a separate byte buffer cache is used. After I changed NettyConnection.createTransportBuffer to allocate the buffer with .ioBuffer instead of .directBuffer, the memory allocation issue went away. Although this helped in my case, I don't want to submit such a change, as I have no idea how it would impact Artemis' performance. I guess a real solution would be different, and probably more complicated. To me it looks like Netty's per-thread buffer caching has a different threading model in mind than Artemis. I'm not going to dig deeper into the memory / buffer caching issue, but I still want to experiment with a different thread pool implementation. Should I create a separate JIRA for the memory issue?
@bgutjahr Yes, please create a separate JIRA for the memory usage. Thanks for investigating the issue and for the changes. I'd be interested to see how you get on with finding an appropriate thread pool implementation. Ideally we want to use the thread pools provided in the JDK; perhaps there's room to extend some of these, or to find an appropriate lib.
I changed the ActiveMQClient code to use an unbounded cached thread pool by default, as was the case in Artemis 1.1.0 and before.
Bounded fixed-size thread pools create and keep all configured client threads, which can negatively impact client behavior due to the additional memory consumption. Even the recent change to base the default max size on the number of cores won't help on modern multi-core systems.
I also had to remove the capability to dynamically change the thread pool max size, as this would not work when changing the size of an unbounded cached thread pool (due to the different blocking queue classes). Instead, after changing the max size, the pools have to be cleared so that new pools are created.
In addition, I fixed some API flaws (public writable fields and a missing check for injected null thread pools).