ARTEMIS-485: Use unbounded client thread-pool by default #468

Closed
bgutjahr wants to merge 3 commits into apache:master from bgutjahr:thread_pools

Conversation

@bgutjahr
Contributor

I changed the ActiveMQClient code to use an unbounded cached thread pool by default, as was the case with Artemis 1.1.0 and before.

Bounded fixed-size thread pools create and keep all configured client threads, which can negatively impact client behavior due to the additional memory consumption. Even the recent change to base the default max size on the number of cores won't help on modern multi-core systems.

I also had to remove the capability to dynamically change the thread pool max size, as this would not work when changing the size of an unbounded cached thread pool (due to the different blocking queue classes). So after changing the max size, the pools have to be cleared so that new ones are created.

In addition, I fixed some API flaws (public writable fields and missing null checks for injected thread pools).
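For context, the two pool flavours under discussion map onto standard JDK factories (an illustrative sketch, not the actual Artemis pool construction):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolFlavours {
    public static void main(String[] args) throws Exception {
        // Unbounded cached pool (the old Artemis <= 1.1.0 behaviour):
        // grows on demand, idle threads are reclaimed after 60 seconds.
        ExecutorService cached = Executors.newCachedThreadPool();

        // Bounded fixed-size pool (the newer default): up to n threads
        // are created and then kept alive, queueing any excess work.
        int n = 8 * Runtime.getRuntime().availableProcessors();
        ExecutorService fixed = Executors.newFixedThreadPool(n);

        System.out.println(cached.submit(() -> "ok").get()); // prints "ok"
        cached.shutdown();
        fixed.shutdown();
    }
}
```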

@mtaylor
Contributor

mtaylor commented Apr 18, 2016

@bgutjahr Hi. The issue with unbounded thread pools is that they can eventually exhaust the system's threads. We have seen this happen, and the whole server grinds to a halt. We introduced a fixed pool size to avoid this scenario (though it is still possible to exhaust system memory, since we essentially have an unbounded job queue). It is possible to add a setting to time out threads, which would solve your issue with the fixed thread pool creating and keeping lots of threads. We've also recently changed the default thread pool size to something more reasonable. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#allowCoreThreadTimeOut(boolean)

Your changes around the API and injecting pools sound good. I'll comment on the commit.

Thanks
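The allowCoreThreadTimeOut option linked above is a one-line switch on ThreadPoolExecutor; a minimal sketch of a fixed-size pool whose idle workers are allowed to die:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    public static void main(String[] args) {
        // core == max: a fixed-size pool with an unbounded work queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Allow idle *core* threads to terminate after the keep-alive
        // period instead of being kept around forever.
        pool.allowCoreThreadTimeOut(true);

        System.out.println(pool.allowsCoreThreadTimeOut()); // prints "true"
        pool.shutdown();
    }
}
```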

@mtaylor
Contributor

mtaylor commented Apr 18, 2016

@bgutjahr This all looks good to me, except the unbounded default pool setting :D, for the reasons I mentioned above. We have had people complaining that it grows out of control under high load, which, as I mentioned, is the original reason for the change.

How about we add the setting to time out threads in the pool, as shown in the link above, but keep the default the same? Using a multiple of the core count is a reasonably standard way to calculate defaults, and it's usually a reflection of the system resources, but we could easily change this if there is a lot of objection. This is also how Netty instantiates some of its thread pools.

I wonder if there is a thread pool policy that could prevent the pool growing too large, but not behave like a fixed pool. Ideally we'd like the pool to grow organically, up to a certain number. I was unable to find an appropriate policy using ThreadPoolExecutor in the JDK: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html. But perhaps there's another way to handle this? Any suggestions?
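One JDK-only approximation of a pool that grows organically up to a cap (a sketch, assuming callers can tolerate back-pressure once the cap is hit; this is not what the PR implements):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedCachedPool {
    // Like Executors.newCachedThreadPool(), but capped at maxThreads.
    // The SynchronousQueue makes the executor spawn a new worker whenever
    // no idle worker is ready, up to the cap; beyond that, CallerRunsPolicy
    // runs the task on the submitting thread instead of rejecting it.
    public static ExecutorService create(int maxThreads) {
        return new ThreadPoolExecutor(
                0, maxThreads,
                60L, TimeUnit.SECONDS,          // idle workers die after 60s
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

Note the trade-off: a SynchronousQueue never buffers tasks, so a burst larger than the cap slows the submitting threads down rather than queueing work, which is different semantics from a fixed pool with an unbounded queue.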

@bgutjahr
Contributor Author

@mtaylor I can understand the issue of an unbounded number of threads. So leaving the default as a fixed thread pool is not an issue for me, as long as we can change it back to unbounded. In that case, I would suggest also adapting the user documentation accordingly. I was under the - obviously wrong - impression that changing the default to a fixed size was just a side effect of the useful enhancement to make the global thread pool configurable.

In our application, I have seen each client thread have ~500MB of memory allocated. With the old default of 500 threads, this consumed 250GB of heap space. On my 32-core system, with the new default of 8*cores (i.e. 256 threads), this would still be 128GB. In the end, this caused our process to go into constant garbage collection and resulted in a big performance impact. As we use JMS, but for a relatively low message volume, we never have more than a few client threads actively running, so we have never seen an issue with the unbounded thread pool.
I wonder whether the default number of threads should be reduced further, but I don't really know what a good default would be. I guess this highly depends on the application.
Maybe the amount of memory that is allocated by idle threads could be reduced, and the whole issue would go away. I didn't investigate that area.

I had also looked at the JDK ThreadPoolExecutor. It looks like it cannot behave like a cached pool with a limited number of threads. One option would be to implement a new thread pool that combines these two paradigms.
I also had a look at the allowCoreThreadTimeOut option. But even then, the pool preferred to create new threads before reusing existing ones. With a 1-minute timeout and a rate of 10 messages per second, which can easily be handled by a small number of threads, we still got 500 threads in the pool. The only difference was that these threads timed out after I stopped sending messages.
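The "new threads before reusing existing ones" behaviour is documented for ThreadPoolExecutor: while the pool is below corePoolSize, every submission starts a fresh worker even when idle workers are available. A small demonstration:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreGrowthDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Submit three trivial tasks one at a time, waiting for each.
        // A single worker could handle them all, yet the pool still
        // creates a new thread per task while below corePoolSize.
        for (int i = 0; i < 3; i++) {
            pool.submit(() -> {}).get();
        }

        System.out.println(pool.getPoolSize()); // prints "3"
        pool.shutdown();
    }
}
```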

@mtaylor
Contributor

mtaylor commented Apr 19, 2016

@bgutjahr Thanks for the reply.

Yes, you are right; we should get the documentation updated accordingly and explain the pros and cons of each approach, fixed and unbounded.

With regards to heap size, I am trying to figure out how each thread manages to increase the heap by 500MB. This seems huge; if this is the case, we need to figure out what is causing it. Perhaps sending lots of large messages concurrently could cause this? Though if so, an unbounded pool probably wouldn't help... Might you mean stack space by any chance? If so, this does seem like a very high number, and I'd recommend exploring why it is set so high.

Yes, I see the issue with the timeout. What you described was my original concern the first time round, which is why I left it out. If you decrease the timeout further, you'll likely get a performance hit, as you'll be constantly creating new threads (which, with a large stack size, would be even more costly).

I think what we are missing with regards to defaults is some performance numbers; really these are just guesses, looking mostly at what Netty does. I'm interested in your use case, particularly understanding why the heap grows so large, because I think this is something that needs addressing. Do you have any more info on this?

@mtaylor
Contributor

mtaylor commented Apr 19, 2016

I've created a couple of JIRAs to track what we have discussed:

https://issues.apache.org/jira/browse/ARTEMIS-492
https://issues.apache.org/jira/browse/ARTEMIS-491

@mnovak
Contributor

mnovak commented Apr 19, 2016

Usually each thread takes 1MB of memory for its stack on a 64-bit machine. AFAIK this should be separate from heap space, but on HP-UX with the HP-UX JDK 1.8 we saw that it takes memory from the heap as well.

@bgutjahr what platform do you use?
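For reference, the per-thread stack size can be overridden JVM-wide with -Xss (e.g. `java -Xss512k ...`), or requested for a single thread via the four-argument Thread constructor, where the stack size is only a hint the JVM may ignore on some platforms:

```java
public class StackSizeDemo {
    public static void main(String[] args) throws InterruptedException {
        // Request a ~256 KB stack for this one thread; the fourth
        // constructor argument is a stack-size hint in bytes.
        Thread t = new Thread(null, () -> {}, "small-stack", 256 * 1024);
        t.start();
        t.join();
        System.out.println("done"); // prints "done"
    }
}
```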

@bgutjahr
Contributor Author

We use Linux and Windows, and we have seen consistent memory allocation on both. It's not stack memory. So far, I have found out that the memory is allocated in io.netty.buffer.PoolThreadCache instances. I have one instance per thread; each has allocated 216232 bytes in my test. I don't know yet if this is something I could influence in my application code.

@mtaylor
Contributor

mtaylor commented Apr 20, 2016

@bgutjahr Could you please update your PR to reflect what we've discussed here (essentially keeping the default as a fixed pool)? Also, some of the commits have the same higher-level goal of protecting accessors and adding validation. Could you squash these, please? This keeps it nice and neat.

We can talk more about your use case on the mailing list, particularly the memory usage you are seeing. Regardless of the number of threads... you shouldn't be seeing an increase in heap of 500MB per thread; this seems crazy.

Thanks again!

Bernd Gutjahr added 3 commits April 20, 2016 14:33
…d pool

Adapted the code to handle -1 correctly to configure an unbounded thread pool.
In addition, I removed the capability to reconfigure the max pool size
of existing thread pools, because the global thread pool can either be
an unbounded cached pool or a bounded fixed-size pool.
These two kinds of pool also differ in the blocking queue used and
therefore cannot be converted into each other.
1. Changed public fields in ActiveMQClient to private and added getters.

Exposing the thread pool size fields allows modifying them in undesired ways.
I made these fields private and added corresponding getter methods.
In addition, I renamed the field 'globalThreadMaxPoolSize'
to 'globalThreadPoolSize' to be more consistent with the
'globalScheduledThreadPoolSize' field name.
I also adapted some tests to always call clearThreadPools after
the thread pool size configuration has been changed.

2. Protect against injecting null as thread pools

ActiveMQClient.injectPools allowed null as injected thread pools.
The effect was that internal thread pools were created,
but not shut down correctly.
…ementation

Changed the ActiveMQClient interface to expose global thread pools as
ExecutorService and ScheduledExecutorService interface. This is necessary
to allow injecting thread pool implementations that are not based on
ThreadPoolExecutor or ScheduledThreadPoolExecutor.
@bgutjahr
Contributor Author

bgutjahr commented Apr 20, 2016

@mtaylor I squashed my changes into 3 commits and reverted the default thread pool size.

About the memory usage: it looks like Netty is putting pooled byte buffers into thread-local caches. So with every new client thread, a separate byte buffer cache is used. After I changed NettyConnection.createTransportBuffer to allocate the buffer with .ioBuffer instead of .directBuffer, the memory allocation issue went away. Although this helped in my case, I don't want to submit such a change, as I have no idea how it would impact Artemis' performance. I guess a real solution would be different, and probably more complicated. To me it looks like Netty's per-thread buffer caching has a different threading model in mind than Artemis.

I'm not going to dig deeper into the memory / buffer caching issue, but I still want to experiment with a different thread pool implementation. Should I create a separate JIRA for the memory issue?
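If the per-thread PoolThreadCache is confirmed as the culprit, one possible knob (an assumption on my part; the property names are from Netty 4.x's PooledByteBufAllocator and should be checked against the Netty version Artemis ships) is to shrink the caches via system properties set before any Netty class is loaded:

```java
public class NettyCacheTuning {
    public static void main(String[] args) {
        // Netty reads these once when PooledByteBufAllocator is first
        // loaded, so they must be set (or passed as -D flags) before
        // any Netty class is touched. Names per Netty 4.x; verify for
        // the version in use.
        System.setProperty("io.netty.allocator.tinyCacheSize", "0");
        System.setProperty("io.netty.allocator.smallCacheSize", "0");
        System.setProperty("io.netty.allocator.normalCacheSize", "0");

        System.out.println(
                System.getProperty("io.netty.allocator.normalCacheSize"));
        // prints "0"
    }
}
```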

@mtaylor
Contributor

mtaylor commented Apr 20, 2016

@bgutjahr Yes, please create a separate JIRA for the memory usage. Thanks for investigating the issue and for the changes. I'd be interested to see how you get on with finding an appropriate thread pool implementation. Ideally we want to use the thread pools provided in the JDK; perhaps there's room to extend some of these, or to find an appropriate lib.

@asfgit asfgit closed this in 0a719e0 Apr 20, 2016
@bgutjahr bgutjahr deleted the thread_pools branch April 22, 2016 06:36
clebertsuconic pushed a commit to clebertsuconic/artemis that referenced this pull request Apr 28, 2021