Idle threads never terminated using default Hystrix thread pool #1242

Closed
mattrjacobs opened this issue Jun 13, 2016 · 23 comments

Comments

@mattrjacobs
Contributor

Hystrix currently uses the 'coreSize' threadpool config for both ThreadPoolExecutor.corePoolSize and ThreadPoolExecutor.maximumPoolSize. The effect of this is that all Hystrix thread pools are fixed size. That also implies that the keepAlive config is only used when a plugin overrides the default Hystrix thread pool.

This issue will investigate changing the default Hystrix thread pool to accept both coreSize and maximumSize, and to let idle threads time out.

See initial discussion in Hystrix Google Group: https://groups.google.com/forum/#!topic/hystrixoss/lT4CZgt-KRk
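For illustration only (plain JDK executors, not Hystrix source), the difference amounts to roughly this:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolSizingSketch {
        public static void main(String[] args) {
            int coreSize = 10; // hystrix.threadpool.<key>.coreSize

            // Today: coreSize is passed as both corePoolSize and maximumPoolSize,
            // so the pool is fixed-size and the keep-alive setting never applies.
            ThreadPoolExecutor fixed = new ThreadPoolExecutor(
                    coreSize, coreSize,
                    1, TimeUnit.MINUTES,
                    new LinkedBlockingQueue<Runnable>());

            // Proposal: accept a separate maximumSize and let threads above coreSize
            // terminate after sitting idle for the keep-alive period.
            int maximumSize = 20; // hypothetical value for illustration
            ThreadPoolExecutor elastic = new ThreadPoolExecutor(
                    coreSize, maximumSize,
                    1, TimeUnit.MINUTES,
                    new SynchronousQueue<Runnable>()); // hand-off queue so the pool can grow

            fixed.shutdown();
            elastic.shutdown();
        }
    }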

@mohan-mishra

mohan-mishra commented Sep 14, 2016

@mattrjacobs So if I assign a coreSize of 100 to a command, it will hold on to all 100 threads, and those 100 threads can't be used by anyone else. Is this what you are trying to say?

Or does this only mean that whatever threads have been used won't get terminated? That could be detrimental to the system's health. Can you suggest a workaround?

@mattrjacobs
Contributor Author

Yes, these threads get spun up and stay around. A couple points to make around that:

  1. Hystrix is not adding work to your system (other than the bookkeeping Hystrix does to maintain internal state). If you need 100 concurrent blocking I/O operations, that would require 100 threads in a system without Hystrix and 100 in a system with Hystrix. These threads execute the run() method of your Hystrix command. If they are unused, they sit idle.

  2. We run a high-volume system, and the largest threadpool we have is around 20. Are you sure that you need 100 threads?

@mohan-mishra

The response time in your case is around 0.2 secs, if I am not wrong, while I depend heavily on third-party APIs with an average response time of 4 secs. So as per the formula of rps * 99th-percentile latency, I will need that many threads (e.g. 25 requests/sec x 4 s is about 100 concurrent executions). My only concern is that this many daemon threads would be creating unnecessary load on the server. Your views?

@mattrjacobs
Contributor Author

What gives you reason to believe that the Hystrix threads are creating unnecessary load?

@mohan-mishra

Let's say I have a bare minimum of 10 command groups, each with a pool size of 100. So after doing their work, 1000 threads will stay around without getting terminated. Won't this have a negative impact on the system? Your views?

@mattrjacobs
Contributor Author

We have never found idle threads to be a source of load in our system. That's why I'm asking what reasons you have for suspecting they are a source of load in yours.

@mohan-mishra

I am thinking this because these threads, once idle (after being used by a Hystrix command), won't be available for use by operations that don't use Hystrix. I don't have Hystrix system-wide; only a few of my APIs are on Hystrix. That is why I think they will be creating unnecessary load.


@mohan-mishra

Could you let me know this:

Let's say I have a command group with a core size of 100. Now at some point all 100 threads have been used for some operation, and now all 100 are idle (and not terminated). Can a thread out of these 100 idle threads be used by operations that don't use Hystrix?


@mattrjacobs
Contributor Author

No. By design, these are only used for Hystrix run() methods and sit idle the remainder of the time.

@mohan-mishra

Is there any plan, in future updates, to terminate the used threads instead of keeping them idle?
Also, is there any way I can get the maximum number of concurrent requests served by a command group?


@bltb
Contributor

bltb commented Sep 20, 2016

+1

Unfortunately, if one happens to use large coreSize values, depending on the operating system, this can translate into a large number of native threads and not insignificant native memory usage.

@mattrjacobs
Contributor Author

@mohan-mishra Yes, this ticket tracks the work to do so. I haven't had any time to work on it yet. The description in the first comment describes the proposed implementation.

For your maximum-concurrency question, there is a command-level method to return this value for a specific command: HystrixCommandMetrics#getRollingMaxConcurrentExecutions()

For a thread-pool, this is the method to use: HystrixThreadPoolMetrics#getRollingMaxActiveThreads()
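For example (a sketch only; the command and thread-pool names below are made up, and both lookups return null until something has executed):

    import com.netflix.hystrix.HystrixCommandKey;
    import com.netflix.hystrix.HystrixCommandMetrics;
    import com.netflix.hystrix.HystrixThreadPoolKey;
    import com.netflix.hystrix.HystrixThreadPoolMetrics;

    public class MaxConcurrencyCheck {
        public static void logMaxConcurrency() {
            // Null until the command has executed at least once in this rolling window.
            HystrixCommandMetrics commandMetrics = HystrixCommandMetrics.getInstance(
                    HystrixCommandKey.Factory.asKey("SubscriberCommand")); // hypothetical command name
            if (commandMetrics != null) {
                System.out.println("max concurrent executions: "
                        + commandMetrics.getRollingMaxConcurrentExecutions());
            }

            // Same idea at the thread-pool level.
            HystrixThreadPoolMetrics poolMetrics = HystrixThreadPoolMetrics.getInstance(
                    HystrixThreadPoolKey.Factory.asKey("SubscriberPool")); // hypothetical pool name
            if (poolMetrics != null) {
                System.out.println("max active threads: "
                        + poolMetrics.getRollingMaxActiveThreads());
            }
        }
    }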

@mattrjacobs
Contributor Author

@bltb Yeah, you're right. This is not pain we've felt acutely, as the production system we work on has fairly small thread pools (max around 25), and many other memory consumers that make the Hystrix thread usage fairly innocuous.

I'll add this to the 1.5.6 milestone and get this cleaned up

@mattrjacobs mattrjacobs added this to the 1.5.6 milestone Sep 21, 2016
@mattrjacobs mattrjacobs modified the milestones: 1.5.7, 1.5.6 Sep 28, 2016
@mxr-mwaters

We just ran into this issue today. We have a system that talks to many, many other systems. It is a common service that allows key events in our system to have a publish-subscribe model. We were using Hystrix to protect ourselves from poorly behaved subscribers, and we have run into that model overwhelming the system with threads. For example, with 1000 subscribers, the defaults mean 10,000 threads, which is more than the maximum threads per process allowed at the system level.

It would be nice to create thread pools with a core size of 1 (or 0), have maxSize set to 5, and reap threads that have been idle for 60 seconds or so.
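In plain JDK terms, the behaviour we are after is roughly this (a sketch only, not how Hystrix builds its pools today):

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class ReapingPoolSketch {
        public static ThreadPoolExecutor newReapingPool() {
            // Core of 1, max of 5, idle threads above core reaped after 60 seconds.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    1, 5,
                    60, TimeUnit.SECONDS,
                    new SynchronousQueue<Runnable>());
            // Optionally let the core thread time out too, shrinking to zero when idle.
            pool.allowCoreThreadTimeOut(true);
            return pool;
        }
    }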

Thanks!

@kakawait

kakawait commented Oct 7, 2016

I think I have the same issue (or maybe I have another leak). I have a service that proxies many requests to a large number of other services. I created a Hystrix thread pool based on hostname:port with coreSize = 20.

From what I can see in jconsole, the thread pool allocates a new thread for each new request until it reaches 20 and never releases them (here I just added many entries to /etc/hosts that point to http://httpbin.org).

[screenshot: jconsole thread counts, 2016-10-07 17:46:38]

That may lead to

java.lang.OutOfMemoryError: unable to create new native thread

@mattrjacobs do you think it is possible to release threads with our own strategy using https://github.com/Netflix/Hystrix/wiki/Plugins? Or should I switch to the SEMAPHORE strategy rather than THREAD until this issue is fixed?

@bltb
Contributor

bltb commented Oct 7, 2016

@kakawait, whilst @mattrjacobs has already fixed this in master via #1371, if you really have (or need) many running threads, your actual issue may be due to an operating system ulimit...

@mattrjacobs
Contributor Author

Quick update on this issue: I'm going to add a piece of config which enables this feature and defaults it to false. I don't want any Hystrix users to get misconfigured threadpools when doing the upgrade to 1.5.7. Instead, by opting in, the user is then on the hook for verifying threadpool sizes are as they desire.

When doing internal testing, I found that the custom HystrixPropertiesStrategy we are using needed to be updated to work with the new maximum size config.

@kakawait

@bltb I understand well that if I create many (maybe too many) threadPoolKeys, that will affect the number of threads Hystrix creates based on coreSize (and the new maximumSize).

Even if I set coreSize to 2 and maximumSize to 10, if the application spawns 200 different threadPoolKeys, Hystrix will keep at least 200 x 2 = 400 threads and never release the idle ones.

I was talking more about the allowCoreThreadTimeOut capability of ThreadPoolExecutor. I have tried every configuration possible (setting keepAlive, etc.), and allowCoreThreadTimeOut is always false. Without writing my own HystrixConcurrencyStrategy to set allowCoreThreadTimeOut to true, Hystrix will keep idle threads forever.
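What I mean is roughly this (a sketch only, assuming the getThreadPool override signature of the Hystrix version we run; it delegates to the default pool construction and just flips the flag):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import com.netflix.hystrix.HystrixThreadPoolKey;
    import com.netflix.hystrix.strategy.concurrency.HystrixConcurrencyStrategy;
    import com.netflix.hystrix.strategy.properties.HystrixProperty;

    public class IdleReapingConcurrencyStrategy extends HystrixConcurrencyStrategy {
        @Override
        public ThreadPoolExecutor getThreadPool(HystrixThreadPoolKey threadPoolKey,
                                                HystrixProperty<Integer> corePoolSize,
                                                HystrixProperty<Integer> maximumPoolSize,
                                                HystrixProperty<Integer> keepAliveTime,
                                                TimeUnit unit,
                                                BlockingQueue<Runnable> workQueue) {
            // Build the pool exactly as the default strategy would...
            ThreadPoolExecutor executor = super.getThreadPool(
                    threadPoolKey, corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue);
            // ...then allow even core threads to terminate after the keep-alive period.
            executor.allowCoreThreadTimeOut(true);
            return executor;
        }
    }

    // registered once at startup, e.g.:
    // HystrixPlugins.getInstance().registerConcurrencyStrategy(new IdleReapingConcurrencyStrategy());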

Btw, my use case is an edge case; I think I will switch to the SEMAPHORE strategy, because the app acts as a reverse proxy that spawns too many threadPoolKeys, and I was not using the thread-timeout feature anyway.
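For anyone else considering the same switch, the per-command change is roughly this (a sketch; the group name, limit, and downstream call are made up):

    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;
    import com.netflix.hystrix.HystrixCommandProperties;
    import com.netflix.hystrix.HystrixCommandProperties.ExecutionIsolationStrategy;

    // Command using semaphore isolation, so no Hystrix thread pool is involved at all.
    public class ProxyCommand extends HystrixCommand<String> {
        public ProxyCommand() {
            super(Setter
                    .withGroupKey(HystrixCommandGroupKey.Factory.asKey("ProxyGroup")) // hypothetical group
                    .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                            .withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE)
                            .withExecutionIsolationSemaphoreMaxConcurrentRequests(20)));
        }

        @Override
        protected String run() throws Exception {
            return callDownstreamService(); // placeholder for the proxied call
        }

        private String callDownstreamService() {
            return "ok";
        }
    }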

@mattrjacobs
Contributor Author

I just merged in #1399, which adds a property (hystrix.threadpool.<XXX>.allowMaximumSizeToDivergeFromCoreSize) that controls whether the maximum size config value is used. This forces users to opt-in and consciously choose this style of configuring threadpools. It prevents this release from unintentionally changing behavior in existing apps.

I'll update the docs around the new configuration options and then close this issue
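Once that's released, opting in should look roughly like this (a sketch; the pool name is made up, and the properties can come from any source Archaius reads):

    import com.netflix.config.ConfigurationManager;

    public class ThreadPoolConfigExample {
        public static void configure() {
            // Opt in so maximumSize is honoured instead of being pinned to coreSize.
            ConfigurationManager.getConfigInstance().setProperty(
                    "hystrix.threadpool.MyServicePool.allowMaximumSizeToDivergeFromCoreSize", true);
            ConfigurationManager.getConfigInstance().setProperty(
                    "hystrix.threadpool.MyServicePool.coreSize", 2);
            ConfigurationManager.getConfigInstance().setProperty(
                    "hystrix.threadpool.MyServicePool.maximumSize", 10);
            // Threads above coreSize are reclaimed after this idle time.
            ConfigurationManager.getConfigInstance().setProperty(
                    "hystrix.threadpool.MyServicePool.keepAliveTimeMinutes", 1);
        }
    }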

@pmohankumar

So if I have to keep the core and max pool sizes different, I need to set allowMaximumSizeToDivergeFromCoreSize: true and then specify both coreSize and maximumSize?

Any ETA for the 1.5.7 release with these changes? I don't see a due date on the 1.5.7 milestone.

@mattrjacobs
Contributor Author

@pmohankumar That's correct. I need to get a few other PRs reviewed, and hope to cut the 1.5.7 release early next week. Thanks for the patience

@mattrjacobs
Contributor Author

I just added the new configuration options to the Configuration Wiki. Closing this issue now

@mattrjacobs
Contributor Author

1.5.7 is released now
