Circuit breaker does not close even after remote service has recovered #1633
Comments
|
We might be experiencing the exact same issue but it's really hard to reproduce (it only took place once in PROD and twice under "extreme" load testing). I'll try to find the time for more digging (we're currently using 1.5.13) |
|
We're having the same issue. Hystrix version is 1.5.12 and we're also using Javanica. |
|
I've spent a bit more time trying to reproduce the case, without any success. Having said that, going back to the logs, I've seen that there's been an important refactor around … Now, I'm going to take a wild guess here, but is it possible that there's a race condition between …? Why am I mentioning this? Because in my specific scenario, the CB gets stuck between an … |
|
Hi, we're having the same issue (version 1.5.12): the circuit randomly OPENS AND NEVER CLOSES AGAIN, even though the backend service works properly. This is our config (we know it is very conservative):
No more "success", "badRequest" or "failure" events. In fact, the command no longer gets executed (we can see this on other dashboards such as the RestClient metrics). Suggestion: it would be useful for debugging if the circuit "open" and "close" events could be logged via HystrixEventNotifier.markEvent(...). For now, we're relaxing the configuration, downgrading to 1.5.11, and monitoring whether the issue persists. |
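The suggestion above could be sketched roughly as follows. The Hystrix types are stubbed out here so the snippet is self-contained; in a real project you would instead extend `com.netflix.hystrix.strategy.eventnotifier.HystrixEventNotifier` (whose `markEvent` takes a `HystrixEventType` and a `HystrixCommandKey`) and register the notifier via `HystrixPlugins.getInstance().registerEventNotifier(...)`. The class and method names below are illustrative, not part of Hystrix.

```java
public class LoggingEventNotifier {
    // Stand-in for com.netflix.hystrix.HystrixEventType (subset of its values).
    public enum EventType { SUCCESS, FAILURE, SHORT_CIRCUITED }

    public int shortCircuitedCount = 0;

    // Stand-in for HystrixEventNotifier.markEvent(...). SHORT_CIRCUITED events are
    // the observable sign that the circuit is open: the command is rejected without
    // being executed, which matches the "no more success/failure metrics" symptom
    // described above.
    public void markEvent(EventType eventType, String commandKey) {
        if (eventType == EventType.SHORT_CIRCUITED) {
            shortCircuitedCount++;
            System.out.println("circuit appears OPEN: " + commandKey + " was short-circuited");
        }
    }
}
```

Counting or logging SHORT_CIRCUITED events per command key at least makes a stuck-open circuit visible in logs, even though Hystrix does not emit explicit open/close transition events.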
|
Update: with the downgrade we no longer see the circuit stuck in the OPEN state. However, randomly, when a backend service begins to fail, once the circuit transitions back to CLOSED the behavior changes and the circuit begins to OPEN and CLOSE repeatedly (in a "twitchy" state, as if the stats were not reset). We have to re-deploy the servers for the circuits to stay CLOSED at the same backend error rate. There are a lot of open issues and PRs addressing similar problems. Are there any guidelines (besides the documentation) on safe configuration values for circuit breakers? |
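As a reference point for the question above (not official guidance, and not a fix for this bug), Hystrix's documented defaults for the circuit-breaker properties discussed in this thread are:

```properties
# Hystrix circuit-breaker defaults (per the Hystrix configuration wiki).
# Open the circuit only after >= 20 requests in the rolling window...
hystrix.command.default.circuitBreaker.requestVolumeThreshold=20
# ...of which at least 50% failed...
hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
# ...then wait 5s before letting a trial request through.
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=5000
# Rolling statistical window used to compute the error percentage.
hystrix.command.default.metrics.rollingStats.timeInMilliseconds=10000
```

Much more aggressive values than these (e.g. a very low volume threshold or a very short sleep window) tend to produce exactly the "twitchy" open/close behavior described above.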
|
I've seen this problem occur as well, and downgrading to … |
|
I'm also having this problem in production with version …
After all that, we had to restart the service to see the circuit closed again. |
|
Probably regression in 1.5.13 caused by a820344#diff-82a974c5de99c7b7fa59df2c2b823ae1R385 |
|
Is there a solution for this issue? |
|
Sounds like this: #1640. Are there any maintainers looking into this? |
|
I have the same issue. I would like to see the circuit breaker close again. Will that be fixed? |
|
@davidvara Just to let you know, I decided to downgrade to 1.5.11 (1.5.12 introduced a big refactor of the circuit breaker) and will see what happens. |
|
We ran into this. This is a very serious issue. It basically is a showstopper for using Hystrix. Hopefully it will be addressed soon (or the bad code backed out). |
|
+1 |
|
Update: after downgrading to 1.5.11, I haven't seen this issue so far. |
|
Does anyone know if this is fixed in the latest release, 1.5.18 of 16 Nov 2018? So far it seems the best approach is to downgrade to 1.5.11, isn't it? |
|
1.5.11 was rereleased as 1.5.18, so they're the same: https://github.com/Netflix/Hystrix/releases/tag/v1.5.18 And Hystrix is no longer in active development after this release: https://github.com/Netflix/Hystrix#hystrix-status |
|
Hello, we are using hystrix-core:jar:1.5.6 and experiencing the same issue. Error: com.netflix.hystrix.exception.HystrixRuntimeException: xx.xx short-circuited and fallback disabled. It doesn't look like the problem was introduced in 1.5.12, does it? The problem exists in older versions of Hystrix too. |

We've had the following problem occur three times in about a month:
We're running a service on two nodes. A case occurred where one node's circuit breaker properly closed, and one remained open, while both nodes are talking to the exact same remote service:
At the same time, the circuit breaker for a call to another endpoint on that same remote service remained open, with no requests at all bypassing the circuit breaker:
We have hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds set to 1000, which I would expect to let one request per second bypass the circuit breaker. I've tried to reproduce this in an integration test, where I simulated the remote service timing out/producing errors, and every single time the circuit breaker successfully opens and closes. So unfortunately, at this moment I am unable to provide an exact reproduction path.
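The expected sleep-window behavior described above can be sketched with a self-contained model (this is an illustration of the mechanism, not Hystrix's actual implementation): while the circuit is OPEN, requests are rejected until sleepWindowInMilliseconds have elapsed; then a trial request is allowed through, and a successful trial closes the circuit again. The bug in this issue is that, in practice, the circuit never makes that OPEN-to-CLOSED transition.

```java
public class CircuitBreakerSketch {
    public enum State { CLOSED, OPEN }

    private State state = State.CLOSED;
    private long openedAtMs;
    private final long sleepWindowMs;

    public CircuitBreakerSketch(long sleepWindowMs) {
        this.sleepWindowMs = sleepWindowMs;
    }

    // Called when the rolling error threshold is exceeded.
    public void trip(long nowMs) {
        state = State.OPEN;
        openedAtMs = nowMs;
    }

    // True while CLOSED, or after the sleep window has elapsed since opening.
    // (The real implementation uses a compare-and-set so that only one
    // concurrent trial request gets through; that detail is omitted here.)
    public boolean allowRequest(long nowMs) {
        return state == State.CLOSED || nowMs - openedAtMs >= sleepWindowMs;
    }

    // Outcome of the trial request: success closes the circuit,
    // failure restarts the sleep window.
    public void onTrialResult(long nowMs, boolean success) {
        if (success) {
            state = State.CLOSED;
        } else {
            openedAtMs = nowMs;
        }
    }

    public State state() {
        return state;
    }
}
```

With a 1000 ms sleep window, tripping the circuit at t=0 rejects a request at t=500 but admits a trial at t=1500; a successful trial returns the circuit to CLOSED, which is the recovery the reporter expected but did not observe.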
Some other information