Manage Timeout in Http connection and have Hystrix report timeout correctly #920

billyy · 2015-10-04T21:44:58Z

Matt,

Last time we discussed the merits of using an HTTP connection timeout versus letting Hystrix interrupt the thread on timeout. You pointed out that the disadvantage in this case is that Hystrix will report these issues as failures instead of timeouts. Is there a way to manually tell Hystrix that it is a timeout, not a failure, when client code manages the HTTP connection and socket timeouts? I went through the code, and it looks like HystrixTimeoutException is nested inside a private class, so it is NOT accessible from client code.

mattrjacobs · 2015-10-05T19:01:49Z

At the moment, the metrics around timeouts refer specifically to timeouts induced by Hystrix. I'm not sure that adding some HTTP errors to this metric would make the system easier to understand or operate.

If Hystrix did as you suggest, how would this help you?

billyy · 2015-10-05T21:29:37Z

For example, I have an external partner with a socket timeout of 30 seconds. We would like to configure a 5-second connection timeout so we can "fail faster" when no connection is available. In the current setup, Hystrix counts this as a failure instead of a timeout. Logically, I would still like to see it as a timeout so production support can tell the difference and act accordingly.
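To make the scenario concrete, here is a minimal sketch using the JDK's `HttpURLConnection`. It assumes the client's read timeout is set to match the partner's 30 seconds; the thread only states the partner's value. `PartnerClientConfig` and the URL are hypothetical. When either timeout expires, the JDK throws a `java.net.SocketTimeoutException` from the command's own code. That is why Hystrix counts it as a failure rather than a timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical partner client: a short connect timeout to fail fast, and a read timeout
// assumed to match the partner's 30-second socket timeout.
final class PartnerClientConfig {
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 30_000;

    // Creates the connection object only; no network I/O happens until connect() or getInputStream().
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // expiry raises java.net.SocketTimeoutException
            conn.setReadTimeout(READ_TIMEOUT_MS);       // expiry raises java.net.SocketTimeoutException
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```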

mattrjacobs · 2015-10-06T04:06:07Z

Right, but consider what would happen in practice if your change was in place. Production support would see a rise in "Hystrix timeout"s, and then have to figure out if they were network timeouts or timeouts induced by Hystrix. To do that, they'd then have to look at whatever monitoring you have set up at your network layer. So I don't believe that you're making anything easier by combining the two timeout sources into a single metric.

billyy · 2015-10-06T04:21:47Z

Ah, because a Hystrix timeout is not necessarily the same as a socket or connection timeout. Is a Hystrix timeout simply defined as Hystrix having waited long enough and then returning control? If that's the definition, I would agree it does not make sense.

mattrjacobs · 2015-10-06T12:56:24Z

What Hystrix counts as a timeout in metrics today is every command invocation that started executing and then ran for longer than the configured Hystrix timeout value. Hystrix should then attempt the fallback and use that as the result of the command invocation.
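The distinction can be modeled with plain `java.util.concurrent` primitives. This is a simplified sketch, not Hystrix's actual implementation; `TimeoutModel`, `Outcome`, and `Result` are hypothetical names. The command body runs on a worker thread, and only exceeding the configured deadline is recorded as `TIMEOUT`. An exception thrown by the body is recorded as `FAILURE`, even when it is a client-side `SocketTimeoutException`. Both paths use the fallback.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Simplified model of the timeout semantics described above; not Hystrix's implementation.
final class TimeoutModel {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    record Result<R>(R value, Outcome outcome) {}

    static <R> Result<R> execute(Callable<R> run, Supplier<R> fallback,
                                 long timeoutMs, ExecutorService pool) {
        Future<R> future = pool.submit(run);
        try {
            return new Result<>(future.get(timeoutMs, TimeUnit.MILLISECONDS), Outcome.SUCCESS);
        } catch (TimeoutException e) {
            // Only exceeding the configured deadline counts as TIMEOUT.
            future.cancel(true); // interrupt the worker thread
            return new Result<>(fallback.get(), Outcome.TIMEOUT);
        } catch (ExecutionException e) {
            // Any exception thrown by the body, including an HTTP client's own timeout, is a FAILURE.
            return new Result<>(fallback.get(), Outcome.FAILURE);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result<>(fallback.get(), Outcome.FAILURE);
        }
    }
}
```

The model cancels the worker with `cancel(true)`, which interrupts the thread. This mirrors the thread-interruption approach mentioned at the start of the thread.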

billyy · 2015-10-06T16:31:36Z

With that definition, I understand why we don't want to lump client timeouts in with Hystrix timeouts. I brought this up because our previous conversation made me realize that a client connection timeout goes into the failure bucket, not the timeout bucket. Is there any recommendation around this topic? Because we have separate ops and dev teams, we would like to be able to tell the difference between a timeout (Hystrix or client code) and an exception. It almost sounds like I need to introduce a fourth metric. Anyway, any suggestions would be appreciated.

PS: I remember that when I was on the edge team, we never set the client socket timeout.

mattrjacobs · 2015-10-07T16:45:52Z

The way I think about it is to consider Hystrix as application-layer code. The run()/construct() method may execute code that is purely local, that talks to a cache, or that goes over the network, but the metrics for all of those cases are the same: latencies and counts of outcomes of the command execution. Separately, there are metrics for the other subsystems. Cache clients should have their own set of metrics, as should each network connection.

As an example, Ribbon (the Netflix HTTP client library) produces metrics that include client-observed latency and NumCompleted/NumErrors, and then breaks down errors either by status code or by ConnectException/ReadException/etc.
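A client-layer breakdown along these lines can be sketched with the JDK alone. `ClientMetrics` and its bucket names are hypothetical and are not Ribbon's API. The sketch only illustrates keeping network-cause counters next to the Hystrix command metrics rather than inside them. It walks the cause chain so that a wrapped exception is still attributed to its underlying network cause.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical client-layer metrics (not Ribbon's API): counts requests and breaks errors
// down by network cause, separately from the Hystrix command metrics.
final class ClientMetrics {
    private final LongAdder requests = new LongAdder();
    private final Map<String, LongAdder> errorsByCause = new ConcurrentHashMap<>();

    void recordSuccess() {
        requests.increment();
    }

    void recordError(Throwable error) {
        requests.increment();
        errorsByCause.computeIfAbsent(causeBucket(error), k -> new LongAdder()).increment();
    }

    // Walk the cause chain so a wrapped exception is still attributed to its network cause.
    static String causeBucket(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketTimeoutException) return "SocketTimeout"; // connect or read timeout
            if (t instanceof ConnectException) return "Connect";             // e.g. connection refused
        }
        return "Other";
    }

    long requestCount() {
        return requests.sum();
    }

    // Sorted copy of the current error counts, e.g. for a dashboard or a log line.
    Map<String, Long> errorSnapshot() {
        Map<String, Long> out = new TreeMap<>();
        errorsByCause.forEach((k, v) -> out.put(k, v.sum()));
        return out;
    }
}
```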

In general, a rise in Hystrix error% does not give me enough information to immediately solve the production issue. It does, however, point me to other metrics which should allow that. But building all of these directly into Hystrix is not a great model.

billyy · 2015-10-07T16:54:26Z

It sounds like the recommendation is to leave it in the exception bucket and have the next level of metrics drill down on the error. It would be good to be able to see all of that in one dashboard, which is why I was wondering whether it should be in the Hystrix metrics. Anyway, thanks for the explanation. I am good!

billyy · 2015-10-07T17:27:48Z

There is one subtle difference in my situation. Unlike Netflix, we do have a separation between ops and dev. Ops takes care of network and dependency issues; dev is involved only in exceptional cases. If there were a way to extend Hystrix to group and report this type of connection error separately, that would work better for us. I am thinking more of an extension, and less of changing the built-in behavior, to solve for our use case where dev and ops are separate roles.

billyy · 2015-10-07T18:36:53Z

Ops is the first to get called in our case, not dev. That is really my motivation! I could fork Hystrix and add the feature I want, but then I would lose all the bug fixes and enhancements. If there is a way to extend Hystrix without forking, that would be the preferred route.

billyy · 2015-10-07T21:35:18Z

If you could make the following class, "HystrixTimeoutException", accessible from client code, that is really all I am looking for. Right now it is nested inside a private class. Anyway, this is really solving for my use case and is not necessarily applicable to Netflix (given that dev and ops are the same person there). Thanks.

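```java
// Excerpt (before #931): HystrixTimeoutException is declared public static, but it is
// nested inside a private static class, so client code cannot reference it.
private static class HystrixObservableTimeoutOperator<R> implements Operator<R, R> {

    final AbstractCommand<R> originalCommand;

    public HystrixObservableTimeoutOperator(final AbstractCommand<R> originalCommand) {
        this.originalCommand = originalCommand;
    }

    public static class HystrixTimeoutException extends Exception {

        private static final long serialVersionUID = 7460860948388895401L;

    }

    // ... remainder of the operator omitted ...
}
```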

billyy · 2015-10-09T16:22:20Z

One more data point. Our external partner has a 30-second timeout, so we want to "fail faster" by setting a five-second connection timeout. (At Netflix, almost all calls use the default one-second timeout, so there is less reason to have separate connection and socket timeouts.) The action for both a Hystrix timeout and a connection timeout would be the same: our ops team reaches out to the partner. But if the connection timeout throws an exception instead, dev needs to investigate, and the end result is still reaching out to the partner.

mattrjacobs · 2015-10-09T22:28:43Z

I think that making the TimeoutException public would allow each class that extends HystrixCommand to use it if they wish, or ignore it. Then it's up to each Hystrix user to decide which types of exceptions they want reported as TIMEOUTs.

I'll work on that and get it in the next release. Thanks for the detailed explanation.
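Making the exception public turns "what counts as a timeout" into a per-command choice. The following JDK-only sketch models that choice; `TimeoutPolicy`, `Outcome`, and `classify` are hypothetical names, not Hystrix API. By default only the framework's own deadline yields `TIMEOUT`. A command can opt in to reporting client-side `SocketTimeoutException`s as timeouts as well, while other errors such as a refused connection remain failures.

```java
import java.net.SocketTimeoutException;
import java.util.function.Predicate;

// Hypothetical model of a per-command timeout-reporting policy; not the Hystrix API.
final class TimeoutPolicy {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    // Which exceptions thrown by the command body should be reported as timeouts.
    private final Predicate<Throwable> reportAsTimeout;

    TimeoutPolicy(Predicate<Throwable> reportAsTimeout) {
        this.reportAsTimeout = reportAsTimeout;
    }

    // Default behavior described in the thread: only the framework's own deadline yields TIMEOUT.
    static TimeoutPolicy frameworkOnly() {
        return new TimeoutPolicy(t -> false);
    }

    // Opt-in behavior: client-side connect/read timeouts are also reported as TIMEOUT.
    static TimeoutPolicy includeClientTimeouts() {
        return new TimeoutPolicy(t -> t instanceof SocketTimeoutException);
    }

    // error is null when the command body completed normally.
    Outcome classify(Throwable error, boolean deadlineExceeded) {
        if (deadlineExceeded) return Outcome.TIMEOUT;
        if (error == null) return Outcome.SUCCESS;
        return reportAsTimeout.test(error) ? Outcome.TIMEOUT : Outcome.FAILURE;
    }
}
```

In Hystrix itself, per this thread and #931, the corresponding step is for the command's execution method to catch the client timeout and raise `HystrixTimeoutException` in its place.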

billyy · 2015-10-10T02:29:24Z

Big thanks!!!

mattrjacobs · 2015-10-14T18:23:46Z

Fixed in #931

billyy · 2015-10-15T16:17:28Z

When is the next official maven release?

mattrjacobs · 2015-10-15T16:25:02Z

This is released in 1.4.18: https://github.com/Netflix/Hystrix/releases/tag/v1.4.18

mattrjacobs mentioned this issue Oct 14, 2015

Make HystrixTimeoutException public so that user-defined execution methods may return it #931 (merged)

mattrjacobs closed this as completed Oct 14, 2015
