Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

Closed
Illapikov opened this issue Jul 20, 2015 · 10 comments
Closed

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

Illapikov opened this issue Jul 20, 2015 · 10 comments

Comments

@Illapikov
Copy link

Hello,

Agenda:
I have Hystrix command, which shouldn't allow more than 10 concurrent requests. At the same time, I want to add timeout to that requests.

Issue:
If I have property (executionTimeoutInMilliseconds) set up to 500 and request lasts 5000 - I will have execution lasts more than 5000. And only then request will be marked as timed out.

Question:
Is there any other ways to do all that (limit concurrent requests and have a timeout interruption at same time) ?

Thanks, Sergey

@mattrjacobs
Copy link
Contributor

In production at Netflix, almost all commands run using THREAD isolation: https://github.com/Netflix/Hystrix/wiki/Configuration#thread-or-semaphore.

This allows for both concurrent restrictions (via sizing the thread pool appropriately) and the ability to interrupt the thread on timeout. Then it's up to the implementation of the HystrixCommand to respect the interruption appropriately.

@rikcarve
Copy link

The above answer does not satisfy our needs. We do have the requirement of using semaphore and having a timeout set. But the timeout (as stated in the issue) does not interrupt the execute() command and therefore does not work for us. I'm wondering if this is something that is going to be implemented?

"Timeouts now apply to semaphore-isolated commands as well as thread-isolated commands. Before 1.4.x, semaphore-isolated commands could not timeout. They now have a timeout registered on another (HystrixTimer) thread, which triggers the timeout flow. If you use semaphore-isolated commands, they will now see timeouts"

This apparently isn't fully correct...

@mattrjacobs
Copy link
Contributor

@rikcarve If you're using a semaphore-isolated command, then the caller thread runs the HystrixCommand/HystrixObservableCommand. Since this thread is doing arbitrary work outside of Hystrix, interrupting it has consequences outside of Hystrix (which we don't want).

@mdjnewman
Copy link

@mattrjacobs that makes sense, but does this mean the paragraph that @rikcarve posted above (from the change log and migration guide) is incorrect?

I see in logs that the fallback is called after the execution timeoutInMilliseconds is reached, but commands that are triggered with .execute() don't return at that point. Does the above paragraph only refer to the fallback being called and the timeout being recorded in metrics calculation?

@mattrjacobs
Copy link
Contributor

I wrote a unit test that I believe demonstrates the case you're asking about here: mattrjacobs@de6a194.

In this unit test, the SEMAPHORE command sleeps for 2000ms, and has a timeout configured for 100ms. The test asserts that the timeout fires, the command hits fallback and returns the fallback result

@mdjnewman
Copy link

Thanks for looking into this Matt.

I think the command in that test is running with THREAD isolation - after the change in mdjnewman@83e42b3209052c1e29119e076484668b1ecbbbff, the test fails with

Unsuccessful Execution and fallback took : 2036

java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at com.netflix.hystrix.HystrixCommandTest.testSemaphoreExecutionWithTimeoutAndFallback(HystrixCommandTest.java:3552)
    ...

This makes sense given that Hystrix won't interrupt threads it doesn't control, but I think the confusion (or my confusion at least) was caused by the wording in the documentation.

Sorry if I've missed anything, I'm relatively new to the internals of Hystrix.

@mattrjacobs
Copy link
Contributor

Yeah, my mistake on the unit tests - thanks for pointing that out. The confusion lies in the distinction between HystrixCommand and HystrixObservableCommand.

In a HystrixCommand, a semaphore-isolated command runs directly on the caller thread, and so there is, as you correctly pointed out, no good way to interrupt that work without possibly disrupting other application-level work.

When that same HystrixCommand uses thread-isolation instead, that thread may be interrupted without any danger to the application threads, so a timeout attempts a forcible interrupt (which is configurable).

On the other hand, when a HystrixObservableCommand is used, an Observable is provided in the construct() method. This implies that the work if there is I/O work, it's already happening somewhere else (and not in the thread running the construct() method).

In this case, there's not much point to using thread-isolation (see #805 (comment)). When using semaphore-isolation here, it is OK to attempt to unsubscribe from the work being done. See HystrixObservableCommandTest.testSemaphoreIsolatedObserveTimeoutWithSuccessAndFallback for an example of how this works.

I agree that the docs don't have this level of nuance. I'll take a pass and get that cleaned up.

@billyy
Copy link

billyy commented Sep 7, 2016

Matt,

I just saw this thread and confirm my finding in a separate issue. If I would replay it, it means that if the timeout is set to 1 second and the call actually took 5 seconds. The time for the call to return is really 5 second, not 1 second. I understand what you stated about not wanting to interrupt the caller thread. So can we just simply state that there is no thread interrupt for semaphore?

@mattrjacobs
Copy link
Contributor

Yep - you got it :)

@bhuvangu
Copy link

If one want to interupt the user thread and want to use semaphore, then i am using Timer to register a timer task, which calls the interupt and if actual task gets finish i call the cancel on the returned TimerTask
is it a good idea? what can be the hidden caveats?

public TimerTask registerInterupt(Thread currentThread, int executionTimeoutInMilliseconds) {
        TimerTask tmerTask = new TimerTask() {
            @Override
            public void run() {
                if(!currentThread.isInterrupted()){
                    currentThread.interrupt();
                }
            }
        };
        timer.schedule(tmerTask, executionTimeoutInMilliseconds);
        return tmerTask;
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants