Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancellation tokens never firing #2026

Closed
dazbradbury opened this issue Apr 20, 2022 · 10 comments
Closed

Cancellation tokens never firing #2026

dazbradbury opened this issue Apr 20, 2022 · 10 comments

Comments

@dazbradbury
Copy link

dazbradbury commented Apr 20, 2022

Thanks for the brilliant library, as always!

We are currently running Hangfire version 1.7.24. We're facing an issue trying to propagate cancellation tokens throughout our solution, in that, they don't seem to be firing/working as intended. We are following instructions from here:

https://docs.hangfire.io/en/latest/background-methods/using-cancellation-tokens.html

We set up a super simple test case as follows:

public ActionResult AddLongRunningJobToTestCancellationTokens()
{
	_bjs.Enqueue(() => LongRunningJobToTestCancellationTokensAsync(CancellationToken.None));
	return Content("Job added to hangfire");
}

[BackgroundJob]
public static async Task LongRunningJobToTestCancellationTokensAsync(CancellationToken token)
{
	while (true)
	{
		token.ThrowIfCancellationRequested();

		// Wait 5 seconds
		await Task.Delay(5_000);
	}
}

However, no matter whether we kill the server using: server.Dispose();, or we delete the job via the dashboard, it seems the cancellation token is never called:

image

We have ShutdownTimeout/StopTimeout set to 30 seconds, and I can confirm the jobs are given 30 seconds before being forcibly shut down. We switched the implementation to use the legacy IJobCancellationToken, but that also doesn't seem to work for us either.

From the documentation:

Dedicated background process is watching for all the current background jobs that have a cancellation token parameter in a method and polls the storage to watch their current states. When state change is detected or when shutdown is requested, the corresponding cancellation token is canceled.

Is there a way of monitoring this background process to see if it's running?

@dazbradbury dazbradbury changed the title Cancellation tokens never firing, but IJobCancellationToken works fine Cancellation tokens never firing Apr 20, 2022
@odinserj
Copy link
Member

Thank you for the detailed information! Any chance you are using .NET Core / ASP.NET Core? In this case it is possible that .NET Core's host timeout (defaults to 15 seconds) is triggered before Hangfire's ones, that's why you don't get the notifications. You can configure it in the following way (set it to be higher than timeouts in Hangfire):

services.Configure<HostOptions>(
    opts => opts.ShutdownTimeout = TimeSpan.FromSeconds(15));

You can read this excellent article – https://andrewlock.net/extending-the-shutdown-timeout-setting-to-ensure-graceful-ihostedservice-shutdown/ – to understand the details.

P. S. By the way, I've seen really dangerous behavior of the while(true) { token.ThrowIf... statement, where delayed AppDomain unload caused infinite "Exception → new loop iteration" attempts without exiting the loop, but this only relates to .NET Framework and didn't occur in .NET Core.

while (!token.IsCancellationRequested)
{
    await Task.Delay(5000, token);
}
token.ThrowIfCancellationRequested();

@dazbradbury
Copy link
Author

dazbradbury commented Apr 28, 2022

Thanks @odinserj for the response - this example isn't actually running on .NET Core, it's running .NET Framework 4.8. We do have other code that runs on .NET Core though, so I'll bear that in mind!

Is there anything else we could be missing in the .NET Framework world?

I've update the loop behaviour also, thanks for pointing that out. I also tested using an IJobCancellationToken, which doesn't have an IsCancellationRequested property by the way, but that didn't work either.

As it stands, it just seems that no cancellation tokens are actually being cancelled. :-(

@odinserj
Copy link
Member

odinserj commented May 2, 2022

Hm, may I ask you a stupid question – why do you think that cancellation token isn't called? Actually there's a difference between the behavior – during shutdown the cancellation token is fired, but background job itself is being re-queued (so it will be executed again as in the first screenshot related to job transitions). And when background job is moved to another state during the execution (for example, deleted state), cancellation token is fired but for another reason. So in short, cancellation token trigger doesn't relate to state transitions of a background job.

Try adding some logic to ensure that cancellation token isn't triggered, because Dashboard UI can't say anything regarding the method execution.

public async Task LongRunningJobToTestCancellationTokensAsync(CancellationToken token)
{
    try
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();

            // Wait 5 seconds
            await Task.Delay(5_000, token);
        }
    }
    catch (Exception e)
    {
        _logger.LogError(e, "error");
        throw;
    }
}

@dazbradbury
Copy link
Author

OK, that's actually produced some interesting results. I'll explain our setup:

On Application Startup:

We call var server = new BackgroundJobServer(options), and keep a reference the the returned BackgroundJobServer

On Application Shutdown

  1. server.Dispose();
  2. Wait 25 seconds
  3. If server hasn't "disposed", then check what is holding things up using JobStorage.Current.GetMonitoringApi().ProcessingJobs(0, 10);
  4. Wait 5 seconds
  5. Call Stop-Process to kill the process forcibly

Our server is set up with the following timeouts:

var options = new BackgroundJobServerOptions()
{
	ShutdownTimeout = TimeSpan.FromSeconds(27),
	StopTimeout = TimeSpan.FromSeconds(27),
	CancellationCheckInterval = TimeSpan.FromSeconds(5), // Default value is apparently 5 seconds
};

In the logs we see:

  • Step 1 -> Step 3: takes 25 seconds and LongRunningJobToTestCancellationTokensAsync is in the processing jobs list
  • It takes 27 seconds for server.Dispose(); to return
  • We do log the OperationCanceledException, but this happens after server.Dispose(); has returned (around 27.5 seconds)

What I'm clearly not understanding is that I'd expect the cancellation token to throw as soon as we call server.Dispose(), that doesn't seem to be happening, and instead, only being cancelled after 27 seconds (the ShutdownTimeout/StopTimeout)?

As an aside - why doesn't the OperationCanceledException appear in the hangfire job details, the same as any other exception? Is that particular exception not displayed for any particular reason? Hiding the token cancellation event is of course up to you, it just makes it hard to see the order of events.

@odinserj
Copy link
Member

odinserj commented May 4, 2022

Thank you for all the details! To make cancellation tokens throw as soon as possible (or rather depending on CancellationCheckInterval) try setting the StopTimeout to TimeSpan.Zero – it's usually used to avoid cancellation token from being thrown just after shutdown is initiated, to give some time for a background job to be completed (to avoid the unnecessary retry).

As an aside - why doesn't the OperationCanceledException appear in the hangfire job details, the same as any other exception? Is that particular exception not displayed for any particular reason? Hiding the token cancellation event is of course up to you, it just makes it hard to see the order of events.

The reason is avoid giving false expectations – it's special cased only when shutdown token is called or when job's state was already changed. In the latter case we don't have a right to change it's state because it was changed externally, and in the former case there's a high chance that our server will be stopped and application will be stopped much earlier than state is updated in the database.

@dazbradbury
Copy link
Author

To make cancellation tokens throw as soon as possible (or rather depending on CancellationCheckInterval) try setting the StopTimeout to TimeSpan.Zero – it's usually used to avoid cancellation token from being thrown just after shutdown is initiated, to give some time for a background job to be completed (to avoid the unnecessary retry).

Thanks as always! I've done that, and the behaviour is now to throw the cancellation token straight away - great!

It's also important to know which jobs were forcibly shutdown (eg. token.ThrowIfCancellationRequested(); wasn't called within the allotted time). I had hoped to use:

JobStorage.Current.GetMonitoringApi().ProcessingJobs(0, 10);

But that seems to display jobs both where the cancellation token has been called, and where it hasn't been called. Is there a mechanism I can use to check, after say 25 seconds, what jobs have failed to be shutdown gracefully?

@odinserj
Copy link
Member

odinserj commented May 4, 2022

Great cancellation tokens are thrown now! If background job is able to acknowledge cancellation token (looks like as in your case), then you should get the following log message that includes background job id:

Worker stop requested while processing background job 'XXX'. It will be re-queued.

Can it be used in your case?

@dazbradbury
Copy link
Author

Thanks @odinserj! Sadly I think we're going to need slightly more granular control though.

We have some jobs that we definitely require graceful shutdown for (and thus want to be alerted that they need looking into), this would be the default. However, we have other jobs where a forced shutdown is acceptable/expected due to their nature, and so we'd want to ignore any forced shutdowns on those particular jobs. These jobs are idempotent, with slow-running sections that rely on external resources (so controlling with a cancellation token is particularly complex, and adds little value).

Maybe we can wrap the Hangfire logging to achieve this though? That does seem a bit hacky.

Being able to obtain a list of actually running jobs would be most ideal though, where jobs that have thrown the OperationCanceledException don't appear, or are able to be filtered. If that's possible, it would be really powerful for us.

@dazbradbury
Copy link
Author

@odinserj - OK, what I've done is to give server.Dispose(); time to complete. If it doesn't, will then alert with any jobs that are "running", but refer to the hangfire log messages for the exact job that was forcibly killed. Whilst not quite as neat, it should work!

Thanks for your help, I'll just run some tests to confirm and then get this issue closed off. Hopefully it helps any others looking into the nuances of cancellation tokens. :-)

@dazbradbury
Copy link
Author

I've created a new issue for the more nuanced question around finding/filtering cancelled jobs - and can close this off.

For future readers, the problem was not fully understanding the StopTimeout, which was delaying any cancellation tokens being fired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants