
Hangfire Job starts multiple times--Same issue is experienced as #590 #1025

Closed
YukaAn opened this issue Oct 17, 2017 · 58 comments
@YukaAn

YukaAn commented Oct 17, 2017

I have gone through all the open issues here and found that the issue I'm experiencing was supposed to be solved in v1.5.8. But I'm running v1.6.6 and still seeing a similar issue: the same job is processed multiple times at random. I also saw issue #842 describing the same thing. Can someone help me fix it?

I'm using Hangfire.SqlServer V1.6.6

@YukaAn (Author)

YukaAn commented Oct 17, 2017

some instances of this issue:

[screenshot: multiple Processing entries for the same job]

@odinserj (Member)

@YukaAn, what storage are you using? Could you include the full details, e.g. the data column?

@YukaAn (Author)

YukaAn commented Oct 18, 2017

@odinserj I'm using SQL Server 2008, and the Hangfire version is 1.6.6. Below is a screenshot of a job's state info:

[screenshot: job state details]

@odinserj (Member)

Could you show me your configuration logic and recurring method's signature?

@YukaAn (Author)

YukaAn commented Oct 18, 2017

@odinserj Thank you for the quick response!

The configuration logic looks like this:

public void Configuration(IAppBuilder app)
{
    GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireDb");
    app.UseHangfireDashboard();
    app.UseHangfireServer();
}

The recurring method looks like this:

public void CreateRecurringJob(int hour, int minute, int Id, string Name, string occurence)
{
    try
    {
        if (!MinuteCheck(minute) || !HourCheck(hour) || !CronCheck(occurence))
        {
            return;
        }

        string cron = BuildCron(hour, minute, occurence);

        if (IsExistingOrNewMethod(Id, Name))
        {
            ScheduledJobHandler handler = new ScheduledJobHandler();
            RecurringJob.AddOrUpdate(
                JobNameBuilder(Id, Name),
                () => handler.SendRequest(Id, Name),
                cron,
                TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time")
            );
        }
    }
    catch (Exception ex)
    {
        throw new ApiException(ex);
    }
}

@YukaAn (Author)

YukaAn commented Oct 20, 2017

@odinserj I have around 70 recurring jobs scheduled each day and this issue keeps happening couple times every day (randomly on different jobs). I'm still waiting for your reply and I appreciate your help. Thanks!

@odinserj (Member)

odinserj commented Nov 7, 2017

@YukaAn, sorry for the delay. Try to upgrade to the latest version. At least Hangfire.Core 1.6.12 has a fix related to a problem like yours:

• Fixed – Buggy state filters may cause background job to be infinitely retried.

It looks like there's a transient exception that occurs when your job is completed, and only logging can help investigate the issue in detail. Please see this article to learn how to enable it, and feel free to post your log messages in this thread for further investigation.
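For readers following along, here is a minimal sketch of enabling Hangfire's internal logging as suggested above. The `UseColouredConsoleLogProvider` call ships with Hangfire.Core; the storage name and OWIN startup shape simply mirror the configuration quoted earlier in this thread, and in production you would more likely plug in an NLog/Serilog provider or a custom `ILogProvider` instead:

```csharp
using Hangfire;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        GlobalConfiguration.Configuration
            // Writes Hangfire's worker/state-change messages to the console,
            // which is usually enough to see why a job was re-fetched.
            .UseColouredConsoleLogProvider()
            .UseSqlServerStorage("HangfireDb");

        app.UseHangfireDashboard();
        app.UseHangfireServer();
    }
}
```

With a real logging framework attached, the interesting messages are the warnings emitted when a worker's heartbeat or a state transition fails, since those are what precede a duplicate fetch.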

@pinual

pinual commented Aug 15, 2018

I am having this same issue on Hangfire 1.6.20 using LiteDBStorage. I have seen several other reports of this issue but no resolution. Are you still using a workaround?

@mattxo

mattxo commented Aug 16, 2018

Same issue here. Hangfire 1.6.20 and Hangfire.SQLite 1.4.2

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
  ...
  RecurringJob.AddOrUpdate("debug", () => Hangfire(), Cron.Minutely);
}

public void Hangfire()
{
   Debug.WriteLine($"{DateTime.Now} - Hangfire");
}

16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire

@marius-stanescu-archive360

marius-stanescu-archive360 commented Sep 5, 2018

Hello. I have the same issue.

The job starts for every worker. If I set the server worker count to 2 then it starts 2 times, if I set it to 50 then it starts 50 times.

I use the latest Hangfire version (1.6.20) and SQLite for storage.

The job is enqueued from the web application.

BackgroundJob.Enqueue(() => StartDatabaseExport(databaseId));

The server is started from another application (windows service).

var options = new BackgroundJobServerOptions { WorkerCount = 50 };
new BackgroundJobServer(options);

[screenshot: the same job in Processing state on multiple workers]

Any ideas?

@marius-stanescu-archive360

I also tried with the LiteDB storage, same problem. I then tried with the in-memory storage and it works as expected. So it seems it's related to the storage.

@sheburdos

Hangfire 1.6.17.0
MemoryStorage: 1.5.1.0

Tasks are created as:

IState state = new EnqueuedState(QueueName.PRIORITY);
_jobClient.Create(() => TaskFactory.Build(id), state);

Sometimes jobs run multiple times. Log from within the task:

2018-09-12 11:45:43.7292|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
2018-09-12 12:16:14.7399|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
...
2018-09-12 12:27:51.2235|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!
...
2018-09-12 12:51:39.4573|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!

The job accesses a database, so it could get stuck for some time if all DB connections in the pool are taken by other jobs. I assume that's what happened at the beginning. Then the task was run again, but there were no reports of a retry or error. After that, the task actually reported successful completion twice (as well as the intermediate steps). It feels like the job was retried after some waiting period without cancelling the previous execution.

@pauldotknopf

Although your messages should be idempotent, this should definitely be fixed in Hangfire.

Is this issue in Hangfire itself, or in the storage providers?

@lukaszgatnicki

Hi,
We have the same problem.
We tried to find a solution, and after a long time we probably found something.
The problem is associated with the storage beyond any doubt. Multiple workers run the same job when you use LiteDB, SQLite and similar storages. Everything is OK with SQL Server. So if it was possible to fix the error in the SQL Server storage, it should be possible in the others. So this is my shy request to the creators.

@pauldotknopf

I think I found the issue in the Sqlite provider: mobydi/Hangfire.Sqlite#2 (comment)

Maybe this issue is something similar for the SQL Server provider as well?

@Neonkiller

Neonkiller commented Dec 12, 2018

Experiencing the same issue. We use SQL Server as a storage (if it matters).
Any update on when it could be fixed, or at least whether the root cause is known?
[screenshot: three Processing entries for the same job]

@srusakov

srusakov commented Dec 20, 2018

Experiencing the same issue. LiteDb as storage. As a temporary solution I set WorkerCount to 1.
@odinserj Is the Pro version free of this bug?

@sheburdos

sheburdos commented Dec 21, 2018

In my case some workaround was to set extended intervals for MemoryStorage

MemoryStorageOptions storageOpts = new MemoryStorageOptions()
{
    JobExpirationCheckInterval =  TimeSpan.FromMinutes(120),
    FetchNextJobTimeout = TimeSpan.FromMinutes(120)
};
GlobalConfiguration.Configuration.UseMemoryStorage(storageOpts);

But that only works as long as the task can be completed within the 2-hour time span. In my case I can be sure that at least most of the tasks will be completed.

@wtfuii

wtfuii commented Jan 9, 2019

Hey, it seems as if #1197 is about the same issue. Everybody who runs into this issue might want to check it out.

@dgioulakis

I'm prototyping with Hangfire and MemoryStorage and seeing my job being executed multiple times. Something as simple as the following:

_jobId = _jobClient.Enqueue<MyJobPerformer>(mjp => mjp.Perform(request));
public async Task Perform(RequestBase request)
{
    await Task.Delay(TimeSpan.FromSeconds(5));
    await Task.Delay(TimeSpan.FromSeconds(5));
    await Task.Delay(TimeSpan.FromSeconds(5));
}

config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) }); does not seem to be a valid solution for this problem. The default TimeSpan for FetchNextJobTimeout is 30 minutes. I'm seeing multiple calls to execute my job on concurrent workers within seconds. Does anyone have a solution to this issue?

@pauldotknopf

This is a very big issue. I've seen this with the SQLite and Postgres storage.

I haven't seen this with the in-memory provider, likely because it has distributed locking implemented properly.

@odinserj, there likely needs to be clearer documentation to storage authors about how distributed locks should be implemented to prevent multiple tasks from being executed.
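To make the requirement concrete: a storage's fetch operation must claim a job in a single atomic statement, so that two workers can never take the same row. The following is a hypothetical SQL Server sketch of that idea only (table and column names are invented, and this is not Hangfire's actual implementation, which also has to support requeueing after a worker crash):

```csharp
using System.Data.SqlClient;

static class AtomicFetchSketch
{
    // Claims and returns one queued job in a single statement. OUTPUT makes
    // the delete and the read atomic, and READPAST lets concurrent workers
    // skip rows that another transaction is currently claiming.
    public static long? FetchNextJob(string connectionString, string queue)
    {
        const string sql = @"
            DELETE TOP (1) FROM JobQueue WITH (ROWLOCK, READPAST)
            OUTPUT DELETED.JobId
            WHERE Queue = @queue;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@queue", queue);
            connection.Open();
            var result = command.ExecuteScalar();
            return result == null ? (long?)null : (long)result;
        }
    }
}
```

A storage that instead does a SELECT followed by a separate UPDATE/DELETE has a race window between the two statements, which is exactly the duplicate-execution pattern reported throughout this thread.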

@dgioulakis

dgioulakis commented Mar 17, 2019

@pauldotknopf
I'm not sure what's going on. I'm using the memory storage provider and have set breakpoints on each await Task.Delay shown above. All breakpoints are hit multiple times by several workers, yet only one job has been enqueued.

@pauldotknopf

Hmm, that seems like an easy repro.

Considering that this issue has been open since 2017, someone (not the maintainers) will likely have to debug, fix, and contribute a PR.

@dgioulakis

config.UseMemoryStorage(
    new MemoryStorageOptions
    {
        FetchNextJobTimeout = TimeSpan.FromSeconds(10)
    });
public class MyJobPerformer
{
    private readonly string _performerId;
    public MyJobPerformer()
    {
        _performerId = Guid.NewGuid().ToString("N");
    }

    public async Task Perform(RequestBase request)
    {
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
    }
}

Results

69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:20 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:25 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:30 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:32 PM <<< Duplicate job execution! 10 seconds elapsed.
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:35 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:37 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:42 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:47 PM

One job enqueue results in multiple worker executions. It's even worse if FetchNextJobTimeout is reduced further. I'm not sure whether this is a problem with Hangfire, MemoryStorage, or both.

@rajanagayya

RecurringJob.AddOrUpdate(recurringJobId, () => EmailReceiveService.SendMail(parametes),cronExpression);

@pauldotknopf

Hey @odinserj, any update on this issue?

@jmalStorm

+1.
Having same issue. SQLite storage.

@bchornii

+1.
Having the same issue with MSSQL Server storage.

@rafaelboschini

Having same issue. OMG!!!!!!

@cnayan

cnayan commented Mar 3, 2020

I use ASP.Net Core, 3.1

OMG! I am having this same issue. I observe:

  • HF creates 2 servers for me
  • One server has my options, other has default
  • I saw my job running twice!

So, putting two and two together, I figured the job executed twice because there are two servers. That means I need to get rid of the second server with the default options.

I downloaded the HF code and placed breakpoints to analyze the flow. Here are my findings:

  • If you want HF to honor your BackgroundJobServerOptions, register a singleton instance of the object in the service collection rather than passing it to AddHangfireServer or UseHangfireServer. Example:

services.AddSingleton(new BackgroundJobServerOptions
{
    WorkerCount = 1,
    ServerName = "TaskSvcHangfireServer"
});

After this, I still had 2 servers, but both using my BackgroundJobServerOptions. Halfway there!

After debugging few times, I found this:

  • I had services.AddHangfireServer(); in ConfigureServices
  • I had app.UseHangfireServer(); in Configure

Servers are created by both (CreateBackgroundJobServerHostedService > BackgroundProcessingServer > BackgroundDispatcher > BackgroundServerProcess.Execute > CreateServer), unlike other libraries, where the Add* call should configure and the Use* call should create instances from that configuration (I think).

Then I checked the documentation, and they did not specify to use both. I removed services.AddHangfireServer() and now I have only 1 server.

So, I tested the execution of a job. This time, it executed only once.

Lessons learnt:

  1. Inject BackgroundJobServerOptions in services. Prefer this if you want only one option to prevail.
  2. Don't use AddHangfireServer and UseHangfireServer together, as they do a similar job: create server instances. (IMHO, this should be considered a bug.)
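The setup the lessons above converge on can be sketched for ASP.NET Core roughly as follows. This is an editorial sketch, not cnayan's exact code; the connection string name is hypothetical, and the key point is that exactly one of the two server-creation calls is used:

```csharp
using Hangfire;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHangfire(config => config.UseSqlServerStorage("HangfireDb"));

        // Register exactly ONE server, with explicit options, via the
        // IHostedService-based AddHangfireServer overload...
        services.AddHangfireServer(options =>
        {
            options.WorkerCount = 1;
            options.ServerName = "TaskSvcHangfireServer";
        });
    }

    public void Configure(IApplicationBuilder app)
    {
        // ...and do NOT also call app.UseHangfireServer() here, which would
        // start a second server against the same storage.
        app.UseHangfireDashboard();
    }
}
```

With two servers polling the same storage, each recurring trigger can be picked up by either one, which is why removing the duplicate registration made the job run once.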

@cnayan

cnayan commented Mar 4, 2020

Sigh! I'm still seeing multiple "processing" jobs with the same job ID.

[screenshot: multiple Processing entries with the same job ID]

No idea why.

BTW: My service is hosted in IIS.

@pauldotknopf

If you want this issue fixed, you will have to do it yourself.

@cnayan

cnayan commented Mar 4, 2020

Well, I have now placed some checks in my JobWorker to avoid multiple execution requests for a job with the same ID. This hack helps.

@pieceofsummer (Contributor)

@cnayan,

Don't use AddHangfireServer and UseHangfireServer together, as it seems they do similar job - create server instances.

Indeed. The former uses IHostedService-based implementation (available only in netstandard2.0 and later) while the latter uses the more generic approach. Technically, you can have as many servers running as you want, so it shouldn't be considered a bug. But it is definitely something worth noting.

As for processing the job multiple times, it is clearly an IIS misconfiguration. From the screenshot you can see that all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".

Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See IBackgroundJobClient.ContinueJobWith() extension method.

Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.
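A minimal sketch of the cancellation-token advice above, using Hangfire's IJobCancellationToken (the class and method names other than the token are hypothetical):

```csharp
using Hangfire;

public class ExportJob
{
    // Hangfire injects the token automatically when the job method declares
    // it; callers invoking the method directly (e.g. in tests) can pass
    // JobCancellationToken.Null.
    public void Run(int databaseId, IJobCancellationToken cancellationToken)
    {
        foreach (var batch in LoadBatches(databaseId))
        {
            // Throws when the server shuts down or the job is deleted from
            // the dashboard, so the worker stops cleanly at a batch boundary
            // instead of being killed mid-write.
            cancellationToken.ThrowIfCancellationRequested();
            ProcessBatch(batch);
        }
    }

    private System.Collections.Generic.IEnumerable<int> LoadBatches(int databaseId)
    {
        yield break; // placeholder for real batch loading
    }

    private void ProcessBatch(int batch) { /* placeholder */ }
}
```

Checking the token at each loop iteration is what makes the job safely re-runnable after a restart, which matters given the duplicate-start behavior discussed in this thread.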

@cnayan

cnayan commented Mar 5, 2020

@pieceofsummer
Thank you for guidance.

My problem with the Hangfire documentation is that there are no clear sections focusing on OWIN vs. ASP.NET Core. You may disagree, but that is how I read it. I am probably spoiled by the MSDN docs.

So, your precise advice is of great help to me.

As for processing the job multiple times, it is clearly an IIS misconfiguration. From the screenshot you can see that all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".

After reading Making ASP.NET Core application always running on IIS, and understanding the screenshots, I've configured the IIS. But no code changes have been done to ASP.Net Core app.

Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See IBackgroundJobClient.ContinueJobWith() extension method.

I have a simple function to execute, so it cannot be split. This point is probably not for me.

Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.

Good point. But I execute a console app via the job, and I'm happy that it does not get killed. Still, your point makes sense: abort the job when requested.

Thanks again!

@ZheMann

ZheMann commented Sep 3, 2020

I started implementing HangFire with SQLite storage a couple of days ago and ran into the same problem: enqueued jobs were executed as many times as I had workers initialized (20 by default). I found the solution by replacing the SQLite storage with the HangFire.LiteDB storage. The release notes specifically mention 'Fix Hangfire Job starts multiple times', so I thought I'd give it a try. It turns out they indeed solved the problem; my jobs are finally executed only once, so I don't need hacky workarounds anymore.

So, unless you really need SQLite as storage, I'd suggest switching to HangFire.LiteDb.

Example code:

var hangFireDb = @"D:\hangfire.db";
GlobalConfiguration.Configuration.UseLiteDbStorage(hangFireDb);
GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 3 });
app.UseHangfireDashboard();
app.UseHangfireServer();

Cheers!

@aria321

aria321 commented Oct 4, 2020

Same problem for me. I think this depends on a server reset: when I recycle the IIS app pool manually, the same jobs are created again when the app pool starts up,

[screenshot: duplicated jobs]

and every time the IIS app pool restarts, the duplicated jobs increase more and more.

I accidentally resolved this just by changing the signature of the method. My method was:

public virtual void Do(string title, T order, PerformContext context)

I was looking to handle jobs deleted in the Dashboard and needed to stop them (deleted jobs just move to Deleted and keep working even after deletion), so based on https://discuss.hangfire.io/t/deleting-job-with-onstateelection-cancellation-token/2602/5 I changed the signature to:

public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)

and call cancellationToken?.ThrowIfCancellationRequested(); in the method body. If a user deletes the job manually in the Dashboard, the cancellation token raises the exception and the job is really cancelled. The interesting thing is that, while debugging with a breakpoint in the method, Do was called multiple times when I restarted the server, but all of those invocations were cancelled except one (I don't know why; maybe HF sets the cancellation token). In other words, the cancellation token was raised for all except one.

Now I see no more duplicated jobs after restarting the IIS app pool, just because of ThrowIfCancellationRequested().

@odinserj (Member)

odinserj commented Oct 5, 2020

What storage are you using? If it's a community-based storage, it's possible that the FetchNextJob method wasn't implemented in an atomic way, making it possible for multiple workers to pick up the same job. Please check the repository of the concrete storage implementation and report the issue there.

@aria321

aria321 commented Oct 5, 2020

I am using Redis Storage like :

GlobalConfiguration.Configuration.UseRedisStorage("localhost",
    new Hangfire.Pro.Redis.RedisStorageOptions()
    {
        InvisibilityTimeout = TimeSpan.MaxValue,
        Database = 1,
        Prefix = "hangfire:reclaim:",
    }).UseConsole();
WebApp.Start<MARCO.Reclaim.Core.Startup>(address);
WebApp.Start<MARCO.Reclaim.Core.Startup>(address);

and Startup Configuration method:

appBuilder.UseWebApi(config); 
appBuilder.UseHangfireDashboard("", new DashboardOptions()); 
appBuilder.UseHangfireServer(new BackgroundJobServerOptions
{
    ServerName = $"sendbulk",
    WorkerCount = 100,
    Queues = new[] { "sendbulk" }
});

the signature of method and it's override is like:
public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)

override in derived class:

[DisplayName("{0}")]
[Queue("sendbulk")]
[AutomaticRetry(Attempts = 5, DelaysInSeconds = new int[] { 60, 60 * 3, 60 * 3 * 3, 60 * 3 * 3 * 3, 60 * 3 * 3 * 3 * 3 })]
public override void Do(string title, SendBulkOrder order, PerformContext context, IJobCancellationToken cancellationToken)

and Startup is called from a single-instance WCF service under the IIS app pool.

@odinserj (Member)

odinserj commented Oct 5, 2020 via email

Because almost every problem ends up either as "Enqueued jobs stuck" or "Job starts multiple times", and different problems with different storages were reported into the same issue on GitHub. I'm really sorry you're having so much trouble. Please try to run everything with Hangfire.SqlServer, Hangfire.Pro.Redis or Hangfire.InMemory: these storages are supported in this repository, while other storages are supported by the community in their own repositories.

@giovinazzo-kevin

Sorry for my shitty reply. I was having a bad day at work, but I realize that I shouldn't bring my personal problems into these spaces.

I'll just delete the old comment, it wasn't appropriate of me. Sorry again.

@navidyazdi

no solution yet?

@dchrno

dchrno commented Nov 7, 2021

I'm experiencing the same issue. Hangfire 1.7.9 with Hangfire.InMemory. A recurring task configured with Cron.Daily(0, 30). We specify our local timezone when invoking RecurringJob.AddOrUpdate.

This job is a long-running task which runs until 5 am. It was started as expected at 00:30, and then started a second time at 01:00 the same night while the first instance was running.

I have checked that the process didn't restart between 00:30 and 01:00.

Update 16.11.2021:
Here's an example where the same job was triggered several times at 30-minute intervals. The instance triggered at 23:30 completed at 02:20, so it appears Hangfire keeps starting the job while it's already running but not yet completed.

2021-11-14 23:30:06
2021-11-15 00:00:06
2021-11-15 00:30:06
2021-11-15 01:00:06
2021-11-15 01:30:06
2021-11-15 02:00:06

@devenpatel30

This can happen if you have multiple servers (apps) using the same storage. Configure Hangfire to use a different database schema for each application.

@dchrno

dchrno commented Feb 16, 2022

We have reproduced this problem on a single server configured to use the default memory storage.

@devenpatel30

Do you flush storage each time you deploy app change?

@dchrno

dchrno commented Feb 16, 2022

Flush the memory storage when the application changes? MemoryStorage is in-process and cleared when we restart the process after binary updates.

@odinserj (Member)

Try using this filter to avoid scheduling a new recurring job execution when previous one is still running – https://gist.github.com/odinserj/a6ad7ba6686076c9b9b2e03fcf6bf74e.
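Beyond the linked gist, Hangfire also ships a built-in DisableConcurrentExecution filter that takes a distributed lock around the method for the duration of its execution, so a second invocation waits (up to the timeout) instead of running concurrently. A minimal sketch (class and method names are hypothetical; note the filter depends on the storage implementing distributed locks correctly, which is exactly what some community storages in this thread get wrong):

```csharp
using Hangfire;

public class NightlyExportJob
{
    // Takes a storage-level distributed lock keyed on the method, so only
    // one execution can run at a time; a second trigger blocks for up to
    // 60 seconds and then fails with a timeout instead of overlapping.
    [DisableConcurrentExecution(timeoutInSeconds: 60)]
    public void Run()
    {
        // long-running work here
    }
}
```

For recurring jobs that should be skipped rather than queued behind the lock, the gist odinserj links is the better fit, since a timed-out lock acquisition would otherwise count as a failed execution.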

@CrommVardek

@dchrno : Did you fix this issue ? How ? We are facing a very similar (if not identical) issue with a recurring background job...

@dchrno

dchrno commented Feb 23, 2023

Unfortunately, no. We ended up adding a mutex per task as a workaround.

Edit: we have a base class containing this code. "key" is the type name of the recurring task.

            if (!_semaphore[key].Wait(100)) // try to aquire mutex
            {
                log.Debug("{description}: job is already running", description);
                return;
            }
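A self-contained sketch of the workaround described in the fragment above: one SemaphoreSlim per recurring-task type, so an overlapping trigger returns immediately instead of running a second copy. The class and member names are illustrative, not dchrno's actual base class:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class RecurringTaskGuard
{
    // One semaphore per task key ("key" being e.g. the task's type name).
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Semaphores =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static void RunExclusive(string key, Action work)
    {
        var semaphore = Semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));

        if (!semaphore.Wait(100)) // try briefly to acquire the mutex
        {
            // Another instance of this task is already running; skip this one.
            return;
        }

        try
        {
            work();
        }
        finally
        {
            semaphore.Release(); // always release, even if the task throws
        }
    }
}
```

Worth noting: this guards only within a single process. If multiple Hangfire servers share the same storage, a storage-level distributed lock (as the filter linked above uses) is still needed.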

@CrommVardek

@dchrno : Thanks for the feedback

@haga2112

haga2112 commented Apr 14, 2023

This issue still happens in version 1.7.25 and memory storage 1.4.

Edit: I was using Hangfire.MemoryStorage, which is NOT the ideal package (its creators also said it is not good for production purposes). I updated my main package to Hangfire.Core 1.8.5 and started using the Hangfire.InMemory package from @odinserj. But I still had to implement a mutex in C# to handle this multiple-starts scenario.

@laki889

laki889 commented Oct 5, 2023

This is still an issue for version 1.7.28 using the MS SQL database as storage

@panossant

This is still an issue with the latest version as of today, paired with PostgreSQL. In our case we have a machine-learning task that can run for days. Those long-running jobs start running again after 48 hours, always at midnight. Even if we fire the job at, say, noon two days earlier, after two midnights pass it starts again and runs concurrently with its previous incarnation. We tried removing the schedule and firing the jobs from the dashboard by hand; it still behaves the same.
