Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large number of failed jobs after upgrading to ActiveRecord4 #1047

Closed
deckchairhq opened this issue Jul 8, 2013 · 28 comments
Closed

Large number of failed jobs after upgrading to ActiveRecord4 #1047

deckchairhq opened this issue Jul 8, 2013 · 28 comments

Comments

@deckchairhq
Copy link

I'm looking into a potential issue with ActiveRecord4 and Sidekiq.

Since upgrading to AR4, my failure rate shot through the roof - from under 100 failures in 2 weeks to 1000 failures in the first hour after upgrading.

The error presented is this:

ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.001 seconds)

I have an ActiveRecord connection pool of 20. The first 20 jobs go through fine and I can see my connection pool growing, however when it hits 20, the errors start occurring.

It appears that either the connections aren't getting released or checked out appropriately.

If I replace the following line from the ActiveRecord Sidekiq middleware:

::ActiveRecord::Base.clear_active_connections! if defined?(::ActiveRecord)

...with a full disconnect:

::ActiveRecord::Base.clear_all_connections!

All works fine again - though obviously very inefficient.

I'm using a barebones ruby script for my worker that connects to ActiveRecord without rails:

      ::ActiveRecord::Base.establish_connection(
        :adapter  => configuration[:adapter],
        :host     => configuration[:host],
        :username => configuration[:username],
        :password => configuration[:password],
        :database => configuration[:database],
        :pool     => configuration[:pool],
        :port     => configuration[:port],
        :reaping_frequency => 10
      )

      self[:activerecord_cache] = ::ActiveSupport::Cache::MemoryStore.new

It's all very simple and the queue was running fine until now.

I'll plug away at this a bit more and if no luck, will get a skeleton repo up to reproduce it. Any input is appreciated.

@mperham
Copy link
Collaborator

mperham commented Jul 8, 2013

You'll probably need +1 connection for the main thread which boots the system. You don't state your Sidekiq version but I assume 2.12.x. Try doubling your pool to 41 and see if that helps. I've noticed some oddness where sometimes 2x connections are needed. I still don't understand the cause.

@deckchairhq
Copy link
Author

Ah sorry, should of provided more information..

My DB can accept 500 connections.
I'm currently testing with 1 set as the concurrency level and still see the issue.

@mperham
Copy link
Collaborator

mperham commented Jul 8, 2013

Are you spinning off threads in your worker, to do work in parallel?

@deckchairhq
Copy link
Author

In a few of my jobs I do create a thread or two.

@mperham
Copy link
Collaborator

mperham commented Jul 8, 2013

That'll do it - if they touch ActiveRecord, they'll check out a connection and never return it until stale connections are reaped.

@deckchairhq
Copy link
Author

Ah that sounds like the culprit! New AR4 feature I presume?

Best course of action is going to be cleaning up in the thread I take it - is it worth adding any logic to the middleware to help cleanup in such circumstances?

@mperham
Copy link
Collaborator

mperham commented Jul 8, 2013

It shouldn't be new.

Best course is to manually check out a connection for your thread's use:

Thread.new do
  ActiveRecord::Base.connection_pool.with_connection do |conn|
    MyModel.find(123)
  end
end

The find will use the connection you checked out because it's already reserved for the thread. When the with_connection block finishes, the connection will be auto-released.

@mperham mperham closed this as completed Jul 8, 2013
@mperham
Copy link
Collaborator

mperham commented Jul 8, 2013

Maybe clear_active_connections! used to clear stale connections in Rails 3 and doesn't anymore in Rails 4?

@deckchairhq
Copy link
Author

That sounds like it could be it, I'll have a look this AM, but it did start immediately after I updated.

@deckchairhq
Copy link
Author

Just to confirm, wrapping in a block as mentioned above works a treat.. It appears that behaviour has changed for the clear_active_connections! method in the latest version, consider my wrist slapped for being so flippant with AR in Threads.

@aurels
Copy link

aurels commented Nov 6, 2013

Hello,

We had the same issue on Rails 4.0.0. Migrating to 4.0.1 fixes the issue for us.

https://github.com/rails/rails/blob/v4.0.1/activerecord/CHANGELOG.md

At the end of the CHANGELOG :

Connections must be closed at the end of a thread. If not, your connection pool can fill and an exception will be raised.

Aaron Patterson

@rubytastic
Copy link

Also having this issue, also on 4.0.1

If interpret correctly the suggestion is to use

Thread.new do
  ActiveRecord::Base.connection_pool.with_connection do |conn|
    MyModel.find(123)
  end
end

whenever one needs a database connection from inside the worker right?
With many queries this seems cybersome

How does this relate to loading an module into the worker like:

  require 'games/general'
  include Games::General

These modules hold methods that connect to the database (instead directly from the worker).
Would one need to update these with the suggested block for every database query ? confused
It seems so since I am suffering the same issue mentioned in this thread.

Or is there a way to have initialiser and set this behaviour globally

Thanks in advanche for any suggestions regarding this matter.

@mperham
Copy link
Collaborator

mperham commented Dec 2, 2013

@rubytastic The advice is if you create a Thread manually within a Worker, use with_connection inside that thread. 95% of people using Sidekiq don't create their own threads so this advice should be very rare and not needed for most people.

@rubytastic
Copy link

@mike so I misinterpreted above. Seems still the issue is caused by active record 4.0.1 regarding not closing the db connections. Im on 4.0.1 and the problem there persists. Do you have a recommendation regarding a solution?

@x3qt
Copy link

x3qt commented Dec 4, 2013

Facing this issue with rails 4.0.1 and sidekiq 2.17.0.

@rubytastic
Copy link

@x3qt , @mperham
Perhaps this is the issue and not sidekiq problem:
rails/rails#12867

I have tried both with and without sidekiq running.
Seems the issue persist in both cases and is related to active record (?)

@rubytastic
Copy link

After switching to rails 4.0.2 and msyql with msyql2 gem, hoped the issues would be resolved but they persist.
It seems to be a problem with active record then but not sure, will try to downgrade sidekiq also or disable it to check if the problem occurs then

@rubytastic
Copy link

Any updates on this?
Problem seems still persistent. Tried updating sidekiq to latest.

@mperham
Copy link
Collaborator

mperham commented Jan 4, 2014

There's nothing Sidekiq can do. This is a Rails issue.

darwin added a commit to binaryage/discourse-docker that referenced this issue Feb 25, 2014
@janetruluck
Copy link

Sorry to resurrect a long since closed issue but figured I would mention this since it caused me a headache for a few days...

@rubytastic I am not sure how you are using ActiveRecord (i.e. within rails or by itself) but one thing to to look out for is load order when requiring Sidekiq and ActiveRecord. I came across this issue recently while using ActiveRecord outside of Rails and I was (quite stupidly...) requiring Sidekiq before ActiveRecord itself. Because of this the middleware in Sidekiq was obviously not picking up ActiveRecord was defined and in turn not including the middleware to close the connections to the DB properly. A pretty silly mistake but it was not very apparent until I manually copied the middleware itself and loaded it via an initializer that I realized it was not being picked up. I had just assumed that it was some issue with AR4 since it was working with AR3....doh

It is odd that this issue seemed to coincide with an upgrade to Active Record 4 for myself as well since the load order in my own app would not change due to that upgrade. But I have not been able to track down why yet.

Also, you can pretty easily check which middleware is being included with:

# initializers/sidekiq.rb
Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
        p chain
    end
end 

And you should see the active record middleware within the chain. Hope this helps!

tldr: Make sure the middleware is being included haha

@pond
Copy link

pond commented Aug 20, 2014

This is still an issue and still broken as far as I can see. The Rails bug report related to this is wholly depressing as clearly deeper problems exist that nobody's addressing and it's since been locked, so impossible to contribute towards. The very clearly stated problem and suggested solutions have apparently not been implemented in favour of a couple of unrelated patches that AFAICS don't address the issue at all.

As far as Sidekiq goes, we have a very experimental hack in production right now just to "tide us over" - in practice, for months! - which works around the issue. The failure to process the queue in our case springs from Rails 4 ActiveRecord's "thread safety" actually being just a grotty hack of a giant semaphore around the entire system. Rails 4's PG adapter's "is connection active?" check is using blocking I/O, so when the combination of problems that causes Sidekiq threads to stall occurs, it's usually because there's no response in the "active?" check and that blocks indefinitely, locking the whole of ActiveRecord for everyone (in that execution instance). There are wider questions about the underlying reason for that call to block which are beyond the scope of this comment.

https://gist.github.com/pond/5eaec0c30c0b4477f234

Once again, this is experimental and papers over the cracks, but it's kept us going in prod. Use with great caution and read the comments at the top carefully before trying it out. You almost certainly need to have the ActiveRecord pool reaper running in conjunction with this.

It was developed while working for Loyalty New Zealand, my current employer, but is posted with permission and can be considered as available for any use (MIT/BSD/whatever kind of imaginary licence floats your boat!).

@Silex
Copy link

Silex commented Jul 11, 2017

I'm a bit confused about wether we should call ActiveRecord::Base.connection_pool.with_connection in our workers or not, if we do not create threads in them (just use the ActiveRecord models).

Can someone clarify?

@mperham
Copy link
Collaborator

mperham commented Jul 11, 2017 via email

@Silex
Copy link

Silex commented Jul 11, 2017

(I just realised that editing answers on github does not work well with email workflows because you only receive the first text that was submitted. I thus resubmit the whole text and delete my previous reply)


@mperham: thanks! What happens if we have more concurrent workers than available database pool size tho?

My current pool size is 100, I run rails 5. At the moment I never have more than 80 concurrent jobs and my workers all use ActiveRecord::Base.connection_pool.with_connection because I thought it was better practice and avoided "max DB connexion reached" issues.

I'm in the middle of a refactor where my goal is to remove the need for a DB connection for most of my jobs, so they have all the information they need to perform. Then another job's goal will be to collect the results and do the necessary DB operations... That was it should remove the need for a high database pool. I'll also remove the wrapping with ActiveRecord::Base.connection_pool.with_connection based on your advice.

@mperham
Copy link
Collaborator

mperham commented Jul 11, 2017

ActiveRecord keeps a pool of connections to the database, per-process. You should not run more than a concurrency of 50 in Sidekiq and I recommend setting the pool size == concurrency. The pool is lazy so AR will open up to 50 connections if you have a lot of jobs processing at the same time in the process.

In Sidekiq <5, Sidekiq's ActiveRecord middleware cleans up the job's db connection after execution.
In Sidekiq 5+, the ActiveSupport::Executor will clean up the job's db connection after execution. In either case, you don't need to do anything special in your code unless you are trying to do something special, like minimize connection checkout time.

@Silex
Copy link

Silex commented Jul 12, 2017

@mperham: ah, great thanks a lot for the clarifications!

In my case I use Sidekiq 5+ for jobs where I download images from ~130 IP cameras at regular intervals (each 10 minutes), so I kinda need the high concurrency. Maybe this needs to be designed differently...

@mperham
Copy link
Collaborator

mperham commented Jul 12, 2017 via email

@Silex
Copy link

Silex commented Jul 12, 2017

@mperham: ah multiple sidekiq processes? yes I can do that, that way I can limit each Sidekiq process to a concurrency of 50 or less like you recommend and also the database connection pool becomes less of an issue. I just didn't think of it.

Thanks again!

p.s: for anyone wondering "why 50", the explanation is in https://github.com/mperham/sidekiq/wiki/Advanced-Options#concurrency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants