Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runaway checks hit redis at 2600 requests per second within 30 minutes of starting #10

Closed
rosskevin opened this issue Nov 8, 2018 · 16 comments
Labels
bug Something isn't working
Milestone

Comments

@rosskevin
Copy link

I've just integrated this but found a single instance escalated within 30 minutes to 2600 requests per second. This is a no load/new architecture and this is the only job running.

I did notice an environment variable/config error on my part, but not sure it would cause this - I had set ttl to config.time_to_live = 8000.

redis_labs

I did happen to lose a kubernetes cluster node in the pool, so the sidekiq pod may have been restarted on another host. Could this be the same as @fruwe is discussing in #7?

Could this TTL 8000 cause it to spin out of control (wouldn't think so)?

@rosskevin
Copy link
Author

This has just happened again. As I take down pods and deploy new ones it just runs away:

2018-11-09T22:54:08.709Z 11 TID-go5eoizov SidekiqAlive::Worker JID-13c5f2051107f52a25ef5f1e INFO: start
2018-11-09T22:54:08.710Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-29372509d901455f06733abb INFO: done: 0.005 sec
2018-11-09T22:54:08.713Z 11 TID-go5eoizov SidekiqAlive::Worker JID-13c5f2051107f52a25ef5f1e INFO: done: 0.003 sec
2018-11-09T22:54:08.713Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-cad550f796b0ee11ea24af10 INFO: start
2018-11-09T22:54:08.716Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-cad550f796b0ee11ea24af10 INFO: done: 0.003 sec
2018-11-09T22:54:08.716Z 11 TID-go5eoizef SidekiqAlive::Worker JID-cc66acbdebf274f111b40c58 INFO: start
2018-11-09T22:54:08.719Z 11 TID-go5eoizef SidekiqAlive::Worker JID-cc66acbdebf274f111b40c58 INFO: done: 0.003 sec
2018-11-09T22:54:08.719Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-31c532b2ec2d1aa57b37d0d2 INFO: start
2018-11-09T22:54:08.725Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-31c532b2ec2d1aa57b37d0d2 INFO: done: 0.005 sec
2018-11-09T22:54:08.725Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-1c4f6526cadc2a67b52b5ce0 INFO: start
2018-11-09T22:54:08.729Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-1c4f6526cadc2a67b52b5ce0 INFO: done: 0.004 sec
2018-11-09T22:54:08.730Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-e7447ac5992ca48667b2aa0d INFO: start
2018-11-09T22:54:08.735Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-e7447ac5992ca48667b2aa0d INFO: done: 0.005 sec
2018-11-09T22:54:08.735Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-48fa855268670a1c77584efe INFO: start
2018-11-09T22:54:08.738Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-48fa855268670a1c77584efe INFO: done: 0.003 sec
2018-11-09T22:54:08.739Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-dbb66602fac45881bec1ff59 INFO: start
2018-11-09T22:54:08.743Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-be3aa1aa7ec2db3402005825 INFO: start
2018-11-09T22:54:08.744Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-dbb66602fac45881bec1ff59 INFO: done: 0.005 sec
2018-11-09T22:54:08.746Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-be3aa1aa7ec2db3402005825 INFO: done: 0.003 sec
2018-11-09T22:54:08.746Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-e053558a23400b40139cf321 INFO: start
2018-11-09T22:54:08.750Z 11 TID-go5eoizov SidekiqAlive::Worker JID-2dec473f542077fcf88ea7d1 INFO: start
2018-11-09T22:54:08.751Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-e053558a23400b40139cf321 INFO: done: 0.004 sec
2018-11-09T22:54:08.753Z 11 TID-go5eoizov SidekiqAlive::Worker JID-2dec473f542077fcf88ea7d1 INFO: done: 0.003 sec
2018-11-09T22:54:08.754Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-9abdc83f275f941e630dfa49 INFO: start
2018-11-09T22:54:08.757Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-9abdc83f275f941e630dfa49 INFO: done: 0.003 sec
2018-11-09T22:54:08.757Z 11 TID-go5eoizef SidekiqAlive::Worker JID-3f5ed59a06e7e926c9c26e6d INFO: start
2018-11-09T22:54:08.760Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-19e69736237e3f93faefc95e INFO: start
2018-11-09T22:54:08.768Z 11 TID-go5eoizef SidekiqAlive::Worker JID-3f5ed59a06e7e926c9c26e6d INFO: done: 0.011 sec
2018-11-09T22:54:08.771Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-ad2c19842bb736bbaddd0406 INFO: start
2018-11-09T22:54:08.771Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-19e69736237e3f93faefc95e INFO: done: 0.011 sec
2018-11-09T22:54:08.773Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-ad2c19842bb736bbaddd0406 INFO: done: 0.003 sec
2018-11-09T22:54:08.774Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-dffe4b1ae40923aefde8098d INFO: start
2018-11-09T22:54:08.776Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-dffe4b1ae40923aefde8098d INFO: done: 0.003 sec
2018-11-09T22:54:08.777Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-eda2ac20cd709bf30792d59a INFO: start
2018-11-09T22:54:08.779Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-50eaefb18564c5780ecc802c INFO: start
2018-11-09T22:54:08.780Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-eda2ac20cd709bf30792d59a INFO: done: 0.003 sec
2018-11-09T22:54:08.782Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-50eaefb18564c5780ecc802c INFO: done: 0.002 sec
2018-11-09T22:54:08.782Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-e599764cdacd0fe43481b5cb INFO: start
2018-11-09T22:54:08.785Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-63e250a424575821374ad89c INFO: start
2018-11-09T22:54:08.786Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-e599764cdacd0fe43481b5cb INFO: done: 0.003 sec
2018-11-09T22:54:08.789Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-63e250a424575821374ad89c INFO: done: 0.004 sec
2018-11-09T22:54:08.789Z 11 TID-go5eoizov SidekiqAlive::Worker JID-ceca7482d9946899dc8c80f9 INFO: start
2018-11-09T22:54:08.792Z 11 TID-go5eoizov SidekiqAlive::Worker JID-ceca7482d9946899dc8c80f9 INFO: done: 0.003 sec
2018-11-09T22:54:08.793Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-6d532f1c87cb3d6a5ac73391 INFO: start
2018-11-09T22:54:08.796Z 11 TID-go5eoizef SidekiqAlive::Worker JID-368216db9b0db2bc3d8d073b INFO: start
2018-11-09T22:54:08.796Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-6d532f1c87cb3d6a5ac73391 INFO: done: 0.003 sec
2018-11-09T22:54:08.815Z 11 TID-go5eoizef SidekiqAlive::Worker JID-368216db9b0db2bc3d8d073b INFO: done: 0.019 sec
2018-11-09T22:54:08.818Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-b905c7eb0f0b8ab58613e5e3 INFO: start
2018-11-09T22:54:08.822Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-50699c6860cff70fa3384955 INFO: start
2018-11-09T22:54:08.823Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-b905c7eb0f0b8ab58613e5e3 INFO: done: 0.005 sec
2018-11-09T22:54:08.826Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-df0acb5dae800558f88efbf1 INFO: start
2018-11-09T22:54:08.826Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-50699c6860cff70fa3384955 INFO: done: 0.004 sec
2018-11-09T22:54:08.829Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-256f96301792de537316b1e5 INFO: start
2018-11-09T22:54:08.829Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-df0acb5dae800558f88efbf1 INFO: done: 0.004 sec

@rosskevin
Copy link
Author

I guess it slows down after it catches up, but there wasn't a sidekiq running for x amount of time. @fruwe is this what you intend to solve with #7?

@rosskevin
Copy link
Author

rosskevin commented Nov 12, 2018

This just happened again - we are sorting out our new production and had 30 minutes of downtime. Upon startup, sidekiq was slammed with 58,000 SidekiqAlive::Worker jobs in less than 10 seconds.

@gsmetal
Copy link

gsmetal commented Nov 14, 2018

It looks like that problem is when new version of pod is deploying:

  1. Old pod scheduled job and then stop accepting new jobs (because caught termination signal).
  2. New pod started and received old pod's job.
  3. Old pod's hostname is still registered, because it's time-to-live is configured time-to-live + one minute
    def self.register_instance(instance_name)
    redis.set(instance_name,
    Time.now.to_i,
    { ex: config.time_to_live.to_i + 60 })
    end
    so new pod's trying to execute job, but it's hostname is different, so it reschedule job immediately
    else
    # requeue for hostname to validate it's own liveness probe
    self.class.perform_async(hostname)
  4. (3) happens indefinitely until old pod's hostname is timed out (for one minute).

I think #7 should help to limit this old pod's job to execute only ~60 times.

@gsmetal
Copy link

gsmetal commented Nov 14, 2018

Another case where this problem may be occured is when there is several replicas of one sidekiq and one of them is stuck. Then it's job will be scheduled to another pods and them will be ping-pong it until hostname timed out.

@gsmetal
Copy link

gsmetal commented Nov 14, 2018

I think #7 should help to limit this old pod's job to execute only ~60 times.

That's not accurate because it depends on average_scheduled_poll_interval option of sidekiq config.

@arturictus
Copy link
Owner

Hi, Sorry for the delay.
I'll add a sidekiq callback on stop to remove the registered hostname from the list.
That will not trigger new jobs.

@arturictus arturictus added this to the 1.0.1 milestone Jan 8, 2019
@arturictus arturictus added the bug Something isn't working label Jan 8, 2019
@arturictus
Copy link
Owner

Hi,
I think this pull should solved it: #12
This works when pods are on the regular kubernetes lifecircle. When deploying or kubernetes kills the pod organically. I will not stand to edge cases like compleat shutdown without a signal. But it should not happen anyway or in really weird conditions.

I think on quiet should be enough I added on shutdown just in case the deployment is not set properly as the example deployment explains.

what do you think?

@arturictus
Copy link
Owner

Hi @rosskevin,
If I understand correctly it only happens when pods are deleted and new ones start processing. Is that the case?

@arturictus
Copy link
Owner

Version 1.0.1 released. Should solve this issue.

@rosskevin
Copy link
Author

rosskevin commented Jan 21, 2019

This should be reopened (verified on 1.0.1):

Throughput of the Redis database indicated below is 1553 RPS (requests per second).

This is a brand new cluster/business with essentially no traffic other than alpha users. The spike occurs on a new deployment (old one deleted, new one added)

redis_labs

What information could I provide that can help?

@fruwe
Copy link

fruwe commented Feb 6, 2019

I guess it slows down after it catches up, but there wasn't a sidekiq running for x amount of time. @fruwe is this what you intend to solve with #7?

@rosskevin Sorry for my late reply. Yes, something similar was happening on my cluster too.

Even with my PR, I experienced spikes of 150k jobs on an empty cluster. Last week, I had to remove sidekiq-alive, since my cluster keeps on crashing (in fact I had to remove the HTTP health check as well). I will have to investigate whether this problem occurs due to sidekiq-alive or a GKE update or whatever.

I will try v1.0.1 from @arturictus (Thanks for #12) and see how it goes.

@arturictus
Copy link
Owner

Reopen this.
I will add a queue per pod

@arturictus arturictus reopened this Feb 20, 2019
@arturictus
Copy link
Owner

already open

@arrowcircle
Copy link
Contributor

It looks like the problem persists in some or another way.
Every deploy I get 100k jobs scheduled.

I think the problem is because of no sidekiqctl in binary anymore.

Any ideas how to handle this problem?

@arturictus
Copy link
Owner

Hi @arrowcircle did you try with version 2.
In sidekiq 6 they removed sidekiqctl but when I tried the shutdown was triggering the callbacks.
Aswell in SidekiqAlive version 2 each instance has his own queue I do not see how it can be happening.

arturictus pushed a commit that referenced this issue Dec 7, 2022
* Add release and changelog workflow

* Set custom label for actions deps
andrcuns added a commit that referenced this issue Dec 20, 2022
* Add release and changelog workflow

* Set custom label for actions deps
andrcuns added a commit that referenced this issue Dec 22, 2022
* Create dependabot.yml

* Remove deprecated options call on config

* Clean up logging and add ability to set custom logger

* Clean up code and add shutdown callback

* Regenerate gemfile

* Correctly set sidekiq queues

* Update workflow

* Add github-actions to dep updates

* Add lockfile platforms

* Update version to 2.2.0

* Bump actions/checkout from 2 to 3 (#7)

Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Rename gem to sidekiq-alive (#8)

* Add release and changelog workflow (#10)

* Add release and changelog workflow

* Set custom label for actions deps

* Fix setting custom sidekiq queue (#9)

* Move all configuration in to startup block

* Check respond to on config object

* Update specs

* Run specs in random order

* Set git identity for release workflow

* Rename gem to sidekiq-alive-next

* Update README.md

* Create github release in the same job

* Update gemspec metadata (#11)

* Update gemspec metadata

* Require ruby 2.7

* Add static rubocop checks (#12)

* Add static rubocop checks

* Allign arguments

* Automate release (#13)

* Add rake task to update version

* Add version update steps

* Unset deployment mode for release

* Add release ssh key for release workflow

* Checkout tag in release

* Add additional details to startup info (#14)

* Use scan for finding and purging jobs (#15)

* Add changelog workflow (#16)

* Bump rspec from 3.11.0 to 3.12.0 (#17)

Bumps [rspec](https://github.com/rspec/rspec-metagem) from 3.11.0 to 3.12.0.
- [Release notes](https://github.com/rspec/rspec-metagem/releases)
- [Commits](rspec/rspec-metagem@v3.11.0...v3.12.0)

---
updated-dependencies:
- dependency-name: rspec
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Sidekiq 7 support (#19)

* Update sidekiq requirement from >= 5, < 7 to >= 5, < 8

Updates the requirements on [sidekiq](https://github.com/mperham/sidekiq) to permit the latest version.
- [Release notes](https://github.com/mperham/sidekiq/releases)
- [Changelog](https://github.com/mperham/sidekiq/blob/main/Changes.md)
- [Commits](sidekiq/sidekiq@v6.5.7...v7.0.0)

---
updated-dependencies:
- dependency-name: sidekiq
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update for sidekiq 7 compatibility

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update README.md

* Update changelog.yml

* Use ruby 3.1.2 for development

* Replace pry with debug gem

* Update rack requirement from >= 2.2.4, < 3 to >= 3, < 4 (#20)

* Update rack requirement from >= 2.2.4, < 3 to >= 2.2.4, < 4

Updates the requirements on [rack](https://github.com/rack/rack) to permit the latest version.
- [Release notes](https://github.com/rack/rack/releases)
- [Changelog](https://github.com/rack/rack/blob/main/CHANGELOG.md)
- [Commits](rack/rack@2.2.4...3.0.0)

---
updated-dependencies:
- dependency-name: rack
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add rackup gem and improve tests

* Narrow down rack requirement

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrejs Cunskis <andrejs.cunskis@gmail.com>

* Update nokogiri and workflow ruby setup

* Bump sidekiq from 7.0.0 to 7.0.1 (#21)

Bumps [sidekiq](https://github.com/mperham/sidekiq) from 7.0.0 to 7.0.1.
- [Release notes](https://github.com/mperham/sidekiq/releases)
- [Changelog](https://github.com/mperham/sidekiq/blob/main/Changes.md)
- [Commits](sidekiq/sidekiq@v7.0.0...v7.0.1)

---
updated-dependencies:
- dependency-name: sidekiq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rackup from 0.2.2 to 0.2.3 (#22)

Bumps [rackup](https://github.com/rack/rackup) from 0.2.2 to 0.2.3.
- [Release notes](https://github.com/rack/rackup/releases)
- [Commits](rack/rackup@v0.2.2...v0.2.3)

---
updated-dependencies:
- dependency-name: rackup
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rack from 3.0.0 to 3.0.1 (#23)

Bumps [rack](https://github.com/rack/rack) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/rack/rack/releases)
- [Changelog](https://github.com/rack/rack/blob/main/CHANGELOG.md)
- [Commits](rack/rack@3.0.0...v3.0.1)

---
updated-dependencies:
- dependency-name: rack
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump sidekiq from 7.0.1 to 7.0.2 (#24)

Bumps [sidekiq](https://github.com/mperham/sidekiq) from 7.0.1 to 7.0.2.
- [Release notes](https://github.com/mperham/sidekiq/releases)
- [Changelog](https://github.com/mperham/sidekiq/blob/main/Changes.md)
- [Commits](sidekiq/sidekiq@v7.0.1...v7.0.2)

---
updated-dependencies:
- dependency-name: sidekiq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Loosen rackup requirement (#25)

* Bump rack from 3.0.1 to 3.0.2 (#27)

Bumps [rack](https://github.com/rack/rack) from 3.0.1 to 3.0.2.
- [Release notes](https://github.com/rack/rack/releases)
- [Changelog](https://github.com/rack/rack/blob/main/CHANGELOG.md)
- [Commits](rack/rack@v3.0.1...v3.0.2)

---
updated-dependencies:
- dependency-name: rack
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump debug from 1.6.3 to 1.7.0 (#26)

Bumps [debug](https://github.com/ruby/debug) from 1.6.3 to 1.7.0.
- [Release notes](https://github.com/ruby/debug/releases)
- [Commits](ruby/debug@v1.6.3...v1.7.0)

---
updated-dependencies:
- dependency-name: debug
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* renaming back and apply changes

* rubocop fix

* rubocop fix (for real)

* add comment to why different ways to access to queues

* use badges from this repo

* Remove breaking version restrictions

Add testing with multiple sidekiq versions

Remove rack version requirement

* Enable new cops by default

* Ignore load error for sidekiq/capsule

* Move dynamic sidekiq version definition to Gemfile

* Improve sidekiq version detection

* Add redis client wrappers

* Remove redundant module function

* Fix sq_config mocking

* fix sidekiq 5

* Update release actions

* improve Helpers version

* Update release.yml for default project labels

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Andrejs <andrejs.cunskis@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants