Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a "restartpause" option + fixing travis-checks #659

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pfuender
Copy link

This is actually the same as PR #509 by fclairamb, just added my 2 cents, and (hopefully) fixed the tests. Please let us know if there's anything more needed to get this merged.

Thanks

  • Applied on startup failure aditionnaly to the backoff delay (by fclairamb)
  • Applied on bad exit status (by fclairamb)
  • fixed tests (by pfuender)

Default value is 0, which doesn't change the existing behavior.

The goal is to impose a minimum delay between restarts to avoid overloading host with restarts.

* Applied on startup failure aditionnaly to the backoff delay (by fclairamb)
* Applied on bad exit status (by fclairamb)
* fixed tests (by pfuender)

Default value is 0, which doesn't change the existing behavior.

The goal is to impose a minimum delay between restarts to avoid overloading host with restarts.
@mrook
Copy link

mrook commented Sep 16, 2015

+1

@toastbrotch
Copy link

+1

@sschueller
Copy link

Please merge this. I need this. Thanks +1

@miso-belica
Copy link

👍

@icholy
Copy link

icholy commented Oct 6, 2015

RFC @mcdonc @mnaberez @theduderog

@oryband
Copy link

oryband commented Oct 18, 2015

@stevenscg
Copy link

+1 This is exactly what I was looking for!

In fact, I just ran into a case where all of the restart attempts happened way too quickly during a database failover and left all the processes down in a fatal state.

@doodyparizada
Copy link

+1

@oryband
Copy link

oryband commented Nov 1, 2015

1.5 months have passed since this pr was opened. merge plz!

@oliparcol
Copy link

+1

@pfuender
Copy link
Author

Guys - Any paypal account to get you some bribe money to get this merged? ;-)

@icholy
Copy link

icholy commented Nov 24, 2015

Is this being held back for philosophical reasons?

@webtussi
Copy link

webtussi commented Dec 7, 2015

+1

@oryband
Copy link

oryband commented Dec 20, 2015

can the merge gods bless this pr already? plz?

@oryband

This comment has been minimized.

@jrottenberg
Copy link

It's missing a bit for the doc, maybe that's the blocker , see docs/configuration.rst and supervisor/skel/sample.conf to document that new option.
If you look at the recent commits they are mostly touching files from the docs

Raphael Hoegger added 2 commits December 30, 2015 16:21
* upstream/master: (61 commits)
  Fix a typo
  Merge pull request Supervisor#703 from gudata/patch-1
  Add pypy3 to tox.ini and .travis.yml
  Use `make html` to build the docs under tox
  Fix package name for Sphinx
  Fix import for Mock
  Add docs to travis build
  Test that doc building + readme are correct
  Add Supervisor 3.2.0 release date to Supervisor 4 changelog
  Fix system.multicall() broken by faster start/stop patch In past versions, startProcess() and stopProcess() would always return a callback.  50d1857 changed this so they may return either a callback or a bool, but the code that handles system.multicall() was not updated.  If multicall was used with stopProcess() followed by startProcess(), it would try to start the process before it had finished stopping.  This broke the restart process link on the web interface.
  Added supervisor_checks to the docs.
  Fix typo in docs
  Clarify behavior of user= option. Closes Supervisor#695
  Fix start/stop buttons on web broken by faster start/stop patch In past versions, startProcess() and stopProcess() would always return a callback.  50d1857 changed this so they may return either a callback or a bool, but the web interface was not updated.
  Show error messages when clearing a log on the web interface
  Move code for start and stop actions near each other
  Show errors when stopping a process on the web interface
  Show string description for unexpected faults
  Show all error messages in TailView. Fixes Supervisor#627
  Implement __str__ so code and text of RPCError are logged This is helpful for troubleshooting issues like Supervisor#627 where the traceback doesn't show the contents of the RPCError.
  ...
@pfuender
Copy link
Author

There we go! @jrottenberg

``restartpause``

Adds a pause (in seconds) between successive failed start attempts - thus
throttles failed-start attemps and prevents massive load increase during

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: attempts

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barqshasbite - fixed, thanks!

@harijoe
Copy link

harijoe commented Jan 6, 2016

+1 This restartpause parameter would be very useful

@dbpolito
Copy link

👍

1 similar comment
@igama
Copy link

igama commented Feb 17, 2016

+1

@ffrank
Copy link

ffrank commented Feb 24, 2016

+1 I would love to see this.

Does not look like a breaking change, so it should not require a major version bump, yes?

@ewerk-tstein
Copy link

+1 Please add an option for delayed restarts.

@icholy
Copy link

icholy commented Apr 26, 2016

C'mon people, use thumbs up please.

@mikezawitkowski
Copy link

👍

@toastbrotch
Copy link

toastbrotch commented Jun 29, 2016

please merge!
what is the hold up?

@jonathan-kosgei
Copy link

Has this been merged?

@nzjrs
Copy link

nzjrs commented Aug 28, 2016

Another gentle +1 from me. I'm running my own fork just for this option.

@s12v
Copy link

s12v commented Jan 18, 2017

@mnaberez, do you know can you help with merging this? The PR is opened since more then a year...

@briandignan
Copy link

merge please? :'(

@wjdecorte
Copy link

+1
Will soon be on the two year mark. Is there any chance of getting this merged? I'd really, really, really like to have the restartpause option without running my own fork.

Thanks!

@barqshasbite
Copy link

This behavior technically exists already with the 'startretries' option. If you set startretries=10, for example, every time it fails to start, it goes into the BACKOFF state where it waits n seconds, where n is the number of start failures it has experienced so far. So on the first failure it waits 1 second before trying to start again. On the second failure it waits 2 seconds before trying to start again. On the third failure it waits 3 seconds. And so on all the way up to 10 failures and a 10 second wait (at which point it would have waited 1+2+3+4+5+6+7+8+9+10=55 seconds total).

Searching the repo for 'startretries' will highlight how it interacts with the BACKOFF state and the 'backoff' delay.

See also the process states: http://supervisord.org/subprocess.html#process-states

This PR just adds an additional delay on top of the pre-existing 'backoff' delay, which I suspect over complicates things and is why it is not being merged through. If you need a large delay, I recommend just setting 'startretries' to a sufficiently large number.

@jimbrowne
Copy link

I read the code to verify @barqshasbite's explanation. I think perhaps adding this explanation to http://supervisord.org/subprocess.html#process-states and a note to http://supervisord.org/configuration.html#program-x-section-settings about the BACKOFF state and its' increasing duration would be sufficient. The program config section gives no suggestion that this backoff strategy is used.

@Don42
Copy link

Don42 commented Jun 17, 2017

Linear backoff isn't always enough. When dealing with longer term issues I would prefer exponential backoff, as to no get spammed with failure messages.

@mikemix
Copy link

mikemix commented Sep 21, 2017

Totally agree with @Don42 !

@elgalu
Copy link

elgalu commented Oct 25, 2018

Any reason this isn't merged? the PR looks quite straightforward adding restartpause would be a great addition even if not perfect.

@lhovo
Copy link

lhovo commented Jun 14, 2019

Friendly bump to admin team, this would be great if it was merged :D

@fclairamb
Copy link

4 years about my half-baked proposal (#509), this still hasn't been merged 😄! @mnaberez if you don't like this PR, you should at least close it.

@mnaberez
Copy link
Member

@mnaberez if you don't like this PR, you should at least close it.

I am currently working on other issues but I am not the only person with commit access to this repository so I will leave it open.

@SeanJA
Copy link

SeanJA commented Feb 19, 2020

+1

1 similar comment
@elgalu
Copy link

elgalu commented Feb 27, 2020

👍

@isosphere
Copy link

isosphere commented May 1, 2022

I will set startretries to 999 for my use case in the absence of this PR being merged but I would rather have this feature.

The kinds of failure I am restarting from will not be resolved in seconds, so I am guaranteed to have several dozen unnecessary failures until it is waiting an appropriate amount of time to restart.

I'm not a fan of approach of using a wrapper that sleeps after execution because then you're always delaying restarts, not just when you have some kind of fatal error.

@michnhokn
Copy link

Please merge this :)

@gander
Copy link

gander commented May 13, 2023

@mnaberez @theduderog @mcdonc

What's the problem, exactly? You don't like the idea, then shut it down. Like it, what are you waiting so long for? How long can you wait for such a decision?

@mikemix
Copy link

mikemix commented May 13, 2023

+1, please merge.

@wokenlex
Copy link

10 years soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet