Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[process.py] Handle scenario where system clock rolls backward #1047

Merged
merged 5 commits into from Mar 13, 2019

Conversation

3 participants
@jleveque
Copy link
Contributor

jleveque commented Jan 18, 2018

  • Resolves #1043

  • If the system clock is set to an earlier time after a process is started but before it has entered the RUNNING state, supervisorctl start <process_name> can hang for a very long time waiting for the current system time to become greater than self.laststart + self.config.startsecs, depending on how far backward the system time has moved.

  • This pull request resolves the issue above and also attempts to address a couple other potential issues in process.py with system time moving backward.

  • Also added related unit tests

@mnaberez

This comment has been minimized.

Copy link
Member

mnaberez commented Feb 7, 2018

Thanks for looking into this issue and for your work on this pull request (with tests!). I definitely think we need to fix this issue. A concern that I have reading this diff is the way these checks are inlined. There is now as much code for checking time rollback as there is actual state transition logic. I think this makes the transition logic more difficult to follow. Can we separate these concerns somehow, e.g. can we do something like check for rollback once at the top of transition and then call a separate method to fix up all the timestamps if rollback is detected?

@jleveque jleveque force-pushed the jleveque:system_clock_rollback_fix branch from 39bab51 to 8b0d1a3 Feb 13, 2018

Add _check_and_adjust_for_system_clock_rollback() method to Subproces…
…s class in order to streamline code a bit

@jleveque jleveque force-pushed the jleveque:system_clock_rollback_fix branch from 8b0d1a3 to 75e9804 Feb 13, 2018

@jleveque

This comment has been minimized.

Copy link
Contributor Author

jleveque commented Mar 6, 2018

I pushed some changes a few weeks ago in an attempt to address your comments above. Please review the latest changes when you have a moment.

@jleveque

This comment has been minimized.

Copy link
Contributor Author

jleveque commented Oct 29, 2018

Update: It appears I missed at least one case where system clock rollback can still cause issues. With this patch, the issue is encountered far less frequently, but it appears it is not complete.

@mnaberez: What are your thoughts on using monotonic time throughout supervisor? I think this would prevent the need for checking for system clock rollback and subsequent adjustments.

@mnaberez

This comment has been minimized.

Copy link
Member

mnaberez commented Oct 29, 2018

@mnaberez: What are your thoughts on using monotonic time throughout supervisor? I think this would prevent the need for checking for system clock rollback and subsequent adjustments.

I think it's a good idea but it's not available on all versions of Python that Supervisor supports. There was a PR submitted that added monotonic time for old Python versions using ctypes hacking. I didn't think that was a good idea for production but I suggested we could use monotonic time if it was available (#468 (comment)). I had solicited feedback on that at that time but didn't get any.

@nikos-github

This comment has been minimized.

Copy link

nikos-github commented Nov 27, 2018

@mnaberez If you believe it's a must to keep supporting Python versions older than 3 when 3 has been around for 10 years now, then no argument there (I personally would drop the support but my take on this has no bearing in that decision).

It's important however to bring stability to supervisord in my opinion hence I would recommend moving ahead with the suggestion you made in #468 (comment). Is this something acceptable and can be done relatively quickly?

@mnaberez

This comment has been minimized.

Copy link
Member

mnaberez commented Nov 28, 2018

If you believe it's a must to keep supporting Python versions older than 3 when 3 has been around for 10 years now,

Supervisor is over 14 years old and for most of that time, it ran only on Python 2. Consequently, it still has a large installed base of Python 2 users today. We've made significant efforts to support both 2 and 3 due to our installed base, and we'll continue to support them so they receive bug fixes until they can transition to 3.

It's important however to bring stability to supervisord in my opinion hence I would recommend moving ahead with the suggestion you made in #468 (comment).

This patch (#1047) or something like it should probably done to fix the issue on installations where monotonic time is not available. It should be fine to also add something like #468 (comment). When monotonic time is available, the checks here would just not be triggered.

@jleveque

This comment has been minimized.

Copy link
Contributor Author

jleveque commented Nov 28, 2018

@mnaberez: Have you looked into the package I linked above: https://pypi.org/project/monotonic/? I do not have experience with it, but apparently it is an implementation of monotonic time for both Python 2 and 3:

On Python 3.3 or newer, monotonic will be an alias of time.monotonic from the standard library. On older versions, it will fall back to an equivalent implementation:

Linux, BSD, AIX: clock_gettime(3)
Windows: GetTickCount or GetTickCount64
OS X: mach_absolute_time

@mnaberez

This comment has been minimized.

Copy link
Member

mnaberez commented Dec 3, 2018

Have you looked into the package I linked above

I did. I looked at the source and it uses the same ctypes hacking as the PR that I mentioned. Supervisor runs on a large number of production servers so I'm hesitant to do that sort of thing. It might be fine but I don't really have a way to know. I'd rather be conservative for production use. I'm more inclined do something like this patch for Python 2 and use the built-in monotonic time on Python 3.

jleveque added some commits Feb 27, 2019

@jleveque

This comment has been minimized.

Copy link
Contributor Author

jleveque commented Mar 1, 2019

Added a patch (and related test) to prevent a crash when a process has already entered the RUNNING state but then clock rolled back to be less than startsecs. The crash would present itself with log messages similar to the following:

Traceback (most recent call last):
File "/usr/bin/supervisord", line 9, in <module>
load_entry_point('supervisor==3.3.3', 'console_scripts', 'supervisord')()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 363, in main
go(options)
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 373, in go
d.main()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 84, in main
self.run()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 101, in run
self.runforever()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 253, in runforever
self.reap()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 289, in reap
process.finish(pid, sts)
File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 527, in finish
self._assertInState(ProcessStates.STARTING)
File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 179, in _assertInState
self.config.name, current_state, allowable_states))
AssertionError: Assertion failed for start.sh: RUNNING not in STARTING

@mnaberez mnaberez merged commit c04d3d5 into Supervisor:master Mar 13, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jleveque jleveque deleted the jleveque:system_clock_rollback_fix branch Mar 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.