Only log '[num] workers' message when it changes. #1078

Merged
merged 3 commits into benoitc:master from preo/reduce-arbiter-noise on Aug 19, 2015

Conversation

3 participants
@erydo

erydo (Contributor) commented Jul 12, 2015

Otherwise when debug logging is on, the message prints every second even with no system activity, drowning out more useful log messages:

[2015-07-12 17:39:25 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:26 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:27 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:28 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:29 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:30 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:31 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:32 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:33 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:34 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:35 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:36 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:37 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:38 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:39 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:40 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:41 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:42 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:43 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:45 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:46 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:47 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:48 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:49 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:50 +0000] [19970] [DEBUG] 3 workers
gunicorn/arbiter.py
@@ -473,6 +473,8 @@ def manage_workers(self):
Maintain the number of workers by spawning or killing
as required.
"""
orig_num_workers = self.num_workers
if len(self.WORKERS.keys()) < self.num_workers:

@benoitc

benoitc (Owner) Jul 12, 2015

I would rather tag here if a change has been done and test against it, instead of comparing numbers. Thoughts?

@erydo

erydo (Contributor) Jul 12, 2015

In my experience, using a mutable has_changed-type flag tends to open up a class of errors where the flag does not get set appropriately after a modification, particularly when the modification happens in a separate method call (can kill_worker/spawn_workers modify num_workers? Pretty sure. Do they always? I don't know). That kind of bug crops up more during future code changes and maintenance.

It also makes the if-statement harder to reason about: does this flag mean that the value being printed out changed? That it could have changed? Should have? If I see multiple "3 workers" messages in sequence, is that expected or a bug?

This is a short method, but those are the reasons I tend to prefer the state-comparison approach when the comparison isn't computationally expensive.
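The state-comparison approach erydo describes can be sketched as follows. This is a hypothetical, stripped-down stand-in for the arbiter (spawning is stubbed, log messages are captured in a list), not gunicorn's actual code: snapshot the observable state before doing the work, then log only if the state afterwards differs.

```python
class MiniArbiter:
    """Toy arbiter illustrating the state-comparison approach:
    compare state before and after, rather than trusting a flag."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.WORKERS = {}      # pid -> worker, stubbed here
        self.logged = []       # captured debug messages
        self._next_pid = 1

    def spawn_workers(self):
        # Stub: spawn until we reach the target count.
        while len(self.WORKERS) < self.num_workers:
            self.WORKERS[self._next_pid] = object()
            self._next_pid += 1

    def manage_workers(self):
        orig_count = len(self.WORKERS)        # snapshot before
        if len(self.WORKERS) < self.num_workers:
            self.spawn_workers()
        if len(self.WORKERS) != orig_count:   # log only on change
            self.logged.append("%d workers" % len(self.WORKERS))
```

The first call logs once; repeated calls with no change stay silent, which is exactly the behavior the PR is after.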

@erydo

erydo (Contributor) Jul 12, 2015

More concretely, as someone unfamiliar with the codebase, I didn't know whether spawn_workers would necessarily change that number, or whether it could be capped by memory or some other configuration value.

It could be expected to try to spawn each time but not necessarily succeed at increasing that number; I'd have to look deeper to see whether that was a (long-term) guarantee.

(I hope that makes it clear where my head was at here.)

@tilgovi

tilgovi (Collaborator) Jul 13, 2015

I think your reasoning is sound, but if the flag is localized to this function then it's pretty fail-proof:

At the top of the function, just set count_changed = False.
In each of the loops, set count_changed = True.
At the bottom of the function, check count_changed.
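The function-local flag tilgovi describes can be sketched like this (again a hypothetical, simplified arbiter; the spawn and kill loops are stubbed):

```python
class MiniArbiter:
    """Toy arbiter illustrating the function-local count_changed flag:
    False at the top, set True inside each loop, checked at the bottom."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.WORKERS = {}  # pid -> worker, stubbed
        self.logged = []   # captured debug messages
        self._next_pid = 1

    def manage_workers(self):
        count_changed = False
        # Spawn loop: grow up to the target count.
        while len(self.WORKERS) < self.num_workers:
            self.WORKERS[self._next_pid] = object()
            self._next_pid += 1
            count_changed = True
        # Kill loop: shrink down to the target count.
        while len(self.WORKERS) > self.num_workers:
            self.WORKERS.popitem()
            count_changed = True
        if count_changed:
            self.logged.append("%d workers" % len(self.WORKERS))
```

Because the flag never escapes the function, the failure mode erydo worries about (some other method forgetting to set it) cannot occur here; the trade-off is that a grow-then-shrink within one call could log the same number twice, as tilgovi notes below.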

@tilgovi

tilgovi (Collaborator) Jul 13, 2015

I actually think I slightly prefer what I just described, for exactly the reasons you give. While it should be clear from the code that self.num_workers must change or the loops would not terminate, it's perhaps clearest to just set a flag when we're attempting to change the number.

It's conceivable that self.num_workers could change between the loops so that it is first increased and then decreased, so if the edge case of logging the same number twice bothers us then your way is indeed better.

@tilgovi

tilgovi (Collaborator) Jul 13, 2015

What if kill_worker moved the worker from self.WORKERS to a dead-worker list? I don't think the ESRCH block is necessary, because reap_workers should still get the SIGCHLD.

Then manage_workers becomes three steps:

@tilgovi

tilgovi (Collaborator) Jul 13, 2015

Scratch that last part; not three steps. manage_workers can stay the same, I think.

@tilgovi

tilgovi (Collaborator) Jul 13, 2015

@erydo maybe let's make a PR to clean this dance up a bit, and then rebase this. If you're up for it.

@erydo

erydo (Contributor) Jul 13, 2015

Sure thing, I'll give it a stab.

@erydo

erydo (Contributor) Jul 13, 2015

See #1084.

@erydo

erydo (Contributor) commented Jul 13, 2015

This PR is on hold until #1084 is completed.

Only log '[num] workers' message when it changes.
Otherwise when debug logging is on, the message prints every second even
with no system activity.
@erydo

erydo (Contributor) commented Jul 19, 2015

(Rebasing against current master)

Store last logged worker count.
This is easier and safer than only logging when we detect that
self.WORKERS has changed or that `spawn_worker` or `kill_worker` has
been called.
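The approach in this commit can be sketched as follows. This is a minimal illustration, not the actual arbiter: the attribute name mirrors the PR, but spawning and logging are stubbed.

```python
class MiniArbiter:
    """Toy arbiter illustrating the merged approach: remember the
    last count that was logged and log again only when it differs,
    regardless of who mutated self.WORKERS in the meantime."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.WORKERS = {}  # pid -> worker, stubbed
        self._last_logged_active_worker_count = None
        self.logged = []   # captured debug messages
        self._next_pid = 1

    def manage_workers(self):
        # Stubbed spawn loop: grow up to the target count.
        while len(self.WORKERS) < self.num_workers:
            self.WORKERS[self._next_pid] = object()
            self._next_pid += 1
        active = len(self.WORKERS)
        if active != self._last_logged_active_worker_count:
            self.logged.append("%d workers" % active)
            self._last_logged_active_worker_count = active
```

The comparison is against the last value actually logged, not against a flag or a pre-call snapshot, so it cannot miss a change made elsewhere and never logs the same number twice in a row.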
@erydo

erydo (Contributor) commented Jul 19, 2015

@tilgovi @benoitc — I've updated this to no longer rely on the synchronicity/asynchronicity of spawn_worker, kill_worker, or self.WORKERS. It just stores the last number that was logged and checks against that.

This was possibly the right approach to begin with (no error conditions!), and it decouples this PR from #1084, which might take some more time to merge.

@benoitc

benoitc (Owner) commented Jul 23, 2015

Quite like the idea :) While we are here, maybe we could also log only where the number of workers really changed: in spawn_workers and when we reap them. Something like the patch below; I didn't test it, though. Thoughts?

diff --git a/gunicorn/arbiter.py b/gunicorn/arbiter.py
index b7ee05d..85fa5ea 100644
--- a/gunicorn/arbiter.py
+++ b/gunicorn/arbiter.py
@@ -55,6 +55,7 @@ class Arbiter(object):
         os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE

         self._num_workers = None
+        self.worker_count = None
         self.setup(app)

         self.pidfile = None
@@ -464,6 +465,7 @@ class Arbiter(object):
                     if not worker:
                         continue
                     worker.tmp.close()
+                    self._log_numworkers()
         except OSError as e:
             if e.errno != errno.ECHILD:
                 raise
@@ -475,6 +477,7 @@ class Arbiter(object):
         """
         if len(self.WORKERS.keys()) < self.num_workers:
             self.spawn_workers()
+            self._log_numworkers()

         workers = self.WORKERS.items()
         workers = sorted(workers, key=lambda w: w[1].age)
@@ -482,10 +485,6 @@ class Arbiter(object):
             (pid, _) = workers.pop(0)
             self.kill_worker(pid, signal.SIGTERM)

-        self.log.debug("{0} workers".format(len(workers)),
-                       extra={"metric": "gunicorn.workers",
-                              "value": len(workers),
-                              "mtype": "gauge"})

     def spawn_worker(self):
         self.worker_age += 1
@@ -563,8 +562,20 @@ class Arbiter(object):
                 try:
                     worker = self.WORKERS.pop(pid)
                     worker.tmp.close()
+                    self._log_numworkers()
                     self.cfg.worker_exit(self, worker)
                     return
                 except (KeyError, OSError):
                     return
             raise
+
+    def _log_numworkers(self):
+        nworker_count = len(self.WORKERS)
+        if self.worker_count != nworker_count:
+            self.log.debug("{0} workers".format(nworker_count),
+                    extra={"metric": "gunicorn.workers",
+                        "value": nworker_count,
+                        "mtype": "gauge"})
+            self.worker_count = nworker_count
+
gunicorn/arbiter.py
@@ -51,6 +51,8 @@ class Arbiter(object):
if name[:3] == "SIG" and name[3] != "_"
)
last_logged_worker_count = None

@benoitc

benoitc (Owner) Jul 23, 2015

Shouldn't it probably be an instance variable?

@erydo

erydo (Contributor) Jul 23, 2015

The instance variable is created once it's assigned to for the first time. I'd be fine moving it into __init__.
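erydo's point is standard Python attribute behavior: a class-level name serves as a shared default until an instance assigns to it, at which point an instance attribute shadows it. A quick illustration (the class is a simplified stand-in, not gunicorn's Arbiter):

```python
class Arbiter:
    # Simplified stand-in: only the class-level default matters here.
    last_logged_worker_count = None  # class attribute, shared default

a = Arbiter()
assert a.last_logged_worker_count is None        # lookup falls back to the class

a.last_logged_worker_count = 3                   # assignment creates an instance attribute
assert Arbiter.last_logged_worker_count is None  # the class default is untouched
assert "last_logged_worker_count" in vars(a)     # the value now lives in the instance dict
```

So the PR's class-level `last_logged_worker_count = None` works correctly; moving it into `__init__` is a stylistic choice, not a functional one.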

@erydo

erydo (Contributor) commented Jul 23, 2015

@benoitc — I would make the same argument I made at the beginning of this PR: the patch I've proposed here has no edge cases and is resilient to refactorings, instead of trying to find and maintain every place that modifies self.WORKERS.

Additionally, the patch in your comment would log every change in manage_workers, which could be quite noisy when spawning or killing multiple workers (e.g. on startup you'd get a log message with an incrementing number for each worker). My motivation for this PR in the first place was to reduce unnecessary noise.

@benoitc

benoitc (Owner) commented Jul 23, 2015

Hrm, the patch in manage_workers only logs if new workers need to be spawned, and only after they are. Did I miss something?

The point of logging there and in reap_workers is to log only when the number really changed, not before it changes on kill. I think it would be more accurate, although we can optimise the reap_workers case. That is at least the intention :)

@erydo

erydo (Contributor) commented Jul 23, 2015

Ah, you're right, I misread your patch! For some reason I misremembered spawning as being in a loop like killing is. I'd still stand by the maintenance argument, though: manage_workers gets called regularly, and logging the worker count there is the least complex, least error-prone implementation of logging the current active worker count.

If it's desirable to also log once workers finish dying, I'd suggest that change be made separately, and it should likely be based off #1084, which consolidates that worker cleanup work.

@benoitc

benoitc (Owner) commented Jul 24, 2015

All of this depends on what this metric is expected to measure: the number of active workers, or the number of workers alive (running). Maybe we should indeed have two metrics, but I guess you're right: your patch is enough to report the number of active workers and log it as such.

I commented on the patch for one last change, then let's commit it :)

@@ -55,6 +55,8 @@ def __init__(self, app):
os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE
self._num_workers = None
self._last_logged_active_worker_count = None

@benoitc

benoitc (Owner) Jul 24, 2015

last_logged is not needed. Let's just call it self._active_worker_count.

@benoitc

benoitc (Owner) commented Jul 29, 2015

OK, if no one objects I will merge this PR and do the renaming right after. OK?

@tilgovi

tilgovi (Collaborator) commented Jul 29, 2015

👍

benoitc added a commit that referenced this pull request Aug 19, 2015

Merge pull request #1078 from preo/reduce-arbiter-noise
Only log '[num] workers' message when it changes.

@benoitc benoitc merged commit a132ca9 into benoitc:master Aug 19, 2015

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

fofanov pushed a commit to fofanov/gunicorn that referenced this pull request Mar 16, 2018

Merge pull request #1078 from preo/reduce-arbiter-noise
Only log '[num] workers' message when it changes.