New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gunicorn instrumentation via statsD and a new logger class #748

Merged
merged 23 commits into from Jun 28, 2014

Conversation

Projects
None yet
8 participants
@alq666
Contributor

alq666 commented May 14, 2014

(This is another attempt at getting statsD instrumentation to gunicorn initially in #672)

What is this pull request?

statsd is the de facto low-overhead instrumentation of many web stacks. I have come across a number of different ways to instrument gunicorn, none of which are part of the core and rely on external scripts being run out-of-band to guess the number of requests processed or the amount of cpu spent.

I am hereby proposing that gunicorn optionally submits metrics to a local (or remote) statsD instance with the use of --logger-class gunicorn.instrument.statsd.Statsd option. Being that statsD uses UDP as transport, the receiving end being up or down does not affect the performance of the gunicorn application (unlike other stat collection schemes).

Here is an example of such metric being graphed.

metric_explorer___datadog-2

How does this work?

The arbiter and the base worker use a new logger that can take additional arguments to turn a log message into a metric. In addition certain known logging levels are automatically turned into metrics.

In particular the access method is overriden to capture duration and count of each gunicorn request.

The new logger class inherits from gunicorn.glogging.Logger so existing logging should be preserved. No external dependencies were created.

Who is behind this?

I use gunicorn at Datadog and have found statsD an indispensable aid to measure the performance of high-traffic applications. There is no company-specific bits in the pull request of course.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi May 14, 2014

Collaborator

This looks like a pretty beautiful solution. I had forgotten how easy it is to send data to statsd.

I wish we had this at my last company!

Collaborator

tilgovi commented May 14, 2014

This looks like a pretty beautiful solution. I had forgotten how easy it is to send data to statsd.

I wish we had this at my last company!

@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 May 14, 2014

Contributor

Thanks @tilgovi

Contributor

alq666 commented May 14, 2014

Thanks @tilgovi

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi May 18, 2014

Collaborator

A few questions:

  • Can you imagine any value in having the extra keyword arguments available for log formatting?
  • If yes, should we use the extras keyword argument that log calls take anyway?
  • Should the worker count log in the arbiter be debug level? It's expected not to change except through administrator action and manage_workers is called every timeout cycle. That seems like unnecessary log noise (though I understand info level is pretty high, so maybe it's okay).
  • Should the logger check the log level or should we assume that anyone using statsD wants the metrics collected no matter the log level?
  • What do you think about logging resp.status (the status code)?
Collaborator

tilgovi commented May 18, 2014

A few questions:

  • Can you imagine any value in having the extra keyword arguments available for log formatting?
  • If yes, should we use the extras keyword argument that log calls take anyway?
  • Should the worker count log in the arbiter be debug level? It's expected not to change except through administrator action and manage_workers is called every timeout cycle. That seems like unnecessary log noise (though I understand info level is pretty high, so maybe it's okay).
  • Should the logger check the log level or should we assume that anyone using statsD wants the metrics collected no matter the log level?
  • What do you think about logging resp.status (the status code)?
@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 May 19, 2014

Contributor

@tilgovi

  • I did not know about extras, I can indeed stuff my keywords into extras to save a call to a function call for each log entry.
  • With respect to logging level and instrumentation you are correct in that there is a coupling between using INFO or greater to get some instrumentation going. My thinking behind this was two-fold:
    1. to provide instrumentation that makes sense out of the box, without having to set up anything special (save command-line arguments). As such DEBUG statements don't send anything to statsD and ERROR/CRITICAL/WARNING only send limited metrics.
    2. to equate instrumentation with informational logging: instrumentation is most useful for production, where INFO is likely to be enabled (but not DEBUG).
  • resp.status is certainly worth capturing as a rate; I'll add that.
Contributor

alq666 commented May 19, 2014

@tilgovi

  • I did not know about extras, I can indeed stuff my keywords into extras to save a call to a function call for each log entry.
  • With respect to logging level and instrumentation you are correct in that there is a coupling between using INFO or greater to get some instrumentation going. My thinking behind this was two-fold:
    1. to provide instrumentation that makes sense out of the box, without having to set up anything special (save command-line arguments). As such DEBUG statements don't send anything to statsD and ERROR/CRITICAL/WARNING only send limited metrics.
    2. to equate instrumentation with informational logging: instrumentation is most useful for production, where INFO is likely to be enabled (but not DEBUG).
  • resp.status is certainly worth capturing as a rate; I'll add that.
@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 May 31, 2014

Contributor

@tilgovi I've made the changes we discussed (using extra and capturing resp.status as a rate).

I have also added a test to verify that even if statsD instrumentation fails, the logger continues to work.

Contributor

alq666 commented May 31, 2014

@tilgovi I've made the changes we discussed (using extra and capturing resp.status as a rate).

I have also added a test to verify that even if statsD instrumentation fails, the logger continues to work.

@tilgovi

View changes

Show outdated Hide outdated tests/test_009-statsd.py
@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi May 31, 2014

Collaborator

@benoitc this looks alright to me. Thoughts?

Collaborator

tilgovi commented May 31, 2014

@benoitc this looks alright to me. Thoughts?

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Jun 10, 2014

Collaborator

@benoitc any reason not to include this in the next release"

Collaborator

tilgovi commented Jun 10, 2014

@benoitc any reason not to include this in the next release"

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Jun 12, 2014

Owner

The patch looks fine for me but I would change the way it is enabled before to integrate it. Instead of passing the class to the the logger class option and add an option to set the host I would automatically set the statd logger if the --logs-statsd-to option is used. Also since it would be a supported backend we should be able to only use a short name with the --logging-class option. While we would be here I would rename this option as --logging-backend. Also naming --logs-statsd-to in --statsd-host sounds better for me.

If it's done until tomorrow we can include it in R19.0. If not it will wait the next one (planned this month or in july).

Thoughts?

Owner

benoitc commented Jun 12, 2014

The patch looks fine for me but I would change the way it is enabled before to integrate it. Instead of passing the class to the the logger class option and add an option to set the host I would automatically set the statd logger if the --logs-statsd-to option is used. Also since it would be a supported backend we should be able to only use a short name with the --logging-class option. While we would be here I would rename this option as --logging-backend. Also naming --logs-statsd-to in --statsd-host sounds better for me.

If it's done until tomorrow we can include it in R19.0. If not it will wait the next one (planned this month or in july).

Thoughts?

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Jun 12, 2014

Owner

let's not wait much more for the 19,0, This feature will be added in 19.1 this month. We should put it in master ASAP.

Owner

benoitc commented Jun 12, 2014

let's not wait much more for the 19,0, This feature will be added in 19.1 this month. We should put it in master ASAP.

@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 Jun 12, 2014

Contributor

@benoitc I can implemented the suggested changes over the next week or so.

Contributor

alq666 commented Jun 12, 2014

@benoitc I can implemented the suggested changes over the next week or so.

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Jun 12, 2014

Owner

@alq666 cool! will merge once everything is OK :)

Owner

benoitc commented Jun 12, 2014

@alq666 cool! will merge once everything is OK :)

alq666 added some commits Jun 19, 2014

Simpler statsd integration
No need to specify a logger class.
--statsd-to is enough to trigger the Statsd logger automatically
@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 Jun 19, 2014

Contributor

@benoitc @tilgovi I changed the option to --statsd-host. That in itself is enough to trigger the statsD logger automatically so there is no need to use --logger-class to use it. I left the --logger-class option as is to keep this simpler.

Contributor

alq666 commented Jun 19, 2014

@benoitc @tilgovi I changed the option to --statsd-host. That in itself is enough to trigger the statsD logger automatically so there is no need to use --logger-class to use it. I left the --logger-class option as is to keep this simpler.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Jun 19, 2014

Collaborator

Looking great.

The readme example needs to be updated to reflect the extras keyword use, yeah?

Collaborator

tilgovi commented Jun 19, 2014

Looking great.

The readme example needs to be updated to reflect the extras keyword use, yeah?

@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 Jun 22, 2014

Contributor

@tilgovi should be good now.

Contributor

alq666 commented Jun 22, 2014

@tilgovi should be good now.

@matrixise

This comment has been minimized.

Show comment
Hide comment
@matrixise

matrixise Jun 22, 2014

Collaborator

+1. Can we merge this branch?

Collaborator

matrixise commented Jun 22, 2014

+1. Can we merge this branch?

@@ -482,6 +482,11 @@ def manage_workers(self):
(pid, _) = workers.pop(0)
self.kill_worker(pid, signal.SIGTERM)
self.log.info("{0} workers".format(len(workers)),

This comment has been minimized.

@benoitc

benoitc Jun 22, 2014

Owner

i can't test it right now but does this change works with the default logger?

@benoitc

benoitc Jun 22, 2014

Owner

i can't test it right now but does this change works with the default logger?

This comment has been minimized.

@alq666

alq666 Jun 22, 2014

Contributor

It does work with default logger indeed:

Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Starting gunicorn 19.1.0
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Listening at: http://127.0.0.1:8000 (17090)
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Using worker: sync
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17095] Booting worker with pid: 17095
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17096] Booting worker with pid: 17096
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17097] Booting worker with pid: 17097
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17098] Booting worker with pid: 17098
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17090] 4 workers

using gunicorn --log-syslog --log-level=info --workers=4 test:app as a test.

@alq666

alq666 Jun 22, 2014

Contributor

It does work with default logger indeed:

Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Starting gunicorn 19.1.0
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Listening at: http://127.0.0.1:8000 (17090)
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Using worker: sync
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17095] Booting worker with pid: 17095
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17096] Booting worker with pid: 17096
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17097] Booting worker with pid: 17097
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17098] Booting worker with pid: 17098
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17090] 4 workers

using gunicorn --log-syslog --log-level=info --workers=4 test:app as a test.

benoitc added a commit that referenced this pull request Jun 28, 2014

Merge pull request #748 from alq666/statsd-logger
gunicorn instrumentation via statsD and a new logger class

@benoitc benoitc merged commit ceda9c5 into benoitc:master Jun 28, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Jun 28, 2014

Owner

merged thanks!

Owner

benoitc commented Jun 28, 2014

merged thanks!

@srgshames

This comment has been minimized.

Show comment
Hide comment
@srgshames

srgshames Sep 8, 2014

Really cool feature!

I was wondering if there's a way to differentiate between multiple applications? Is there a way to assign a name/prefix to an instance of a gunicorn app to be reflected in the metric name?

srgshames commented Sep 8, 2014

Really cool feature!

I was wondering if there's a way to differentiate between multiple applications? Is there a way to assign a name/prefix to an instance of a gunicorn app to be reflected in the metric name?

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Sep 9, 2014

Collaborator

@srgshames good thought.
@alq666 do you think we should consider prepending the app name to the metrics? Or appending? It's easily available as self.cfg.proc_name from the logger.

Collaborator

tilgovi commented Sep 9, 2014

@srgshames good thought.
@alq666 do you think we should consider prepending the app name to the metrics? Or appending? It's easily available as self.cfg.proc_name from the logger.

@alq666

This comment has been minimized.

Show comment
Hide comment
@alq666

alq666 Sep 9, 2014

Contributor

I can certainly stick it in the middle in the normal case, just after
gunicorn but before the actual metric name.

In the case the statsd's implementation supports explicit tags (like
Datadog's), I will add the app name as a tag.
On Sep 9, 2014 2:52 AM, "Randall Leeds" notifications@github.com wrote:

@srgshames https://github.com/srgshames good thought.
@alq666 https://github.com/alq666 do you think we should consider
prepending the app name to the metrics? Or appending? It's easily available
as self.cfg.proc_name from the logger.


Reply to this email directly or view it on GitHub
#748 (comment).

Contributor

alq666 commented Sep 9, 2014

I can certainly stick it in the middle in the normal case, just after
gunicorn but before the actual metric name.

In the case the statsd's implementation supports explicit tags (like
Datadog's), I will add the app name as a tag.
On Sep 9, 2014 2:52 AM, "Randall Leeds" notifications@github.com wrote:

@srgshames https://github.com/srgshames good thought.
@alq666 https://github.com/alq666 do you think we should consider
prepending the app name to the metrics? Or appending? It's easily available
as self.cfg.proc_name from the logger.


Reply to this email directly or view it on GitHub
#748 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment