StatsD

A network daemon for aggregating statistics (counters and timers), rolling them up, then sending them to Graphite.

We (Etsy) blogged about how it works and why we created it.

Concepts

buckets: Each stat is in its own "bucket". Buckets are not predefined anywhere. A bucket can be named anything that translates to Graphite (periods make folders, etc.).
values: Each stat has a value. How it is interpreted depends on modifiers.
flush: After the flush interval timeout (default 10 seconds), stats are munged and sent over to Graphite.
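
To make the three concepts concrete, here is a minimal sketch of an in-memory aggregator: buckets are created on first use, values accumulate, and a flush emits one line per bucket and then starts over. The names `createAggregator`, `increment`, and `flush` are hypothetical, and the `stats.<bucket> <value> <timestamp>` line shape assumes Graphite's plaintext format; the daemon's real output may differ.

```javascript
// Hypothetical sketch of the bucket / value / flush cycle; not the daemon's actual code.
function createAggregator() {
  let counters = {}; // bucket name -> accumulated value

  return {
    // Buckets are not predefined: the first increment creates the bucket.
    increment(bucket, value) {
      counters[bucket] = (counters[bucket] || 0) + value;
    },
    // Emit one "path value timestamp" line per bucket, then start over.
    flush(timestampSeconds) {
      const lines = Object.keys(counters).map(
        (bucket) => `stats.${bucket} ${counters[bucket]} ${timestampSeconds}`
      );
      counters = {};
      return lines;
    },
  };
}

const aggregator = createAggregator();
aggregator.increment('gorets', 1);
aggregator.increment('gorets', 1);
aggregator.increment('web.requests', 5); // periods become folders in Graphite
console.log(aggregator.flush(Math.floor(Date.now() / 1000)));
```

A real flush would run on a timer and write the lines to Graphite over the network rather than returning them.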

Counting

gorets:1|c

This is a simple counter. Add 1 to the "gorets" bucket. It stays in memory until the next flush, which is controlled by config.flushInterval.
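
As an illustration, a client could send that counter from Node with the built-in `dgram` module. This is a sketch, not an official client: `formatCounter` and `sendStat` are hypothetical helper names, and the host and port in the usage comment are assumptions that should match whatever your StatsD config listens on.

```javascript
const dgram = require('dgram');

// Build a counter line such as "gorets:1|c".
function formatCounter(bucket, value) {
  return `${bucket}:${value}|c`;
}

// Fire-and-forget send over UDP. Host and port are assumptions;
// use the address your StatsD daemon is configured to listen on.
function sendStat(line, host, port, done) {
  const socket = dgram.createSocket('udp4');
  socket.send(Buffer.from(line), port, host, (err) => {
    socket.close();
    if (done) done(err);
  });
}

console.log(formatCounter('gorets', 1));

// Usage (not executed here; 8125 is an assumed port):
// sendStat(formatCounter('gorets', 1), '127.0.0.1', 8125);
```

Because UDP is connectionless, the send succeeds even if no daemon is listening, so the application never blocks on stats.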

Timing

glork:320|ms

The glork took 320ms to complete this time. For each flush interval, StatsD figures out the 90th percentile, the average (mean), and the lower and upper bounds. The percentile threshold can be tweaked with config.percentThreshold.

The percentile threshold can be a single value, or a list of values, and will generate the following list of stats for each threshold:

stats.timers.$KEY.mean_$PCT
stats.timers.$KEY.upper_$PCT

Where $KEY is the stats key you specify when sending to StatsD, and $PCT is the percentile threshold.
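
One way to picture the per-threshold stats is the sketch below: sort the timings, keep the lowest $PCT percent of them, and report the mean and the largest value of that subset. The function name `timerStats` is hypothetical, and the rounding rule is an assumption; the daemon's exact calculation may differ.

```javascript
// Sketch of per-threshold timer stats; the daemon's exact rounding may differ.
function timerStats(key, values, pctThreshold) {
  if (values.length === 0) return {};
  const sorted = values.slice().sort((a, b) => a - b);
  // Number of samples that fall within the threshold (at least one).
  const keep = Math.max(1, Math.round((pctThreshold / 100) * sorted.length));
  const kept = sorted.slice(0, keep);
  const sum = kept.reduce((total, v) => total + v, 0);
  return {
    [`stats.timers.${key}.mean_${pctThreshold}`]: sum / keep,
    [`stats.timers.${key}.upper_${pctThreshold}`]: kept[keep - 1],
  };
}

const timings = [320, 150, 480, 210, 900, 260, 300, 410, 275, 330];
console.log(timerStats('glork', timings, 90));
```

With a list of thresholds, the same function would simply be called once per value in the list.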

Sampling

gorets:1|c|@0.1

Tells StatsD that this counter is sampled: it is only being sent 1/10th of the time.
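
The sketch below shows both halves of sampling under one assumption: that a sampled counter is scaled by the inverse of its rate on the receiving side, so one message at @0.1 stands for roughly ten events. `parseStat`, `effectiveCount`, and `shouldSend` are hypothetical names, not part of StatsD.

```javascript
// Parse "bucket:value|type|@rate"; the rate defaults to 1 when absent.
function parseStat(line) {
  const [bucket, rest] = line.split(':');
  const [rawValue, type, rawRate] = rest.split('|');
  const sampleRate =
    rawRate && rawRate.startsWith('@') ? Number(rawRate.slice(1)) : 1;
  return { bucket, value: Number(rawValue), type, sampleRate };
}

// Assumed server-side scaling: each sampled message counts for 1/rate events.
function effectiveCount(stat) {
  return stat.value * (1 / stat.sampleRate);
}

// Client side: send only about `sampleRate` of the events.
// `random` is injectable so the decision can be tested deterministically.
function shouldSend(sampleRate, random = Math.random) {
  return random() < sampleRate;
}

const sampled = parseStat('gorets:1|c|@0.1');
console.log(sampled, effectiveCount(sampled));
```

Sampling trades a little accuracy for far fewer packets on hot code paths.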

Gauges

StatsD now also supports gauges: arbitrary values that can be recorded.

gaugor:333|g

Debugging

There are additional config variables available for debugging:

debug - log exceptions and periodically print out information on counters and timers
debugInterval - interval for printing out information on counters and timers
dumpMessages - print debug info on incoming messages

For more information, check the exampleConfig.js.
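
As a rough illustration, the three debugging options might be set like this. The values are examples only, the variable name `debugSettings` is hypothetical, and the assumption that debugInterval is in milliseconds should be checked against exampleConfig.js. In a real config file these keys would sit alongside your other settings, in whatever shape exampleConfig.js uses.

```javascript
// Illustrative values only; check exampleConfig.js for the real defaults and units.
const debugSettings = {
  debug: true,          // log exceptions and periodically print counters and timers
  debugInterval: 10000, // how often to print them (assumed to be milliseconds)
  dumpMessages: true,   // print debug info on every incoming message
};

console.log(JSON.stringify(debugSettings, null, 2));
```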

Guts

UDP: Client libraries use UDP to send information to the StatsD daemon.
NodeJS
Graphite

Graphite uses "schemas" to define the different round robin datasets it houses (analogous to RRAs in rrdtool). Here's what Etsy is using for the stats databases:

[stats]
priority = 110
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974

That translates to:

6 hours of 10 second data (what we consider "near-realtime")
1 week of 1 minute data
5 years of 10 minute data

This has been a good tradeoff so far between size-of-file (round robin databases are fixed size) and data we care about. Each "stats" database is about 3.2 megs with these retentions.
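
The retention arithmetic above can be checked with a few lines of JavaScript. Each `seconds:points` pair keeps `points` samples at one sample every `seconds`, so the covered time span is their product. The helper name `describeRetentions` is hypothetical.

```javascript
// Turn "10:2160,60:10080,600:262974" into per-archive time spans.
function describeRetentions(retentions) {
  return retentions.split(',').map((pair) => {
    const [secondsPerPoint, points] = pair.split(':').map(Number);
    return { secondsPerPoint, points, totalSeconds: secondsPerPoint * points };
  });
}

const HOUR = 3600;
const DAY = 24 * HOUR;

for (const archive of describeRetentions('10:2160,60:10080,600:262974')) {
  console.log(
    `${archive.secondsPerPoint}s resolution: ` +
      `${(archive.totalSeconds / HOUR).toFixed(1)} hours ` +
      `(${(archive.totalSeconds / DAY).toFixed(1)} days)`
  );
}
```

The first archive works out to 21,600 seconds (6 hours), the second to 604,800 seconds (7 days), and the third to 157,784,400 seconds, which is about 1,826 days or 5 years.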

TCP Stats Interface

A really simple TCP management interface is available by default on port 8126; the port can be overridden in the configuration file. Inspired by the memcache stats approach, it can be used to monitor a live statsd server. You can interact with the management server by telnetting to port 8126. The following commands are available:

stats - some stats about the running server
counters - a dump of all the current counters
timers - a dump of the current timers

The stats output currently will give you:

uptime: the number of seconds elapsed since statsd started
graphite.last_flush: the number of seconds elapsed since the last successful flush to graphite
graphite.last_exception: the number of seconds elapsed since the last exception thrown whilst flushing to graphite
messages.last_msg_seen: the number of elapsed seconds since statsd received a message
messages.bad_lines_seen: the number of bad lines seen since startup

A simple Nagios check can be found in the utils/ directory. It can be used to check metric thresholds, for example the number of seconds since the last successful flush to Graphite.
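
The same kind of threshold check can be sketched in Node with the built-in `net` module. This is not the bundled check: `queryManagement` and `parseStatsReply` are hypothetical names, the reply is assumed to consist of `name: value` lines, and the one-second idle timeout is an assumption used to decide when the reply is complete.

```javascript
const net = require('net');

// Send one management command and collect the raw reply text.
function queryManagement(command, host, port, callback) {
  const socket = net.connect(port, host);
  let reply = '';
  let finished = false;
  const finish = (err) => {
    if (finished) return; // report only once, even if error and close both fire
    finished = true;
    callback(err, reply);
  };
  socket.setEncoding('utf8');
  socket.on('connect', () => socket.write(command + '\n'));
  socket.on('data', (chunk) => { reply += chunk; });
  // Assumption: one second of silence means the reply is complete.
  socket.setTimeout(1000, () => socket.destroy());
  socket.on('error', (err) => finish(err));
  socket.on('close', () => finish(null));
}

// Parse lines assumed to look like "uptime: 42" into an object of numbers.
function parseStatsReply(text) {
  const stats = {};
  for (const line of text.split('\n')) {
    const match = /^([\w.]+):\s*(\d+)\s*$/.exec(line);
    if (match) stats[match[1]] = Number(match[2]);
  }
  return stats;
}

// Usage (not executed here): warn if the last flush is older than 60 seconds.
// queryManagement('stats', '127.0.0.1', 8126, (err, text) => {
//   if (err) throw err;
//   const stats = parseStatsReply(text);
//   if (stats['graphite.last_flush'] > 60) console.error('flush is stale');
// });
```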

Installation and Configuration

Install node.js
Clone the project
Create a config file from exampleConfig.js and put it somewhere
Start the daemon:

node stats.js /path/to/config

Tests

A test framework has been added using node-unit and some custom code to start and manipulate statsd. Please add tests under test/ for any new features or bug fixes. Testing a live server can be tricky: attempts were made to eliminate race conditions, but it may still be possible to encounter a stuck state. If doing dev work, a killall node will kill any stray test servers in the background (don't do this on a production machine!).

Tests can be executed with ./run_tests.sh.

Inspiration

StatsD was inspired (heavily) by the project (of the same name) at Flickr. Here's a post where Cal Henderson described it in depth: Counting and timing. Cal re-released the code recently: Perl StatsD

Contribute

You're interested in contributing to StatsD? AWESOME. Here are the basic steps:

Fork StatsD from here: http://github.com/etsy/statsd

Clone your fork
Hack away
If you are adding new functionality, document it in the README
If necessary, rebase your commits into logical chunks, without errors
Push the branch up to GitHub
Send a pull request to the etsy/statsd project.

We'll do our best to get your changes in!

Contributors

In lieu of a list of contributors, check out the commit history for the project: http://github.com/etsy/statsd/commits/master
