Commits on Jul 1, 2012
  1. Handle all OSError when spawning a collector.

    Recently ran into `OSError: [Errno 26] Text file busy' while
    live-editing a collector, which made the main thread die.  :(
    
    Change-Id: Iec15a6d2c89977e947168bff749bcd00ddb9ef06
    Reviewed-on: http://review.stumble.net/14309
    Reviewed-by: Dave Barr <barr@stumbleupon.com>
    Reviewed-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    tsuna committed with tsuna Jul 1, 2012
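
A minimal sketch of the idea in this commit, not the actual tcollector code: catch every `OSError` raised while spawning a collector so that one bad script cannot kill the main thread. The function name `spawn_collector` is hypothetical. On Linux, errno 26 is `ETXTBSY`, which is what the kernel reports when executing a file that is still open for writing.

```python
import logging
import subprocess


def spawn_collector(path):
    """Try to start a collector script.

    Returns the Popen object, or None if the process could not be spawned.
    (Hypothetical helper; tcollector's real function may differ.)
    """
    try:
        return subprocess.Popen([path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                close_fds=True)
    except OSError as e:
        # OSError covers ENOENT, EACCES, ETXTBSY ("Text file busy"), etc.
        # Log and carry on instead of letting the exception propagate.
        logging.error("Failed to spawn collector %s: %s", path, e)
        return None
```

The caller can then skip the collector for this round and retry on the next scheduling pass.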
Commits on Jun 22, 2012
  1. Merging ZFS collectors for tcollector

    Manuel Amador (Rudd-O) committed Jun 22, 2012
Commits on Apr 24, 2012
  1. Handle errors in tcollector related to failure to spawn collectors.

    Change-Id: Ia27a4b528eac99e5d4f2ddc35c503d470eeaee37
    Reviewed-on: https://review.stumble.net/11428
    Reviewed-by: Dave Barr <barr@stumbleupon.com>
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Reviewed-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Manuel Amador (Rudd-O) committed with tsuna Apr 24, 2012
Commits on Mar 30, 2012
  1. Better check logging flags. Increase default log size.

    Change-Id: Ibcb842a222e55065cecd429de7690088ceb8eb38
    tsuna committed Mar 30, 2012
  2. Limit the length of a line read from collectors.

    TSD won't accept any data point that doesn't fit in 1024 bytes anyway,
    so we may as well drop them early while in tcollector.
    
    Change-Id: Ie15b5bbc48ddada70ad5c42ea6374a929343bd87
    tsuna committed with tsuna Mar 27, 2012
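
The early drop described above could look roughly like the following. This is an illustrative sketch only: `keep_short_lines` is a made-up name, and the 1024-byte limit is taken from the commit message rather than from the TSD source.

```python
MAX_LINE_BYTES = 1024  # limit quoted in the commit message


def keep_short_lines(lines, max_len=MAX_LINE_BYTES):
    """Split collector output into accepted lines and a count of dropped ones.

    `lines` is an iterable of byte strings, one data point per element.
    """
    kept = []
    dropped = 0
    for line in lines:
        if len(line) > max_len:
            # Too long for the TSD to accept, so drop it here.
            dropped += 1
            continue
        kept.append(line)
    return kept, dropped
```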
Commits on Mar 29, 2012
  1. Add RotatingFileHandler handler options to tcollector.

    This adds three new flags --max-bytes, --backup-count, and --logfile.
    
    Change-Id: Iac187c21d67431ffdeb348321393e32a940c422b
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    Chad Rhyner committed with tsuna Mar 15, 2012
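
For readers unfamiliar with `RotatingFileHandler`, here is a small sketch of how the three flags might map onto it. The flag names come from the commit message; the default values, the use of `argparse`, and the helper names `build_parser` and `make_handler` are assumptions made for illustration.

```python
import argparse
import logging
import logging.handlers


def build_parser():
    """Build a parser with the three logging flags (defaults are made up)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--logfile", default="tcollector.log")
    parser.add_argument("--max-bytes", type=int, default=64 * 1024 * 1024)
    parser.add_argument("--backup-count", type=int, default=1)
    return parser


def make_handler(opts):
    """Create a rotating file handler from the parsed options."""
    handler = logging.handlers.RotatingFileHandler(
        opts.logfile,
        maxBytes=opts.max_bytes,        # rotate once the file reaches this size
        backupCount=opts.backup_count)  # keep this many rotated files
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s[%(process)d] %(levelname)s: %(message)s"))
    return handler
```

The handler would then be attached with `logging.getLogger().addHandler(make_handler(opts))`.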
Commits on Mar 27, 2012
  1. Properly handle JMX TabularData.

    Change-Id: I3e27bbf6a4003f21fd78c40f086e3177b4293598
    tsuna committed Mar 27, 2012
  2. Don't double-print the value when dealing with arrays.

    Change-Id: I4356db19c80071364cdc461f653a1ef125448c8b
    tsuna committed Mar 27, 2012
Commits on Mar 9, 2012
  1. Close stdin if we don't need it.

    Change-Id: Idba0500ef4bf8d834d8705e80c0f671359eb8609
    tsuna committed Mar 9, 2012
  2. Rename a service introduced in HBase 0.92.1.

    Change-Id: I3e9c3967a600d98f62d5aca3ff4c4a72efe3c4d4
    tsuna committed Mar 9, 2012
  3. Catch some more invalid lines and report them instead of dying.

    Change-Id: I408c0e67525ba8c4d4becbabc68377914cf54d2d
    tsuna committed with tsuna Mar 9, 2012
Commits on Feb 8, 2012
  1. Fix jmx path

    code left in during testing
    
    Change-Id: I4f5f304649cf5c6f362145cfdda183d864146e01
    davebarr committed Feb 8, 2012
  2. Add hadoop datanode collector

    Change-Id: I37146c1c5882dbfa2fa94e44e54e05d6b2e8061a
    Reviewed-on: https://review.stumble.net/8182
    Reviewed-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Reviewed-by: Dave Barr <barr@stumbleupon.com>
    Tested-by: Dave Barr <barr@stumbleupon.com>
    davebarr committed with tsuna Feb 8, 2012
Commits on Oct 17, 2011
  1. Handle uncaught exceptions in the SenderThread.

    Allow up to 100 uncaught exceptions in a row for common kinds of
    exceptions that aren't too bad.  Other exceptions, or an excessive
    number of uncaught exceptions, will cause tcollector to shut down.
    All uncaught exceptions in the SenderThread are now logged.
    
    Change-Id: Icfac4cf840c91243792ffcb4ddf1e5aa43ac8014
    tsuna committed Oct 17, 2011
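
One way to picture the policy above is a loop that counts consecutive failures. This is a sketch under assumptions: the set of "tolerated" exception types and the names `run_loop` and `step` are hypothetical, and only the limit of 100 comes from the commit message.

```python
import logging

MAX_UNCAUGHT_EXCEPTIONS = 100      # limit quoted in the commit message
TOLERATED = (OSError, ValueError)  # assumed set of "not too bad" exceptions


def run_loop(step, max_errors=MAX_UNCAUGHT_EXCEPTIONS):
    """Call step() repeatedly until it returns a false value.

    Returns True on a clean exit, or False if the loop gave up because of
    an unexpected exception or too many tolerated exceptions in a row.
    """
    errors_in_a_row = 0
    while True:
        try:
            if not step():
                return True
            errors_in_a_row = 0  # a successful step resets the counter
        except TOLERATED:
            errors_in_a_row += 1
            logging.exception("Uncaught exception in sender loop (%d in a row)",
                              errors_in_a_row)
            if errors_in_a_row > max_errors:
                return False  # too many failures in a row: shut down
        except Exception:
            logging.exception("Unexpected exception in sender loop")
            return False  # any other kind of exception: shut down
```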
Commits on Oct 13, 2011
  1. Add a Redis collector.

    This collector gathers data from local Redis servers.  This requires
    the Redis module for Python.  We use netstat to look for 'redis-server'
    processes running on the local machine, since many people run multiple
    Redis servers per box.
    
    It is also suggested you put a hint in your Redis configuration file to
    tell this collector a logical 'cluster' name.  This helps if you have
    several Redis instances on different hosts and you want to be able to
    aggregate the data.
    
    Change-Id: I24975d78aab39148ed92f7a641240c14d725c7d4
    zorkian committed with tsuna Oct 10, 2011
  2. Add a Riak collector.

    This collector is for the Riak distributed database.  It uses the stats
    JSON object to parse out data and create some timeseries.
    
    This expects /usr/lib/riak to exist (it does by default) and it uses the
    default ports.  This expects you to only be running one Riak instance on
    a machine.  This also only collects stats from the local machine -- you
    will need to run collectors on every machine you use as a Riak node.
    
    Change-Id: I9758e0f99baadf1fe737609932edbb74a1d6581c
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    zorkian committed with tsuna Oct 10, 2011
  3. Only print slave status if we have slaves in our setup.

    This closes #24.
    
    Change-Id: I269f578e81502e1b31f8c18c90b8a2c33ca1e955
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    Alex Newman committed with tsuna Oct 7, 2011
Commits on Sep 17, 2011
  1. MySQL collectors: ignore search directories that don't exist.

    Change-Id: Ife9efeac5c600a6298e14699b5d96e69f8d06531
    tsuna committed Sep 17, 2011
  2. Fix a couple variable names broken in the last change.

    Yay for unsafe languages that blow up at the last minute.
    
    Change-Id: Ib445fb01ac4137eca3ed55ee1d9c227bbe3c5333
    tsuna committed Sep 17, 2011
Commits on Sep 16, 2011
  1. IPv6 support: Use `getaddrinfo' to resolve the TSD's host.

    This way it works with IPv6 hosts and can work with DNS entries that
    have multiple A or AAAA records.
    
    Change-Id: I7f64ad730d98d732f8bef2f75576499e917a02bd
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    spark404 committed with tsuna Sep 16, 2011
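
The usual `getaddrinfo` pattern, sketched here with a hypothetical `connect_to_tsd` helper (not the actual tcollector function), tries each resolved address in turn until one connects. Passing `AF_UNSPEC` lets the resolver return both A and AAAA records.

```python
import socket


def connect_to_tsd(host, port, timeout=5.0):
    """Connect to host:port over IPv4 or IPv6, trying every resolved address."""
    last_err = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except socket.error as e:
            last_err = e
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except socket.error as e:
            # This address did not work; remember why and try the next one.
            last_err = e
            sock.close()
    raise last_err or socket.error("no usable address for %s" % host)
```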
Commits on Aug 21, 2011
  1. Collect more internal metrics from InnoDB.

    Run `SHOW ENGINE INNODB STATUS' and parse the output to extract some
    of the metrics.  Some InnoDB metrics are exposed in SHOW GLOBAL STATUS,
    but many are not.
    
    This adds 14 new metrics for InnoDB.  We can add more in the future.
    
    Change-Id: Ib1458bfa939f2314bcb5c9c88fd2e4ac0fb10b5c
    Reviewed-by: Tony Landells <tony@stumbleupon.com>
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Reviewed-by: Benoit Sigoure <tsuna@stumbleupon.com>
    tsuna committed with tsuna Aug 13, 2011
  2. Refresh the timestamp more frequently.

    This is in case a command takes a significant amount of time.
    
    Change-Id: I4758b220d7883f5815e0618dbcf474ba8c299c9b
    tsuna committed with tsuna Aug 13, 2011
  3. Detect when the InnoDB engine is used.

    Change-Id: Id43513cef6af2d4a6e3b53ba6c48dc6894bd44fe
    tsuna committed with tsuna Aug 12, 2011
  4. Fix the metric name used for InnoDB mutex locks.

    The metric name ought to be `mysql.innodb.locks'.
    
    Change-Id: I7297ab0822fc945d60fde5b8be849155b74557cd
    tsuna committed with tsuna Aug 12, 2011
Commits on Aug 17, 2011
  1. Handle the output of "SHOW PROCESSLIST" from MySQL 5.5.

    New columns have been added in 5.5, just ignore them.
    
    Change-Id: Ia71339f7fd7b212c771736dee00e4dd676107889
    tsuna committed with tsuna Aug 17, 2011
Commits on Aug 16, 2011
  1. Make sure there are no spaces in the `state' tag.

    Change-Id: I35a38605ccb5b5ac708a85f421d4c6a87a1d10f3
    tsuna committed with tsuna Aug 16, 2011
Commits on Aug 12, 2011
  1. Add a basic collector for ElasticSearch.

    The collector comes with 38 metrics about ElasticSearch server instances
    as well as 8 additional cluster-wide metrics collected from the master
    node.  Most metrics are system-level metrics, because right now ES
    doesn't have many serving statistics.  We don't collect per-index
    metrics at this time, because many indices are named dynamically and
    we would need a way of canonicalizing index names.
    
    Change-Id: I7619b29fc7cb83450e478c3601760762dbd83ba5
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Reviewed-by: Tony Landells <tony@stumbleupon.com>
    tsuna committed with tsuna Aug 11, 2011
  2. Add a collector for MySQL.

    The collector includes about 300 metrics about MySQL (when InnoDB
    is used).  Most metrics are collected through `SHOW GLOBAL STATUS'.
    
    The collector has a configuration file, `mysqlconf.py', in which
    the user / password to use to connect to MySQL must be specified.
    
    The collector has limited support for MySQL 5.0, because in that
    version of MySQL running the command `SHOW GLOBAL STATUS' has a
    big impact on the performance of the database.  Hopefully almost
    everyone uses at least MySQL 5.1 these days.
    
    Change-Id: If6042427f1701da3f5954c166a087c7996a86a1f
    tsuna committed Aug 12, 2011
Commits on Jul 14, 2011
  1. Add the ability to pass additional tags with `startstop'.

    It may be necessary to pass additional tags when running tcollector.
    In our case we are monitoring host level OpenStack systems, and
    want to roll up into availability zones and hypervisor type.
    
    ./startstop start -t az=paloalto0 hv=kvm
    
    Change-Id: Id48e4442dd984cd670d4677bbd5f393368501621
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    retr0h committed with tsuna Jul 14, 2011
Commits on Jun 21, 2011
  1. Fix issue #2 using ALIVE flag for graceful termination of threads.

    Change-Id: I12e589820daa6809b122322c87534c2f1d7d5212
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    Nikolay Botev committed with tsuna May 18, 2011
Commits on Jun 17, 2011
  1. Use '/' in tag values for mount points.

    Implements OpenTSDB feature request #14.
    
    Change-Id: I566e45f7af3f38364c313b71d65447b74012e1bc
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    Jari Takkala committed with tsuna Mar 31, 2011
  2. Allow `/' in metric names, tag keys and values.

    Change-Id: Icc995a58c3e1275fb8985c67936c607f2c13247c
    Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
    Jari Takkala committed with tsuna Apr 4, 2011
  3. Evict old keys from the de-dup cache.

    For every combination of (metric, tags), collectors remember what was
    the last value they saw so that they can remove duplicate values (or,
    in the future, perform RLE encoding).  If there are loads of different
    combinations of (metric, tags) changing over time, this can lead to
    excessive memory consumption because old values are never evicted from
    this "de-dup cache".
    
    This change adds a flag, --evict-interval (set to 6000 seconds = 1h40m
    by default), to put an upper bound on how long a collector will remember
    the last value seen for a specific combination of (metric, tags).
    
    This change is based on a contribution of Kai Ren <kair at cs.cmu.edu>.
    
    Change-Id: I5b5b0b64e05b2f1f81b213e14117e029bfd25ba7
    tsuna committed Jun 16, 2011
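
To make the eviction idea concrete, here is a toy de-dup cache. It is a sketch, not the tcollector implementation: the class and method names are invented, and only the 6000-second default is taken from the commit message.

```python
class DedupCache(object):
    """Remember the last value seen per (metric, tags) key, with eviction."""

    def __init__(self, evict_interval=6000):
        self.evict_interval = evict_interval  # seconds, as in --evict-interval
        self._last = {}  # key -> (last value, timestamp when it was seen)

    def is_duplicate(self, key, value, now):
        """Record the value and report whether it repeats the previous one."""
        prev = self._last.get(key)
        self._last[key] = (value, now)
        return prev is not None and prev[0] == value

    def evict_old_keys(self, now):
        """Forget keys not seen within evict_interval; return how many."""
        cutoff = now - self.evict_interval
        stale = [k for k, (_value, ts) in self._last.items() if ts < cutoff]
        for k in stale:
            del self._last[k]
        return len(stale)
```

Calling `evict_old_keys` periodically bounds memory use at the cost of occasionally re-sending a value that would otherwise have been suppressed as a duplicate.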
  4. Simplify a call to pgrep.

    Change-Id: I949bf012e9864c0ded44d7fa06e8f4e22d6c892e
    Reviewed-on: https://review.stumble.net/2242
    Reviewed-by: Dave Barr <barr@stumbleupon.com>
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    tsuna committed with tsuna Jun 15, 2011
  5. Fix restart in case pidfile is stale

    If the pidfile is stale, tcollector won't get restarted
    if the pid got reused.
    
    Don't do simplistic checks of the pid running but just
    always use the pgrep check.
    
    Also removes the blind use of the pidfile in stop().
    
    We should just get rid of the pidfile, but currently
    it's required as part of tcollector.py.  The remaining
    use case is the forcerestart logic, but we could get
    the same functionality by looking at the start time from
    /proc/PID.
    
    Change-Id: I24b3fa0eb10332cac51e4d28a06ae030147b2c28
    Reviewed-on: https://review.stumble.net/2236
    Reviewed-by: Benoit Sigoure <tsuna@stumbleupon.com>
    Tested-by: Benoit Sigoure <tsuna@stumbleupon.com>
    davebarr committed with tsuna Jun 15, 2011