
Manual Pages

Jacek Masiulaniec edited this page Jun 23, 2015 · 40 revisions
  1. collect-netscaler(1)
  2. collect-statse(1)
  3. statse(5)
  4. tsp(5)
  5. tsp-aggregator(8)
  6. tsp-controller(8)
  7. tsp-forwarder(8)
  8. tsp-poller(8)

This document contains snapshots of all man pages released with TSP. Look up unfamiliar terms in the glossary.

Many programs take an -f command-line flag that points to a JSON configuration file. Their man pages dedicate long sections to documenting the details of the configuration structure.

The good news is that most users can avoid learning it. A local JSON config file is almost never necessary. TSP avoids the need for hand-crafting and deploying static JSON files throughout the data center by having all programs request their configuration from the configuration service, tsp-controller. The controller centrally generates these JSON configs based on its own easy-to-use configuration file. This central file is the operator's uniform view into the configuration of the many remote processes that make up TSP.

To summarize, JSON can be thought of as the wire format of configuration data, a low-level detail. Humans only ever edit the controller's config, which is currently XML-based (although it may be replaced with a dedicated syntax eventually).

4.1. collect-netscaler(1)

NAME
       collect-netscaler - collect metrics from NetScaler load balancer

SYNOPSIS
       collect-netscaler [-uv] [-a path] [-i interval] host

DESCRIPTION
       collect-netscaler  polls  the  specified  load balancer host for metric
       data, translates it to OpenTSDB format, and writes the resulting  lines
       to standard output.

       Metrics are obtained from the Nitro API, an HTTP interface exposed
       by NetScaler load balancers from Citrix Systems.  The metrics are
       rooted in the netscaler branch of the metrics hierarchy.

       -a path
               Set path to a file containing the authentication secret.  The
               file contains a single JSON object with the string properties
               Username and Password. The default path is /dev/stdin.

       -i interval
              Set polling interval. Default: 7s.

       -u
              Disable transport security. Use HTTPS but skip certificate veri-
              fication.

       -v
              Print debugging information.

EXAMPLE
       Poll netscaler01.example.com using the given auth credentials:

       #!/bin/bash

       exec collect-netscaler netscaler01.example.com << 'EOF'
       {
               "Username": "opentsp",
               "Password": "13donkeys"
       }
       EOF

SEE ALSO
       tsp-poller(8)

BUGS
       At startup, collect-netscaler begins a learning process aimed  at  dis-
       covering  the  available  VServer-to-service bindings. This process may
       take a few minutes to complete because of performance bugs in the Nitro
       API. While it is in progress, some metrics are unavailable.

       The Nitro API suffers from regular crashes. It recovers
       automatically, but while it is restarting, no data is collected.
       These data gaps can be correlated with i/o timeout errors logged on
       standard error.

       Most NetScaler metrics are updated on a ~7-second schedule on the
       Nitro API side. It is possible for collect-netscaler to make a query
       that results in a response payload that is an exact duplicate of the
       preceding response's payload. Counters carried in such a stale
       response cause visualization artifacts: when they are used to plot
       the rate-per-second of a counter, they cause the plotted line to
       incorrectly drop to Y=0.  This makes it difficult to distinguish
       true drops from artificial ones.  Luckily, some metrics are
       available in two variants: total counter and rate-per-second gauge.
       For these metrics, collect-netscaler emits only the pre-computed
       rate-per-second gauge (which does not lead to visualization
       artifacts), and ignores the total counter.

       Some VServers are deliberately omitted. A VServer must have received
       traffic within the past 24 hours to be considered for collection.
       This restriction protects the OpenTSDB server from creating metrics
       for defunct VServers.

4.2. collect-statse(1)

NAME
       collect-statse - derive time series from a flow of Statse records

SYNOPSIS
       collect-statse [-tv] -f file


DESCRIPTION
       collect-statse implements a Statse pipeline that adapts the incoming
       stream of Statse records into a time series format.  It resolves a
       data format mismatch, making it possible to store event-based metric
       data (e.g. from web logs) in a time series database like OpenTSDB,
       which would otherwise be incompatible with such data.

       -f file
              Set path to the configuration file.

       -t
              Test configuration file, signalling success via exit code.

       -v
              Print debugging information.

       Statse's main use is in application performance monitoring. The
       Statse pipeline summarises the performance of an entire application
       cluster by periodically applying an aggregation function (mainly the
       99th percentile, aka p99) to the response times submitted by the
       servers that make up the cluster.  Such a global, cluster-scoped p99
       is convenient to reason about, and the pipeline outputs it as a time
       series refreshed every few seconds (Aggregator.SnapshotInterval).

       The Statse forwarder listener binds to 127.0.0.1:14444, accepting
       Statse record submissions from local applications. Every record is
       evaluated against a user-defined filter (Forwarder.Filter) and, on
       success, forwarded to the remote aggregator listener
       (Forwarder.AggregatorHost).

       The Statse aggregator listener binds to *:14445, accepting Statse
       records relayed by remote forwarder instances and writing them to a
       temporary in-memory store. Periodically, it calculates p99 based on
       the stored records and writes it to standard output in the OpenTSDB
       format.

       The aggregator depends on the forwarder to modify records to include
       the host tag, which must be set to the fully-qualified domain name
       of the forwarding server.

       The aggregator reserves the error tag for internal use. A record
       that includes this tag will be ignored.
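The aggregation step described above can be sketched in Go. The record flow, the p99 statistic, and the snapshot interval come from this page; the sample type and the nearest-rank percentile method are illustrative assumptions, not the actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// p99 returns the 99th percentile of the observed response times using
// the nearest-rank method (an assumption; the real aggregator may use a
// different estimator).
func p99(times []float64) float64 {
	sorted := append([]float64(nil), times...)
	sort.Float64s(sorted)
	// Nearest-rank: ceil(0.99 * N) as a 1-based index.
	rank := (99*len(sorted) + 99) / 100
	return sorted[rank-1]
}

func main() {
	// Response times (ms) submitted by cluster servers during one
	// snapshot interval.
	times := []float64{3, 5, 4, 120, 6, 5, 4, 3, 7, 5}
	fmt.Printf("cluster p99: %vms\n", p99(times)) // cluster p99: 120ms
}
```

A real aggregator would keep one such sample store per metric/tag combination and flush it every Aggregator.SnapshotInterval.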


CONFIGURATION FORMAT
       The configuration file holds a JSON object that configures the two main
       stages of Statse pipeline: forwarding and aggregation.

       The forwarding-stage settings are:

       Forwarder.AggregatorHost (string)
              Name of the host running collect-statse that is marked  to  per-
              form continuous statistical aggregation. Default: localhost

       Forwarder.Filter (array)
              Filter  for  the  forwarded  records. The filter array format is
              identical to Filter  from  tsp-forwarder(8),  except  the  rules
              operate  on  Statse  records  rather  than data points of a time
              series.

              The filter must set the host tag. It may also  set  the  cluster
              tag to enable computation of cluster-scoped p99.

       The aggregation-stage settings are:

       Aggregator.SnapshotInterval (string)
              The  interval  between  runs  of the aggregation function on the
              stored records.  Default: 10s.


EXAMPLE
       The example below showcases all settings, including a message filter
       that blocks records carrying a tag corresponding to the HTTP path
       (which would otherwise lead to OpenTSDB namespace blowup), adds a
       metric prefix, and adds two standard tags.

       {
            "Aggregator": {
                 "SnapshotInterval": "5s"
            },
            "Forwarder": {
                 "AggregatorHost": "foo1.example.com",
                 "Filter": [
                      {
                           "Match": [
                                "",
                                "path",
                                "^/"
                           ],
                           "Block": true
                      },
                      {
                           "Match": [
                                "(.*)"
                           ],
                           "Set": [
                                "foo.${1}",
                                "host",
                                "foo2.example.com",
                                "cluster",
                                "live.uswest"
                           ]
                      }
                 ]
            }
       }

SEE ALSO
       statse(5)

BUGS
       Independent submitters sharing a forwarder must arrange to use
       unique OpenTSDB namespaces. This is easily achieved by prepending a
       unique identifier, for example the application's name, to the metric
       name of every outgoing record.  Failing that, the records will be
       merged.

4.3. statse(5)

NAME
       statse - a protocol for exporting event-based metric data


INTRODUCTION
       Statse  is  a  companion protocol to the TSDB protocol. It is a propri-
       etary protocol developed at Betfair to fulfill the  following  require-
       ments:

       1.  Introduce  cross-node  (cluster-wide)  calculation of the 99th per-
       centile of service operation times.

       2. Obsolete application-level event buffering,  reducing  memory  pres-
       sure.

       Statse inherits some of its structure and limitations from the TSDB
       protocol, for which it acts as a preprocessor. As an example of such
       a restriction, Statse can handle only a subset of Unicode.

       Statse  is  an event sampling protocol. Periodic sampling (e.g. regular
       collection of operation counters) is out of Statse's scope.  Use  tools
       like  StatsGatherer to collect these counters. However, the Statse lis-
       tener can derive some of these counters based on  the  incoming  Statse
       log.


EVENT FLOW
       Statse introduces the event flow abstraction. Application programs
       use Statse to append entries to topical performance event logs. Log
       entries have a specified structure; see MESSAGE STRUCTURE. Logging
       is network-based: entries are serialised over a ZeroMQ socket; see
       MESSAGE SERIALISATION.

       As of this writing, the event flow is written to an in-memory store
       allocated by a possibly remote aggregator service. The aggregator
       performs on-the-fly statistical analysis, deriving time series as a
       result. The details of this process are out of scope for this
       document because they are not a concern of the Statse sender.


MESSAGE STRUCTURE
       Each message has the following structure:

       type Message struct {
            Header struct {
                 Version   int8
                 Timestamp int64
            }
            Body struct {
                 Type string

                 // Log name
                 Metric string
                 Tags   string

                 // Metric values
                 Error  bool
                 Time   float32
                 TTFB   float32
                 Size   float32
            }
       }

       Each message has the following fields:

       Version
              Protocol version. Always 2.

       Timestamp
              Sample  time in the milliseconds-since-epoch-start format. Exam-
              ple: 1398787844000.

       Type
              Message type. Always ``EVENT''.

       The fields of an EVENT-typed message are:

       Metric
              Name of the metric (topic), analogous to the metric part of a
              TSDB data point. Example: ``to.FooService''.

       Tags
              List of categorical dimensions of the metric, analogous to
              the tags part of a TSDB data point. Example: ``op=GetBar''.

              Some tag names are reserved for Statse's use and must not be
              used by the Statse sender. The only reserved tag name is
              error.

              The minimum tag count is 0. The maximum tag count is 5.

       Error
              Indicates the error status. Example: true.

              The  aggregator  service may omit erroneous events from calcula-
              tion of some statistics.

       Time
              End-to-end duration, conventionally in milliseconds. Example: 3.

       TTFB
              Time  to  first  byte,  conventionally in milliseconds. Example:
              2.9.

       Size
              Related space metric, e.g. HTTP response size, conventionally in
              bytes.  Example: 2951.

       The  fields Error, Time, TTFB, Size are optional. If the Error field is
       absent, it defaults to false.
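Putting the structure and field descriptions together, a message can be populated as follows. The values are the examples given above; note that the zero value of Error matches its documented default of false:

```go
package main

import "fmt"

// Message mirrors the structure defined in MESSAGE STRUCTURE.
type Message struct {
	Header struct {
		Version   int8
		Timestamp int64
	}
	Body struct {
		Type string

		// Log name
		Metric string
		Tags   string

		// Metric values
		Error bool
		Time  float32
		TTFB  float32
		Size  float32
	}
}

func main() {
	var m Message
	m.Header.Version = 2             // always 2
	m.Header.Timestamp = 1398787844000 // ms since epoch
	m.Body.Type = "EVENT"            // always EVENT
	m.Body.Metric = "to.FooService"
	m.Body.Tags = "op=GetBar"
	m.Body.Time = 3 // end-to-end duration, ms
	// m.Body.Error is left at its zero value: false.
	fmt.Printf("%+v\n", m)
}
```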


MESSAGE SERIALISATION
       The Statse sender is a ZeroMQ publisher that connects to a
       subscriber listening at tcp://127.0.0.1:14444.  Each Statse message
       is encoded in a multi-part ZeroMQ message consisting of exactly two
       parts.  The parts are ASCII strings serialised according to the
       following rules.

       Part  1 corresponds to Message.Header. It is serialised by applying the
       one-line template:

       "{{.Version}} {{.Timestamp}}"

       Part 2 corresponds to Message.Body. Its serialisation depends  on  Mes-
       sage.Body.Type.

       An EVENT-typed message is serialised by applying the one-line template:

       "EVENT|{{.Metric}}|{{.Tags}}|err={{.Error}} time={{.Time}}
       ttfb={{.TTFB}} size={{.Size}}"

       The format of floating point values is decimal, without exponent or
       sign, for example ``123.456''. To denote the absence of a value
       (Time, TTFB, or Size), omit the relevant name=value pair.


EXAMPLES
       Allocating the socket:

       socket, err := zmq.DefaultContext().Socket(zmq.Pub)
       if err != nil {
            statClientErrors.Add("type=ZMQSocket", 1)
            return err
       }
        if err := socket.Connect("tcp://127.0.0.1:14444"); err != nil {
            statClientErrors.Add("type=ZMQConnect", 1)
            return err
       }


       Sending a message:

       var (
            header = []byte("2 1234567890000")
            body   = []byte("EVENT|Foo.to.Bar|op=getBar|err=false time=12")
       )
       if err := socket.Send(header, body); err != nil {
            statClientErrors.Add("type=ZMQSend", 1)
       }


SEE ALSO
       collect-statse(1)

4.5. tsp-aggregator(8)

NAME
       tsp-aggregator - aggregate time series


DESCRIPTION
       tsp-aggregator implements the site feed, a time series feed that
       includes all of the site's data points.  It receives individual host
       feeds from the collect-site plugin, and relays them to the
       subscriber relays.

       From  the  implementation point of view, tsp-aggregator behaves exactly
       as tsp-forwarder(8) with the exceptions listed below.

       The -f flag defaults to ``/etc/tsp-aggregator/config''.

       CollectPath defaults to ``/etc/tsp-aggregator/collect.d''.

       LogPath defaults to ``/var/log/tsp/aggregator.log''.


STREAM API
       To create a site feed subscriber, add a Relay that has all fields
       unset except Host.

       The feed has the following qualities:

       Well-formed:  the  data  is  guaranteed  to  be  valid according to the
       opentsdb.net specification.

       Uncompressed: the data appears in plain text without drops due to dedu-
       plication.

       Canonical  format:  no  whitespace  beyond spaces (no tabs). All spaces
       squeezed (no repeats).

       Delivery guarantee: at-most-once.

       Delivery delay: the only deliberate (by-design) delay is that
       introduced by Nagle's algorithm.

       Order  guarantee:  order  preserving;  data  points in each time series
       arrive in strictly monotonic time order (no duplicates).

       Reconnect strategy: on connection error, first reconnect is  immediate.
       Subsequent  reconnects  are  separated by pauses that increase exponen-
       tially starting at 1s, up to a limit of 600s.


SEE ALSO
       tsp-forwarder(8)

4.6. tsp-controller(8)

NAME
       tsp-controller - control the time series pipeline

SYNOPSIS
       tsp-controller [-t] [-f file] [-l addr]


DESCRIPTION
       tsp-controller  is  a service that receives control requests from pipe-
       line components, and handles them according to the  policy  defined  in
       file.

       -f file
              Set  path  to  the  configuration  file.  Default: /etc/tsp-con-
              troller/config

       -l addr
              Set listen addr. Default: :8084

       -t
              Test configuration file, signalling success via exit code.


CONFIGURATION FORMAT
       The configuration file holds an XML document that  configures  tsp-con-
       troller.  The global <config> element includes the elements:

        <filter path=file/>
               Path to a program that generates a custom filtering ruleset.
               Default: /etc/tsp-controller/filter

               The extra rules are prepended to the filter array. If file
               does not exist, no extra rules are added. However,
               tsp-controller(8) always appends the usual rules setting the
               host and cluster tags.

              The  program must output a filter array as specified in tsp-for-
              warder(8) under the Filter section. The full program  invocation
              is:

                     file program host [cluster]

              program  is  the  name of requesting program, for example ``tsp-
              forwarder''.  host is the name of the requesting host.   cluster
              is the name of the enclosing cluster, and may be unset.

       <hostgroup id=id>
              Declare  a  host  group identified by id.  Host groups may nest.
              For example, one might define a hostgroup ``web'' that nests the
              groups ``web.api'' and ``web.cache''.

              An innermost hostgroup includes cluster elements:

               <cluster id=id>
                      Declare a cluster identified by id.  A cluster is like
                      a hostgroup, except that (1) it is an innermost group,
                      and (2) it acts as a directive to the pipeline to
                      assign the value id to the cluster tag of all points
                      originating in the cluster.

                     A cluster element includes host elements:

                     <host id=id>
                            Declare a host identified by id, which is a fully-
                            qualified domain name.

                            A host element may include the elements:

                             <statse aggregator=enabled interval=interval>
                                    If enabled is true, the host is marked
                                    as the statse aggregator for its
                                    cluster. All instances of
                                    collect-statse(1) in the cluster will
                                    forward records to this host for
                                    cluster-wide analysis.  At most one host
                                    in a cluster may be marked as the
                                    aggregator. If no host is marked, each
                                    host will analyse its own data, i.e.
                                    cluster-wide summaries will be
                                    unavailable.

                                    The interval attribute controls the
                                    aggregation snapshot interval. Default:
                                    10s

       <network path=file/>
              Path  to a file declaring the traffic routing topology. Default:
              /etc/tsp-controller/network

               If file does not exist, tsp-controller(8) assumes it runs on
               the OpenTSDB server, and routes all time series traffic to
               that server.

              The file may include the following elements:

              <aggregator host=host/>
                     Address of the host running tsp-aggregator(8)

              <poller host=host/>
                     Address of the host running tsp-poller(8)

              <restrict host=host/>
                     Limit controller's scope. Handle requests only for  hosts
                     matching  the  host  regex.  Multiple  elements are OR-ed
                     together. If no  element  is  specified,  controller  has
                     unlimited scope (handles all requests).

               <subscriber id=id host=host direct=direct dedup=dedup/>
                      Register a traffic subscriber identified by id.  The
                      host setting is the address of the subscriber host.
                      If direct is true, the subscriber will receive
                      multiple connections, one per tsp-forwarder(8)
                      instance.  If false, the subscriber will receive a
                      single combined connection from tsp-aggregator(8).
                      If dedup is true, the received feed will be
                      deduplicated. By default, subscribers receive a
                      combined connection without deduplication.


EXAMPLE
       The following pair of files defines a web site:

       <config>
            <hostgroup id="web">
                 <cluster id="web.live">
                      <host id="web001.example.com"/>
                      <host id="web002.example.com"/>
                 </cluster>
            </hostgroup>
       </config>

       <network>
            <subscriber
             id="tsd"
             host="tsd.example.com"
             direct="true"
             dedup="true"/>
       </network>
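The example above does not mark a statse aggregator. A hypothetical variant of the config file, using the <statse> element described above to mark web001.example.com as the aggregator for its cluster, might look like this:

```xml
<config>
     <hostgroup id="web">
          <cluster id="web.live">
               <host id="web001.example.com">
                    <statse aggregator="true" interval="5s"/>
               </host>
               <host id="web002.example.com"/>
          </cluster>
     </hostgroup>
</config>
```

With this config, the collect-statse instance on web002 forwards its records to web001 for cluster-wide analysis.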

4.7. tsp-forwarder(8)

NAME
       tsp-forwarder - forward time series

SYNOPSIS
       tsp-forwarder [-tv] [-f file]


DESCRIPTION
       tsp-forwarder  is  an  agent that collects data points from plugin pro-
       grams, and writes them to a time series relay.

       tsp-forwarder rereads the configuration file when it receives a  hangup
       signal,  SIGHUP,  by  executing itself with the name and options it was
       started with.

       -f file
              Set path to the configuration file. Default: /etc/tsp/config

       -t
              Test configuration file, signalling success via exit code.

       -v
              Print debugging information.


CONFIGURATION FORMAT
       The configuration file holds a JSON  object  that  configures  tsp-for-
       warder.  The settings are:

       CollectPath (string)
              Path   to  directory  containing  collection  plugins.  Default:
              /etc/tsp/collect.d

               A collection plugin is an executable program that creates
               data points by writing OpenTSDB-formatted lines to standard
               output.  tsp-forwarder monitors the directory for file
               additions and deletions, and starts or kills processes in
               response. If a process crashes, it is restarted; if it exits
               with code 13, the restart is delayed by 1 hour. The process
               must exit after encountering an error writing to standard
               output.

       Filter (array)
              Ruleset evaluated for every data point prior to a  send  to  the
              relay.

              The rule settings are:

              Match (array)
                     Match is a list of conditions that must hold for the rule
                     actions to be executed. If a condition is false, the rule
                     is  ignored and the evaluation proceeds to the next rule.
                     If Match is unset, the rule actions are executed uncondi-
                     tionally.

                      The head array element (index 0) is a regex matching
                      the dot-delimited metric name. The remainder of the
                      array (indices 1+) is an optional tags match list in
                      the form of key, regex pairs. If a tag with the given
                      key is absent, the regex is passed an empty string.

                     At  most  one  regex  may  include  submatch  syntax (the
                     unescaped parentheses).

              Set (array)
                     An action that mutates the forwarded data  point.  Struc-
                     turally, Set is like Match except it (1) reassigns metric
                     name or tags, or assigns new tags, (2) expands regex sub-
                     matches  by substituting the ${N} symbol with N'th match.

                     Assigning an empty value is a no-op.

              Block (boolean)
                     An action that causes  the  data  point  to  be  ignored.
                     Default: false

               All regular expressions support extended regex features, for
               example the + operator.

              The default filter is:

              [
                   {
                        "Block": true
                   }
              ]

       LogPath (string)
              Path to the log file. Default: /var/log/tsp/forwarder.log

       Relay (object)
              Relay definitions. The object key gives the  relay  an  internal
              name for use in log entries. The object value holds the settings
              object specified below.

              Each defined relay is a mirror: it is sent every data point.  If
              it  cannot keep up with the throughput, its points are queued in
              memory up to a limit (100K points). Once the  queue  fills,  new
              points are dropped until the throughput problem is resolved.

              The settings are:

              DropRepeats (bool)
                     Enable  deduplication,  a  space-conserving optimisation.
                     Drops data points that don't contribute  to  line  plots,
                     i.e. those contained by existing line segments.  Default:
                     false

              Host (string)
                     Server address in host:port format. If port is  not  pro-
                     vided,  it  defaults  to  4242.  The protocol used is the
                     OpenTSDB line-based telnet protocol. For load  balancing,
                     multiple  servers  may be defined using comma to separate
                     server addresses. The traffic will be sent to all  listed
                     servers,  partitioned using a hash of time series identi-
                     fier.

              MaxConnsPerHost (string)
                     Limit number of client connections  established  to  each
                     host address in the Host list.  Default: 1

              OnQueueFull (string)
                     Error  handling  for  the  queue  full condition. One of:
                     Drop, DropAndLog.  Drop causes irrecoverable  data  loss.
                     DropAndLog  is  like  Drop  except it will attempt to log
                     lost points to LogPath.  Default: Drop
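The load-balancing behaviour of a comma-separated Host list can be sketched as follows. The FNV hash and the exact series-identifier format are assumptions; this man page does not specify the hash function:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// pickServer selects the relay server for a data point by hashing its
// time series identifier (here metric plus tags; the real identifier
// format is an assumption) over the comma-separated Host list, so all
// points of one series go to the same server.
func pickServer(hostSetting, seriesID string) string {
	servers := strings.Split(hostSetting, ",")
	h := fnv.New32a()
	h.Write([]byte(seriesID))
	return servers[h.Sum32()%uint32(len(servers))]
}

func main() {
	hosts := "tsd1.example.com:4242,tsd2.example.com:4242"
	// The same series always maps to the same server.
	fmt.Println(pickServer(hosts, "sys.cpu.user host=web001"))
	fmt.Println(pickServer(hosts, "sys.cpu.user host=web001"))
}
```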


EXAMPLE
       Forward data points to a single relay:

       {
            "Filter": [
                 {
                      "Match": [
                           "",
                           "host",
                           "^$"
                      ],
                      "Set": [
                           "",
                           "host",
                           "server101.example.com"
                      ]
                 }
            ],
            "Relay": {
                 "tsd": {
                      "Host": "tsd.example.com",
                      "DropRepeats": true
                 }
            }
       }

4.8. tsp-poller(8)

NAME
       tsp-poller - extract time series from remote hosts


DESCRIPTION
       tsp-poller is a service that collects data points from plugin
       programs that produce time series based on remote data: SNMP,
       non-SNMP proprietary APIs, Internet services, and so on. It
       complements tsp-forwarder(8), which uses only local data.

       From the implementation point of view, tsp-poller  behaves  exactly  as
       tsp-forwarder(8) with the exceptions listed below.

       The -f flag defaults to ``/etc/tsp-poller/config''.

       CollectPath defaults to ``/etc/tsp-poller/collect.d''.

       LogPath defaults to ``/var/log/tsp/poller.log''.


SEE ALSO
       tsp-forwarder(8)

Return to Documentation.
