Manual Pages
- collect-netscaler(1)
- collect-statse(1)
- statse(5)
- tsp(5)
- tsp-aggregator(8)
- tsp-controller(8)
- tsp-forwarder(8)
- tsp-poller(8)
This document contains snapshots of all man pages released with TSP. Look up unfamiliar terms in the glossary.
Many programs take an -f command-line flag that points to a JSON configuration file. Their man pages dedicate long sections to documenting the details of the configuration structure.
The good news is that most users can avoid learning it. A local JSON config file is almost never necessary. TSP avoids the need for hand-crafting and deploying static JSON files throughout the data center by having all programs request their configuration from the configuration service, tsp-controller. The controller centrally generates these JSON configs based on its own easy-to-use configuration file. This central file is the operator's uniform view into the configuration of the many remote processes that make up TSP.
To summarize, JSON can be thought of as the wire format of configuration data, a low-level detail. Humans only ever edit the controller's config, which is currently XML-based (although it may eventually be replaced with a dedicated syntax).
NAME
collect-netscaler - collect metrics from NetScaler load balancer
SYNOPSIS
collect-netscaler [-uv] [-a path] [-i interval] host
DESCRIPTION
collect-netscaler polls the specified load balancer host for metric
data, translates it to OpenTSDB format, and writes the resulting lines
to standard output.
Metrics are obtained from the Nitro API, which is an HTTP interface
exposed by NetScaler load balancers from Citrix Systems. The metrics
are rooted in the netscaler branch of the metrics hierarchy.
-a path
Set path to a file containing the authentication secret. The
file contains a single JSON object with the string properties
Username and Password. The default path is /dev/stdin.
-i interval
Set polling interval. Default: 7s.
-u
Disable transport security. Use HTTPS but skip certificate veri-
fication.
-v
Print debugging information.
EXAMPLE
Poll netscaler01.example.com using the given auth credentials:
#!/bin/bash
exec collect-netscaler netscaler01.example.com << 'EOF'
{
"Username": "opentsp",
"Password": "13donkeys"
}
EOF
SEE ALSO
tsp-poller(8)
BUGS
At startup, collect-netscaler begins a learning process aimed at dis-
covering the available VServer-to-service bindings. This process may
take a few minutes to complete because of performance bugs in the Nitro
API. While it is in progress, some metrics are unavailable.
The Nitro API suffers from regular crashes. It recovers automatically,
but while it is restarting, no data is collected. These data gaps can
be correlated to i/o timeout errors logged on standard error.
Most NetScaler metrics are updated on a ~7-second schedule on the Nitro
API side. It is possible for collect-netscaler to make a query that
results in a response payload that is an exact duplicate of the preced-
ing response's payload. Counters carried in such a stale response cause
visualization artifacts: when they are used to plot rate-per-second of
a counter, they cause the plotted line to incorrectly drop to Y=0.
This makes it difficult to distinguish true drops from the artificial
ones. Luckily, some metrics are available in two variants: total
counter and rate-per-second gauge. For these metrics, collect-netscaler
emits only the pre-computed rate-per-second gauge (which does not lead
to visualization artifacts), and ignores the total counter.
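The artifact described above follows directly from the rate arithmetic: a stale response repeats the previous counter value, so the derived per-second rate is zero. A minimal sketch:

```go
package main

import "fmt"

// ratePerSecond computes the plotted rate between two successive
// counter samples taken dt seconds apart.
func ratePerSecond(prev, cur, dt float64) float64 {
	return (cur - prev) / dt
}

func main() {
	// A fresh sample advances the counter; a stale duplicate does not,
	// making the plotted line incorrectly drop to Y=0.
	fmt.Println(ratePerSecond(100, 150, 10)) // normal sample: 5
	fmt.Println(ratePerSecond(150, 150, 10)) // stale duplicate: 0
}
```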
Some VServers are deliberately omitted. A VServer must receive traffic
within the past 24 hours in order to be considered for collection. This
restriction protects the OpenTSDB server from creating metrics for defunct
VServers.
NAME
collect-statse - derive time series from a flow of Statse records
SYNOPSIS
collect-statse [-tv] -f file
DESCRIPTION
collect-statse implements a Statse pipeline that adapts the incoming
stream of Statse records into a time series format. It resolves a data
format mismatch, making it possible to store event-based metric data
(e.g. from web logs) in a time series database like OpenTSDB, which
would normally be incompatible with event-based data.
-f file
Set path to the configuration file.
-t
Test configuration file, signalling success via exit code.
-v
Print debugging information.
Statse's main use is in application performance monitoring. The Statse
pipeline summarises the performance of an entire application cluster by
periodically applying an aggregation function (mainly the 99th percentile,
aka p99) to response times submitted by the servers that make up the
cluster. Such a global, cluster-scoped p99 is convenient to reason about,
and the pipeline outputs it as a time series refreshed every few seconds
(Aggregator.SnapshotInterval).
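The aggregation step can be illustrated with a nearest-rank p99. The man page does not specify which percentile estimator collect-statse uses, so this particular definition is an assumption of the sketch:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p99 returns the 99th percentile of the samples using the
// nearest-rank method: sort, then take the value at rank
// ceil(0.99 * n).
func p99(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(0.99*float64(len(s)))) - 1
	return s[rank]
}

func main() {
	// Response times 1..100 ms collected from the whole cluster.
	var times []float64
	for i := 1; i <= 100; i++ {
		times = append(times, float64(i))
	}
	fmt.Println(p99(times)) // 99
}
```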
The Statse forwarder listener binds to 127.0.0.1:14444, accepting Statse
record submissions from local applications. Every record is evaluated
against a user-defined filter (Forwarder.Filter) and, on success, for-
warded to the remote aggregator listener (Forwarder.AggregatorHost).
The Statse aggregator listener binds to *:14445, accepting Statse records
relayed by remote forwarder instances and writing these records to a
temporary memory store. Periodically, it calculates p99 based on the
stored records and writes it to standard output in the OpenTSDB for-
mat.
The aggregator depends on the forwarder to modify the records to include
the host tag, which must be set to the fully-qualified domain name of the
forwarding server.
The aggregator reserves the error tag for internal use. A record that
includes this tag will be ignored.
CONFIGURATION FORMAT
The configuration file holds a JSON object that configures the two main
stages of Statse pipeline: forwarding and aggregation.
The forwarding-stage settings are:
Forwarder.AggregatorHost (string)
Name of the host running collect-statse that is marked to per-
form continuous statistical aggregation. Default: localhost
Forwarder.Filter (array)
Filter for the forwarded records. The filter array format is
identical to Filter from tsp-forwarder(8), except the rules
operate on Statse records rather than data points of a time
series.
The filter must set the host tag. It may also set the cluster
tag to enable computation of cluster-scoped p99.
The aggregation-stage settings are:
Aggregator.SnapshotInterval (string)
The interval between runs of the aggregation function on the
stored records. Default: 10s.
EXAMPLE
The example below showcases all settings, including a message filter
that adds two standard tags, adds a metric prefix, and blocks records
that include a tag corresponding to the HTTP path, which would otherwise
lead to OpenTSDB namespace blowup.
{
"Aggregator": {
"SnapshotInterval": "5s"
},
"Forwarder": {
"AggregatorHost": "foo1.example.com",
"Filter": [
{
"Match": [
"",
"path",
"^/"
],
"Block": true
},
{
"Match": [
"(.*)"
],
"Set": [
"foo.${1}",
"host",
"foo2.example.com",
"cluster",
"live.uswest"
]
}
]
}
}
SEE ALSO
statse(5)
BUGS
Independent submitters sharing a forwarder must arrange to use unique
OpenTSDB namespaces. This is easily achieved by prepending a unique
identifier (for example, the application's name) to the metric name of
every outgoing record. Failing that, the records will be merged.
NAME
statse - a protocol for exporting event-based metric data
INTRODUCTION
Statse is a companion protocol to the TSDB protocol. It is a propri-
etary protocol developed at Betfair to fulfill the following require-
ments:
1. Introduce cross-node (cluster-wide) calculation of the 99th per-
centile of service operation times.
2. Obsolete application-level event buffering, reducing memory pres-
sure.
Statse inherits some of its structure and limitations from the TSDB
protocol, for which it acts as a preprocessor. As an example of such a
restriction, Statse can handle only a subset of Unicode.
Statse is an event sampling protocol. Periodic sampling (e.g. regular
collection of operation counters) is out of Statse's scope. Use tools
like StatsGatherer to collect these counters. However, the Statse lis-
tener can derive some of these counters based on the incoming Statse
log.
EVENT FLOW
Statse introduces the event flow abstraction. Application programs use
Statse to append entries to topical performance event logs. Log entries
have specified structure, see MESSAGE STRUCTURE. Logging is network-
based: entries are serialised over ZeroMQ socket, see MESSAGE SERIALI-
SATION.
As of this writing, the event flow is written to an in-memory store
allocated by a possibly remote aggregator service. The aggregator performs
on-the-fly statistical analysis, deriving time series as a result. The
details of this process are out of scope for this document because they
are not a concern of the Statse sender.
MESSAGE STRUCTURE
Each message has the following structure:
type Message struct {
Header struct {
Version int8
Timestamp int64
}
Body struct {
Type string
// Log name
Metric string
Tags string
// Metric values
Error bool
Time float32
TTFB float32
Size float32
}
}
Each message has the following fields:
Version
Protocol version. Always 2.
Timestamp
Sample time in the milliseconds-since-epoch-start format. Exam-
ple: 1398787844000.
Type
Message type. Always ``EVENT''.
The fields of an EVENT-typed message are:
Metric
Name of the metric (topic), analogous to the metric part of TSDB
data point. Example: ``to.FooService''.
Tags
List of categorical dimensions of the metric, analogous to the
tags part of TSDB data point. Example: ``op=GetBar''.
Some tag names are reserved for Statse's use and must not be
used by the Statse sender. The only reserved tag name is error.
The minimum tag count is 0. The maximum tag count is 5.
Error
Indicates the error status. Example: true.
The aggregator service may omit erroneous events from calcula-
tion of some statistics.
Time
End-to-end duration, conventionally in milliseconds. Example: 3.
TTFB
Time to first byte, conventionally in milliseconds. Example:
2.9.
Size
Related space metric, e.g. HTTP response size, conventionally in
bytes. Example: 2951.
The fields Error, Time, TTFB, Size are optional. If the Error field is
absent, it defaults to false.
MESSAGE SERIALISATION
The Statse sender is a ZeroMQ publisher that connects to a subscriber lis-
tening at tcp://127.0.0.1:14444. Each Statse message is encoded in a
multi-part ZeroMQ message consisting of exactly two parts. The parts
are ASCII strings serialised according to the following rules.
Part 1 corresponds to Message.Header. It is serialised by applying the
one-line template:
"{{.Version}} {{.Timestamp}}"
Part 2 corresponds to Message.Body. Its serialisation depends on Mes-
sage.Body.Type.
An EVENT-typed message is serialised by applying the one-line template:
"EVENT|{{.Metric}}|{{.Tags}}|err={{.Error}} time={{.Time}}
ttfb={{.TTFB}} size={{.Size}}"
The format of floating point values is decimal, without exponent or
sign, for example ``123.456''. To denote absence of a value (Time, TTFB,
or Size), omit the relevant name=value pair.
EXAMPLES
Allocating the socket:
socket, err := zmq.DefaultContext().Socket(zmq.Pub)
if err != nil {
statClientErrors.Add("type=ZMQSocket", 1)
return err
}
if err := socket.Connect("tcp://127.0.0.1:14444"); err != nil {
statClientErrors.Add("type=ZMQConnect", 1)
return err
}
Sending a message:
var (
header = []byte("2 1234567890000")
body = []byte("EVENT|Foo.to.Bar|op=getBar|err=false time=12")
)
if err := socket.Send(header, body); err != nil {
statClientErrors.Add("type=ZMQSend", 1)
}
SEE ALSO
collect-statse(1)
NAME
tsp-aggregator - aggregate time series
DESCRIPTION
tsp-aggregator implements the site feed, a time series feed that includes
all the site's data points. It receives individual host feeds from the
collect-site plugin, and relays them to the subscriber relays.
From the implementation point of view, tsp-aggregator behaves exactly
as tsp-forwarder(8) with the exceptions listed below.
The -f flag defaults to ``/etc/tsp-aggregator/config''.
CollectPath defaults to ``/etc/tsp-aggregator/collect.d''.
LogPath defaults to ``/var/log/tsp/aggregator.log''.
STREAM API
To create a site feed subscriber, add a Relay that has all fields unset
except Host.
The feed has the following qualities:
Well-formed: the data is guaranteed to be valid according to the
opentsdb.net specification.
Uncompressed: the data appears in plain text without drops due to dedu-
plication.
Canonical format: no whitespace beyond spaces (no tabs). All spaces
squeezed (no repeats).
Delivery guarantee: at-most-once.
Delivery delay: the only deliberate delay is that introduced by
Nagle's algorithm.
Order guarantee: order preserving; data points in each time series
arrive in strictly monotonic time order (no duplicates).
Reconnect strategy: on connection error, first reconnect is immediate.
Subsequent reconnects are separated by pauses that increase exponen-
tially starting at 1s, up to a limit of 600s.
SEE ALSO
tsp-forwarder(8)
NAME
tsp-controller - control the time series pipeline
SYNOPSIS
tsp-controller [-t] [-f file] [-l addr]
DESCRIPTION
tsp-controller is a service that receives control requests from pipe-
line components, and handles them according to the policy defined in
file.
-f file
Set path to the configuration file. Default: /etc/tsp-con-
troller/config
-l addr
Set listen addr. Default: :8084
-t
Test configuration file, signalling success via exit code.
CONFIGURATION FORMAT
The configuration file holds an XML document that configures tsp-con-
troller. The global <config> element includes the elements:
<filter path=file/>
Path to a program that generates a custom filtering ruleset.
Default: /etc/tsp-controller/filter
The extra rules are prepended to the filter array. If file does
not exist, no extra rules are added. However, tsp-controller(8)
always appends the usual rules setting the host and cluster
tags.
The program must output a filter array as specified in tsp-for-
warder(8) under the Filter section. The full program invocation
is:
file program host [cluster]
program is the name of the requesting program, for example ``tsp-
forwarder''. host is the name of the requesting host. cluster
is the name of the enclosing cluster, and may be unset.
<hostgroup id=id>
Declare a host group identified by id. Host groups may nest.
For example, one might define a hostgroup ``web'' that nests the
groups ``web.api'' and ``web.cache''.
An innermost hostgroup includes cluster elements:
<cluster id=id>
Declare a cluster identified by id. A cluster is like a
hostgroup, except (1) it is an innermost group, and (2) it
acts as a directive to the pipeline to assign the value id to
the cluster tag of all points originated in the cluster.
A cluster element includes host elements:
<host id=id>
Declare a host identified by id, which is a fully-
qualified domain name.
A host element may include the elements:
<statse aggregator=enabled interval=interval>
If enabled is true, the host is marked as
statse aggregator for its cluster. All
instances of collect-statse(1) in the clus-
ter will forward records to this host for
cluster-wide analysis. At most one host in
a cluster may be marked as aggregator. In
case no host is marked, each host will ana-
lyse its own data, i.e. cluster-wide sum-
maries will be unavailable.
The interval attribute controls the aggregation
snapshot interval. Default: 10s
<network path=file/>
Path to a file declaring the traffic routing topology. Default:
/etc/tsp-controller/network
If file does not exist, tsp-controller(8) assumes it runs on
the OpenTSDB server, and routes all time series traffic to that
server.
The file may include the following elements:
<aggregator host=host/>
Address of the host running tsp-aggregator(8)
<poller host=host/>
Address of the host running tsp-poller(8)
<restrict host=host/>
Limit controller's scope. Handle requests only for hosts
matching the host regex. Multiple elements are OR-ed
together. If no element is specified, controller has
unlimited scope (handles all requests).
<subscriber id=id host=host direct=direct dedup=dedup/>
Register a traffic subscriber identified by id. The host
setting is the address of the subscriber host. If direct is
true, the subscriber will receive multiple connections,
one per tsp-forwarder(8) instance. If false, the sub-
scriber will receive a single combined connection from tsp-
aggregator(8). If dedup is true, the received feed will
be deduplicated. By default, subscribers receive a combined
connection without deduplication.
EXAMPLE
The following pair of files defines a web site:
<config>
<hostgroup id="web">
<cluster id="web.live">
<host id="web001.example.com"/>
<host id="web002.example.com"/>
</cluster>
</hostgroup>
</config>
<network>
<subscriber
id="tsd"
host="tsd.example.com"
direct="true"
dedup="true"/>
</network>
NAME
tsp-forwarder - forward time series
SYNOPSIS
tsp-forwarder [-tv] [-f file]
DESCRIPTION
tsp-forwarder is an agent that collects data points from plugin pro-
grams, and writes them to a time series relay.
tsp-forwarder rereads the configuration file when it receives a hangup
signal, SIGHUP, by executing itself with the name and options it was
started with.
-f file
Set path to the configuration file. Default: /etc/tsp/config
-t
Test configuration file, signalling success via exit code.
-v
Print debugging information.
CONFIGURATION FORMAT
The configuration file holds a JSON object that configures tsp-for-
warder. The settings are:
CollectPath (string)
Path to directory containing collection plugins. Default:
/etc/tsp/collect.d
A collection plugin is an executable program that creates data
points by writing OpenTSDB-formatted lines to standard output.
tsp-forwarder monitors the directory for file additions and dele-
tions, and starts or kills processes in response. A crashed
process is restarted; if exit code 13 was returned, the restart
is delayed by 1 hour. The process must exit after encountering
an error writing to standard output.
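The plugin contract above can be sketched as a minimal collection plugin. The metric name, tag, and interval are illustrative; the line shape follows the OpenTSDB data point format (metric, timestamp, value, tags):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// emit writes one data point in the OpenTSDB line format to standard
// output and reports any write error.
func emit(metric string, value float64) error {
	_, err := fmt.Printf("%s %d %g host=example\n", metric, time.Now().Unix(), value)
	return err
}

func main() {
	for i := 0; i < 3; i++ {
		if err := emit("example.heartbeat", 1); err != nil {
			// Per the contract, exit on a standard output write error
			// so tsp-forwarder can restart the plugin.
			os.Exit(1)
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```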
Filter (array)
Ruleset evaluated for every data point prior to a send to the
relay.
The rule settings are:
Match (array)
Match is a list of conditions that must hold for the rule
actions to be executed. If a condition is false, the rule
is ignored and the evaluation proceeds to the next rule.
If Match is unset, the rule actions are executed uncondi-
tionally.
The head array element (index 0) is a regex matching the
dot-delimited metric name. The remainder of the array
(indices 1+) is an optional tags match list in the form
of key, regex pairs. If a tag with the given key is
absent, the regex is passed an empty string.
At most one regex may include submatch syntax (the
unescaped parentheses).
Set (array)
An action that mutates the forwarded data point. Struc-
turally, Set is like Match except it (1) reassigns metric
name or tags, or assigns new tags, (2) expands regex sub-
matches by substituting the ${N} symbol with N'th match.
Assigning an empty value is a no-op.
Block (boolean)
An action that causes the data point to be ignored.
Default: false
All regular expressions support extended regular expression fea-
tures, for example the + operator.
The default filter is:
[
{
"Block": true
}
]
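The Match/Set/Block semantics can be sketched for the metric-name element alone; tag handling and multi-rule evaluation are omitted for brevity:

```go
package main

import (
	"fmt"
	"regexp"
)

// applyRule evaluates one filter rule against a metric name. If the
// Match regex (the head array element) does not match, the rule is
// ignored. A Block rule drops the point. Otherwise the Set head
// element is applied, expanding ${N} regex submatches.
func applyRule(matchRe, setMetric string, block bool, metric string) (out string, dropped bool) {
	re := regexp.MustCompile(matchRe)
	m := re.FindStringSubmatchIndex(metric)
	if m == nil {
		return metric, false // condition false: evaluation moves on
	}
	if block {
		return "", true // point is ignored
	}
	if setMetric != "" { // assigning an empty value is a no-op
		metric = string(re.ExpandString(nil, setMetric, metric, m))
	}
	return metric, false
}

func main() {
	out, dropped := applyRule("(.*)", "foo.${1}", false, "requests.count")
	fmt.Println(out, dropped) // foo.requests.count false
}
```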
LogPath (string)
Path to the log file. Default: /var/log/tsp/forwarder.log
Relay (object)
Relay definitions. The object key gives the relay an internal
name for use in log entries. The object value holds the settings
object specified below.
Each defined relay is a mirror: it is sent every data point. If
it cannot keep up with the throughput, its points are queued in
memory up to a limit (100K points). Once the queue fills, new
points are dropped until the throughput problem is resolved.
The settings are:
DropRepeats (bool)
Enable deduplication, a space-conserving optimisation.
Drops data points that don't contribute to line plots,
i.e. those contained by existing line segments. Default:
false
Host (string)
Server address in host:port format. If port is not pro-
vided, it defaults to 4242. The protocol used is the
OpenTSDB line-based telnet protocol. For load balancing,
multiple servers may be defined, using commas to separate
server addresses. The traffic will be sent to all listed
servers, partitioned using a hash of time series identi-
fier.
MaxConnsPerHost (string)
Limit the number of client connections established to each
host address in the Host list. Default: 1
OnQueueFull (string)
Error handling for the queue full condition. One of:
Drop, DropAndLog. Drop causes irrecoverable data loss.
DropAndLog is like Drop except it will attempt to log
lost points to LogPath. Default: Drop
EXAMPLE
Forward data points to a single relay:
{
"Filter": [
{
"Match": [
"",
"host",
"^$"
],
"Set": [
"",
"host",
"server101.example.com"
]
}
],
"Relay": {
"tsd": {
"Host": "tsd.example.com",
"DropRepeats": true
}
}
}
NAME
tsp-poller - extract time series from remote hosts
DESCRIPTION
tsp-poller is a service that collects data points from plugin programs
that produce time series based on remote data: SNMP, non-SNMP propri-
etary APIs, Internet services, and so on. It complements tsp-for-
warder(8), which uses only local data.
From the implementation point of view, tsp-poller behaves exactly as
tsp-forwarder(8) with the exceptions listed below.
The -f flag defaults to ``/etc/tsp-poller/config''.
CollectPath defaults to ``/etc/tsp-poller/collect.d''.
LogPath defaults to ``/var/log/tsp/poller.log''.
SEE ALSO
tsp-forwarder(8)
⌇ opentsp.org - Time Series Pipeline