This repository has been archived by the owner on Feb 17, 2022. It is now read-only.

Overview

kevinballard edited this page Jun 4, 2012 · 3 revisions

Taba is a system that collects a large number of small events and centrally tracks them in persistent state objects. These state objects can be aggregated and exposed in a number of ways. An overview of Taba's architecture can be found at: http://tellapart.com/taba-low-latency-event-aggragation

Basic terminology:

  • Name: The identifier for a specific Tab. (e.g. "inbound_request_count")

  • Type: Every Tab has a Type that determines the behavior of the various components in the Tab's lifecycle.

  • Event: A distinct point of data posted to a Tab. It includes the Name and Type, a timestamp, and an arbitrary value to be posted (whose format depends on the Type).

  • State: The persistent state of the Tab. The format of a State object depends on the Type, and States are frequently binary.

  • Projection: A conversion of a State into a dictionary of simple values.

  • Aggregate: The combination of many Projections.

  • Render: Conversion of a Projection or Aggregate into a human-consumable form.

  • Client: A process which generates Events.

  • Agent: A process which accepts Events from several Clients and forwards them to a Server.

  • Server: A process which accepts Events, folds them into States, and responds to requests for Projections, Aggregates, or Renders.

Events are recorded by the various Clients and pushed to the Server (usually through an Agent). At the Server, Events are folded into State objects, which are persisted. One State object is maintained per (Client, Name) pair. States can then be converted into Projections, and Projections can be combined into Aggregates. Projections and Aggregates are dictionaries of simple values and, for any given Type, share the same schema. Projections and Aggregates can be Rendered into human-readable form or consumed as JSON data.
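
The flow above can be sketched in a few lines of Python. This is a minimal, hypothetical model of a sum-and-count Type in the spirit of TotalsCounter. The functions `fold`, `project`, `aggregate`, and `post_event` are illustrative names, not Taba's API. Each Event is folded into the State for its (Client, Name) pair. Each State is then projected into a dictionary, and the Projections for one Name are combined into an Aggregate with the same schema.

```python
from collections import defaultdict

# Hypothetical sketch of a sum-and-count Type; not Taba's actual classes or API.


def fold(state, value):
    """Fold one Event value into a (sum, count) State."""
    total, count = state
    return (total + value, count + 1)


def project(state):
    """Convert a State into a Projection: a dictionary of simple values."""
    total, count = state
    return {"total": total, "count": count}


def aggregate(projections):
    """Combine Projections of the same Type (same schema) into one Aggregate."""
    result = {"total": 0.0, "count": 0}
    for p in projections:
        result["total"] += p["total"]
        result["count"] += p["count"]
    return result


# One State per (Client, Name) pair, starting empty.
states = defaultdict(lambda: (0.0, 0))


def post_event(client, name, value):
    states[(client, name)] = fold(states[(client, name)], value)


post_event("client.a", "inbound_request_count", 1.0)
post_event("client.a", "inbound_request_count", 1.0)
post_event("client.b", "inbound_request_count", 3.0)

projections = {key: project(s) for key, s in states.items()}
agg = aggregate(p for (client, name), p in projections.items()
                if name == "inbound_request_count")
print(agg)  # {'total': 5.0, 'count': 3}
```

Because every Projection of a Type shares one schema, the Aggregate can be a simple field-by-field combination.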

Types

There are several Types available in the default Taba deployment.

  • TotalsCounter: Produces a persistent sum and count of all posted Events. Input value is a single floating-point number.
  • CounterGroup: Produces counts and totals over 4 sliding windows (the last 1 minute, 10 minutes, 1 hour, and 1 day), and persistent reservoir-sampled percentiles of the values (at 25%, 50%, 75%, 90%, 95%, and 99%). Input value is a single floating-point number. This is the default Type.
  • CommonPrefixCounterGroup: Produces a group of CounterGroups, each with a different Name prefix. Input value is a 2-tuple of (prefix, floating-point number).
  • String: Tracks a string value, keeping the latest value for each Client, and the Aggregate counts each unique value across Clients. Input value is an arbitrary string.
  • ExpiryString: Like String, but also includes an expiry time, at which the value gets removed from the Aggregate. Input value is a 2-tuple of (string value, expiry timestamp in UTC seconds).
  • Buffer: Produces a list of arbitrary string values, each with a specified expiry. Input value is a 2-tuple of (string value, expiry timestamp in UTC seconds).
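
The percentile half of CounterGroup relies on reservoir sampling. The sketch below uses the classic Algorithm R to keep a fixed-size uniform sample of every value posted, then reads percentiles from that sample. Several details are assumptions for illustration: the class name `ReservoirPercentiles`, the reservoir size of 1000, and the nearest-rank percentile rule. The source does not say which parameters Taba uses, and the sliding-window counters are omitted here.

```python
import math
import random

PERCENTILES = (0.25, 0.50, 0.75, 0.90, 0.95, 0.99)


class ReservoirPercentiles:
    """Hypothetical sketch: uniform reservoir sample (Algorithm R) plus percentile readout."""

    def __init__(self, size=1000, rng=None):
        self.size = size          # assumed reservoir capacity
        self.samples = []
        self.seen = 0             # total number of values posted so far
        self.rng = rng or random.Random()

    def add(self, value):
        self.seen += 1
        if len(self.samples) < self.size:
            self.samples.append(value)
        else:
            # Keep the new value with probability size / seen, evicting a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.samples[j] = value

    def percentiles(self, ps=PERCENTILES):
        """Nearest-rank percentiles over the current sample."""
        if not self.samples:
            return {}
        ordered = sorted(self.samples)
        n = len(ordered)
        return {p: ordered[max(0, math.ceil(p * n) - 1)] for p in ps}


r = ReservoirPercentiles(size=1000, rng=random.Random(42))
for v in range(1, 101):
    r.add(float(v))
print(r.percentiles()[0.50])  # 50.0
```

The sample stays bounded no matter how many Events arrive, which is what allows the percentile State to be kept persistently.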

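String and ExpiryString keep only the latest value per Client, and their Aggregate counts each unique value. Below is a minimal sketch of ExpiryString's behavior. It assumes a plain dictionary as the State store, the hypothetical helpers `post` and `aggregate`, and Events that arrive in order, so the last one posted is the latest.

```python
import time
from collections import Counter

# Hypothetical sketch of ExpiryString semantics; names and data shapes are assumptions.
latest = {}  # client_id -> (value, expiry timestamp in UTC seconds)


def post(client_id, value, expiry):
    """Each new Event replaces that Client's previous value (latest wins)."""
    latest[client_id] = (value, expiry)


def aggregate(now=None):
    """Count each unique value across Clients, skipping values that have expired."""
    now = time.time() if now is None else now
    return Counter(value for value, expiry in latest.values() if expiry > now)


post("client.a", "v1.2", 2000)
post("client.b", "v1.2", 2000)
post("client.c", "v1.1", 1000)
post("client.a", "v1.3", 2000)  # replaces client.a's earlier "v1.2"
counts = aggregate(now=1500)    # "v1.1" has expired; "v1.2" and "v1.3" count once each
```

A plain String behaves the same way without the expiry check, so every Client's latest value is counted.
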
Accessing Data

The Server exposes a REST-style interface for accessing data. For example, to access the Rendered Aggregate for a Tab on a specific Client, you could run: curl 'localhost:8370/taba?client=client.id&taba=ad_decision_ad_chosen_2'
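
The same request can be made from Python using only the standard library. This is a minimal sketch: `taba_url` and `fetch` are hypothetical helper names, and the host and port simply mirror the curl example. Building the query with `urlencode` escapes parameter values properly and leaves out any parameter that is not given.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def taba_url(method, client=None, taba=None, host="localhost", port=8370):
    """Build a Server URL like /taba?client=...&taba=...; omitted parameters are left out."""
    params = {key: val for key, val in (("client", client), ("taba", taba)) if val is not None}
    query = urlencode(params)
    return f"http://{host}:{port}/{method}" + (f"?{query}" if query else "")


def fetch(method, **kwargs):
    """Fetch a text response (needs a running Server; /raw returns binary, so don't decode it)."""
    with urlopen(taba_url(method, **kwargs)) as resp:
        return resp.read().decode("utf-8")


print(taba_url("taba", client="client.id", taba="ad_decision_ad_chosen_2"))
# http://localhost:8370/taba?client=client.id&taba=ad_decision_ad_chosen_2
```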

The URLs to extract States, Projections, Aggregates, and Renders support the same parameters (with the exception noted for /aggregate below):

  • client: If specified, retrieves data for a specific Client ID.
  • taba: If specified, retrieves data for a specific Name.

WARNING: Always try to specify a Client ID, a Name, or both! An omitted parameter is treated as a request for all values of that parameter, which can take longer to generate (e.g. if you don't specify a Client ID, the results will cover all Client IDs). A typo in a parameter name effectively leaves that parameter unspecified.
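
The Server silently treats a misspelled parameter as omitted. A client-side guard can therefore catch typos before a slow, unfiltered query reaches the Server. This is a minimal sketch that assumes only the `client` and `taba` parameters described above. `checked_params` is a hypothetical helper, and refusing an empty parameter set is a local policy choice, not Server behavior.

```python
ALLOWED_PARAMS = {"client", "taba"}


def checked_params(allow_all=False, **params):
    """Validate query parameters before sending them to the Server.

    Raises ValueError on unknown names (likely typos) and, unless allow_all is True,
    when neither client nor taba is given (which would request every value).
    """
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    if not params and not allow_all:
        raise ValueError("specify a client, a taba name, or both")
    return params
```

For example, `checked_params(clinet="client.id")` raises `ValueError` instead of quietly requesting data for every Client.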

The data extraction methods are:

  • /raw: Raw State objects. The output is binary and not intended for users.
  • /projection: Projections. Output is of the form [Client ID: [Name: {Projection Dictionary}]].
  • /aggregate: Aggregates. Output is of the form [Name: {Aggregate Dictionary}]. This method does not support the client parameter.
  • /taba: Rendered Tabs. Output is human-readable text.
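
Output from /projection is nested by Client ID and then by Name. To compare one Tab across Clients, it can help to regroup the output by Name first. This sketch assumes the response is a JSON object of the form {client_id: {name: projection}}. Both that wire format and the field names in the sample (such as `count`) are assumptions, and `projections_by_name` is a hypothetical helper.

```python
import json


def projections_by_name(projection_json):
    """Regroup {client_id: {name: projection}} into {name: {client_id: projection}}."""
    by_name = {}
    for client_id, tabs in json.loads(projection_json).items():
        for name, projection in tabs.items():
            by_name.setdefault(name, {})[client_id] = projection
    return by_name


sample = '{"client.a": {"req": {"count": 2}}, "client.b": {"req": {"count": 1}, "err": {"count": 4}}}'
grouped = projections_by_name(sample)
# grouped["req"] == {"client.a": {"count": 2}, "client.b": {"count": 1}}
```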

Other available methods are:

  • /clients: List of all Client IDs.
  • /names: List of all Names.
  • /type: Mapping from all Names to the associated Type.
  • /status: Information about the Server's status.
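
As an example of using the listing methods, the mapping returned by /type can be inverted to show which Names use each Type. This assumes /type returns a JSON object that maps each Name to its Type name. `names_by_type` is a hypothetical helper, and the sample Names are illustrative.

```python
import json
from collections import defaultdict


def names_by_type(type_json):
    """Invert /type output ({name: type}) into {type: [names]}, sorted for stable display."""
    grouped = defaultdict(list)
    for name, tab_type in json.loads(type_json).items():
        grouped[tab_type].append(name)
    return {tab_type: sorted(names) for tab_type, names in sorted(grouped.items())}


sample = '{"inbound_request_count": "CounterGroup", "build_version": "String", "latency_ms": "CounterGroup"}'
print(names_by_type(sample))
# {'CounterGroup': ['inbound_request_count', 'latency_ms'], 'String': ['build_version']}
```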