Skip to content
This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

Latest commit

 

History

History
346 lines (240 loc) · 19.8 KB

README.md

File metadata and controls

346 lines (240 loc) · 19.8 KB

hashmark

hashmark is a MySQL time-series database and PHP library for data point insertion and analytic queries.

Features

  • Numeric and string data types.
  • PHP client library for collecting data points in preexisting apps.
  • Custom scripts for analysis and periodic data point collection.
  • SQL macros allowing queries to reference intermediate results from prior statements.
  • Configurable date-based partitioning.
  • Cache and database adapters provided by bundled Zend Framework 1.x components.
  • High unit test coverage.

Analytics

Support

  • MySQL aggregate functions: AVG, SUM, COUNT, MAX, MIN, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP
  • MySQL aggregate functions eligible for DISTINCT selection: AVG,'SUM, COUNT, MAX, MIN
  • Time intervals for aggregates: hour, day, week, month, year
  • MySQL time functions for aggregates of recurrence groups (e.g. "1st of the month"): HOUR, DAYOFMONTH, DAYOFYEAR, MONTH

Methods

multiQuery($scalarId, $start, $end, $stmts)

Perform multiple queries using macros to reference prior intermediate result sets. Internally supports many of the functions below.

values($scalarId, $limit, $start, $end)

Return samples within a date range.

valuesAtInterval($scalarId, $limit, $start, $end, $interval)

Return the most recent sample from each interval within a date range.

valuesAgg($scalarId, $start, $end, $aggFunc, $distinct)

E.g. return **"average value between date X and Y" or "volume of distinct values between date X and Y."

valuesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)

Similar to valuesAgg except that results are grouped into a given interval, e.g. "average weekly value between date X and Y."

valuesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)

Aggregate values returned by valuesAggAtInterval, e.g. "average weekly high between date X and Y."

valuesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)

E.g. "peak value in the 8-9am hour between date X and Y."

changes($scalarId, $limit, $start, $end)

Return from a date range each sample's date, value, and change in value from the prior sample.

changesAtInterval($scalarId, $limit, $start, $end, $interval)

Similar to changes except that valuesAtInterval provides the source data, e.g. "weekly value and its change (week-over-week) between date X and Y."

changesAgg($scalarId, $start, $end, $aggFunc, $distinct)

E.g. "peak value change between date X and Y."

changesAggAtInterval($scalarId, $start, $end, $interval, $aggFunc, $distinct)

Similar to changesAgg except that changes provides the source data, e.g. "weekly peak value change (week-over-week) between date X and Y."

changesNestedAggAtInterval($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)

Aggregate values returned by changesAggAtInterval, e.g. "average of weekly peak value changes (week-over-week) between date X and Y."

changesAggAtRecurrence($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)

E.g. "peak value change on Black Friday between year X and year Y."

frequency($scalarId, $limit, $start, $end, $descOrder)

Return unique values and their frequency between date X and Y.

moving($scalarId, $limit, $start, $end, $aggFunc, $distinct)

Return from a date range each sample's date, value, and the aggregate value at sample-time. E.g. "values and their moving averages between date X and Y."

movingAtInterval($scalarId, $limit, $start, $end, $interval, $aggFunc, $distinct)

Similar to valuesAtInterval except that moving provides the data source, e.g. "the last value and its moving average from each week between date X and Y."

Example Code

Quick Background

Main database tables:

  • scalars: Metadata and current value of a named string or number, e.g. "featureX:optOut".
  • samples_decimal: Historical values of a numeric data points in scalars.
  • samples_string: Historical values of a string data points in scalars.

Client

Hashmark_Client supplies methods for updating a current value (in scalars) and adding a historical sample (in samples_decimal or samples_string).

  • incr($name, $amount = 1, $newSample = false)
  • decr($name, $amount = 1, $newSample = false)
  • set($name, $amount, $newSample = false)
  • get($name)
<?php
if ($userOptedOutOfFeatureX) {
 $client->incr('featureX:optOut', 1, true);
}

To enable drop-in client calls to work without any prior setup, e.g. if "featureX:optOut" above did not yet exist, use $client->createScalarIfNotExists(true).

Agent

Each script is just a class that implements the small Hashmark_Agent interface.

The Agent/StockPrice.php demo fetches AAPL's price from Google Finance and creates a historical data point.

Cron/runAgents.php normally runs each agent on a configured schedule, but a manual run might look like:

<?php
$agent = Hashmark::getModule('Agent', 'StockPrice');
$price = $agent->run($scalarId);

$partition = Hashmark::getModule(Partition, '', $db);
$partition->createSample($scalarId, $price, time());

Create a Scalar

<?php
$core = Hashmark::getModule('Core', '', $db);

$scalarFields = array();
$scalarFields['name'] = 'featureX:optOut';
$scalarFields['type'] = 'decimal';
$scalarFields['value'] = 0;  // Initial value.
$scalarFields['description'] = 'Opt-out requests for featureX.';

$scalarId = $core->createScalar($scalarFields);

$savedScalarFields = $core->getScalarById($scalarId);
$savedScalarFields = $core->getScalarByName('featureX:optOut');

Create a Category

<?php
$categoryId = $core->createCategory('Feature Trackers');
if (!$core->scalarHasCategory($scalarId, $categoryId)) {
   $core->addScalarCategory($scalarId, $categoryId);
}

Create a Milestone

<?php
$milestoneId = $core->createMilestone('featureX initial release');
$core->setMilestoneCategory($milestoneId, $releaseCategoryId);

Query

<?php
$analyst = Hashmark::getModule('Analyst', 'BasicDecimal', $db);

$sampleDateMin = '2012-01-01 00:00:00';
$sampleDateMax = '2012-02-01 00:00:00';

$limit = 10;

// Returns first 10 samples: their dates, values, and running/cumulative totals
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM');
// Now only distinct values affect aggregates
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM', true);

// Returns first 10 samples: their dates and values
$analyst->values($scalarId, $limit, $sampleDateMin, $sampleDateMax);

// Returns first 10 samples: their dates and values
$analyst->values($scalarId, $limit, $sampleDateMin, $sampleDateMax);

// Returns first 10 samples: their dates, values, and difference from prior sample
$analyst->changes($scalarId, $limit, $sampleDateMin, $sampleDateMax);

Requirements

Most recently tested with PHP 5.4.0beta1, PHPUnit 3.6.0RC4, and MySQL 5.5.16.

  • PHP 5.2+
  • MySQL 5.1+
  • PDO or MySQL Improved
  • apc, xcache or memcache

For tests:

Installation

  • CREATE DATABASE hashmark DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;
  • Import Sql/Schema/hashmark.sql
  • Optionally repeat 1 and 2 for a separate unit test DB.

Database Configuration

Hashmark uses Zend Framework's database component. Refer to the ZF guide for option values. Example:

<?php
$config['DbHelper']['profile']['unittest'] = array(
 'adapter' => 'Mysqli',
 'params' => array(
   'host' => '127.0.0.1',
   'port' => 5516,
   'dbname' => 'hashmark_test',
   'username' => 'msandbox',
   'password' => 'msandbox'
 )
);

Config/Hashmark-dist.php only includes a database config profile for cron scripts and unit tests. Normally the client app will supply its own connection instance. For example:

<?php
$this->hashmark = Hashmark::getModule('Client', '', $db);
...
$this->hashmark->incr('featureX:optOut', 1, true);

Cache Configuration

Hashmark also uses Zend Framework's cache component. Refer to the ZF guide for option values.

Using Memcache as an example, you might update $config['cache'] in Config/Hashmark-dist.php:

$config['Cache'] = array(
 'backEndName' => 'Memcached',
 'frontEndOpts' => array(),
 'backEndOpts' => array(
   'servers' => array(
     array('host' => 'localhost', 'port' => 11211)
   )
 )
);

Other Configuration

See Config/Hashmark-dist.php comments.

Verify

$ php -f Test/Install.php
pass: Connected to DB with 'cron' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'cron' profile in Config/DbHelper.php
pass: Connected to DB with 'unittest' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'unittest' profile in Config/DbHelper.php
pass: Loaded Hashmark_BcMath module.
pass: Loaded Hashmark_Cache module.
pass: Loaded Hashmark_Client module.
pass: Loaded Hashmark_Core module.
pass: Loaded Hashmark_DbHelper module.
pass: Loaded Hashmark_Partition module.
pass: Loaded Hashmark_Agent_YahooWeather module.
pass: Loaded Hashmark_Test_FakeModuleType module.
pass: Built samples_1234_20111000 partition name with 'm' setting in Config/Partition.php.

Schema

  • agents: Available Agent classes.
  • agents_scalars: Agent's schedules and last-run metadata.
  • categories: Groups to support front-end browsing, searches, visualization, etc.
  • categories_milestones: For example, to link category "ShoppingCart" with milestone "site release 2.1.2".
  • categories_scalars: For example, to link category "ShoppingCart" with data point "featureX:optOut".
  • milestones: Events to correlate with scalar histories, e.g. to visualize "featureX:optOut" changes across site releases that tweak "featureX".
  • samples_analyst_temp: When Hashmark creates temporary tables to hold intermediate aggregates, it copies this table's definition.
  • samples_decimal and samples_string: Identical except for one column. Hashmark copies their definitions when creating new partitions. id auto-increment values are seeded from the associated scalar's sample_count column.
  • scalars: The table holds columns that define each data point's type (string or decimal), current value, and other metadata.

File Layout

Naming Convention

Zend Framework's style is followed pretty closely. Parent classes, some abstract, live in the root directory. Child classes live in directories named after their parents. Class names predictable indicate ancestors, e.g. [Hashmark_Analyst_BasicDecimal`, and file names mirror the class name's last part, e.g. Analyst/BasicDecimal.php.

Analyst/
 BasicDecimal.php
Analyst.php
...
Agent/
 YahooWeather.php
 ...
Agent.php
...

Classes

Tests

Most test-related files live under Test/, but a few like Config/Test.php live outside so cases can cover code relying on naming conventions.

Sql/Analyst/

Contains SQL templates. For example, Sql/Analyst/BasicDecimal.php templates allow Analyst/BasicDecimal.php to reuse and combine statements as intermediate results toward final aggregates.

Cron Scripts

Tests

Running

First: php -f Test/Analyst/BasicDecimal/Tool/writeProviderData.php which Test/Analyst/BasicDecimal/Data/provider.php. The BasicDecimal suite relies on a bcmath and a series of generators in Test/Analyst/BasicDecimal/Tool/ to provide calculate a comprehensive set of expected test results.

  • Run suites for all modules: phpunit [--group name] Test/AllTests.php
  • Run a specific module's suite: phpunit [--group name] Test/[module]/AllTests.php