# Importing Data

* Monitoring systems provide the single source of truth for Operations data
* If you want data to be available for ops tasks like business reports, usage accounting, SLOs, alerting, or system understanding, then it should be inside your monitoring system
* If it's not, then get it into your monitoring system first.
* Hence: *The monitoring system is the single data source for Ops Data*

```
SRC ===> +-------------------+
SRC ===> | Monitoring System | ===> Data Analysis Workbench 
SRC ===> +-------------------+
```

<hr>

* We will use the Python Data Science toolchain to manipulate data
* Most monitoring tools come with a built-in set of analysis methods, and those have limitations. If we want to go all the way, we might want to use more powerful data analysis tools
* The first step in this kind of analysis is data fetching.
* We will use Circonus because I work there and have some example datasets that I can share
* Similar steps apply to any other tool you are using


## Importing data from Circonus

API Docs: https://login.circonus.com/resources/api

Using API wrapper: https://github.com/circonus-labs/python-circonusapi

In [2]:
TOKEN="3bfe6f29-418f-4e34-9f87-787fc8c29490" # provided token for demo account

In [3]:
# general purpose imports
from pprint import pprint
from functools import *
from itertools import *
from datetime import datetime

In [4]:
from circonusapi import circonusapi
from circonusapi import config

# Now initialize the API
circapi = circonusapi.CirconusAPI(TOKEN)

In [6]:
circapi.debug = False # enable/disable debug output

In [10]:
# Example: Get list of all metrics
circapi.api_call("GET","/metric")

[{'_active': True,
  '_check': '/check/235447',
  '_check_active': True,
  '_check_bundle': '/check_bundle/195464',
  '_check_tags': [],
  '_check_uuid': '3cd9397b-34c3-4b7a-bbf6-5ff7fb937ae0',
  '_cid': '/metric/235447_rtt',
  '_histogram': 'active',
  '_metric_name': 'rtt',
  '_metric_type': 'numeric',
  'link': None,
  'notes': None,
  'tags': [],
  'units': 'milliseconds'},
 {'_active': True,
  '_check': '/check/235446',
  '_check_active': True,
  '_check_bundle': '/check_bundle/195464',
  '_check_tags': [],
  '_check_uuid': '5edef88f-42af-4981-94a9-5d17451bade2',
  '_cid': '/metric/235446_rtt',
  '_histogram': 'active',
  '_metric_name': 'rtt',
  '_metric_type': 'numeric',
  'link': None,
  'notes': None,
  'tags': [],
  'units': 'milliseconds'},
 {'_active': True,
  '_check': '/check/235441',
  '_check_active': True,
  '_check_bundle': '/check_bundle/195459',
  '_check_tags': [],
  '_check_uuid': '3a16668a-1b3d-4ce8-88e0-ea5311cfa091',
  '_cid': '/metric/235441_rtt',
  '_histogra

Metrics are identified by "check" and "name", but have many other attributes that can be manipulated.

In [11]:
#
# Limit metrics by providing a search query
#
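def simple_search(q):
    # Run the search and reduce each hit to its identifying fields
    return list(map(post_search,
                    circapi.api_call("GET", "/metric", params={"search": q})
                   ))

def post_search(r):
    # Keep the metric name and the numeric check id (strip the '/check/' prefix)
    return {
        "name": r["_metric_name"],
        "check_id": r["_check"][len("/check/"):]
    }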

In [12]:
simple_search("(metric:duration)")

[{'check_id': '195902', 'name': 'duration'},
 {'check_id': '154743', 'name': 'duration'},
 {'check_id': '222868', 'name': 'duration'},
 {'check_id': '222866', 'name': 'duration'},
 {'check_id': '222867', 'name': 'duration'},
 {'check_id': '222865', 'name': 'duration'},
 {'check_id': '222863', 'name': 'duration'},
 {'check_id': '222864', 'name': 'duration'},
 {'check_id': '222861', 'name': 'duration'},
 {'check_id': '222860', 'name': 'duration'},
 {'check_id': '222862', 'name': 'duration'},
 {'check_id': '222859', 'name': 'duration'},
 {'check_id': '222858', 'name': 'duration'},
 {'check_id': '222857', 'name': 'duration'},
 {'check_id': '222855', 'name': 'duration'},
 {'check_id': '222856', 'name': 'duration'},
 {'check_id': '222854', 'name': 'duration'},
 {'check_id': '218002', 'name': 'duration'},
 {'check_id': '218003', 'name': 'duration'},
 {'check_id': '217834', 'name': 'duration'}]

In [49]:
#
# Fetch data from Circonus
#
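def simple_fetch(check_id, metric, start, period, count):
    """
    Fetch `count` values of one metric from the Circonus API.

    `start` is a Unix timestamp in seconds; `period` is the resolution in seconds.
    """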
    return list(map(post_fetch, circapi.api_call("GET", "data/{}_{}".format(check_id, metric), params = {
        "period": period,
        "start": int(start), 
        "end": int(start + count * period),
        "format" : "object"
    })['data']))

def post_fetch(r):
    return r.get('value', None)


In [50]:
simple_fetch(222857,"duration", datetime(2017,10,1).timestamp(), 60, 10)

[11, 12, 12, 11, 12, 12, 12, 12, 12, 12]
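
One caveat with the call above: `datetime(2017,10,1).timestamp()` interprets the naive datetime as local time, so the fetched window depends on the time zone of the machine running the notebook. A minimal sketch of a timezone-independent alternative, assuming UTC is the intended reference. `utc_epoch` and `window` are hypothetical helper names, not part of the Circonus API wrapper:

```python
from datetime import datetime, timezone

def utc_epoch(year, month, day, hour=0, minute=0):
    """Return the Unix timestamp (seconds) of the given UTC wall-clock time."""
    dt = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(dt.timestamp())

def window(start, period, count):
    """Return (start, end) in epoch seconds covering `count` periods."""
    return int(start), int(start + count * period)

start = utc_epoch(2017, 10, 1)
print(start)                  # 1506816000
print(window(start, 60, 10))  # (1506816000, 1506816600)
```

The `start` value can be passed to `simple_fetch` in place of `datetime(2017,10,1).timestamp()`.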

In [43]:
#
# Combine search and fetch into single command
#
def simple_search_fetch(q, start, period, count):
    return [ simple_fetch(r['check_id'], r['name'], start, period, count) for r in simple_search(q) ]

In [39]:
%time simple_search_fetch("(metric:duration)", datetime(2017,10,1).timestamp(), 60, 10)

CPU times: user 250 ms, sys: 10 ms, total: 260 ms
Wall time: 16.3 s


[[1, 2, 2, 7, 1, 2, 2, 1, 1, 1],
 [1, 1, 1, 1, 14, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [12, 11, 11, 11, 11, 12, 13, 12, 12, 12],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [7, 1, 1, 1, 11, 1, 1, 1, 1, 1],
 [11, 12, 13, 12, 11, 12, 12, 11, 11, 11],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [11, 12, 12, 12, 12, 12, 11, 12, 13, 12],
 [1, 1, 1, 1, 1, 1, 1, 9, 1, 7],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [11, 12, 12, 11, 12, 12, 12, 12, 12, 12],
 [3, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [12, 12, 13, 12, 11, 13, 12, 12, 13, 14],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
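
Almost all of the 16.3 s wall time above is spent waiting on one HTTP request after another; CPU time is only 260 ms. One possible client-side mitigation is to issue the requests concurrently from a thread pool. A minimal sketch, assuming the fetch function is safe to call from several threads (this has not been verified for the Circonus API wrapper). `parallel_fetch` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that simulates latency instead of calling the API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def parallel_fetch(fetch, items, max_workers=8):
    """Apply `fetch` to every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))

def fake_fetch(item):
    """Stand-in for a real fetch: sleeps to simulate a slow request."""
    time.sleep(0.1)
    return [item["check_id"]]

# Same shape as the dictionaries returned by simple_search
items = [{"check_id": str(i), "name": "duration"} for i in range(8)]

t0 = time.time()
results = parallel_fetch(fake_fetch, items)
elapsed = time.time() - t0
print(len(results), "results in", round(elapsed, 2), "s")
```

With the real API, `fetch` could be a small function that calls `simple_fetch(r['check_id'], r['name'], start, period, count)` for each search result `r`.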

Serial fetches are slow: the 20 requests above took 16.3 s of wall time. We can use the query language CAQL to move the fetching step into the database.

In [17]:
#
# CAQL Api wrapper
#
def simple_caql(q, start, period, count):
    return list(map(post_caql, circapi.api_call("GET","/caql", params={
        "start":int(start),
        "end":int(start + count*period),
        "period":int(period),
        "query":q
    })["_data"]))

def post_caql(r):
    # Keep the second element of each "_data" row: the list of values for that time step
    return r[1]

In [28]:
# Example: Same query as above
%time simple_caql('search:metric("(metric:duration)")', datetime(2017,10,1).timestamp(), 60, 10)

CPU times: user 0 ns, sys: 10 ms, total: 10 ms
Wall time: 789 ms


[[1, 1, 1, 12, 1, 7, 11, 1, 1, 11, 1, 1, 1, 11, 3, 1, 12, 1, 1, 1, 1],
 [2, 1, 1, 11, 1, 1, 12, 1, 1, 12, 1, 1, 1, 12, 1, 1, 12, 1, 1, 1, 1],
 [2, 1, 1, 11, 1, 1, 13, 1, 1, 12, 1, 1, 1, 12, 1, 1, 13, 1, 1, 1, 1],
 [7, 1, 1, 11, 1, 1, 12, 1, 1, 12, 1, 1, 1, 11, 1, 1, 12, 1, 1, 1, 1],
 [1, 14, 1, 11, 1, 11, 11, 1, 1, 12, 1, 1, 1, 12, 1, 1, 11, 1, 1, 1, 1],
 [2, 1, 1, 12, 1, 1, 12, 1, 1, 12, 1, 1, 1, 12, 1, 1, 13, 1, 1, 1, 1],
 [2, 1, 1, 13, 1, 1, 12, 1, 1, 11, 1, 1, 1, 12, 1, 1, 12, 1, 1, 1, 1],
 [1, 1, 1, 12, 1, 1, 11, 1, 1, 12, 9, 1, 1, 12, 1, 1, 12, 1, 1, 1, 1],
 [1, 1, 1, 12, 1, 1, 11, 1, 1, 13, 1, 1, 1, 12, 1, 1, 13, 1, 1, 1, 1],
 [1, 1, 1, 12, 1, 1, 11, 1, 1, 12, 7, 1, 1, 12, 1, 1, 14, 1, 1, 1, 1]]
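
Note the layout of this result: each row is one time step and each column is one metric, which is the transpose of what `simple_search_fetch` returned. A minimal sketch of converting between the two layouts with `zip`; `transpose` is a hypothetical helper name and the input is a small stand-in, not real API output:

```python
def transpose(rows):
    """Turn per-time-step rows into per-metric series (or the other way round)."""
    return [list(series) for series in zip(*rows)]

# Stand-in for a CAQL result: 3 time steps, 2 metrics
rows = [[1, 12],
        [2, 11],
        [2, 11]]

print(transpose(rows))  # [[1, 2, 2], [12, 11, 11]]
```

Because `zip` stops at the shortest row, this assumes all rows have the same length.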

In [18]:
# Example: 1d worth of data with 1h resolution
simple_caql('search:metric("(metric:duration)")', datetime(2017,9,1).timestamp(), 60*60, 24)

[[8.8,
  1.0166666666667,
  1.3,
  11.833333333333,
  3.0333333333333,
  1.0666666666667,
  11.833333333333,
  3.0666666666667,
  53.566666666667,
  12.2,
  1.1333333333333,
  1.3,
  53.1,
  11.916666666667,
  3.05,
  1.2166666666667,
  11.666666666667,
  36.75,
  1.0166666666667,
  1.2166666666667,
  3.0333333333333],
 [8.6166666666667,
  1.3166666666667,
  1.25,
  12.05,
  3.05,
  1.1833333333333,
  11.716666666667,
  3,
  3.05,
  11.916666666667,
  1.5,
  1,
  3.05,
  11.816666666667,
  3.05,
  1.35,
  11.683333333333,
  3.05,
  1.05,
  1.4833333333333,
  3.0666666666667],
 [1.3666666666667,
  1.0166666666667,
  1.05,
  11.7,
  3.0333333333333,
  1.3333333333333,
  11.983333333333,
  3.05,
  3.1166666666667,
  11.95,
  1.2,
  1.3,
  3.0166666666667,
  12.066666666667,
  3.15,
  1.05,
  12.083333333333,
  3.1833333333333,
  1.1833333333333,
  1.2666666666667,
  3.0666666666667],
 [1.6166666666667,
  1.15,
  1,
  11.816666666667,
  3.0833333333333,
  1.0833333333333,
  12.416666666667

# Storing Data as CSV or JSON files

In [None]:
!mkdir -p datasets

In [None]:
import json

In [None]:
X = simple_caql('search:metric("(metric:duration)")', datetime(2017,9,1).timestamp(), 60, 24*60)

In [22]:
with open("datasets/http_durations_caql.json","w") as fh:
    json.dump(X,fh)

In [None]:
!jq "." datasets/http_durations_caql.json
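
The heading also mentions CSV. A minimal sketch using the standard-library `csv` module, assuming the data has the shape returned by `simple_caql` (one row per time step, one column per metric). `write_csv`, the `metric_<n>` column names, and the temporary file are hypothetical choices made for illustration:

```python
import csv
import os
import tempfile

def write_csv(path, rows):
    """Write a list of equal-length numeric rows to `path` with a header line."""
    width = len(rows[0]) if rows else 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["metric_{}".format(i) for i in range(width)])
        writer.writerows(rows)

# Small stand-in for X: 2 time steps, 3 metrics
sample = [[1, 12, 1], [2, 11, 1]]
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
write_csv(path, sample)

with open(path, newline="") as fh:
    print(fh.read())
```

In the notebook, the same function could be called as `write_csv("datasets/http_durations_caql.csv", X)`, where the file name is again only a suggestion.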


# Data Fetching from Graphite

In [47]:
# TBD
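
Until this section is written, here is a rough sketch of what the equivalent steps might look like against Graphite's render API, which returns JSON series whose `datapoints` are `[value, timestamp]` pairs when called with `format=json`. The host name, metric path, and the helper names `graphite_render_url` and `graphite_values` are assumptions for illustration; the sample response is hand-written, not fetched. Unlike the Circonus calls above, the resolution is not a request parameter here; `period` is only used to compute the end of the window.

```python
from urllib.parse import urlencode

def graphite_render_url(base_url, target, start, period, count):
    """Build a render-API URL covering `count` periods from epoch `start`."""
    query = urlencode({
        "target": target,
        "from": int(start),
        "until": int(start + count * period),
        "format": "json",
    })
    return "{}/render?{}".format(base_url.rstrip("/"), query)

def graphite_values(response):
    """Extract the list of values of each series from a decoded JSON response."""
    return [[value for value, _ts in series["datapoints"]]
            for series in response]

# Hypothetical host and metric path
url = graphite_render_url("http://graphite.example.com",
                          "servers.web01.duration", 1506816000, 60, 10)
print(url)

# Hand-written sample in the render API's JSON shape
sample = [{"target": "servers.web01.duration",
           "datapoints": [[11, 1506816000], [12, 1506816060], [None, 1506816120]]}]
print(graphite_values(sample))  # [[11, 12, None]]
```

The URL could then be fetched with any HTTP client and the decoded JSON passed to `graphite_values`.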