## Usage

This is a helper notebook. Run it from another notebook as:

```
%run ../common/benchmark_analysis.ipynb
```

## Input

Before running this notebook, set `data_absolute_path` to the absolute path of the JSON file to analyze.
The data should be generated by `notsofine::benchmark_run` from `harness`.
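
The loading code further down reads `data['iterations']`, expands each iteration's `runs`, and uses the fields `i`, `program`, `duration.secs`, and `duration.nanos`. The sketch below builds a small file with that shape. The layout is inferred from the loading code, not from the harness documentation, and the program names and timings are invented for illustration.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample; program names and timings are invented for illustration.
sample = {
    "iterations": [
        {"i": 0, "runs": [
            {"program": "baseline", "duration": {"secs": 0, "nanos": 1_500_000}},
            {"program": "candidate", "duration": {"secs": 0, "nanos": 1_750_000}},
        ]},
        {"i": 1, "runs": [
            {"program": "baseline", "duration": {"secs": 0, "nanos": 1_400_000}},
            {"program": "candidate", "duration": {"secs": 1, "nanos": 0}},
        ]},
    ]
}

# Write the sample to a temporary file as a stand-in for the harness output.
sample_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(sample_path, "w") as f:
    json.dump(sample, f)

# Load it the way this notebook does: one row per run, tagged with its iteration `i`.
with open(sample_path) as f:
    loaded = json.load(f)
sample_df = pd.json_normalize(loaded["iterations"], "runs", ["i"])
print(sample_df.columns.tolist())
```

Pointing `data_absolute_path` at a file like `sample_path` should be enough to exercise the rest of the notebook on toy data.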

## Output

When this notebook completes:
* Raw data is loaded in the variable `df`.
* Per-program data series are loaded in the variable `series`.
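
Because `values` is passed to `pivot` as a list, `series` has two-level columns: the top level is `duration.total_ms` and the second level is the program name. A minimal sketch of reading it from the calling notebook, using a hypothetical toy frame in place of the real `df`:

```python
import pandas as pd

# Hypothetical stand-in for the `df` this notebook produces; values are invented.
toy_df = pd.DataFrame({
    "i": [0, 0, 1, 1],
    "program": ["baseline", "candidate", "baseline", "candidate"],
    "duration.total_ms": [1.5, 1.75, 1.4, 1.6],
})

# Same pivot as in this notebook: rows are iterations, columns are programs.
toy_series = toy_df.pivot(index="i", columns="program", values=["duration.total_ms"])

# Selecting the top level leaves one plain column per program.
per_program = toy_series["duration.total_ms"]
print(per_program.median())
```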

In [13]:
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

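# Matplotlib 3.6 renamed the seaborn styles; fall back to the old name on older versions.
style = 'seaborn-v0_8-whitegrid'
plt.style.use(style if style in plt.style.available else 'seaborn-whitegrid')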

In [2]:
# Uncomment for debugging.

# import os
# data_absolute_path = os.path.join(os.getcwd(), '..', 'benchmark_trivial', 'data.json')

In [None]:
print("### Loading data from " + data_absolute_path)

with open(data_absolute_path) as f:
    data = json.load(f)
df = pd.json_normalize(data['iterations'], 'runs', ['i'])

In [None]:
df['duration.total_ms'] = df['duration.secs'] * 1e3 + df['duration.nanos'] / 1e6

series = df.pivot(
    index='i',
    columns='program',
    values=['duration.total_ms'])

print('### Statistics: Raw data')
print(series.describe())
series.plot(y='duration.total_ms', kind='line',
            title='Measured time in all benchmark runs (milliseconds)')

In [None]:
# Remove outliers: drop every row in which _any_ column value lies
# 2 or more standard deviations away from that column's mean.

mask = (np.abs(stats.zscore(series)) < 2).all(axis=1)
series_sans_outliers = series[mask]

print('### Statistics: After outlier removal')
print(series_sans_outliers.describe())
series_sans_outliers.plot(y='duration.total_ms', kind='line',
                          title='Measured time after outlier removal (milliseconds)')

# TODO

* Compare resulting series with original
* Pairwise:
  * Compute relative diff
  * Hypothesis testing. Null hypothesis: "relative error > 0.01". Use paired two-sided t-test.
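
One possible starting point for the pairwise items above, as a sketch only. The `timings` frame and the `compare_pairwise` helper are hypothetical, and the data is randomly generated. Note that `scipy.stats.ttest_rel` tests the null hypothesis "mean paired difference is zero", so it does not implement the thresholded null hypothesis from the list as written.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical per-program timings in milliseconds, one row per iteration.
rng = np.random.default_rng(0)
timings = pd.DataFrame({
    "baseline": 10.0 + rng.normal(0.0, 0.1, size=30),
    "candidate": 10.5 + rng.normal(0.0, 0.1, size=30),
})


def compare_pairwise(timings):
    """Return one row per program pair: mean relative diff and paired t-test p-value."""
    rows = []
    for a, b in combinations(timings.columns, 2):
        # Relative difference of b with respect to a, computed per iteration.
        rel_diff = (timings[b] - timings[a]) / timings[a]
        # Paired two-sided t-test; its null hypothesis is "mean difference is zero".
        result = stats.ttest_rel(timings[a], timings[b])
        rows.append({"a": a, "b": b,
                     "mean_rel_diff": rel_diff.mean(),
                     "p_value": result.pvalue})
    return pd.DataFrame(rows)


comparison = compare_pairwise(timings)
print(comparison)
```

With the real data, `timings` would presumably be `series_sans_outliers['duration.total_ms']`, so that each column holds one program's measurements.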