# Analyzing NumPy benchmarks using microbench

## Background
This is a quick microbench example demonstrating how two different versions of the NumPy library can give different outputs for the same code. The change highlighted here is a documented API change, not a bug. However, if a user runs code across heterogeneous compute environments (e.g. a cluster), it may not be immediately obvious that some nodes are configured differently or misconfigured. A small Python program can be found in this directory, together with a JSON file containing microbench-captured metadata.

## Load the data

In [1]:
import pandas as pd

In [2]:
results = pd.read_json('microbench-numpy.json', lines=True)
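
With `lines=True`, pandas expects one JSON object per line rather than a single JSON document. As a rough sketch for readers without the file to hand, the snippet below parses two hand-written records. The field names mirror columns shown in this notebook, but the exact layout of the real file is an assumption, and `sample_results` is a name made up for this illustration.

```python
import io

import pandas as pd

# Two hand-written records, one JSON object per line (JSON Lines).
# Assumption: the real file has a similar shape, with more fields per record.
sample = io.StringIO(
    '{"function_name": "sum_linspace", '
    '"package_versions": {"numpy": "1.15.0", "microbench": "0.7"}, '
    '"return_value": -6}\n'
    '{"function_name": "sum_linspace", '
    '"package_versions": {"numpy": "1.20.0", "microbench": "0.7"}, '
    '"return_value": -11}\n'
)

# Each line becomes one row; the nested object stays a dict in its cell.
sample_results = pd.read_json(sample, lines=True)
print(sample_results)
```

The nested `package_versions` object is kept as a Python dictionary in each cell, which is why it can be expanded into separate columns later.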

## Show the captured metadata

In [3]:
results

| | function_name | package_versions | hostname | operating_system | start_time | finish_time | return_value |
|---|---|---|---|---|---|---|---|
| 0 | sum_linspace | {'numpy': '1.15.0', 'microbench': '0.7'} | bismo | darwin | 2021-03-02 13:06:35.414658 | 2021-03-02 13:06:35.414758 | -6 |
| 1 | sum_linspace | {'numpy': '1.20.0', 'microbench': '0.7'} | bismo | darwin | 2021-03-02 13:07:16.945669 | 2021-03-02 13:07:16.945913 | -11 |


In [4]:
results['runtime'] = results['finish_time'] - results['start_time']

In [5]:
results['runtime']

0   0 days 00:00:00.000100
1   0 days 00:00:00.000244
Name: runtime, dtype: timedelta64[ns]
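
A `timedelta64[ns]` column is convenient for display, but a plain number is often easier to plot or aggregate. The following is a minimal sketch that rebuilds the two timestamps from the table above and converts the runtime to microseconds; the `timing` DataFrame and the `runtime_us` column are names introduced here for illustration only.

```python
import pandas as pd

# Timestamps copied from the captured metadata shown above.
timing = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2021-03-02 13:06:35.414658",
        "2021-03-02 13:07:16.945669",
    ]),
    "finish_time": pd.to_datetime([
        "2021-03-02 13:06:35.414758",
        "2021-03-02 13:07:16.945913",
    ]),
})

# Subtracting two datetime columns yields a timedelta column.
timing["runtime"] = timing["finish_time"] - timing["start_time"]

# Convert the timedelta to a float number of microseconds.
timing["runtime_us"] = timing["runtime"].dt.total_seconds() * 1e6
print(timing[["runtime", "runtime_us"]])
```

A single run per version, as here, is far too small a sample to draw conclusions about relative performance.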

We can also reformat the `package_versions` dictionaries into separate columns, which is useful if we want to sort or filter by version number.

In [8]:
results = pd.concat([results, pd.json_normalize(results['package_versions'])], axis=1).drop(columns='package_versions')

In [9]:
results.head()

| | function_name | hostname | operating_system | start_time | finish_time | return_value | runtime | numpy | microbench |
|---|---|---|---|---|---|---|---|---|---|
| 0 | sum_linspace | bismo | darwin | 2021-03-02 13:06:35.414658 | 2021-03-02 13:06:35.414758 | -6 | 0 days 00:00:00.000100 | 1.15.0 | 0.7 |
| 1 | sum_linspace | bismo | darwin | 2021-03-02 13:07:16.945669 | 2021-03-02 13:07:16.945913 | -11 | 0 days 00:00:00.000244 | 1.20.0 | 0.7 |
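
Once the versions are in their own columns, filtering by version is straightforward, with one caveat: the values are strings, and string comparison orders `'1.9.0'` after `'1.20.0'`. The sketch below uses a hypothetical helper, `version_key`, which assumes purely numeric, dot-separated version strings; the `versions` DataFrame is a cut-down stand-in for `results`.

```python
import pandas as pd

# Cut-down stand-in for the `results` DataFrame above.
versions = pd.DataFrame({
    "numpy": ["1.15.0", "1.20.0"],
    "return_value": [-6, -11],
})


def version_key(version):
    """Turn '1.20.0' into (1, 20, 0) so versions compare numerically.

    Hypothetical helper; assumes every component is a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


# Keep only the rows recorded with NumPy 1.20 or later.
is_recent = versions["numpy"].map(lambda v: version_key(v) >= (1, 20))
at_least_1_20 = versions[is_recent]
print(at_least_1_20)
```

Version strings with suffixes such as release candidates would need a more careful parser than this helper.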


## Discussion
The above example shows the same function, `sum_linspace`, returning different values on the same host: -6 with NumPy 1.15.0 and -11 with NumPy 1.20.0. While environment standardization using virtual environments is encouraged, capturing metadata alongside each result can help detect misconfigured environments.
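
The `sum_linspace` program itself is not reproduced in this notebook, so the following is only a sketch of one way such a difference can arise. NumPy 1.20 changed `np.linspace` with an integer `dtype` to round toward negative infinity (floor) instead of toward zero, which alters results for negative values. The arguments below are an assumption, not taken from the actual program. Both rounding behaviours are emulated explicitly so the sketch behaves the same on any NumPy version.

```python
import numpy as np

# Assumed arguments for illustration; the real program may use others.
values = np.linspace(-3, 1, 8)  # evenly spaced floats from -3 to 1

# Emulate the pre-1.20 integer conversion: round toward zero.
truncated = np.trunc(values).astype(int)

# Emulate the 1.20+ integer conversion: round toward negative infinity.
floored = np.floor(values).astype(int)

print("toward zero:", truncated, "sum =", truncated.sum())
print("floor:      ", floored, "sum =", floored.sum())
```

With these assumed arguments the two sums are -6 and -11, matching the captured `return_value`s, which is consistent with (but does not prove) this being the change exercised by the example.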