# tools.hdf5 Interface

This notebook demonstrates the basic syntax for retrieving Lustre server-side data from LMT via the pytokio API.

In [None]:
%matplotlib inline

In [None]:
import matplotlib
import matplotlib.pyplot  # imported explicitly; matplotlib.pyplot.subplots() is used when plotting
matplotlib.rcParams.update({'font.size': 14})
import datetime
import tokio.tools

## Define input time range

`start_time` and `end_time` define the time range of interest.  Note that LMT stores data every five seconds, so requesting a large time range (e.g., multiple days) can result in very long query times and very slow plotting.

In [None]:
start_time = datetime.datetime(2017, 5, 17, 21, 35, 25)
end_time = datetime.datetime(2017, 5, 18, 9, 35, 53)
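
To get a feel for how much data a given time range implies, the five-second sampling interval can be turned into a rough row count before querying. The sketch below is only an estimate: the helper name `estimate_sample_count` and the constant `LMT_INTERVAL_SECONDS` are illustrative and not part of pytokio, and the number of rows actually returned may differ slightly depending on how timesteps align with the requested range.

```python
import datetime

# Sampling interval stated above (LMT stores data every five seconds)
LMT_INTERVAL_SECONDS = 5


def estimate_sample_count(start, end, interval_seconds=LMT_INTERVAL_SECONDS):
    """Return the approximate number of timesteps between start and end."""
    if end < start:
        raise ValueError("end must not precede start")
    elapsed = (end - start).total_seconds()
    return int(elapsed // interval_seconds)


start_time = datetime.datetime(2017, 5, 17, 21, 35, 25)
end_time = datetime.datetime(2017, 5, 18, 9, 35, 53)

# 12 hours and 28 seconds = 43228 seconds, i.e. about 8645 timesteps
print(estimate_sample_count(start_time, end_time))
```

Each timestep contributes one row, so a multi-day range quickly grows to tens of thousands of rows.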

## Retrieve LMT data from HDF5

The arguments for `tokio.tools.hdf5.get_dataframe_from_time_range` require a bit of arcane knowledge.  Specifically:

`file_name` can be:

* `cori_snx11168.h5lmt` for cscratch
* `edison_snx11025.h5lmt` for edison scratch1
* `edison_snx11035.h5lmt` for edison scratch2
* `edison_snx11036.h5lmt` for edison scratch3

`group_name` can be:

* `OSTReadGroup/OSTBulkReadDataSet` for read bytes/sec
* `OSTWriteGroup/OSTBulkWriteDataSet` for write bytes/sec
* `OSSCPUGroup/OSSCPUDataSet` for OSS CPU loads (out of 100.0)
* `MDSCPUGroup/MDSCPUDataSet` for MDS CPU loads (out of 100.0)
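
One way to avoid memorizing these strings is to keep them in small lookup tables. The following is a minimal sketch: the dictionary names, the short keys (`"cscratch"`, `"read"`, and so on), and the helper `lookup_query_args` are hypothetical conveniences and are not provided by pytokio. Only the file and group names themselves come from the lists above.

```python
# Hypothetical short keys mapped to the file names listed above
FILE_NAMES = {
    "cscratch": "cori_snx11168.h5lmt",
    "edison_scratch1": "edison_snx11025.h5lmt",
    "edison_scratch2": "edison_snx11035.h5lmt",
    "edison_scratch3": "edison_snx11036.h5lmt",
}

# Hypothetical short keys mapped to the group names listed above
GROUP_NAMES = {
    "read": "OSTReadGroup/OSTBulkReadDataSet",
    "write": "OSTWriteGroup/OSTBulkWriteDataSet",
    "oss_cpu": "OSSCPUGroup/OSSCPUDataSet",
    "mds_cpu": "MDSCPUGroup/MDSCPUDataSet",
}


def lookup_query_args(file_system, metric):
    """Return the file_name and group_name keyword arguments for a query."""
    return {
        "file_name": FILE_NAMES[file_system],
        "group_name": GROUP_NAMES[metric],
    }


args = lookup_query_args("cscratch", "read")
print(args)
```

The resulting dictionary could then be unpacked into the call alongside the time range, for example `get_dataframe_from_time_range(datetime_start=start_time, datetime_end=end_time, **args)`.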

In [None]:
result = tokio.tools.hdf5.get_dataframe_from_time_range(
            file_name='cori_snx11168.h5lmt',
            group_name='OSTReadGroup/OSTBulkReadDataSet',
            datetime_start=start_time,
            datetime_end=end_time)
result.head()

## Plot the OST Read Rates

In [None]:
fig, ax = matplotlib.pyplot.subplots()
fig.set_size_inches(10, 8)

# Convert bytes/sec to GiB/sec
(result / 2.0**30.0).plot.area(ax=ax)

ax.grid(True)
ax.legend_.remove()
ax.set_ylabel("GiB/sec")
ax.set_xlabel("Time")
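
The division by `2.0**30.0` in the plotting cell converts bytes/sec into GiB/sec, since one GiB is 2^30 bytes. As a small worked example of that arithmetic on plain numbers (the names `BYTES_PER_GIB` and `bytes_to_gib` are illustrative only and do not come from pytokio):

```python
# One gibibyte (GiB) is 2**30 bytes
BYTES_PER_GIB = 2 ** 30


def bytes_to_gib(value):
    """Convert a byte count (or bytes/sec rate) to GiB (or GiB/sec)."""
    return value / BYTES_PER_GIB


# A rate of 3 * 2**30 bytes/sec is exactly 3 GiB/sec
print(bytes_to_gib(3 * 2 ** 30))
```

Note that this is the binary unit (GiB), not the decimal gigabyte (GB, 10^9 bytes), so the axis label "GiB/sec" matches the divisor used.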