In [None]:
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The hdf5 data file

This notebook shows how to access data stored in an oskar hdf5 data file.

In [None]:
from e11 import run_file, H5Data

`run_file()`  
    - A function that generates the path to the data file using the run ID and base directory.

`H5Data`  
    - A class that provides a convenient interface for an oskar hdf5 data file.

Normally, the data files are saved in a timestamped directory structure, and each one can be found using its run ID (`rid`).  The path to the file can then be built using `run_file`.

``` python
>>> fil = run_file(base=r"Q:\E11_atmos\data", rid='20171127_155753')
```
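
To make the idea of a timestamp structure concrete, here is a minimal sketch of how a path could be derived from a run ID. The function name `sketch_run_file`, the `year/month/day/rid` layout and the `_data.h5` suffix are all assumptions made for illustration; check the e11 source for what `run_file` actually does.

```python
import os

def sketch_run_file(base, rid, ftype="_data.h5"):
    """Hypothetical stand-in for run_file(); the directory layout is assumed."""
    # the run ID starts with the date, YYYYMMDD
    year, month, day = rid[:4], rid[4:6], rid[6:8]
    return os.path.join(base, year, month, day, rid, rid + ftype)

path = sketch_run_file(base="data", rid="20171127_155753")
print(path)
```

Because the date is encoded in the `rid`, the run ID and the base directory are enough to locate a file.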

But for this example we'll use the example data.

In [None]:
import os 
fil = os.path.join(os.getcwd(), 'example_data', 'laser_data.h5')
# read hdf5 file
h5 = H5Data(fil)
h5.pprint()

Here, `h5` is an instance of the `H5Data` class.  Creating this instance builds `h5.log`, a `pandas.DataFrame` summary of the group attributes.

Usually it's a good idea to specify an `out_dire` when creating the instance,

``` python
>>> h5 = H5Data(fil, out_dire='analysis')
```

If `out_dire` is declared, the log can be cached as a pickle file.  When an instance is created, `H5Data` checks whether this cache already exists and does not rebuild the log if it does.  Rebuilding could otherwise take a long time for large files accessed over a slow network.
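
The caching pattern can be sketched with plain pandas. This is not the `H5Data` implementation: `build_log`, `load_log` and the cache file name `log.pkl` are hypothetical names chosen for this example.

```python
import os
import tempfile
import pandas as pd

def build_log():
    """Stand-in for the slow step of reading the attributes of every group."""
    return pd.DataFrame({"WL": [780.0, 780.1]},
                        index=pd.Index([1, 2], name="squid"))

def load_log(out_dire, rebuild=False):
    """Return (log, from_cache), building and caching the log if needed."""
    cache = os.path.join(out_dire, "log.pkl")  # hypothetical cache file name
    if os.path.exists(cache) and not rebuild:
        return pd.read_pickle(cache), True
    log = build_log()
    log.to_pickle(cache)
    return log, False

with tempfile.TemporaryDirectory() as out_dire:
    first, first_cached = load_log(out_dire)    # builds the log and caches it
    second, second_cached = load_log(out_dire)  # read back from the pickle
    print(first_cached, second_cached)
```

A `rebuild` flag like the one above gives a way to refresh a stale cache, for example after more data has been appended to the file.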

Another use of `out_dire` is for quickly building useful paths, e.g., for saving plots to a subdirectory.

``` python
>>> out_fil = h5.sub_dire('plots', fname='signal.png')
>>> plt.savefig(out_fil, bbox_inches='tight', dpi=200)
```

In [None]:
# In our case building the log doesn't take very long.
%time h5.update()

In [None]:
# log output
h5.log.head()

Experimental settings are stored in the log file as VARS and measurements as RECS.
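
The cells that follow label the two tables before joining them, so that a setting and a measurement with similar names stay distinguishable. A pandas-only sketch of that idea is shown here; `with_top_level` is a hypothetical helper written for this example (it is not `add_column_index`), and the column names and values are made up.

```python
import pandas as pd

def with_top_level(df, label):
    """Nest every column of df under an extra top-level column label."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_product([[label], df.columns])
    return out

squids = pd.Index([1, 2, 3], name="squid")
var = pd.DataFrame({"WL": [780.0, 780.1, 780.2]}, index=squids)     # settings
rec = pd.DataFrame({"WLM": [780.01, 780.09, 780.21]}, index=squids)  # measurements

# join on the shared index; columns become ('VAR', 'WL') and ('REC', 'WLM')
combined = with_top_level(var, "VAR").join(with_top_level(rec, "REC"))
print(combined[("REC", "WLM")])
```

Columns of the combined table are then selected with a tuple, as in the plotting cell further down.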

In [None]:
from e11.tools import add_column_index

In [None]:
# combine VAR and REC data
df = add_column_index(h5.var, 'VAR').join(add_column_index(h5.rec, 'REC'))
df.head()

In [None]:
# plot
fig, ax = plt.subplots()

# data
xvals = df[('VAR', 'WL?1')]    # laser wavelength PID reference
yvals = df[('REC', 'WLM?2')]   # measured wavelength
ax.scatter(xvals, yvals, marker='.')

# format
ax.set_xlim([xvals.min(), xvals.max()])
ax.set_ylim([yvals.min(), yvals.max()])
ax.set_xlabel('set wavelength (nm)')
ax.set_ylabel('measured wavelength (nm)')

# output
plt.show()

# Datasets

The hdf5 data file exists to store datasets.  In our case, these are distributed within groups.  Each group represents one configuration of experimental variables (VARS), and the groups are numbered sequentially by the `squid`.
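
For orientation, the layout described above can be mimicked with `h5py` directly. This sketch assumes that groups are named by their integer squid and that the settings are stored as group attributes; the attribute name `WL` and the dataset name `signal` are invented for the example, and a real oskar file may differ.

```python
import h5py
import numpy as np

# in-memory file: nothing is written to disk
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    for squid, wavelength in enumerate([780.0, 780.1], start=1):
        group = f.create_group(str(squid))       # one group per configuration
        group.attrs["WL"] = wavelength           # settings as group attributes
        group.create_dataset("signal", data=np.zeros(4))

    squids = sorted(int(name) for name in f.keys())
    datasets = sorted(f["1"].keys())
    settings = {s: float(f[str(s)].attrs["WL"]) for s in squids}

print(squids, datasets, settings)
```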

In [None]:
print(h5.squids)

In [None]:
# list the datasets in a particular group
squid = 1
print(h5.datasets(squid))

See 'Raw datasets.ipynb' for examples of how to access the different types of hdf5 dataset.