In [None]:
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The hdf5 data file

This notebook provides examples of accessing data within an oskar HDF5 data file.

In [None]:
from e11 import run_file, H5Scan, H5Data

- `run_file()`: a function that builds the path to a data file from its run ID (`rid`) and base directory.
- `H5Scan`: a class that provides a convenient interface to simple HDF5 data files with no groups.
- `H5Data`: a class that provides a convenient interface to more complicated HDF5 data files with groups.

Normally, the data files are saved in a timestamp-based directory structure, and each one can be found using its run ID, `rid`. The path to a file can then be built using `run_file()`:

``` python
>>> fil = run_file(base=r"Q:\E11_atmos\data", rid='20171127_155753')
```
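The real `run_file()` is part of `e11`, and its directory layout is not described in this notebook. Purely to illustrate the idea of recovering a file's location from its timestamped `rid`, here is a minimal sketch that assumes a hypothetical layout `base/YYYY/MM/DD/rid/rid.h5`; the helper `sketch_run_file` is an illustrative stand-in, not part of `e11`.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def sketch_run_file(base, rid, ext=".h5"):
    """Hypothetical stand-in for e11.run_file(); the layout is an ASSUMPTION.

    Assumes rid has the form 'YYYYMMDD_hhmmss' and that each file is
    stored as base/YYYY/MM/DD/rid/rid.h5.
    """
    # parse the timestamp encoded in the run ID (raises ValueError if malformed)
    stamp = datetime.strptime(rid, "%Y%m%d_%H%M%S")
    # PureWindowsPath keeps the Windows-style base path intact on any OS
    return PureWindowsPath(
        base,
        f"{stamp:%Y}",
        f"{stamp:%m}",
        f"{stamp:%d}",
        rid,
        rid + ext,
    )


example_path = sketch_run_file(r"Q:\E11_atmos\data", "20171127_155753")
print(example_path)  # Q:\E11_atmos\data\2017\11\27\20171127_155753\20171127_155753.h5
```

`PureWindowsPath` is used only because the example base directory is a Windows drive; for local paths on other systems, `pathlib.Path` would be the natural choice.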

In this notebook, however, we use the files in the `example_data` directory.

## H5Scan

`H5Scan` provides a very simple interface for files which only contain datasets (no groups).

In [None]:
import os 
fil = os.path.join(os.getcwd(), 'example_data', 'microwave_scan.h5')
scan = H5Scan(fil)
# root attributes
scan.attrs()

### Datasets
To list the datasets found in the file, 

In [None]:
scan.datasets

Or, to get the attributes associated with a dataset,

In [None]:
scan.attrs('analysis')
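Under the hood, a flat file like this is just a set of datasets attached to the root, each with its own attributes. The sketch below uses `h5py` directly (this is not necessarily how `H5Scan` works internally) to write a throwaway file with made-up `osc_0` and `analysis` datasets, then reads back the root attributes, the dataset names, and one dataset's attributes.

```python
import os
import tempfile

import h5py
import numpy as np

# write a small, made-up flat file: datasets at the root, no groups
path = os.path.join(tempfile.mkdtemp(), "flat_example.h5")
with h5py.File(path, "w") as f:
    f.attrs["note"] = "toy file"                     # root attribute
    osc = f.create_dataset("osc_0", data=np.linspace(0.0, 1.0, 5))
    osc.attrs["dt"] = 1e-9                           # dataset attribute
    f.create_dataset("analysis", data=np.zeros(3))

# read it back
with h5py.File(path, "r") as f:
    root_attrs = dict(f.attrs)
    # keep only datasets (a flat file should contain nothing else)
    datasets = [name for name, obj in f.items() if isinstance(obj, h5py.Dataset)]
    osc_attrs = dict(f["osc_0"].attrs)
    osc_data = f["osc_0"][()]                        # load the whole array

print(datasets)  # ['analysis', 'osc_0']
```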

Datasets can be accessed using `scan.array()`, for array data, or `scan.df()`, for DataFrame data, e.g.,

In [None]:
scan.array('osc_0')

In [None]:
scan.df('analysis').head()
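The difference between `array()` and `df()` is the type of the returned object. Assuming a table-like dataset such as `analysis` is stored as an HDF5 compound type (which `h5py` reads as a NumPy structured array; this is an assumption about the file, not something documented here), turning it into a `DataFrame` is a one-liner in pandas. A sketch with made-up field names and values:

```python
import numpy as np
import pandas as pd

# a made-up structured array, standing in for a compound HDF5 dataset
records = np.array(
    [(0.0, 1.5, 0.2), (1.0, 1.7, 0.3), (2.0, 1.6, 0.1)],
    dtype=[("freq", "f8"), ("signal", "f8"), ("err", "f8")],
)

# each named field becomes a column
analysis_df = pd.DataFrame(records)
print(analysis_df.head())
```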

See `2)_Raw_datasets.ipynb` for further examples.

## H5Data

More complicated files can be accessed using `H5Data`, which expects the datasets to be divided between groups. `H5Data` is similar to `H5Scan`, but it has a few extra features to help keep track of the relationship between the groups and the datasets within them.

In [None]:
fil = os.path.join(os.getcwd(), 'example_data', 'laser_data.h5')
data = H5Data(fil)
data.pprint()

Here, `data` is an instance of the H5Data class.  

A `pandas.DataFrame` summary of the attributes of each group can be accessed as `data.log`. This can be rebuilt at any time using `data.update_log()`.

In [None]:
# In our case building the log doesn't take very long.
%time data.update_log()

In [None]:
# log output
data.log.head()
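Conceptually, the log has one row per group and one column per attribute. A minimal pandas sketch of that idea, assuming the group attributes have already been read into a dict keyed by `squid` (the attribute values below are made up):

```python
import pandas as pd

# hypothetical attributes for three groups, keyed by squid
group_attrs = {
    1: {"WL?1": 780.10, "WLM?2": 780.12},
    2: {"WL?1": 780.20, "WLM?2": 780.21},
    3: {"WL?1": 780.30, "WLM?2": 780.29},
}

# one row per group, one column per attribute
log = pd.DataFrame.from_dict(group_attrs, orient="index")
log.index.name = "squid"
print(log)
```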

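Experimental settings are stored in the log as VARS and measurements as RECS. These are also available separately as `data.var` and `data.rec`, which the next cells combine into a single DataFrame with a two-level column index.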

In [None]:
from e11.tools import add_column_index

In [None]:
# combine VAR and REC data
df = add_column_index(data.var, 'VAR').join(add_column_index(data.rec, 'REC'))
df.head()
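`add_column_index` is a helper from `e11.tools`; judging by how it is used here, it prepends a top-level label to a DataFrame's columns so that VAR and REC columns can sit side by side and be selected as `('VAR', ...)` or `('REC', ...)`. A hypothetical pandas-only equivalent (`sketch_add_column_index` is an illustration, not the `e11` implementation) could look like this:

```python
import pandas as pd


def sketch_add_column_index(df, label):
    """Hypothetical stand-in for e11.tools.add_column_index.

    Returns a copy of df whose columns become a two-level MultiIndex
    with `label` as the top level.
    """
    out = df.copy()
    out.columns = pd.MultiIndex.from_product([[label], df.columns])
    return out


# made-up VAR and REC tables sharing a squid index
var = pd.DataFrame({"WL?1": [780.10, 780.20]}, index=[1, 2])
rec = pd.DataFrame({"WLM?2": [780.12, 780.21]}, index=[1, 2])

# join on the shared index; the top level keeps the two sources apart
combined = sketch_add_column_index(var, "VAR").join(sketch_add_column_index(rec, "REC"))
print(combined[("VAR", "WL?1")])
```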

In [None]:
# plot
fig, ax = plt.subplots()

# data
xvals = df[('VAR', 'WL?1')]    # laser wavelength PID reference
yvals = df[('REC', 'WLM?2')]   # measured wavelength
ax.scatter(xvals, yvals, marker='.')

# format
ax.set_xlim([xvals.min(), xvals.max()])
ax.set_ylim([yvals.min(), yvals.max()])
ax.set_xlabel('set wavelength (nm)')
ax.set_ylabel('measured wavelength (nm)')

# output
plt.show()

### Groups

The datasets are distributed within groups. Each group represents one configuration of the experimental variables (VARS), and the groups are numbered sequentially by the `squid`.

In [None]:
print(data.squids)

To discover the settings relating to a particular group, check the log,

In [None]:
squid = 1
data.log.loc[squid]

which is equivalent to,

In [None]:
data.attrs(squid)
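Because the log is an ordinary `DataFrame` indexed by squid, the usual pandas selection tools apply: `.loc[squid]` returns one group's settings as a `Series`, and a boolean mask finds every squid that matches a condition. A sketch on a made-up log:

```python
import pandas as pd

# made-up log: one row per squid
log = pd.DataFrame(
    {"WL?1": [780.10, 780.20, 780.30], "WLM?2": [780.12, 780.21, 780.29]},
    index=pd.Index([1, 2, 3], name="squid"),
)

# settings for a single group, as a Series
row = log.loc[2]
print(row.to_dict())

# every squid whose set wavelength exceeds 780.15 nm
selected = log[log["WL?1"] > 780.15].index.tolist()
print(selected)  # [2, 3]
```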

### Datasets

To list the datasets in a particular group,

In [None]:
squid = 1
print(data.datasets(squid))

Or, to see the attributes of a particular dataset,

In [None]:
data.attrs(squid, 'WLM')
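For reference, the group/dataset layout that `H5Data` navigates can be reproduced with `h5py` directly (again, not necessarily how `H5Data` works internally). This sketch assumes, purely for illustration, that each group is named after its squid (`'1'`, `'2'`, ...) and holds a `WLM` dataset; the names, attributes, and values are made up. HDF5 group names are strings, so squids need a numeric sort, otherwise `'10'` would come before `'2'`.

```python
import os
import tempfile

import h5py
import numpy as np

# write a made-up grouped file: one group per squid
path = os.path.join(tempfile.mkdtemp(), "grouped_example.h5")
with h5py.File(path, "w") as f:
    for squid in (1, 2, 10):
        grp = f.create_group(str(squid))
        grp.attrs["WL?1"] = 780.0 + 0.1 * squid      # group attribute (a VAR)
        wlm = grp.create_dataset("WLM", data=np.full(4, 780.0 + 0.1 * squid))
        wlm.attrs["units"] = "nm"                     # dataset attribute

with h5py.File(path, "r") as f:
    # group names are strings; convert and sort them numerically
    squids = sorted(int(name) for name, obj in f.items() if isinstance(obj, h5py.Group))
    # datasets within one group
    group_datasets = [name for name, obj in f["2"].items() if isinstance(obj, h5py.Dataset)]
    # attributes of the group and of one of its datasets
    group_attrs = dict(f["2"].attrs)
    wlm_attrs = dict(f["2"]["WLM"].attrs)

print(squids)          # [1, 2, 10]
print(group_datasets)  # ['WLM']
```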

See `Raw datasets.ipynb` for examples of how to access the data within different types of HDF5 dataset.