# Hawk starboard wing data API

The aim of this notebook is to guide the user through the use of the `hawk` API package for interacting with the data series collected as part of the DTHIVE project in 2022. To use this notebook, the `hawk` package is required. The package is freely available and can be installed with pip (requires Python 3.9+):

`pip install git+https://github.com/MDCHAMP/hawk-data`

For more information about the test, see [INSERT LINK HERE].

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from hawk import SBW

## Basic usage

The `hawk` package contains the `hawk.SBW` function for interacting with the data from the starboard wing test. During testing, data were collected in both the time and frequency domains as part of two test campaigns named after the measurement equipment. These are:

- `LMS` (Frequency domain data)
- `NI` (Time domain data)

The `hawk.SBW` function provides a convenient wrapper for both of these test campaigns. To start exploring the data, simply call the function with a single argument: the path of the directory in which the data should be saved on disk.

It is the intention of the authors that the hawk data should be entirely self-describing. To facilitate this, the `hawk` package implements two methods, `describe` and `explore`. Let us see the effect of these methods now.

In [None]:
data_dir = "./hawk_data"
data = SBW(data_dir)
data.describe()

During the test campaigns a great deal of data were collected. Within the `data` object we created above, the various test series and signals are organised in a file-tree-like structure. The `describe` method above returns information about the point in the tree where we currently are. The `explore` method shows what is contained within the tree beneath us. Let's see the result of calling the `explore` method on `data` (the top of the tree).

In [None]:
data.explore()

As expected, this produces a structured description of the data that are available, including data that are available but not yet downloaded.
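To make the file-tree picture concrete, here is a minimal, hypothetical model of what an `explore`-style listing does: it walks a nested structure and reports every node beneath the current one. The nested `dict` and the `explore_tree` function are illustrative assumptions only, not the `hawk` implementation (which works on HDF5 groups), and the tree contents are a made-up miniature of the real one.

```python
def explore_tree(node, prefix=""):
    """Return the paths of all nodes beneath `node` (a nested dict), depth first."""
    paths = []
    for key in sorted(node):
        path = f"{prefix}/{key}"
        paths.append(path)
        child = node[key]
        if isinstance(child, dict):
            # Recurse into sub-groups; non-dict leaves stand in for datasets
            paths.extend(explore_tree(child, path))
    return paths


# Hypothetical miniature tree: campaign -> series -> repeat -> sensor -> signal
tree = {
    "LMS": {"BR_AR": {"01": {"LLC-07": {"frf": None}}}, "xData": {"freq": None}},
    "NI": {"RPH_AR": {"01": {"LLC-01": {"acc": None}}}, "xData": {"time": None}},
}

paths = explore_tree(tree)
for p in paths:
    print(p)
```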

Let's now try to access a test series from the LMS campaign that may or may not already be downloaded in `data_dir`.

In [None]:
series = 'BR_AR'  # i.e. Burst-random amplitude-ramp
rep = '01'
test_series = data["LMS"][series][rep]
test_series.explore()

Just by accessing the data in our code, the relevant files have been downloaded and saved to disk in `data_dir`. If we were to access the data again, the downloaded copy would be used automatically.
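The download-on-first-access behaviour can be sketched as a simple on-disk cache: look for the file locally and only fetch it when it is missing. The `get_cached` helper, the `fake_fetch` callable and the file name below are hypothetical stand-ins; the actual URLs and download logic used by `hawk` are not shown here.

```python
from pathlib import Path
import tempfile


def get_cached(path: Path, fetch) -> Path:
    """Return `path`, calling `fetch(path)` to create it only if it is not on disk yet."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)  # hypothetical download step: must write the file to `path`
    return path


calls = []


def fake_fetch(path: Path) -> None:
    # Stand-in for a real download: record the call and write placeholder bytes
    calls.append(path.name)
    path.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "LMS" / "BR_AR_01.hdf5"  # hypothetical file name
    get_cached(target, fake_fetch)  # first access: the file is fetched
    get_cached(target, fake_fetch)  # second access: the cached copy is reused

print(calls)  # ['BR_AR_01.hdf5']
```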

Looking at the output of the `explore` method, we can see that there are a number of sensor addresses, and for each one we can access three signals: the coherence, the spectra and the FRF. These signals are recovered directly from the 'LMS-box' measurement system.

Let's now take a look at the output of the `describe` method:

In [None]:
test_series.describe()

This is simply a Python `dict` containing all the details of the test setup, sensor metadata and notes from the operators: handy!

Let's now look at one of the sensors. Sensors are accessed by their key. For more information on sensor locations please see [LINK TO PIC].

In [None]:
sensor = "LLC-07"  # Lower leading edge centre position 7 (wing tip)
sensor_data = test_series[sensor]
sensor_data.explore()

As expected, there are three channels (`frf` is simply shorthand for `frequencyResponseFunction`). 

Let's (finally) plot the data:

In [None]:
signal = "frequencyResponseFunction"
frfs = sensor_data[signal]
print(frfs.shape) # (spectralLines, Nrepeats)

# Grab the frequency information from the xData group
fs = data['LMS/xData/freq'] 

plt.figure()
plt.semilogy(fs[:], np.abs(frfs[:]))
# note that the units can be pulled directly from the data
plt.xlabel(f"{fs.attrs['measurement']} ({fs.attrs['units']})") 
plt.ylabel(f"{frfs.attrs['measurement']} ({frfs.attrs['units']})")
plt.show()

Note that the metadata corresponding to the signals we are plotting are available in the `attrs` attribute of each dataset (how convenient!).
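Since every signal carries its measurement name and units in `attrs`, the label formatting used in the plotting cell can be factored into a small helper. This sketch assumes only that `attrs` behaves like a mapping with `'measurement'` and `'units'` keys, as in the cell above; the `axis_label` name is hypothetical, and a plain `dict` with made-up values stands in for the real attributes.

```python
def axis_label(attrs) -> str:
    """Format an axis label such as 'Frequency (Hz)' from a signal's metadata."""
    return f"{attrs['measurement']} ({attrs['units']})"


# Hypothetical attribute values; real ones would come from e.g. fs.attrs or frfs.attrs
freq_attrs = {"measurement": "Frequency", "units": "Hz"}
print(axis_label(freq_attrs))  # Frequency (Hz)
```

With the real data, the call would then read `plt.xlabel(axis_label(fs.attrs))`.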

So far we have taken a roundabout way of accessing the FRFs. Instead, we could have used the path to the data we were interested in directly:

In [None]:
# note that we can access data directly from its path
frfs = data['/LMS/BR_AR/01/LLC-07/frf']
# or, equivalently, build the path from the variables defined above
frfs = data[f'/LMS/{series}/{rep}/{sensor}/{signal}']
frfs.describe()

Note the general form of the path:

`/{campaign}/{series}/{rep}/{sensor}/{signal}`

For more information on the possible values of these fields please see the data report available at [LINK TO DATA REPORT]. 
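Because every path follows this fixed five-field form, it can be convenient to build and split paths programmatically, for example when looping over sensors. The `make_path` and `split_path` helpers below are a hypothetical sketch, not part of `hawk`, and they assume that no field itself contains a `/`.

```python
def make_path(campaign, series, rep, sensor, signal):
    """Build a path of the form /{campaign}/{series}/{rep}/{sensor}/{signal}."""
    return "/" + "/".join([campaign, series, rep, sensor, signal])


def split_path(path):
    """Split a five-field path back into a dict of its named fields."""
    fields = ("campaign", "series", "rep", "sensor", "signal")
    parts = path.strip("/").split("/")
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}: {path!r}")
    return dict(zip(fields, parts))


# Build the paths for two sensors in the same series and repeat
paths = [make_path("LMS", "BR_AR", "01", s, "frf") for s in ("LLC-01", "LLC-07")]
print(paths)  # ['/LMS/BR_AR/01/LLC-01/frf', '/LMS/BR_AR/01/LLC-07/frf']
print(split_path(paths[1])["sensor"])  # LLC-07
```

Each resulting string could then be passed to `data[...]` exactly as in the cell above.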

Let's now take a look at some of the time series data:

In [None]:
ts = data['NI/xData/time']
accs = data['NI/RPH_AR/01/LLC-01/acc']
print(accs.shape) # (timePoints, Nrepeats)

plt.figure()
plt.plot(ts[:], accs[:])
plt.xlabel(f"{ts.attrs['measurement']} ({ts.attrs['units']})") 
plt.ylabel(f"{accs.attrs['measurement']} ({accs.attrs['units']})")
plt.show()

# Can access a full description of the test setup
accs.describe()

In just a few lines of Python code we have downloaded and plotted the data alongside all of the relevant metadata.
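The shape comments above indicate that repeats are stored along the second axis, i.e. `(timePoints, Nrepeats)` or `(spectralLines, Nrepeats)`. Assuming that layout, statistics across repeats at each point are a single NumPy reduction over `axis=1`. The sketch below uses a small synthetic array in place of real measurements.

```python
import numpy as np

# Synthetic stand-in for np.array(accs): 4 time points, 3 repeats
accs_np = np.array([
    [0.0, 0.2, 0.1],
    [1.0, 1.2, 0.8],
    [0.5, 0.4, 0.6],
    [-1.0, -0.9, -1.1],
])

mean_over_repeats = accs_np.mean(axis=1)  # shape (timePoints,)
std_over_repeats = accs_np.std(axis=1)    # spread between repeats at each point
print(mean_over_repeats.shape)  # (4,)
```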

## Advanced/production usage

So far we have seen how to access and explore the data. Let's now cover some more realistic usage scenarios.

Under the hood, all of the downloaded data are stored in `.hdf5` format. This means that instead of the `hawk` package you could use any off-the-shelf `.hdf5` viewing software. In fact, the `hawk` package is just a thin wrapper around the `h5py` [link here] package that adds the `describe` and `explore` methods and manages automatic downloads: neat!

Because the data are stored in `.hdf5` format, they are only read from disk when they are needed by the Python program. This means that Python keeps (several) file objects open while you interact with the data. To prevent corruption and other bugs, files should always be closed when they are not being used. Thankfully, the `SBW` function supports Python context managers.
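The pattern a context manager enforces can be sketched in plain Python: resources held on entry are released on exit, even if an exception is raised, and later access fails loudly instead of silently touching a closed file. The `DataHandle` class below is a hypothetical illustration of that pattern, not the `hawk` or `h5py` implementation; it raises `ValueError` after closing to mirror the behaviour shown in the next cell.

```python
class DataHandle:
    """Hypothetical wrapper that owns some resources and releases them on exit."""

    def __init__(self, store):
        self._store = dict(store)  # stands in for open file objects
        self.closed = False

    def __getitem__(self, key):
        if self.closed:
            raise ValueError("I/O operation on closed data handle")
        return self._store[key]

    def close(self):
        self._store.clear()  # release the stand-in file handles
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # runs on normal exit and when an exception propagates
        return False  # do not suppress exceptions


with DataHandle({"frf": [1.0, 2.0, 3.0]}) as handle:
    frf_copy = list(handle["frf"])  # copy the data out while the handle is open

try:
    handle["frf"]  # the with-block has exited, so the handle is closed
    access_failed = False
except ValueError:
    access_failed = True

print(access_failed, frf_copy)  # True [1.0, 2.0, 3.0]
```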

In [None]:
with SBW(data_dir) as data:

    frfs = data['/LMS/BR_AR/01/LLC-01/frf']
    frfs_numpy = np.array(data['/LMS/BR_AR/01/LLC-01/frf'])

try:
    data['/LMS/BR_AR/01/LLC-01/frf'].shape # This fails because the file is now closed
except ValueError:
    pass

print(frfs.shape)  # this works because the variable frfs is still referenced, BUT internal references may be broken
print(frfs_numpy.shape)  # this will always work and should be considered best practice

Notice that once outside the context manager, the `data` object can no longer be interacted with. In the above, the variable `frfs` maintains a link to the file object and so can still be accessed; however, internal links may be broken, so this pattern should also be avoided.

You may have noticed in the plotting code above that the data were sliced (e.g. `fs[:]`) before being passed to the plotting functions. This is because, although the underlying `h5py.Dataset` implements some of the same functionality as `np.ndarray`, it cannot be considered a drop-in replacement.
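One way to see why a lazily loaded dataset is not a drop-in replacement for an array: it only becomes an `np.ndarray` when it is explicitly converted or sliced. The toy `LazyDataset` class below is a hypothetical stand-in for `h5py.Dataset`; it implements `__array__` (the hook `np.array` uses) and `__getitem__`, and only materialises its values on request.

```python
import numpy as np


class LazyDataset:
    """Hypothetical toy dataset: values are only materialised on request."""

    def __init__(self, source):
        self._source = source  # stands in for data that is still on disk
        self.shape = (len(source),)

    def __getitem__(self, key):
        # Slicing "reads from disk" and returns a real np.ndarray
        return np.asarray(self._source)[key]

    def __array__(self, dtype=None, copy=None):
        # Called by np.array(ds); `copy` is accepted for NumPy 2 compatibility,
        # and this toy always builds a new array regardless
        return np.asarray(self._source, dtype=dtype)


ds = LazyDataset([1.0, 2.0, 3.0, 4.0])
a = np.array(ds)  # conversion via __array__
b = ds[:2]        # slicing via __getitem__
print(type(ds).__name__, type(a).__name__, type(b).__name__)
# LazyDataset ndarray ndarray
```

The object itself is not an array, so code expecting `np.ndarray` methods or semantics should receive `a` or `b`, not `ds`.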

For best practice in production code, the required data should be cast to numpy arrays inside the context manager. There are several ways to achieve this:

In [None]:
with SBW(data_dir) as data:

    frf1 = data['/LMS/BR_AR/01/LLC-01/frf']  # lazy dataset object, NOT a numpy array
    frf2 = np.array(data['/LMS/BR_AR/01/LLC-01/frf'])  # returns np.ndarray
    frf3 = data['/LMS/BR_AR/01/LLC-01/frf'][:]  # also returns np.ndarray
    frf4 = data['/LMS/BR_AR/01/LLC-01/frf'][:100]  # also returns np.ndarray (simple slicing only)

print(type(frf1))
print(type(frf2))
print(type(frf3))
print(type(frf4))


### Dataset structure

To avoid downloading all of the data every time, the dataset has been divided into a number of independent files, each one corresponding to one test series repeat. Overall there are 71 test series.

The `hawk` package relies on a single 'header' `.hdf5` file for accessing all of the data simultaneously without loading it all into memory (or even having all of the data on disk). This works thanks to the `ExternalLink` feature of the `.hdf5` spec, more details of which can be found at [LINK HERE].
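Conceptually, the header file acts as a table of contents. Assuming each `/{campaign}/{series}/{rep}` group in the header is an external link to the per-repeat file holding its data, a lookup first picks the file and then resolves the rest of the path relative to the link's target inside that file. In `h5py` such links are created with `h5py.ExternalLink(filename, path)`. The pure-Python sketch below only models that routing step; the `resolve` helper and the file names are hypothetical and do not reflect the real layout of the dataset.

```python
def resolve(header, path):
    """Map a full signal path to (file name, path inside that file) via a header table."""
    parts = path.strip("/").split("/")
    group = "/" + "/".join(parts[:3])  # /{campaign}/{series}/{rep}
    inner = "/" + "/".join(parts[3:])  # /{sensor}/{signal}
    if group not in header:
        raise KeyError(f"no external link for {group!r}")
    return header[group], inner


# Hypothetical header: one external file per test series repeat
header = {
    "/LMS/BR_AR/01": "LMS_BR_AR_01.hdf5",
    "/NI/RPH_AR/01": "NI_RPH_AR_01.hdf5",
}

print(resolve(header, "/NI/RPH_AR/01/LLC-01/acc"))
# ('NI_RPH_AR_01.hdf5', '/LLC-01/acc')
```

Only the file named by the matching link needs to be present (or downloaded) to serve a request, which is what lets the rest of the dataset stay off disk.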