## Loading BMPL Datasets in HDF5 Format

The typical data file used for analysis in the BMPL is a .h5 file, which uses a storage format called HDF5. To load a dataset into memory for analysis, use the following code:

In [None]:
from load_hdf5 import load_hdf5
datafilename = 'Dataset4_wiretarget_1p5kV_1msstuff_0kAwire_density_10shots.h5'
# verbose=True prints the names of the groups and datasets as the file is read
data1 = load_hdf5(datafilename, verbose=True)

You should see a long list of names that look like directory names. An HDF5 file works somewhat like a filing system in which arrays are stored in groups. Once loaded into Python, this kind of structure is called a dictionary. Within this dictionary there are two types of items: groups and datasets. Both are named using strings in quotation marks. Groups are like file folders and datasets are like the files inside them: a dataset holds a number, a string, or an array.
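The group-and-dataset layout can be pictured with a small stand-in. The sketch below assumes that ```load_hdf5``` returns nested Python dictionaries whose leaves are NumPy arrays; the bracket indexing used in this notebook suggests this, but it is an assumption. The helper ```list_datasets```, the name ```toy_file```, and the array sizes are hypothetical and are not part of ```load_hdf5``` or the real data file.

```python
import numpy as np

# Toy stand-in for a loaded file: dictionaries play the role of groups,
# NumPy arrays play the role of datasets. Sizes are made up for illustration.
toy_file = {
    "time": {"time_us": np.zeros(6)},
    "mag_probe": {
        "r": {"b": np.zeros((3, 10, 5))},
        "t": {"b": np.zeros((3, 10, 5))},
    },
}

def list_datasets(node, prefix=""):
    """Return (path, shape) pairs for every array found under `node`."""
    found = []
    for name, item in node.items():
        path = f"{prefix}/{name}"
        if isinstance(item, dict):
            # A group: descend into it, like opening a folder
            found.extend(list_datasets(item, path))
        else:
            # A dataset: record where it lives and how big it is
            found.append((path, np.shape(item)))
    return found

dataset_paths = list_datasets(toy_file)
for path, shape in dataset_paths:
    print(path, shape)
```

Reading the printed paths from left to right gives the sequence of names you would put in square brackets to reach each dataset.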

To extract a particular dataset from the file, you need to list the group names followed by the dataset name. For example, there is a dataset that contains magnetic field values in the r-direction (based on cylindrical coordinates) as a function of time for three probes and for 10 shots. To isolate this dataset, we use the following code:

In [None]:
data_br = data1["mag_probe"]["r"]["b"]

To check that we extracted the right object, we can look at the shape of the ```data_br``` array using:

In [None]:
import numpy as np
np.shape(data_br)

As you can see, it lists three numbers. The first is the number of probes (there are three magnetic probes in this dataset at three different axial locations), the second is the number of shots (we took 10 shots in total), and the third is the number of samples (there are 25003 timesteps recorded for this data).
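If it helps to keep the three numbers straight, the shape can be unpacked into named variables. This is a minimal sketch using a stand-in array of zeros with the shape described above; ```data_br_demo``` and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Stand-in array with the layout described above: (probes, shots, samples)
data_br_demo = np.zeros((3, 10, 25003))

# np.shape returns a tuple, so it can be unpacked into one name per axis
n_probes, n_shots, n_samples = np.shape(data_br_demo)
print(f"{n_probes} probes, {n_shots} shots, {n_samples} samples per shot")
```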

Next, let's start looking at the data itself. The main way we will do this is by plotting time series (also called time histories). Let's define a variable that contains the theta component of the magnetic field from the third probe for the first shot. Remember, Python indexes lists and arrays starting at zero, not one. So the 1st element of an array is indexed with a zero, the 2nd element with a one, and so on. We use a colon as a placeholder for the entire set of numbers along an axis of the array. In this case, the colon selects all of the samples for this shot. Conversely, if we put a number in that position, we would be defining ```timeseries1``` as a single number rather than a 1D array of numbers.

In [None]:
timeseries1 = data1["mag_probe"]['t']['b'][2,0,:]
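The effect of a colon versus a number in each slot can be tried out on a small stand-in array. In this sketch, ```demo``` is a made-up array with the same axis order (probe, shot, sample) but only five samples per shot; none of the values come from the real dataset.

```python
import numpy as np

# Small stand-in with axes (probe, shot, sample); each value is just a counter
demo = np.arange(3 * 10 * 5).reshape(3, 10, 5)

one_shot = demo[2, 0, :]     # third probe, first shot, all samples -> 1D array
one_sample = demo[2, 0, 4]   # a number in every slot -> a single value
all_shots = demo[2, :, :]    # third probe, every shot -> 2D array

print(one_shot.shape, np.ndim(one_sample), all_shots.shape)
```

Each number in the brackets removes one axis from the result, while each colon keeps that axis in full.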

To visualize this data, we will plot it. We need to load ```matplotlib```, which is our plotting module. A basic plotting code looks like this:

In [None]:
import matplotlib.pyplot as plt
plt.plot(timeseries1)

Technically, this is magnetic field data as a function of index, not of time. Note that the numbers on the x-axis range from 0 to 25000. In order to associate these fluctuations with time, we will need to extract a time array. Look back at the list of groups produced when you first loaded in the data. You should see a group labeled as ```time```. Let's extract the time array in microseconds, ```time_us```.

In [None]:
time_us = data1["time"]["time_us"]
np.shape(time_us)

The ```np.shape()``` call shows us that ```time_us``` is indeed a 1D array of numbers which should represent the shot duration in microseconds. Now, we will plot magnetic fluctuations as a function of time.
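A couple of quick checks can confirm that a time array looks the way we expect. The sketch below uses a made-up time axis, ```time_demo_us```, rather than the real ```time_us```; the values and variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for a time axis in microseconds (values are made up)
time_demo_us = np.arange(0.0, 2.0, 0.5)   # 0.0, 0.5, 1.0, 1.5

is_1d = np.ndim(time_demo_us) == 1                 # True for a 1D array
duration_us = time_demo_us[-1] - time_demo_us[0]   # last time minus first time
sample_spacing_us = np.diff(time_demo_us)          # gap between neighbouring samples

print(is_1d, duration_us, sample_spacing_us)
```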

In [None]:
plt.plot(time_us,timeseries1)

Oops, an error was raised. This is because ```time_us``` and ```timeseries1``` do not have the same number of samples. You cannot plot two arrays against each other if they have different lengths. The difference comes from an earlier step, when the data was converted from the measured voltage into magnetic field (this happened before you got the dataset), and it has to do with the integration method used. To fix this, we need to adjust our ```time_us``` array to match ```timeseries1```. We do this by using colon notation within the square brackets. Below, we cut off the first timestep: the ```1``` indicates the starting index and the colon selects everything from that index onward.

In [None]:
timeB_us = time_us[1:]
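To see how an integration step can shorten an array by one sample, here is a minimal sketch on synthetic data. The source only says that an integration method was involved; the cumulative trapezoid rule used here is an illustrative assumption, not necessarily what was done to the BMPL data, and all names and values are made up.

```python
import numpy as np

# Synthetic stand-ins; none of these values come from the BMPL dataset
time_demo_us = np.linspace(0.0, 10.0, 11)    # 11 time points
voltage_demo = np.ones_like(time_demo_us)    # 11 voltage samples

# Cumulative trapezoid rule (assumed for illustration): each output value
# needs a pair of neighbouring samples, so N inputs give N - 1 outputs
dt = np.diff(time_demo_us)
integrated = np.cumsum(0.5 * (voltage_demo[1:] + voltage_demo[:-1]) * dt)

print(len(time_demo_us), len(integrated))

# Drop the first time point so both arrays line up, then confirm before plotting
time_trimmed_us = time_demo_us[1:]
assert time_trimmed_us.shape == integrated.shape
```

Comparing shapes like this before calling ```plt.plot``` is a quick way to catch a mismatch early.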

Now we can try plotting again.

In [None]:
plt.plot(timeB_us,timeseries1)

Next, see the Paper Plotting Formats notebook for various ways of modifying and labeling your plots.