In [None]:
%pip install h5py numpy

In [None]:
import numpy as np
import h5py

## Storing Scientific Data into HDF5 Files with `h5py`

In this section, we'll get a sense of how libraries can help us store scientific data in binary files in a way that gives us both flexibility and simplicity!

In particular, we'll look at HDF5 files (`.h5`, `.hdf`, `.hdf5`). HDF5 is the core file format underneath a wide variety of popular data formats, and it is the basis for the NIX files we'll be working with later on to store complex neuroscience data.

### Writing Mixed-Type, Labelled Data and Metadata to HDF5 Files using h5py

HDF5 files have a lot more features:
  - They are highly cross-platform and work with a wide variety of tools
  - They can store many different datasets in a single file (or even in multiple linked files)
  - They can store metadata alongside the data
  - They let you store data hierarchically, making a nice dict-like nested organization for your data
  - They can compress your data
  - They let you work with data that is larger than memory, making it easy to read in only the data that you need.
  - They can be easily previewed and inspected using the https://myhdf5.hdfgroup.org/ web tool!
  
So many features! Here, we'll get a basic sense of how they work by using the `h5py` library, which gives us a dict-like, NumPy-native interface to HDF5 files and is used internally by many popular Python frameworks.
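As a quick taste, here is a minimal sketch of four of the features listed above: hierarchical groups, compression, metadata attributes, and partial reads. The file name `features_demo.h5`, the dataset path `raw/signal`, and the all-zeros data are hypothetical choices for illustration; zeros compress far better than real recordings would.

```python
import os

import h5py
import numpy as np

# Hypothetical data: one million float32 samples (4 MB in memory).
# All zeros compress unrealistically well; real signals will shrink less.
signal = np.zeros(1_000_000, dtype=np.float32)

with h5py.File('features_demo.h5', 'w') as f:
    # Hierarchy: the 'raw/' prefix creates a group called 'raw' automatically.
    # Compression: the dataset is stored in gzip-compressed chunks.
    dset = f.create_dataset('raw/signal', data=signal, compression='gzip')
    # Metadata: attributes are stored right next to the data they describe.
    dset.attrs['units'] = 'mV'

file_size = os.path.getsize('features_demo.h5')  # bytes on disk

with h5py.File('features_demo.h5', 'r') as f:
    dset = f['raw/signal']
    compression = dset.compression   # name of the compression filter
    units = dset.attrs['units']
    # Partial read: only the chunk(s) holding these 10 samples are read from disk.
    first_ten = dset[:10]

print(f'{signal.nbytes} bytes in memory, {file_size} bytes on disk')
print(compression, units, first_ten.shape)
```

Because slicing an h5py dataset only pulls the requested region off disk, the same pattern works for files much larger than your computer's memory.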

| Code | Description |
| :-- | :-- |
| **`f = h5py.File('filename.h5', 'w')`** | Open an h5py file object for writing (creates the file, or overwrites it if it already exists) |
| **`f.close()`** | Close the h5py file and release the linked file back to the operating system |
| **`f.create_dataset('temp', data=x)`** | Write a dataset called 'temp' containing the data in the NumPy array `x` into the HDF5 file |
| **`f.create_dataset('data/temp', data=x)`** | Write a dataset called 'temp', containing the data in the NumPy array `x`, inside the group (HDF5's version of a folder) called 'data'; the group is created if it doesn't exist yet |
| **`f.attrs['name'] = 'Session 1'`** | Set an attribute as metadata on the root group of the HDF5 file; this works like a normal Python dictionary |
| **`f['x'].attrs['id'] = 'ABC'`** | Set an attribute as metadata on the 'x' node of the HDF5 file |
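Putting the rows of the table together, a minimal sketch might look like the following. The file name `api_demo.h5` and the array `x` are hypothetical; the second `with` block only reopens the file to confirm what was written.

```python
import h5py
import numpy as np

x = np.arange(10, dtype=np.float64)  # hypothetical example data

# The `with` block calls f.close() for us, even if an error occurs inside it.
with h5py.File('api_demo.h5', 'w') as f:   # 'w' creates or overwrites the file
    f.create_dataset('temp', data=x)        # dataset at the root
    f.create_dataset('data/temp', data=x)   # dataset inside the 'data' group
    f.attrs['name'] = 'Session 1'           # attribute on the root group
    f['temp'].attrs['id'] = 'ABC'           # attribute on the 'temp' dataset

# Reopen read-only to check the layout that was written.
with h5py.File('api_demo.h5', 'r') as f:
    root_keys = sorted(f.keys())            # ['data', 'temp']
    data_keys = list(f['data'].keys())      # ['temp']
    session_name = f.attrs['name']
    temp_id = f['temp'].attrs['id']

print(root_keys, data_keys, session_name, temp_id)
```

The two datasets are both called 'temp', but they don't clash because they live in different groups, just like two files with the same name in different folders.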



In [None]:
# %pip install h5py numpy

In [None]:
import h5py
import numpy as np

**Exercises**

**Example**: Write an HDF5 File called `temp.h5` with the following schema:
```
root/
└── temp: uint16, 1000 x 1  (temperature measurements over time)
```

In [None]:
temp = np.random.randint(15, 22, size=(1000,1)).astype(np.uint16)


In [None]:
with h5py.File('temp.h5', 'w') as f:
    f.create_dataset('temp', data=temp)
    f['temp'].attrs.update({
        'description': 'Temperature measurements over time'
    })



Write an HDF5 File called `ephys.h5` with the following schema and descriptions:

```
root/
├── time: float32, 1 x 1000 (trial time, in seconds)
├── voltage: int16, 4 x 1000 (voltage measurements for each recording channel)
└── chan_names: S, 4 (channel names for each recording channel)
```

In [None]:
time = np.linspace(0, 3, 1000).astype(np.float32)
voltage = np.random.normal(1, 1, size=(4, 1000)).astype(np.float32)
chan_names = ['CH01', 'CH02', 'CH03', 'CH05']
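A hint for storing text: h5py can't store NumPy's fixed-width Unicode arrays (dtype `<U4`, which is what `np.array(chan_names)` produces), so a list of Python `str` needs converting first. Below is a minimal sketch of two common options, using a hypothetical file `strings_demo.h5` and a hypothetical `names` list: encode to fixed-length bytes (the `S` type in the schema above), or ask h5py for a variable-length UTF-8 string type with `h5py.string_dtype()`.

```python
import h5py
import numpy as np

names = ['alpha', 'beta', 'gamma']  # hypothetical list of Python strings

with h5py.File('strings_demo.h5', 'w') as f:
    # Option 1: fixed-length ASCII bytes; dtype='S' sizes to the longest entry (S5 here)
    f.create_dataset('names_fixed', data=np.array(names, dtype='S'))
    # Option 2: variable-length UTF-8 strings
    f.create_dataset('names_vlen', data=names, dtype=h5py.string_dtype())

with h5py.File('strings_demo.h5', 'r') as f:
    fixed_dtype = f['names_fixed'].dtype
    # Fixed-length strings come back as bytes, so decode them ourselves
    fixed = [b.decode('ascii') for b in f['names_fixed'][:]]
    # .asstr() decodes variable-length strings to Python str while reading
    vlen = f['names_vlen'].asstr()[:].tolist()

print(fixed_dtype, fixed, vlen)
```

Fixed-length bytes only work for plain ASCII text; for anything else (accents, symbols), the variable-length `h5py.string_dtype()` route is the safer choice.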

Write an HDF5 File called `motion_tracking.h5` with the following schema (feel free to skip the descriptions this time):
```
root/
├── session_date: str
├── subject_id: str
├── camera:
│   ├── black_noise_image: uint16, 640 x 640 x 3 (reference image taken with lights out)
│   ├── image_width: uint16
│   ├── image_height: uint16
│   ├── shutter_speed: uint16
│   └── aperture: float32
│
└── motion_tracking:
    ├── time: uint32, 1 x 3000 (session time, in milliseconds)
    ├── rb_pos: float32, 2 x 3 x 3000 (XYZ coordinates of the center of each tracked rigid body)
    ├── rb_rot: float32, 2 x 3 x 3000 (XYZ Euler rotations of each tracked rigid body)
    ├── xyz_names: str, 1 x 3 (the spatial coordinate names)
    └── rb_names: str, 1 x 2 (the name of each rigid body)
```

In [None]:
session_date = '2024-04-22'
subject_id = 'AD11'
camera_black_noise_im = np.random.randint(0, 30, size=(640, 640, 3)).astype(np.uint16)  # 640 x 640 x 3, matching the schema
im_width = 640
im_height = 640
shutter_speed = 800
aperture = 2.8
motion_time = (np.arange(0, 1000, step=1/shutter_speed)[:3000] * 1000).astype(np.uint32)
rb_pos = np.random.random(size=(2, 3, 3000)).astype(np.float32)
rb_rot = np.random.random(size=(2, 3, 3000)).astype(np.float32)
xyz_names = ['X', 'Y', 'Z']
rb_names = ['head', 'tail_base']
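Several entries in this schema (`session_date`, `image_width`, `aperture`, and so on) have no shape listed, i.e. they are single values. One option, which is our reading of the schema rather than something it spells out, is to store each one as a scalar (zero-dimensional) dataset; attributes would also work. Here is a minimal sketch with a hypothetical file `scalars_demo.h5` and hypothetical names. Note that reading a scalar dataset uses `[()]`, since there is no axis for `[:]` to slice.

```python
import h5py
import numpy as np

with h5py.File('scalars_demo.h5', 'w') as f:
    # Scalar datasets have shape () instead of an array shape
    f.create_dataset('settings/exposure', data=np.float32(2.8))
    f.create_dataset('settings/gain', data=np.uint16(800))
    # A Python str is stored as a variable-length UTF-8 string
    f.create_dataset('label', data='trial-A')

with h5py.File('scalars_demo.h5', 'r') as f:
    exposure_shape = f['settings/exposure'].shape   # ()
    exposure = f['settings/exposure'][()]           # a NumPy float32 scalar
    gain = f['settings/gain'][()]                   # a NumPy uint16 scalar
    label = f['label'].asstr()[()]                  # decoded to a Python str

print(exposure_shape, exposure, gain, label)
```

Passing NumPy scalar types such as `np.float32` and `np.uint16` is what pins down the on-disk dtype; a bare Python float or int would be stored with NumPy's default 64-bit types instead.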

### Reading Data from HDF5 Files

| **Code** | **Description** |
| :-- | :-- |
| **`f = h5py.File('file.h5')`** | Opens an h5py file object for reading (read-only `'r'` is the default mode in h5py 3) |
| **`f.close()`** | Closes the h5py file and releases the linked file back to the operating system. |
| **`f.keys()`** | See a list of datasets and groups at the root node |
| **`f.attrs`** | Get the dict-like attributes at the root node |
| **`f.attrs['a']`** | Get the 'a' attribute at the root node |
| **`f['x'][:]`** | Read in the 'x' dataset as a numpy array |
| **`f['x'][5:20]`** | Read in a slice of the 'x' dataset as a numpy array |
| **`f['folder'].keys()`** | See a list of datasets and groups inside the 'folder' group |
| **`f['folder']['x']`** | Get the 'x' dataset in the 'folder' group |
| **`f['folder/x']`** | (Alternative syntax) Get the 'x' dataset in the 'folder' group |
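The rows of the table can be tried out on a small throwaway file. Below is a minimal sketch with a hypothetical file `read_demo.h5`, laid out to match the names used in the table (`a`, `x`, `folder/x`). Opening it with `with` closes the file automatically, so there's no `f.close()` to forget.

```python
import h5py
import numpy as np

# Build a small hypothetical file matching the names used in the table
with h5py.File('read_demo.h5', 'w') as f:
    f.attrs['a'] = 'root attribute'
    f.create_dataset('x', data=np.arange(100))
    f.create_dataset('folder/x', data=np.arange(100) * 2)

with h5py.File('read_demo.h5', 'r') as f:
    root_keys = sorted(f.keys())             # ['folder', 'x']
    root_attr = f.attrs['a']
    x_lazy = f['x']                          # a Dataset handle; no data read yet
    x_all = x_lazy[:]                        # the whole dataset as a NumPy array
    x_part = f['x'][5:20]                    # only elements 5 through 19
    folder_keys = list(f['folder'].keys())   # ['x']
    nested_a = f['folder']['x'][:3]          # chained-key syntax
    nested_b = f['folder/x'][:3]             # path syntax, same dataset

print(type(x_lazy).__name__, x_all.shape, x_part)
```

Once the `with` block ends, the `Dataset` handle `x_lazy` can no longer be read from, but the sliced NumPy arrays (`x_all`, `x_part`, ...) stay usable, so slice out whatever you need before the file closes.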





**Exercises**

**Example**: From the temperature file, read in only the last 5 temperature measurements as a numpy array.

In [None]:
f = h5py.File('temp.h5')
temp = f['temp'][-5:, :]
f.close()
temp

array([[16],
       [21],
       [21],
       [16],
       [17]], dtype=uint16)

From the ephys file, read in the first 10 voltage measurements as a numpy array.

From the ephys file, get the name of the second recording channel.

From the ephys file, get the description of the voltage dataset.

From the motion tracking file, get all the XYZ positions of the first rigid body during the recording.

From the motion tracking file, get the camera's shutter speed during the session.