In [None]:
%pip install h5py numpy

In [1]:
import numpy as np
import h5py

## Writing Mixed-Type, Labelled Data and Metadata to HDF5 Files using `h5py`

HDF5 files have a lot of useful features:
  - They are highly cross-platform and work with a wide variety of tools
  - They can store many different datasets in a single file (or even in multiple linked files)
  - They can store metadata alongside the data
  - They let you store data hierarchically, making a nice dict-like nested organization for your data
  - They can compress your data
  - They let you work with data that is larger than memory, making it easy to read in only the data that you need
  - They can be easily previewed and inspected using the https://myhdf5.hdfgroup.org/ web tool!
  
So many features!  Here, we'll get a basic sense of how they work by using the `h5py` library, which gives us a dict-like, NumPy-native interface to HDF5 files and is used internally by many popular Python frameworks.
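
As a rough illustration of the "read in only the data that you need" point above, here is a minimal sketch. The file name `partial_read_demo.h5`, the dataset name `signal`, and the temporary directory are assumptions made up for this example. Slicing an `h5py` dataset reads just the requested part from disk.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "partial_read_demo.h5")

# Write a 1000 x 4 array into a dataset called "signal".
with h5py.File(path, "w") as f:
    f.create_dataset("signal", data=np.arange(4000).reshape(1000, 4))

# Reopen the file for reading and load only a small slice.
with h5py.File(path, "r") as f:
    dset = f["signal"]       # a handle to the data on disk, not an in-memory array
    full_shape = dset.shape  # the shape is known without reading the values
    first_rows = dset[:5]    # reads only rows 0 to 4 into a NumPy array

print(full_shape, first_rows.shape)
```

Using `with` closes the file automatically at the end of the block, which does the same job as calling `f.close()` by hand.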



| Code | Description |
| :-- | :-- |
| **`f = h5py.File('filename.h5', 'w')`** | Open an h5py file object for writing |
| **`f.close()`** | Closes the h5py file and releases the linked file back to the operating system. |
| **`f.create_dataset('temp', data=x)`** | Write an array called 'temp' with the data in the NumPy array `x` into the HDF5 file |
| **`f.create_dataset('data/temp', data=x)`** | Write an array called 'temp' in the group (folder) called "data" with the data in the NumPy array `x` into the HDF5 file |
| **`f.attrs['name'] = 'Session 1'`** | Set an attribute as metadata onto the root group of the HDF5 file -- this works like a normal Python dictionary |
| **`f['x'].attrs['id'] = 'ABC'`** | Set an attribute as metadata onto the 'x' node of the HDF5 file |



### Storing Arrays in HDF5 as "Datasets" and Organizing them in "Groups"

| Code | Description |
| :-- | :-- |
| **`f = h5py.File('filename.h5', 'w')`** | Open an h5py file object for writing |
| **`f.close()`** | Closes the h5py file and releases the linked file back to the operating system. |
| **`f['x'] = np.array([1, 2, 3])`** | Put a Numpy array into a **dataset** called `x` |
| **`f['folder/x'] = np.array([1, 2, 3])`** | Put a NumPy array into a **dataset** called `x`, in a **group** called `folder` |
| **`f.create_dataset('folder/x', shape=(100,2), dtype=np.uint8)`** | Make an empty dataset with the given shape and dtype in the file |
| **`f.create_dataset('folder/x', data=my_array, compression='gzip')`** | Store a numpy array as a dataset using a given compression algorithm. |
| **`f['x'][:]`** | Read in the dataset into a Numpy array. |
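
The calls in the table can be combined as in the following sketch. The file name `datasets_demo.h5`, the dataset names, and the temporary directory are hypothetical choices for this example, not part of the exercises.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "datasets_demo.h5")

with h5py.File(path, "w") as f:
    # Dict-style assignment creates a dataset at the root of the file.
    f["x"] = np.array([1, 2, 3])

    # A path with a slash puts the dataset inside a group;
    # the group "folder" is created automatically if it is missing.
    f["folder/y"] = np.array([4.0, 5.0])

    # An empty dataset: only the shape and dtype are given, no data yet.
    f.create_dataset("folder/empty", shape=(100, 2), dtype=np.uint8)

    # A gzip-compressed dataset.
    f.create_dataset("folder/compressed", data=np.zeros((50, 50)), compression="gzip")

with h5py.File(path, "r") as f:
    x = f["x"][:]                              # read the whole dataset into NumPy
    names = sorted(f["folder"].keys())         # groups behave like dicts
    empty_values = f["folder/empty"][:]        # unwritten values read back as zeros
    compression = f["folder/compressed"].compression

print(x, names, compression)
```

Reading a compressed dataset looks exactly like reading an uncompressed one; decompression happens behind the scenes.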

**Exercises**

**Exercise: Open and Close a File**.  Open an HDF5 file named `exercise1.h5` for storing EEG data, then close it.



**Exercise: Store a Small Array of Spike Counts**. Store a small 1D NumPy array of spike counts in a file called `exercise2.h5`, in a dataset called `spike_counts`.


**Exercise: Read Back the Spike Counts**.  Read the `spike_counts` dataset from `exercise2.h5`.

**Exercise: Create a Dataset Inside a Folder for LFP**. Create a dataset named `lfp_data` inside a group called `session1`, in a file called `exercise4.h5`.

**Exercise: Read the LFP Dataset from the Folder**.  Read the `lfp_data` from the `session1` group in `exercise4.h5`.


**Exercise: Create an Empty Dataset for Neuron Responses**. Create an empty dataset called `neuron_responses` with shape (100, 2).


**Exercise: Create and Immediately Fill an Empty Dataset for Trials**. Create an empty dataset called `trials` and fill it with random integers.

**Exercise: Compressed Dataset for EEG**.  Create a compressed dataset called `eeg_data` using gzip.

**Exercise: Read the Compressed EEG Data**. Read the `eeg_data` dataset from the previous exercise. 

**Exercise: Store a 2D Array of Spike Trains in a Group**.  Store a 2D array of spike trains, as a dataset called `spike_trains`, in a group called `recordings`.


**Exercise: Read the Spike Trains**.  Read the `spike_trains` dataset.

**Exercise: Put Multiple Datasets into a Schema**:  Create a file with the following organization:

  - behavior/
    - trial: 10 values, uint8
    - time: 100 values, float32
    - lick_rate: 10 x 100 values, float32
  - ephys/
    - spike_times: 50 values, float64


### Storing Metadata in HDF5 as "Attributes"

Labeling our data and keeping those labels together with the data itself is a key practice in having well-organized data, and HDF5 makes it easy to put labels directly inside the files.  

One practice to be aware of:  
  - General metadata for the file is often attached to the "root" group, so it's easy to find.  Things like experiment-level or session-level metadata are often stored there.  
  - Metadata that describes a specific array is often attached to the HDF5 dataset itself.  Things like dimension labels, units, and text descriptions of what the data represents can usually be found here.


| Code | Description |
| :-- | :-- |
| **`f.attrs['subject'] = 'Doug'`** | Add a "subject" attribute to the root-level group. |
| **`f['x'].attrs['subject'] = 'Doug'`** | Add a "subject" attribute to the 'x' group or dataset. |
| **`dict(f.attrs)`** | Get all the attributes attached to the root-level group as a dict. |
| **`dict(f['x'].attrs)`** | Get all the attributes attached to the 'x' group or dataset as a dict. |
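
Here is a small sketch of how these calls fit together. The file name `attrs_demo.h5`, the dataset name `x`, and the `units` attribute are assumptions chosen only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

with h5py.File(path, "w") as f:
    # File-level (root group) metadata.
    f.attrs["subject"] = "Doug"

    # Dataset-level metadata; "units" is a made-up attribute name.
    f["x"] = np.array([1.0, 2.0, 3.0])
    f["x"].attrs["units"] = "mV"

with h5py.File(path, "r") as f:
    # Copy the attributes into plain dicts while the file is still open.
    root_meta = dict(f.attrs)
    x_meta = dict(f["x"].attrs)

print(root_meta, x_meta)
```

Converting to a `dict` inside the `with` block means the metadata stays usable after the file has been closed.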


**Exercises**

**Exercise**: Create a file with the following schema:

```
meta_ex1.h5
└─ / (root)
   └─ subject_id = "RatA"  (attribute)
```



**Exercise**: Reopen `meta_ex1.h5` and get the "subject_id" attribute value from the file.

**Exercise**: Create a file with the following schema:
```
meta_ex2.h5
└─ / (root)
   └─ lfp_data (dataset, shape=(100,))
      └─ recording_date = "2025-03-31"  (attribute)
```

**Exercise:** Reopen `meta_ex2.h5` and get the `recording_date` attribute from the `lfp_data` dataset.

**Exercise**: Create a file with the following schema:

```
meta_ex3.h5
└─ / (root)
   ├─ experimenter = "Dr. Gray"       (attribute)
   ├─ experiment_type = "Optogenetics" (attribute)
   ├─ lab = "NeuroLab"                (attribute)
   └─ spike_trains (dataset, shape=(4,50))
      ├─ brain_region = "Hippocampus"   (attribute)
      └─ num_channels = 4              (attribute)
```



**Exercise:** Reopen `meta_ex3.h5` and get all the root-level metadata into a dict.

### Labeling Dimensions of Datasets by Linking Arrays