<a href="https://colab.research.google.com/github/Toshea111/sleap/blob/develop/Convert_HDF5_to_CSV_updated.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

First, we'll upload an HDF5 file that was generated from within the SLEAP GUI. To create one, open a tracked project file (`.slp`) and go to **File** -> **Export Analysis HDF5...**

Note that you can also upload the file using the file browser in the sidebar on the left side of the page in Colab.

In [None]:
from google.colab import files

uploaded = files.upload()
h5_filepath = list(uploaded.keys())[0]

print(f"h5_filepath = {h5_filepath}")

Saving flies_example.00.analysis.h5 to flies_example.00.analysis.h5
h5_filepath = flies_example.00.analysis.h5
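If you uploaded the file through the sidebar instead, `h5_filepath` is not set by the cell above. Here is a minimal sketch for locating it, assuming the file landed in the current working directory and has an `.h5` extension; `find_h5_files` is a hypothetical helper, not part of SLEAP or Colab.

```python
from pathlib import Path


def find_h5_files(folder="."):
    """Return the sorted paths of all .h5 files directly inside `folder`."""
    return sorted(str(p) for p in Path(folder).glob("*.h5"))


candidates = find_h5_files(".")
print("HDF5 files found:", candidates)

# Use the first match if there is one; otherwise set h5_filepath by hand.
if candidates:
    h5_filepath = candidates[0]
    print(f"h5_filepath = {h5_filepath}")
```

If several files match, pick the one you want from `candidates` rather than relying on the first entry.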


Once the file is uploaded, let's open it, load its contents, and inspect the data.

In [None]:
import numpy as np
import pandas as pd
import h5py

# Open the HDF5 file using h5py.
with h5py.File(h5_filepath, "r") as f:

  # Print a list of the keys available.
  print("Keys in the HDF5 file:", list(f.keys()))

  # Load all the datasets into a dictionary.
  data = {k: v[()] for k, v in f.items()}

  # Here we're just converting string arrays into regular Python strings.
  data["node_names"] = [s.decode() for s in data["node_names"].tolist()]
  data["track_names"] = [s.decode() for s in data["track_names"].tolist()]

  # Reverse the axis order of the tracks array so that frames come first:
  # (tracks, xy, nodes, frames) -> (frames, nodes, xy, tracks).
  data["tracks"] = np.transpose(data["tracks"])

  # And finally convert the data type of the track occupancy array to boolean.
  # We'll see what this array is used for further down.
  data["track_occupancy"] = data["track_occupancy"].astype(bool)


# Describe the values in the data dictionary we just created.
for key, value in data.items():
  if isinstance(value, np.ndarray):
    print(f"{key}: {value.dtype} array of shape {value.shape}")
  else:
    print(f"{key}: {value}")

Keys in the HDF5 file: ['node_names', 'track_names', 'track_occupancy', 'tracks']
node_names: ['head', 'thorax', 'abdomen', 'wingL', 'wingR', 'forelegL4', 'forelegR4', 'midlegL4', 'midlegR4', 'hindlegL4', 'hindlegR4', 'eyeL', 'eyeR']
track_names: ['track_0', 'track_1']
track_occupancy: bool array of shape (14000, 2)
tracks: float64 array of shape (14000, 13, 2, 2)


The `data["tracks"]` array has the raw tracking coordinates, with axes corresponding to `(frames, nodes, xy, tracks)`.
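To make the axis order concrete, here is a small sketch of how to index an array with this layout. It uses a synthetic array and a made-up three-node skeleton (`demo_tracks` and `demo_node_names` are illustrative names), so it runs without the uploaded file.

```python
import numpy as np

# Synthetic stand-in for data["tracks"]: 5 frames, 3 nodes, xy, 2 tracks.
n_frames, n_nodes, n_tracks = 5, 3, 2
demo_tracks = np.arange(n_frames * n_nodes * 2 * n_tracks, dtype=float)
demo_tracks = demo_tracks.reshape(n_frames, n_nodes, 2, n_tracks)
demo_node_names = ["head", "thorax", "abdomen"]  # hypothetical skeleton

node_idx = demo_node_names.index("thorax")
track_idx = 1

# x and y of one node for one track across all frames: shape (n_frames,).
thorax_x = demo_tracks[:, node_idx, 0, track_idx]
thorax_y = demo_tracks[:, node_idx, 1, track_idx]

# All nodes of one track in a single frame: shape (n_nodes, 2).
frame0_pts = demo_tracks[0, :, :, track_idx]

print(thorax_x.shape, thorax_y.shape, frame0_pts.shape)
```

The same indexing pattern should apply to `data["tracks"]`, with `data["node_names"]` supplying the node index.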

In this case we don't have data for every frame since we only tracked a short clip. The `data["track_occupancy"]` array, with axes `(frames, tracks)`, records this: an entry is `True` where that track has a detection in that frame.

First, let's find all the frames that have at least one animal tracked.

In [None]:
valid_frame_idxs = np.argwhere(data["track_occupancy"].any(axis=1)).flatten()
valid_frame_idxs

array([12255, 12256, 12257, ..., 13997, 13998, 13999])
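The occupancy array can answer a few related questions as well. The sketch below uses a small synthetic array (`demo_occupancy` is an illustrative name) with the same `(frames, tracks)` layout to show frames with any track, frames with every track, and the number of occupied frames per track.

```python
import numpy as np

# Synthetic stand-in for data["track_occupancy"]: 6 frames, 2 tracks.
demo_occupancy = np.array([
    [False, False],
    [True, False],
    [True, True],
    [False, True],
    [False, False],
    [True, True],
])

# Frames where at least one track is present (same expression as above).
any_idxs = np.argwhere(demo_occupancy.any(axis=1)).flatten()

# Frames where every track is present.
all_idxs = np.flatnonzero(demo_occupancy.all(axis=1))

# Number of occupied frames for each track.
frames_per_track = demo_occupancy.sum(axis=0)

print(any_idxs)
print(all_idxs)
print(frames_per_track)
```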

Great, so now let's build up a `tracks` table where each row contains the detected body part coordinates for a single animal in a single frame.

In [None]:
tracks = []
for frame_idx in valid_frame_idxs:
  # Get the tracking data for the current frame.
  frame_tracks = data["tracks"][frame_idx]

  # Loop over the animals in the current frame.
  for i in range(frame_tracks.shape[-1]):
    pts = frame_tracks[..., i]
    
    if np.isnan(pts).all():
      # Skip this animal if all of its points are missing (i.e., it wasn't
      # detected in the current frame).
      continue
    
    # Let's initialize our row with some metadata.
    detection = {"track": data["track_names"][i], "frame_idx": frame_idx}

    # Now let's fill in the coordinates for each body part.
    for node_name, (x, y) in zip(data["node_names"], pts):
      detection[f"{node_name}.x"] = x
      detection[f"{node_name}.y"] = y

    # Add the row to the list and move on to the next detection.
    tracks.append(detection)

# Once we're done, we can convert this list of rows into a table using Pandas.
tracks = pd.DataFrame(tracks)

tracks.head()

,track,frame_idx,head.x,head.y,thorax.x,thorax.y,abdomen.x,abdomen.y,wingL.x,wingL.y,wingR.x,wingR.y,forelegL4.x,forelegL4.y,forelegR4.x,forelegR4.y,midlegL4.x,midlegL4.y,midlegR4.x,midlegR4.y,hindlegL4.x,hindlegL4.y,hindlegR4.x,hindlegR4.y,eyeL.x,eyeL.y,eyeR.x,eyeR.y
0,track_0,12255,675.095215,447.150055,640.43573,439.25296,608.769897,433.792969,591.232117,422.372589,590.235718,429.915405,693.240051,446.778748,686.841553,471.311646,670.471802,408.143524,649.016663,481.786407,616.57605,412.050201,605.569031,458.284271,668.248779,434.470459,664.264526,455.357605
1,track_1,12255,775.279053,441.41272,782.961548,485.112,787.437317,517.084778,785.228271,537.595581,793.324158,537.482361,753.108948,432.660736,786.360291,418.087067,739.537231,465.65451,802.052368,454.832794,763.61084,519.440247,814.246155,501.140991,765.125122,452.851868,788.234009,449.416565
2,track_0,12256,675.054016,447.212982,640.519043,439.238403,608.763,433.694153,591.198669,422.476471,590.340515,429.932983,693.085205,446.820618,686.848145,471.278412,670.431274,408.254486,649.037659,481.755127,616.490295,412.099854,605.625183,458.365356,668.273132,434.447754,664.266113,455.357422
3,track_1,12256,774.891846,441.471863,783.032959,485.260071,787.524841,516.55426,785.242004,537.591858,793.64856,537.606079,753.396484,447.158203,782.449829,416.640808,739.760864,465.682037,802.204773,454.997894,763.507019,519.817871,814.236877,501.171417,764.985535,453.07608,787.857849,449.323792
4,track_0,12257,675.102112,447.229248,640.517639,439.304169,608.589233,433.605316,591.185974,422.511292,590.349304,429.735077,693.255249,446.874512,686.856079,471.305237,670.476379,408.137573,649.045105,481.833374,616.544006,412.152802,605.600952,458.374969,668.335266,434.51947,664.268799,455.382538
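For long recordings the Python loop above can be slow. One possible alternative, sketched here on synthetic data, builds the same kind of table with array reshaping. It assumes the `(frames, nodes, xy, tracks)` layout described earlier; `demo_tracks`, `node_names` and `track_names` are stand-ins for the entries in `data`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for data["tracks"], data["node_names"], data["track_names"].
node_names = ["head", "thorax"]
track_names = ["track_0", "track_1"]
rng = np.random.default_rng(0)
demo_tracks = rng.uniform(0, 100, size=(4, 2, 2, 2))  # (frames, nodes, xy, tracks)
demo_tracks[0, :, :, 1] = np.nan  # track_1 is missing in frame 0
demo_tracks[3] = np.nan           # nothing was detected in frame 3

n_frames, n_nodes, _, n_tracks = demo_tracks.shape

# Move the tracks axis next to frames, then flatten each detection into one
# row laid out as [node0.x, node0.y, node1.x, node1.y, ...].
flat = np.moveaxis(demo_tracks, -1, 1).reshape(n_frames * n_tracks, n_nodes * 2)

columns = [f"{name}.{axis}" for name in node_names for axis in ("x", "y")]
table = pd.DataFrame(flat, columns=columns)
table.insert(0, "frame_idx", np.repeat(np.arange(n_frames), n_tracks))
table.insert(0, "track", np.tile(track_names, n_frames))

# Drop detections whose coordinates are all missing, as the loop does.
table = table.dropna(how="all", subset=columns).reset_index(drop=True)
print(table)
```

The rows come out in the same order as the loop (frames outermost, tracks within each frame), so the two approaches should give matching tables on the same input.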


Finally, we can save the table we just generated into a CSV file and download it for further analysis.

In [None]:
tracks.to_csv("tracks.csv", index=False)
files.download("tracks.csv")
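As a quick check on the CSV layout, here is a sketch that writes a small synthetic table with the same kind of columns, reads it back, and selects one animal. The filename `tracks_demo.csv` and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small synthetic table with the same column layout as `tracks` above.
demo = pd.DataFrame({
    "track": ["track_0", "track_1", "track_0"],
    "frame_idx": [10, 10, 11],
    "head.x": [1.5, 4.0, 1.75],
    "head.y": [2.5, np.nan, 2.25],  # NaN marks a body part that was not detected
})
demo.to_csv("tracks_demo.csv", index=False)

# Read the file back; missing values are written as empty fields and
# parsed as NaN again.
loaded = pd.read_csv("tracks_demo.csv")

# Select one animal and index its rows by frame.
track_0 = loaded[loaded["track"] == "track_0"].set_index("frame_idx")
print(track_0[["head.x", "head.y"]])
```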
