# 📓 Notebook 1 – Exploratory Data Analysis (EDA) of Pose Outputs

## 1. Introduction & objectives

In this notebook, we will explore pose estimation outputs generated with SuperAnimal ModelZoo on 10-minute top-view mouse videos.

Learning goals:
	•	Understand the structure of .h5 output files
	•	Explore metadata and summary statistics
	•	Visualize likelihoods, trajectories, and skeletons
	•	Detect and correct errors (missing points, jumps)
	•	Compare outputs from clear vs challenging videos
	•	Prepare cleaned data for further analysis

![Mouse pose](https://raw.githubusercontent.com/LizbethMG-Teaching/pose2behav-book/main/assets/notebook-image1.png)

Imagine you are a junior researcher in a neuroscience lab. Your colleague just handed you pose estimation outputs generated with SuperAnimal ModelZoo from 10-minute videos of mice exploring an arena. Before you can ask scientific questions about locomotion, posture, or social behavior, you need to verify the quality of these model predictions. Are all the keypoints tracked reliably? Do some body parts drop out in certain conditions? How does tracking quality differ between a clear video and a more challenging one?

In this notebook, you will take the role of a data detective: opening the .h5 pose files, exploring the structure, visualizing likelihoods and trajectories, spotting errors, and applying simple corrections. By the end, you will produce a cleaned dataset and a short “quality report” that prepares you for deeper behavioral analysis in the next notebooks.


## 2. Data loading & format inspection

👉 Goal: learn to open .h5 files and understand their structure.
	•	Load one file into a pandas DataFrame
	•	Inspect the column levels: scorer, bodyparts, coords (x, y, likelihood)
	•	Count frames and list bodyparts
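
The counting and listing steps above can be sketched on a toy table before touching a real file. This is a minimal sketch, assuming the usual DeepLabCut column levels named `scorer`, `bodyparts` and `coords`; the scorer and bodypart names below are hypothetical and exist only to build the example.

```python
import numpy as np
import pandas as pd

# Hypothetical scorer and bodypart names, used only to build a toy table
scorer = "toy_scorer"
bodyparts = ["nose", "left_ear", "tail_base"]
coords = ["x", "y", "likelihood"]

# Three column levels, mimicking a single-animal DLC-style table (assumption)
columns = pd.MultiIndex.from_product(
    [[scorer], bodyparts, coords], names=["scorer", "bodyparts", "coords"]
)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((5, len(columns))), columns=columns)

# One row per video frame
n_frames = len(toy)

# Unique bodypart names, in the order they appear in the columns
found_bodyparts = toy.columns.get_level_values("bodyparts").unique().tolist()

print("frames:", n_frames)
print("bodyparts:", found_bodyparts)
```

On a real file, the same two lines (`len(df)` and `get_level_values("bodyparts")`) should apply as long as the columns are still a MultiIndex with a level of that name; if the level names differ, `df.columns.names` shows what is available.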

Exercise 1:
List all detected bodyparts and classify them as head / body / tail.

## 3. Metadata & basic summary

## 4. Likelihood distributions

## 5. Time series inspection

## 6. Spatial distributions

## 7. Visual diagnostics 

In [None]:


# Install helpers (quiet)
!pip -q install gdown tables

import gdown, os, pandas as pd, numpy as np
from pathlib import Path

# 👇 Replace with your real Google Drive FILE ID (not the whole link!)
FILE_ID = "11zcVPSS4D-JLQQ11hkMbPwmqs-cd6Am2"

# Build a direct-download URL for gdown
URL = f"https://drive.google.com/uc?id={FILE_ID}"

DEST = Path("/content/dlc_output.h5")
print("Downloading from Drive...")
gdown.download(URL, str(DEST), quiet=False)

# Quick sanity check
assert DEST.exists() and DEST.stat().st_size > 0, "Download failed or empty file."
print(f"✅ Downloaded to {DEST} ({DEST.stat().st_size/1_000_000:.2f} MB)")


In [None]:
# DeepLabCut H5 often stores under keys like 'df_with_missing' or 'df'
# We'll try common keys, and fall back to listing what's available.

def load_dlc_h5(path: Path):
    try:
        # With no key, pandas can read the file only if it holds a single object
        return pd.read_hdf(path)
    except (KeyError, ValueError):
        # Inspect keys and try common ones
        with pd.HDFStore(path, mode="r") as store:
            keys = [k.strip("/") for k in store.keys()]
        print("Available keys in H5:", keys)
        for k in ["df_with_missing", "df", "pose", "table"]:
            if k in keys:
                return pd.read_hdf(path, key=k)
        # Last resort: first key
        if keys:
            return pd.read_hdf(path, key=keys[0])
        raise RuntimeError("No readable tables found in this H5.")

df = load_dlc_h5(DEST)
print("✅ Loaded H5 into DataFrame:", df.shape)
display(df.head(3))


In [None]:
# DLC H5 columns are usually a MultiIndex: (scorer, bodyparts, coords).
# Multi-animal files add an "individuals" level, so join however many levels exist.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = ["/".join(map(str, lvl)) for lvl in df.columns]
print("Columns (first 10):")
print(df.columns[:10])


In [None]:
import matplotlib.pyplot as plt

# Try to find any '/x' and matching '/y' columns
xcols = [c for c in df.columns if c.endswith("/x")]
assert len(xcols) > 0, "Couldn't find any '/x' columns. Inspect df.columns."
xcol = xcols[0]
ycol = xcol[:-2] + "/y"

plt.figure()
plt.plot(df[xcol].values, -df[ycol].values)  # invert y for display
plt.title(f"Trajectory preview: {xcol.split('/')[-2]}")  # bodypart is the level just before the coord
plt.xlabel("x"); plt.ylabel("y (top=up)")
plt.show()
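
The preview above plots every frame, including ones where the model was unsure. A minimal sketch of masking low-confidence points before plotting follows. The helper name `mask_low_likelihood` and the threshold of 0.6 are assumptions made for illustration, not values taken from the dataset; pick a threshold after looking at the likelihood distribution.

```python
import numpy as np

def mask_low_likelihood(x, y, likelihood, threshold=0.6):
    """Return copies of x and y with NaN where likelihood < threshold.

    The threshold is a hypothetical default, not a recommended value.
    """
    x = np.asarray(x, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    low = np.asarray(likelihood, dtype=float) < threshold
    # Replace unreliable coordinates with NaN instead of dropping frames,
    # so the arrays keep one entry per video frame
    x[low] = np.nan
    y[low] = np.nan
    return x, y, low

# Toy example: the middle frame has low likelihood
x, y, low = mask_low_likelihood([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.9, 0.2, 0.8])
print("masked frames:", int(low.sum()))
```

With the flattened column names used above, the matching likelihood column would be `xcol[:-2] + "/likelihood"`. Passing the masked arrays to `plt.plot` leaves gaps in the line wherever values are NaN, which makes dropouts visible in the trajectory.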


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LizbethMG-Teaching/pose2behav-book/blob/main/notebooks/EDA.ipynb)