# 📓 Notebook 1 – Exploratory Data Analysis (EDA) of Pose Outputs

## 1. Introduction & objectives

In this notebook, we will explore pose estimation outputs generated with SuperAnimal ModelZoo on 10-minute top-view mouse videos.

**Learning goals:**
- Understand the structure of .h5 output files
- Explore metadata and summary statistics
- Visualize likelihoods, trajectories, and skeletons
- Detect and correct errors (missing points, jumps)
- Compare outputs from clear vs challenging videos
- Prepare cleaned data for further analysis

--- 
**Instructions**

This notebook mixes pre-filled code cells (nothing to change) and coding exercises that you will complete.

👉 Here’s how to work through it:
1. Read each section carefully before running the cells.
2. When a cell requires you to code, you'll see a `TODO` comment.
3. The `TODO` tells you how many lines to write.
4. Write your code only between the markers:

```python
# >>>>>>>>>>>>>>>>>>>
# your code goes here
# <<<<<<<<<<<<<<<<<<<
```

✋ Do not edit anything outside these markers.

⚡ After finishing the course, feel free to experiment and modify the notebook as you like!

✨ Example

What you will see in the notebook:

```python
# >>>>>>>>>>>>>>>>>>>
# TODO (2 lines): compute the duration of the video and print it 
# variables: frame_count, fps
# YOUR CODE: duration = 
# YOUR CODE: print(...)
# <<<<<<<<<<<<<<<<<<< 
```

What you are expected to write: 

```python
# >>>>>>>>>>>>>>>>>>>
# TODO (2 lines): compute the duration of the video and print it 
# variables: frame_count, fps
duration = frame_count / fps
print(f"Duration (s): {duration}")
# <<<<<<<<<<<<<<<<<<< 
```
---
  

<img src="https://raw.githubusercontent.com/LizbethMG-Teaching/pose2behav-book/main/assets/notebook-image1.png" width="50%">

**Narrative**

Imagine you are a junior researcher in a neuroscience lab. Your colleague just handed you pose estimation outputs generated with SuperAnimal ModelZoo from 5-minute videos of mice exploring an arena. Before you can ask scientific questions about locomotion, posture, or social behavior, you need to verify the quality of these model predictions. Which keypoints are tracked? Are all of them tracked reliably? Do some body parts drop out in certain conditions?

In this notebook, you will take the role of a data detective: opening the .h5 pose files, exploring the structure, visualizing likelihoods and trajectories, spotting errors, and applying simple corrections. By the end, you will produce a short “quality report” that prepares you for deeper behavioral analysis in the next notebooks.

--- 


## 2. Data Loading & Format Inspection

👉 Goal: learn to open .h5 files and understand their structure.
- Load one file into a pandas DataFrame
- Inspect the column levels: scorer, individuals, bodyparts, and coords (x, y, likelihood)
- Count frames and list bodyparts

### 2.1 Download data (prefilled)

**📋 Instructions:**
1. Run the code cell below to download a dataset file from Google Drive (with gdown), save it locally (path depends on Colab vs local), and verify the download.

```python
# PREFILLED, NO NEED TO CHANGE, JUST RUN THIS CELL
# Install and import the required libraries:
!pip -q install gdown tables

import os
from pathlib import Path
import gdown, pandas as pd, numpy as np
from IPython.display import display

# --------------------------------------------------------------

# Detect if running in Google Colab
if "COLAB_RELEASE_TAG" in os.environ or "COLAB_GPU" in os.environ:
    DEST = Path("/content/dlc_output.h5")
else:
    DEST = Path("dlc_output.h5")  # save in current folder locally
print("Saving to:", DEST)

# Select here the experiment you want to download, comment out the others:
# Opt 1: Single mouse - arena with bedding
FILE_ID = "1JEpAtkANcXLb9Tsg0GrdNxjQTlx3edlk"
# Opt 2: Single mouse - arena without clear floor
# FILE_ID = 
# Opt 3: Single mouse - beatbox
# FILE_ID = "11zcVPSS4D-JLQQ11hkMbPwmqs-cd6Am2"

URL = f"https://drive.google.com/uc?id={FILE_ID}"

print("Downloading from Drive...")
_ = gdown.download(URL, str(DEST), quiet=False)

# Basic checks
assert DEST.exists() and DEST.stat().st_size > 0, "❌ Download failed or empty file."
print(f"✅ Downloaded to {DEST} ({DEST.stat().st_size/1_000_000:.2f} MB)")
```

```text
Saving to: dlc_output.h5
Downloading from Drive...
Downloading...
From: https://drive.google.com/uc?id=1JEpAtkANcXLb9Tsg0GrdNxjQTlx3edlk
To: /Users/lix/Library/CloudStorage/OneDrive-Personnel/3-work/teaching/2025_BehavioralAnalysis/pose2behav-book/notebooks/dlc_output.h5
100%|██████████| 58.5M/58.5M [00:09<00:00, 5.96MB/s]
✅ Downloaded to dlc_output.h5 (58.47 MB)
```


### 2.2 Load the H5 file into a DataFrame and explore its content

**📋 Instructions:**

1. Load the HDF5 pose estimation output into a pandas DataFrame using the provided `read_pose_h5()` function.
2. 🧩 Complete the line marked with `# TODO` to run a basic sanity check:
   - Verify that the DataFrame `df` is **not empty** (has at least one row).


```python
# Load the HDF5 pose output into a pandas DataFrame.

def read_pose_h5(path: Path) -> pd.DataFrame:
    """Try the HDF5 keys commonly used for pose tables, then fall back to the default key."""
    for key in ("df_with_missing", "df", "tracks", "pose"):
        try:
            return pd.read_hdf(path, key=key)
        except Exception:
            pass  # key not in this file: try the next one

    # Without a key, read_hdf works only if the file holds a single pandas object
    return pd.read_hdf(path)

df = read_pose_h5(DEST)

# Basic sanity check

# >>>>>>>>>>>>>>>>>>>
# TODO: Check that the DataFrame "df" has at least 1 row.
# Clue: assert <logical statement>, "message to return if assertion fails"
#   Don't wrap both parts in parentheses: assert (cond, msg) is a tuple and always passes.
#   If the file was loaded but empty (no rows), this condition is False.
#   If condition is True, nothing happens, code continues.
# YOUR CODE HERE: assert ...
assert df.shape[0] > 0, "Empty DataFrame after loading. Check file."
# <<<<<<<<<<<<<<<<<<<

print("✅ H5 loaded.")

# Show dataframe info
print("📊 Data shape:", df.shape)
# Display the first 5 rows as a nice HTML table
display(df.head())
```

```text
✅ H5 loaded.
📊 Data shape: (9000, 810)
```

All columns share the scorer `superanimal_topviewmouse_snapshot-fasterrcnn_resnet50_fpn_v2-004_snapshot-hrnet_w32-004`; the headers below show the remaining levels (individuals, bodyparts, coords).

| frame | animal0 nose x | animal0 nose y | animal0 nose likelihood | animal0 left_ear x | animal0 left_ear y | animal0 left_ear likelihood | animal0 right_ear x | animal0 right_ear y | animal0 right_ear likelihood | animal0 left_ear_tip x | ... | animal9 right_midside likelihood | animal9 right_hip x | animal9 right_hip y | animal9 right_hip likelihood | animal9 tail_end x | animal9 tail_end y | animal9 tail_end likelihood | animal9 head_midpoint x | animal9 head_midpoint y | animal9 head_midpoint likelihood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 783.023438 | 262.625 | 1.0 | 797.960938 | 273.875 | 1.0 | 812.898438 | 255.125 | 1.0 | 801.695312 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 1 | 777.289062 | 261.625 | 1.0 | 795.960938 | 276.625 | 1.0 | 807.164062 | 257.875 | 1.0 | 795.960938 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 2 | 776.617188 | 265.71875 | 1.0 | 791.429688 | 276.78125 | 1.0 | 806.242188 | 258.34375 | 1.0 | 795.132812 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 3 | 776.054688 | 268.445312 | 1.0 | 790.664062 | 277.210938 | 0.981241 | 802.351562 | 259.679688 | 1.0 | 796.507812 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 4 | 775.367188 | 271.132812 | 1.0 | 793.648438 | 277.226562 | 1.0 | 802.789062 | 258.945312 | 1.0 | 796.695312 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
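To see how this four-level column layout fits together, here is a minimal sketch that builds a tiny table with the same level names (`scorer`, `individuals`, `bodyparts`, `coords`) and selects one body part of one slot with `.xs`. The `toy_scorer` name, the two slots, the two body parts, and all values are made up for illustration; only `animal0` gets real-looking numbers, the other slot keeps the `-1` placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the layout: 2 slots x 2 body parts x 3 coords
columns = pd.MultiIndex.from_product(
    [["toy_scorer"], ["animal0", "animal1"], ["nose", "tail_base"], ["x", "y", "likelihood"]],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
values = np.full((3, len(columns)), -1.0)          # start with the -1 placeholder everywhere
values[:, :6] = [[100.0, 50.0, 0.99, 120.0, 80.0, 0.95],
                 [102.0, 51.0, 0.98, 121.0, 82.0, 0.40],
                 [104.0, 53.0, 0.97, 123.0, 83.0, 0.90]]   # only animal0 gets real values
toy = pd.DataFrame(values, columns=columns)

print(list(toy.columns.names))   # the four level names
print(toy.shape)                 # (frames, slots * bodyparts * coords)

# Select one body part of one slot; the result keeps the 'scorer' and 'coords' levels
nose = toy.xs(("animal0", "nose"), axis=1, level=["individuals", "bodyparts"])
print(nose)
```

With 810 columns in the real file, the same arithmetic reads as 10 slots × 27 body parts × 3 coords.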


### 2.3 List individuals and body parts

Let's list the unique values of the column levels we just saw to answer:
- How many individuals are in the file?
- How many body parts are included, and which ones?

**📋 Instructions:**
1. Run the code cell below and verify the output.

```python
# 👉🏼 PREFILLED, NO NEED TO CHANGE, JUST RUN THIS CELL
# Extract column multiindex levels
animals = df.columns.get_level_values("individuals").unique()
bodyparts = df.columns.get_level_values("bodyparts").unique()

print("Number of animals:", len(animals))
print("Animals:", animals.tolist())
print("Number of bodyparts:", len(bodyparts))
print("Bodyparts:", bodyparts.tolist())
```

```text
Number of animals: 10
Animals: ['animal0', 'animal1', 'animal2', 'animal3', 'animal4', 'animal5', 'animal6', 'animal7', 'animal8', 'animal9']
Number of bodyparts: 27
Bodyparts: ['nose', 'left_ear', 'right_ear', 'left_ear_tip', 'right_ear_tip', 'left_eye', 'right_eye', 'neck', 'mid_back', 'mouse_center', 'mid_backend', 'mid_backend2', 'mid_backend3', 'tail_base', 'tail1', 'tail2', 'tail3', 'tail4', 'tail5', 'left_shoulder', 'left_midside', 'left_hip', 'right_shoulder', 'right_midside', 'right_hip', 'tail_end', 'head_midpoint']
```


**Why do we see 10 animals if the video had only 1 mouse?**

SuperAnimal models export results in a multi-animal format with fixed slots, `animal0` to `animal9`.

In a **single-mouse experiment**:
- Only one slot (usually `animal0`) contains meaningful coordinates.
- The remaining slots are filled with **placeholder values** (`-1` for x, y, and likelihood).

👉 **In practice:** keep only the data for the detected individual(s) with the **highest likelihoods** and ignore the unused slots.
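A quick first check is to ask, for each slot, whether any likelihood is ≥ 0 (i.e. not a placeholder). The helper below, `slots_with_detections`, is a hypothetical name for this sketch; it runs on a small made-up table (the `toy_scorer` name and all values are invented), and on the notebook data you would pass `df` instead.

```python
import numpy as np
import pandas as pd

def slots_with_detections(table: pd.DataFrame) -> list:
    """Return the 'individuals' labels that have at least one likelihood >= 0."""
    L = table.xs("likelihood", axis=1, level="coords")       # columns: (scorer, individuals, bodyparts)
    any_real = (L >= 0).any(axis=0)                           # per (scorer, individual, bodypart)
    per_slot = any_real.groupby(level="individuals").any()    # collapse to one flag per individual
    return per_slot[per_slot].index.tolist()

# Made-up table: 3 slots, 1 body part, 4 frames
columns = pd.MultiIndex.from_product(
    [["toy_scorer"], ["animal0", "animal1", "animal2"], ["nose"], ["x", "y", "likelihood"]],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
values = np.full((4, len(columns)), -1.0)
values[:, 0:3] = [15.0, 30.0, 0.95]     # animal0: detected in every frame
values[2, 3:6] = [400.0, 12.0, 0.2]     # animal1: one stray, low-confidence detection
toy = pd.DataFrame(values, columns=columns)

print(slots_with_detections(toy))
```

A single stray detection is enough to flag `animal1` here, which is why a presence check alone is not sufficient and confidence and motion also need to be taken into account.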


### 2.4 Evaluate tracking quality per animal
**How do we find the real animal?**
In a single-animal video this is usually `animal0`. We can confirm it by computing a few **summary metrics** for each animal slot and checking which one actually contains valid tracking data.

**📋 Instructions:**

1. Run the code cell below. It only defines a helper function (nothing is computed yet) that returns the following metrics for each animal:
   - **mean_likelihood:** the average detection likelihood (≈ 1 for a real animal, ≈ -1 for placeholders).
   - **frac_conf:** the fraction of all points (frames × body parts) with likelihood ≥ `conf_thresh` (0.5 by default). Placeholder values (-1) count as not confident, so real animals have a high fraction and empty slots have ~0.
   - **mean_xy_var:** the average variance of the x and y coordinates across frames, computed only where the likelihood is valid (≥ 0). In simpler terms: how much this animal's detected body parts move during the video.

2. Inspect the output table and identify which animal shows high confidence values.

A partially empty slot (a "fake" animal) may contain a few noisy detections: it then has a defined, usually smaller, variance but almost no confident points.
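How `frac_conf` treats placeholders matters. In the helper, `-1` values are first turned into NaN, and any comparison with NaN is False, so placeholders stay in the denominator and count as not confident. A tiny worked example with made-up likelihoods shows the difference from a "valid points only" convention:

```python
import pandas as pd

# Made-up likelihoods for one body part over 4 frames; -1 is the "no detection" placeholder
lik = pd.Series([1.0, 0.3, -1.0, 0.9])

lik_valid = lik.where(lik >= 0)                        # -1 -> NaN, as in the helper
frac_all = (lik_valid >= 0.5).mean()                   # NaN >= 0.5 is False: 2 of 4 frames
frac_valid_only = (lik_valid.dropna() >= 0.5).mean()   # alternative: 2 of 3 valid frames

print(frac_all, frac_valid_only)
```

The helper uses the first convention. The second would give a slot with a single confident detection a perfect score of 1.0, making it look as good as a fully tracked animal.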



```python
# PREFILLED, NO NEED TO CHANGE, JUST RUN THIS CELL
# Helper to summarize per-animal signal quality & motion

def animal_activity_summary(df: pd.DataFrame, conf_thresh: float = 0.5) -> pd.DataFrame:
    """
    Returns a small per-animal table with:
      - mean_likelihood : mean over all bodyparts/frames (often -1 when unused)
      - frac_conf       : fraction of all points with likelihood >= conf_thresh
                          (-1 placeholders count as not confident)
      - mean_xy_var     : average variance of x/y where detections exist
    Sorted so the most likely real animal is on top.
    """
    if not isinstance(df.columns, pd.MultiIndex):
        raise ValueError("Expected MultiIndex columns (scorer/individuals/bodyparts/coords).")
    expected = ['scorer', 'individuals', 'bodyparts', 'coords']
    if list(df.columns.names) != expected:
        raise ValueError(f"Unexpected column levels: {df.columns.names} (expected {expected})")

    idx = pd.IndexSlice
    animals = df.columns.get_level_values("individuals").unique()

    rows = []
    for a in animals:
        A = df.xs(a, axis=1, level="individuals")

        # Likelihoods table: (frames, bodyparts)
        L = A.xs("likelihood", axis=1, level="coords")
        mean_L = float(L.mean().mean())

        # Keep valid values (>= 0), then compute the fraction at or above the threshold
        L_valid = L.where(L >= 0)
        frac_conf = float((L_valid >= conf_thresh).mean().mean())

        # Build masked XY (only where L is valid) to get motion variance
        XY = A.loc[:, idx[:, :, ["x", "y"]]]  # columns: (scorer, bodyparts, coords in {x, y})

        det_mask = L_valid.notna()  # (frames, bodyparts)
        # duplicate mask for x and y, then reorder levels to match XY
        mask_xy = pd.concat([det_mask, det_mask], axis=1, keys=["x", "y"])
        mask_xy = mask_xy.swaplevel(0, 2, axis=1).swaplevel(0, 1, axis=1).sort_index(axis=1)
        mask_xy = mask_xy.reindex(columns=XY.columns)

        mov_var = float(XY.where(mask_xy).var(ddof=0).mean())

        rows.append((a, mean_L, frac_conf, mov_var))

    out = (pd.DataFrame(rows, columns=["animal", "mean_likelihood", "frac_conf", "mean_xy_var"])
             .set_index("animal")
             .sort_values(["frac_conf", "mean_xy_var", "mean_likelihood"], ascending=False))
    return out
```



📝 **Instructions**

You will now detect the most likely **real animal** using the helper function defined above.

1. 🧩 Complete the line marked with `# TODO` to use the helper function `animal_activity_summary()` and compute a **per-animal summary**.  
2. Start with a confidence threshold `conf_thresh = 0.5`.  
3. Store the result in a variable called `summary`.  


```python
# TODO: complete the code below to pick the best animal, using the helper above
print("\n=== Detecting the most likely real animal... ===")

# >>>>>>>>>>>>>>>>>>>
# TODO: Produce the per-animal summary (1 line). Try conf_thresh=0.5 first.
# YOUR CODE (1 line) : summary = ...
summary = animal_activity_summary(df, conf_thresh=0.5)
# <<<<<<<<<<<<<<<<<<<

print("\n--- Active-animal summary (sorted) ---")
display(summary)
```


```text
=== Detecting the most likely real animal... ===

--- Active-animal summary (sorted) ---
```

| animal | mean_likelihood | frac_conf | mean_xy_var |
|---|---|---|---|
| animal0 | 0.870497 | 0.932481 | 32284.869155 |
| animal1 | -0.998751 | 0.000601 | 4015.845365 |
| animal2 | -1.0 | 0.0 | NaN |
| animal3 | -1.0 | 0.0 | NaN |
| animal4 | -1.0 | 0.0 | NaN |
| animal5 | -1.0 | 0.0 | NaN |
| animal6 | -1.0 | 0.0 | NaN |
| animal7 | -1.0 | 0.0 | NaN |
| animal8 | -1.0 | 0.0 | NaN |
| animal9 | -1.0 | 0.0 | NaN |


The summary table helps you verify that:

- High `mean_xy_var` + high `frac_conf` = “This animal is real and moving.”
- Low or NaN variance + low confidence = “This is an empty placeholder.”

**How to interpret the result (threshold = 0.5)**

Looking at the top animal:
- `mean_likelihood` ≈ 0.87 → very high-confidence detections (1.0 is the maximum).
- `frac_conf` ≈ 0.93 → about 93% of points are at or above the threshold (0.5).
- `mean_xy_var` ≈ 32,000 px² → strong variation of the x/y coordinates over time (a real animal moves across frames); its square root corresponds to a typical spread of roughly 180 px.

This is the real mouse in your video and should be kept.

All other “animals” are empty or nearly empty slots (`animal1` has only 0.06% confident points), because the model's configuration reserves space for up to 10 individuals.
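Picking `summary.index[0]` in the next step assumes there is exactly one real animal. A small guard makes that assumption explicit. This is a sketch: `active_slots` is a hypothetical helper, the 0.5 cut-off on `frac_conf` is an illustrative choice (not a SuperAnimal default), and `toy_summary` is a made-up stand-in loosely modeled on the table above.

```python
import pandas as pd

def active_slots(summary_table: pd.DataFrame, min_frac_conf: float = 0.5) -> list:
    """Return slots whose frac_conf exceeds min_frac_conf (hypothetical cut-off)."""
    return summary_table[summary_table["frac_conf"] > min_frac_conf].index.tolist()

# Stand-in for the real summary table
toy_summary = pd.DataFrame(
    {"mean_likelihood": [0.87, -0.999, -1.0], "frac_conf": [0.93, 0.0006, 0.0]},
    index=pd.Index(["animal0", "animal1", "animal2"], name="animal"),
)

active = active_slots(toy_summary)
if len(active) != 1:
    print(f"Warning: expected one active animal, found {len(active)}: {active}")
print("Active slots:", active)
```

In a multi-mouse recording the warning would fire, which is the cue to keep several slots instead of only the first row.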

### 2.5 Isolate one animal's pose data and prepare it for downstream analysis

📝 **Instructions**

Now that you have the per-animal summary, your goal is to **pick the most likely real animal**.

1. 🧩 Complete the line marked with `# TODO`: from the `summary` DataFrame, select the **index** label of the most active animal (the first row after sorting).
2. Store it in a variable called `best_animal`.

Once you've identified `best_animal`, the rest of the code isolates its pose data as follows:

1. **Slice** the original DataFrame `df` to keep only the data for `best_animal`.
   - `.xs(best_animal, axis=1, level="individuals")` extracts the correct sub-DataFrame.
2. **Flatten** the remaining multi-index column names (scorer, bodypart, coord) into a single level for easier handling.
   - For example, `(scorer, "nose", "x")` → `"nose_x"`.
3. Store the cleaned result in a new variable called `df_one`.



```python
# >>>>>>>>>>>>>>>>>>>
# TODO: Pick the most likely real animal from the summary index (1 line)
# Hint: it's the first row after sorting, so index[0]
# YOUR CODE HERE (1 line) : best_animal = ...
best_animal = summary.index[0]
# <<<<<<<<<<<<<<<<<<<

print(f"Best animal picked: {best_animal}")

# Slice that animal and flatten columns to 'bodypart_coord' for simple downstream use
A = df.xs(best_animal, axis=1, level="individuals")   # columns: (scorer, bodyparts, coords)

# Flatten multi-index columns to a single level like 'nose_x', 'nose_likelihood' (1 line)
A.columns = [f"{bp}_{coord}" for _, bp, coord in A.columns]

df_one = A  # keep a clear name for the single-animal dataframe

print("Single-animal DataFrame shape:", df_one.shape)
display(df_one.head())
```

```text
Best animal picked: animal0
Single-animal DataFrame shape: (9000, 81)
```

| frame | nose_x | nose_y | nose_likelihood | left_ear_x | left_ear_y | left_ear_likelihood | right_ear_x | right_ear_y | right_ear_likelihood | left_ear_tip_x | ... | right_midside_likelihood | right_hip_x | right_hip_y | right_hip_likelihood | tail_end_x | tail_end_y | tail_end_likelihood | head_midpoint_x | head_midpoint_y | head_midpoint_likelihood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 783.023438 | 262.625 | 1.0 | 797.960938 | 273.875 | 1.0 | 812.898438 | 255.125 | 1.0 | 801.695312 | ... | 1.0 | 850.242188 | 307.625 | 1.0 | 831.570312 | 435.125 | 0.639258 | 794.226562 | 266.375 | 1.0 |
| 1 | 777.289062 | 261.625 | 1.0 | 795.960938 | 276.625 | 1.0 | 807.164062 | 257.875 | 1.0 | 795.960938 | ... | 1.0 | 851.976562 | 306.625 | 1.0 | 837.039062 | 434.125 | 0.832935 | 788.492188 | 265.375 | 1.0 |
| 2 | 776.617188 | 265.71875 | 1.0 | 791.429688 | 276.78125 | 1.0 | 806.242188 | 258.34375 | 1.0 | 795.132812 | ... | 1.0 | 850.679688 | 298.90625 | 1.0 | 835.867188 | 435.34375 | 0.679971 | 787.726562 | 265.71875 | 0.996491 |
| 3 | 776.054688 | 268.445312 | 1.0 | 790.664062 | 277.210938 | 0.981241 | 802.351562 | 259.679688 | 1.0 | 796.507812 | ... | 1.0 | 849.101562 | 303.507812 | 0.941554 | 852.023438 | 399.929688 | 0.421167 | 787.742188 | 271.367188 | 0.817191 |
| 4 | 775.367188 | 271.132812 | 1.0 | 793.648438 | 277.226562 | 1.0 | 802.789062 | 258.945312 | 1.0 | 796.695312 | ... | 1.0 | 848.492188 | 301.601562 | 1.0 | 863.726562 | 399.101562 | 0.708868 | 784.507812 | 274.179688 | 0.836767 |
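The list comprehension above is one way to flatten the columns. An equivalent sketch uses `MultiIndex.droplevel` and `Index.map`, shown here on a made-up single-animal table (`toy_scorer` and the values are invented) so it doesn't overwrite the notebook's `A` or `df_one`:

```python
import pandas as pd

# Made-up single-animal table with the (scorer, bodyparts, coords) levels left after .xs(...)
columns = pd.MultiIndex.from_product(
    [["toy_scorer"], ["nose", "left_ear"], ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
toy_A = pd.DataFrame([[1.0, 2.0, 0.9, 3.0, 4.0, 0.8]], columns=columns)

# Drop the constant 'scorer' level, then join each (bodypart, coord) tuple with "_"
flat = toy_A.copy()
flat.columns = toy_A.columns.droplevel("scorer").map("_".join)
print(flat.columns.tolist())

# Flat names make column selection a simple pattern match
likelihoods = flat.filter(regex=r"_likelihood$")
print(likelihoods.columns.tolist())
```

Because the coordinate names contain no underscore, `name.rsplit("_", 1)` splits a flat name such as `left_ear_x` back into `["left_ear", "x"]` when needed.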


```python
# Likelihood-based QC per body part for the selected animal (H5 data only).
# SuperAnimal uses -1 as a "no detection" placeholder: convert values < 0 to NaN before stats.
# df_one has flat columns such as 'nose_likelihood', so select those and strip the suffix.
L = df_one.filter(regex=r"_likelihood$")                        # (frames, bodyparts)
L.columns = L.columns.str.replace(r"_likelihood$", "", regex=True)
L_valid = L.where(L >= 0)                                        # drop -1 placeholder -> NaN

per_bp = pd.DataFrame({
    'coverage'        : L_valid.notna().mean(axis=0),          # fraction of frames with any detection
    'frac_conf>=0.5'  : (L_valid >= 0.5).mean(axis=0),         # fraction of frames confidently detected
    'mean_likelihood' : L_valid.mean(axis=0),                  # average likelihood (ignoring -1)
}).sort_values(['frac_conf>=0.5', 'coverage', 'mean_likelihood'], ascending=False)
per_bp.index.name = 'bodypart'

print(f"\n=== Per-bodypart QC for {best_animal} (top 10) ===")
display(per_bp.head(10))

# -------------------------
# (Optional) Compute duration if FPS is known
# -------------------------
def compute_duration_from_df(df: pd.DataFrame, fps: float) -> dict:
    """
    Returns frames, fps, seconds, minutes.
    Note: DLC H5 usually doesn't store FPS; you must supply it (e.g., from the source video).
    """
    if fps <= 0:
        raise ValueError("FPS must be > 0.")
    n_frames = int(df.shape[0])
    seconds = n_frames / fps
    return {
        "frames": n_frames,
        "fps": float(fps),
        "seconds": float(seconds),
        "minutes": float(seconds / 60.0),
    }
```

## 3. Metadata & basic summary

👉 Goal: extract key metadata and get a first impression of data quality.

We'll start by inspecting general information about this recording and how reliably each body part was detected.

### What you'll do
- 3.1 Print **frame rate**, **duration**, and **number of frames**  
- 3.2 Compute the **percentage of missing or low-confidence points** per body part  
- 3.3 Create a **summary table** of likelihoods for each body part  

These steps will help you answer questions such as:  
> Which body part is most reliably detected?  
> Which one tends to be missing or uncertain?  

### 3.1 Basic metadata exploration

📝 **Instructions**
1. 🧩 Complete the line marked with `# TODO`
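The duration is simply the number of frames divided by the frame rate. Because pose H5 files usually don't store the FPS, it has to come from the source video; the 30 fps below is an assumption for illustration only, while the 9000 frames match the shape printed in section 2.2. Working backwards from a known video length is also a useful consistency check on the frame rate.

```python
n_frames = 9000          # rows in df, as printed in section 2.2
fps = 30.0               # ASSUMPTION: replace with the frame rate of your source video

seconds = n_frames / fps
minutes = seconds / 60
print(f"frames={n_frames}, fps={fps}, duration={seconds:.1f} s ({minutes:.2f} min)")

# Other direction: a known video length implies a frame rate
for video_minutes in (5, 10):
    print(f"{video_minutes}-min video -> {n_frames / (video_minutes * 60):.1f} fps")
```

`compute_duration_from_df(df, fps)` defined above returns the same numbers as a dictionary.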

## 4. Likelihood distributions


👉 Goal: visualize the reliability of detections.
- Histograms of likelihood per bodypart
- Violin plots comparing bodyparts
- Fraction of frames below confidence threshold

Exercise 3:
Compare the tail base vs nose likelihood distributions. What do you observe?
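As a numeric companion to the histograms, this sketch computes, per body part, the fraction of frames that are missing or below a confidence threshold, plus histogram bin counts with NumPy. It assumes flat `<bodypart>_likelihood` columns like those in `df_one`; the data are random and made up, and 0.5 is the threshold used earlier in the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up likelihoods (flat columns as in df_one): a reliable 'nose' and a noisier 'tail_base'
toy = pd.DataFrame({
    "nose_likelihood": rng.uniform(0.8, 1.0, size=500),
    "tail_base_likelihood": rng.uniform(0.0, 1.0, size=500),
})

lik = toy.filter(regex=r"_likelihood$")
lik.columns = lik.columns.str.replace(r"_likelihood$", "", regex=True)
lik_valid = lik.where(lik >= 0)                  # -1 placeholders (if any) -> NaN

# Missing or below 0.5 both count as low confidence
frac_low = (~(lik_valid >= 0.5)).mean().sort_values(ascending=False)
print(frac_low)

# Histogram counts per body part (10 bins over [0, 1]), e.g. to feed a bar plot
bins = np.linspace(0.0, 1.0, 11)
counts = {bp: np.histogram(lik_valid[bp].dropna(), bins=bins)[0] for bp in lik_valid.columns}
print(counts["nose"])
```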

## 5. Time series inspection

👉 Goal: detect failures and instability across time.
- Plot time series of x,y positions for nose (or other keypoints)
- Plot likelihood as a function of time

Exercise 4:
Spot at least two segments where the model clearly failed (likelihood drops).
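To locate failure segments without scanning plots by eye, you can look for runs of consecutive frames whose likelihood is below a threshold. A minimal sketch: `low_confidence_segments` is a hypothetical helper, it assumes a likelihood Series indexed by consecutive integer frames (0, 1, 2, ...), and the minimum run length is an illustrative parameter.

```python
import numpy as np
import pandas as pd

def low_confidence_segments(likelihood: pd.Series, thresh: float = 0.5, min_len: int = 3) -> pd.DataFrame:
    """Start/end frame (inclusive) of runs where likelihood < thresh or is missing."""
    low = ~(likelihood >= thresh)                         # NaN counts as low
    # A new run starts wherever 'low' changes value; cumsum turns that into run labels
    run_id = low.astype(int).diff().ne(0).cumsum()
    frames = likelihood.index.to_series()
    runs = frames[low].groupby(run_id[low]).agg(["first", "last"])
    runs.columns = ["start", "end"]
    runs["length"] = runs["end"] - runs["start"] + 1      # assumes consecutive integer frames
    return runs[runs["length"] >= min_len].reset_index(drop=True)

# Made-up nose likelihoods over 10 frames
lik = pd.Series([0.9, 0.95, 0.2, 0.1, 0.3, 0.9, np.nan, 0.4, 0.99, 0.2])
seg = low_confidence_segments(lik, thresh=0.5, min_len=2)
print(seg)
```

On the notebook data you could pass `df_one["nose_likelihood"].where(df_one["nose_likelihood"] >= 0)` so that `-1` placeholders count as missing.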

## 6. Spatial distributions

👉 Goal: understand where in the arena each bodypart was detected.
- Scatter plot of nose positions
- Kernel density estimate heatmap of occupancy
- Overlay all bodypart scatter plots

Exercise 5:
Does the mouse explore the arena uniformly or are there preferences (corners, walls)?
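A KDE heatmap is one option; a 2D histogram of positions is a simpler occupancy map that also makes it easy to quantify wall preference. The sketch below uses made-up random positions, and the arena bounds and the 50 px wall margin are assumptions that you would need to measure from your own video.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up nose positions (pixels) inside a hypothetical 1000 x 800 px arena
x = rng.uniform(0, 1000, size=2000)
y = rng.uniform(0, 800, size=2000)

x_min, x_max, y_min, y_max = 0.0, 1000.0, 0.0, 800.0   # ASSUMED arena bounds
margin = 50.0                                          # ASSUMED "near wall" distance in px

# Occupancy map: fraction of frames spent in each of 20 x 16 spatial bins
counts, x_edges, y_edges = np.histogram2d(x, y, bins=[20, 16], range=[[x_min, x_max], [y_min, y_max]])
occupancy = counts / counts.sum()

# Distance to the closest wall for every frame, then the fraction of time near a wall
dist_to_wall = np.minimum.reduce([x - x_min, x_max - x, y - y_min, y_max - y])
frac_near_wall = np.mean(dist_to_wall < margin)
print(f"Fraction of frames within {margin:.0f} px of a wall: {frac_near_wall:.2f}")
```

For an animal exploring uniformly, this fraction would be close to the share of arena area inside the margin (1 - 900·700 / (1000·800) ≈ 0.21 here); a clearly larger value points to a wall preference. `np.histogram2d` returns counts with shape (x bins, y bins), so pass `occupancy.T` to `plt.imshow` to keep x on the horizontal axis.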

## 7. Visual diagnostics 

👉 Goal: overlay skeletons on frames and create animations.
- Pick random frames and overlay skeleton on image
- Short animation (GIF or video snippet) of 200 frames with skeleton overlay
- Compare clear video vs challenging video

Exercise 6:
Compare skeleton overlays between clear and noisy video. What errors do you see?
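Drawing a skeleton means connecting pairs of keypoints with line segments. The sketch below draws one frame on a blank image with matplotlib. The `SKELETON` edge list is a hypothetical subset of the SuperAnimal body parts (not an official skeleton definition), the keypoints are made up, and `draw_skeleton` is an illustrative helper. With real data you would read `<bodypart>_x`, `_y`, and `_likelihood` from `df_one` and use a video frame as the background.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical skeleton: pairs of body parts to connect
SKELETON = [("nose", "head_midpoint"), ("head_midpoint", "neck"),
            ("neck", "mid_back"), ("mid_back", "tail_base"), ("tail_base", "tail_end")]

# Made-up keypoints for one frame: {bodypart: (x, y, likelihood)}
frame_kp = {"nose": (780, 262, 1.0), "head_midpoint": (794, 266, 1.0), "neck": (805, 280, 0.97),
            "mid_back": (820, 320, 0.95), "tail_base": (830, 370, 0.9), "tail_end": (832, 435, 0.3)}

def draw_skeleton(ax, kp, edges, min_likelihood=0.5):
    """Draw only edges whose two endpoints pass the likelihood threshold; return how many were drawn."""
    n = 0
    for a, b in edges:
        (xa, ya, la), (xb, yb, lb) = kp[a], kp[b]
        if la >= min_likelihood and lb >= min_likelihood:
            ax.plot([xa, xb], [ya, yb], "-o", color="tab:orange", markersize=3)
            n += 1
    return n

fig, ax = plt.subplots(figsize=(5, 3))
ax.imshow(np.zeros((600, 1000)), cmap="gray")     # stand-in for a video frame
n_drawn = draw_skeleton(ax, frame_kp, SKELETON)
ax.set_title(f"{n_drawn} of {len(SKELETON)} edges drawn")
```

Skipping low-likelihood edges (here the `tail_base` to `tail_end` link) makes dropouts visible as missing segments instead of drawing lines to wrong positions.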

## 8. Outlier & Error Detection

👉 Goal: identify extreme jumps and suspicious frames.
- Compute frame-to-frame displacement for each keypoint
- Histogram of displacements; flag outliers
- Mark “bad frames” with low likelihood or jumps

Exercise 7:
How many frames of the tail tip exceed a jump threshold of 30 pixels?
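Frame-to-frame displacement is the Euclidean distance between consecutive positions. Here is a sketch on a made-up `tail_end` trajectory, using the 30-pixel threshold from the exercise:

```python
import numpy as np
import pandas as pd

# Made-up tail_end trajectory (pixels) with one spurious position at frame 3
traj = pd.DataFrame({
    "tail_end_x": [830.0, 832.0, 835.0, 900.0, 838.0, 840.0],
    "tail_end_y": [435.0, 434.0, 435.0, 400.0, 436.0, 437.0],
})

dx = traj["tail_end_x"].diff()
dy = traj["tail_end_y"].diff()
displacement = np.hypot(dx, dy)            # NaN for frame 0 (no previous frame)

JUMP_PX = 30.0
jump_frames = displacement[displacement > JUMP_PX].index.tolist()
print(displacement.round(1).tolist())
print("Frames with jumps >", JUMP_PX, "px:", jump_frames)
```

A single misplaced point produces two jumps (into and out of the wrong position), so counting flagged frames can overestimate the number of tracking errors by up to a factor of two.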

## 9. Filtering & Correction

👉 Goal: correct noisy or missing data.
- Apply interpolation to missing points
- Apply smoothing (rolling median or spline)
- Compare raw vs corrected trajectories

Exercise 8:
Apply interpolation to the `left_ear` trajectory and plot it before vs after.
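A minimal sketch of both corrections on a made-up `left_ear_x` trace: low-likelihood points are masked to NaN, gaps are filled by linear interpolation, and a centered rolling median then suppresses an isolated spike that the likelihood did not catch. The 0.5 threshold, the 5-frame gap limit, and the 3-frame window are illustrative choices.

```python
import pandas as pd

# Made-up left_ear x trajectory: two low-confidence frames and one confident spike (frame 6)
x = pd.Series([100.0, 101.0, 140.0, 103.0, 104.0, 105.0, 180.0, 107.0, 108.0])
lik = pd.Series([0.99, 0.98, 0.20, 0.97, 0.30, 0.99, 0.95, 0.98, 0.99])

x_masked = x.where(lik >= 0.5)                                  # drop unreliable points -> NaN
x_interp = x_masked.interpolate(method="linear", limit=5, limit_area="inside")
x_smooth = x_interp.rolling(window=3, center=True, min_periods=1).median()

print(pd.DataFrame({"raw": x, "interpolated": x_interp, "smoothed": x_smooth}))
```

`limit_area="inside"` only fills gaps surrounded by valid values, so the start and end of a recording are not extrapolated. With `min_periods=1`, the first and last frames are smoothed over fewer points.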

## 10. Comparative Analysis (Clear vs Noisy Video)

👉 Goal: see how conditions affect pose quality.
- Load outputs from two videos: one clear, one dark/low contrast
- Create summary table: % of low-confidence frames per bodypart
- Violin plots comparing likelihood distributions

Exercise 9:
Which video shows more missingness for the nose keypoint? Why might that be?
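One way to build the comparison table is to compute the percentage of low-confidence frames per body part for each file and place the results side by side. In this sketch, `pct_low_confidence` is a hypothetical helper, and the two DataFrames are small made-up stand-ins for the clear and dark videos, with flat `<bodypart>_likelihood` columns as in `df_one`. Missing (`-1`) or < 0.5 counts as low confidence.

```python
import pandas as pd

def pct_low_confidence(df_flat: pd.DataFrame, thresh: float = 0.5) -> pd.Series:
    """Percentage of frames per body part that are missing (-1) or below thresh."""
    L = df_flat.filter(regex=r"_likelihood$")
    L.columns = L.columns.str.replace(r"_likelihood$", "", regex=True)
    L_valid = L.where(L >= 0)
    return 100 * (~(L_valid >= thresh)).mean()

clear = pd.DataFrame({"nose_likelihood": [0.99, 0.97, 0.95, 0.40],
                      "tail_base_likelihood": [0.90, 0.92, 0.88, 0.91]})
dark = pd.DataFrame({"nose_likelihood": [0.30, -1.0, 0.60, 0.20],
                     "tail_base_likelihood": [0.85, 0.45, 0.90, 0.80]})

comparison = pd.concat({"clear": pct_low_confidence(clear), "dark": pct_low_confidence(dark)}, axis=1)
comparison.index.name = "bodypart"
print(comparison)
```

The same function applied to the `df_one` of each downloaded file gives the real table, and the per-video likelihood columns can be reused for the violin plots.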

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LizbethMG-Teaching/pose2behav-book/blob/main/notebooks/EDA.ipynb)