# DLC Preprocessing for Barnes Maze Videos

Author: Rafael S. Bessa

This Jupyter Notebook provides a step-by-step pipeline for pre-processing DeepLabCut (DLC) tracking data from animal behavior videos recorded in a Barnes Maze task.

It is designed for users with little or no programming experience and offers a semi-automated routine to:

- Select video and DLC tracking output files

- Set analysis parameters

- Detect trial onsets and offsets

- Evaluate and visualize body part labeling likelihood

- Define the spatial coordinates of maze holes

- Identify and correct mislabeled frames with low likelihood

### 📋 Main Sections

1. Import libraries

2. Select files – Choose the video and tracking coordinate files

3. Set analysis parameters – Define thresholds and general settings

4. Detect trials and assess labeling quality – Identify trial segments and evaluate label confidence

5. Define maze hole coordinates

   5.1. _Optionally correct maze hole coordinates_

6. Find and correct mislabeled frames – Detect low-likelihood labels and correct them manually

### 🧭 How to Use This Notebook

Each code cell is structured as an independent analysis module.
Although the notebook is designed to be executed in order from top to bottom, users may choose to run only specific cells.

> ⚠️ **Attention:** When skipping steps or running cells out of order, make sure that all required input variables are already defined.


In [1]:
# 1) Import libraries and custom functions

# Standard libraries
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('TkAgg')  # Open plots in separate windows

# Custom modules with helper functions
from data_handling_mod import *            # Functions for file handling and hole definition
from Evaluate_frame_labels import *        # Functions for assessing labeling likelihood and correction


💡 _These custom modules (data_handling_mod, Evaluate_frame_labels) must be in the same folder as this notebook or properly installed in your environment._

In [2]:
# 2) Select files (video and DLC tracking data)

# Choose where to save output results (e.g., figures, corrected files)
path = '/home/rafael/Downloads/Dados Parkinson LANEC/Dados Barnes/Coords processed/'  # <-- Replace with your desired folder path

# Select input video files (.mp4, .avi, .mov, .mkv, .webm formats)
# Set multiple=False if you want to select only one video
video_files = select_file_video(multiple=True)

# Select DLC tracking data files (.h5 format)
coord_files = select_file_coord(multiple=True)

📁 _A file selection window will appear. Make sure the number of selected video and DLC files matches!_
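
A quick sanity check on the two selections can catch a mismatch before any processing starts. The sketch below is not part of the notebook's custom modules: `check_file_pairs` is a hypothetical helper, and it assumes that each DLC output file name begins with the name of its video (without extension), which is the usual DLC naming pattern but may not hold for renamed files.

```python
from pathlib import Path


def check_file_pairs(video_files, coord_files):
    """Pair videos with DLC files by position and flag suspicious pairs.

    Hypothetical helper (not from data_handling_mod). Assumes both lists
    are in the same order and that each DLC file name starts with the
    video file name without its extension.
    """
    if len(video_files) != len(coord_files):
        raise ValueError(
            f"{len(video_files)} video(s) selected but "
            f"{len(coord_files)} DLC file(s) selected"
        )
    pairs = list(zip(video_files, coord_files))
    # Keep the pairs whose file names do not look like they belong together
    mismatched = [
        (video, coord)
        for video, coord in pairs
        if not Path(coord).name.startswith(Path(video).stem)
    ]
    return pairs, mismatched


# Usage with the lists returned in Step 2:
# pairs, mismatched = check_file_pairs(video_files, coord_files)
# if mismatched:
#     print("Check these pairs:", mismatched)
```

An empty `mismatched` list does not prove the pairing is right; it only means the file names are consistent with the assumed naming pattern.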

In [3]:
# 3) Set initial parameters for analysis

conf_threshold = 0.95   # Minimum confidence (likelihood) required for a body part to be considered valid
n_bp = 9                # Number of labeled body parts (as defined in your DLC project)
nholes = 12             # Number of holes in the Barnes Maze

⚙️ _These values can be adjusted depending on your experimental setup or labeling quality._

### Step 4: Detect Trials and Evaluate Labeling Confidence

In this step, we call the function `check_frames_likelihood()`.

This function identifies the **first and last frames** in which at least a certain percentage of body parts exceed the likelihood threshold. These frames are considered the start and end of a valid trial.

For each video, the function:
- Extracts frame snapshots at trial **onset** and **offset**
- Also shows frames **before and after** those events (±3 seconds by default)
- Plots **likelihood curves** and **pie charts** for each body part
- Summarizes all results in a **DataFrame**
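
To make the onset/offset rule concrete, here is a minimal sketch of the idea using NumPy. It is not the implementation of `check_frames_likelihood()`: the function name `find_trial_bounds` and the parameter `min_fraction` (the share of body parts that must exceed the threshold) are assumptions introduced for illustration, and the actual percentage used by the notebook's function may differ.

```python
import numpy as np


def find_trial_bounds(likelihood, conf_threshold=0.95, min_fraction=0.5):
    """Return (onset, offset) frame indices, or (None, None) if no frame qualifies.

    likelihood: array of shape (n_frames, n_bodyparts).
    min_fraction: assumed share of body parts that must exceed the threshold.
    """
    likelihood = np.asarray(likelihood, dtype=float)
    # Fraction of body parts above the threshold in each frame
    frac_valid = (likelihood > conf_threshold).mean(axis=1)
    valid_frames = np.flatnonzero(frac_valid >= min_fraction)
    if valid_frames.size == 0:
        return None, None
    # First and last frames that satisfy the criterion
    return int(valid_frames[0]), int(valid_frames[-1])


# Toy example: 5 frames, 2 body parts
demo = np.array([
    [0.10, 0.20],
    [0.99, 0.50],
    [0.99, 0.99],
    [0.20, 0.99],
    [0.10, 0.10],
])
onset, offset = find_trial_bounds(demo, conf_threshold=0.95, min_fraction=0.5)
print(onset, offset)  # prints: 1 3
```

Raising `min_fraction` makes the criterion stricter, so the detected trial window can only shrink.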

> ⚠️ **IMPORTANT**: If your machine has limited memory and you are processing **many videos (>100)**, consider running this step in smaller batches by splitting `video_files` and `coord_files`.
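
One possible way to split the file lists into batches is sketched below. `iter_batches` and `batch_size` are hypothetical names, not part of the notebook's modules; the commented usage assumes that `check_frames_likelihood()` returns one DataFrame per call, as described above, and that calling it several times with the same output folder is safe.

```python
def iter_batches(video_files, coord_files, batch_size=20):
    """Yield (start_index, video_batch, coord_batch) in consecutive chunks."""
    if len(video_files) != len(coord_files):
        raise ValueError("video_files and coord_files must have the same length")
    for start in range(0, len(video_files), batch_size):
        stop = start + batch_size
        yield start, video_files[start:stop], coord_files[start:stop]


# Possible usage (assumption: results of separate calls can be concatenated):
# frames = []
# for start, v_batch, c_batch in iter_batches(video_files, coord_files, 20):
#     frames.append(
#         check_frames_likelihood(v_batch, c_batch, conf_threshold, n_bp,
#                                 path, plt_format='png')
#     )
# video_info = pd.concat(frames, ignore_index=True)
```

The start index is returned with each batch so that results can be traced back to positions in the original lists.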

In [None]:
# 4) Detect trial onset/offset and evaluate label quality using likelihood scores

filename = 'Video_info.csv'  # Output filename for results

# Run the evaluation function
video_info = check_frames_likelihood(
    video_files,
    coord_files,
    conf_threshold,
    n_bp,
    path,
    plt_format='png'  # Output format for figures
)

# Save the resulting DataFrame with trial info and statistics
video_info.to_csv(path + filename, index=False)

### Step 5: Define the Spatial Coordinates of Maze Holes

Now we define the positions of the holes in the Barnes Maze using the function `define_hole_position()`.

This function can work in two modes:

- **Automatic mode (`dv=0`)**: Coordinates are extracted directly from the DLC output table (if holes were labeled).
- **Manual mode (`dv=1`)**: The user manually clicks on hole locations using the plotted frame.

To use automatic mode, you must specify the frame index (`frame_idx`) that will be used to extract the coordinates.
This frame should contain clearly visible and well-labeled holes.
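
As an illustration of what automatic extraction involves, the sketch below reads hole positions from a single frame of a DLC-style table. It is not the code of `define_hole_position()`: the helper `holes_from_frame` and the label names `hole1`, `hole2` are hypothetical, and the table is synthetic. It assumes the standard DLC layout, with three column levels (scorer, body part, and `x`/`y`/`likelihood`).

```python
import numpy as np
import pandas as pd


def holes_from_frame(dlc_df, hole_labels, frame):
    """Return a DataFrame with the x/y position of each hole label at one frame."""
    scorer = dlc_df.columns.get_level_values(0)[0]  # single scorer assumed
    row = dlc_df.iloc[frame]
    return pd.DataFrame(
        {
            "x": [row[(scorer, label, "x")] for label in hole_labels],
            "y": [row[(scorer, label, "y")] for label in hole_labels],
        },
        index=hole_labels,
    )


# Synthetic table with 3 frames and 2 hypothetical hole labels
hole_labels = ["hole1", "hole2"]
columns = pd.MultiIndex.from_product(
    [["scorer"], hole_labels, ["x", "y", "likelihood"]],
    names=["scorer", "bodyparts", "coords"],
)
dlc_df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)

coords = holes_from_frame(dlc_df, hole_labels, frame=1)
print(coords)
```

With real data, the table would come from `pd.read_hdf()` on a DLC `.h5` file, and the frame would be the trial onset plus `frame_idx`.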

In [None]:
# 5) Get/Define holes positions

frame_idx = 15  # Frame offset used to plot/extract coordinates (frame used = frame_onset + frame_idx)
hole_coords = define_hole_position(video_files, coord_files, video_info, nholes, n_bp, path, frame_idx, dv=0)
hole_coords.to_csv(path + 'Hole_Coords.csv', index=False, float_format='%.2f', decimal='.') # Save output dataframe

### Step 5.1 (Optional): Manually Correct Maze Hole Coordinates

Sometimes, one or more hole coordinates may be incorrect in the selected frame.

In such cases, you can **rerun** `define_hole_position()`, either with a different frame index or in **manual mode**. For manual correction, pass:

- The previously generated `hole_coords` DataFrame
- A vector of indexes (NumPy array) indicating which videos need correction

> ℹ️ These indexes refer to the **rows of the `hole_coords` DataFrame** (i.e., the order of the videos).

> ⚠️ **Important**: Be careful **not to overwrite** the original file unless you're sure you want to replace it. Remember to change the DataFrame name when saving it.

In [None]:
# Optional: Correct hole coordinates for specific videos

# Example: correct hole positions for the videos at rows 1 and 3 (0-based, i.e., the 2nd and 4th videos)
error_indices = np.array([1, 3])
# Load previous hole coords dataframe (.csv)
old_coords = pd.read_csv('/Directory_path/Old_Coords.csv')  # <-- Replace with the full path to your saved hole coordinates file

# Rerun in manual mode (dv=1)
hole_coords_corrected = define_hole_position(
    video_files,
    coord_files,
    video_info,
    nholes,
    n_bp,
    path,
    frame_idx,
    dv=1,
    df=old_coords,
    idx=error_indices
)

# Save corrected version if needed
hole_coords_corrected.to_csv(path + 'Hole_Coords_Corrected.csv', index=False)


### Step 6: Identify and Correct Low-Confidence Frames

In this step, we use the function `fix_frames_likelihood()` to identify and correct frames where body part tracking confidence is too low.

This function:
- Automatically detects sequences of **low-likelihood frames**
- Prompts the user to **manually re-label** the body part positions on selected frames
- Then uses **interpolation** to fill in the remaining missing/corrected coordinates across the time series

This is especially useful when the animal is partially occluded, misdetected, or jumps in the frame.
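
The two automatic parts of this step, finding runs of low-likelihood frames and interpolating over them, can be sketched as follows. This is an illustration under stated assumptions, not the code of `fix_frames_likelihood()`: the helper names are hypothetical, a run is kept when it spans at least `seq_thre` frames, and linear interpolation is used, which may differ from what the notebook's function does.

```python
import numpy as np
import pandas as pd


def low_likelihood_runs(likelihood, conf_threshold=0.95, seq_thre=15):
    """Return (start, stop) pairs (stop exclusive) for runs of consecutive
    frames below the threshold that are at least seq_thre frames long."""
    low = np.asarray(likelihood, dtype=float) < conf_threshold
    # Pad with False so runs touching either end still have two edges
    padded = np.concatenate(([False], low, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    starts, stops = edges[::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= seq_thre]


def interpolate_low_confidence(values, likelihood, conf_threshold=0.95):
    """Mask low-likelihood samples and fill them by linear interpolation."""
    series = pd.Series(np.array(values, dtype=float))  # np.array makes a copy
    series[np.asarray(likelihood, dtype=float) < conf_threshold] = np.nan
    return series.interpolate(method="linear", limit_direction="both").to_numpy()


# Toy example: one coordinate over 6 frames, with 2 unreliable frames
x = [0.0, 1.0, 100.0, 100.0, 4.0, 5.0]
lik = [1.0, 1.0, 0.1, 0.1, 1.0, 1.0]
runs = low_likelihood_runs(lik, conf_threshold=0.95, seq_thre=2)
x_fixed = interpolate_low_confidence(x, lik, conf_threshold=0.95)
print(runs)     # prints: [(2, 4)]
print(x_fixed)  # prints: [0. 1. 2. 3. 4. 5.]
```

Within a detected run, frames for manual re-labeling could then be sampled with `range(start, stop, frame_jump)`; whether the notebook's function samples them exactly this way is an assumption.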


In [None]:
# 6) Identify and fix unreliable labels

# List of body parts to check and correct (must match DLC labels exactly)
bp_list = ['head_centre', 'neck', 'body_centre', 'tail_base']

seq_thre = 15     # Minimum sequence length (in frames) to consider for correction
frame_jump = 10   # Number of frames to skip when sampling frames for manual correction

# Run the correction function for all selected files
fix_frames_likelihood(
    video_files,
    coord_files,
    video_info,
    conf_threshold,
    bp_list,
    seq_thre,
    frame_jump,
    path
)


✍️ _You will be asked to manually click with the mouse on the body part positions for each selected frame._
- Left button: add point
- Middle button: undo last click
- Right button: skip point (insert NaN) 

### Optional: Run Corrections on a Subset of Files

Sometimes, you may want to analyze **only a subset** of your video files — either to test the routine or to skip already-processed videos.

You can do this in two ways:
1. In **Step 2**, manually select only the files you want to work with.
2. Or, use Python indexing to select a range directly from the `video_files` and `coord_files` lists, as shown below.

In [4]:
# Run correction on a specific range of files using Python indexing

# video_info = pd.read_csv('/home/rafael/Downloads/Dados Parkinson LANEC/Dados Barnes/Video_info.csv')
bp_list = ['head_centre', 'neck', 'body_centre', 'tail_base']
seq_thre = 15
frame_jump = 10

# Example: Define the range of videos to process
id_on = 5   # Start index (inclusive)
id_off = 7  # End index (exclusive)

# Run correction for selected files only
fix_frames_likelihood(
    video_files[id_on:id_off],
    coord_files[id_on:id_off],
    video_info.iloc[id_on:id_off, :],
    conf_threshold,
    bp_list,
    seq_thre,
    frame_jump,
    path,
    id_on
)


0


Press Enter to continue or 'q' to exit:  


1


Press Enter to continue or 'q' to exit:  


🎯 _This is useful when reprocessing just a few problematic files or working in batches._