# DataJoint Element for Pose Estimation with DeepLabCut

**Open-source Data Pipeline for Markerless Pose Estimation in Neurophysiology**

This tutorial provides a comprehensive overview of the open-source data pipeline built with `Element-DeepLabCut`.

![pipeline](../images/flowchart.svg)

The package is designed to simplify pose estimation analyses and streamline data organization using `DataJoint`. 

![pipeline](../images/pipeline.svg)

By the end of this tutorial, participants will have a clear grasp of how to set up and apply `Element-DeepLabCut` to their own pose estimation projects.

**Key Components and Objectives**

- **Setup**

- **Designing the DataJoint Pipeline**

- **Step 1: Register an Existing Model in the DataJoint Pipeline**

- **Step 2: Insert Subject, Session, and Behavior Videos**

- **Step 3: DeepLabCut Inference Task**

- **Step 4: Visualization of Results**

For detailed documentation and tutorials on general DataJoint principles that support collaboration, automation, reproducibility, and visualizations, see:

- [`DataJoint for Python - Interactive Tutorials`](https://github.com/datajoint/datajoint-tutorials), which covers fundamentals, including table tiers, query operations, fetch operations, automated computations with the `make` function, and more.

- [`DataJoint for Python - Documentation`](https://datajoint.com/docs/core/datajoint-python/0.14/)

- [`DataJoint Element for DeepLabCut - Documentation`](https://datajoint.com/docs/elements/element-deeplabcut/0.2/)

## Setup

This tutorial examines the behavior of a freely-moving mouse in an open-field environment. 

The goal is to extract pose estimations of the animal's head and tail base from video footage. 

This information offers valuable insights into the animal's movements, postures, and interactions within the environment. 

The results of this Element example can be combined with other modalities to create a complete data pipeline for your specific lab or study.

#### Prerequisites for Running Element-DeepLabCut

To run the Element, ensure that you have:

- A DeepLabCut (DLC) project folder on your machine.

- Labeled data in your DLC project folder.

This tutorial includes a DLC project folder with example data and its results in `example_data`. 
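
The two prerequisites above can be checked with a few lines of Python. This is only a minimal sketch: `check_dlc_project` is a hypothetical helper (not part of `element-deeplabcut`), and it assumes the usual DeepLabCut project layout with a `config.yaml` file and a `labeled-data` folder at the project root.

```python
from pathlib import Path
import tempfile


def check_dlc_project(project_dir):
    """Report whether the expected pieces of a DLC project folder exist.

    Hypothetical helper; assumes `config.yaml` and `labeled-data/` sit at
    the root of the project folder.
    """
    project_dir = Path(project_dir)
    labeled_dir = project_dir / "labeled-data"
    return {
        "config.yaml": (project_dir / "config.yaml").is_file(),
        # Labeled data counts as present if the folder has at least one entry.
        "labeled-data": labeled_dir.is_dir() and any(labeled_dir.iterdir()),
    }


# Demonstration on a throwaway folder that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_project"  # hypothetical project name
    (demo / "labeled-data" / "video1").mkdir(parents=True)
    (demo / "config.yaml").write_text("Task: demo\n")
    demo_report = check_dlc_project(demo)

print(demo_report)
```

Pointing the same function at your own project folder shows at a glance which piece, if any, is missing.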

In [None]:
import os

# Run from the repository root: step up if the notebook was launched from `notebooks/`
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
assert os.path.basename(os.getcwd()) == 'element-deeplabcut', (
    "Please move to the element directory"
)

Start by importing the packages needed to run this pipeline.

In [None]:
import datajoint as dj
from pathlib import Path
import yaml

This codespace provides a local database private to you for experimentation. Let's connect to the database server:

In [None]:
dj.conn()

## Design the DataJoint Pipeline

This tutorial assumes that `element-deeplabcut` is already configured and instantiated, with the database connected downstream from existing subject and session tables. Import schemas for subject, session, train, model, etc.:

In [None]:
from tutorial_pipeline import lab, subject, session, train, model  

In [None]:
(
    dj.Diagram(subject) 
    + dj.Diagram(lab) 
    + dj.Diagram(session) 
    + dj.Diagram(model) 
    + dj.Diagram(train)
)

As you can see, this data pipeline is quite extensive, with tables for other DLC components such as models, training, and evaluation. Some tables, such as `Subject`, are upstream of the Element and are not part of `element-deeplabcut` itself.

In [None]:
dj.Diagram(model) + dj.Diagram(train)

This diagram represents the `element-deeplabcut` pipeline.

## Step 1 - Register an Existing Model in the DataJoint Pipeline

A DeepLabCut model is defined in a specific folder structure with a `config.yaml` file that contains the model's specifications (see folder `example_data/inbox`). To "register" this DLC model with DataJoint, you can specify this config file:

In [None]:
config_file_rel = "./example_data/inbox/from_top_tracking-DataJoint-2023-10-11/config.yaml"

The `insert_new_model` function is a helper function provided in `element-deeplabcut` for convenient model registration.

This function prints out the essential information, like the `model_name` and the `model_description`, together with other relevant information from the config file. 

If all the information is correct, you can confirm the insertion by typing `yes`, which will insert the new model and its two body parts, `head` and `tailbase`:

In [None]:
model.Model.insert_new_model(model_name='from_top_tracking_model_test',
                             dlc_config=config_file_rel,
                             shuffle=1,
                             trainingsetindex=0,
                             model_description='Model in example data: from_top_tracking model')

You can check the `Model` table to confirm that the new model has been added:

In [None]:
model.Model()

Much of this information comes directly from the `config.yaml` file. Note that each model in the table is identified by a unique `model_name`.

To register another model, you must specify a new `model_name`. Duplicating an existing model is not permitted, so the new entry must describe an entirely new model.

## Step 2 - Insert Subject, Session, and Behavior Videos

Check the current contents of the `Subject` table:

In [None]:
subject.Subject()

Insert a subject into the `Subject` table:

In [None]:
# Insert one subject (skipped if it already exists)
subject.Subject.insert1(
    dict(
        subject="subject6",
        sex="F",
        subject_birth_date="2020-01-01",
        subject_description="hneih_E105",
    ),
    skip_duplicates=True,
)

Define session keys and insert them into the `Session` table:



In [None]:
# Define a list of session keys (one dictionary per session)
session_keys = [
    dict(subject="subject6", session_datetime="2021-06-02 14:04:22"),
    dict(subject="subject6", session_datetime="2021-06-03 14:43:10"),
]

# Insert the sessions into the Session table
session.Session.insert(session_keys, skip_duplicates=True)


Confirm the inserted data:

In [None]:
session.Session()

Insert data into the `VideoRecording` table:

In [None]:
### VideoRecording
recording_key = {'subject': 'subject6',
       'session_datetime': '2021-06-02 14:04:22',
       'recording_id': '1'}
model.VideoRecording.insert1({**recording_key, 'device': 'Camera1'}, skip_duplicates=True)

Insert video files into the `VideoRecording.File` table:

In [None]:
### VideoRecording.File

video_files = ["./example_data/inbox/from_top_tracking-DataJoint-2023-10-11/videos/train1.mp4"]

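# One row per video file; `file_id` is the file's position in `video_files`.
# skip_duplicates=True keeps the cell safe to re-run, as in the inserts above.
model.VideoRecording.File.insert(
    (
        {**recording_key, 'file_id': v_idx, 'file_path': Path(f)}
        for v_idx, f in enumerate(video_files)
    ),
    skip_duplicates=True,
)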
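
To see what rows the insert above receives, the sketch below builds them without touching the database. `build_file_rows` is a hypothetical helper and the two file names are made up for illustration; only the attribute names (`file_id`, `file_path`) and the recording key come from this tutorial.

```python
from pathlib import Path


def build_file_rows(recording_key, video_files):
    """Return one row per video file; `file_id` is the file's position in the list."""
    return [
        {**recording_key, "file_id": file_id, "file_path": Path(path)}
        for file_id, path in enumerate(video_files)
    ]


recording_key = {
    "subject": "subject6",
    "session_datetime": "2021-06-02 14:04:22",
    "recording_id": "1",
}
# Hypothetical file names, used only to show how `file_id` is assigned.
demo_videos = ["./videos/part_a.mp4", "./videos/part_b.mp4"]

file_rows = build_file_rows(recording_key, demo_videos)
for row in file_rows:
    print(row["file_id"], row["file_path"].name)
```

Each row repeats the recording key, so a recording split across several video files is stored as several `VideoRecording.File` entries under the same recording.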

Populate the `RecordingInfo` table:

In [None]:
### RecordingInfo
model.RecordingInfo.populate()
model.RecordingInfo()

`RecordingInfo` extracts metadata from the video, including the number of frames (`n_frames`), which corresponds to the number of entries for each body part in the pose estimation results.

## Step 3 - DeepLabCut Inference Task

The `PoseEstimationTask` table is used for defining an inference task. Let's explore the table description:

In [None]:
model.PoseEstimationTask.describe()

To define and insert a task, you need to:

1. Define a video recording.
2. Select a model.
3. Choose the task mode (load or trigger).
4. Specify the output directory and optional parameters.

When the task mode is "trigger", DataJoint triggers the inference by running the DeepLabCut model. This can take a long time depending on the hardware and is not recommended on machines without GPU support.

For this exercise, we choose the **"load"** task mode because the server does not have the GPU needed for inference. The results have already been prepared and are stored in `example_data/outbox`.

If you select the **"trigger"** task mode, DataJoint will perform the entire inference process and generate these files.
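
The four steps listed above amount to assembling one dictionary per task. The following is a minimal sketch of that assembly with an explicit check on the task mode. `build_task_row` is a hypothetical helper, not part of `element-deeplabcut`, and the output directory shown is a placeholder.

```python
VALID_TASK_MODES = ("load", "trigger")


def build_task_row(recording_key, model_name, task_mode, output_dir):
    """Combine a recording, a model, a task mode, and an output directory.

    Hypothetical helper: it only builds the dictionary, it does not insert it.
    """
    if task_mode not in VALID_TASK_MODES:
        raise ValueError(
            f"task_mode must be one of {VALID_TASK_MODES}, got {task_mode!r}"
        )
    return {
        **recording_key,
        "model_name": model_name,
        "task_mode": task_mode,
        "pose_estimation_output_dir": str(output_dir),
    }


example_row = build_task_row(
    {
        "subject": "subject6",
        "session_datetime": "2021-06-02 14:04:22",
        "recording_id": "1",
    },
    model_name="from_top_tracking_model_test",
    task_mode="load",
    output_dir="./example_data/outbox/demo_output",  # placeholder path
)
print(example_row)
```

Validating the mode before the insert gives a clearer error message than a failed database insert would.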

Let's define the keys for recording and task:

In [None]:
recording_key

In [None]:
task_key = {**recording_key, 'model_name': 'from_top_tracking_model_test'}

The `pose_estimation_output_dir` attribute points to the folder that holds the prepared results:

In [None]:
model.PoseEstimationTask.insert1(
    {**task_key,
     'task_mode': 'load',
     'pose_estimation_output_dir': './example_data/outbox/from_top_tracking-DataJoint-2023-10-11/videos/device_1_recording_1_model_from_top_tracking_100000_maxiters'
     })

Display the `PoseEstimationTask` table:

In [None]:
model.PoseEstimationTask()

In [None]:
### PoseEstimation
model.PoseEstimation.populate()

Let's look into the `PoseEstimation` table.

In [None]:
model.PoseEstimation()

The most important table is `PoseEstimation.BodyPartPosition`, which holds the results for each body part.

In [None]:
### Results
model.PoseEstimation.BodyPartPosition()

After pose estimation, the entries related to the task are identified by `subject`, `session`, `recording_id`, `model_name`, and each detected `body_part` (two entries in this case).

Each entry contains `frame_index`, the `x_pos` and `y_pos` positions, and `likelihood` (`z_pos` is zero). This structure is familiar to DeepLabCut users.

These results can be fetched in a Pandas DataFrame structure: 

In [None]:
df = (model.PoseEstimation.BodyPartPosition & task_key).fetch(format='frame').reset_index()

In [None]:
df

Each row holds one body part: `frame_index` is an array of frame numbers, and `x_pos`, `y_pos`, and `likelihood` are NumPy arrays with one value per frame.

To get one row per frame, use the pandas `explode` function to expand these array columns:

In [None]:
# drop=True avoids keeping the old row index as an extra `index` column
df = df.explode(['frame_index', 'x_pos', 'y_pos', 'likelihood']).reset_index(drop=True)
df
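
The effect of `explode` is easier to see on a small table. The sketch below uses made-up numbers (two body parts, three frames each) rather than real tracking output, and plain lists instead of NumPy arrays. It assumes pandas 1.3 or later, which is needed to explode several columns at once.

```python
import pandas as pd

# Toy "wide" table: one row per body part, one list per column.
# All values are invented for illustration.
wide_df = pd.DataFrame(
    {
        "body_part": ["head", "tailbase"],
        "frame_index": [[0, 1, 2], [0, 1, 2]],
        "x_pos": [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]],
        "y_pos": [[5.0, 5.5, 6.0], [8.0, 8.5, 9.0]],
        "likelihood": [[0.99, 0.98, 0.97], [0.95, 0.96, 0.94]],
    }
)

# Expand the list columns together so each frame becomes its own row.
long_df = wide_df.explode(
    ["frame_index", "x_pos", "y_pos", "likelihood"]
).reset_index(drop=True)

print(long_df)
```

The two wide rows become six long rows, with `body_part` repeated for every frame.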

As mentioned earlier, you can validate these results by counting the entries. There are 66000 rows for each body part, matching the `n_frames` from the `RecordingInfo` table.
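
This count can also be checked programmatically. The sketch below works on a list of per-frame rows and uses toy data; `entries_per_body_part` and `matches_frame_count` are hypothetical helper names, and the expected frame count is passed in by hand rather than read from `RecordingInfo`.

```python
from collections import Counter


def entries_per_body_part(rows):
    """Count how many per-frame rows exist for each body part."""
    return Counter(row["body_part"] for row in rows)


def matches_frame_count(rows, n_frames):
    """Map each body part to True if it has exactly `n_frames` rows."""
    counts = entries_per_body_part(rows)
    return {part: count == n_frames for part, count in counts.items()}


# Toy stand-in for the exploded table: 2 body parts x 3 frames.
toy_rows = [
    {"body_part": part, "frame_index": frame}
    for part in ("head", "tailbase")
    for frame in range(3)
]

check = matches_frame_count(toy_rows, n_frames=3)
print(check)
```

With the DataFrame from this notebook, `df.to_dict("records")` would produce rows of this shape.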

## Step 4 - Visualization of Results

First, separate the data for the head and tailbase and then plot the head pose estimation and tailbase pose estimation.

In [None]:
import matplotlib.pyplot as plt

head_data = df[df['body_part'] == 'head']
tail_data = df[df['body_part'] == 'tailbase']

plt.title('Head pose estimation')
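# Plot against frame_index so the x-axis is in frames, not DataFrame row numbers
plt.plot(head_data['frame_index'], head_data['x_pos'], label='x_pos')
plt.plot(head_data['frame_index'], head_data['y_pos'], label='y_pos')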
plt.xlabel('time (frames)')
plt.ylabel('pos (pixels)')
plt.legend()
plt.show()

In [None]:
plt.title('Tailbase pose estimation')
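# Plot against frame_index so the x-axis is in frames, not DataFrame row numbers
plt.plot(tail_data['frame_index'], tail_data['x_pos'], label='x_pos')
plt.plot(tail_data['frame_index'], tail_data['y_pos'], label='y_pos')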
plt.xlabel('time (frames)')
plt.ylabel('pos (pixels)')
plt.legend()
plt.show()

Finally, let's plot the head and tailbase positions on the same graph.

In [None]:
plt.plot(head_data['x_pos'], head_data['y_pos'], label='head')
plt.plot(tail_data['x_pos'], tail_data['y_pos'], label='tailbase')
plt.xlabel('x_pos (pixels)')
plt.ylabel('y_pos (pixels)')
plt.legend()
plt.show()