# DataJoint Element DeepLabCut

**Open-source Data Pipeline for Markerless Pose Estimation in Neurophysiology**

This tutorial provides a comprehensive understanding of the open-source data pipeline offered by `Element-DeepLabCut`. The package is designed to facilitate pose estimation analyses and streamline the organization of data using `DataJoint`. By the end of this tutorial, participants will have a clear grasp of how to set up, use, and optimize the package for their specific pose estimation projects.

**Key Components and Objectives**

- 1. Download Sample Data and Context

- 2. Setup

- 3. Design the DataJoint Pipeline

- 4. Enter the Metadata into the Pipeline

- 5. Run the Model Training

- 6. Run the Model Evaluation


For detailed documentation and tutorials on general DataJoint principles that support collaboration, automation, reproducibility, and visualizations:

[`DataJoint for Python - Interactive Tutorials`](https://github.com/datajoint/datajoint-tutorials) - Fundamentals including table tiers, query operations, fetch operations, automated computations with the make function, etc.

[`DataJoint for Python - Documentation`](https://datajoint.com/docs/core/datajoint-python/0.14/)

[`DataJoint Element for DeepLabCut - Documentation`](https://datajoint.com/docs/elements/element-deeplabcut/0.2/)

## 1. Download Sample Data and Context

In this section, you will download the sample data that simulates a real research project. By working through this sample data, you will gain valuable insights into the `practical application` of the package's tools and techniques.

### Project Context: 


In this research project, we are studying the `behavior of a freely-moving mouse in an open-field environment`. The objective is to `extract pose estimations of the animal's head, body, and tail` from video footage. This information can provide valuable insights into the animal's movements, postures, and interactions within the environment.

### Downloading Sample Data:

1. Click the following link to download the sample data archive: `##TO-DO`


2. Once the download is complete, extract the contents to a `path of your choice on your local machine`.

After running this tutorial, you can try `Element-DeepLabCut` with your own dataset. To do so, create a new `DeepLabCut` folder with your own videos. Then, remember to change the path in the configuration file (`config.yaml`) in your new `DeepLabCut project` folder accordingly.

### Challenges: 

**Complex Background**: The open field environment introduces complex backgrounds and varying lighting conditions, making accurate pose estimation challenging.

**Multiple Body Parts**: Extracting the pose of multiple body parts (head, body, tail) adds complexity to the analysis due to potential occlusions and variations in appearance.

**Data Management**: Managing the large volume of video data generated in the field and ensuring consistent annotation requires an efficient data pipeline.

### Expected Outcomes:

Upon completing this tutorial, you will have acquired practical proficiency in employing the `Element-DeepLabCut` package to effectively tackle the complexities of pose estimation. 

This tutorial and sample dataset will serve as a practical foundation for your learning journey with the Element package, enabling you to apply these techniques to your own research projects. 

By integrating this element package with other Elements of DataJoint, you unlock a powerful data pipeline that provides numerous benefits for your research workflow. 


## 2. Setup

In [None]:
####Explain this part better and include the link to download the project folder

Before using DataJoint and this tutorial, you need an account to gain access to the database server. 

Please, go to ### and create an account. 



Now that you have your credentials (DJ_USER, DJ_PASS), you need to connect to the server. To do so, we need to `configure the connection` with the user credentials. 

- If this is the first time that you are running this tutorial:
    - Then you will need to specify the connection parameters as input arguments, as shown in the next subsection, `Configuration Code for Initiating This Tutorial`. That subsection creates a DataJoint configuration file named `dj_local_conf.json`, which stores your credentials on your local machine. You can find this file in your `element-deeplabcut` folder. This configuration file is unique to each machine and DataJoint user.

- If you have already run this tutorial and created the `.json` file with your credentials info:
    - Then you can directly start from the subsection `Configuration Code to Configure this Tutorial in Subsequent Restarts`.

#### Configuration Code for Initiating This Tutorial

*The configuration file only needs to be set up once. If you already have one, jump to the following subsection, `Configuration Code to Configure this Tutorial in Subsequent Restarts`.*

In [None]:
import os
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='element-deeplabcut', ("Please move to the "
                                                              + "element directory")

Let's start by importing the packages necessary to run this pipeline.

In [None]:
import datajoint as dj
from pathlib import Path
import yaml


The connection parameters are specified by input arguments:
- HOST, USER, and PASSWORD are the fields for the user credentials.
- Configuring a `custom` field helps manage privileges on a server. For instance, teams who work on the same schemas should use the same schema prefix.
    - Setting the prefix to `dlc_` means that every schema we then create will start with `dlc_` (e.g. `dlc_lab`, `dlc_subject`, `dlc_model`, etc.).
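
The effect of a shared prefix can be illustrated with plain string handling. This is only a sketch: the list of module names is taken from the modules imported later in this tutorial, and the assumption that each schema is named exactly `prefix + module name` should be checked against your own pipeline definition.

```python
# Illustration only: how a shared prefix shapes schema names.
prefix = "dlc_"  # the value stored under dj.config['custom']['database.prefix']

# Assumed module names (taken from the imports used later in this tutorial).
module_names = ["lab", "subject", "session", "train", "model"]

# Every schema created with this prefix starts with the same string,
# which makes it easy to grant privileges on "dlc_*" to a whole team.
schema_names = [prefix + name for name in module_names]
print(schema_names)
```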

Please replace the placeholder text in curly braces with your own host, username, and prefix. You will also be prompted for your password.


In [None]:
##TO-DO: WHAT HOST IS NECESSARY FOR A NEW USER?

In [None]:
import getpass
dj.config['database.host'] = '{YOUR_HOST}' 
dj.config['database.user'] = '{YOUR_USERNAME}' 
dj.config['database.password'] = getpass.getpass() # enter the password securely
dj.config['custom']['database.prefix']= '{YOUR_USERNAME_dlc_}' 


The next cells save the credentials and open the connection to the database server.

In [None]:
dj.config.save_local() 

Let's make the connection to the database server.

In [None]:
dj.conn()

Once the configuration is saved, the file `dj_local_conf.json` is created in the `element-deeplabcut` directory. You may open this file to verify its content. Remember that this step only needs to be done once.
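
One way to verify the saved file without printing the password is sketched below. It uses only the standard library; the helper name `summarize_dj_config` is hypothetical, and the sketch assumes the file stores settings as a flat JSON object with dotted keys such as `database.password`.

```python
import json
from pathlib import Path


def summarize_dj_config(path="dj_local_conf.json"):
    """Return the saved settings with the password masked for safe display.

    Hypothetical helper; assumes a flat JSON object with dotted keys.
    """
    settings = json.loads(Path(path).read_text())
    if settings.get("database.password"):
        settings["database.password"] = "********"
    return settings


# Only inspect the file if it exists in the current working directory.
if Path("dj_local_conf.json").exists():
    print(json.dumps(summarize_dj_config(), indent=2))
```

Because the file may hold a password in plain text, it is usually best kept out of version control.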

#### Configuration Code to Configure this Tutorial in Subsequent Restarts

If you have already run the previous subsection, the next time you want to run this tutorial (restart the kernel of the notebook) you will only need to start the tutorial from here: 

In [None]:
import os
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='element-deeplabcut', ("Please move to the "
                                                              + "element directory")

Let's start by importing the packages necessary to run this pipeline.

In [None]:
import datajoint as dj
from pathlib import Path
import yaml

Now, let's connect to the database server to be able to use DataJoint.

In [None]:
dj.conn()

## 3. Design the DataJoint Pipeline

First, you need to add the location of your `DeepLabCut project folder` to your configuration file `dj_local_conf.json`. Open the file in your `element-deeplabcut` folder and, in the `custom` section, paste the path where the `DeepLabCut project folder` is located into `dlc_root_data_dir`. Also, paste the `DeepLabCut project folder` name into `current_project_folder`:

        "dlc_root_data_dir": "{DLC_PROJECT_PATH}",
        "current_project_folder": "{DLC_PROJECT_NAME}"

Once these entries are in place, the following lines load the configuration file and resolve the full path of the project directory under the configured root.

In [None]:
from element_interface.utils import find_full_path

dj.config.load('dj_local_conf.json')
data_dir = find_full_path(
    dj.config['custom']['dlc_root_data_dir'],  # root from config
    'Top_tracking-DataJoint-2023-08-03',  # DLC project dir
)
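
The idea of resolving a project directory against one or more configured roots can be sketched with the standard library alone. `resolve_under_roots` below is a hypothetical stand-in written for illustration; it is not the implementation of `find_full_path`, whose exact behaviour should be checked in the Element Interface documentation.

```python
from pathlib import Path


def resolve_under_roots(root_dirs, relative_path):
    """Return the first existing root / relative_path (hypothetical helper)."""
    if isinstance(root_dirs, (str, Path)):
        root_dirs = [root_dirs]  # accept a single root as well as a list
    for root in root_dirs:
        candidate = Path(root) / relative_path
        if candidate.exists():
            return candidate
    roots = [str(root) for root in root_dirs]
    raise FileNotFoundError(f"{relative_path} not found under any of {roots}")
```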

Based on the project path specified in the `.json` file, the paths of the input files are loaded as variables in this tutorial's session:

In [None]:
### DLC Project
dlc_project_path_abs = Path(dj.config["custom"]["dlc_root_data_dir"]) / Path(
    dj.config["custom"]["current_project_folder"]
)  # use pathlib to join; abs path
dlc_project_folder = Path(
    dj.config["custom"]["current_project_folder"]
)  # relative path

### Config file
config_file_abs = dlc_project_path_abs / "config.yaml"  # abs path
assert (
    config_file_abs.exists()
), "Please check that you have the Top_tracking folder"

### Labeled-data
labeled_data_path_abs = dlc_project_path_abs / "labeled-data"
labeled_files_abs = list(
    list(labeled_data_path_abs.rglob("*"))[1].rglob("*")
)  # substitute 'training_files'; absolute path
labeled_files_rel = []
for file in labeled_files_abs:
    labeled_files_rel.append(
        file.relative_to(dlc_project_path_abs)
    )  # substitute 'training_files'; relative path


### Combine multiple Elements into a pipeline

Each DataJoint Element is a modular set of tables that can be combined into a complete pipeline.

Each Element contains one or more modules, and each module declares its own schema in the database. Schemas are conceptually related sets of tables. 

This tutorial pipeline is assembled from four DataJoint Elements.

| Element | Source Code | Documentation | Description |
| -- | -- | -- | -- |
| Element Lab | [Link](https://github.com/datajoint/element-lab) | [Link](https://datajoint.com/docs/elements/element-lab) | Lab management related information, such as Lab, User, Project, Protocol, Source. |
| Element Animal | [Link](https://github.com/datajoint/element-animal) | [Link](https://datajoint.com/docs/elements/element-animal) | General subject meta data, genotype, and surgery information. |
| Element Session | [Link](https://github.com/datajoint/element-session) | [Link](https://datajoint.com/docs/elements/element-session) | General information of experimental sessions. |
| Element DeepLabCut | [Link](https://github.com/datajoint/element-deeplabcut) | [Link](https://datajoint.com/docs/elements/element-deeplabcut) | DataJoint schemas (Train and Model) for storing and running analysis of markerless pose estimation with DeepLabCut. |

The Elements are imported and activated in the next code cell.

In [None]:
from tutorial_pipeline import lab, subject, session, train, model  # after creating json file

By importing the modules for the first time, the schemas and tables will be created in the database.  

In [None]:
dj.list_schemas()

Once created, importing modules will not create schemas and tables again, but the existing schemas/tables can be accessed.
To empty these schemas and tables before introducing new entries, run the following code lines (note that you will have to confirm each delete by typing "yes" at the prompt).

In [None]:
# Empty the session in case of rerunning
safemode = True  # Set to False to turn off confirmation prompts
session.Session.delete(safemode=safemode)
train.TrainingParamSet.delete(safemode=safemode)
train.VideoSet.delete(safemode=safemode)
model.BodyPart.delete(safemode=safemode)
subject.Subject.delete(safemode=safemode)

Each Python module (e.g. `subject`) contains a schema object that enables interaction with the schema in the database.

In [None]:
subject.schema

Each Python class in the module corresponds to a table in the database server. We can also check whether the table contains any entries.

In [None]:
subject.Subject()

Let's plot the diagram of the whole data pipeline for this `Element-DeepLabCut`.

In [None]:
(
    dj.Diagram(subject) 
    + dj.Diagram(lab) 
    + dj.Diagram(session) 
    + dj.Diagram(model) 
    + dj.Diagram(train)
)

And this is the main body of this `Element-DeepLabCut`.

In [None]:
dj.Diagram(model) + dj.Diagram(train)

## 4. Enter the Metadata into the Pipeline

To run the `Model Training`, we need to start by adding the input data to the `train` module. Let's start by having a look at the `TrainingTask` table. This table pairs each video set with its corresponding training parameters.



In [None]:
train.TrainingTask()

Let's pair some example data and launch training via `process`. 

In [None]:
#IS THIS NEEDED???

#key={'paramset_idx':0,'training_id':0,'video_set_id':0, 
#     'project_path':dlc_project_folder}
#train.TrainingTask.insert1(key, skip_duplicates=True)
#process.run(verbose=True, display_progress=True)
#model.RecordingInfo()

The `subject` module contains the `Subject` table, which holds the subject (e.g., the mouse) information. Let's insert an example entry into the `subject.Subject` table.

In [None]:
# Subject and Session tables
subject.Subject.insert1(
    dict(
        subject="subject6",
        sex="F",
        subject_birth_date="2020-01-01",
        subject_description="hneih_E105",
    ),
    skip_duplicates=True,
)


Let's repeat the step for the `session` module. A single entry can be inserted by passing a dictionary to the `insert1` method; here, we insert two sessions at once by passing a list of dictionaries to the `insert` method.


In [None]:
# Define a list of dictionaries named "session_keys", one per session
session_keys = [
    dict(subject="subject6", session_datetime="2021-06-02 14:04:22"),
    dict(subject="subject6", session_datetime="2021-06-03 14:43:10"),
]

# Insert all entries of the list into the Session table
session.Session.insert(session_keys, skip_duplicates=True)
session.Session()

The `VideoSet` table in the `train` schema retains records of files generated in the video labeling process (e.g., `h5`, `csv`, `png`). DeepLabCut will refer to the `mat` file located under the `training-datasets` directory.

We recommend storing all paths as relative to the root in your config.
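
The benefit of relative paths can be shown with `pathlib`. The directory names below are hypothetical and serve only to illustrate the conversion; nothing is read from disk.

```python
from pathlib import Path

# Hypothetical locations, used only to illustrate the conversion.
root_dir = Path("/data/dlc_root")
project_dir = root_dir / "my_dlc_project"
labeled_file = project_dir / "labeled-data" / "video1" / "img0001.png"

# Store the path relative to the root ...
stored_path = labeled_file.relative_to(root_dir)
# ... and rebuild the absolute path on any machine that knows its own root.
rebuilt_path = root_dir / stored_path

print(stored_path.as_posix())
```

Since only the root differs between machines, the stored entries remain valid when the data directory is moved or shared.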

In [None]:
# Videoset table 
train.VideoSet.insert1({"video_set_id": 0}, skip_duplicates=True)

for idx, filename in enumerate(labeled_files_rel):
    train.VideoSet.File.insert1(
        {
            "video_set_id": 0, 
            "file_id": idx, 
            "file_path": dlc_project_folder / filename
        },
    )  

In [None]:
train.VideoSet.File()

## 5. Training a network

To train the network, we need to add the parameter set (`TrainingParamSet`) of the model training (`train`).

In [None]:
train.TrainingParamSet()


The `params` attribute has to be a dictionary that captures all the items for DeepLabCut's `train_network` function. At minimum, this is the contents of the project's config file, as well as `shuffle` and `trainingsetindex`, which are not included in the configuration file.


We will insert these items, load the config contents, and overwrite some defaults, including `maxiters`, to restrict our training iterations to 5.
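
The merging step can be sketched with plain dictionaries. The `config_params` content here is a small hypothetical stand-in for what would be loaded from `config.yaml`; copying before updating is an illustrative choice, not something the tutorial cell requires.

```python
# Hypothetical stand-in for the dictionary loaded from config.yaml.
config_params = {"Task": "Top_tracking", "bodyparts": ["head", "body", "tail"]}

# Extra items and overrides, written as strings as in the tutorial cell.
training_params = {"shuffle": "1", "trainingsetindex": "0", "maxiters": "5"}

merged = dict(config_params)    # copy so the loaded config stays untouched
merged.update(training_params)  # keys in training_params win on conflict

print(sorted(merged))
```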



In [None]:
# Restrict the training iterations to 5 by modifying the default parameters in config.yaml
paramset_idx = 0
paramset_desc = "First training test with DLC using shuffle 1 and maxiters = 5"

# default parameters
with open(config_file_abs, "rb") as y:
    config_params = yaml.safe_load(y)
config_params.keys()

# new parameters
training_params = {
    "shuffle": "1",
    "trainingsetindex": "0",
    "maxiters": "5",
    "scorer_legacy": "False",  # For DLC ≤ v2.0, include scorer_legacy = True in params
    "multianimalproject": "False",
}
config_params.update(training_params)

train.TrainingParamSet.insert_new_params(
    paramset_idx=paramset_idx, paramset_desc=paramset_desc, params=config_params
)

Now, we add a `TrainingTask`. As a computed table, `ModelTraining` will reference this task to start training when `populate()` is called.

In [None]:
dj.Diagram(train)

In [None]:
train.TrainingTask()

In [None]:
# TrainingTask table
key = {
    "video_set_id": 0,
    "paramset_idx": 0,
    "training_id": 1,
    "project_path": dlc_project_folder,
}
train.TrainingTask.insert1(key, skip_duplicates=True)
train.TrainingTask()

After inserting the training parameters and the video recordings, the model training can be run, and the outputs will be stored in the `ModelTraining` table.

*Note that the following code line will run the model training with DeepLabCut. It will take a few minutes if DeepLabCut is installed with GPU support, and longer if it runs on the CPU.*

In [None]:
train.ModelTraining.populate(display_progress=True)


In [None]:
train.ModelTraining.fetch()


The network is now trained and ready to be evaluated.


## 6. Evaluating the network model

### Tracking Joints/Body Parts

The `model` schema uses a lookup table for managing the body parts tracked across models.

In [None]:
model.BodyPart()

We can also modify the body parts as desired. For that, we can use helper functions to identify and insert the new body parts from a given DeepLabCut configuration file (`config.yaml`) in the data pipeline.

In [None]:
model.BodyPart.extract_new_body_parts(config_file_abs)

In [None]:
# Provide descriptions ONLY if config.yaml contains body parts that are not yet in the table.
# If the table already has descriptions for every body part, leave the list empty.
bp_desc = []
model.BodyPart.insert_from_config(config_file_abs, bp_desc)

### Declaring/Evaluating a Model

We can insert into the `Model` table for automatic evaluation.

In [None]:
model.Model.insert_new_model(model_name='FromTop-latest',
                             dlc_config=config_file_abs,
                             shuffle=1,
                             trainingsetindex=0,
                             model_description='FromTop - latest snapshot',
                             paramset_idx=0,
                             params={"snapshotindex":-1})

In [None]:
model.BodyPart()

In [None]:
model.Model()

`ModelEvaluation` will reference the `Model` when `populate` is called and insert the output of DeepLabCut's `evaluate_network` function.

In [None]:
model.ModelEvaluation.populate()

In [None]:
model.ModelEvaluation()

### Pose Estimation

To use our model, we'll first need to insert a session recording into `VideoRecording`.

In [None]:
model.VideoRecording()

In [None]:
key = {'subject': 'subject6',
       'session_datetime': '2021-06-02 14:04:22',
       'recording_id': '1', 'device': 'Camera1'}
model.VideoRecording.insert1(key, skip_duplicates=True)

_ = key.pop('device')  # drop the secondary attribute of the master table before inserting into the part table
key.update({'file_id': 1, 
            'file_path': 'Top_tracking-DataJoint-2023-08-03/videos/train1_trimmed.mp4'})
model.VideoRecording.File.insert1(key, skip_duplicates=True)

In [None]:
model.VideoRecording.File()

`RecordingInfo` is populated automatically with the file information.

In [None]:
model.RecordingInfo.populate()
model.RecordingInfo()

Next, we specify if the `PoseEstimation` table should load results from an existing file or trigger the estimation command. Here, we can also specify parameters for DeepLabCut's `analyze_videos` as a dictionary.

In [None]:
recording_dict = (model.VideoRecording & {"recording_id": "1"}).fetch1("KEY")
recording_dict.update({"model_name": "FromTop-latest", "task_mode": "trigger"})
# videotype, gputouse, save_as_csv, batchsize, cropping, TFGPUinference, dynamic, robust_nframes, allow_growth, use_shelve
analyze_videos_params = {"save_as_csv": True}

By default, DataJoint will store results in a subdirectory:
>       <processed_dir> / videos / device_<name>_recording_<#>_model_<name>

`processed_dir` is optionally specified in the datajoint config, or in the `insert_estimation_task`. If unspecified, this will be the project directory. 
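
The naming pattern above can be reproduced with `pathlib` for illustration. `sketch_output_dir` is a hypothetical function, not the Element's `infer_output_dir`; the real method derives the device, recording, and video location from the database and may differ in detail.

```python
from pathlib import Path


def sketch_output_dir(processed_dir, device, recording_id, model_name):
    """Build <processed_dir>/videos/device_<name>_recording_<#>_model_<name>.

    Hypothetical helper that only mirrors the naming pattern described above.
    """
    folder = f"device_{device}_recording_{recording_id}_model_{model_name}"
    return Path(processed_dir) / "videos" / folder


# Example values taken from this tutorial; "/data/processed" is made up.
example = sketch_output_dir("/data/processed", "Camera1", 1, "FromTop-latest")
print(example.as_posix())
```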

In [None]:
model.PoseEstimationTask.infer_output_dir(recording_dict)  # recording_dict includes the model_name

In [None]:
model.PoseEstimationTask.insert_estimation_task(
    recording_dict,
    model_name=recording_dict["model_name"],
    analyze_videos_params=analyze_videos_params,
)

In [None]:
#model.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True})
model.PoseEstimation.populate()


The resulting coordinates of the pose estimation are now available in the corresponding `BodyPartPosition` table, ready to use for visualization, or to combine with other Elements.

In [None]:
model.PoseEstimation.BodyPartPosition()

We can visualize the pose estimation results directly as a pandas dataframe.

In [None]:
model.PoseEstimation.coordinates_dataframe(key)