# Label analysis

This notebook checks how completely a dataset has been labelled. It helps find points that were added as part of an instance but never moved into their correct position. It also gives a breakdown of body parts marked as not visible, which shows which body parts need more labelled instances.

Enter a value for each variable in the first cell below according to your setup and requirements. Each assignment should take the following form:

`variable_name = value`

After entering values for all variables, click Cell > Run All in the top menu bar to execute the notebook and show the label analysis outputs.

Note that the `r` before the opening quotation mark of the `filename` value is required. It makes Python treat backslashes in the folder path literally, rather than as the start of an "escaped" character such as `\n` (newline) or `\t` (tab).
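As a small illustration of why the prefix matters, the sketch below compares a plain string with a raw string. The path is hypothetical and chosen only because it contains `\n` and `\t`; it is not part of this project.

```python
# Hypothetical path, used only to show how backslashes are interpreted
plain = "D:\new\table.slp"   # \n becomes a newline and \t becomes a tab
raw = r"D:\new\table.slp"    # every backslash is kept as a literal character

# The plain string is shorter because each escape collapses to one character
print(len(plain), len(raw))

# A raw string is equivalent to doubling every backslash in a plain string
print(raw == "D:\\new\\table.slp")
```

Forward slashes are another option that avoids the problem, since Python accepts them in Windows paths.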

## Parameters

In [None]:
# Enter the file path for the SLEAP dataset file to be analysed
# e.g. r"D:\Documents\Ant-posture\datasets\col218.v001.slp"
filename = r"D:\Documents\COMP3850-Group23-Ant-posture\datasets\main.v001.slp"

## Code execution

In [None]:
# Import the Python modules required for the notebook to run
import os
import sleap
import pandas as pd
from matplotlib import pyplot as plt

# Set pandas to display up to 200 rows in a data frame before truncating it
pd.set_option('display.max_rows', 200)

# Check if the dataset exists at the given path and raise an error if it is not found
if not os.path.exists(filename):
    raise FileNotFoundError("File does not exist at " + os.path.abspath(filename))
    
# Load the labels from the SLEAP dataset file
labels = sleap.load_file(filename)

In [None]:
# Initialise all variables
total_instances = 0
total_unverified = 0
unverified_dict = {}
hidden_points = {}
frames_list = []
nodes_dict = {}

# Create keys for each body part in the node dictionary
for node in labels.skeletons[0].nodes:
    nodes_dict[node.name] = 0

# Check each labelled frame for unverified and non-visible points
for labelled_frame in labels.labeled_frames:
    unverified_dict[labelled_frame.frame_idx+1] = 0
    hidden_points[labelled_frame.frame_idx+1] = 0
    unverified_count = 0
    
    for instance in labelled_frame.user_instances:
        total_instances += 1
        for node, point in instance.nodes_points:
            if not point.complete:
                total_unverified += 1
                # Also count per frame so the table below reports real values
                unverified_count += 1
                unverified_dict[labelled_frame.frame_idx+1] += 1
            if not point.visible:
                nodes_dict[node.name] += 1
                hidden_points[labelled_frame.frame_idx+1] += 1
    frames_list.append((labelled_frame.video.backend.filename, labelled_frame.frame_idx, unverified_count))

# Display the total number of instances and unverified points
print(f"Total labelled frames: {len(labels.labeled_frames)}")
print(f"Total labelled instances: {total_instances}")
print(f"Total unverified points: {total_unverified}")
display(pd.DataFrame(sorted(frames_list), columns=['video filename', 'frame index', 'unverified point count']))
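
The counting logic in the cell above can be tried without a SLEAP dataset. The sketch below is a simplified stand-in, not the notebook's own code: `Point` is a hypothetical namedtuple that mimics only the `visible` and `complete` attributes, and the frames are made-up data. It also keys the per-frame counts on both the video filename and the frame index, which is one possible way to keep frames from different videos apart when they share an index.

```python
from collections import Counter, namedtuple

# Stand-in for a labelled point (assumption: only these two attributes matter here)
Point = namedtuple("Point", ["visible", "complete"])

# Hypothetical data: (video filename, frame index) -> list of (node name, point)
frames = {
    ("a.mp4", 0): [("head", Point(True, True)), ("thorax", Point(False, True))],
    ("a.mp4", 5): [("head", Point(True, False)), ("thorax", Point(True, False))],
    ("b.mp4", 0): [("head", Point(False, False)), ("thorax", Point(True, True))],
}

unverified_per_frame = Counter()
hidden_per_node = Counter()

for key, node_points in frames.items():
    # Register the frame even if it ends up with no unverified points
    unverified_per_frame[key] = 0
    for node_name, point in node_points:
        if not point.complete:
            unverified_per_frame[key] += 1
        if not point.visible:
            hidden_per_node[node_name] += 1

print(dict(unverified_per_frame))
print(dict(hidden_per_node))
```

Here the two frames with index 0 are counted separately because they belong to different videos.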

In [None]:
# Display the count and percentage of body part labels marked not visible
inv_df = pd.DataFrame.from_dict(nodes_dict, orient='index', columns=['invisible'])
inv_df['invisible percent of total'] = inv_df['invisible'] / total_instances * 100
display(inv_df)
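
The percentage column divides each body part's not-visible count by the total number of instances, so it reads as "the share of instances in which this body part is hidden". A minimal sketch of the same arithmetic in plain Python follows; the counts and node names are invented for illustration and do not come from any dataset.

```python
# Hypothetical counts, for illustration only
total_instances = 40
nodes_dict = {"head": 2, "thorax": 0, "abdomen": 10}

# Same formula as the cell above: hidden count / instance count * 100
percent_hidden = {
    name: count / total_instances * 100
    for name, count in nodes_dict.items()
}

# List body parts from most to least often hidden
for name, pct in sorted(percent_hidden.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Sorting by percentage puts the body parts that most need additional visible labels at the top.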

In [None]:
# Display a sample labelled frame from the dataset
try:
    labels.labeled_frames[0].plot()
except FileNotFoundError as e:
    print(e)
    print("Make sure that missing videos have been replaced if the project file was created on another machine")