# Dataset merging

This notebook merges multiple SLEAP datasets into a single dataset, which is easier to use for training and inference than several separate datasets. To minimise the chance of merge conflicts, the individual SLEAP datasets should not contain labelled frames for the same video.

Enter a value for each variable in the first code cell below according to your setup and requirements. Each assignment should look like the following:

`variable_name = value`

After entering values for all variables, click Cell > Run All in the top menu bar to execute the notebook and merge the datasets.

Note that the `r` before the opening quotation mark of the `input_folder` value is required. It makes the path a raw string, so backslashes in the folder path are kept as literal characters instead of being read as the start of an escape sequence (for example, `\t` for a tab).
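
To illustrate the difference, here is a minimal sketch comparing a regular string with a raw string. The folder path is hypothetical and chosen only because it contains `\t` and `\n`, which a regular string would interpret as a tab and a newline.

```python
# A regular string interprets backslash sequences such as \t (tab) and \n (newline)
regular = "D:\temp\new_datasets"

# A raw string keeps every backslash as a literal character
raw = r"D:\temp\new_datasets"

# repr() shows the characters each string actually contains
print(repr(regular))
print(repr(raw))

# The regular string is shorter because each escape sequence became one character
print(len(regular), len(raw))
```

A raw string cannot end with a single backslash, so leave the trailing separator off the folder path.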

## Parameters

In [None]:
# Enter the path to the input folder that contains all of the datasets that will be merged
# e.g. r"D:\Documents\COMP3850-Group23-Ant-posture\datasets"
input_folder = r""

# Enter a name for the combined dataset output file
# e.g. "main.v001.slp"
output_filename = ""


## Code execution

In [None]:
# Import the Python modules required for the notebook to run
import os

import sleap

# Create an empty labelled dataset to load individual datasets into
combined = sleap.Labels()

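# Find all of the SLEAP dataset files in the input folder, excluding the output file
# Sorting by name keeps the merge order the same between runs
files = sorted(
    (file for file in os.scandir(input_folder)
     if file.is_file() and file.name.endswith(".slp") and file.name != output_filename),
    key=lambda file: file.name,
)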

# Raise an error if there are no SLEAP dataset files found in the input folder
if not files:
    raise RuntimeError("No SLEAP files found in the input folder. Check that .slp files exist in the input folder.")
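
The file filter above can be tried out without SLEAP installed. The following is a minimal sketch, assuming a temporary folder and made-up file names; `find_dataset_files` is a hypothetical helper that is not part of this notebook or of SLEAP, and the sorting is an addition for a predictable order.

```python
import os
import tempfile


def find_dataset_files(folder, output_filename):
    """Return the sorted names of .slp files in folder, excluding the output file."""
    return sorted(
        entry.name for entry in os.scandir(folder)
        if entry.is_file() and entry.name.endswith(".slp") and entry.name != output_filename
    )


# Create a temporary folder containing empty placeholder files (hypothetical names)
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.slp", "b.slp", "main.v001.slp", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    found = find_dataset_files(folder, "main.v001.slp")

# Only the individual datasets remain: the output file and non-.slp files are skipped
print(found)
```

Excluding the output file matters when the notebook is run more than once, since the combined dataset is saved into the same input folder.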

In [None]:
# Iterate through the individual datasets and merge each into the combined dataset
for file in files:
    labels = sleap.load_file(file.path)
    # Merge what can be merged cleanly and collect any conflicting instances
    _, base_conflicts, new_conflicts = sleap.Labels.complex_merge_between(combined, labels)
    if base_conflicts or new_conflicts:
        raise RuntimeError(
            f"A conflict occurred while merging {file.name}. Make sure that the individual "
            "datasets do not contain labelled frames for the same video."
        )
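
As a rough picture of why datasets that label the same video can conflict, the sketch below looks for frames that appear in two datasets. It is a simplification that does not use SLEAP: each dataset is assumed to be a plain list of (video name, frame index) pairs, the data is made up, and `overlapping_frames` is a hypothetical helper. SLEAP's own conflict detection in `complex_merge_between` also considers the instances within each frame.

```python
def overlapping_frames(dataset_a, dataset_b):
    """Return the sorted (video, frame index) pairs that are labelled in both datasets."""
    return sorted(set(dataset_a) & set(dataset_b))


# Hypothetical datasets, each a list of (video name, frame index) pairs
dataset_a = [("ants_01.mp4", 0), ("ants_01.mp4", 10)]
dataset_b = [("ants_01.mp4", 10), ("ants_02.mp4", 0)]
dataset_c = [("ants_03.mp4", 5)]

# dataset_a and dataset_b both label frame 10 of the same video, so they may conflict
shared_ab = overlapping_frames(dataset_a, dataset_b)
print(shared_ab)

# dataset_a and dataset_c label different videos, so no frames overlap
shared_ac = overlapping_frames(dataset_a, dataset_c)
print(shared_ac)
```

Keeping each video in exactly one dataset, as recommended at the top of this notebook, means no frames can overlap in the first place.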

In [None]:
# Print out information for the combined dataset
# More information can be obtained using the 'Label analysis' notebook

# Count the labelled instances across all labelled frames
instances = sum(len(frame.instances) for frame in combined.labeled_frames)

print(f"Number of videos: {len(combined.videos)}")
print(f"Total number of labelled frames: {len(combined.labeled_frames)}")
print(f"Total number of labelled instances: {instances}")

In [None]:
# Save combined dataset to the input folder
combined.save(os.path.join(input_folder, output_filename))