<h2>File paths and imports</h2>

In [3]:
from ui_lib import *

# path of the model
model_path = "..."

# video paths:
# input video (without any annotation)
input_video_path = "C:/Users/Theo/Documents/Unif/chimprec-videos/sample_1/sample1.MP4"
# output (tracking without human-based improvements)
output_video_path = "C:/Users/Theo/Documents/Unif/chimprec-videos/sample_1/sample1_test.MP4"
# output (final version - with human interaction)
output_edited_video_path = "C:/Users/Theo/Documents/Unif/ChimpRec/Code/Tracking/user_interaction/sample_1/output_edited.txt"

# text file paths
# text file containing the output of the tracking operations
input_text_file_path = "C:/Users/Theo/Documents/Unif/ChimpRec/Code/Tracking/user_interaction/test_files/raw_output.txt"
# text file containing the manually produced modifications
edit_path = "C:/Users/Theo/Documents/Unif/ChimpRec/Code/Tracking/user_interaction/test_files/edit_stage1.txt"
# final output text file (this is a modification of <input_text_file_path> based on <edit_path>)
output_text_file_path = "C:/Users/Theo/Documents/Unif/ChimpRec/Code/Tracking/user_interaction/test_files/edited_w_swaps.txt"


<h2>First step:</h2>
<h3>Process the raw, unannotated video to produce a textual output (stored in <i>input_text_file_path</i>) and a visual output (accessible via <i>output_video_path</i>).</h3>

In [4]:
max_cosine_distance = 0.5       # maximum cosine distance for a match (lower = stricter)
nn_budget = None                # maximum number of stored features per track (None = no limit)
metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)

# YOLOv8s initialisation
YOLOv8s = YOLO(model_path)

# DeepSORT initialisation
DeepSort = DeepSortTracker(metric)

# Osnet initialisation
Osnet = torchreid.models.build_model(name='osnet_x1_0', num_classes=751, pretrained=True)
Osnet.eval()

# production of the textual output
perform_tracking(
    input_video_path = input_video_path, 
    output_text_file_path = input_text_file_path, 
    detection_model = YOLOv8s, 
    tracker = DeepSort,
    confidence_threshold = 0.5, 
    model_feature_extraction = Osnet
)

# production of the visual output
draw_bbox_from_file(
    file_path = input_text_file_path, 
    input_video_path = input_video_path, 
    output_video_path = output_video_path
)

Successfully loaded imagenet pretrained weights from "..."
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
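
To illustrate what the <i>max_cosine_distance</i> threshold set above controls, here is a minimal sketch in plain Python. It only shows the threshold test on two appearance vectors; the actual matching performed by the tracker involves more steps than this. The function `cosine_distance` and the toy feature vectors are hypothetical and are not part of `ui_lib`.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors (illustrative helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

max_cosine_distance = 0.5       # same value as in the cell above

# toy 2-D appearance features (real re-identification features are much larger)
track_feature = [1.0, 0.0]
similar_detection = [0.9, 0.1]
different_detection = [0.0, 1.0]

# a detection is a candidate match only if its distance stays under the threshold
print(cosine_distance(track_feature, similar_detection) <= max_cosine_distance)    # True
print(cosine_distance(track_feature, different_detection) <= max_cosine_distance)  # False
```

Lowering the threshold rejects more candidates, which tends to produce more (shorter) chains of detections to merge by hand in the second step.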


In [2]:
# production of the visual output
draw_bbox_from_file(
    file_path = input_text_file_path, 
    input_video_path = input_video_path, 
    output_video_path = output_video_path,
    draw_frame_count=True
)

<h2>Second step: human interaction required to bind the chains of detections together and perform recognition</h2>
<h3>This part of the process relies on manual editing. The modifications must:</h3>

* Be encoded in a strictly defined format (see the explanations below)
* Be stored in the text file located at <i>edit_path</i>

<h3>The edit file serves two main purposes:</h3>

* Merging some chains of detections.
* Labelling the chimpanzees with their names instead of ids when they can be identified. When no name is given, a default name (UNK_X) is assigned.

<h3>Format of the edit file:</h3>

* The identifiers of the chains of detections to be merged must be written on the same line.
    * As a result, if a chain of detections needs to be kept without being merged, its identifier must appear alone on its own line in the edit file.
* If a chain of detections has to be removed, its identifier must not appear anywhere in the edit file.
* If a name can be attached to a set of chains of detections, it must be written at the start of the same line, separated from the identifiers by ": " (note: the space character is important).
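
The format rules above can be sketched as a small parser. This is only an illustration under stated assumptions, not the code of `modification_reader` in `ui_lib`: the function `parse_edit_lines` is hypothetical, identifiers are assumed to be separated by whitespace, and default names are assumed to be numbered in order of appearance among the unnamed lines.

```python
def parse_edit_lines(lines):
    """Turn the lines of an edit file into [name, [ids]] groups (hypothetical helper)."""
    groups = []
    unknown_count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        if ": " in line:
            # "Name: id id ..." -> the name comes before the ": " separator
            name, ids_part = line.split(": ", 1)
            name = name.strip()
        else:
            # no name given -> assign a default UNK_X name (numbering is an assumption)
            name = f"UNK_{unknown_count}"
            unknown_count += 1
            ids_part = line
        groups.append([name, ids_part.split()])
    return groups

example_lines = ["Muke: 6 10", "43", "3 7"]
groups = parse_edit_lines(example_lines)
print(groups)
# [['Muke', ['6', '10']], ['UNK_0', ['43']], ['UNK_1', ['3', '7']]]
```

Any identifier that does not appear in one of the groups is treated as removed, which matches the third rule of the list.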

<h3>Concrete examples:</h3>

![Example 1](images/ex1.png)
![Example 2](images/ex2.png)

As these two images show, the same individual appears in two successive chains of detections (6 and 10, respectively). To merge them, add the following line to the edit file:

<b>6 10</b>

If you can identify this individual (say its name is "Muke"), you can write the following line instead to attach the name to it:

<b>Muke: 6 10</b>

![Example 3](images/ex3.png)

In this example, we want to remove the boxes with id 52 but keep the boxes with id 43. To do so, simply leave id 52 out of the edit file. However, 43 must still appear, even if it is alone on its line. For instance, the edit file could contain the following line:

<b>43</b>

Finally, if a chimpanzee cannot be recognised, you can still merge its chains of detections, and a default name will be assigned.

![Example 4](images/ex4.png)

This picture shows an example of individuals that could not be recognised (note: UNK is short for "unknown").

Once this step is completed, all the human processing is finished. Running the following code then produces a corrected version of the first output video.
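
To make the effect of the edit file (the file located at <i>edit_path</i>) more concrete, here is a minimal sketch of how merged, named and removed chains could be applied to tracking results. It does not reproduce `edit_raw_output` from `ui_lib`: the `(frame, track_id)` tuples are a hypothetical, simplified stand-in for the raw tracking output, and the groups correspond to an edit file containing the lines "Muke: 6 10" and "43".

```python
# groups as they could be read from the edit file: [name, [ids]]
groups = [["Muke", ["6", "10"]], ["UNK_0", ["43"]]]

# map every id mentioned in the edit file to the name of its group
id_to_name = {track_id: name for name, ids in groups for track_id in ids}

# hypothetical raw detections: (frame number, track id)
detections = [(1, "6"), (1, "52"), (2, "10"), (2, "43"), (3, "52")]

# keep only the ids mentioned in the edit file and replace them by their name;
# id 52 is absent from the edit file, so its boxes are dropped
relabelled = [
    (frame, id_to_name[track_id])
    for frame, track_id in detections
    if track_id in id_to_name
]
print(relabelled)
# [(1, 'Muke'), (2, 'Muke'), (2, 'UNK_0')]
```

Chains 6 and 10 end up under the same name, chain 43 keeps a default name, and chain 52 disappears from the output.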

<h2>Third step:</h2>
<h3>Process the output of the first two steps to produce a textual output (stored in <i>output_text_file_path</i>) and a visual output (accessible via <i>output_edited_video_path</i>).</h3>

In [5]:
# instantiating the readers and the writer
raw_reader = raw_tracking_data_reader(input_text_file_path)
edit_reader = modification_reader(edit_path)
writer = data_writer(output_text_file_path)

modified_data = edit_raw_output(raw_reader, edit_reader)  

# production of the textual output
writer.write(modified_data)

# # production of the visual output
# draw_bbox_from_file(
#     file_path = output_text_file_path, 
#     input_video_path = input_video_path, 
#     output_video_path = output_edited_video_path
# )

{'1': [('1', '2')], '2': [('1', '1')]}
[[['UNK_0', '1', '1', '1', '1'], ['UNK_1', '2', '2', '2', '2']], [['UNK_1', '1', '1', '1', '1'], ['UNK_0', '2', '2', '2', '2']], [['UNK_1', '1', '1', '1', '1'], ['UNK_0', '2', '2', '2', '2']], [['UNK_1', '2', '2', '2', '2']]]


In [3]:
# test cell
from ui_lib import *
raw_path = "..."
edit_path = "..."
output_file_path = "..."

raw_reader = raw_tracking_data_reader(raw_path)
edit_reader = modification_reader(edit_path)

print(edit_reader.data)

writer = data_writer(output_file_path)

modified_data = edit_raw_output(raw_reader, edit_reader)  
writer.write(modified_data)

[['UNK_0', ['3']], ['UNK_1', ['4', '2']]]
