# UFO Project Report for Fish Keypoint Detection #

Sam Nündel, for M.Sc. Information Engineering

August 2024

### Content ###

1. Introduction & Related Work
2. Implementation
3. Experiments
4. Results

### Instructions on running this notebook ###

* clone repository FishVision including the necessary, adapted YOLOv7 main and pose branches, as well as the most recent dataset
* change paths to your own environment's
* run the pip install and conda install commands for the necessary versions of certain packages that work best in combination with each other

### Other repositories (own and used) ###

https://github.com/fhkiel-mlaip/ufo-keypoint-detection
https://github.com/WongKinYiu/yolov7
https://github.com/ultralytics/JSON2YOLO/
https://github.com/TanjaNuendel/FishVision (Code for Norwegian working group)

## Introduction & Related Work ##

### Project Context ###

The scope of this particular task in Project UFO was to examine and adapt various methods of keypoint detection to cod, filmed by an underwater observatory in the Baltic Sea. After a similar course had been completed at Universitet i Agder in Norway, the results on salmon in aquaculture were compiled and carried over into additional experiments specific to the cod dataset.

### Keypoint detection in aquatic animals ###

The current state-of-the-art methods in aquatic monitoring comprise a large number of computer vision systems that build on a variety of architectures and methods, from simpler mathematical classifiers to highly complex detection pipelines using neural network architectures. The YOLO model family has been very popular for tasks related to underwater image detection, especially with moving fish in real-time applications. In 2020, Mohamed et al. introduced MSR-YOLO [7], an adaptation of YOLO specifically to fish detection. The model family has often been used for aquatic monitoring or species detection, such as by Kalhagen et al. [8], who used a hierarchical approach with YOLOv3. The same network was used by Wang et al. in 2021 for general fish instance detection [9]. Chuang et al. (2020) used Mask-RCNN for segmentation, but did not mention specific keypoint-based measurement [10]. YOLO-Fish by Muksit et al. [11] deserves a mention as a further adaptation of a YOLO model to fish detection. In sum, fish detection and monitoring is covered by a fairly large volume of research that branches into numerous areas of AI-based techniques.

Usually, keypoint detection is applied for the purpose of pose estimation. The current state of the art on the COCO benchmark for keypoint detection (as of 2020) is the 4xRSN-50 network proposed in [12]. Reaching a test AP of 78.6 %, it has yet to be adapted to purposes other than human pose estimation, but remains the leading entry. Keypoint RCNN, a Mask-RCNN [13] variant that is only available as a PyTorch implementation, uses masks for single keypoints for anatomical keypoint analysis. For this research context, pose estimation is currently not of the greatest interest, but could be used later in adapting a monitoring system for behavioral analysis using fish poses and tracking movement.

Species-specific keypoint detection in fish, whether to obtain measurements or to predict animal pose, has not been a focus topic in aquatic science research or in computer vision in general. Yet, Suo et al. 2020 [14] and Chen et al. 2017 [15] used two different types of neural network architectures for fish keypoint detection with good results. In [14], a stacked hourglass network is integrated into a fish detection pipeline that first detects instances and then applies a model trained specifically on fish. The fish pose estimation system in [15] works with VGG16. Some researchers have thus proposed fish pose analysis, but anatomical keypoints are rarely used. The specific task of anatomical landmark detection in fish, especially for measurement purposes, can therefore be considered a scientific niche. This project’s keypoint detection system was mostly inspired by the ideas laid down in [14], even though a different keypoint detection system is used.

### YOLO model family ###

YOLO models are specifically designed to work well for real-time object detection [16]. YOLOv7 in particular has pushed the boundaries of inference speed: according to [17], it surpasses its predecessors in both speed of inference and accuracy, e.g. by 120% over YOLOv5 on a V100 inference system at an AP of 55. Wong Kin Yiu’s implementation of the Wang et al. 2022 paper is based on the earlier YOLOv5 and features keypoint detection as a new feature that had formerly only been introduced by the authors of YOLO-Pose [18] as a model adaptation for human pose estimation.

YOLOv7 was chosen for the task at hand for three reasons: it was the first implementation of the model family to incorporate keypoint detection, it is highly prominent in applications for real-time detection, and its built-in pose estimation can in the future be adapted for estimating poses of fish after keypoint detection. YOLOv7 has successfully advanced both the speed and the accuracy of the model family [17]. The model family is under constant development and will likely receive further updates to its keypoint detection mechanisms; YOLOv8 by Ultralytics, for instance, also incorporated keypoints after YOLOv7. Furthermore, YOLOv7 can with relative ease be ported from still image training to video inference, paving the way to real-time application in fish monitoring.

As one of the most prominent families of neural architectures, the YOLO family follows some base principles, according to [19]: a single pass through the network (hence the slogan "You only look once"), a regression-based output and non-maximum suppression. Non-maximum suppression is well established in computer vision as a duplicate removal technique that declutters the detected bounding boxes within an image.
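
To make the duplicate removal step concrete, the following is a minimal sketch of greedy non-maximum suppression in NumPy. It is not the YOLOv7 implementation; the function names `iou` and `nms_sketch`, the example boxes and the threshold of 0.65 are assumptions chosen for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_sketch(boxes, scores, iou_thres=0.65):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thres]  # discard near-duplicates of the kept box
    return keep

# hypothetical detections: two near-duplicate boxes and one separate box
boxes = np.array([[10, 10, 110, 60], [12, 12, 112, 62], [200, 50, 300, 100]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms_sketch(boxes, scores)
print(kept)
```

In this example the second box overlaps the first one almost completely and is therefore suppressed, while the third box is kept because it does not overlap either of them.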

## Method & Implementation ##

### Data Processing ###

The images selected for the detection pipeline are first passed through a fish instance detection, and each image is cropped to the bounding box of the contained fish, which results in images of roughly 500 to 800 pixels on the wider, horizontal side. Images are not resized during pre-processing; instead, a training image size parameter is set to 640 pixels on the widest side for training consistency. Slight image processing, e.g. an increase in brightness, is applied to improve image quality.
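
The effect of the 640-pixel image size parameter on crops of this size can be sketched as follows. This is only the aspect-preserving scaling step under the assumption that the longer side is mapped to the target size; the helper `scaled_size` and the crop heights are hypothetical, and any padding applied inside YOLOv7 is not covered.

```python
def scaled_size(width, height, target=640):
    """Return (new_width, new_height) so that the longer side equals `target`."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)

# hypothetical crop sizes at both ends of the 500 to 800 pixel range
crop_sizes = [(500, 180), (800, 260)]
resized = [scaled_size(w, h) for w, h in crop_sizes]
print(resized)
```

Smaller crops are therefore upscaled and larger crops downscaled, so that all training images share the same extent on their widest side.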

The annotation process is done manually and results in MS COCO-formatted image annotations which contain bounding boxes reaching over the whole image from (0,0) to (x_max, y_max), as well as a class marker, datapoint ID, size information and, most importantly, 20 salmon-specific anatomical keypoints from head to tail. Since YOLOv7 uses a different annotation format from MS COCO, annotations must be converted automatically and sorted into the right folder structure for use with YOLOv7. Data augmentation is not done before processing, as the implementation of YOLOv7 includes a Mosaic-type augmentation which creates eight further variants of each image during the training process if the augmentation parameter is set to True.
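
As a small worked example of this conversion, the sketch below turns one COCO-style annotation into a single label line of the form class, x_center, y_center, width, height, followed by keypoint triplets, with all coordinates normalized by the image size. The helper `coco_to_yolo_line` and the example values are illustrative assumptions; the converter actually used in this notebook is defined further below.

```python
def coco_to_yolo_line(cls, bbox, keypoints, img_w, img_h):
    """Build one label line: class, normalized box centre and size, then keypoint triplets."""
    x, y, w, h = bbox  # COCO box: top-left x, top-left y, width, height
    values = [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]
    for i in range(0, len(keypoints), 3):
        kx, ky, vis = keypoints[i:i + 3]
        values += [kx / img_w, ky / img_h, vis]  # the visibility flag is not scaled
    return " ".join([str(cls)] + [f"{v:g}" for v in values])

# hypothetical annotation with two keypoints on a 600 x 200 pixel image
line = coco_to_yolo_line(0, [10, 20, 200, 100], [27, 48, 2, 540, 77, 2], 600, 200)
print(line)
```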

Pre-cropped and improved images with annotated keypoints are fed into the YOLOv7 training pipeline using YOLOv7’s built-in parameterized training execution. Training parameters that can be varied in YOLOv7 include image size, variant of pretrained weights, hyperparameter file, dataset, batch size, number of epochs, whether an Adam optimizer is used, and whether keypoint labels are applied instead of only bounding boxes in instance detection. Hyperparameters that can be varied include initial learning rate, keypoint, class and box loss gain, IoU thresholds during training, image augmentation types and weight decay. The chosen parameter changes mainly concern image size, weights, hyperparameters, batch size, Adam optimizer and epochs.

In YOLOv7 [17], the original training was conducted on the MS COCO 2017 dataset, which contains human body annotations for pose estimation with 17 keypoints. Since the number of 17 keypoints is hardcoded into various parts of the implementation code, the choice was made to reduce the trained fish keypoints by three and leave out the least relevant ones. Therefore, keypoints 17, 19 and 20 (starting count at 1) were omitted in training a 17-keypoint model. They are at this point of low relevance to deformation detection, since the area just in front of the caudal fin of the specimen is only considered in length estimation, which uses keypoint #18 as its caudal fin marker.

To distinguish results on varying numbers of keypoints, a 17-keypoint model and an 8-keypoint model were trained and tested. The 8-keypoint model consists of keypoints 1 through 4, which are placed around the fish mouth to detect jaw deformity, as well as keypoints 9 through 11 for width estimation and keypoint 18 for length estimation. YOLOv7 automatically saves the best weights from each model training run for use in inference, where specific weights can be chosen and applied to previously unseen data.

The salmon dataset was collected at a Norwegian salmon farming facility at Smørdalen from April 2022 using an unspecified underwater mono camera setup, containing healthy specimens only. 449 images were annotated in MS COCO format with their respective bounding boxes, keypoints and keypoint visibilities, and delivered in PNG format. The images were cropped to bounding boxes and a fish segmentation mask including a broad border around the fish body with a blacked out background was applied. The data was split into train, test and validation sets based on the same randomization seed for both model trainings in a ratio of .8/.1/.1.
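
The seeded split can be illustrated with a minimal standard-library sketch. The notebook itself uses the split-folders package for this step; the helper `split_indices` is hypothetical, and the exact set sizes produced by split-folders may differ slightly because of rounding.

```python
import random

def split_indices(n_items, ratios=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle indices reproducibly and cut them into train/test/validation parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # the same seed always gives the same order
    n_train = int(n_items * ratios[0])
    n_test = int(n_items * ratios[1])
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]  # remainder goes to validation
    return train, test, val

train, test, val = split_indices(449)
print(len(train), len(test), len(val))
```

Because the seed is fixed, the 17-keypoint and 8-keypoint trainings see the same images in each subset, which keeps their results comparable.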

The cod images, provided by the UFO project team and recorded in the Baltic Sea, were processed in a similar manner to the salmon images, again converting the formatted keypoint annotations and splitting the data in a ratio of .8/.1/.1 for train/test/validation. The set consisted of ... images of codfish in various poses.

### Preliminary setup and initialization tests ###

In [3]:
# ensure the following packages are present in their respective versions

!pip install seaborn
!pip install split-folders
!pip install --force-reinstall opencv-python-headless==4.1.2.30
!pip install --force-reinstall numpy==1.22.0
!pip install onnxruntime==1.10.0
!pip install wandb
#!conda install -c conda-forge ipywidgets



ERROR: Ignored the following yanked versions: 3.4.11.39, 3.4.11.41, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.5.5.62, 4.7.0.68, 4.8.0.74
ERROR: Could not find a version that satisfies the requirement opencv-python-headless==4.1.2.30 (from versions: 3.4.10.37, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.15.55, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.46, 4.5.1.48, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.64, 4.6.0.66, 4.7.0.72, 4.8.0.76, 4.8.1.78, 4.9.0.80, 4.10.0.82, 4.10.0.84)
ERROR: No matching distribution found for opencv-python-headless==4.1.2.30


Collecting numpy==1.22.0
  Using cached numpy-1.22.0.zip (11.3 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: numpy
  Building wheel for numpy (pyproject.toml): started
  Building wheel for numpy (pyproject.toml): finished with status 'error'
Failed to build numpy


  error: subprocess-exited-with-error
  
  Building wheel for numpy (pyproject.toml) did not run successfully.
  exit code: 1
  
  [206 lines of output]



In [None]:
import sys
import torch
import torchvision
import cv2
import yaml
import matplotlib.pyplot as plt
from torchvision import transforms
import numpy as np
import os
sys.path.append("FishVision")
%matplotlib inline

# check for nvidia card monitoring
!nvidia-smi

# if error with ipywidgets: use "conda install -c conda-forge ipywidgets" in terminal

In [None]:
# YOLO v7 setup

%cd /home/tanjan/FishVision/yolov7_nokpt

from utils.datasets import letterbox
from utils.general import non_max_suppression_kpt
from utils.plots import output_to_keypoint, plot_skeleton_kpts
import numpy

'''
# testing YOLO setup with inference
!python detect.py --weights yolov7.pt --source inference/images/horses.jpg --img 640

im = plt.imread('/home/tanjan/FishVision/yolov7_nokpt/runs/detect/exp5/horses.jpg')
implot = plt.imshow(im)
plt.show()
'''

In [None]:
# initialize for CUDA with pretrained pose estimation weights for human model test

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model']
_ = model.float().eval()

if torch.cuda.is_available():
    model.half().to(device)

if model: 
    print("Model ready")

In [None]:
testimage = 'Pontatal-86klein.jpg'

image = cv2.imread(testimage)
image = letterbox(image, 960, stride=64, auto=True)[0]
image_ = image.copy()
image = transforms.ToTensor()(image)
image = torch.tensor(np.array([image.numpy()]))

if torch.cuda.is_available():
    image = image.half().to(device)   
output, _ = model(image)

print("Ready")

In [None]:
output = non_max_suppression_kpt(output, 0.25, 0.65, nc=model.yaml['nc'], nkpt=model.yaml['nkpt'], kpt_label=True)
if output:
    print("Output ready")
with torch.no_grad():
    output = output_to_keypoint(output)
nimg = image[0].permute(1, 2, 0) * 255
nimg = nimg.cpu().numpy().astype(np.uint8)
nimg = cv2.cvtColor(nimg, cv2.COLOR_RGB2BGR)
for idx in range(output.shape[0]):
    plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)
    
plt.figure(figsize=(8,8))
plt.axis('off')
plt.imshow(nimg)
plt.show()

### Application for Fish Keypoint Detection ###

* ensure that keypoints are in YOLOv7 annotation format
* split data into train/test/validation sets
* keep weights of yolov7-w6-pose.pt and train on fish to gain new weights from best trained model
* infer using unseen salmon images of various types
* no need to augment data manually, as YOLOv7 uses the mosaic technique to internally augment images in 8 extra ways per image

### Visualization and choice of keypoints in 17-keypoint model ###

* YOLO v7 can accommodate a maximum of 17 keypoints due to its adaptation for 17 keypoints of the human body in pose estimation, set during training with the COCO dataset
* of 20 given keypoints (1-20), the least useful will be removed for training the model
* keypoints 17, 19 and 20 will be removed as they're not necessary for any specific measurements

### Deformity Detection ###

Currently, the detected keypoints are connected within the model as if they're part of the human body. To remedy that and to see if the ratios between correctly assigned keypoints are viewed as desirable (see deformity types), distances between the following keypoints will be calculated and collected within a separate dataset to analyze statistically.

* jaw deformity of keypoint ratios between KP 1 to 4: distances of 1-4, 1-2, 2-3
* length vs. height ratio (slim fish): 1 -> 18 vs 9/10/11 in a triangle in which the midpoint of 9 and 10 is used as the 
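
The distance calculations listed above can be sketched as follows, using the example keypoint coordinates of the visualization image in this notebook. How the midpoint of keypoints 9 and 10 enters the height measurement is an assumption here: the sketch takes the distance from that midpoint to keypoint 11 as body height. The names `kpts`, `dist`, `jaw` and `ratio` are illustrative only.

```python
import math

# example (x, y) pixel coordinates, keyed by 1-based keypoint number
kpts = {1: (27, 48), 2: (58, 74), 3: (30, 61), 4: (52, 56),
        9: (244, 21), 10: (341, 32), 11: (283, 124), 18: (540, 77)}

def dist(a, b):
    """Euclidean distance in pixels between two keypoints given by number."""
    return math.dist(kpts[a], kpts[b])

# jaw distances between keypoints 1 to 4
jaw = {"1-4": dist(1, 4), "1-2": dist(1, 2), "2-3": dist(2, 3)}

# body length from snout (1) to caudal fin marker (18)
length = dist(1, 18)

# assumed height measure: midpoint of keypoints 9 and 10 down to keypoint 11
mid_9_10 = ((kpts[9][0] + kpts[10][0]) / 2, (kpts[9][1] + kpts[10][1]) / 2)
height = math.dist(mid_9_10, kpts[11])

ratio = length / height
print(jaw, round(length, 1), round(height, 1), round(ratio, 2))
```

Collecting these values per fish yields the separate dataset of ratios that can then be analyzed statistically.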

In [None]:
# example image for keypoint visualization 

im = plt.imread('/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct/images/fish/2022-04-11-00-01-10_right.jpg_7.png')
kpts_20 = [(27, 48), (58, 74), (30, 61), (52, 56), (78, 28), (121, 79), (110, 98), (118, 97), (244, 21), (341, 32), (283, 124), (407, 111),
           (445, 98), (441, 55), (470, 62), (494, 66), (515, 53), (540, 77), (521, 99), (502, 94)]
kpts_20 = list(zip(*kpts_20))

implot = plt.imshow(im)
plt.title("Originally proposed 20 keypoints")
plt.plot(kpts_20[0],kpts_20[1], 'or')
for idx, (x, y) in enumerate(zip(kpts_20[0], kpts_20[1]), start=1):
    # label each point with its 1-based keypoint number
    plt.annotate(str(idx),
                 (x, y),
                 textcoords="offset points",
                 xytext=(0, 0),
                 color='white',
                 ha='left')
plt.show()


# 17 keypoints
# realistically, YOLOv7 will take extra points only as zero values from the annotations, so non-used values are zeroed out

kpts_17 = [(27, 48), (58, 74), (30, 61), (52, 56), (78, 28), (121, 79), (110, 98), (118, 97), (244, 21), (341, 32), (283, 124), (407, 111),
           (445, 98), (441, 55), (470, 62), (494, 66), (0, 0), (540, 77), (0, 0), (0, 0)]
kpts_17 = list(zip(*kpts_17))

implot = plt.imshow(im)
plt.title("Reduced to 17 keypoints")
plt.plot(kpts_17[0],kpts_17[1], 'or')
for idx, (x, y) in enumerate(zip(kpts_17[0], kpts_17[1]), start=1):
    if x != 0 and y != 0:  # zeroed-out keypoints are not labelled
        plt.annotate(str(idx),  # 1-based keypoint number
                 (x, y),
                 textcoords="offset points",
                 xytext=(0, 0),
                 color='white',
                 ha='left')
plt.show()

# 8 keypoints

kpts_8 = [(27, 48), (58, 74), (30, 61), (52, 56), (0, 0), (0, 0), (0, 0), (0, 0), (244, 21), (341, 32), (283, 124), (0, 0),
          (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (540, 77), (0, 0), (0, 0)]
kpts_8 = list(zip(*kpts_8))

implot = plt.imshow(im)
plt.title("Reduced to 8 keypoints")
plt.plot(kpts_8[0], kpts_8[1], 'or')
for idx, (x, y) in enumerate(zip(kpts_8[0], kpts_8[1]), start=1):
    if x != 0 and y != 0:  # zeroed-out keypoints are not labelled
        plt.annotate(str(idx),  # 1-based keypoint number
                 (x, y),
                 textcoords="offset points",
                 xytext=(0, 0),
                 color='white',
                 ha='left')
plt.show()

In [None]:
# function definition for COCO JSON to YOLOv7 keypoint format annotation conversion
# forked and adapted to keypoint formats from JSON2YOLO by Ultralytics at https://github.com/ultralytics/JSON2YOLO/

import splitfolders
import json
import itertools
from os import listdir, rmdir  # used by reorder_dirs
from os.path import join
from pathlib import Path
from shutil import move
from tqdm import tqdm

def make_dirs(dir, version=0):
    # version: labels/images folders, inside each are test/train/valid
    if version==0:
            # Create folders
        dir = Path(dir)
        for p in dir, dir / 'labels', dir / 'images':
            p.mkdir(parents=True, exist_ok=True)  # make dir
        return dir

'''
reordering of directories for YOLOv7 directory format
'''
def reorder_dirs(root):
    # move data up a folder
    for basedir in listdir(root): # /images and /labels
        for subdir in listdir(join(root, basedir)): # /test, /train and /val
            for subsubdir in listdir(join(root, basedir, subdir)): # class directory (fish/anno)
                for filename in listdir(join(root, basedir, subdir, subsubdir)): # move all files up into parent folder
                    move(join(root, basedir, subdir, subsubdir, filename), join(root, basedir, subdir, filename))
                rmdir(join(root, basedir, subdir, subsubdir))
    
    yolo_dirs = ['test', 'train', 'val']
    for _dir in yolo_dirs:
        print(_dir)
        _dir = os.path.join(root, _dir)
        if not os.path.exists(_dir):
            os.makedirs(_dir)
        if not os.path.exists(join(root, _dir, 'images')):
            os.makedirs(join(root, _dir, 'images'))
        if not os.path.exists(join(root, _dir, 'labels')):
            os.makedirs(join(root, _dir, 'labels'))
    
    basedir = os.path.join(root, r'images')
    for subdir in listdir(basedir):
        for filename in listdir(join(root, basedir, subdir)): # move all files up into
            move(join(root, basedir, subdir, filename), join(root, subdir, 'images', filename))
        rmdir(join(root, basedir, subdir))
    rmdir(join(root, basedir))

    basedir = os.path.join(root, r'labels')
    for subdir in listdir(basedir):
        for filename in listdir(join(root, basedir, subdir)): # move all files up into
            move(join(root, basedir, subdir, filename), join(root, subdir, 'labels', filename))
        rmdir(join(root, basedir, subdir))    
    rmdir(join(root, basedir))

def coco91_to_coco80_class():  # maps 91-index (paper) category IDs to 80-index (val2014)
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x
        
# converter function
def convert_coco_json(json_dir, n_kpts, omit=None, use_segments=False, cls91to80=False):
    save_dir = json_dir
    coco80 = coco91_to_coco80_class()

    # Import json
    for json_file in sorted(Path(json_dir).resolve().glob('*.json')):
        fn = Path(save_dir) / 'labels' / json_file.stem.replace('instances_', '')  # folder name
        fn.mkdir(parents=True, exist_ok=True)  # make sure the label folder exists before files are appended to
        
        with open(json_file) as f:
            data = json.load(f)

        # Create image dict
        images = {'%g' % x['id']: x for x in data['images']}

        # Write labels file
        for x in tqdm(data['annotations'], desc=f'Annotations {json_file}'):
            img = images['%g' % x['image_id']]
            h, w, f = img['height'], img['width'], img['file_name']

            # The COCO box format is [top left x, top left y, width, height]
            box = np.array(x['bbox'], dtype=np.float64)
            
            if box[0]==0 and box[1]==0:
                # shift the top-left corner inward by 1 px and shrink width/height by 1 px to avoid faulty normalization
                box[0] += 1
                box[1] += 1
                box[2] -= 1
                box[3] -= 1

            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y
            
            #extract keypoints from JSON
            keypoints = np.array(x['keypoints'], dtype=np.float64)
            #print("Original KP", keypoints)
            
            if omit:
                # omit chosen keypoints
                idx_to_omit = []
                for kp in omit:
                    kp -= 1
                    idx_to_omit.append(kp*3)
                    idx_to_omit.append(kp*3+1)
                    idx_to_omit.append(kp*3+2)
                keypoints = np.delete(keypoints, idx_to_omit)
                       
            #normalize keypoints in each triplet of x, y and occlusion flag (occlusion flag not converted)
            keypoints[0::3] /= w # normalize x
            keypoints[1::3] /= h # normalize y
            #print("Normalized KP", keypoints)
            
            # pad with one zero triplet (0.000000, 0.000000, 0.000000) per missing keypoint so every label holds 17 keypoints, as YOLOv7 expects; padded entries follow the used keypoints, which changes their indexing
            if n_kpts < 17:
                fillers = 17-n_kpts
                values = [0.000000, 0.000000, 0.000000]
                for fr in range(fillers):
                    keypoints = np.append(keypoints, values, axis=None)
            #print("Normalized KP after filling up with 0.000000", keypoints)
            
            box_key = np.append(box, keypoints)
           
            # Segments
            if use_segments:
                segments = [j for i in x['segmentation'] for j in i]  # all segments concatenated
                s = (np.array(segments).reshape(-1, 2) / np.array([w, h])).reshape(-1).tolist()

            # Write
            if box[2] > 0 and box[3] > 0:  # if w > 0 and h > 0
                cls = coco80[x['category_id'] - 1] if cls91to80 else x['category_id'] - 1  # class
                line = cls, *(s if use_segments else box_key)  # cls, box/keypoints or segments
                with open((fn / f).with_suffix('.txt'), 'a') as file:
                    file.write(('%g ' * len(line)).rstrip() % line + '\n')
                    
        
print("Functions ready")

In [None]:
# ONLY COMPUTE IF NECESSARY
'''
keypoint conversion from COCO to YOLO v7 format
converting the ultralytics JSON2YOLO converter at https://github.com/ultralytics/JSON2YOLO for additional keypoint data
format: class, x_center, y_center, width, height, kpt1_x, kpt1_y, visibility_1, ..., kptn_x, kptn_y, visibility_n
'''

source_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct'

# define keypoints for omission to fit the 17-keypoint format used by COCO trained YOLOv7
# WARNING: keypoints start counting at 1, conversion will be made internally, do NOT adjust for index 0
omitted_kp = [17, 19, 20]

convert_coco_json(source_dir, 17, omitted_kp)

In [None]:
# ONLY COMPUTE IF NECESSARY

# Split with a ratio of .8 : .1 : .1 train:test:val with a reproducible seed for both labels and images created within last step

from os.path import join
from os import listdir, rmdir
from shutil import move

working_dir = '/home/tanjan/FishVision'

root = os.path.join(working_dir, r'data_17')
if not os.path.exists(root):
    os.makedirs(root)

label_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct/labels'
img_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct/images'
label_target_dir = '/home/tanjan/FishVision/data_17/labels'
img_target_dir = '/home/tanjan/FishVision/data_17/images'

splitfolders.ratio(label_dir, output=label_target_dir, seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False)
splitfolders.ratio(img_dir, output=img_target_dir, seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False)

reorder_dirs(root)

## Experiments ##

1. setup - describe
2. metrics
3. training and testing with results
4. inference on unseen fish

### Metrics ###

While YOLO reports the well-known AP or mAP metric, i.e. (mean) Average Precision in detection evaluated at different IoU thresholds, keypoint detection systems can be judged by further metrics of inference quality. The OKS (Object Keypoint Similarity), given as a percentage, quantifies the distance between the keypoints predicted by a system and their ground-truth locations and is commonly used as a measure of quality in pose estimation tasks [21]. The PCK (percentage of correct keypoints) metric similarly measures the quality of keypoint inference, but uses a distance threshold within which a predicted keypoint is counted as correct despite an offset from its true location [22]. Both measures, among others, are useful for quality assessment and will be used frequently.
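
Both keypoint metrics can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the evaluation code used for the reported results: the reference length and `alpha` for PCK, as well as the uniform per-keypoint constant `k` for OKS, are hypothetical choices, since no fish-specific constants are defined here. The OKS term follows the COCO form exp(-d² / (2 · area · k²)), averaged over visible keypoints.

```python
import numpy as np

def pck(pred, gt, visible, ref_length, alpha=0.1):
    """Share of visible keypoints predicted within alpha * ref_length of the ground truth."""
    distances = np.linalg.norm(pred - gt, axis=1)
    correct = (distances <= alpha * ref_length) & visible
    return correct.sum() / visible.sum()

def oks(pred, gt, visible, area, k=0.1):
    """Object Keypoint Similarity with one shared constant k for all keypoints (assumption)."""
    d2 = np.sum((pred - gt) ** 2, axis=1)  # squared pixel distances
    similarity = np.exp(-d2 / (2 * area * k ** 2))
    return similarity[visible].mean()

# hypothetical ground truth and predictions for four keypoints
gt = np.array([[0, 0], [10, 0], [20, 0], [30, 0]], dtype=float)
pred = gt + np.array([[0, 0], [3, 4], [0, 30], [1, 0]], dtype=float)
visible = np.array([True, True, True, True])

pck_score = pck(pred, gt, visible, ref_length=100)
oks_score = oks(pred, gt, visible, area=100 * 40)
print(pck_score, round(float(oks_score), 3))
```

PCK counts each keypoint as either correct or incorrect, whereas OKS degrades smoothly with distance, so a single far-off keypoint lowers both scores but small offsets only affect OKS.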



### Model training and testing for 17 keypoints in salmon ###

After an initial experimentation phase, the parameters for the 17-keypoint model training were set to a batch size of 32 and an epoch count of 200, with an IoU threshold for AP metric calculation of 0.5. 200 epochs were found sufficient to smooth out most of the metric outliers and lead to a stabilized, meaningful curve. Two runs with an Adam optimizer were attempted, but resulted in strongly fluctuating mAP curves that did not stabilize. In a 600-epoch test run, mAP was found to be close to unchanging after ca. 200 epochs, while training loss still dropped slightly.

In [None]:
# initialize for CUDA with pretrained pose estimation weights

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model']
_ = model.float().eval()

if torch.cuda.is_available():
    model.half().to(device)

In [None]:
# custom model training in simple CLI fashion using yolov7 pose branch without virtual torch container
%cd /home/tanjan/FishVision/yolov7

!python train.py --data data/fish_17kpts.yaml --workers 8 --epochs 50 --cfg cfg/yolov7-w6-pose.yaml --weights yolov7-w6-pose.pt --batch-size 32 --img 640 --kpt-label --sync-bn --device 0 --name yolov7-fish-seventeen --hyp data/hyp.pose.yaml
# add "-m torch.distributed.launch --nproc_per_node 8 --master_port 9527" after train.py if running distributed torch containers

In [None]:
# testing 

!python test.py --data data/fish_kpts.yaml --conf 0.001 --iou 0.65 --kpt-label --weights yolov7-w6-pose.pt --batch 32 --img 640 --device 0 --name yolov7-fish-seventeen
# -m torch.distributed.launch --nproc_per_node 8 --master_port 9527

### Inference with best models on salmon and cod ###

Unseen fish images from the test set were tested using both best models in an inference setup using YOLOv7’s built-in parameterized test calls. No specific further parameters were set, except for the use of the respective best saved model weights from the best model runs. In both cases, an average of 7-8 keypoints could be inferred. The PCK score over the respective three test images was ca. 57.14 % for 17 keypoints and 67.86 % for 8 keypoints, not counting keypoints that were not inferred but could have been recognized. The 8-keypoint model seems to perform slightly better in terms of inference on unseen images. When testing the 8-keypoint model on an unseen unhealthy fish, the keypoint inference was highly accurate with 6 found keypoints, all of which were correctly placed. Presumably due to jaw deformation, however, the first two jaw-based keypoints could not be inferred at all.

In [None]:
!python detect.py --weights runs/train/yolov7-fish27/weights/best.pt --source "inf_img/4.jpg" --kpt-label
!python detect.py --weights runs/train/yolov7-fish27/weights/best.pt --source "inf_img/salmosalar01.jpg" --kpt-label
!python detect.py --weights runs/train/yolov7-fish27/weights/best.pt --source "inf_img/salmosalar02.jpg" --kpt-label

In [None]:
im = plt.imread('/home/tanjan/FishVision/yolov7/runs/detect/exp12/4.jpg')
implot = plt.imshow(im)
plt.show()

im = plt.imread('/home/tanjan/FishVision/yolov7/runs/detect/exp13/salmosalar01.jpg')
implot = plt.imshow(im)
plt.show()

im = plt.imread('/home/tanjan/FishVision/yolov7/runs/detect/exp14/salmosalar02.jpg')
implot = plt.imshow(im)
plt.show()

### Adaptation for 8 keypoints in salmon and cod ###

To achieve the best possible result with minimal effort for the given goal (deformity detection), a variant with 8 keypoints was chosen to only use those that are used for deformity detection

* 1-4 for jaw deformity
* 18 for length measurement from 1 to 18
* 9-10 for slim fish syndrome detection

The 8-keypoint model was trained with the same parameters as the 17-keypoint model, namely for 200 epochs with a batch size of 32, without the Adam optimizer and with an IoU threshold of 0.5.

For the given cod fish, the length is the information that needs to be extracted. The goal is therefore the distance from keypoint 1 to keypoint 18.
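
A minimal sketch of this length measurement, assuming the inferred keypoints are available as a dictionary from 1-based keypoint id to `(x, y, visibility)`; this data layout and the helper name `fish_length_px` are hypothetical. The result is in pixels and would still need calibration to become a real-world length.

```python
import math

def fish_length_px(keypoints, start_id=1, end_id=18):
    """Pixel distance between two keypoints, or None if one of them is missing.

    keypoints: dict mapping 1-based keypoint id -> (x, y, visibility)
    """
    a = keypoints.get(start_id)
    b = keypoints.get(end_id)
    # a keypoint with visibility 0 is treated as not available
    if a is None or b is None or a[2] == 0 or b[2] == 0:
        return None
    return math.hypot(b[0] - a[0], b[1] - a[1])

# hypothetical inference result containing only the two relevant keypoints
kpts = {1: (100.0, 200.0, 2), 18: (400.0, 600.0, 2)}
length = fish_length_px(kpts)
print(length)
```

Returning `None` instead of a number makes it explicit when a fish cannot be measured because one of the two end points was not inferred.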

In [None]:
# ONLY COMPUTE IF NECESSARY

import os
import splitfolders

# create new annotations and images set in /data_8
'''
keypoint conversion from COCO to YOLO v7 format
adapted from the ultralytics JSON2YOLO converter at https://github.com/ultralytics/JSON2YOLO to include keypoint data
format: class, x_center, y_center, width, height, kpt1_x, kpt1_y, visibility_1, ..., kptn_x, kptn_y, visibility_n
'''
source_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct'

# define keypoints for omission so that only 8 keypoints remain (1-4, 9-11 and 18)
# WARNING: keypoints start counting at 1, conversion will be made internally, do NOT adjust for index 0
omitted_kp = [5, 6, 7, 8, 12, 13, 14, 15, 16, 17, 19, 20]

convert_coco_json(source_dir, 8, omitted_kp)

# Split the labels and images created in the last step with a ratio of .8 : .1 : .1
# (train : val : test, the order expected by splitfolders) and a reproducible seed

working_dir = '/home/tanjan/FishVision'

root = os.path.join(working_dir, 'data_8')
os.makedirs(root, exist_ok=True)

label_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct/labels'
img_dir = '/home/tanjan/FishVision/Smordalen_M1_2022-04-11_biomass_correct/images'
label_target_dir = '/home/tanjan/FishVision/data_8/labels'
img_target_dir = '/home/tanjan/FishVision/data_8/images'

# the identical seed keeps labels and images in matching splits
splitfolders.ratio(label_dir, output=label_target_dir, seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False)
splitfolders.ratio(img_dir, output=img_target_dir, seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False)

reorder_dirs(root)
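
The internals of `convert_coco_json` are not shown in this section. As a rough sketch of what omitting keypoints from a COCO annotation could look like, the hypothetical helper `filter_keypoints` below drops the 1-based keypoint ids listed in `omitted_kp` from a flat COCO keypoint list `[x1, y1, v1, x2, y2, v2, ...]` and normalises the remaining coordinates by image width and height, as in the YOLO label format given above. The helper name, the dummy coordinates and the image size are assumptions for illustration.

```python
def filter_keypoints(coco_kpts, omitted_kp, img_w, img_h):
    """Return kpt_x, kpt_y, visibility for all kept keypoints, normalised to [0, 1]."""
    omitted = set(omitted_kp)
    out = []
    for idx in range(len(coco_kpts) // 3):
        kp_id = idx + 1  # ids are 1-based, as in omitted_kp
        if kp_id in omitted:
            continue
        x, y, v = coco_kpts[3 * idx: 3 * idx + 3]
        out.extend([x / img_w, y / img_h, v])
    return out

omitted_kp = [5, 6, 7, 8, 12, 13, 14, 15, 16, 17, 19, 20]

# 20 dummy keypoints: keypoint i sits at pixel (10 * i, 5 * i) and is visible
coco_kpts = [c for i in range(1, 21) for c in (10 * i, 5 * i, 2)]

kept = filter_keypoints(coco_kpts, omitted_kp, img_w=400, img_h=200)
kept_ids = [i for i in range(1, 21) if i not in omitted_kp]
print(kept_ids)
```

Listing the kept ids this way is a quick check that the omission list really leaves the intended 8 keypoints.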

### Preliminary results ###

Overall, the training metrics for the 17- and 8-keypoint models were fairly similar, with a slightly higher mAP for the best 17-keypoint model and slight variations in the metric and loss curves. Running the model test for 17 keypoints resulted in a fair number of correctly detected keypoints. Some fish were left entirely undetected, while others had more keypoints than necessary strewn across the image, especially in the jaw area.

The keypoints trained on salmon do not transfer well to the given codfish dataset. A codfish-specific model should therefore be trained on its 8 keypoints.

### Model training and testing for 8 keypoints on cod ###

In [None]:
# initialize for CUDA with pretrained pose estimation weights
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model']
_ = model.float().eval()

# switch to half precision when running on the GPU
if torch.cuda.is_available():
    model.half().to(device)

# custom model training in simple CLI fashion using yolov7 pose branch without virtual torch container
%cd /home/tanjan/FishVision/yolov7

!python train.py --data data/fish_8kpts.yaml --workers 8 --epochs 200 --cfg cfg/yolov7-w6-pose.yaml --weights yolov7-w6-pose.pt --batch-size 32 --img 640 --kpt-label --sync-bn --device 0 --name yolov7-fish-eight --hyp data/hyp.pose.yaml
# add "-m torch.distributed.launch --nproc_per_node 8 --master_port 9527" after train.py if running distributed torch containers

In [None]:
!python detect.py --weights runs/train/yolov7-fish-eight/weights/best.pt --source "inf_img/4.jpg" --kpt-label
!python detect.py --weights runs/train/yolov7-fish-eight/weights/best.pt --source "inf_img/salmosalar01.jpg" --kpt-label
!python detect.py --weights runs/train/yolov7-fish-eight/weights/best.pt --source "inf_img/salmosalar02.jpg" --kpt-label

In [None]:
# show the inference results of the 8-keypoint model for the three test images
result_paths = [
    '/home/tanjan/FishVision/yolov7/runs/detect/exp15/4.jpg',
    '/home/tanjan/FishVision/yolov7/runs/detect/exp16/salmosalar01.jpg',
    '/home/tanjan/FishVision/yolov7/runs/detect/exp17/salmosalar02.jpg',
]

for path in result_paths:
    im = plt.imread(path)
    implot = plt.imshow(im)
    plt.show()

## Results ##

### Salmon keypoint detection ###

In tests within the Norwegian working group, the Mask R-CNN model used for keypoint inference performed slightly worse than the YOLOv7 models, with an average AP of around 0.5 at the same IoU threshold of 0.5. It might therefore be concluded that YOLOv7 is the superior model for this specific dataset. However, the difference might also be due to differences in augmentation, as YOLOv7 automatically applies the mosaic augmentation technique, which increases the number of images used during model training about nine-fold. This might hint at a need for a higher image count in training datasets and for further exploration of the optimum dataset size.

The current SOTA benchmark for fish keypoint detection is held by Suo et al. 2020 [14], at 0.667 OKS in an 8-keypoint scenario. This is not directly comparable to the PCK metric, but lies in a similar range. The current MS COCO keypoint detection benchmark leader [23] is 4xRSN-50 [12] with a test AP of 78.6. Mask R-CNN currently ranks 15th in the COCO keypoint benchmark with a test AP of 63.1. The results are therefore consistent with typical keypoint detection algorithms and could potentially exceed the current benchmark if extra measures were taken towards data quality, further augmentation, further hyperparameter tuning and deeper changes to the network structure.

Missing inferred keypoints on unseen data can in some specific cases (4.9) be interpreted as the impact of the very deformations that should be detected, which produced data that could not be used for further inference. High priority must therefore be given to improving direct keypoint inference, to reduce the likelihood of missing crucial keypoints that will later be used for deformity detection and fish health status classification. To that end, parameters such as detection thresholds could be changed to include at least some lower-confidence predictions to work from and possibly disregard later. Because very few sample inferences actually contain a usable number of keypoints, the analysis of the distribution of fish measurements and ratios for classifying fish health status will have to wait until the keypoint model is stable enough to predict a higher number of keypoints for measuring.
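
Since OKS and PCK are mentioned side by side, a minimal sketch of the OKS computation may help to show where they differ: OKS weights each keypoint's distance with a Gaussian falloff scaled by the object area and a per-keypoint constant, instead of applying a hard distance threshold. The sketch follows the COCO definition. The per-keypoint constants `sigmas` for fish are not given in this report, so the uniform value used here, as well as the function name and the sample data, are assumptions.

```python
import math

def oks(pred, gt, vis, area, sigmas):
    """Object Keypoint Similarity following the COCO definition.

    pred, gt: lists of (x, y); vis: ground-truth visibility flags (0 = not labelled);
    area: object area in square pixels; sigmas: per-keypoint falloff constants.
    """
    total, labelled = 0.0, 0
    for (px, py), (gx, gy), v, s in zip(pred, gt, vis, sigmas):
        if v == 0:
            continue  # unlabelled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * s) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        labelled += 1
    return total / labelled if labelled else 0.0

gt = [(10.0, 10.0), (50.0, 12.0), (90.0, 15.0)]
vis = [2, 2, 0]          # the third keypoint is not labelled
sigmas = [0.05] * 3      # assumed uniform constant, not calibrated for fish

perfect = oks(gt, gt, vis, area=4000.0, sigmas=sigmas)
shifted = oks([(12.0, 10.0), (50.0, 12.0), (0.0, 0.0)], gt, vis, area=4000.0, sigmas=sigmas)
print(perfect, shifted)
```

A small localisation error lowers OKS gradually, whereas under PCK the same keypoint stays fully correct until it crosses the threshold.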

### Cod keypoint detection ###
TODO

## Conclusion ##

The experiments have overall shown that YOLOv7 lends itself to being adapted to anatomical keypoint analysis of species other than humans. The AP and other metrics are currently below SOTA, but are good enough to attest that the trained network functions a) correctly and b) efficiently. It surpasses the project-internal AP of ca. 0.5 achieved with Mask R-CNN and could, with further tuning and likely a higher number of training images, hold up to SOTA standards.

Some work remains to be done on the pipeline steps that come before keypoint training and inference. Especially for real-time analysis and inference, it is of the highest importance that single fish are prepared for the pipeline from larger images containing a high number of fish at once. Semantic segmentation or mask-based instance segmentation are therefore techniques that should be applied beforehand. As discussed above, the classification of fish health status by where a single specimen’s form markers (jaw, ratios, length/width etc.) fall within a Gaussian distribution, and by how many standard deviations they deviate from the marker mean, will have to wait for a better inference model with a higher PCK. Otherwise the sample size would be too small to yield robust results.
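
The standard-deviation-based classification described above could be sketched as follows. The marker values, the threshold of two standard deviations and the function names are assumptions for illustration only; no such measurements are available yet.

```python
import statistics

def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

def flag_outliers(values, threshold=2.0):
    """Mark specimens whose marker deviates more than `threshold` standard deviations."""
    return [abs(z) > threshold for z in z_scores(values)]

# hypothetical width/length ratios of ten fish, the last one being unusually high
ratios = [0.20, 0.21, 0.19, 0.20, 0.22, 0.21, 0.20, 0.19, 0.21, 0.35]
flags = flag_outliers(ratios)
print(flags)
```

Note that mean and standard deviation are estimated from the same sample that contains the outlier, so with only a handful of fish a single deformed specimen shifts both noticeably, which is the sample-size concern raised above.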

Furthermore, stereo imaging techniques, in which pairs of images with slight local variation are taken by stereo camera systems, have the advantage of making it possible to measure real-world distances in images through calibration. The introduction of stereo imaging to the project would therefore allow fish measurements to be calculated on the fly, not only to control for the ratios of one fish, but also to detect the general sizes of specimens within the population. Real-time video inference is another task towards which this system has been building, but which it has not yet accomplished. In further work, the segmentation pipeline and the real-time inference could be combined to complete the proposed overall system for aquaculture monitoring, striving towards a cost-effective system that enables better animal welfare, lower fish losses and ultimately higher product output. In wildlife monitoring, the same approach could be applied to the task of estimating biomass.
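
To illustrate how calibration turns pixel measurements into real-world distances, the sketch below applies the standard pinhole stereo relation Z = f * B / d for a rectified camera pair and back-projects two keypoints to metric coordinates. All camera parameters (focal length, baseline, principal point) and the sample pixel values are assumed numbers for illustration; no stereo setup is part of the current project.

```python
import math

def backproject(u, v, disparity, f, baseline, cx, cy):
    """Back-project a left-image pixel of a rectified stereo pair to 3D camera coordinates.

    f, cx, cy and disparity are in pixels, baseline in metres; the result is in metres.
    """
    z = f * baseline / disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return (x, y, z)

# assumed calibration values
f, baseline, cx, cy = 1000.0, 0.1, 640.0, 360.0

# hypothetical pixel positions of keypoint 1 (snout) and keypoint 18 (tail)
snout = backproject(440.0, 360.0, disparity=50.0, f=f, baseline=baseline, cx=cx, cy=cy)
tail = backproject(840.0, 360.0, disparity=50.0, f=f, baseline=baseline, cx=cx, cy=cy)

length_m = math.dist(snout, tail)
print(f"fish length: {length_m:.2f} m")
```

Because both keypoints are back-projected separately, the measured length also stays meaningful when the fish is not parallel to the image plane and the two disparities differ.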

## References ##

[1] Norwegian Seafood Industry Statistics: An Overview | Northern Delights. url: https://northerndelights.com/editorial/norwegian-seafood-industry-statistics/.

[2] Fiskevelferd | Institute of Marine Research. url: https://www.hi.no/en/hi/temasider/aquaculture/fish-welfare.

[3] Deadly fish disease discovered at Norwegian salmon farm - Fish Farmer Magazine. url: https://www.fishfarmermagazine.com/news/deadly-fish-disease-discovered-at-norwegian-salmon-farm/.

[4] Disease in Fish Farms | Aquatic Life Institute. url: https://ali.fish/blog/disease-in-fish-farms-and-the-effects-on-fish.

[5] Hamish D. Rodger, Louise Henry, and Susan O. Mitchell. “Non-infectious gill disorders of marine salmonid fish.” In: Reviews in Fish Biology and Fisheries 21.3 (Sept. 2011), pp. 423–440. issn: 09603166. doi: 10.1007/S11160-010-9182-6. url: https://www.researchgate.net/publication/225450608_Non-infectious_gill_disorders_of_marine_salmonid_fish.

[6] Disturbing New Footage Shows Diseased, Deformed Salmon in B.C. Fish Farms | The Narwhal. url: https://thenarwhal.ca/disturbing-new-footage-shows-diseased-deformed-salmon-b-c-fish-farms/.

[7] Hussam El Din Mohamed et al. “MSR-YOLO: Method to Enhance Fish Detection and Tracking in Fish Farms.” In: Procedia Computer Science 170 (2020), pp. 539–546. issn: 18770509. doi: 10.1016/J.PROCS.2020.03.123.

[8] Espen Stausland Kalhagen and Ørjan Langøy Olsen. “Hierarchical Fish Species Detection in Real-Time Video Using YOLO.” In: 70 (2020). url: https://uia.brage.unit.no/uia-xmlui/handle/11250/2683060.

[9] Wenkai Wang, Bingwei He, and Liwei Zhang. “High-Accuracy Real-Time Fish Detection Based on Self-Build Dataset and RIRD-YOLOv3.” In: Complexity 2021 (2021). issn: 10990526. doi: 10.1155/2021/4761670.

[10] Chuang Yu et al. “Segmentation and measurement scheme for fish morphological features based on Mask R-CNN.” In: Information Processing in Agriculture 7.4 (Dec. 2020), pp. 523–534. issn: 2214-3173. doi: 10.1016/J.INPA.2020.01.002.

[11] Abdullah Al Muksit et al. “YOLO-Fish: A robust fish detection model to detect fish in realistic underwater environment.” In: Ecological Informatics 72 (Dec. 2022), p. 101847. issn: 1574-9541. doi: 10.1016/J.ECOINF.2022.101847.

[12] Learning Delicate Local Representations for Multi-Person Pose Estimation | Papers With Code. url: https://paperswithcode.com/paper/learning-delicate-local-representations-for.

[13] Kaiming He et al. “Mask R-CNN.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42.2 (Mar. 2017), pp. 386–397. issn: 19393539. doi: 10.1109/TPAMI.2018.2844175. url: https://arxiv.org/abs/1703.06870v3.

[14] Feiyang Suo et al. “Fish Keypoints Detection for Ecology Monitoring Based on Underwater Visual Intelligence.” In: 16th IEEE International Conference on Control, Automation, Robotics and Vision, ICARCV 2020 (Dec. 2020), pp. 542–547. doi: 10.1109/ICARCV50220.2020.9305424.

[15] Guang Chen, Peng Sun, and Yi Shang. “Automatic fish classification system using deep learning.” In: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI 2017-November (June 2018), pp. 24–29. issn: 10823409. doi: 10.1109/ICTAI.2017.00016.

[16] Joseph Redmon et al. “You Only Look Once: Unified, Real-Time Object Detection.” In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2016-December (June 2015), pp. 779–788. issn: 10636919. doi: 10.1109/CVPR.2016.91. url: https://arxiv.org/abs/1506.02640v5.

[17] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.” In: (July 2022). url: https://arxiv.org/abs/2207.02696v1.

[18] Debapriya Maji et al. “YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss.” In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2022-June (Apr. 2022), pp. 2636–2645. issn: 21607516. doi: 10.1109/CVPRW56347.2022.00297. url: https://arxiv.org/abs/2204.06806v1.

[19] Juan R Terven and Diana M Cordova-Esparaza. “A COMPREHENSIVE REVIEW OF YOLO: FROM YOLOV1 TO YOLOV8 AND BEYOND UNDER REVIEW IN ACM COMPUTING SURVEYS.” In: (2023).

[20] Alaa Eldin Eissa. “Clinical and Laboratory Manual of Fish Diseases.” In: (). url: https://www.researchgate.net/publication/301302575_Clinical_and_Laboratory_Manual_of_Fish_Diseases.

[21] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. 2018. url: https://github..

[22] Tewodros Legesse Munea et al. “The Progress of Human Pose Estimation: A Survey and Taxonomy of Models Applied in 2D Human Pose Estimation.” In: IEEE Access 8 (2020), pp. 133330–133348. issn: 21693536. doi: 10.1109/ACCESS.2020.3010248.

[23] COCO Benchmark (Keypoint Detection) | Papers With Code. url: https://paperswithcode.com/sota/keypoint-detection-on-coco.