<a href="https://colab.research.google.com/github/MMathisLab/DeepLabCut/blob/main/examples/COLAB/COLAB_YOURDATA_SuperAnimal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DeepLabCut Model Zoo: SuperAnimal models

![alt text](https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1616492373700-PGOAC72IOB6AUE47VTJX/ke17ZwdGBToddI8pDm48kB8JrdUaZR-OSkKLqWQPp_YUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYwL8IeDg6_3B-BRuF4nNrNcQkVuAT7tdErd0wQFEGFSnBqyW03PFN2MN6T6ry5cmXqqA9xITfsbVGDrg_goIDasRCalqV8R3606BuxERAtDaQ/modelzoo.png?format=1000w)

# 🦄 SuperAnimal in DeepLabCut PyTorch! 🔥

This notebook demos how to use our SuperAnimal models within DeepLabCut 3.0! Please read more in [Ye et al. Nature Communications 2024](https://www.nature.com/articles/s41467-024-48792-2) about the available SuperAnimal models, and follow along below!

### **Let's get going: install the latest version of DeepLabCut into COLAB:**

*Also, be sure you are connected to a GPU: go to menu, click Runtime > Change Runtime Type > select "GPU"*
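
If you want to confirm that the runtime really has a GPU attached before installing anything, a quick check along these lines may help. This is only a sketch: `list_visible_gpus` is a hypothetical helper name (not part of DeepLabCut), and it assumes the standard `nvidia-smi -L` command is what reports GPUs on your runtime.

```python
import shutil
import subprocess


def list_visible_gpus() -> list[str]:
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is visible.

    Hypothetical helper for illustration; not part of DeepLabCut.
    """
    if shutil.which("nvidia-smi") is None:
        return []  # NVIDIA driver tools are not installed on this runtime
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []  # the tool exists but could not talk to a GPU
    # Each GPU is listed on its own line, starting with "GPU <index>:"
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


gpus = list_visible_gpus()
if gpus:
    print(f"Found {len(gpus)} GPU(s):")
    for gpu in gpus:
        print("  " + gpu)
else:
    print("No GPU visible; check Runtime > Change Runtime Type.")
```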


In [None]:
!pip install git+https://github.com/DeepLabCut/DeepLabCut.git@mwm/humanbody


**PLEASE, click "restart runtime" from the output above before proceeding!**

In [None]:
import os
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image

import deeplabcut
import deeplabcut.utils.auxiliaryfunctions as auxiliaryfunctions
from deeplabcut.pose_estimation_pytorch.apis import (
    superanimal_analyze_images,
)
from deeplabcut.modelzoo import build_weight_init
from deeplabcut.modelzoo.utils import (
    create_conversion_table,
    read_conversion_table_from_csv,
)
from deeplabcut.modelzoo.video_inference import video_inference_superanimal
from deeplabcut.utils.pseudo_label import keypoint_matching

## Zero-shot Image & Video Inference
SuperAnimal models are foundation animal pose models. They can be used for zero-shot predictions without further training on the data.
In this section, we show how to use SuperAnimal models to predict pose from images (given an image folder) and output the predicted images (with pose) into another destination folder.

### Zero-shot image inference

If you have a single image you want to test, upload it here!

#### Upload the images you want to predict

In [None]:
from google.colab import files

uploaded = files.upload()
for filepath, content in uploaded.items():
    print(f"User uploaded file '{filepath}' with length {len(content)} bytes")
image_path = os.path.abspath(filepath)
image_name = os.path.splitext(image_path)[0]

# If this cell fails (e.g., when using Safari in place of Google Chrome),
# manually upload your image via the Files menu to the left
# and define `image_path` yourself with right click > copy path on the image:
#
# image_path = "/path/to/my/image.png"
# image_name = os.path.splitext(image_path)[0]

#### Select a SuperAnimal name and corresponding model architecture

Check our docs on [SuperAnimals](https://github.com/DeepLabCut/DeepLabCut/blob/main/docs/ModelZoo.md) to learn more!

In [None]:
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_topviewmouse" #@param ["superanimal_topviewmouse", "superanimal_quadruped"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]

# @markdown ---
# @markdown What is the maximum number of animals you expect to have in an image
max_individuals = 3  # @param {type:"slider", min:1, max:30, step:1}
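
If you set these values by hand rather than through the form, a small sanity check can catch typos before any model is downloaded. The sketch below is an assumption-laden illustration: `check_selection` is a hypothetical helper, and the allowed values are simply copied from the form fields of this cell, not from an authoritative list of supported models.

```python
# Options copied from the form fields of the image-inference cell in this
# notebook. This is NOT an authoritative list of what DeepLabCut supports.
IMAGE_CELL_OPTIONS = {
    "superanimal_name": ["superanimal_topviewmouse", "superanimal_quadruped"],
    "model_name": ["hrnet_w32", "resnet_50"],
    "detector_name": [
        "fasterrcnn_resnet50_fpn_v2",
        "fasterrcnn_mobilenet_v3_large_fpn",
    ],
}


def check_selection(selection: dict, options: dict = IMAGE_CELL_OPTIONS) -> list[str]:
    """Return a list of problems with `selection`; an empty list means it looks fine.

    Hypothetical helper for illustration; not part of DeepLabCut.
    """
    problems = []
    for key, allowed in options.items():
        value = selection.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} is not one of {allowed}")
    # The slider in the form goes from 1 to 30.
    max_ind = selection.get("max_individuals")
    if not isinstance(max_ind, int) or not 1 <= max_ind <= 30:
        problems.append(f"max_individuals={max_ind!r} should be an integer from 1 to 30")
    return problems


example = {
    "superanimal_name": "superanimal_topviewmouse",
    "model_name": "hrnet_w32",
    "detector_name": "fasterrcnn_resnet50_fpn_v2",
    "max_individuals": 3,
}
for problem in check_selection(example):
    print("Check your configuration:", problem)
```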

In [None]:
# Note: max_individuals must be set correctly to get the right number of predictions in the image.
_ = superanimal_analyze_images(
    superanimal_name,
    model_name,
    detector_name,
    image_path,
    max_individuals,
    out_folder="/content/",
)
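
The call above discards its return value, so one way to see what was produced is to look at which files appeared in the output folder. The following is a minimal sketch using only the standard library; `files_modified_since` is a hypothetical helper, and no assumption is made about how DeepLabCut names its output files.

```python
import time
from pathlib import Path


def files_modified_since(folder, since: float) -> list[Path]:
    """Return files directly inside `folder` modified at or after `since` (epoch seconds).

    Hypothetical helper for illustration; returns [] if the folder does not exist.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.stat().st_mtime >= since
    )


# Usage sketch: record the time, run the inference cell, then list new files.
# start = time.time()
# ... run superanimal_analyze_images(...) with out_folder="/content/" ...
# for path in files_modified_since("/content/", start):
#     print(path.name)
```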

### Zero-shot Video Inference

This can be done with or without video adaptation. Skipping video adaptation is faster, but the model is then not fine-tuned on your data in a self-supervised way.

#### Upload a video you want to predict

In [18]:
from google.colab import files

uploaded = files.upload()
for filepath, content in uploaded.items():
    print(f"User uploaded file '{filepath}' with length {len(content)} bytes")
video_path = os.path.abspath(filepath)
video_name = os.path.splitext(video_path)[0]

# If this cell fails (e.g., when using Safari in place of Google Chrome),
# manually upload your video via the Files menu to the left
# and define `video_path` yourself with right click > copy path on the video.

Saving jasper.mov to jasper.mov
User uploaded file 'jasper.mov' with length 3471938 bytes


#### Choose the superanimal and the model name

In [21]:
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_quadruped" #@param ["superanimal_topviewmouse", "superanimal_quadruped", "superanimal_superbird", "superanimal_humanbody"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50", "rtmpose_x"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]

# @markdown ---
# @markdown What is the maximum number of animals you expect to have in an image
max_individuals = 3  # @param {type:"slider", min:1, max:30, step:1}

#### Zero-shot Video Inference without video adaptation

The labeled video (and pose predictions for the video) are saved in `"/content/"`, with the labeled video name being `{your_video_name}_superanimal_{superanimal_name}_hrnetw32_labeled.mp4`.
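
Once the inference cell below has finished, you may want to locate the labeled video programmatically rather than typing its name. The sketch below assumes only that the output file name starts with the stem of your video and ends in `_labeled.mp4`, as the naming pattern above suggests; `find_labeled_videos` is a hypothetical helper, not a DeepLabCut function.

```python
import glob
from pathlib import Path


def find_labeled_videos(video_path, dest_folder) -> list[Path]:
    """Return files in `dest_folder` named '<video stem>*_labeled.mp4'.

    Hypothetical helper for illustration; returns [] if nothing matches.
    """
    dest = Path(dest_folder)
    if not dest.is_dir():
        return []
    # Escape the stem so characters such as '[' in a file name are matched literally.
    stem = glob.escape(Path(video_path).stem)
    return sorted(dest.glob(f"{stem}*_labeled.mp4"))


# Usage sketch, after the inference cell has run:
# for path in find_labeled_videos(video_path, "/content/"):
#     print(path)
```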

In [22]:
_ = video_inference_superanimal(
    videos=video_path,
    superanimal_name=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    pcutoff=0.4,
    video_adapt=False,
    max_individuals=max_individuals,
    dest_folder="/content/",
)

Running video inference on /content/jasper.mov with superanimal_quadruped_hrnet_w32
Using pytorch for model hrnet_w32
Loading.... superanimal_quadruped_fasterrcnn_resnet50_fpn_v2


(…)_quadruped_fasterrcnn_resnet50_fpn_v2.pt:   0%|          | 0.00/173M [00:00<?, ?B/s]

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_v2_coco-dd69338a.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_v2_coco-dd69338a.pth
100%|██████████| 167M/167M [00:00<00:00, 178MB/s]


Processing video /content/jasper.mov
Starting to analyze /content/jasper.mov
Video metadata: 
  Overall # of frames:    108
  Duration of video [s]:  3.60
  fps:                    30.0
  resolution:             w=720, h=1280

Running detector with batch size 1


Detector:   0%|          | 0/108 [00:00<?, ?it/s]

DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:   2%|▏         | 2/108 [00:00<00:29,  3.57it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  74%|███████▍  | 80/108 [00:13<00:04,  5.99it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  76%|███████▌  | 82/108 [00:14<00:04,  5.98it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  78%|███████▊  | 84/108 [00:14<00:04,  5.98it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  80%|███████▉  | 86/108 [00:14<00:03,  5.97it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  81%|████████▏ | 88/108 [00:15<00:03,  5.97it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  83%|████████▎ | 90/108 [00:15<00:03,  5.93it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  85%|████████▌ | 92/108 [00:15<00:02,  5.93it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  87%|████████▋ | 94/108 [00:16<00:02,  5.88it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0


Detector:  88%|████████▊ | 95/108 [00:16<00:02,  5.72it/s]

DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  90%|████████▉ | 97/108 [00:16<00:01,  5.66it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format


Detector:  91%|█████████ | 98/108 [00:16<00:01,  5.64it/s]

DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  93%|█████████▎| 100/108 [00:17<00:01,  5.73it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0


Detector:  94%|█████████▎| 101/108 [00:17<00:01,  5.60it/s]

DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  95%|█████████▌| 103/108 [00:17<00:00,  5.61it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)


Detector:  96%|█████████▋| 104/108 [00:17<00:00,  5.59it/s]

DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector:  98%|█████████▊| 106/108 [00:18<00:00,  5.65it/s]

DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])
DEBUG: Raw predictions type: <class 'dict'>
DEBUG: Raw predictions keys: dict_keys([])
DEBUG: Unexpected output format
DEBUG: Device: cuda:0
DEBUG: Model threshold: 0.9
DEBUG: Processing image 0
DEBUG: Image size: (720, 1280)
DEBUG: Using transforms preprocessing
DEBUG: Batch tensor shape: torch.Size([3, 1280, 720])


Detector: 100%|██████████| 108/108 [00:18<00:00,  5.79it/s]


Running pose prediction with batch size 1


Pose: 100%|██████████| 108/108 [00:01<00:00, 76.68it/s]


Saving results to /content/


ValueError: No objects were detected in the video. This can happen if:
1. The video doesn't contain the type of objects the model was trained to detect
2. The objects are too small, blurry, or occluded
3. The detector confidence threshold is too high
4. The video quality is poor

Try:
- Using a different video with clearer objects
- Adjusting the detector confidence threshold
- Checking if the model is appropriate for your use case

#### Zero-shot Video Inference with video adaptation (unsupervised)

The labeled video (and pose predictions for the video) are saved in `"/content/"`, with the labeled video name being `{your_video_name}_superanimal_{superanimal_name}_hrnetw32_labeled_after_adapt.mp4`.

In [None]:
_ = video_inference_superanimal(
    videos=[video_path],
    superanimal_name=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    video_adapt=True,
    max_individuals=max_individuals,
    pseudo_threshold=0.1,
    bbox_threshold=0.9,
    detector_epochs=1,
    pose_epochs=1,
    dest_folder="/content/"
)

## Training with SuperAnimal

In this section, we compare different ways to train models in DeepLabCut 3.0, with or without using SuperAnimal-pretrained models.
You can compare the evaluation results and get a sense of each baseline. We have the following baselines:

- ImageNet transfer learning (training without SuperAnimal)
- SuperAnimal transfer learning (baseline 1)
- SuperAnimal naive fine-tuning (baseline 2)
- SuperAnimal memory-replay fine-tuning (baseline 3)

This is done on one of your DeepLabCut projects! If you don't have a DeepLabCut project that you can use SuperAnimal models with, you can always use the example openfield dataset [available in the DeepLabCut repository](https://github.com/DeepLabCut/DeepLabCut/tree/main/examples/openfield-Pranav-2018-10-30) or the Tri-Mouse dataset available on [Zenodo](https://zenodo.org/records/5851157).

### Preparing the DeepLabCut Project

First, place your DeepLabCut project folder into your Google Drive, i.e., move the folder named "Project-YourName-TheDate" into Google Drive.

In [None]:
# Now, let's link to your GoogleDrive. Run this cell and follow the
# authorization instructions:

from google.colab import drive
drive.mount('/content/drive')

You will need to edit the project path in the `config.yaml` file so that it points to the project folder in your Google Drive!

Typically, this will be in the format: `/content/drive/MyDrive/yourProjectFolderName`. You can obtain this path by going to the file navigator in the left pane, finding your DeepLabCut project folder, clicking on the vertical `...` next to the folder name and selecting "Copy path".

If the `drive` folder is not immediately visible after mounting the drive, refresh the available files!

In [None]:
# TODO: Update the `project_path` to be the path of your DeepLabCut project!
project_path = Path("/content/drive/MyDrive/my-project-2024-07-17")
config_path = str(project_path / "config.yaml")

Then, use the panel below to select the appropriate SuperAnimal model for your project (don't forget to run the cell)!

In [None]:
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_topviewmouse" #@param ["superanimal_topviewmouse", "superanimal_quadruped"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]

### Comparison between different training baselines


A data split is a unique combination of training images and testing images.
We create a data split named split 0. All baselines share this data split so that the comparison is fair.
- split 0 -> shared by all baselines
- shuffle 0 (split0) -> imagenet transfer learning
- shuffle 1 (split0) -> superanimal transfer learning
- shuffle 2 (split0) -> superanimal naive fine-tuning
- shuffle 3 (split0) -> superanimal memory-replay fine-tuning
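
To make the idea of a shared split concrete, here is a minimal sketch in plain Python. It is not how DeepLabCut stores splits internally; `make_split`, the image count, and the train fraction are hypothetical and only illustrate that every shuffle sees the same train/test indices.

```python
import random


def make_split(n_images, train_fraction, seed=0):
    """Return (train_indices, test_indices) for one data split (hypothetical helper)."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is reproducible
    n_train = round(n_images * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# "split 0" is created once ...
split_0 = make_split(n_images=20, train_fraction=0.8)

# ... and reused by every shuffle, so only the weight initialization differs.
shuffle_to_split = {shuffle: split_0 for shuffle in range(4)}

train_indices, test_indices = shuffle_to_split[3]
print(len(train_indices), "training images,", len(test_indices), "test images")
```

Because all four shuffles point at the same indices, differences in the evaluation results can be attributed to the training strategy rather than to which images ended up in the test set.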

### What is the difference between baselines?

**Transfer learning** For canonical task-agnostic transfer learning,
the encoder learns universal visual features from a large pre-training dataset, and a randomly
initialized decoder is used to learn the pose from the downstream dataset.

**Fine-tuning** For task-aware
fine-tuning, both encoder and decoder learn task-related visual-pose features
in the pre-training datasets, and the decoder is fine-tuned to update pose
priors in downstream datasets. Crucially, the network has pose-estimation-specific
weights.

**ImageNet transfer-learning** The encoder was pre-trained on ImageNet. The decoder is trained from scratch on the downstream task.

**SuperAnimal transfer-learning** The encoder was pre-trained first on ImageNet, then on the pose datasets we collected. The decoder is then trained from scratch on the downstream task.

**SuperAnimal naive fine-tuning** Both the encoder and the decoder were pre-trained on the pose datasets we collected. On downstream datasets, we only fine-tune the convolutional channels that correspond to the keypoints annotated in the downstream dataset. This introduces catastrophic forgetting of the keypoints that are not annotated in the downstream dataset.

**SuperAnimal memory-replay fine-tuning** If we fine-tune SuperAnimal models without further care, the models forget the keypoints that are not annotated in the downstream dataset. To mitigate this, we mix the annotations and zero-shot predictions of SuperAnimal models to create a dataset that 'replays' the memory of the SuperAnimal keypoints.




In [None]:
imagenet_transfer_learning_shuffle = 0
superanimal_transfer_learning_shuffle = 1
superanimal_naive_finetune_shuffle = 2
superanimal_memory_replay_shuffle = 3

In [None]:
deeplabcut.create_training_dataset(
    config_path,
    Shuffles=[imagenet_transfer_learning_shuffle],
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    engine=deeplabcut.Engine.PYTORCH,
    userfeedback=False,
)

### ImageNet transfer learning

Historically, transfer learning strategies using ImageNet weights assumed no “animal pose task priors” in the pretrained
model, a paradigm adopted from previous task-agnostic transfer learning.

You can change the number of epochs you want to train for. How long training will take depends on many parameters, including the number of images in your dataset, the resolution of the images, and the number of epochs you train for.

In [None]:
# Note that we skip the detector training to save time.
# For top-down models, evaluation uses ground-truth bounding boxes by default.
#  But to train a model that can be used to run inference on videos and images,
#  you have to set detector_epochs > 0.

deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=imagenet_transfer_learning_shuffle,
)

Now let's evaluate the performance of our trained models.

In [None]:
deeplabcut.evaluate_network(config_path, Shuffles=[imagenet_transfer_learning_shuffle])

### Transfer learning with SuperAnimal weights

First, we prepare a training shuffle for transfer learning with SuperAnimal weights. As we've already created a shuffle with a train/test split that we want to reuse, we use `deeplabcut.create_training_dataset_from_existing_split` to keep the same train/test indices as in the ImageNet transfer learning shuffle.

We specify that we want to initialize the model weights with the selected SuperAnimal model, but without keeping the decoding layers (this is called transfer learning)!



In [None]:
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=False,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_transfer_learning_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)

Then, we launch the training for transfer-learning with SuperAnimal weights.

In [None]:
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_transfer_learning_shuffle,
)

Finally, we evaluate the model obtained by transfer-learning with SuperAnimal weights.

In [None]:
deeplabcut.evaluate_network(config_path, Shuffles=[superanimal_transfer_learning_shuffle])

### Fine-tuning with SuperAnimal (without keeping full SuperAnimal keypoints)

#### Set up the weight init and dataset

First, we do keypoint matching. This step makes it possible to understand the correspondence between the existing annotations and the SuperAnimal annotations. It produces 3 outputs:
- The confusion matrix
- The conversion table
- Pseudo predictions over the whole dataset

#### What is keypoint matching?

Because SuperAnimal models have their own pre-defined keypoints, which may differ from your annotations, we proposed this algorithm to minimize the gap between the model and the dataset. We use our model to perform zero-shot inference on the whole dataset. This gives pairs of predictions and ground truth for every image. Then, we cast the matching between the model's predictions (2D coordinates)
and the ground truth as bipartite matching, using the Euclidean distance as the cost between pairs of keypoints. We then solve the matching using the Hungarian algorithm. Thus, for every image, we end up with a matching matrix where 1 counts for a match and 0 counts for a non-match. Because the model's predictions can be noisy from image to image, we average the aforementioned matching matrix across all the images and perform another bipartite matching, resulting in the final keypoint conversion table between the model and the dataset. Note that the quality of the matching will impact the performance
of the model, especially for zero-shot. In the case where, e.g., the annotation nose is mistakenly converted to keypoint tail and vice versa, the model will have to unlearn the channels that correspond to nose and tail (see also the case study in Mathis et al.).
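
The two-stage matching described above can be sketched on toy data as follows. This is an illustration, not the implementation behind `keypoint_matching`: the function names and coordinates are made up, missing annotations are not handled, and a brute-force search over assignments stands in for the Hungarian algorithm so that the example needs only the standard library (it is only practical for a handful of keypoints; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently).

```python
import math
from itertools import permutations


def match_one_image(predictions, ground_truth):
    """Assign each ground-truth keypoint to one predicted keypoint (minimum total distance)."""
    best, best_cost = None, math.inf
    for assignment in permutations(range(len(predictions)), len(ground_truth)):
        cost = sum(math.dist(ground_truth[j], predictions[i]) for j, i in enumerate(assignment))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best  # best[j] = index of the prediction matched to ground-truth keypoint j


def build_conversion_table(all_predictions, all_ground_truth, model_names, dataset_names):
    """Average the per-image matching matrices, then match once more on the average."""
    n_images = len(all_predictions)
    averaged = [[0.0] * len(model_names) for _ in dataset_names]
    for predictions, ground_truth in zip(all_predictions, all_ground_truth):
        for j, i in enumerate(match_one_image(predictions, ground_truth)):
            averaged[j][i] += 1.0 / n_images  # a match counts 1, a non-match 0
    best, best_score = None, -1.0
    for assignment in permutations(range(len(model_names)), len(dataset_names)):
        score = sum(averaged[j][i] for j, i in enumerate(assignment))
        if score > best_score:
            best, best_score = assignment, score
    return {dataset_names[j]: model_names[i] for j, i in enumerate(best)}


model_names = ["nose", "left_ear", "tail_base"]  # model keypoints (toy subset)
dataset_names = ["snout", "tailbase"]            # project keypoints
all_ground_truth = [
    [(10, 10), (50, 50)],
    [(30, 12), (70, 40)],
    [(5, 5), (40, 40)],
]
all_predictions = [
    [(11, 10), (20, 5), (49, 51)],
    [(29, 13), (36, 6), (71, 41)],
    [(39, 41), (12, 2), (6, 5)],  # noisy image: nose and tail_base are swapped
]

table = build_conversion_table(all_predictions, all_ground_truth, model_names, dataset_names)
print(table)
```

The third image on its own would map `snout` to `tail_base`, but averaging over all images outvotes it, which is the reason for the second matching step.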

In [None]:
keypoint_matching(
    config_path,
    superanimal_name,
    model_name,
    detector_name,
    copy_images=True,
)

conversion_table_path = project_path / "memory_replay" / "conversion_table.csv"
confusion_matrix_path = project_path / "memory_replay" / "confusion_matrix.png"

# You can visualize the pseudo predictions, or do pose embedding clustering etc.
pseudo_prediction_path = project_path / "memory_replay" / "pseudo_predictions.json"

#### Display the confusion matrix

The x axis lists the keypoints in the existing annotations. The y axis lists the keypoints in SuperAnimal keypoint space. Darker color encodes stronger correspondence between the human annotation and SuperAnimal annotations.

In [None]:
confusion_matrix_image = Image.open(confusion_matrix_path)

plt.imshow(confusion_matrix_image)
plt.axis('off')  # Hide the axes for better view
plt.show()

#### Display the conversion table
The `gt` column contains the keypoint names in the existing dataset. The `MasterName` column contains the corresponding keypoints in the SuperAnimal keypoint space.

In [None]:
df = pd.read_csv(conversion_table_path)
df = df.dropna()

df
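
If you prefer to inspect the table as a plain project-to-SuperAnimal mapping, a sketch along these lines may help. It assumes the CSV header is exactly `gt,MasterName` and that unmatched SuperAnimal keypoints appear as rows with an empty `gt` cell; the CSV text and `table_to_mapping` are hypothetical, so check them against your own `conversion_table.csv`.

```python
import csv
import io

# Hypothetical file content; replace with the text of your own conversion table.
example_csv = """gt,MasterName
snout,nose
leftear,left_ear
rightear,right_ear
tailbase,tail_base
,neck
"""


def table_to_mapping(csv_text):
    """Map project bodyparts to SuperAnimal bodyparts, skipping incomplete rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["gt"]: row["MasterName"]
        for row in reader
        if row["gt"] and row["MasterName"]  # same effect as dropna() above
    }


mapping = table_to_mapping(example_csv)
print(mapping)
```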

#### Adding the Conversion Table to your project's `config.yaml` file

Once you've run keypoint matching, you can add the conversion table to your project's `config.yaml` file, and edit it if there are some matches you think are wrong. As an example, for a top-view mouse dataset with 4 bodyparts labeled (`'snout', 'leftear', 'rightear', 'tailbase'`), the conversion table mapping project bodyparts to SuperAnimal bodyparts would be added as:

```yaml
# Conversion tables to fine-tune SuperAnimal weights
SuperAnimalConversionTables:
  superanimal_topviewmouse:
    snout: nose
    leftear: left_ear
    rightear: right_ear
    tailbase: tail_base
```


In [None]:
create_conversion_table(
    config=config_path,
    super_animal=superanimal_name,
    project_to_super_animal=read_conversion_table_from_csv(
        conversion_table_path
    ),
)

#### Prepare the training shuffle and weight initialization for (naive) fine-tuning with SuperAnimal weights

Then, when you call `build_weight_init` with `with_decoder=True`, the conversion table in your project's `config.yaml` is used to get predictions for the correct bodyparts.
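
Conceptually, the conversion table tells us which output channel of the pretrained decoder corresponds to each project bodypart. The sketch below illustrates that lookup only; `decoder_channels` is a hypothetical helper, the bodypart list is a made-up five-keypoint subset, and none of this reflects how `build_weight_init` is implemented.

```python
def decoder_channels(superanimal_bodyparts, conversion_table):
    """Return the decoder output channel index for each project bodypart."""
    index_of = {name: i for i, name in enumerate(superanimal_bodyparts)}
    unknown = [sa for sa in conversion_table.values() if sa not in index_of]
    if unknown:
        raise KeyError(f"Not SuperAnimal bodyparts: {unknown}")
    return {project: index_of[sa] for project, sa in conversion_table.items()}


# Illustrative subset of a SuperAnimal keypoint space (one output channel each).
superanimal_bodyparts = ["nose", "left_ear", "right_ear", "neck", "tail_base"]
conversion_table = {
    "snout": "nose",
    "leftear": "left_ear",
    "rightear": "right_ear",
    "tailbase": "tail_base",
}

channels = decoder_channels(superanimal_bodyparts, conversion_table)
print(channels)
```

Here `neck` has no project counterpart, so its channel receives no supervision during naive fine-tuning, which is where the forgetting described earlier comes from.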

In [None]:
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=True,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_naive_finetune_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)

#### Launch the training for (naive) fine-tuning with SuperAnimal

In [None]:
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_naive_finetune_shuffle,
)

#### Evaluate the model obtained by (naive) fine-tuning with SuperAnimal

In [None]:
deeplabcut.evaluate_network(
    config_path,
    Shuffles=[superanimal_naive_finetune_shuffle],
)

### Memory-replay fine-tuning with SuperAnimal (keeping full SuperAnimal keypoints)

**Catastrophic forgetting** describes a
classic problem in continual learning. Indeed, a model gradually loses
its ability to solve previous tasks after it learns to solve new ones.
Fine-tuning a SuperAnimal model falls into the category of continual
learning: the downstream dataset defines potentially different
keypoints than those learned by the models. Thus, the models might
forget the keypoints they learned and only pick up those defined in the
target dataset. Here, retraining with the original dataset and the new
one is not a feasible option, as datasets cannot be easily shared and
more computational resources would be required.
To counter that, we treat zero-shot inference of the model as a
memory buffer that stores knowledge from the original model. When
we fine-tune a SuperAnimal model, we replace the model-predicted
keypoints with the ground-truth annotations, resulting in hybrid
learning of old and new knowledge. The quality of the zero-shot predictions
can vary, and we use the confidence of prediction (0.7) as a
threshold to filter out low-confidence predictions. With the threshold
set to 1, memory-replay fine-tuning becomes naive fine-tuning.
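
The mixing of ground truth and zero-shot predictions can be sketched for a single image as follows. This is a simplified illustration, not DeepLabCut's implementation: `replay_targets`, the keypoint names, and the coordinates are hypothetical, and treating a confidence equal to the threshold as "kept" is an assumption.

```python
def replay_targets(pseudo_labels, ground_truth, conversion_table, threshold=0.7):
    """Build training targets in the SuperAnimal keypoint space for one image.

    pseudo_labels: SuperAnimal bodypart -> (x, y, confidence) from zero-shot inference
    ground_truth: project bodypart -> (x, y) from human annotation
    conversion_table: project bodypart -> SuperAnimal bodypart
    """
    targets = {}
    # Keep confident zero-shot predictions as the "memory" of the original model.
    for name, (x, y, confidence) in pseudo_labels.items():
        targets[name] = (x, y) if confidence >= threshold else None
    # Ground-truth annotations always replace the prediction for their keypoint.
    for project_name, (x, y) in ground_truth.items():
        targets[conversion_table[project_name]] = (x, y)
    return targets


pseudo_labels = {
    "nose": (10.5, 9.8, 0.95),
    "neck": (20.0, 15.0, 0.85),
    "left_hip": (30.0, 28.0, 0.30),
    "tail_base": (48.0, 52.0, 0.60),
}
ground_truth = {"snout": (10.0, 10.0), "tailbase": (50.0, 50.0)}
conversion_table = {"snout": "nose", "tailbase": "tail_base"}

targets = replay_targets(pseudo_labels, ground_truth, conversion_table)
print(targets)
```

In this toy example, `neck` is kept from the zero-shot prediction, `left_hip` is dropped as low confidence, and `nose` and `tail_base` take the annotated coordinates. Calling the function with `threshold=1.0` drops `neck` as well, leaving only the annotated keypoints, which corresponds to the naive fine-tuning case.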

#### Prepare training shuffle and weight initialization for memory-replay finetuning with SuperAnimal

In [None]:
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=True,
    memory_replay=True,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_memory_replay_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)

#### Launch the training for memory-replay fine-tuning with SuperAnimal

In [None]:
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_memory_replay_shuffle,
)

#### Evaluate the model obtained by memory-replay finetuning with SuperAnimal

In [None]:
deeplabcut.evaluate_network(config_path, Shuffles=[superanimal_memory_replay_shuffle])