# OpenPose PyTorch Implementation
**Notebook Authors:** 
- **Segato Pietro** (2122209)  
- **Vezzosi Giacomo** (2104369)  
- **Vitali Giovanni** (2119998)

# Introduction
This notebook provides an implementation of OpenPose using PyTorch. The goal is to perform human pose estimation using OpenPose architecture over VITONHD images. <br>
The code is adapted from the GitHub repository by Hzzone ([pytorch-openpose](https://github.com/Hzzone/pytorch-openpose)). It is based on CMU-Perceptual-Computing-Lab work ([CMU-OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose))

## Execution Environment  
This notebook is designed to be executed on **Google Colab** to ensure compatibility with the necessary dependencies and GPU acceleration. Before running the code, ensure that the runtime environment is set to **GPU** (Runtime → Change runtime type → GPU).

# Repository and dependecies

In [1]:
# Repository Cloning
!git clone https://github.com/Hzzone/pytorch-openpose.git
%cd pytorch-openpose

Cloning into 'pytorch-openpose'...
remote: Enumerating objects: 154, done.[K
remote: Counting objects: 100% (154/154), done.[K
remote: Compressing objects: 100% (85/85), done.[K
remote: Total 154 (delta 69), reused 152 (delta 67), pack-reused 0 (from 0)[K
Receiving objects: 100% (154/154), 20.18 MiB | 17.02 MiB/s, done.
Resolving deltas: 100% (69/69), done.
/content/pytorch-openpose


In [2]:
# Install dependencies
!pip install -r requirements.txt
!pip install torch torchvision numpy opencv-python matplotlib

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5

In [3]:
import os
import sys
import gdown
import cv2
import torch
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage.filters import gaussian_filter
import json
from src import util
from src.util import *
from src.body import Body
from src.hand import Hand
from google.colab import files
from src.model import handpose_model
import zipfile
import shutil
from skimage.metrics import structural_similarity as ssim

  from scipy.ndimage.filters import gaussian_filter


In [4]:
# Download weights from gdrive
folder_id = "1JsvI4M4ZTg98fmnCZLFM-3TeovnCRElG"

destination_folder = "model"
os.makedirs(destination_folder, exist_ok=True)

!gdown --folder --id {folder_id} -O {destination_folder}

Retrieving folder contents
Processing file 1j7yJMFsMR96EnkLhzbm_tB_rbdM26I8p body_pose_deploy.prototxt
Processing file 1EULkcH_hhSU28qVc1jSJpCh2hGOrzpjK body_pose_model.pth
Processing file 1XU6nNcnH5xmUFgnvYRYVxU65JtC90C1S body_pose.caffemodel
Processing file 1x1a7v7N_yH9k54as0eQfYfFllrmq2KJB hand_pose_deploy.prototxt
Processing file 1yVyIsOD32Mq28EHrVVlZbISDN7Icgaxw hand_pose_model.pth
Processing file 1HkHfIma4uLaYT8im1h7zp_x1vwb4Lzi8 hand_pose.caffemodel
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1j7yJMFsMR96EnkLhzbm_tB_rbdM26I8p
To: /content/pytorch-openpose/model/body_pose_deploy.prototxt
100% 46.4k/46.4k [00:00<00:00, 65.5MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1EULkcH_hhSU28qVc1jSJpCh2hGOrzpjK
From (redirected): https://drive.google.com/uc?id=1EULkcH_hhSU28qVc1jSJpCh2hGOrzpjK&confirm=t&uuid=95066a14-c1e6-4d42-840e-09026b362314
To: /content

In [5]:
%cd /content/pytorch-openpose

/content/pytorch-openpose


## Dataset
The dataset is crucial for training and evaluation. Here, we handle dataset loading, preprocessing, and any necessary transformations to ensure compatibility with the OpenPose model.


In [6]:
files.upload() #NOTE: a proprietary Kaggle API Key is to be uploaded here

!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d tinkukalluri/zalando-hd-resized

Saving kaggle.json to kaggle.json
Dataset URL: https://www.kaggle.com/datasets/tinkukalluri/zalando-hd-resized
License(s): MIT
Downloading zalando-hd-resized.zip to /content/pytorch-openpose
100% 4.54G/4.54G [04:05<00:00, 22.8MB/s]
100% 4.54G/4.54G [04:05<00:00, 19.8MB/s]


In [7]:
# Paths
zip_path = '/content/pytorch-openpose/zalando-hd-resized.zip'
images_dest = '/content/pytorch-openpose/datasets/VITONHD/image'
pose_dest = '/content/pytorch-openpose/datasets/VITONHD/openpose_img'
json_dest = '/content/pytorch-openpose/datasets/VITONHD/openpose_json'

os.makedirs(images_dest, exist_ok=True)
os.makedirs(pose_dest, exist_ok=True)
os.makedirs(json_dest, exist_ok=True)

# Open zip file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    for file in zip_ref.namelist():
        # Getting images
        if file.startswith('test/image/') and (file.endswith('.jpg') or file.endswith('.png')):
            dest_file = os.path.join(images_dest, os.path.basename(file))
            with zip_ref.open(file) as source, open(dest_file, 'wb') as target:
                shutil.copyfileobj(source, target)

        # getting original pose image files
        elif file.startswith('test/openpose_img/') and (file.endswith('.jpg') or file.endswith('.png')):
            dest_file = os.path.join(pose_dest, os.path.basename(file))
            with zip_ref.open(file) as source, open(dest_file, 'wb') as target:
                shutil.copyfileobj(source, target)

        # getting original JSON files
        elif file.startswith('test/openpose_json/') and file.endswith('.json'):
            # Costruisci il percorso di destinazione
            dest_file = os.path.join(json_dest, os.path.basename(file))
            with zip_ref.open(file) as source, open(dest_file, 'wb') as target:
                shutil.copyfileobj(source, target)

print("Images ok")
print("Pose imgs ok")
print("json files ok")

Images ok
Pose imgs ok
json files ok


## Utilities
This section includes helper functions for data handling, visualization, and model utilities. With this function we modify and provide essential tools for preprocessing and postprocessing. In particular, we edited or designed functions to compute: bounding boxes and compute OpenPose model for hands is defined, json convertion function to write the final JSON file, and functions to draw the pose including hands. 

In [8]:
def get_bounding_boxes(candidate, subset, oriImg):
  ratioWristElbow = 0.33
  detect_result = []
  image_height = oriImg.shape[0]
  image_width = oriImg.shape[1]
  for person in subset.astype(int):
      has_left = np.sum(person[[5, 6, 7]] == -1) == 0
      has_right = np.sum(person[[2, 3, 4]] == -1) == 0
      if not (has_left or has_right):
          continue
      # if any of three not detected
      if has_left:
          left_shoulder_index, left_elbow_index, left_wrist_index = person[[5, 6, 7]]
          # pos_hand = pos_wrist + ratio * (pos_wrist - pos_elbox) = (1 + ratio) * pos_wrist - ratio * pos_elbox
          # handRectangle.x = posePtr[wrist*3] + ratioWristElbow * (posePtr[wrist*3] - posePtr[elbow*3]);
          # handRectangle.y = posePtr[wrist*3+1] + ratioWristElbow * (posePtr[wrist*3+1] - posePtr[elbow*3+1]);
          # const auto distanceWristElbow = getDistance(poseKeypoints, person, wrist, elbow);
          # const auto distanceElbowShoulder = getDistance(poseKeypoints, person, elbow, shoulder);
          # handRectangle.width = 1.5f * fastMax(distanceWristElbow, 0.9f * distanceElbowShoulder);
          x1, y1 = candidate[left_shoulder_index][:2]
          x2, y2 = candidate[left_elbow_index][:2]
          x3, y3 = candidate[left_wrist_index][:2]

          # cv2.line(canvas, (int(x2), int(y2)), (int(x3), int(y3)), colors[0], thickness=4)
          # cv2.line(canvas, (int(x2), int(y2)), (int(x1), int(y1)), colors[1], thickness=4)

          # cv2.circle(canvas, (int(x1), int(y1)), 4, colors[2], thickness=-1)
          # cv2.circle(canvas, (int(x2), int(y2)), 4, colors[3], thickness=-1)
          # cv2.circle(canvas, (int(x3), int(y3)), 4, colors[4], thickness=-1)

          x = x3 + ratioWristElbow*(x3-x2)
          y = y3 + ratioWristElbow*(y3-y2)
          distanceWristElbow = math.sqrt((x3-x2)**2+(y3-y2)**2)
          distanceElbowShoulder = math.sqrt((x2-x1)**2+(y2-y1)**2)
          width = 1.5*max(distanceWristElbow, 0.9*distanceElbowShoulder)
          # x-y refers to the center --> offset to topLeft point
          # handRectangle.x -= handRectangle.width / 2.f;
          # handRectangle.y -= handRectangle.height / 2.f;
          x -= width/2
          y -= width/2 # width = height
          is_left = True
          if x<0: x=0
          if y<0: y=0
          width1 = width
          width2 = width
          if x+width>image_width: width1=image_width-x
          if y+width>image_height: width2=image_height-y
          width = min(width1, width2)
          # detect_result.append([int(x), int(y), int(width), is_left])
          detect_result.append([int(x), int(y), int(x+width), int(y+width)])

      # right hand
      if has_right:
          right_shoulder_index, right_elbow_index, right_wrist_index = person[[2, 3, 4]]
          x1, y1 = candidate[right_shoulder_index][:2]
          x2, y2 = candidate[right_elbow_index][:2]
          x3, y3 = candidate[right_wrist_index][:2]

          # cv2.line(canvas, (int(x2), int(y2)), (int(x3), int(y3)), colors[2], thickness=4)
          # cv2.line(canvas, (int(x2), int(y2)), (int(x1), int(y1)), colors[3], thickness=4)

          # cv2.circle(canvas, (int(x1), int(y1)), 4, colors[2], thickness=-1)
          # cv2.circle(canvas, (int(x2), int(y2)), 4, colors[3], thickness=-1)
          # cv2.circle(canvas, (int(x3), int(y3)), 4, colors[4], thickness=-1)

          x = x3 + ratioWristElbow*(x3-x2)
          y = y3 + ratioWristElbow*(y3-y2)
          distanceWristElbow = math.sqrt((x3-x2)**2+(y3-y2)**2)
          distanceElbowShoulder = math.sqrt((x2-x1)**2+(y2-y1)**2)
          width = 1.5*max(distanceWristElbow, 0.9*distanceElbowShoulder)
          x -= width/2
          y -= width/2 # width = height
          is_left = False
          if x<0: x=0
          if y<0: y=0
          width1 = width
          width2 = width
          if x+width>image_width: width1=image_width-x
          if y+width>image_height: width2=image_height-y
          width = min(width1, width2)
          # detect_result.append([int(x), int(y), int(width), is_left])
          detect_result.append([int(x), int(y), int(x+width), int(y+width)])

  return detect_result


In [9]:
def process_hand(image, bbox):
    x1, y1, x2, y2 = bbox
    hand_img = image[y1:y2, x1:x2].copy()

    # Preprocessing
    scale_search = [0.5, 1.0, 1.5, 2.0]
    boxsize = 368
    stride = 8
    padValue = 128
    thre = 0.05
    multiplier = [x * boxsize / hand_img.shape[0] for x in scale_search]
    heatmap_avg = np.zeros((hand_img.shape[0], hand_img.shape[1], 22))

    for scale in multiplier:
        img_resized = cv2.resize(hand_img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
        img_padded, pad = util.padRightDownCorner(img_resized, stride, padValue)

        img_input = np.transpose(np.float32(img_padded[:, :, :, np.newaxis]), (3, 2, 0, 1)) / 256 - 0.5
        img_input = np.ascontiguousarray(img_input)

        data = torch.from_numpy(img_input).float()
        if torch.cuda.is_available():
            data = data.cuda()
            model.cuda()

        with torch.no_grad():
            output = model(data).cpu().numpy()

        heatmap = np.transpose(np.squeeze(output), (1, 2, 0))
        heatmap = cv2.resize(heatmap, (hand_img.shape[1], hand_img.shape[0]), interpolation=cv2.INTER_CUBIC)
        heatmap_avg += heatmap / len(multiplier)

    # Estrazione dei keypoints
    keypoints = []
    for i in range(heatmap_avg.shape[2]):
        map_ori = heatmap_avg[:, :, i]
        map_smoothed = gaussian_filter(map_ori, sigma=3)
        y, x = np.unravel_index(np.argmax(map_smoothed), map_smoothed.shape)
        confidence = map_smoothed[y, x]
        if confidence > thre:
            keypoints.append((x + x1, y + y1, confidence))  # Riporta alle coordinate originali
    return keypoints

In [10]:
def convert_to_json(candidate, subset, all_hand_keypoints, output_file):
    people = []

    for person in subset:
        pose_keypoints_2d = []

        # Body keypoints extraction
        for i in range(18):  # OpenPose with 18 keypoints
            index = int(person[i])
            if index == -1:
                pose_keypoints_2d.extend([0, 0, 0])  # placeholders
            else:
                x, y, score = candidate[index][:3]
                pose_keypoints_2d.extend([x, y, score])

        # Compatibility modification
        right_hip_idx = (24, 25, 26)
        left_hip_idx = (33, 34, 35)
        if pose_keypoints_2d[right_hip_idx[0]]!=0 or pose_keypoints_2d[right_hip_idx[1]]!=0:
            if pose_keypoints_2d[left_hip_idx[0]]!=0 and pose_keypoints_2d[left_hip_idx[1]]!=0:
                mid_hip_x = (pose_keypoints_2d[right_hip_idx[0]] + pose_keypoints_2d[left_hip_idx[0]]) / 2
                mid_hip_y = (pose_keypoints_2d[right_hip_idx[1]] + pose_keypoints_2d[left_hip_idx[1]]) / 2
                mid_hip_conf = (pose_keypoints_2d[right_hip_idx[2]] + pose_keypoints_2d[left_hip_idx[2]]) / 2
                pose_keypoints_2d.insert(24, mid_hip_x)
                pose_keypoints_2d.insert(25, mid_hip_y)
                pose_keypoints_2d.insert(26, mid_hip_conf)

        for i in range(6):
          pose_keypoints_2d.extend([0, 0, 0])

        hand_left_keypoints_2d = []
        hand_right_keypoints_2d = []

        if all_hand_keypoints is not None:
            # Hand keypoints check
            for idx, hand in enumerate(all_hand_keypoints):
              if idx==0 and len(hand)>0:
                hand_left_keypoints_2d = [[int(x), int(y), float(conf)] for x, y, conf in hand]
                if person[7]!=-1: #if left wrist is present
                    hand_left_keypoints_2d[0] = [pose_keypoints_2d[21], pose_keypoints_2d[22], pose_keypoints_2d[23]]
              else:
                if len(hand)>0:
                  hand_right_keypoints_2d = [[int(x), int(y), float(conf)] for x, y, conf in hand]
                  if person[4]!=-1: #if right wrist is present
                      hand_right_keypoints_2d[0] = [pose_keypoints_2d[12], pose_keypoints_2d[13], pose_keypoints_2d[14]]


        # gathering data
        person_data = {
            "person_id": [-1],  # Placeholder
            "pose_keypoints_2d": pose_keypoints_2d,
            "face_keypoints_2d": [],  # Empty
            "hand_left_keypoints_2d": hand_left_keypoints_2d,
            "hand_right_keypoints_2d": hand_right_keypoints_2d,
        }

        people.append(person_data)

    data = {
        "version": 1.3,
        "people": people
    }

    with open(output_file, "w") as f:
        json.dump(data, f, indent=4)

In [11]:
def extract_keypoints(data):
    person = data["people"][0]  # We assume just one person as VITONHD dataset
    pose_keypoints = np.array(person["pose_keypoints_2d"]).reshape(-1, 3)  # (x, y, conf)
    left_hand_keypoints = np.array(person["hand_left_keypoints_2d"])
    right_hand_keypoints = np.array(person["hand_right_keypoints_2d"])

    return pose_keypoints, left_hand_keypoints, right_hand_keypoints

In [12]:
limbSeq_body25 = [
    [1, 8], [1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7],
    [8, 9], [9, 10], [10, 11], [8, 12], [12, 13], [13, 14],
    [1, 0], [0, 15], [15, 17], [0, 16], [16, 18],
    [11, 24], [14, 21], [8, 24], [8, 21]
]

limbSeq_hand = [
    [0, 1], [1, 2], [2, 3], [3, 4], # Thumb
    [0, 5], [5, 6], [6, 7], [7, 8], # Index
    [0, 9], [9, 10], [10, 11], [11, 12], # Middle
    [0, 13], [13, 14], [14, 15], [15, 16], # Ring
    [0, 17], [17, 18], [18, 19], [19, 20] # Pinky
]

In [13]:
custom_colors = {
    0: (0, 0, 255),      # Corpo, connessione [1,8] (red)
    1: (0, 140, 255),    # Corpo, connessione [1,2] (darkorange)
    2: (0, 255, 127),    # Corpo, connessione [1,5] (chartreuse)
    3: (0, 165, 255),    # Corpo, connessione [2,3] (light orange)
    4: (0, 255, 255),    # Corpo, connessione [3,4] (yellow)
    5: (0, 255, 0),      # Corpo, connessione [5,6] (lime)
    6: (0, 255, 0),      # Corpo, connessione [6,7] (lime)
    7: (0, 255, 0),      # Corpo, connessione [8,9] (lime)
    8: (113, 179, 60),   # Corpo, connessione [9,10] (mediumspringgreen)
    9: (204, 209, 72),   # Corpo, connessione [10,11] (mediumturquoise)
    10: (255, 144, 30),  # Corpo, connessione [8,12] (dodgerblue)
    11: (205, 0, 0),     # Corpo, connessione [12,13] (mediumblue)
    12: (255, 0, 0),     # Corpo, connessione [13,14] (blue)
    13: (60, 60, 220),   # Corpo, connessione [1,0] (crimson)
    14: (147, 20, 255),  # Corpo, connessione [0,15] (deeppink)
    15: (255, 0, 255),   # Corpo, connessione [15,17] (magenta)
    16: (204, 50, 153),  # Corpo, connessione [0,16] (darkorchid)
    17: (255, 0, 0),     # Corpo, connessione [16,18] (blue)
    18: (204, 209, 72),  # Corpo, connessione [11,24] (mediumturquoise)
    19: (255, 0, 0),     # Corpo, connessione [14,21] (blue)
    20: (255, 144, 30),  # Corpo, connessione [8,24] (dodgerblue)
    21: (255, 144, 30),  # Corpo, connessione [8,21] (dodgerblue)
}

In [14]:
def draw_pose_ellipses(img, keypoints, limbSeq, is_hand=False, alpha_keypoints=0.6, alpha_hands=0.6, thickness=-1):

    num_edges = len(limbSeq)
    overlay_hands = img.copy()  # Hand Overlay
    overlay_keypoints = img.copy()  # Keypoints overlay

    # keypoint colour dictionary
    keypoint_colors = {}

    for i, (start, end) in enumerate(limbSeq):
      if (len(keypoints)-1)>=start and (len(keypoints)-1)>=end:
        if keypoints[start][2] > 0.1 and keypoints[end][2] > 0.1:
            x1, y1 = int(keypoints[start][0]), int(keypoints[start][1])
            x2, y2 = int(keypoints[end][0]), int(keypoints[end][1])

            center = ((x1 + x2) // 2, (y1 + y2) // 2)
            length = int(np.hypot(x2 - x1, y2 - y1) / 2)
            angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))

            color = custom_colors.get(i, (255, 255, 255))

            # Ellipses
            axes = (length, 5)
            if is_hand:
                cv2.ellipse(overlay_hands, center, axes, angle, 0, 360, color, thickness)
            else:
                cv2.ellipse(img, center, axes, angle, 0, 360, color, thickness)

            # keypoints colour start and end
            keypoint_colors[start] = color
            keypoint_colors[end] = color

    # Draw keypoints
    for i, (x, y, conf) in enumerate(keypoints):
        if conf > 0.1 and i in keypoint_colors:
            cv2.circle(overlay_keypoints, (int(x), int(y)), 6, keypoint_colors[i], -1)

    # Keypoints alpha
    cv2.addWeighted(overlay_keypoints, alpha_keypoints, img, 1 - alpha_keypoints, 0, img)

    # Alpha for hands, not body
    if is_hand:
        cv2.addWeighted(overlay_hands, alpha_hands, img, 1 - alpha_hands, 0, img)

    return img

In [15]:
def plot_pose(json_path):
    with open(json_path, "r") as f:
        data = json.load(f)

    img = np.zeros((1024, 768, 3), dtype=np.uint8)  # Sfondo nero
    pose_kp, left_hand_kp, right_hand_kp = extract_keypoints(data)

    # Corpo (senza trasparenza nei segmenti)
    draw_pose_ellipses(img, pose_kp, limbSeq_body25, alpha_keypoints=0.6, alpha_hands=0.6, thickness=-1)

    # Mani (con trasparenza nei segmenti)
    if len(left_hand_kp) > 0:
        draw_pose_ellipses(img, left_hand_kp, limbSeq_hand, is_hand=True, alpha_keypoints=0.6, alpha_hands=0.4, thickness=-1)
    if len(right_hand_kp) > 0:
        draw_pose_ellipses(img, right_hand_kp, limbSeq_hand, is_hand=True, alpha_keypoints=0.6, alpha_hands=0.4, thickness=-1)

    cv2_imshow(img)

# Pose Estimation

In [16]:
# Body model
body_estimation = Body('model/body_pose_model.pth')
# Hands model
model = handpose_model()
model_dict = torch.load('model/hand_pose_model.pth')
model.load_state_dict(util.transfer(model, model_dict))
model.eval()

  model_dict = util.transfer(self.model, torch.load(model_path))
  model_dict = torch.load('model/hand_pose_model.pth')


handpose_model(
  (model1_0): Sequential(
    (conv1_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv1_1): ReLU(inplace=True)
    (conv1_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv1_2): ReLU(inplace=True)
    (pool1_stage1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv2_1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv2_1): ReLU(inplace=True)
    (conv2_2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv2_2): ReLU(inplace=True)
    (pool2_stage1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3_1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv3_1): ReLU(inplace=True)
    (conv3_2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu_conv3_2): ReLU(inplace=True)
    (conv3_3): Conv2d(256, 256, kernel_siz

In [17]:
dest_path = "results/openpose_json"
os.makedirs(dest_path, exist_ok=True)

In [18]:
def pose_estimation(img_path, dest_path, indices=None):

    if not os.path.exists(dest_path):
        os.makedirs(dest_path)

    img_files = sorted(os.listdir(img_path))
    img_files = [f for f in img_files if f.lower().endswith(('png', 'jpg', 'jpeg'))]

    if indices is not None:
        img_files = [img_files[i] for i in indices if 0 <= i < len(img_files)]

    for img_name in img_files:
        img_file = os.path.join(img_path, img_name)
        oriImg = cv2.imread(img_file)
        if oriImg is None:
            print(f"Error loading {img_file}")
            continue

        test_image = cv2.cvtColor(oriImg, cv2.COLOR_BGR2RGB)
        candidate, subset = body_estimation(test_image)

        thresh = 0.6
        if subset[0][4] != -1:
          if candidate[int(subset[0][4])][2] < thresh:
            subset[0][4] = -1

        if subset[0][7] != -1:
          if candidate[int(subset[0][7])][2] < thresh:
            subset[0][7] = -1

        hand_bboxes = get_bounding_boxes(candidate, subset, oriImg)
        all_hand_keypoints = []
        for bbox in hand_bboxes:
            keypoints = process_hand(oriImg, bbox)
            all_hand_keypoints.append(keypoints)

        output_file = os.path.join(dest_path, f"{os.path.splitext(img_name)[0]}_keypoints.json")
        convert_to_json(candidate, subset, all_hand_keypoints, output_file)
        print(f"Saved: {output_file}")

In [19]:
pose_estimation("datasets/VITONHD/image", dest_path, [3])

Saved: results/openpose_json/00017_00_keypoints.json


In [20]:
dest_path_img = "results/openpose_img"
os.makedirs(dest_path_img, exist_ok=True)

In [21]:
def render_poses(json_path, dest_path):

    json_files = [f for f in os.listdir(json_path) if f.endswith('.json')]

    for json_name in json_files:
        json_path_ = os.path.join(json_path, json_name)
        with open(json_path_, "r") as f:
            data = json.load(f)

        img = np.zeros((1024, 768, 3), dtype=np.uint8)  # Black BG
        pose_kp, left_hand_kp, right_hand_kp = extract_keypoints(data)

        draw_pose_ellipses(img, pose_kp, limbSeq_body25, alpha_keypoints=0.6, alpha_hands=0.6, thickness=-1)

        if len(left_hand_kp) > 0:
            draw_pose_ellipses(img, left_hand_kp, limbSeq_hand, is_hand=True, alpha_keypoints=0.6, alpha_hands=0.4, thickness=-1)
        if len(right_hand_kp) > 0:
            draw_pose_ellipses(img, right_hand_kp, limbSeq_hand, is_hand=True, alpha_keypoints=0.6, alpha_hands=0.4, thickness=-1)

        output_img_path = os.path.join(dest_path, f"{os.path.splitext(json_name)[0].replace('_keypoints', '')}_rendered.png")
        cv2.imwrite(output_img_path, img)
        print(f"Salvato: {output_img_path}")

In [22]:
render_poses("results/openpose_json", dest_path_img)

Salvato: results/openpose_img/00017_00_rendered.png


## Metrics
To evaluate the model’s performance, we compute several metrics that quantify the quality of pose estimation. <br><br>
Key Evaluation Metrics:
- **(relative) Percentage of Correct Keypoints (PCK)**: Measures the accuracy of keypoint localization within a given threshold with respect to VITONHD keypoints.
- **(relative) Mean Per Joint Position Error (MPJPE)**:  It measures the average distance between the predicted joints of a human skeleton and the "ground truth" computed keypoints of VITONHD.
- **Structural Similarity Index Measure (SSIM)**.

These metrics help assess how well the model generalizes to different poses.


In [23]:
def compute_mpjpe(kp1, kp2):
    kp1, kp2 = np.array(kp1), np.array(kp2)
    return np.mean(np.linalg.norm(kp1 - kp2, axis=1))

def compute_pck(kp1, kp2, threshold=0.1):
    kp1, kp2 = np.array(kp1), np.array(kp2)
    distances = np.linalg.norm(kp1 - kp2, axis=1)
    torso_diameter = np.linalg.norm(kp1[1] - kp1[8])
    return np.mean(distances < threshold * torso_diameter)

def compute_ssi(img1_path, img2_path):
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
    return ssim(img1, img2)

In [24]:
original_json_path = "datasets/VITONHD/openpose_json"
generated_json_path = "results/openpose_json"
original_img_path = "datasets/VITONHD/openpose_img"
generated_img_path = "results/openpose_img"

In [25]:
def evaluate_pose_metrics(original_json_path, generated_json_path, original_img_path, generated_img_path, indices=[i for i in range(100)]):

    json_files = sorted(os.listdir(original_json_path))
    img_files = sorted(os.listdir(original_img_path))

    if indices is not None:
        json_files = [json_files[i] for i in indices if 0 <= i < len(json_files)]
        img_files = [img_files[i] for i in indices if 0 <= i < len(img_files)]

    mpjpe_scores = []
    pck_scores = []
    ssi_scores = []

    for json_name, img_name in zip(json_files, img_files):
        original_json = os.path.join(original_json_path, json_name)
        generated_json = os.path.join(generated_json_path, json_name)
        original_img = os.path.join(original_img_path, img_name)
        generated_img = os.path.join(generated_img_path, img_name)

        with open(original_json, "r") as f:
            original_data = json.load(f)
        with open(generated_json, "r") as f:
            generated_data = json.load(f)

        original_kp, _, _ = extract_keypoints(original_data)
        generated_kp, _, _ = extract_keypoints(generated_data)

        mpjpe_scores.append(compute_mpjpe(original_kp, generated_kp))
        pck_scores.append(compute_pck(original_kp, generated_kp))
        ssi_scores.append(compute_ssi(original_img, generated_img))

    return {
        "MPJPE": np.mean(mpjpe_scores),
        "PCK": np.mean(pck_scores),
        "SSI": np.mean(ssi_scores)
    }


In [None]:
evaluate_pose_metrics(original_json_path, generated_json_path, original_img_path, generated_img_path)

{'MPJPE': 37.555420009904005,
 'PCK': 0.9455999999999999,
 'SSI': 0.933126973148265}