# POSEIDON: Pose Estimation & Activity Recognition using GNNs

Team Members (Group 16): 
1. Chong Jun Rong Brian (A0290882U)
2. Parashara Ramesh (A0285647M)
3. Ng Wei Jie Brandon (A0184893L)

<h2><u> Table of contents </u></h2>

1. What is this project about?
<br> 1.1. Project Motivation
<br> 1.2. Project Description
<br> 1.3. Project Setup   
2. Understanding the Human 3.6M Dataset
3. Dataset preparation
4. Models
5. Baseline 1 - SimplePose (Simple ML model without using GNNs)
6. Baseline 2 - SimplePoseGNN (Simple ML model using GNNs) 
7. Improvement 1 - SemGCN model (Reimplementation of Semantic GCN)
8. Improvement 2 - PoseGCN model (Tweaks of SemGCN)
9. Evaluation & Analysis of models
10. Creating our own custom dataset
11. Evaluation on custom dataset
12. Conclusion
13. Video presentation & Resources


<h2><u>1. What is this project about?</u></h2>
<h3><u>1.1 Project Motivation</u></h3>

Accurately predicting 3D human poses from 2D keypoints is a critical task for many applications such as motion capture and activity recognition. Traditional methods that use direct regression or lifting techniques often struggle to fully capture the complex spatial relationships between body joints. By treating the 2D pose keypoints as graphs, we can leverage the underlying connectivity between joints to improve the 3D pose estimation. Additionally, recognizing and classifying human activities from these poses is an essential task in fields like surveillance and healthcare. Therefore, this project seeks to explore how GNNs can enhance 3D pose estimation and activity recognition.

<h3><u>1.2 Project Description</h3></u>

The primary objective of this project is to predict 3D human poses from 2D pose keypoints accurately using GNNs. 
* Firstly, we will develop two baseline models: one using standard Neural Network (NN) & Convolutional Neural Network (CNN) followed by a simple GNN based model both for 3d pose estimation 
* Secondly, we will reimplement the SemGCN model, which treats the body joints of a 2D pose as nodes in a graph, with edges representing the connectivity between them. 
* Finally, we will design an improved version of the SemGCN model by exploring different GNN architectures and modifications to enhance its performance.

The secondary objective is to classify human activities based on 2D pose keypoints. We will use custom datasets to validate this task, allowing us to assess the generalization capabilities of GNN-based models for activity recognition.

<h3><u>1.3 Project Setup</u></h3>

1. Install the dependencies from requirements.txt (TODO.all to fix later)


In [6]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import os
import numpy as np
from tqdm import tqdm

<h2><u>2. Understanding the Human 3.6M Dataset</u></h2>

TODO.brandon :
- Add a small brief about the human 3.6M dataset + how we plan to use it
- add h3 tags for each subsection + make changes in table of contents + only present the story points here, main code can go to appropriate folders

EDA points will also come here as individual cell blocks but will be called as one function directly in the dataset preparation

<h2><u>3. Dataset preparation </u></h2>

TODO.parash: create it once, and then mention that the code blocks under this section need not be run as they are present in this (drive/sharepoint folder)?

<h2><u>4. Models</u></h2>

TODO.all - write a convincing story on our approach + high level thoughts on why the following models are worth building and what we hope to gain from it

In [7]:
# Simple Model based on A Simple yet effective baseline for 3D Pose Estimation
class LinearBaselineModel(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.input_linear = nn.Linear(2, 1024) # 2D input shape is B x 16 x 2
        self.block1 = nn.Sequential(
            nn.Linear(1024, 1024),
            nn.BatchNorm2d(1),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, 1024),
            nn.BatchNorm2d(1),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.block2 = nn.Sequential(
            nn.Linear(1024, 1024),
            nn.BatchNorm2d(1),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, 1024),
            nn.BatchNorm2d(1),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.output_linear = nn.Linear(1024, 3) # 3D output shape is B x 16 x 3
        
    def forward(self, x):
        x = self.input_linear(x)
        x = self.block1(x) + x # First Residual connection
        x = self.block2(x) + x # Second Residual Connection
        x = self.output_linear(x)
        return x
        

<h2><u>5. Baseline 1 - SimplePose (Simple ML model without using GNNs)</u></h2>

TODO.brian

In [8]:
# Dataset Class
class Human36MDataset(Dataset):
    def __init__(self, two_d_dataset_path, three_d_dataset_path, label_dataset_path):
        self.two_d_dataset_path = two_d_dataset_path
        self.three_d_dataset_path = three_d_dataset_path
        self.label_dataset_path = label_dataset_path
        self.input_data = np.load(self.two_d_dataset_path)
        self.output_data = np.load(self.three_d_dataset_path)
        self.labels = np.load(self.label_dataset_path)
        unique_labels, tags = np.unique(self.labels, return_inverse=True)
        self.labels = tags
        self.labels_map = dict(zip(range(len(unique_labels)),unique_labels))
        assert len(self.input_data) == len(self.labels) == len(self.output_data)
    
    def get_labels_map(self):
        return self.labels_map
    
    def __len__(self):
        return len(self.input_data)
    
    def __getitem__(self, index):
        return np.expand_dims(self.input_data[index], axis=0), np.expand_dims(self.output_data[index], axis=0), self.labels[index]

In [9]:

# Parameters
LEARNING_RATE = 1e-3
BATCH_SIZE = 64
NUM_EPOCHS = 200
DEVICE = 'cuda' if torch.cuda.is_available() else ('mps' if torch.backends.mps.is_available() else 'cpu') 

training_2d_dataset_path = os.path.join('datasets', 'h36m', 'Processed', 'train_2d_poses.npy')
training_3d_dataset_path = os.path.join('datasets', 'h36m', 'Processed', 'train_3d_poses.npy')
training_label_path  = os.path.join('datasets', 'h36m', 'Processed', 'train_actions.npy')
training_data = Human36MDataset(training_2d_dataset_path, training_3d_dataset_path, training_label_path)
train_dataloader = DataLoader(training_data, batch_size=BATCH_SIZE)
testing_2d_dataset_path = os.path.join('datasets', 'h36m', 'Processed', 'test_2d_poses.npy')
testing_3d_dataset_path = os.path.join('datasets', 'h36m', 'Processed', 'test_3d_poses.npy')
testing_label_path  = os.path.join('datasets', 'h36m', 'Processed', 'test_actions.npy')
testing_data = Human36MDataset(testing_2d_dataset_path, testing_3d_dataset_path, testing_label_path)
test_dataloader = DataLoader(testing_data, batch_size=BATCH_SIZE)

def kaiming_weights_init(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight)

# Declare Model
model = LinearBaselineModel().to(DEVICE)
# Apply Kaiming Init on Linear Layers
model.apply(kaiming_weights_init)

print(f"Model Parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")

# Declare Optimizer
optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)

# Declare Scheduler
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96) # Value used by the original authors

# Loss Function
loss_fn = nn.MSELoss()

# Training Loop
global_losses = []
for epoch in tqdm(range(NUM_EPOCHS)):
    local_losses = []
    for data in tqdm(train_dataloader):
        two_d_input_data, three_d_output_data, _= data
        two_d_input_data = two_d_input_data.to(DEVICE)
        three_d_output_data = three_d_output_data.to(DEVICE)
        optimizer.zero_grad()
        predicted_3d_outputs = model(two_d_input_data)
        loss = loss_fn(predicted_3d_outputs, three_d_output_data)
        local_losses.append(loss)
        loss.backward()
        optimizer.step()
        scheduler.step()
    
    current_loss = sum(local_losses) / len(train_dataloader)
    global_losses.append(current_loss)
    print(f"Loss: {current_loss}")
    
# Save model
state_dict = {
    'optimizer': optimizer.state_dict(),
    'model': model.state_dict(),
    'scheduler': scheduler.state_dict(),
}
weight_save_path = os.path.join('weights', 'linear_baseline_model')
if not os.path.exists(weight_save_path):
    os.makedirs(weight_save_path)

torch.save(state_dict, os.path.join(weight_save_path, 'weights.pth'))

Model Parameters: 4204555


100%|██████████| 24372/24372 [02:02<00:00, 198.51it/s]
100%|██████████| 1/1 [02:03<00:00, 123.59s/it]

Loss: 0.8471620082855225





<h2><u>6. Baseline 2 - SimplePoseGNN (Simple ML model using GNNs)</u></h2>

TODO.brandon

<h2><u>7. Improvement 1 - SemGCN model (Reimplementation of Semantic GCN)</u></h2>

TODO.parash


<h2><u>8. Improvement 2 - PoseGCN model (Tweaks of SemGCN)</u></h2>

TODO.all

<h2><u>9. Evaluation & Analysis of models<u></h2>

<h2><u>10. Creating our own custom dataset</u></h2>

<h2><u>11. Evaluation on custom dataset</u></h2>

<h2><u>12. Conclusion</u></h2>

<h2><u>13. Video presentation & Resources</u></h2>