## Description
---
Estimate the 3D pose of a person from their 2D pose. It turns out that directly regressing the 3D joint coordinates from the 2D pose works well [1]
(the paper is available [here](https://arxiv.org/pdf/1705.03098.pdf)).

The 2D pose of a person is represented as a set of 2D coordinates, one for each of their $n = 32$ joints,
i.e. $P^{2D}_{i} = \{(x^{1}_{i}, y^{1}_{i}), \dots, (x^{32}_{i}, y^{32}_{i})\}$, where $(x^{j}_{i}, y^{j}_{i})$ are the 2D coordinates of the $j$-th joint of the $i$-th sample. Similarly, the 3D pose of a person is
$P^{3D}_{i} = \{(x^{1}_{i}, y^{1}_{i}, z^{1}_{i}), \dots, (x^{32}_{i}, y^{32}_{i}, z^{32}_{i})\}$,
where $(x^{j}_{i}, y^{j}_{i}, z^{j}_{i})$ are the 3D coordinates of the $j$-th joint of the $i$-th sample. The only data available are the ground-truth 3D poses and the 2D poses calculated from them using the camera parameters.
We train a network $f_{\theta} : \mathbb{R}^{2n} \to \mathbb{R}^{3n}$ that takes $P^{2D}_{i}$ as input and regresses the ground-truth 3D pose $P^{3D}_{i}$. The network is trained with the $L2$ loss between the ground-truth and the predicted pose:
\begin{equation*}
L(\theta) = \sum^{M}_{i=1} \left\| P^{3D}_{i} - f_{\theta}(P^{2D}_{i}) \right\|_{2}^{2} \quad \text{for a minibatch of size } M \qquad (2)
\end{equation*}
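
As a minimal sketch of the loss above, the following NumPy snippet sums the squared L2 distance between predicted and ground-truth poses over a minibatch. The function name `l2_pose_loss` and the toy values are assumptions for illustration. Note that the training code in this notebook uses `nn.MSELoss`, which averages over elements instead of summing.

```python
import numpy as np

def l2_pose_loss(pred_3d, target_3d):
    """Sum over the minibatch of the squared L2 distance between flattened poses.

    Both arrays have shape (M, 3n): M samples, n joints with (x, y, z) each.
    """
    diff = target_3d - pred_3d
    return float(np.sum(diff ** 2))

# Toy minibatch: M = 2 samples, n = 32 joints (hypothetical values)
target = np.zeros((2, 96))
pred = np.full((2, 96), 0.5)
loss = l2_pose_loss(pred, target)  # 2 * 96 * 0.25
```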

Download the Human3.6M Dataset [here](https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip).

In [None]:
from __future__ import print_function, absolute_import, division

import os
import sys
import time
from pprint import pprint
import numpy as np

import torch
import torch.nn as nn
import torch.optim
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torch.autograd import Variable

# Run this notebook from the src folder so that the following modules can be imported.
from procrustes import get_transformation
import data_process as data_process
import data_utils
import progress.progress.bar as pBar
import utils as utils
import misc as misc
import log as log
import cameras

from pykalman import KalmanFilter
from sklearn.metrics import mean_squared_error 
import matplotlib.pyplot as plt
from torch.utils.data import Dataset

In [None]:
# Define actions
actions = data_utils.define_actions("All")

In [None]:
# Load camera parameters
SUBJECT_IDS = [1,5,6,7,8,9,11]
cameras_path = '../data/h36m/cameras.h5'
rcams = cameras.load_cameras(cameras_path, SUBJECT_IDS)

In [None]:
# Load data
data_dir = '../data/h36m/'
camera_frame = True
predict_14 = False
# Load 3d data and load (or create) 2d projections
train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data(
    actions, data_dir, camera_frame, rcams, predict_14 )

Reading subject 1, action Directions
../data/h36m/S1/MyPoses/3D_positions/Directions*.h5
../data/h36m/S1/MyPoses/3D_positions/Directions.h5
../data/h36m/S1/MyPoses/3D_positions/Directions 1.h5
Reading subject 1, action Discussion
../data/h36m/S1/MyPoses/3D_positions/Discussion*.h5
../data/h36m/S1/MyPoses/3D_positions/Discussion 1.h5
../data/h36m/S1/MyPoses/3D_positions/Discussion.h5
Reading subject 1, action Eating
../data/h36m/S1/MyPoses/3D_positions/Eating*.h5
../data/h36m/S1/MyPoses/3D_positions/Eating 2.h5
../data/h36m/S1/MyPoses/3D_positions/Eating.h5
Reading subject 1, action Greeting
../data/h36m/S1/MyPoses/3D_positions/Greeting*.h5
../data/h36m/S1/MyPoses/3D_positions/Greeting.h5
../data/h36m/S1/MyPoses/3D_positions/Greeting 1.h5
Reading subject 1, action Phoning
../data/h36m/S1/MyPoses/3D_positions/Phoning*.h5
../data/h36m/S1/MyPoses/3D_positions/Phoning 1.h5
../data/h36m/S1/MyPoses/3D_positions/Phoning.h5
Reading subject 1, action Photo
../data/h36m/S1/MyPoses/3D_positions/Ph

  complete_train = copy.deepcopy( np.vstack( train_set.values() ))


In [None]:
# Read stacked hourglass 2D predictions if use_sh, otherwise use groundtruth 2D projections
use_sh = False
if use_sh:
    train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, data_dir)
else:
    train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, data_dir, rcams )
print( "done reading and normalizing data." )

stat_3d = {}
stat_3d['mean'] = data_mean_3d
stat_3d['std'] = data_std_3d
stat_3d['dim_use'] = dim_to_use_3d

Reading subject 1, action Directions
../data/h36m/S1/MyPoses/3D_positions/Directions*.h5
../data/h36m/S1/MyPoses/3D_positions/Directions.h5
../data/h36m/S1/MyPoses/3D_positions/Directions 1.h5
Reading subject 1, action Discussion
../data/h36m/S1/MyPoses/3D_positions/Discussion*.h5
../data/h36m/S1/MyPoses/3D_positions/Discussion 1.h5
../data/h36m/S1/MyPoses/3D_positions/Discussion.h5
Reading subject 1, action Eating
../data/h36m/S1/MyPoses/3D_positions/Eating*.h5
../data/h36m/S1/MyPoses/3D_positions/Eating 2.h5
../data/h36m/S1/MyPoses/3D_positions/Eating.h5
Reading subject 1, action Greeting
../data/h36m/S1/MyPoses/3D_positions/Greeting*.h5
../data/h36m/S1/MyPoses/3D_positions/Greeting.h5
../data/h36m/S1/MyPoses/3D_positions/Greeting 1.h5
Reading subject 1, action Phoning
../data/h36m/S1/MyPoses/3D_positions/Phoning*.h5
../data/h36m/S1/MyPoses/3D_positions/Phoning 1.h5
../data/h36m/S1/MyPoses/3D_positions/Phoning.h5
Reading subject 1, action Photo
../data/h36m/S1/MyPoses/3D_positions/Ph

  complete_train = copy.deepcopy( np.vstack( train_set.values() ))


done reading and normalizing data.
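
The mean, standard deviation, and used-dimension indices stored in `stat_3d` are what `test` later passes to `data_process.unNormalizeData`. The sketch below shows one way such a normalize/unnormalize round trip could look. The helper names `normalize` and `unnormalize` are hypothetical, and the real functions in `data_utils` and `data_process` may differ in detail.

```python
import numpy as np

def normalize(data, mean, std, dim_use):
    """Standardize each coordinate and keep only the dimensions in dim_use."""
    return ((data - mean) / std)[:, dim_use]

def unnormalize(normalized, mean, std, dim_use):
    """Scatter normalized values back to full width and undo the standardization."""
    full = np.zeros((normalized.shape[0], mean.shape[0]))
    full[:, dim_use] = normalized
    return full * std + mean

# Toy data: 4 frames, 6 coordinates, of which the first three are ignored
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 6))
mean, std = data.mean(axis=0), data.std(axis=0)
dim_use = np.array([3, 4, 5])

restored = unnormalize(normalize(data, mean, std, dim_use), mean, std, dim_use)
# Used dimensions are recovered; ignored ones fall back to their mean
```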


In [None]:
# ============================
#   Define Train/Test Methods
# ============================
def train(train_loader, model, criterion, optimizer,
          lr_init=None, lr_now=None, glob_step=None, lr_decay=None, gamma=None,
          max_norm=True):

    start = time.time()
    losses = utils.AverageMeter()
    # For training
    model.train()
    # Iterate over the points
    for i, (inp_x, train_y) in enumerate(train_loader):
        glob_step += 1
        # Learning rate decay
        if glob_step % lr_decay == 0 or glob_step == 1:
            lr_now = utils.lr_decay(optimizer, glob_step, lr_init, lr_decay, gamma)
        train_x = inp_x.cuda()
        # `async` is a reserved word since Python 3.7; PyTorch renamed the flag to `non_blocking`
        actual_y = train_y.cuda(non_blocking=True)
        # Obtaining predictions
        predicted_y = model(train_x)
        # Calculating the loss value
        optimizer.zero_grad()
        loss_val = criterion(predicted_y, actual_y)
        losses.update(loss_val.item(), train_x.size(0))
        # Backward propagation
        loss_val.backward()
        
        # To prevent exploding gradients
        if max_norm:
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
        optimizer.step()

    return glob_step, lr_now, losses.avg, time.time()-start

# Given function
def test(test_loader, model, criterion, stat_3d, procrustes=False):
    losses = utils.AverageMeter()

    model.eval()

    all_dist = []
    start = time.time()
    batch_time = 0
    # bar = pBar.Bar('>>>', fill='>', max=len(test_loader))

    for i, (inps, tars) in enumerate(test_loader):
        inputs = inps.cuda()
        # `async` is a reserved word since Python 3.7; PyTorch renamed the flag to `non_blocking`
        targets = tars.cuda(non_blocking=True)

        # No gradients are needed during evaluation
        with torch.no_grad():
            outputs = model(inputs)

        # calculate loss
        outputs_coord = outputs
        loss = criterion(outputs_coord, targets)

        losses.update(loss.item(), inputs.size(0))

        tars = targets

        # calculate error
        targets_unnorm = data_process.unNormalizeData(tars.data.cpu().numpy(), stat_3d['mean'], stat_3d['std'], stat_3d['dim_use'])
        outputs_unnorm = data_process.unNormalizeData(outputs.data.cpu().numpy(), stat_3d['mean'], stat_3d['std'], stat_3d['dim_use'])

        # remove dim ignored
        dim_use = np.hstack((np.arange(3), stat_3d['dim_use']))

        outputs_use = outputs_unnorm[:, dim_use]
        targets_use = targets_unnorm[:, dim_use]

        if procrustes:
            for ba in range(inps.size(0)):
                gt = targets_use[ba].reshape(-1, 3)
                out = outputs_use[ba].reshape(-1, 3)
                _, Z, T, b, c = get_transformation(gt, out, True)
                out = (b * out.dot(T)) + c
                outputs_use[ba, :] = out.reshape(1, 51)

        sqerr = (outputs_use - targets_use) ** 2

        distance = np.zeros((sqerr.shape[0], 17))
        dist_idx = 0
        for k in np.arange(0, 17 * 3, 3):
            distance[:, dist_idx] = np.sqrt(np.sum(sqerr[:, k:k + 3], axis=1))
            dist_idx += 1
        all_dist.append(distance)

        # update summary
        if (i + 1) % 100 == 0:
            batch_time = time.time() - start
            start = time.time()

        # bar.suffix = '({batch}/{size}) | batch: {batchtime:.4}ms | Total: {ttl} | ETA: {eta:} | loss: {loss:.6f}' \
        #     .format(batch=i + 1,
        #             size=len(test_loader),
        #             batchtime=batch_time * 10.0,
        #             ttl=bar.elapsed_td,
        #             eta=bar.eta_td,
        #             loss=losses.avg)
        # bar.next()

    all_dist = np.vstack(all_dist)
    joint_err = np.mean(all_dist, axis=0)
    ttl_err = np.mean(all_dist)
    # bar.finish()
    print (">>> error: {} <<<".format(ttl_err))
    return losses.avg, ttl_err
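
The error reported by `test` is the mean Euclidean distance between predicted and ground-truth joints, optionally after a similarity (Procrustes) alignment. The following NumPy sketch illustrates both steps on toy data. It is not the `procrustes.get_transformation` used above, whose implementation is not shown here; `similarity_align` and `mean_joint_error` are hypothetical names, and the alignment follows the standard SVD-based solution.

```python
import numpy as np

def similarity_align(gt, pred):
    """Return pred after the scale, rotation and translation that best fit gt.

    gt and pred are (J, 3) arrays of joint positions.
    """
    mu_gt, mu_pred = gt.mean(axis=0), pred.mean(axis=0)
    gt0, pred0 = gt - mu_gt, pred - mu_pred
    norm_gt, norm_pred = np.linalg.norm(gt0), np.linalg.norm(pred0)
    gt0, pred0 = gt0 / norm_gt, pred0 / norm_pred

    # Optimal rotation from the SVD of the cross-covariance matrix
    U, s, Vt = np.linalg.svd(gt0.T @ pred0)
    V = Vt.T
    # Flip the last axis if needed so the result is a rotation, not a reflection
    sign = np.sign(np.linalg.det(V @ U.T))
    V[:, -1] *= sign
    s[-1] *= sign
    T = V @ U.T

    scale = s.sum() * norm_gt / norm_pred
    shift = mu_gt - scale * mu_pred @ T
    return scale * pred @ T + shift

def mean_joint_error(gt, pred):
    """Mean Euclidean distance over joints for (J, 3) arrays."""
    return float(np.mean(np.linalg.norm(gt - pred, axis=1)))

# Toy check: a rotated, scaled, shifted copy of gt aligns back onto gt
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
angle = 0.3
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ R + np.array([1.0, -2.0, 0.5])

err_raw = mean_joint_error(gt, pred)
err_aligned = mean_joint_error(gt, similarity_align(gt, pred))
```

Because the alignment removes global scale, rotation and translation, the aligned error only measures differences in the pose itself, which is why it is lower than or equal to the raw error.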

In [None]:
# ==================
#   Dataset class
# ==================
TRAIN_SUBJECTS = [1, 5, 6, 7, 8]
TEST_SUBJECTS = [9, 11]

class Human36M(Dataset):
    # def __init__(self, actions, data_path, use_hg=True, is_train=True):
    def __init__(self, actions, train_set_2d, test_set_2d, train_set_3d, test_set_3d, use_hg=True, is_train=True):
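        """
        :param actions: list of actions to use
        :param train_set_2d: dict mapping sequence keys to 2D poses (training split)
        :param test_set_2d: dict mapping sequence keys to 2D poses (test split)
        :param train_set_3d: dict mapping sequence keys to 3D poses (training split)
        :param test_set_3d: dict mapping sequence keys to 3D poses (test split)
        :param use_hg: unused here; stacked hourglass detections are recognized by the '-sh' key suffix
        :param is_train: load train/test dataset
        """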

        # Initialize variables
        self.actions = actions
        self.is_train = is_train
        self.train_inp, self.train_out, self.test_inp, self.test_out = [], [], [], []

        # # loading data
        # self.data_path = data_path
        # self.use_hg = use_hg
        # self.camera_frame = True
        # self.predict_14 = False
        # if self.use_hg:
        #     train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, data_path)
        # else:
        #     train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, data_path, rcams )

        # # Load 3d data and load (or create) 2d projections
        # train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data(
        #     actions, data_path, camera_frame, rcams, predict_14 )

        self.train_set_2d = train_set_2d
        self.test_set_2d = test_set_2d
        self.train_set_3d = train_set_3d
        self.test_set_3d = test_set_3d
        # Train data
        if self.is_train:
            for key_in_2d in self.train_set_2d.keys():
                # Keys of stacked hourglass detections carry a '-sh' suffix
                if key_in_2d[2].endswith('-sh'):
                    key_in_3d = (key_in_2d[0], key_in_2d[1], key_in_2d[2][:-3])
                else:
                    key_in_3d = key_in_2d
                # Check for size mismatch
                assert self.train_set_3d[key_in_3d].shape[0] == self.train_set_2d[key_in_2d].shape[0], \
                    'Shapes of 3d points and 2d points did not match in training'
                # Iterate over frames and add the points
                for i in range(self.train_set_2d[key_in_2d].shape[0]):
                    self.train_inp.append(self.train_set_2d[key_in_2d][i])
                    self.train_out.append(self.train_set_3d[key_in_3d][i])
        # Test data
        else:
            for key_in_2d in self.test_set_2d.keys():
                # Select data for provided action only
                if key_in_2d[1] not in self.actions:
                    continue
                # Keys of stacked hourglass detections carry a '-sh' suffix
                if key_in_2d[2].endswith('-sh'):
                    key_in_3d = (key_in_2d[0], key_in_2d[1], key_in_2d[2][:-3])
                else:
                    key_in_3d = key_in_2d
                # Check for size mismatch
                assert self.test_set_3d[key_in_3d].shape[0] == self.test_set_2d[key_in_2d].shape[0], \
                    'Shapes of 3d points and 2d points did not match in testing'
                # Iterate over frames and add the points
                for i in range(self.test_set_2d[key_in_2d].shape[0]):
                    self.test_inp.append(self.test_set_2d[key_in_2d][i])
                    self.test_out.append(self.test_set_3d[key_in_3d][i])

    def __getitem__(self, index):
        # Return tensors according to index value
        if self.is_train:
            inputs = torch.from_numpy(self.train_inp[index]).float()
            outputs = torch.from_numpy(self.train_out[index]).float()
        else:
            inputs = torch.from_numpy(self.test_inp[index]).float()
            outputs = torch.from_numpy(self.test_out[index]).float()

        return inputs, outputs

    def __len__(self):
        if self.is_train:
            return len(self.train_inp)
        else:
            return len(self.test_inp)
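
The dataset class pairs each 2D sequence with its 3D counterpart by key. Keys of stacked hourglass detections end in `-sh`, which has to be stripped before the 3D lookup. Below is a minimal sketch of that mapping, assuming keys are `(subject, action, sequence name)` tuples; the helper name `match_3d_key` and the example keys are made up for illustration.

```python
def match_3d_key(key_2d):
    """Map a (subject, action, sequence) 2D key to the matching 3D key."""
    subject, action, sequence = key_2d
    if sequence.endswith('-sh'):
        # Drop the three-character '-sh' suffix
        sequence = sequence[:-3]
    return (subject, action, sequence)

# Example keys are illustrative, not taken from the dataset
key_sh = (9, 'Walking', 'Walking 1.h5-sh')
key_gt = (9, 'Walking', 'Walking 1.h5')
```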

In [None]:
# ==========================================
#       Define Network Architecture
# ==========================================
# Given function
def weight_init(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)
        
# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, layer_size, p_dropout=0.5):
        super(ResidualBlock, self).__init__()
        self.layer_size = layer_size
        self.residual_model = nn.Sequential(
            nn.Linear(self.layer_size, self.layer_size),
            nn.BatchNorm1d(self.layer_size),
            nn.ReLU(inplace=True),
            nn.Dropout(p_dropout),
            nn.Linear(self.layer_size, self.layer_size),
            nn.BatchNorm1d(self.layer_size),
            nn.ReLU(inplace=True),
            nn.Dropout(p_dropout)
        )
    # Forward pass
    def forward(self, inp_):
        out_ = self.residual_model(inp_)
        out_ = inp_  + out_

        return out_
# Main model
class MainModel(nn.Module):
    def __init__(self, layer_size=1024, stages_count=2, p_dropout=0.5):
        super(MainModel, self).__init__()
        # Initialization
        self.layer_size = layer_size
        self.p_dropout = p_dropout
        self.stages_count = stages_count

        # 2d joint points
        self.input_size =  16 * 2
        # 3d joint points
        self.output_size = 16 * 3

        # Residual stages, reused inside the sequential model below
        self.residual_blocks = nn.ModuleList(
            [ResidualBlock(self.layer_size, self.p_dropout) for _ in range(stages_count)]
        )

        self.main_model = nn.Sequential(
            # increase input dimensionality to 1024
            nn.Linear(self.input_size, self.layer_size),
            nn.BatchNorm1d(self.layer_size),
            nn.ReLU(inplace=True),
            nn.Dropout(self.p_dropout),
            *self.residual_blocks,
            # post processing : convert to size 3n
            nn.Linear(self.layer_size, self.output_size)
        )
    # Forward pass
    def forward(self, in_):
        out_ = self.main_model(in_)
        return out_
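
To make the data flow through `MainModel` concrete, here is a shape-level NumPy sketch of the same layout: an input layer to 1024 units, two residual blocks, and an output layer. It is a simplification under stated assumptions: batch normalization and dropout are omitted for brevity, the weights are random, biases start at zero, and all helper names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def linear(x, w, b):
    return x @ w + b

def residual_block(x, params):
    """Two linear + ReLU layers with a skip connection around them."""
    (w1, b1), (w2, b2) = params
    h = relu(linear(x, w1, b1))
    h = relu(linear(h, w2, b2))
    return x + h

def init(rng, fan_in, fan_out):
    """Kaiming-style normal initialization for the weights, zero bias."""
    w = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return w, np.zeros(fan_out)

rng = np.random.default_rng(0)
in_size, hidden, out_size, stages = 16 * 2, 1024, 16 * 3, 2

w_in = init(rng, in_size, hidden)
blocks = [(init(rng, hidden, hidden), init(rng, hidden, hidden)) for _ in range(stages)]
w_out = init(rng, hidden, out_size)

batch = rng.normal(size=(4, in_size))      # 4 poses, 16 joints x 2 coordinates
h = relu(linear(batch, *w_in))
for params in blocks:
    h = residual_block(h, params)
pred = linear(h, *w_out)                   # shape (4, 48)
```

The skip connection means each block only has to learn a correction to its input, which is the usual motivation for residual layers.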

In [None]:
# ==========================================
#         load datasets for training
# ==========================================
job = 8
use_hg = False

training_batch = 64
testing_batch = 64
# Test data
test_loader = DataLoader(
        dataset=Human36M(actions, train_set_2d, test_set_2d, train_set_3d, test_set_3d, use_hg=use_hg, is_train=False),
        batch_size=testing_batch,
        shuffle=False,
        num_workers=job,
        pin_memory=True)
# Train data
train_loader = DataLoader(
    dataset=Human36M(actions, train_set_2d, test_set_2d, train_set_3d, test_set_3d, use_hg=use_hg),
    batch_size=training_batch,
    shuffle=True,
    num_workers=job,
    pin_memory=True)

print(">>> data loaded !")

>>> data loaded !


In [None]:
# ==========================================
#         Optimize/Train Network
# ==========================================

epochs = 30
start_epoch = 0
err_best = 1000
glob_step = 0
learning_rate = 0.001
learning_rate_current = 0.001
learning_rate_decay = 100000
gamma = 0.97
# `size_average` is deprecated; `reduction='mean'` is the equivalent setting
criterion = nn.MSELoss(reduction='mean').cuda()

# Model create
model = MainModel()
model = model.cuda()
model.apply(weight_init)
# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Run epochs
for epoch in range(start_epoch, epochs):
    start_time = time.time()

    # Training
    glob_step, learning_rate_current, loss_train, time_taken = train(train_loader, model, criterion, optimizer,
        lr_init=learning_rate, lr_now=learning_rate_current, glob_step=glob_step, lr_decay=learning_rate_decay, gamma=gamma, max_norm=True)

    # To select the best model
    print("Epoch {}. Time taken is: {:.4f} secs".format(epoch+1, time_taken))
    loss_test, err_test = test(test_loader, model, criterion, stat_3d, procrustes=True)
    # Compare the performance
    is_best = err_test < err_best
    err_best = min(err_test, err_best)

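    if is_best:
        # state_dict() returns references to the live tensors, so store a copy
        best_perf_weights = {k: v.clone() for k, v in model.state_dict().items()}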
# load the best model parameters
model.load_state_dict(best_perf_weights)



Epoch 1. Time taken is: 199.5940 secs
>>> error: 43.83085583545667 <<<
Epoch 2. Time taken is: 197.7567 secs
>>> error: 41.29663178405925 <<<
Epoch 3. Time taken is: 196.6538 secs
>>> error: 40.4642063035396 <<<
Epoch 4. Time taken is: 197.0231 secs
>>> error: 40.27308246269491 <<<
Epoch 5. Time taken is: 199.9356 secs
>>> error: 39.29357517541263 <<<
Epoch 6. Time taken is: 197.7937 secs
>>> error: 38.97877314208413 <<<
Epoch 7. Time taken is: 195.7583 secs
>>> error: 38.514982825601685 <<<
Epoch 8. Time taken is: 196.7283 secs
>>> error: 37.511434401550765 <<<
Epoch 9. Time taken is: 196.4194 secs
>>> error: 38.280378677928304 <<<
Epoch 10. Time taken is: 196.7167 secs
>>> error: 38.15350118178464 <<<
Epoch 11. Time taken is: 196.5334 secs
>>> error: 37.56247854747339 <<<
Epoch 12. Time taken is: 196.5191 secs
>>> error: 37.33392317853856 <<<
Epoch 13. Time taken is: 196.3963 secs
>>> error: 37.370699314446476 <<<
Epoch 14. Time taken is: 196.3497 secs
>>> error: 38.50790925560678 <<

<All keys matched successfully>
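
The training loop passes `learning_rate`, `learning_rate_decay` and `gamma` to `utils.lr_decay`, whose body is not shown here. Assuming it applies an exponential schedule of the form `lr_init * gamma ** (step / decay_steps)`, a minimal sketch would be the following; `decayed_lr` is a hypothetical name.

```python
def decayed_lr(lr_init, step, decay_steps, gamma):
    """Exponential decay: multiply by gamma once every decay_steps steps."""
    return lr_init * gamma ** (step / decay_steps)

# Values from the training cell: lr 0.001, decay every 100000 steps, gamma 0.97
lr_start = decayed_lr(0.001, 1, 100000, 0.97)
lr_after_decay = decayed_lr(0.001, 100000, 100000, 0.97)
```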

In [None]:
# ==========================================
#            Evaluating Network
# ==========================================

err_set = []
for action in actions:
    print (">>> TEST on _{}_".format(action))
    test_loader = DataLoader(
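                # Pass a one-element list: with a bare string, `in` does substring matching ("Sitting" in "SittingDown")
                dataset=Human36M([action], train_set_2d, test_set_2d, train_set_3d, test_set_3d, use_hg=use_hg, is_train=False),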
                batch_size=testing_batch,
                shuffle=False,
                num_workers=job,
                pin_memory=True)
    _, err_test = test(test_loader, model, criterion, stat_3d, procrustes=True)
    err_set.append(err_test)

print (">>>>>> TEST results:")
for action in actions:
    print ("{}".format(action), end='\t')
print ("\n")
for err in err_set:
    print ("{:.4f}".format(err), end='\t')
print (">>>\nERRORS: {}".format(np.array(err_set).mean()))

>>> TEST on _Directions_
>>> error: 29.793236258033954 <<<
>>> TEST on _Discussion_
>>> error: 34.64274026923888 <<<
>>> TEST on _Eating_
>>> error: 33.61134947821925 <<<
>>> TEST on _Greeting_
>>> error: 35.401170935886675 <<<
>>> TEST on _Phoning_
>>> error: 36.864767189323985 <<<
>>> TEST on _Photo_
>>> error: 41.57010801298586 <<<
>>> TEST on _Posing_
>>> error: 35.093111786082176 <<<
>>> TEST on _Purchases_
>>> error: 31.471059400744092 <<<
>>> TEST on _Sitting_
>>> error: 42.44898488232392 <<<
>>> TEST on _SittingDown_
>>> error: 44.546045196110526 <<<
>>> TEST on _Smoking_
>>> error: 37.63381668264558 <<<
>>> TEST on _Waiting_
>>> error: 36.36756976475463 <<<
>>> TEST on _WalkDog_
>>> error: 38.40662768396693 <<<
>>> TEST on _Walking_
>>> error: 30.671166692338005 <<<
>>> TEST on _WalkTogether_
>>> error: 33.19411780670144 <<<
>>>>>> TEST results:
Directions	Discussion	Eating	Greeting	Phoning	Photo	Posing	Purchases	Sitting	SittingDown	Smoking	Waiting	WalkDog	Walking	WalkTogether
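
The per-action errors printed above can be collected and averaged the same way the evaluation cell does with `np.array(err_set).mean()`. The sketch below copies the values from the log, rounded to four decimals; `summarize` is a hypothetical helper.

```python
def summarize(errors):
    """Return a two-column text table of per-action errors and their mean."""
    lines = ["{:<14}{:>8.4f}".format(name, err) for name, err in errors.items()]
    mean_err = sum(errors.values()) / len(errors)
    return "\n".join(lines), mean_err

# Values copied from the evaluation log above, rounded to four decimals
errors = {
    "Directions": 29.7932, "Discussion": 34.6427, "Eating": 33.6113,
    "Greeting": 35.4012, "Phoning": 36.8648, "Photo": 41.5701,
    "Posing": 35.0931, "Purchases": 31.4711, "Sitting": 42.4490,
    "SittingDown": 44.5460, "Smoking": 37.6338, "Waiting": 36.3676,
    "WalkDog": 38.4066, "Walking": 30.6712, "WalkTogether": 33.1941,
}
table, mean_err = summarize(errors)
print(table)
print("Average over actions: {:.4f}".format(mean_err))
```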

# References
[1] J. Martinez, R. Hossain, J. Romero, and J. J. Little, "A simple yet effective baseline for 3D human pose estimation," in ICCV, 2017.

