# Converting Data for Visualization

We've managed to extract a few examples of both dabs and tposes, so it's now time to figure out what our data looks like.

The easiest way to manipulate and visualize data in Python is with tools like pandas and seaborn.

But first, we'll need to convert our raw NumPy arrays into something a bit more readable. Let's do that by converting them into labeled CSV files.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
dabDataset = np.load('data/dabs.npy')
tposeDataset = np.load('data/tposes.npy')
otherDataset = np.load('data/other.npy')

In [3]:
dabDataset[0]

array([[5.8832416e+02, 2.9433704e+02, 7.2265184e-01],
       [5.8239331e+02, 3.5126093e+02, 8.0205584e-01],
       [5.0984329e+02, 3.4919385e+02, 7.5316119e-01],
       [4.1784265e+02, 3.1985785e+02, 8.1164622e-01],
       [3.6101605e+02, 2.9243521e+02, 8.0296052e-01],
       [6.5091376e+02, 3.6097537e+02, 6.4161348e-01],
       [6.3724268e+02, 2.7274924e+02, 7.8188539e-01],
       [4.9614203e+02, 2.4154723e+02, 8.3243752e-01],
       [5.4315808e+02, 6.4114813e+02, 4.4807938e-01],
       [4.8636816e+02, 6.2938318e+02, 3.6906898e-01],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [6.0191382e+02, 6.4702966e+02, 3.8946095e-01],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
       [5.7648334e+02, 2.7475522e+02, 6.1822432e-01],
       [6.0389270e+02, 2.8454663e+02, 4.1854110e-01],
       [5.5686536e+02, 2.6891223e+02, 2.7014270e-01],
       [6.1959991e+02, 2.924

In [4]:
dabDataset[0].shape

(25, 3)

## Adding our Labels

Our labels come from the [BODY_25 Pose Output format](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md#pose-output-format-body_25) documented in the OpenPose repo.

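We can tell because when we looked at each of our poses, we saw a `dabDataset[0].shape` of `(25, 3)`: 25 keypoints, each with three values. The format's list ends with a 26th entry, `Background`, which has no row in our arrays, so we leave it out to keep the header the same width as the data.

In [5]:
# The 25 BODY_25 keypoints, in output order ("Background" is omitted on purpose)
labels = ["Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder", "LElbow",
 "LWrist", "MidHip", "RHip", "RKnee", "RAnkle", "LHip", "LKnee", "LAnkle",
 "REye", "LEye", "REar", "LEar", "LBigToe", "LSmallToe", "LHeel", "RBigToe",
 "RSmallToe", "RHeel"]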

Each keypoint comes with an `X`, a `Y`, and a `Confidence` value. Let's build a column name for each of those and flatten every pose into a single row for our CSV file:

In [6]:
properLabels = []
for label in labels:
    properLabels.append(label + 'X')
    properLabels.append(label + 'Y')
    properLabels.append(label + 'Confidence')
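
To see why this naming order lines up with `flatten()`, here is a minimal sketch on a made-up pose with only two keypoints. The names `names`, `pose`, and `columns` and the numbers are illustrative assumptions, not values from our dataset.

```python
import numpy as np

# Hypothetical pose with 2 keypoints instead of 25, to keep things readable
names = ["Nose", "Neck"]
pose = np.array([[10.0, 20.0, 0.9],
                 [30.0, 40.0, 0.8]])

# Same naming scheme as the loop above, written as a comprehension
columns = [name + suffix
           for name in names
           for suffix in ("X", "Y", "Confidence")]

# flatten() walks the array row by row, so each keypoint's
# x, y, confidence values stay together and match `columns`
flat = pose.flatten()
row = dict(zip(columns, flat.tolist()))
print(row)
```

Because both the column names and the flattened values go keypoint by keypoint, each value ends up under the right header.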

In [7]:
import csv

# Write one CSV per class: a header row, then one flattened pose per row
for path, poses in [('data/dabs.csv', dabDataset),
                    ('data/tposes.csv', tposeDataset),
                    ('data/other.csv', otherDataset)]:
    # newline='' stops the csv module from writing blank lines on Windows
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(properLabels)
        for pose in poses:
            writer.writerow(pose.flatten())
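
As an alternative to the `csv` module, the same conversion can be sketched with pandas. This is only one possible approach: `poses_to_frame` is a hypothetical helper, and the random two-keypoint data stands in for our real arrays.

```python
import numpy as np
import pandas as pd

def poses_to_frame(poses, columns):
    """Flatten an (N, K, 3) pose array into an N-row DataFrame with named columns."""
    flat = poses.reshape(len(poses), -1)
    # Fail early if the header and the data disagree on width
    if flat.shape[1] != len(columns):
        raise ValueError(
            f"{len(columns)} column names for {flat.shape[1]} values per pose")
    return pd.DataFrame(flat, columns=columns)

# Hypothetical data: 4 poses with 2 keypoints each
rng = np.random.default_rng(0)
poses = rng.random((4, 2, 3))
columns = ["NoseX", "NoseY", "NoseConfidence",
           "NeckX", "NeckY", "NeckConfidence"]

frame = poses_to_frame(poses, columns)
# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = frame.to_csv(index=False)
print(csv_text.splitlines()[0])
```

The width check is the useful part: a header with more names than the data has values would otherwise produce a misaligned file without any error.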

## Sanity Checking our Data

We can now open up our CSV files and see what they look like. How many samples did we collect? Is it enough? 

Once we check, we can hop on to the next step, bringing all the data into a single format and file for training.
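
If you'd rather check from code than open the files by hand, something like the sketch below could work. The inline `csv_text` is a made-up stand-in for one of our files, and it assumes that a confidence of `0` means the keypoint wasn't detected, which is how the all-zero rows in our arrays appear to behave.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file like data/dabs.csv
csv_text = """NoseX,NoseY,NoseConfidence,NeckX,NeckY,NeckConfidence
588.3,294.3,0.72,582.4,351.3,0.80
0.0,0.0,0.0,478.5,274.7,0.80
547.1,112.2,0.80,0.0,0.0,0.0
"""

frame = pd.read_csv(io.StringIO(csv_text))

# How many samples did we collect?
n_samples = len(frame)

# How often was each keypoint missing (confidence of exactly zero)?
confidence = frame.filter(regex="Confidence$")
missing_per_keypoint = (confidence == 0).sum()

print(n_samples)
print(missing_per_keypoint)
```

With the real files, replace the `io.StringIO(...)` argument with the path to the CSV.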

## Creating a Labeled Dataset for Training and Testing

Now that we've got our data (mostly) sorted out, we need to combine it into a single labeled dataset.

We'll use `0` for `other` poses, `1` for `dabs`, and `2` for `tposes`.



In [8]:
# Note: this rebinds `labels` (previously the keypoint names) to the class ids
labels = np.zeros(len(otherDataset))
labels = np.append(labels, np.full((len(dabDataset)), 1))
labels = np.append(labels, np.full((len(tposeDataset)), 2))
print(labels)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 2. 2.]
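
A quick way to confirm the class balance is to count the labels. This sketch rebuilds the label vector from the class sizes visible in the printed output above (26 `other`, 14 `dabs`, 16 `tposes`); the names `class_ids` and `counts` are just for illustration.

```python
import numpy as np

# Class sizes read off the printed labels above
n_other, n_dab, n_tpose = 26, 14, 16

# Integer class ids: 0 = other, 1 = dab, 2 = tpose
class_ids = np.concatenate([
    np.full(n_other, 0),
    np.full(n_dab, 1),
    np.full(n_tpose, 2),
])

# bincount returns how many times each id occurs
counts = np.bincount(class_ids)
print(counts)  # [26 14 16]
```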


In [9]:
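# now let's one-hot encode the labels (one column per class)
# `keras.utils` exposes to_categorical directly; the older np_utils path is gone in newer releases
from keras.utils import to_categorical
one_hot_labels = to_categorical(labels, 3)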

Using TensorFlow backend.
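
To get a feel for what one-hot encoding produces, here is a rough NumPy-only illustration. It is not how `to_categorical` is implemented, just an equivalent result on a small made-up label vector.

```python
import numpy as np

# Hypothetical labels: other, dab, tpose, dab
class_ids = np.array([0, 1, 2, 1])

# Row i of the 3x3 identity matrix is the one-hot vector for class i
one_hot = np.eye(3, dtype=np.float32)[class_ids]
print(one_hot)

# argmax along each row recovers the original class id
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Each row has a single `1` in the column of its class, which is the shape a three-way softmax output is usually trained against.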


In [10]:
dataset = np.append(otherDataset, dabDataset, axis=0)
dataset = np.append(dataset, tposeDataset, axis=0)
print(dataset)

[[[488.3213     147.51425      0.83340967]
  [494.22372    284.5734       0.8012297 ]
  [386.4863     270.83716      0.66853976]
  ...
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]]

 [[515.7737     112.18818      0.83487195]
  [478.48004    274.7029       0.8005627 ]
  [368.77948    257.2105       0.6782713 ]
  ...
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]]

 [[547.1316     112.15151      0.79948723]
  [464.79065    268.88403      0.73338044]
  [360.98135    243.43745      0.62600124]
  ...
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]
  [  0.           0.           0.        ]]

 ...

 [[509.97504    257.06958      0.892523  ]
  [460.9663     351.17117      0.7867987 ]
  [372.75305    333.54434      0.6111988 ]
  ...
  [  0.           0.           0.        ]
  [  0.           

In [11]:
dataset.shape

(56, 25, 3)
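
The shape makes sense once you add up the class sizes: 26 + 14 + 16 = 56 poses, each still `(25, 3)`. A small sketch with zero-filled stand-ins (not our real data) shows the same stacking with `np.concatenate`, which joins all three arrays in one call:

```python
import numpy as np

# Zero-filled stand-ins with the class sizes seen above
other_demo = np.zeros((26, 25, 3), dtype=np.float32)
dab_demo = np.zeros((14, 25, 3), dtype=np.float32)
tpose_demo = np.zeros((16, 25, 3), dtype=np.float32)

# axis=0 stacks along the sample dimension and leaves (25, 3) untouched
combined = np.concatenate([other_demo, dab_demo, tpose_demo], axis=0)
print(combined.shape)  # (56, 25, 3)
```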

In [12]:
dataset[0]

array([[4.88321289e+02, 1.47514252e+02, 8.33409667e-01],
       [4.94223724e+02, 2.84573395e+02, 8.01229715e-01],
       [3.86486298e+02, 2.70837158e+02, 6.68539762e-01],
       [3.37498718e+02, 4.31440033e+02, 8.06459844e-01],
       [2.76727325e+02, 5.92155334e+02, 6.95721209e-01],
       [6.01926575e+02, 2.96297577e+02, 6.77372575e-01],
       [6.17621460e+02, 4.47154175e+02, 8.15527081e-01],
       [6.33207092e+02, 6.13721497e+02, 7.50288665e-01],
       [4.49166534e+02, 6.23515259e+02, 3.33123893e-01],
       [3.68768433e+02, 6.15664124e+02, 2.96909660e-01],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [5.29464417e+02, 6.35266663e+02, 3.00662249e-01],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [4.72626495e+02, 1.25848732e+02, 7.93599010e-01],
       [5.09897217e+02, 1.25915306e+02, 8.75047982e-01],
       [4.47158661e+02, 1.31820

In [13]:
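# now, let's shuffle the samples and their labels together, so each pose keeps its label
# (one_hot_labels is NOT shuffled here; re-encode y afterwards if you need one-hot targets)
from sklearn.utils import shuffle
X, y = shuffle(dataset, labels)
print(y)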

[1. 2. 0. 0. 1. 1. 1. 2. 1. 0. 0. 0. 0. 1. 1. 2. 2. 0. 2. 0. 2. 2. 0. 2.
 0. 0. 0. 2. 2. 0. 1. 0. 1. 1. 0. 1. 0. 0. 1. 0. 0. 2. 0. 0. 2. 2. 0. 2.
 0. 1. 1. 2. 0. 0. 0. 2.]
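
To check that shuffling "the same way" really keeps samples and labels paired, here is a minimal NumPy sketch on toy data. It also holds out the last third as a test set, which is only one simple way to split and an assumption on my part; the names `X_demo`, `y_demo`, and `order` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: sample i is filled with the value i, and its label is also i,
# so any mismatch after shuffling is easy to spot
y_demo = np.arange(6)
X_demo = np.ones((6, 2, 3)) * y_demo.reshape(6, 1, 1)

# One permutation, applied to both arrays, keeps every pair aligned
order = rng.permutation(len(X_demo))
X_shuffled = X_demo[order]
y_shuffled = y_demo[order]

# Hold out the last third of the shuffled data for testing
split = len(X_shuffled) * 2 // 3
X_train, X_test = X_shuffled[:split], X_shuffled[split:]
y_train, y_test = y_shuffled[:split], y_shuffled[split:]

print(y_shuffled)
print(X_train.shape, X_test.shape)
```

With only 56 samples, a held-out set this small gives a noisy estimate, so treat any accuracy measured on it as a rough guide.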

