<h1>Install mediapipe</h1>

<h1>Set paths for the yoga train dataset and for saving our MediaPipe pose keypoints for later</h1>

In [46]:
yoga_train_images_dir = r'D:\FPT\AIP391\DATASET\TRAIN'  # raw string, so backslashes are not read as escape sequences
keypoint_outputs_dir = './keypoints/'

<h1>Generate pose keypoints for each image in train dataset...</h1>

Note: This part is inspired by https://www.kaggle.com/code/venkatkumar001/pose-prediction-generate-csv-keypoints-mediapipe

In [47]:
import cv2
import numpy as np
import os
import tqdm
from mediapipe.python.solutions import drawing_utils as mp_drawing
from mediapipe.python.solutions import pose as mp_pose

In [48]:
pose_class_names = sorted(os.listdir(yoga_train_images_dir))

for pose_class_name in pose_class_names:
    image_names = sorted(os.listdir(os.path.join(yoga_train_images_dir, pose_class_name)))
    
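    # exist_ok=True lets this cell be re-run; the old bare except/break aborted the whole loop if a folder already existed
    os.makedirs(os.path.join(keypoint_outputs_dir, pose_class_name), exist_ok=True)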
    
    print("Bootstrapping", pose_class_name)
    for image_name in tqdm.tqdm(image_names):
        input_frame = cv2.imread(os.path.join(yoga_train_images_dir, pose_class_name, image_name))
        if input_frame is None:  # cv2.imread returns None for files it cannot decode
            continue
        input_frame = cv2.cvtColor(input_frame, cv2.COLOR_BGR2RGB)
        
        with mp_pose.Pose() as pose_tracker:
            result = pose_tracker.process(image=input_frame)
            pose_landmarks = result.pose_landmarks
        
        output_frame = input_frame.copy()
        mp_drawing.draw_landmarks(image=output_frame, landmark_list=pose_landmarks, connections=mp_pose.POSE_CONNECTIONS)
        
        output_frame = cv2.cvtColor(output_frame, cv2.COLOR_RGB2BGR)
        # cv2.imwrite(os.path.join(train_outputs_dir, image_name), output_frame)
        
        if pose_landmarks is not None: 
            pose_landmarks = [[landmark.x, landmark.y, landmark.z] for landmark in pose_landmarks.landmark]
            frame_height, frame_width = output_frame.shape[:2]
            
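            # Question: does de-normalizing the keypoint coordinates affect training? Later, try training the network on the [0, 1]-normalized coordinates instead of absolute pixel values.
            # MediaPipe normalizes x by image width and y by image height; z uses roughly the same scale as x.
            pose_landmarks = np.array(pose_landmarks) * np.array([frame_width, frame_height, frame_width])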
            
            pose_landmarks = np.around(pose_landmarks, 5).flatten().astype(np.float32).tolist()
            
            npy_savepath = os.path.join(keypoint_outputs_dir, pose_class_name, os.path.splitext(image_name)[0])  # drop the extension (.jpg, .png, .jpeg, ...)
            np.save(npy_savepath, pose_landmarks)
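
The de-normalization step can be checked in isolation, without running the detector. The sketch below assumes MediaPipe's documented convention (x normalized by image width, y by image height, z on roughly the same scale as x). The helper `denormalize_landmarks` and the input values are made up for illustration; they are not part of MediaPipe or of this notebook.

```python
import numpy as np


def denormalize_landmarks(landmarks, frame_width, frame_height):
    """Scale normalized (x, y, z) landmarks to pixels and flatten them.

    Hypothetical helper for illustration; not part of MediaPipe.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)  # shape (num_landmarks, 3)
    # x and z are scaled by the width, y by the height.
    scale = np.array([frame_width, frame_height, frame_width], dtype=np.float32)
    return (landmarks * scale).flatten()


# Two made-up landmarks on a 640x480 image (not real detector output).
normalized = [[0.5, 0.5, -0.1], [0.25, 1.0, 0.0]]
pixels = denormalize_landmarks(normalized, frame_width=640, frame_height=480)
print(pixels)
```

With 33 real landmarks the flattened result has 99 values, which is the vector length used in the rest of the notebook.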


<h1>Generate our train/test datasets</h1>

Generated from the previously saved .npy files, i.e. the keypoints flattened into 1D vectors of 33 × 3 = 99 values (MediaPipe produces 33 landmarks, each with x, y, z coordinates).
Note: Data loading and transforms are inspired by Nicholas Renotte's video on sign language detection with MediaPipe: https://youtu.be/doDUihpj6ro
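
To make the file layout concrete, here is a small round-trip sketch: save a flat 99-value vector, load it back, and reshape it into one row per landmark. The random vector and the file name `example_pose` are stand-ins, not real pose data from the dataset.

```python
import os
import tempfile

import numpy as np

NUM_LANDMARKS = 33  # MediaPipe Pose landmark count, as stated above
NUM_COORDS = 3      # x, y, z

# Stand-in for one saved keypoint vector (random values, not real pose data).
flat_keypoints = np.random.default_rng(0).random(NUM_LANDMARKS * NUM_COORDS).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "example_pose")  # np.save appends ".npy"
    np.save(save_path, flat_keypoints)
    loaded = np.load(save_path + ".npy")

# Recover the per-landmark view: row i holds (x, y, z) of landmark i.
per_landmark = loaded.reshape(NUM_LANDMARKS, NUM_COORDS)
print(loaded.shape, per_landmark.shape)
```

The reshape only works because the vectors were flattened row by row (x, y, z of landmark 0, then landmark 1, and so on).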

In [49]:
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from glob import glob

In [50]:
label_map = {label:num for num,label in enumerate(pose_class_names)}
print(label_map)

{'downdog': 0, 'goddess': 1, 'plank': 2, 'tree': 3, 'warrior2': 4}


In [51]:
sequences, labels = [], []

for pose_class_name in pose_class_names:
    keypoint_names = glob(os.path.join(keypoint_outputs_dir, pose_class_name, "*.npy"))
    
    print("searching through {}".format(pose_class_name))
    for keypoint_name in tqdm.tqdm(keypoint_names):
        file = np.load(keypoint_name)
        sequences.append(file)
        labels.append(label_map[pose_class_name])
    
print(sequences)
print(labels)

searching through downdog


100%|██████████| 199/199 [00:00<00:00, 1183.73it/s]


searching through goddess


100%|██████████| 172/172 [00:00<00:00, 1724.25it/s]


searching through plank


100%|██████████| 263/263 [00:00<00:00, 1269.05it/s]


searching through tree


100%|██████████| 159/159 [00:00<00:00, 1681.41it/s]


searching through warrior2


100%|██████████| 249/249 [00:00<00:00, 1502.23it/s]


[array([ 249.11863708,  391.52960205,  -82.06987   ,  234.87469482,
        392.30105591, -116.82182312,  233.39265442,  389.78201294,
       -116.79383087,  231.76834106,  387.01077271, -116.82102966,
        235.04542542,  392.8543396 ,  -58.94342041,  233.82600403,
        390.92822266,  -58.88739014,  232.4425354 ,  388.93740845,
        -58.81082153,  222.62327576,  360.95666504, -219.50427246,
        222.99667358,  363.9670105 ,   47.26697159,  251.87045288,
        376.68643188, -114.00479126,  252.66848755,  377.42504883,
        -36.69503021,  233.89077759,  306.71520996, -298.11331177,
        237.74488831,  302.87249756,  157.64915466,  168.18821716,
        401.02835083, -539.69769287,  167.36857605,  389.0171814 ,
        242.68400574,   89.1470108 ,  472.72494507, -544.36590576,
        101.41313171,  450.78338623,   12.86056995,   65.48316193,
        475.68267822, -612.95544434,   78.28784943,  456.51553345,
          3.66364002,   61.49057007,  472.20822144, -569.7590

In [52]:
print(len(sequences))
print(len(labels))

print(np.array(sequences).shape) # (1042, 99): 1042 images, 99 = 33 keypoints x 3 coordinates (x, y, z)
print(np.array(labels).shape) # (1042,): one integer class label per image

1042
1042
(1042, 99)
(1042,)


In [53]:
X = np.array(sequences)
y = to_categorical(labels).astype(int)
print(y.shape) # 5 categories for currently recorded 5 poses(downdog, goddess, plank, tree, warrior2)

(1042, 5)
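
For readers unfamiliar with one-hot encoding, the sketch below reproduces the same kind of output with plain NumPy. It is an illustration of the idea on made-up labels, not how `to_categorical` is implemented.

```python
import numpy as np

# Made-up integer labels using the label_map indices above (0=downdog ... 4=warrior2).
example_labels = np.array([0, 3, 4, 1, 2, 3])
num_classes = 5

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=int)[example_labels]
print(one_hot.shape)

# argmax along the class axis recovers the integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)
```

This is also why the prediction cell later uses `np.argmax` to turn a row of `Y_test` back into a class index.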


A 5% test split (53 of the 1042 samples) is good enough here : )

In [54]:
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.05)
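
With so few test samples, a purely random split can leave a class under-represented, and the split changes on every run. `train_test_split` accepts `stratify` and `random_state` arguments for this. The sketch below instead hand-rolls the same idea in NumPy so the mechanics are visible; the helper `stratified_split` and the toy labels are hypothetical and only mirror the class sizes printed earlier.

```python
import numpy as np


def stratified_split(labels, test_size, seed=0):
    """Return (train_idx, test_idx), sampling test_size of each class.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test share.
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = max(1, int(round(test_size * len(cls_idx))))
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


# Toy labels mirroring the class sizes printed earlier (199, 172, 263, 159, 249).
toy_labels = np.repeat(np.arange(5), [199, 172, 263, 159, 249])
train_idx, test_idx = stratified_split(toy_labels, test_size=0.05)
print(len(train_idx), len(test_idx))
print(np.bincount(toy_labels[test_idx]))  # test samples per class
```

Fixing the seed makes the split reproducible, so accuracy numbers from different runs can be compared.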

In [55]:
print(X_train.shape)
print(len(X_test))
print(X_test)

(989, 99)
53
[[ 304.07476807  199.45861816  -54.92324829 ...  247.57652283
   492.22753906 -141.03450012]
 [ 340.89141846  187.78334045 -191.8351593  ...  186.80903625
   539.85583496  -71.97081757]
 [1134.48937988  491.03384399 -908.64849854 ...  676.36804199
  1996.86889648 -124.88321686]
 ...
 [ 213.09675598  344.16790771   -7.57246017 ...  339.32299805
   411.94741821  167.23490906]
 [ 281.5296936   239.41119385 -204.52641296 ...  281.46688843
   648.355896     48.83076096]
 [ 122.10247803   38.52347946  134.83117676 ...  216.70974731
   217.91682434  123.6020813 ]]
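
The printed rows range from a few hundred to almost 2000 because the coordinates are in pixels and the images differ in size. One possible way to remove that dependence, in the spirit of the normalization question raised in the keypoint cell, is to center each pose on the hip midpoint and divide by the torso length. This is a sketch of an alternative preprocessing step, not something this notebook does; `normalize_pose` is a hypothetical helper, and the landmark indices are MediaPipe Pose's shoulder and hip indices.

```python
import numpy as np

# MediaPipe Pose landmark indices for shoulders and hips.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_HIP, RIGHT_HIP = 23, 24


def normalize_pose(flat_keypoints):
    """Center a 99-value pose on the hip midpoint and scale by torso length.

    Hypothetical preprocessing sketch; the notebook itself trains on pixel values.
    """
    pose = np.asarray(flat_keypoints, dtype=np.float32).reshape(33, 3)
    hip_center = (pose[LEFT_HIP] + pose[RIGHT_HIP]) / 2
    shoulder_center = (pose[LEFT_SHOULDER] + pose[RIGHT_SHOULDER]) / 2
    torso_length = np.linalg.norm(shoulder_center - hip_center)
    pose = (pose - hip_center) / max(torso_length, 1e-6)  # guard against zero length
    return pose.flatten()


# The same made-up pose at two image scales should normalize to the same vector.
rng = np.random.default_rng(0)
small = rng.random(99) * 300
large = small * 4
same = np.allclose(normalize_pose(small), normalize_pose(large), atol=1e-4)
print(same)
```

Whether this helps accuracy on this dataset is untested here; it would need to be compared against the pixel-coordinate baseline.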


<h1>Generate our relatively simple but effective model...</h1>

If anyone has suggestions to improve the model, please let me know : )

In [56]:
from keras.models import Sequential
from keras.layers import Dense, InputLayer  # LSTM and Dropout were imported but never used

In [57]:
model2 = Sequential([
    InputLayer(input_shape=(99,)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(5, activation='softmax')
])

In [58]:
model2.compile(optimizer="Adam", loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 64)                6400      
                                                                 
 dense_4 (Dense)             (None, 32)                2080      
                                                                 
 dense_5 (Dense)             (None, 5)                 165       
                                                                 
Total params: 8,645
Trainable params: 8,645
Non-trainable params: 0
_________________________________________________________________
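
The parameter counts in the summary can be verified by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. A short worked check, using only the layer sizes from the model definition above:

```python
# (inputs, units) for each Dense layer in model2
layer_shapes = [(99, 64), (64, 32), (32, 5)]

# weights (n_in * n_out) plus biases (n_out) for each layer
param_counts = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(param_counts)
print(param_counts, total_params)  # [6400, 2080, 165] 8645
```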


<h1>Train our network!</h1>

Runs pretty fast even on Kaggle without an accelerator, since the network has only 8,645 parameters and 989 training samples... might even be able to scale up to 1000 epochs...

In [59]:
model2.fit(X_train, Y_train, epochs=500)

Epoch 1/500
Epoch 2/500
...
Epoch 77/500
Epoch 78

<keras.callbacks.History at 0x2846c678c40>

<h1>Let's do some predictions!</h1>

In [76]:
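poses_mapping = {0: 'downdog', 1: 'goddess', 2: 'plank', 3: 'tree', 4: 'warrior2'}  # inverse of label_map
result = model2.predict(X_test)  # shape (53, 5): one softmax probability vector per test sample
print(result[5][0])  # probability assigned to class 0 (downdog) for test sample 5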

print("Predicted pose:", poses_mapping[np.argmax(result[5])])
print("Actual pose:", poses_mapping[np.argmax(Y_test[5])])

1.0
Predicted pose: downdog
Actual pose: downdog
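
The cell above checks a single test sample. A rough sketch of scoring a whole batch is shown below, with accuracy and a confusion matrix computed in NumPy. The probabilities and labels are made-up stand-ins for `model2.predict(X_test)` and `Y_test`, so the numbers say nothing about the real model.

```python
import numpy as np

class_names = ['downdog', 'goddess', 'plank', 'tree', 'warrior2']

# Made-up stand-ins for model2.predict(X_test) and Y_test (4 samples, 5 classes).
example_probs = np.array([
    [0.90, 0.05, 0.02, 0.02, 0.01],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.20, 0.10, 0.10, 0.50, 0.10],
    [0.05, 0.05, 0.10, 0.10, 0.70],
])
example_truth = np.eye(5, dtype=int)[[0, 1, 2, 4]]  # one-hot labels

predicted = example_probs.argmax(axis=1)
actual = example_truth.argmax(axis=1)
accuracy = (predicted == actual).mean()

# confusion[i, j] counts samples of true class i predicted as class j.
confusion = np.zeros((5, 5), dtype=int)
np.add.at(confusion, (actual, predicted), 1)

print("Accuracy:", accuracy)
print(confusion)
for a, p in zip(actual, predicted):
    if a != p:
        print("Misclassified", class_names[a], "as", class_names[p])
```

Swapping in the real `result` and `Y_test` arrays would give the test accuracy over all 53 samples rather than one hand-picked example.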


<h3>And then save our model for later use...</h3>

In [61]:
model2.save('./model_save/tripleDense_500steps.h5')