# Data Processing

The images processed here each show a person signing one letter of the ASL alphabet. The images themselves are not published; only the data points extracted from them are (data_unprocessed.pickle). To reproduce the entire project, use the data_collection file to create your own pictures and then process them in this file.

The MediaPipe library provides methods to extract landmarks from one or more hands. The landmarks are:

![Hand landmarks](hand-landmarks.png "Hand landmarks")

From each picture, the landmarks of the hand signing a letter are extracted. The label indicating which letter is being signed is taken from the folder name: all pictures showing the letter "A" are saved in folder 0, all pictures showing the letter "B" are saved in folder 1, and so on. A data file is created that stores 63 values per hand: 21 landmarks, each with three coordinates in 3D space (x, y, z). For each hand, a label is saved that indicates which letter is being signed.
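To make this layout concrete, the minimal sketch below views a 63-value row as 21 (x, y, z) triples and maps a folder name back to its letter. The helper names `row_to_points` and `folder_to_letter` are hypothetical and not part of the project. The sketch assumes the interleaved x, y, z order that the extraction loop in this notebook produces, and folder names "0" to "25".

```python
import numpy as np

NUM_LANDMARKS = 21  # landmarks per hand
NUM_COORDS = 3      # x, y, z per landmark


def row_to_points(row):
    """View a flat row [x0, y0, z0, x1, y1, z1, ...] as a (21, 3) array."""
    points = np.asarray(row, dtype=float)
    if points.size != NUM_LANDMARKS * NUM_COORDS:
        raise ValueError(f"expected 63 values, got {points.size}")
    return points.reshape(NUM_LANDMARKS, NUM_COORDS)


def folder_to_letter(folder_name):
    """Map a folder name such as "0" or "25" to its letter ("A" to "Z")."""
    index = int(folder_name)
    if not 0 <= index <= 25:
        raise ValueError(f"folder index out of range: {index}")
    return chr(ord("A") + index)


# Placeholder row (not real landmark data): the values 0..62 in order.
example_row = list(range(63))
points = row_to_points(example_row)
print(points.shape)           # one row per landmark, one column per axis
print(folder_to_letter("1"))  # letter for folder 1
```

Each row of `points` holds the x, y, and z value of one landmark, which is often a more convenient shape for plotting or normalising a hand than the flat 63-value vector.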

## Import Libraries

In [1]:
import os
import pickle
import numpy as np
import mediapipe as mp
import cv2
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image, display


## Extract landmarks from images and save them in pickle file

In [None]:
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

hands = mp_hands.Hands(static_image_mode=True) #input is static img

DATA_DIR = './data_collection'
landmarks = []
labels = []

for dir_name in os.listdir(DATA_DIR):
    if dir_name != ".DS_Store": # exclude metadata file
        dir_path = os.path.join(DATA_DIR, dir_name)
        if os.path.isdir(dir_path): # check if it is a directory
            for img_name in os.listdir(dir_path):
                img_path = os.path.join(dir_path, img_name)
                if os.path.isfile(img_path): # check if it is a file
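                    img_bgr = cv2.imread(img_path) # OpenCV loads images in BGR order
                    if img_bgr is None: # skip files that cannot be read as images
                        continue
                    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB) # MediaPipe expects RGB input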
                    
                    try:
                        # extract hand landmark; returns object that contains a list of 21 3D hand landmarks 
                        results = hands.process(img_rgb)
                        coordinates = []
                        if results.multi_hand_landmarks:
                            for hand_landmarks in results.multi_hand_landmarks:
                                for i in range(len(hand_landmarks.landmark)):
                                    x = hand_landmarks.landmark[i].x
                                    y = hand_landmarks.landmark[i].y
                                    z = hand_landmarks.landmark[i].z
                                    coordinates.append(x)
                                    coordinates.append(y)
                                    coordinates.append(z)
                            if len(coordinates) == 63: # keep only images with exactly one detected hand (21 landmarks x 3 coordinates)
                                landmarks.append(coordinates) # x, y, z values of the 21 landmarks
                                labels.append(dir_name) # label between 0 and 25, corresponding to the letters A to Z
                    
                    except AttributeError:
                        print("Attribute Error - Skip Image")

# write the file once, after all folders have been processed
with open('data_unprocessed.pickle', 'wb') as f:
    pickle.dump({'landmarks': landmarks, 'labels': labels}, f) # save data

## Transform pickle file into structured csv file

In [None]:
with open('data_unprocessed.pickle', 'rb') as f: # close the file after loading
    data_dict = pickle.load(f)
data = np.asarray(data_dict['landmarks']) # all landmarks
labels = np.asarray(data_dict['labels']) # label 

# Transform the data into shape (number of pictures, 64): 63 landmark values per picture and 1 label
data_transformed = pd.DataFrame(data) # one row per picture, columns 0 to 62
data_transformed["label"] = labels
print(data_transformed.shape)
data_transformed.to_csv('data.csv', index=False)
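
A quick sanity check after writing the CSV could look like the sketch below. It uses synthetic values and an in-memory buffer instead of the project files, so the array contents and the `check_dataset` helper are assumptions for illustration only.

```python
import io

import numpy as np
import pandas as pd


def check_dataset(df):
    """Validate the table layout and return the number of rows per label."""
    if df.shape[1] != 64:
        raise ValueError(f"expected 64 columns, got {df.shape[1]}")
    if "label" not in df.columns:
        raise ValueError("missing 'label' column")
    if df.isna().any().any():
        raise ValueError("table contains missing values")
    return df["label"].value_counts().sort_index()


# Synthetic stand-in for the real landmarks: 4 pictures, 63 values each.
rng = np.random.default_rng(0)
fake_landmarks = rng.random((4, 63))
fake_labels = np.array(["0", "0", "1", "2"])

df = pd.DataFrame(fake_landmarks)
df["label"] = fake_labels

# Round-trip through CSV in memory, mirroring to_csv(..., index=False).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

counts = check_dataset(restored)
print(restored.shape)
print(counts)
```

Note that the labels are stored as strings (folder names) before writing, but pandas reads them back as integers, so downstream code should not rely on the label being a string.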