### Image Preprocessing

A deep learning model expects homogeneous inputs in order to work as intended: every image must have the same shape and the same value range before it can be batched and fed to the network.

Keeping this in mind, we standardize the preprocessing phase for all models using this script:

- Import all necessary modules:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing import image
```


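Preprocessing images:

Here's what should happen within this method:

- Load each image from its path, resized to the target size.
- Convert the image to an array.
- Normalize the pixel values to the range [0, 1].
- Collect the arrays in a list and stack them into a single batch.

- Note: The size should be fixed to (224, 224).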
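The fixed size matters because the per-image arrays are combined with `np.stack`, which only accepts arrays of identical shape. A minimal NumPy illustration, using zero-filled dummy arrays as stand-ins for decoded RGB images (the 224 x 244 array is a deliberately mismatched, hypothetical example):

```python
import numpy as np

# Two dummy RGB "images" with the same fixed size stack into one batch.
same = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(2)]
batch = np.stack(same, axis=0)
print(batch.shape)  # (2, 224, 224, 3)

# A single mismatched image (224 x 244) makes the batch non-homogeneous.
mixed = [np.zeros((224, 224, 3), dtype=np.float32),
         np.zeros((224, 244, 3), dtype=np.float32)]
try:
    np.stack(mixed, axis=0)
    stacked_ok = True
except ValueError:
    # np.stack raises ValueError when input shapes differ.
    stacked_ok = False
print(stacked_ok)  # False
```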
    

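```python
def preprocess_images_batch(img_paths, target_size=(224, 224)):
    """Load, resize and normalize images; return a batch of shape (N, H, W, 3)."""
    processed_images = []
    for img_path in img_paths:
        # Load the image as RGB and resize it to target_size (height, width).
        img = image.load_img(img_path, target_size=target_size)
        # Convert to a float32 array of shape (height, width, channels).
        img_array = image.img_to_array(img)
        # Scale pixel values from [0, 255] to [0, 1].
        img_array /= 255.0
        processed_images.append(img_array)
    # Stack the per-image arrays into one batch along a new first axis.
    img_batch = np.stack(processed_images, axis=0)
    return img_batch
```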
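The normalization step relies on the array already being floating point: `img_to_array` returns float32 by default, so the in-place `/= 255.0` works. A small NumPy sketch with hypothetical 8-bit pixel values shows the scaling, and shows that the same in-place division on a raw `uint8` array is rejected by NumPy's casting rules:

```python
import numpy as np

# Hypothetical 8-bit pixel values, as they would appear in a decoded JPEG.
pixels_uint8 = np.array([0, 105, 255], dtype=np.uint8)

# Convert to float32 first, then scale into [0, 1].
scaled = pixels_uint8.astype(np.float32) / 255.0
print(scaled)

# In-place division on the uint8 array cannot store float results.
raw = pixels_uint8.copy()
try:
    raw /= 255.0
    inplace_ok = True
except TypeError:
    # NumPy raises a UFuncTypeError (a TypeError subclass) here.
    inplace_ok = False
print(inplace_ok)  # False
```

Note that 105 / 255 is about 0.41176, the same value that appears in the first pixels of the `X_train` output further down.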

Get the labels:

The labels for both the train and test sets are stored in CSV files. Each CSV file contains the following columns:

- `filename`: the image file name (`*.jpg`).
- `label`: the action class shown in the image.

```python
df = pd.read_csv('datasets/Training_set.csv')
```
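Before building paths from the DataFrame, it can help to fail early if the expected columns are missing and to glance at the class balance. A sketch on an in-memory CSV; the file names and labels below are hypothetical stand-ins for the contents of `Training_set.csv`:

```python
import io
import pandas as pd

# Hypothetical rows in the same layout as Training_set.csv (filename, label).
csv_text = """filename,label
Image_1.jpg,sitting
Image_2.jpg,running
Image_3.jpg,sitting
"""
demo_df = pd.read_csv(io.StringIO(csv_text))

# Stop early if the two expected columns are not present.
required = {"filename", "label"}
missing = required - set(demo_df.columns)
if missing:
    raise ValueError(f"CSV is missing columns: {sorted(missing)}")

print(demo_df.shape)                              # (3, 2)
print(demo_df["label"].value_counts().to_dict())  # {'sitting': 2, 'running': 1}
```

On the real data the same checks would run against `df`.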

Hold out 20% of the labelled data for testing:

```python
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```
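`train_test_split` with `test_size=0.2` shuffles the rows (seeded by `random_state`) and sets 20% of them aside. The NumPy-only sketch below illustrates the same idea with a hypothetical `holdout_split` helper; it is not scikit-learn's implementation and will not pick the same rows. If class balance matters, `train_test_split` also accepts a `stratify` argument (for example `stratify=df['label']`) to keep class proportions similar in both splits.

```python
import numpy as np

def holdout_split(n_rows, test_size=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled holdout."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_rows)              # shuffled row indices
    n_test = int(np.ceil(n_rows * test_size))   # round the test share up
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = holdout_split(10)
print(len(train_idx), len(test_idx))  # 8 2
```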


Build the image paths and label arrays:

```python
train_img_paths = 'datasets/train/' + train_df['filename'].values
train_labels = train_df.drop('filename', axis=1).values  # Assuming all other columns are labels

test_img_paths = 'datasets/train/' + test_df['filename'].values  # Note: Still pulling from the 'train' directory
test_labels = test_df.drop('filename', axis=1).values
```
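The label arrays built this way are 2-D, of shape `(n, 1)`, and hold class names as strings. Many training setups expect integer class ids instead. One possible sketch, on hypothetical labels, builds the class vocabulary from the training split and reuses the same mapping for the held-out split (it assumes every held-out class also occurs in training):

```python
import numpy as np

# Hypothetical labels shaped (n, 1), like df.drop('filename', axis=1).values.
demo_train_labels = np.array([["sitting"], ["running"], ["sitting"], ["dancing"]], dtype=object)
demo_test_labels = np.array([["running"], ["dancing"]], dtype=object)

# Flatten to 1-D; np.unique returns the sorted class names and each row's id.
class_names, y_train_ids = np.unique(demo_train_labels.ravel(), return_inverse=True)

# Encode the held-out split with the same mapping so ids stay consistent.
class_to_id = {name: i for i, name in enumerate(class_names)}
y_test_ids = np.array([class_to_id[name] for name in demo_test_labels.ravel()])

print(list(class_names))   # ['dancing', 'running', 'sitting']
print(y_train_ids.tolist())  # [2, 1, 2, 0]
print(y_test_ids.tolist())   # [1, 0]
```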



Preprocess these images:

```python
X_train = preprocess_images_batch(train_img_paths)
X_test = preprocess_images_batch(test_img_paths)

y_train = np.array(train_labels)
y_test = np.array(test_labels)

X_train[:5]
```

```text
array([[[[0.4117647 , 0.4117647 , 0.40392157],
         [0.6745098 , 0.6745098 , 0.6666667 ],
         [0.70980394, 0.70980394, 0.7019608 ],
         ...,
         [0.77254903, 0.8627451 , 0.8862745 ],
         [0.78431374, 0.8784314 , 0.89411765],
         [0.78431374, 0.8784314 , 0.89411765]],

        [[0.36862746, 0.36862746, 0.36078432],
         [0.75686276, 0.75686276, 0.7490196 ],
         [0.6392157 , 0.6392157 , 0.6313726 ],
         ...,
         [0.7764706 , 0.85882354, 0.8784314 ],
         [0.76862746, 0.8509804 , 0.8627451 ],
         [0.76862746, 0.8509804 , 0.8627451 ]],

        [[0.32156864, 0.32156864, 0.3137255 ],
         [0.8235294 , 0.8235294 , 0.8156863 ],
         [0.5019608 , 0.5019608 , 0.49411765],
         ...,
         [0.627451  , 0.6901961 , 0.6901961 ],
         [0.61960787, 0.68235296, 0.68235296],
         [0.61960787, 0.68235296, 0.68235296]],

        ...,

        [[1.        , 1.        , 1.        ],
         [1.        , 1.        , 1.        ]
```
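Rather than eyeballing printed values, a few assertions can confirm that a batch has the expected shape, dtype and value range. Because the whole set is held in memory as one dense array, a quick size estimate is also worthwhile: each 224 x 224 x 3 float32 image takes 224 * 224 * 3 * 4 = 602,112 bytes. A sketch with two hypothetical helpers, `check_batch` and `batch_nbytes`, run here on a random stand-in batch (the 10,000-image count is also hypothetical):

```python
import numpy as np

def check_batch(batch, target_size=(224, 224)):
    """Hypothetical helper: assert shape, dtype and [0, 1] range of a batch."""
    assert batch.ndim == 4, "expected (N, height, width, channels)"
    assert batch.shape[1:3] == tuple(target_size), "unexpected image size"
    assert batch.dtype == np.float32, "expected float32 pixels"
    assert batch.min() >= 0.0 and batch.max() <= 1.0, "pixels not in [0, 1]"

def batch_nbytes(n_images, height=224, width=224, channels=3, dtype=np.float32):
    """Hypothetical helper: bytes needed to hold n_images in one dense array."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize

# Random values in [0, 1) stand in for the real X_train here.
demo = np.random.default_rng(0).random((5, 224, 224, 3), dtype=np.float32)
check_batch(demo)

print(batch_nbytes(1))       # 602112 bytes, about 0.57 MiB per image
print(batch_nbytes(10_000))  # 6021120000 bytes, about 5.6 GiB
```

On the real data the check would be `check_batch(X_train)` and `check_batch(X_test)`.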

Save the preprocessed images and labels (the held-out split is saved under the validation name):

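```python
np.save('X_train.npy', X_train)
np.save('X_val.npy', X_test)    # held-out split saved as the validation set
# The label arrays hold strings (object dtype), so they are pickled on save
# and must be loaded with np.load(..., allow_pickle=True).
np.save('y_train.npy', y_train)
np.save('y_val.npy', y_test)
```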
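Loading the files back shows the practical consequence of saving string labels: the float32 image arrays load normally, while the object-dtype label arrays are refused unless `allow_pickle=True` is passed (only do this for files you trust). A round-trip sketch in a temporary directory, with small random images and hypothetical labels standing in for the real arrays:

```python
import os
import tempfile
import numpy as np

# Stand-ins for X_train / y_train: random images and string labels.
X_demo = np.random.default_rng(0).random((2, 224, 224, 3), dtype=np.float32)
y_demo = np.array([["sitting"], ["running"]], dtype=object)

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "X_train.npy")
    y_path = os.path.join(tmp, "y_train.npy")
    np.save(x_path, X_demo)
    np.save(y_path, y_demo)  # object array: stored via pickle

    X_loaded = np.load(x_path)  # plain float32 data loads by default
    try:
        np.load(y_path)         # default allow_pickle=False
        needs_pickle = False
    except ValueError:
        needs_pickle = True
    y_loaded = np.load(y_path, allow_pickle=True)

print(needs_pickle)  # True
```

Encoding the labels as integer ids before saving avoids the pickle requirement altogether.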
