<a href="https://colab.research.google.com/github/RudyMartin/dsai-2024/blob/main/MVPS/Camp-Rock-Paper-Scissors/team_GG_AI/predict_image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Decision: Resize Image or Alter CNN Input Layer Shape

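Background: The head-gesture images for this project are \(32 \times 128 \times 3\) (height × width × color channels), but the MobileNetV2-based gesture model expects \(128 \times 128 \times 3\) inputs. Before running predictions, we must either resize the images to fit the model or change the model's input layer to fit the images.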

When deciding whether to change the input layer or adjust the image dimensions, you should consider the following factors:

### 1. **Recommended for Gaming: Use the Pre-trained Model as Is (Adjust the Image Dimensions)**
   - **Pre-trained Models**: MobileNetV2 is pre-trained on the ImageNet dataset. Its reference resolution is \(224 \times 224 \times 3\), but Keras also provides ImageNet weights for smaller square inputs (96, 128, 160, and 192 pixels), and your code uses \(128 \times 128 \times 3\). The model therefore expects square \(128 \times 128\) inputs.
   - **Benefits of Adjusting the Image**:
     - **Preserving Pre-trained Weights**: By resizing (or padding) your \(32 \times 128 \times 3\) images to \(128 \times 128 \times 3\), you feed the model exactly the input shape it was built for, so the pre-trained weights are used as intended and no layers need to change. Fitting non-square images to a square input is common practice with ImageNet-pre-trained models.
     - **Consistency with Pre-training**: The pre-trained filters respond to patterns at the scales seen during training. Keeping the input at the size the model was trained on keeps your images closer to those conditions, so the pre-trained features transfer more directly.

   - **How to Adjust the Image**:
     ```python
     image = cv2.resize(image, (128, 128))  # Resize to the required input shape
     ```

   - **Potential Downsides**:
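     - **Distortion**: Stretching a \(32 \times 128\) image to \(128 \times 128\) scales its height by a factor of 4 but leaves its width unchanged, so shapes in the image are distorted. This can hurt accuracy if aspect ratio matters for your task. You can avoid it by padding to a square instead of stretching, or reduce its impact by applying exactly the same resize to training and inference images so the model learns from the same distortion.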

     **See the code below (`pad_to_square`) for a fix: scale the image so its longer side fits, then pad the shorter side to make it square, preserving the aspect ratio.**
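
To make the trade-off concrete, the sketch below compares the per-axis scale factors of a plain stretch with the resize-then-pad plan used by `pad_to_square` further down. It assumes a \(32 \times 128\) source and a 128-pixel square target; `stretch_scales` and `letterbox_plan` are hypothetical helper names, and the padding arithmetic mirrors `pad_to_square` without calling OpenCV.

```python
def stretch_scales(height, width, size=128):
    """Per-axis scale factors when an image is stretched straight to size x size."""
    return size / height, size / width  # (vertical, horizontal)


def letterbox_plan(height, width, size=128):
    """Resized (height, width) and (top, bottom, left, right) padding for an aspect-preserving fit."""
    ratio = size / max(height, width)          # one scale factor shared by both axes
    new_h, new_w = int(height * ratio), int(width * ratio)
    pad_h, pad_w = size - new_h, size - new_w  # leftover space to fill with padding
    padding = (pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2)
    return (new_h, new_w), padding


print(stretch_scales(32, 128))   # (4.0, 1.0)
print(letterbox_plan(32, 128))   # ((32, 128), (48, 48, 0, 0))
```

A stretch scales the two axes by different factors, while the letterbox plan uses one factor for both axes and fills the leftover rows with padding.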

### 2. **Changing the Input Layer (Keeping the Image Dimensions as Is)**
   - **Custom Input Shape**: Alternatively, you could modify the input shape of the model to accept \(32 \times 128 \times 3\) images. This would involve adjusting the `input_shape` parameter when loading the MobileNetV2 base model.
   - **Implications of Changing Input Shape**:
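     - **Re-training or Fine-tuning**: MobileNetV2's convolutional weights do not depend on the input size, so Keras can still load ImageNet weights for a \(32 \times 128 \times 3\) input (it warns and falls back to the \(224 \times 224\) weights). However, your saved model was built for \(128 \times 128 \times 3\), so it has to be rebuilt around the new input shape and then fine-tuned on your dataset, because features learned on square images at a different scale may not transfer as well.
     - **Loss of Pre-trained Model Benefits**: MobileNetV2 downsamples its input by a factor of 32, so a \(32 \times 128\) input leaves a \(1 \times 4\) final feature map (versus \(4 \times 4\) for \(128 \times 128\)). With that little spatial detail left, and with objects at a different scale than in pre-training, the pre-trained features are likely to be less effective.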

   - **How to Change the Input Layer**: (Not recommended)
     ```python
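     from tensorflow.keras.applications import MobileNetV2

     # Non-square input: Keras warns and loads the 224x224 ImageNet weights
     base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(32, 128, 3))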
     ```

   - **Potential Benefits**:
     - **No Image Distortion**: You avoid distorting the image by keeping its original aspect ratio.
     - **Direct Processing**: The model processes the images in their natural form, which might be beneficial if the aspect ratio is critical for your task.

   - **Potential Downsides**:
     - **Need for Fine-tuning**: You might need to fine-tune the entire model on your dataset to adjust for the new input shape.
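
MobileNetV2 halves the spatial resolution five times (an overall output stride of 32), so the input shape decides how large the final feature map is. The sketch below works through that arithmetic, assuming the standard MobileNetV2 layout in which each stride-2 stage maps a side of length \(n\) to \(\lceil n/2 \rceil\); the function name is hypothetical and nothing here calls Keras.

```python
import math


def mobilenetv2_feature_map_size(height, width, num_stride2_stages=5):
    """Spatial size (rows, cols) of MobileNetV2's final feature map.

    Each stride-2 stage maps a side of length n to ceil(n / 2).
    """
    rows, cols = height, width
    for _ in range(num_stride2_stages):
        rows, cols = math.ceil(rows / 2), math.ceil(cols / 2)
    return rows, cols


for shape in [(224, 224), (128, 128), (32, 128)]:
    print(shape, "->", mobilenetv2_feature_map_size(*shape))
# (224, 224) -> (7, 7)
# (128, 128) -> (4, 4)
# (32, 128) -> (1, 4)
```

With a \(32 \times 128\) input only a single row of features reaches the classification head, which leaves little vertical detail, something that may matter for vertical gestures such as "Up" and "Down".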

### Recommendation:

**Adjusting the Image** to fit the input shape expected by the pre-trained model is generally the better approach, especially if:
- You want to leverage the pre-trained weights effectively.
- You have limited data for fine-tuning and don't want to lose the benefits of pre-training.

However, **Changing the Input Layer** might be more appropriate if:
- The aspect ratio of your images is critical, and you want to preserve it without distortion.
- You are prepared to fine-tune or even re-train the model to account for the new input shape.

If you choose to resize your images, consider preprocessing techniques such as padding or cropping to limit distortion. If you decide to change the input layer, be prepared for a more extensive fine-tuning process.
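
The recommendation also lines up with how Keras picks ImageNet weights for MobileNetV2: weights exist only for square inputs of 96, 128, 160, 192, or 224 pixels, and for any other `input_shape` Keras issues a warning and loads the \(224 \times 224\) weights. The sketch below restates that rule in plain Python for a channels-last shape; `imagenet_weight_resolution` is a hypothetical helper that only mirrors the check and does not call Keras.

```python
# Square input sizes with Keras MobileNetV2 ImageNet weights
PRETRAINED_SIZES = (96, 128, 160, 192, 224)


def imagenet_weight_resolution(input_shape):
    """Resolution of the ImageNet weights Keras would load for MobileNetV2.

    input_shape is channels-last: (rows, cols, channels).
    """
    rows, cols = input_shape[0], input_shape[1]
    if rows == cols and rows in PRETRAINED_SIZES:
        return rows
    return 224  # non-square or unsupported size: falls back to the 224 x 224 weights


print(imagenet_weight_resolution((128, 128, 3)))  # 128
print(imagenet_weight_resolution((32, 128, 3)))   # 224
```

In other words, staying at \(128 \times 128\) uses weights trained at that resolution, while a \(32 \times 128\) input runs weights trained on \(224 \times 224\) images at one seventh of that height.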

### Example 1. Predict Gesture from Path

```python
# Libraries and imports needed
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Define class labels (indices must match the class order used in training)
class_labels = {
    0: "Up",
    1: "Down",
    2: "Left",
    3: "Right",
    4: "Straight"  # Add more labels as needed
}

# Load your pre-trained model
model_dir = ''
model_name = 'mobilenetv2_head_gesture_model.keras'
model = load_model(f'{model_dir}{model_name}')

# Function to pad image to make it square and not distort on resize
def pad_to_square(image, desired_size=128):
    old_size = image.shape[:2]  # (height, width)
    ratio = float(desired_size) / max(old_size)

    # Compute new size to maintain aspect ratio (never smaller than 1 pixel)
    new_size = tuple(max(1, int(x * ratio)) for x in old_size)

    # Resize image; cv2.resize takes dsize as (width, height)
    image_resized = cv2.resize(image, (new_size[1], new_size[0]))

    # Compute padding needed on each side to reach desired_size
    delta_w = desired_size - new_size[1]
    delta_h = desired_size - new_size[0]
    top, bottom = delta_h // 2, delta_h - (delta_h // 2)
    left, right = delta_w // 2, delta_w - (delta_w // 2)

    # Add padding
    color = [0, 0, 0]  # Black padding
    new_image = cv2.copyMakeBorder(image_resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

    return new_image

# Function to preprocess and predict gesture from path
def predict_gesture_pth(image_path):
    # Load image (cv2.imread returns None instead of raising if the file can't be read)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f'Could not read image: {image_path}')

    # OpenCV loads images in BGR order. If the model was trained on RGB images,
    # convert first: image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Pad image to make it square
    padded_image = pad_to_square(image, desired_size=128)

    # Normalize to [0, 1] (this must match the scaling used during training)
    padded_image = padded_image.astype('float32') / 255.0

    # Add batch dimension: (128, 128, 3) -> (1, 128, 128, 3)
    padded_image = np.expand_dims(padded_image, axis=0)

    # Predict using the model
    predictions = model.predict(padded_image, verbose=0)
    predicted_class = int(np.argmax(predictions, axis=1)[0])

    return predicted_class

# Example usage
image_path = 'example_image.png'  # Replace with the actual path to your image
predicted_class = predict_gesture_pth(image_path)

# Print the predicted class index and its label
print(f'Predicted class: {predicted_class} ({class_labels.get(predicted_class, "Unknown")})')
```
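
`predict_gesture_pth` returns only the arg-max class index. If you also want to see how confident the model is, a small helper can pair the top class with its score. This is a sketch that assumes the model outputs one score per class for each image (for example, a softmax over the five gestures); `decode_prediction` is a hypothetical name, and the scores in the example are made up rather than produced by the model.

```python
import numpy as np

class_labels = {0: "Up", 1: "Down", 2: "Left", 3: "Right", 4: "Straight"}


def decode_prediction(predictions, labels):
    """Return (label, score) for the top class of a (1, num_classes) prediction array."""
    scores = np.asarray(predictions)[0]      # drop the batch dimension
    top = int(np.argmax(scores))             # index of the highest score
    label = labels.get(top, f"class {top}")  # fall back if the label map is incomplete
    return label, float(scores[top])


# Example with made-up scores (not real model output)
example = np.array([[0.05, 0.10, 0.70, 0.10, 0.05]])
print(decode_prediction(example, class_labels))  # ('Left', 0.7)
```

In the notebook you would pass the array returned by `model.predict(padded_image)` instead of `example`.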

### Example 2. Predict Gesture from Image

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Load your pre-trained model
model_dir = ''
model_name = 'mobilenetv2_head_gesture_model.keras'
model = load_model(f'{model_dir}{model_name}')

# Define class labels
class_labels = {
    0: "Up",
    1: "Down",
    2: "Left",
    3: "Right",
    4: "Straight"  # Add more labels as needed
}

# Function to pad image to make it square
def pad_to_square(image, desired_size=128):
    old_size = image.shape[:2]  # (height, width)
    ratio = float(desired_size) / max(old_size)

    # Compute new size to maintain aspect ratio (never smaller than 1 pixel)
    new_size = tuple(max(1, int(x * ratio)) for x in old_size)

    # Resize image; cv2.resize takes dsize as (width, height)
    image_resized = cv2.resize(image, (new_size[1], new_size[0]))

    # Compute padding needed on each side to reach desired_size
    delta_w = desired_size - new_size[1]
    delta_h = desired_size - new_size[0]
    top, bottom = delta_h // 2, delta_h - (delta_h // 2)
    left, right = delta_w // 2, delta_w - (delta_w // 2)

    # Add padding
    color = [0, 0, 0]  # Black padding
    new_image = cv2.copyMakeBorder(image_resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)

    return new_image

# Function to preprocess and predict gesture from image
def predict_gesture_img(image):
    # Pad image to make it square
    padded_image = pad_to_square(image, desired_size=128)

    # Normalize to [0, 1] (this must match the scaling used during training)
    padded_image = padded_image.astype('float32') / 255.0

    # Add batch dimension
    padded_image = np.expand_dims(padded_image, axis=0)

    # Predict using the model (verbose=0 avoids printing a progress bar for every frame)
    predictions = model.predict(padded_image, verbose=0)
    predicted_class = int(np.argmax(predictions, axis=1)[0])

    return predicted_class

# Initialize camera. Run this cell locally: a Colab runtime has no webcam
# attached, and Colab disables cv2.imshow.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError('Could not open camera 0')

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        break  # No frame available (camera disconnected or stream ended)

    # Flip the frame horizontally (mirror view). Mirroring swaps left and right,
    # so this should match how the training images were captured.
    frame = cv2.flip(frame, 1)

    # Predict gesture
    predicted_class = predict_gesture_img(frame)
    gesture_label = class_labels.get(predicted_class, "Unknown")

    # Display the resulting frame with prediction
    cv2.putText(frame, f"Gesture: {gesture_label}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Head Gesture Recognition', frame)

    # Exit on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
```
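
`cv2.flip(frame, 1)` mirrors each frame around its vertical axis, which is equivalent to reversing the column order. The NumPy sketch below uses a synthetic \(4 \times 6\) frame (not camera data) to show that content on the left edge lands on the right edge after the flip; in the same way, a head turned to one side appears turned to the other, so the "Left" and "Right" classes swap unless the training images were mirrored too.

```python
import numpy as np

# Synthetic 4 x 6 BGR frame with a bright column on the left edge
frame = np.zeros((4, 6, 3), dtype=np.uint8)
frame[:, 0] = 255

# Horizontal mirror: reverse the column (width) axis, like cv2.flip(frame, 1)
mirrored = frame[:, ::-1]

print(np.argmax(frame[0, :, 0]))     # 0: bright column is on the left
print(np.argmax(mirrored[0, :, 0]))  # 5: after mirroring it is on the right
```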
