# Emotion Classification: Data Preparation

In this notebook, you'll:

- Upload up to 10 face images per emotion class.
- Organize them into folders by emotion.
- Split the dataset into training and testing sets.

**Emotions to capture:** `['happy', 'sad', 'angry', 'surprised', 'neutral']`

## 1. Setup and Imports
Install dependencies and import required modules:

In [None]:
#sample
#!pip install scikit-learn pandas
#import os
#from google.colab import files
#from sklearn.model_selection import train_test_split
#import pandas as pd

# Emotion classes
#emotions = ['happy', 'sad', 'angry', 'surprised', 'neutral']
#data_dir = 'data'

# Create data directories
#os.makedirs(data_dir, exist_ok=True)
#for emo in emotions:
    #os.makedirs(os.path.join(data_dir, emo), exist_ok=True)
#print("Setup complete. Data directories ready.")

In [1]:
# Install and import all of the dependencies needed for this notebook
!pip install scikit-learn pandas
import os
import cv2
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import pandas as pd




In [2]:
# Connect Google Drive to the notebook so the image folders with the training data are accessible
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## 2. Upload Images per Emotion
For each emotion, upload up to 10 images. After selecting files, the dialog will close.

In [9]:
# I uploaded the images folder to my Google Drive; this sets the path to the dataset images
data_dir = '/content/drive/MyDrive/Colab Notebooks/images'

# Emotion folders (this run uses three of the five classes listed at the top)
emotions = ['happy', 'sad', 'neutral']


In [10]:
for emo in emotions:
    emo_path = os.path.join(data_dir, emo)
    if os.path.exists(emo_path):
        print(f"{emo}: {len(os.listdir(emo_path))} images found.")
    else:
        print(f"{emo}: ❌ folder not found!")


happy: 10 images found.
sad: 10 images found.
neutral: 10 images found.
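
Note that `os.listdir` returns every entry in a folder, so the counts above would also include any non-image file that happens to sit there. A minimal sketch of a stricter count follows, assuming the same three extensions that the filter in section 3 uses. The names `count_images` and `IMAGE_EXTS` are hypothetical helpers, not part of the original notebook, and the demonstration runs on a throwaway folder rather than the Drive data.

```python
import os
import tempfile

# Assumed set of image extensions (matches the filter used in section 3)
IMAGE_EXTS = ('.png', '.jpg', '.jpeg')

def count_images(folder, exts=IMAGE_EXTS):
    """Count files in `folder` whose names end with one of `exts` (case-insensitive)."""
    return sum(
        1 for fname in os.listdir(folder)
        if fname.lower().endswith(exts)
    )

# Demonstration on a temporary folder with two images and one text file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.jpg', 'b.PNG', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    demo_count = count_images(tmp)

print(demo_count)
```

In the notebook, the same helper could be called as `count_images(os.path.join(data_dir, emo))` inside the loop over `emotions`.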


## 3. Prepare Train/Test Split
Gather all image paths and labels, then split:

In [11]:
# Gather all image file paths and associate them with their emotion labels.
image_paths = []
labels = []

for emo in emotions:
    folder = os.path.join(data_dir, emo)
    for fname in os.listdir(folder):
        if fname.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_paths.append(os.path.join(folder, fname))
            labels.append(emo)

print(f"Total images collected: {len(image_paths)}")


Total images collected: 30


In [15]:
# Split the data 80% train / 20% test, stratified by emotion label
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42)

print(f"Training samples: {len(train_paths)}")
print(f"Testing samples: {len(test_paths)}")


Training samples: 24
Testing samples: 6
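
The sample totals alone do not show whether each emotion is represented in both splits. The sketch below is one way to check that, and to confirm that no image path landed in both sets. `summarize_split` is a hypothetical helper, and the `demo_*` lists are stand-in data for illustration only, not the real paths from Drive.

```python
from collections import Counter

def summarize_split(train_paths, train_labels, test_paths, test_labels):
    """Return per-class counts for each split and any paths shared by both."""
    return {
        'train_counts': dict(Counter(train_labels)),
        'test_counts': dict(Counter(test_labels)),
        'overlap': sorted(set(train_paths) & set(test_paths)),
    }

# Stand-in data for illustration only; in the notebook, pass the real
# train_paths, train_labels, test_paths, and test_labels instead.
demo_train_labels = ['happy', 'happy', 'sad', 'sad', 'neutral', 'neutral']
demo_test_labels = ['happy', 'sad', 'neutral']
demo_train_paths = [f'{emo}/{i}.jpg' for i, emo in enumerate(demo_train_labels)]
demo_test_paths = [f'{emo}/t{i}.jpg' for i, emo in enumerate(demo_test_labels)]

summary = summarize_split(demo_train_paths, demo_train_labels,
                          demo_test_paths, demo_test_labels)
print(summary['train_counts'])
print(summary['test_counts'])
print(summary['overlap'])
```

Because the split was made with `stratify=labels`, each class should appear in roughly the same proportion in both splits, and the overlap list should be empty.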


## 4. Save Split Lists
Export file lists and labels to CSV for future use:

In [14]:
# Create DataFrames
train_df = pd.DataFrame({'image_path': train_paths, 'label': train_labels})
test_df = pd.DataFrame({'image_path': test_paths, 'label': test_labels})

# Save to CSV
train_df.to_csv('train_split.csv', index=False)
test_df.to_csv('test_split.csv', index=False)

print("CSV files saved successfully.")


CSV files saved successfully.
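
The relative file names above are written to the notebook's current working directory. Before relying on the files later, it can be worth reading one back and comparing it with the DataFrame it came from. The following is a rough sketch of such a round-trip check: `roundtrip_ok` is a hypothetical helper, and the demo frame uses made-up paths and a temporary folder instead of the real split.

```python
import os
import tempfile

import pandas as pd

def roundtrip_ok(df, csv_path):
    """Write `df` to `csv_path` without the index, reload it, and compare contents."""
    df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)
    same_columns = list(reloaded.columns) == list(df.columns)
    same_values = reloaded.values.tolist() == df.values.tolist()
    return same_columns and same_values

# Stand-in frame with made-up paths; in the notebook, pass train_df or test_df.
demo_df = pd.DataFrame({
    'image_path': ['images/happy/01.jpg', 'images/sad/01.jpg'],
    'label': ['happy', 'sad'],
})

with tempfile.TemporaryDirectory() as tmp:
    demo_ok = roundtrip_ok(demo_df, os.path.join(tmp, 'demo_split.csv'))

print(demo_ok)
```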


Answer the reflection questions below in Markdown.

1. **Hidden Bias:**  
   Identify one scenario where your current images might lead the model to learn a spurious signal (e.g. background, lighting). How would you test for and eliminate it?

2. **Edge Cases:**  
   Describe a face or expression that your dataset likely fails to capture. What impact could that have on real-world performance, and how would you address it?

3. **Generalization Strategy:**  
   With only 10 images per class, what’s one concrete augmentation or data-collection strategy you’d use to improve robustness—and why that choice?






