# FER-2013 Dataset Cleaning & Summary

This notebook performs the following tasks:

1. **Build and read the metadata table** (CSV, or an Excel copy of it) listing each image's file name, emotion label, and usage (train/test).
2. **Check for missing values** to confirm that every row is complete.
3. **Inspect unique values** in `Emotion` and `Usage` to detect typos or invalid entries.
4. **Clean the data** by stripping extra whitespace and removing any duplicate rows.
5. **Summarize the dataset** with image counts per emotion and per usage.
6. **Export the cleaned CSV**, ready for analysis or dashboard creation in Excel/Power BI.
## Author: Lilian Alhalabi


```python
# Import libraries
import os  # For handling folders and file paths
import pandas as pd  # For working with tables (DataFrame)

# Set the base folder path of the dataset (contains "train" and "test" sub-folders)
base_path = "FER-2013"

# Create an empty list to store image metadata
data = []

# Loop through the main dataset folders: "train" and "test"
for usage in ["train", "test"]:
    # Build the full path to the current usage folder
    usage_path = os.path.join(base_path, usage)

    # Loop through each emotion folder inside the current usage folder
    for emotion in os.listdir(usage_path):
        # Build the full path to the emotion folder
        emotion_path = os.path.join(usage_path, emotion)

        # Check that the path is a folder (skip any loose files)
        if os.path.isdir(emotion_path):

            # Loop through all image files inside the emotion folder
            for img in os.listdir(emotion_path):
                # Add a dictionary with image info to the data list
                data.append({
                    "Image_Name": img,      # Name of the image file
                    "Emotion": emotion,     # Emotion label from folder name
                    "Usage": usage          # Train or Test
                })

# Convert the list of dictionaries into a DataFrame (table)
df = pd.DataFrame(data)

# Save the DataFrame as a CSV file (without the index column)
df.to_csv("fer2013_images_metadata.csv", index=False)

# Print a success message when done
print("CSV file created successfully!")
```


```
CSV file created successfully!
```
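The loop above turns every entry inside an emotion folder into a row, so a stray non-image file (for example `Thumbs.db` on Windows) would end up in the table, and `os.listdir` does not guarantee any particular order. A minimal sketch of a more defensive scan, swapping in `pathlib`: it keeps only files whose extension is in an assumed `IMAGE_EXTENSIONS` set and sorts each listing. `build_metadata` is a hypothetical helper, and the demo runs on a throwaway folder tree rather than the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed set of image extensions; the files in the preview below are .jpg
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_metadata(base_path):
    """Return one row per image file found under <base_path>/<usage>/<emotion>/."""
    rows = []
    for usage in ["train", "test"]:
        usage_dir = Path(base_path) / usage
        # Only sub-folders are emotion labels; sorting fixes the row order
        for emotion_dir in sorted(p for p in usage_dir.iterdir() if p.is_dir()):
            for img in sorted(emotion_dir.iterdir()):
                # Skip nested folders and non-image files such as Thumbs.db
                if img.is_file() and img.suffix.lower() in IMAGE_EXTENSIONS:
                    rows.append({"Image_Name": img.name,
                                 "Emotion": emotion_dir.name,
                                 "Usage": usage})
    # columns= keeps the three headers even if no images are found
    return pd.DataFrame(rows, columns=["Image_Name", "Emotion", "Usage"])


# Demo on a throwaway folder tree (not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    for usage, emotion, name in [("train", "angry", "a.jpg"),
                                 ("train", "happy", "b.jpg"),
                                 ("test", "angry", "c.jpg")]:
        folder = Path(tmp, usage, emotion)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    Path(tmp, "train", "angry", "Thumbs.db").touch()  # a non-image file to skip
    demo = build_metadata(tmp)

print(demo)
```

In the notebook, `df = build_metadata("FER-2013")` would stand in for the nested loop; the rest of the cell (`to_csv`) stays the same.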


```python
# openpyxl is the engine pandas uses to read .xlsx files
%pip install openpyxl
```

```
Note: you may need to restart the kernel to use updated packages.
```





### Preprocessing step

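```python
import pandas as pd

# Load the metadata table. The first cell writes fer2013_images_metadata.csv, so this
# .xlsx copy must be created separately (e.g. by saving the CSV from Excel);
# pd.read_csv("fer2013_images_metadata.csv") loads the original CSV directly.
df = pd.read_excel("fer2013_images_metadata.xlsx")

# Preview the first five rows
print(df.head())

# Count missing values per column
print(df.isnull().sum())

# List the distinct labels to spot typos or invalid entries
print(df['Emotion'].unique())
print(df['Usage'].unique())
```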


```
              Image_Name Emotion  Usage
0  Training_10118481.jpg   angry  train
1  Training_10120469.jpg   angry  train
2  Training_10131352.jpg   angry  train
3  Training_10161559.jpg   angry  train
4   Training_1021836.jpg   angry  train
Image_Name    0
Emotion       0
Usage         0
dtype: int64
['angry' 'disgust' 'fear' 'happy' 'neutral' 'sad' 'surprise']
['train' 'test']
```
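The checks above rely on reading the printed output by eye. A minimal sketch that turns them into a programmatic check, assuming the seven labels printed above are the complete valid set; `find_problems`, `EXPECTED_EMOTIONS`, `EXPECTED_USAGE`, and the toy rows are illustrative, not part of the original notebook. It also counts fully repeated rows, which the preprocessing cell does not check.

```python
import pandas as pd

# Assumed valid labels, taken from the unique values printed above
EXPECTED_EMOTIONS = {"angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"}
EXPECTED_USAGE = {"train", "test"}


def find_problems(df):
    """Return a dict describing unexpected labels and repeated rows (empty if clean)."""
    problems = {}
    bad_emotions = set(df["Emotion"].unique()) - EXPECTED_EMOTIONS
    bad_usage = set(df["Usage"].unique()) - EXPECTED_USAGE
    if bad_emotions:
        problems["unexpected_emotions"] = sorted(bad_emotions)
    if bad_usage:
        problems["unexpected_usage"] = sorted(bad_usage)
    # duplicated() flags every repeat after the first occurrence of a row
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        problems["duplicate_rows"] = n_dupes
    return problems


# Toy rows with one repeated row and one label carrying a capital letter and a trailing space
toy = pd.DataFrame({
    "Image_Name": ["a.jpg", "a.jpg", "b.jpg"],
    "Emotion": ["happy", "happy", "Happy "],
    "Usage": ["train", "train", "test"],
})
problems = find_problems(toy)
print(problems)  # {'unexpected_emotions': ['Happy '], 'duplicate_rows': 1}
```

On the real table, `assert find_problems(df) == {}` would stop the notebook early if a later re-export introduced a bad label or a duplicate.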


### No missing values in any column
### No invalid labels: 7 emotion classes, 2 usage splits (`train`, `test`)
### Ready for cleaning, summarizing, and export
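Steps 4 to 6 of the task list (cleaning, summarizing, exporting) have no code in the cells above. One way to sketch them with pandas, assuming "cleaning" means stripping surrounding whitespace from the three text columns and dropping exact duplicate rows. `clean_metadata`, `summarize`, the toy rows, and the output name `fer2013_images_metadata_clean.csv` are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def clean_metadata(df):
    """Strip surrounding whitespace from every text column and drop repeated rows."""
    cleaned = df.copy()
    for col in ["Image_Name", "Emotion", "Usage"]:
        cleaned[col] = cleaned[col].str.strip()
    # Keep the first copy of each fully repeated row, then renumber 0..n-1
    return cleaned.drop_duplicates().reset_index(drop=True)


def summarize(df):
    """Return image counts per emotion, per usage, and per emotion x usage."""
    per_emotion = df["Emotion"].value_counts().sort_index()
    per_usage = df["Usage"].value_counts().sort_index()
    # margins=True appends a "Total" row and column to the cross-table
    per_pair = pd.crosstab(df["Emotion"], df["Usage"],
                           margins=True, margins_name="Total")
    return per_emotion, per_usage, per_pair


# Toy rows for illustration only; rows 0 and 1 become identical after stripping
raw = pd.DataFrame({
    "Image_Name": ["a.jpg", "a.jpg ", "b.jpg", "c.jpg", "d.jpg"],
    "Emotion": ["angry", " angry", "happy", "happy", "sad"],
    "Usage": ["train", "train", "train", "test", "test"],
})

clean = clean_metadata(raw)
per_emotion, per_usage, per_pair = summarize(clean)
print(per_pair)

# A temporary folder keeps the demo from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "fer2013_images_metadata_clean.csv"
    clean.to_csv(out_path, index=False)  # no index column, as in the first cell
    roundtrip = pd.read_csv(out_path)
```

In the notebook, the same steps would be `clean = clean_metadata(df)` followed by `clean.to_csv("fer2013_images_metadata_clean.csv", index=False)`, which writes a plain CSV that Excel and Power BI can open directly.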