<a href="https://colab.research.google.com/github/chasslayy/Melanin-Match-AI/blob/main/Melanin_Match_AI_Colab_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Melanin Match AI
**Machine Learning Final Project — Mercy University (CISC 550)**  
**Student:** Chastity Lewis  
**Semester:** Fall 2025  

### Objective
Build a supervised learning model that predicts foundation shades for diverse skin tones using image data and classification models (SVM, kNN, and a CNN-based deep learning model).


## 1. Setup & Imports

In [None]:
# Core Libraries
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Models
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Deep Learning
import tensorflow as tf
from tensorflow.keras import layers, models

# Misc
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully.")

Libraries imported successfully.


## 2. Data Loading

In this section, you will connect to your dataset.  
You can either:

- Mount Google Drive and point to a folder of images with one subfolder per skin tone label (for example `light`, `tan`, `medium`, `deep`, `dark`), or  
- Clone your GitHub repo that contains the dataset.

Update the `data_path` variable below to match your dataset location.


In [None]:
# Mount Google Drive (skip this step if you cloned the dataset from GitHub instead)

from google.colab import drive
drive.mount('/content/drive')

# TODO: Update this path to your actual dataset
# Example structure:
# data_path/light, data_path/tan, data_path/medium, data_path/deep, data_path/dark

data_path = "/content/drive/MyDrive/MelaninMatchAI/data/images"  # CHANGE THIS TO YOUR FOLDER

categories = ['light', 'tan', 'medium', 'deep', 'dark']

print("Data path set to:", data_path)
print("Categories:", categories)

Mounted at /content/drive
Data path set to: /content/drive/MyDrive/MelaninMatchAI/data/images
Categories: ['light', 'tan', 'medium', 'deep', 'dark']


### 2.1 Preview a Sample Image

In [None]:
# This will try to preview one sample image from each category (if available)
for label in categories:
    folder = os.path.join(data_path, label)
    if not os.path.isdir(folder):
        print(f"Folder not found for category '{label}':", folder)
        continue

    files = os.listdir(folder)
    if len(files) == 0:
        print(f"No images found in folder for '{label}'")
        continue

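    img_path = os.path.join(folder, files[0])
    img = cv2.imread(img_path)
    if img is None:  # cv2.imread returns None for unreadable or non-image files
        print(f"Could not read image for '{label}':", img_path)
        continue
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)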

    plt.imshow(img_rgb)
    plt.title(f"Sample Image - {label}")
    plt.axis('off')
    plt.show()

Folder not found for category 'light': /content/drive/MyDrive/MelaninMatchAI/data/images/light
Folder not found for category 'tan': /content/drive/MyDrive/MelaninMatchAI/data/images/tan
Folder not found for category 'medium': /content/drive/MyDrive/MelaninMatchAI/data/images/medium
Folder not found for category 'deep': /content/drive/MyDrive/MelaninMatchAI/data/images/deep
Folder not found for category 'dark': /content/drive/MyDrive/MelaninMatchAI/data/images/dark
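
Every category reports a missing folder above, so the preprocessing step would load no images. The following is a minimal sketch for checking the layout first, assuming the one-subfolder-per-category structure described in Section 2. The helper name `count_images` and the list of file extensions are illustrative assumptions, not part of the original notebook.

```python
import os

# Assumed set of image formats; adjust to match the actual dataset.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def count_images(root, labels):
    """Hypothetical helper: return {label: number of image files}.

    Missing category folders are reported with a count of 0.
    """
    counts = {}
    for label in labels:
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            counts[label] = 0
            continue
        counts[label] = sum(
            1 for name in os.listdir(folder)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling `count_images(data_path, categories)` returns one count per category. A zero means that category would contribute no samples, which usually points to a wrong `data_path` or a folder name that does not match `categories`.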


## 3. Preprocessing

In [None]:
IMG_SIZE = (128, 128)
X, y = [], []

for label in categories:
    folder = os.path.join(data_path, label)
    if not os.path.isdir(folder):
        print(f"[WARNING] Skipping missing folder for category '{label}':", folder)
        continue

    for img_name in os.listdir(folder):
        img_path = os.path.join(folder, img_name)
        try:
            img = cv2.imread(img_path)
            if img is None:
                print(f"[WARNING] Could not read image: {img_path}")
                continue
            img = cv2.resize(img, IMG_SIZE)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            X.append(img)
            y.append(label)
        except Exception as e:
            print(f"[ERROR] Failed on {img_path}: {e}")

X = np.array(X, dtype=np.float32) / 255.0  # scale to [0, 1]; float32 halves memory vs. float64
y = np.array(y)

print("Dataset size:", X.shape, y.shape)

# Encode labels
le = LabelEncoder()
y_encoded = le.fit_transform(y)
print("Encoded classes:", le.classes_)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
)

print("Train set:", X_train.shape, "Test set:", X_test.shape)

Dataset size: (0,) (0,)
Encoded classes: []


ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
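
The `ValueError` above follows from the empty dataset: no category folder was found, so `X` holds zero samples and the split cannot produce a training set. Below is a minimal sketch of a check that could run before the split and fail with a clearer message. The function name `check_dataset` and the `min_per_class` parameter are hypothetical. The default of 2 is an assumption based on the stratified split needing at least two samples of every class.

```python
from collections import Counter

def check_dataset(labels, expected_classes, min_per_class=2):
    """Raise a descriptive error if the labels cannot support a stratified split.

    `min_per_class=2` is an assumed threshold, not a value from the notebook.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("No images were loaded; check data_path and folder names.")
    too_small = [c for c in expected_classes if counts.get(c, 0) < min_per_class]
    if too_small:
        raise ValueError(f"Too few images for categories: {too_small}")
    return dict(counts)
```

A call such as `check_dataset(y, categories)` placed right after the loading loop would stop the cell early and name the categories that need more images.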

## 4. Baseline Model — SVM

In [None]:
# Flatten images for classical machine learning models
X_train_flat = X_train.reshape(len(X_train), -1)
X_test_flat = X_test.reshape(len(X_test), -1)

svm_model = SVC(kernel='linear', C=1)
svm_model.fit(X_train_flat, y_train)

y_pred_svm = svm_model.predict(X_test_flat)

print("=== SVM Classification Report ===")
print(classification_report(y_test, y_pred_svm, target_names=le.classes_))

print("=== SVM Confusion Matrix ===")
print(confusion_matrix(y_test, y_pred_svm))

## 5. Baseline Model — kNN

In [None]:
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train_flat, y_train)

y_pred_knn = knn_model.predict(X_test_flat)

print("=== kNN Classification Report ===")
print(classification_report(y_test, y_pred_knn, target_names=le.classes_))

print("=== kNN Confusion Matrix ===")
print(confusion_matrix(y_test, y_pred_knn))

## 6. Deep Learning Model — CNN

In [None]:
cnn_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(len(le.classes_), activation='softmax')  # one output per class actually present in the data
])

cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

cnn_model.summary()

### 6.1 Training the CNN

In [None]:
EPOCHS = 10  # Increase if validation accuracy is still rising; too many epochs can overfit

history = cnn_model.fit(
    X_train, y_train,
    epochs=EPOCHS,
    validation_data=(X_test, y_test)
)

### 6.2 Training & Validation Curves

In [None]:
plt.figure()
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

plt.figure()
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

### 6.3 CNN Evaluation

In [None]:
y_pred_cnn_probs = cnn_model.predict(X_test)
y_pred_cnn = np.argmax(y_pred_cnn_probs, axis=1)

print("=== CNN Classification Report ===")
print(classification_report(y_test, y_pred_cnn, target_names=le.classes_))

print("=== CNN Confusion Matrix ===")
print(confusion_matrix(y_test, y_pred_cnn))

## 7. Save Model

In [None]:
model_path = "melanin_match_ai_cnn.h5"
cnn_model.save(model_path)
print(f"Model saved to {model_path}")

NameError: name 'cnn_model' is not defined

## 8. Results & Discussion

- Compare the performance of **SVM**, **kNN**, and the **CNN model** using accuracy and the classification reports.
- Discuss which model performs best overall and which classes (skin tone categories) are hardest to classify.
- Reflect on any class imbalance or misclassification patterns you observe in the confusion matrices.
- You can copy key numbers and insights from this notebook into your final project report.
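
One way to identify the hardest classes is per-class recall, read directly from a confusion matrix. The sketch below assumes the scikit-learn convention used above (rows are true labels, columns are predictions). The function name `per_class_recall` and the 3-class matrix are hypothetical and serve only as an illustration; they are not results from this project.

```python
import numpy as np

def per_class_recall(cm):
    """Recall per class from a confusion matrix (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    # Classes with no test samples get a recall of 0 instead of a division error.
    return np.divide(np.diag(cm), row_totals,
                     out=np.zeros_like(row_totals), where=row_totals > 0)

# Hypothetical matrix for illustration only.
example_cm = [[8, 2, 0],
              [1, 6, 3],
              [0, 4, 6]]
recalls = per_class_recall(example_cm)
hardest = int(np.argmin(recalls))  # index of the class with the lowest recall
print(recalls, hardest)
```

With the real models, the same function could be applied to `confusion_matrix(y_test, y_pred_svm)` and its kNN and CNN counterparts, and `le.classes_[hardest]` would give the category name.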


## 9. Conclusion & Future Work

Summarize:

- The goal of Melanin Match AI and what you achieved.
- Which model is currently your best-performing model.
- How this project supports fairness and inclusivity in shade matching.

Future work ideas:

- Use **transfer learning** with models like EfficientNet or ResNet for better performance.
- Add **data augmentation** (brightness, contrast, rotations) to make the model more robust to lighting changes.
- Collect more images for underrepresented skin tone categories to reduce bias.
- Deploy the model through a simple web app so users can upload a photo and receive shade recommendations.
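
The data augmentation idea above can be sketched in plain NumPy, assuming images are RGB arrays scaled to [0, 1] as in Section 3. The function name `augment` and the parameters `max_brightness` and `max_contrast` are assumptions chosen for illustration. Rotation is restricted to multiples of 90 degrees as a simplification, and the ranges are kept small because a large brightness shift could change the apparent skin tone and so conflict with the label.

```python
import numpy as np

def augment(image, rng, max_brightness=0.1, max_contrast=0.2):
    """Return a randomly adjusted copy of an RGB image scaled to [0, 1].

    The default ranges are assumed values, not tuned for this dataset.
    """
    out = image.astype(np.float32)
    # Contrast: scale pixel values around the image mean.
    factor = 1.0 + rng.uniform(-max_contrast, max_contrast)
    out = (out - out.mean()) * factor + out.mean()
    # Brightness: add a constant offset to every pixel.
    out = out + rng.uniform(-max_brightness, max_brightness)
    # Rotation: 0, 90, 180, or 270 degrees, which needs no interpolation.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
sample = rng.random((128, 128, 3))  # stand-in for one preprocessed image
augmented = augment(sample, rng)
```

Applying `augment` only to training images, never to the test set, keeps the evaluation comparable across models.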
