### Note
To run the cells in this notebook, you need three versions of the banana and mango ripeness image dataset: the original images, a version with the background removed, and a version with the background removed and the image cropped. If you only have the original dataset, run `banana_mango_ripe_image_processing.py` first to generate the two processed versions.
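
Before running the cells, it can help to confirm that all three image folders are present. The following is a minimal sketch and not part of the original notebook: the helper name `find_missing_folders` is hypothetical, and the paths are the ones used in the dataset loading cell.

```python
import os

# Paths as used in the dataset loading cell of this notebook.
image_folders = {
    "original": "../../../../datasets/banana_mango_ripe/train/images",
    "bg_removed": "../../../../datasets/banana_mango_ripe/images_background_removed",
    "bg_removed_cropped": "../../../../datasets/banana_mango_ripe/images_background_removed_cropped",
}

def find_missing_folders(folders):
    """Return the keys whose path is not an existing directory (hypothetical helper)."""
    return [key for key, path in folders.items() if not os.path.isdir(path)]

missing = find_missing_folders(image_folders)
if missing:
    print(f"Missing dataset folders: {missing}")
else:
    print("All dataset folders found.")
```

If `bg_removed` or `bg_removed_cropped` is reported as missing, the processing script mentioned above needs to be run first.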

In [6]:
import os
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC


# Dataset Loading

In [5]:
image_folders = {
    "original": "../../../../datasets/banana_mango_ripe/train/images",
    "bg_removed": "../../../../datasets/banana_mango_ripe/images_background_removed",
    "bg_removed_cropped": "../../../../datasets/banana_mango_ripe/images_background_removed_cropped"
}

dataset_variations = {
    "original": {"banana_ripe": [], "banana_raw": [], "mango_ripe": [], "mango_raw": []},
    "bg_removed": {"banana_ripe": [], "banana_raw": [], "mango_ripe": [], "mango_raw": []},
    "bg_removed_cropped": {"banana_ripe": [], "banana_raw": [], "mango_ripe": [], "mango_raw": []},
}

IMG_SIZE = (64, 64)  # (width, height) passed to cv2.resize

def load_images(folder_key):
    folder_path = image_folders[folder_key]

    for file in os.listdir(folder_path):
        img_path = os.path.join(folder_path, file)
        img = cv2.imread(img_path)
        if img is None:
            # cv2.imread returns None for unreadable or non-image files; skip them
            continue

        # Resize to a fixed size and flatten into a 1-D feature vector
        img = cv2.resize(img, IMG_SIZE)
        img = img.flatten()

        # The class label is encoded in the file name
        if "Ripe_Banana" in file:
            dataset_variations[folder_key]["banana_ripe"].append(img)
        elif "Raw_Banana" in file:
            dataset_variations[folder_key]["banana_raw"].append(img)
        elif "Ripe_Mango" in file:
            dataset_variations[folder_key]["mango_ripe"].append(img)
        elif "Raw_Mango" in file:
            dataset_variations[folder_key]["mango_raw"].append(img)


for key in dataset_variations.keys():
    load_images(key)

for key, categories in dataset_variations.items():
    print(f"\nDataset: {key}")
    for category, images in categories.items():
        print(f"{category}: {len(images)} images")


Dataset: original
banana_ripe: 1000 images
banana_raw: 999 images
mango_ripe: 1000 images
mango_raw: 1000 images

Dataset: bg_removed
banana_ripe: 1000 images
banana_raw: 999 images
mango_ripe: 1000 images
mango_raw: 1000 images

Dataset: bg_removed_cropped
banana_ripe: 1000 images
banana_raw: 999 images
mango_ripe: 1000 images
mango_raw: 1000 images
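
Each loaded image becomes one row of the feature matrix. As a small illustration, the sketch below uses a blank array as a stand-in for a real image (an assumption made only for this example) to show that resizing to 64×64 and flattening a 3-channel image gives 64 × 64 × 3 = 12,288 features per sample.

```python
import numpy as np

IMG_SIZE = (64, 64)

# Stand-in for a resized image: cv2.imread loads 3-channel BGR images by default,
# so a 64x64 image has shape (64, 64, 3).
fake_image = np.zeros((IMG_SIZE[1], IMG_SIZE[0], 3), dtype=np.uint8)

features = fake_image.flatten()
print(features.shape)  # (12288,)

# Stacking N flattened images gives an (N, 12288) feature matrix.
X = np.vstack([features, features])
print(X.shape)  # (2, 12288)
```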


# Model Training Functions

In [9]:
def create_svc_grid_search():
    param_grid = {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", "auto"]
    }
    return GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)

def create_random_forest_grid_search():
    param_grid = {
        "n_estimators": [100, 200, 300],
        "max_depth": [5, 10, 20],
        "random_state": [42]
    }
    return GridSearchCV(RandomForestClassifier(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)

def train_model_treated(X, y, model_name, model_type):
    label_encoder = LabelEncoder()
    y_encoded = label_encoder.fit_transform(y)

    X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

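    if model_type == "SVC":
        grid_search = create_svc_grid_search()
    elif model_type == "RandomForest":
        grid_search = create_random_forest_grid_search()
    else:
        # Fail early with a clear message instead of calling .fit on None
        raise ValueError(f"Unknown model_type: {model_type!r}")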
    
    grid_search.fit(X_train, y_train)

    best_model = grid_search.best_estimator_
    y_pred = best_model.predict(X_test)

    report = classification_report(y_test, y_pred, target_names=label_encoder.classes_)
    print(f"{model_name} Best Params: {grid_search.best_params_}")
    print(f"{model_name} Classification Report:\n{report}")

    return best_model

def train_model(dataset, model_type):
    banana_ripe, banana_raw = np.array(dataset["banana_ripe"]), np.array(dataset["banana_raw"])
    mango_ripe, mango_raw = np.array(dataset["mango_ripe"]), np.array(dataset["mango_raw"])

    X_banana = np.vstack((banana_ripe, banana_raw))
    y_banana = ["ripe"] * len(banana_ripe) + ["raw"] * len(banana_raw)
    train_model_treated(X_banana, y_banana, "Banana", model_type)

    X_mango = np.vstack((mango_ripe, mango_raw))
    y_mango = ["ripe"] * len(mango_ripe) + ["raw"] * len(mango_raw)
    train_model_treated(X_mango, y_mango, "Mango", model_type)
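
To get a feel for the cost of each `train_model` call, the sketch below counts the cross-validation fits each grid search performs and the size of the held-out test set. It uses only the parameter grids and image counts shown above. The helper names are hypothetical, and the test-set size assumes scikit-learn's behaviour of rounding `test_size * n_samples` up.

```python
import math
from itertools import product

svc_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
rf_grid = {"n_estimators": [100, 200, 300], "max_depth": [5, 10, 20], "random_state": [42]}

def count_cv_fits(param_grid, cv=5):
    """Number of cross-validation fits: one per parameter combination per fold."""
    n_candidates = len(list(product(*param_grid.values())))
    return n_candidates * cv

def split_sizes(n_samples, test_size=0.2):
    """Train/test sizes for a float test_size (the test size is rounded up)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(count_cv_fits(svc_grid))   # 12 candidates x 5 folds = 60
print(count_cv_fits(rf_grid))    # 9 candidates x 5 folds = 45
print(split_sizes(1000 + 999))   # banana: (1599, 400)
print(split_sizes(1000 + 1000))  # mango: (1600, 400)
```

The test-set size of 400 matches the `support` totals in the classification reports. With the default `refit=True`, each grid search also refits the best candidate once on the full training split, and that refitted model is the one evaluated on the test set.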

# SVC

### SVC with noisy background

In [10]:
train_model(dataset_variations["original"], "SVC")

Banana Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.95      0.97      0.96       193
        ripe       0.97      0.95      0.96       207

    accuracy                           0.96       400
   macro avg       0.96      0.96      0.96       400
weighted avg       0.96      0.96      0.96       400

Mango Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.97      0.98      0.97       201
        ripe       0.98      0.96      0.97       199

    accuracy                           0.97       400
   macro avg       0.97      0.97      0.97       400
weighted avg       0.97      0.97      0.97       400



### SVC with background removed

In [11]:
train_model(dataset_variations["bg_removed"], "SVC")

Banana Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.89      0.84      0.87       193
        ripe       0.86      0.90      0.88       207

    accuracy                           0.88       400
   macro avg       0.88      0.87      0.87       400
weighted avg       0.88      0.88      0.87       400

Mango Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.93      0.91      0.92       201
        ripe       0.91      0.93      0.92       199

    accuracy                           0.92       400
   macro avg       0.92      0.92      0.92       400
weighted avg       0.92      0.92      0.92       400



### SVC with background removed and cropped

In [12]:
train_model(dataset_variations["bg_removed_cropped"], "SVC")

Banana Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.92      0.92      0.92       193
        ripe       0.92      0.93      0.93       207

    accuracy                           0.92       400
   macro avg       0.92      0.92      0.92       400
weighted avg       0.92      0.92      0.92       400

Mango Best Params: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.97      0.97      0.97       201
        ripe       0.97      0.97      0.97       199

    accuracy                           0.97       400
   macro avg       0.97      0.97      0.97       400
weighted avg       0.97      0.97      0.97       400



# Random Forest

### Random Forest with noisy background

In [13]:
train_model(dataset_variations["original"], "RandomForest")

Banana Best Params: {'max_depth': 10, 'n_estimators': 300, 'random_state': 42}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.94      0.95      0.95       193
        ripe       0.96      0.94      0.95       207

    accuracy                           0.95       400
   macro avg       0.95      0.95      0.95       400
weighted avg       0.95      0.95      0.95       400

Mango Best Params: {'max_depth': 20, 'n_estimators': 300, 'random_state': 42}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.95      0.98      0.96       201
        ripe       0.97      0.94      0.96       199

    accuracy                           0.96       400
   macro avg       0.96      0.96      0.96       400
weighted avg       0.96      0.96      0.96       400



### Random Forest with background removed

In [14]:
train_model(dataset_variations["bg_removed"], "RandomForest")

Banana Best Params: {'max_depth': 20, 'n_estimators': 200, 'random_state': 42}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.87      0.83      0.85       193
        ripe       0.85      0.89      0.87       207

    accuracy                           0.86       400
   macro avg       0.86      0.86      0.86       400
weighted avg       0.86      0.86      0.86       400

Mango Best Params: {'max_depth': 20, 'n_estimators': 200, 'random_state': 42}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.92      0.94      0.93       201
        ripe       0.93      0.92      0.93       199

    accuracy                           0.93       400
   macro avg       0.93      0.93      0.93       400
weighted avg       0.93      0.93      0.93       400



### Random Forest with background removed and cropped

In [15]:
train_model(dataset_variations["bg_removed_cropped"], "RandomForest")

Banana Best Params: {'max_depth': 10, 'n_estimators': 300, 'random_state': 42}
Banana Classification Report:
              precision    recall  f1-score   support

         raw       0.86      0.90      0.88       193
        ripe       0.90      0.87      0.88       207

    accuracy                           0.88       400
   macro avg       0.88      0.88      0.88       400
weighted avg       0.88      0.88      0.88       400

Mango Best Params: {'max_depth': 20, 'n_estimators': 300, 'random_state': 42}
Mango Classification Report:
              precision    recall  f1-score   support

         raw       0.96      0.98      0.97       201
        ripe       0.98      0.96      0.97       199

    accuracy                           0.97       400
   macro avg       0.97      0.97      0.97       400
weighted avg       0.97      0.97      0.97       400



# Summary
Classical machine learning models, rather than deep learning methods, were used to check whether the dataset supports good classification even with less complex models. Two classifiers were trained on flattened 64×64 images: Support Vector Classification (SVC) and Random Forest.

The results show that even these simpler models perform well on the dataset. On the original images, SVC performed best, with 96% accuracy for bananas and 97% for mangoes; Random Forest was close behind at 95% and 96%. Precision, recall, and F1 scores were similarly high for both classes. All hyperparameters were selected with `GridSearchCV` using 5-fold cross-validation.
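
For easier comparison, the accuracies from the classification reports above can be gathered in one place. The snippet below only re-enters the reported numbers; the dictionary layout and variable names are assumptions made for illustration.

```python
# Test-set accuracy copied from the classification reports in this notebook.
accuracy = {
    "SVC": {
        "original": {"banana": 0.96, "mango": 0.97},
        "bg_removed": {"banana": 0.88, "mango": 0.92},
        "bg_removed_cropped": {"banana": 0.92, "mango": 0.97},
    },
    "RandomForest": {
        "original": {"banana": 0.95, "mango": 0.96},
        "bg_removed": {"banana": 0.86, "mango": 0.93},
        "bg_removed_cropped": {"banana": 0.88, "mango": 0.97},
    },
}

# Print one row per model and dataset variation.
print(f"{'model':<14}{'dataset':<20}{'banana':>8}{'mango':>8}")
for model, variations in accuracy.items():
    for variation, scores in variations.items():
        print(f"{model:<14}{variation:<20}{scores['banana']:>8.2f}{scores['mango']:>8.2f}")

# Best dataset variation per model and fruit (ties go to the first variation listed).
best = {
    (model, fruit): max(variations, key=lambda v: variations[v][fruit])
    for model, variations in accuracy.items()
    for fruit in ("banana", "mango")
}
```

These figures come from a single train/test split with `random_state=42`, so differences of one or two percentage points should be read with caution.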

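In an attempt to improve model performance, images were preprocessed with background removal and cropping to focus primarily on the fruit. This generally produced acceptable images, although some still contained noise such as hands, laptops, and tables. Interestingly, the processed images did not improve results. Background removal alone lowered accuracy for both models (for SVC, from 96% to 88% on bananas and from 97% to 92% on mangoes). With cropping added, mango accuracy reached 97% for both models, matching or slightly exceeding the original images, but banana accuracy stayed below the original (92% for SVC, 88% for Random Forest). This suggests that background information may contribute to classification accuracy, at least for bananas.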
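
The actual preprocessing lives in `banana_mango_ripe_image_processing.py` and is not shown in this notebook. Purely as an illustration of what cropping to the fruit could look like, the sketch below assumes a background-removed image whose background pixels are all zero, and crops to the bounding box of the non-zero pixels. The function name and the zero-background convention are assumptions, not the project's implementation.

```python
import numpy as np

def crop_to_foreground(img):
    """Crop an H x W x C image to the bounding box of its non-zero pixels."""
    mask = img.any(axis=2)                   # True where any channel is non-zero
    rows = np.flatnonzero(mask.any(axis=1))  # row indices containing foreground
    cols = np.flatnonzero(mask.any(axis=0))  # column indices containing foreground
    if rows.size == 0:                       # fully empty image: nothing to crop to
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Tiny synthetic example: a 3x4 "fruit" on a 10x10 black background.
image = np.zeros((10, 10, 3), dtype=np.uint8)
image[2:5, 3:7] = 200
print(crop_to_foreground(image).shape)  # (3, 4, 3)
```

A bounding-box crop like this keeps any leftover foreground objects (for example a hand that survived background removal) inside the frame, which is one way noise of the kind described above could persist.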