In [19]:
pip install imbalanced-learn

Note: you may need to restart the kernel to use updated packages.


The imbalanced-learn library provides a variety of tools for handling class imbalance, which is common in classification tasks. A model trained on an imbalanced dataset may perform poorly, particularly on the minority class.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler

Load the required libraries.

In [29]:
df = pd.read_csv(r"C:\Users\hegde\OneDrive\Desktop\Dataset\grayscale_images.csv")

In [31]:
X = df.drop(columns=['label']).values  
scaler = StandardScaler()
X = scaler.fit_transform(X)

After this cell runs, X is a standardized NumPy array containing the feature data from the original DataFrame df. The target column ('label') is excluded from this feature matrix.
Standardizing the features is a common preprocessing step that can improve the performance of algorithms that are sensitive to the scale of their inputs, such as logistic regression, k-nearest neighbors, support vector machines, and the neural network (MLP) used here. Note that the scaler is fitted on the full dataset before the train/test split, so the test rows also influence the scaling statistics.
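As a rough illustration of what standardization does, the sketch below computes the mean and standard deviation by hand with NumPy and applies them to a held-out part of the data. The array `X_demo` and the 7/3 split are hypothetical stand-ins, not the notebook's data, and fitting on the training rows only is an assumption that differs from the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for pixel features: 10 samples, 4 features in [0, 255]
X_demo = rng.uniform(0, 255, size=(10, 4))
X_train_demo, X_test_demo = X_demo[:7], X_demo[7:]

# Statistics are computed from the training rows only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population std (ddof=0)
std[std == 0] = 1.0              # guard against constant columns

# The same shift and scale are applied to both splits
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

After scaling, each training column has mean close to 0 and standard deviation close to 1; the test columns are only approximately centered because they reuse the training statistics.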

In [4]:
labels = ['apple frits', 'banana', 'bicycle', 'car', 'cat', 
          'elephant', 'mango frits', 'person', 'pizza', 'tiger', 
          'train', 'truck']

The different image class labels.

In [35]:
results = {}

In [45]:
for label in labels:
    # One-vs-rest target: 1 for the current label, 0 for every other class
    y = np.where(df['label'] == label, 1, 0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    # Oversample the minority class in the training split only
    smote = SMOTE(random_state=42)
    X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
    # Single hidden layer with 5 neurons
    model = MLPClassifier(hidden_layer_sizes=(5,), activation='relu', max_iter=1000, random_state=42)
    model.fit(X_train_res, y_train_res)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)
    accuracy = accuracy_score(y_test, y_pred)
    # Keep the positive-class metrics; fall back to 0 if class 1 is missing from the report
    results[label] = {
        'precision': report['1']['precision'] if '1' in report else 0,
        'recall': report['1']['recall'] if '1' in report else 0,
        'f1-score': report['1']['f1-score'] if '1' in report else 0,
        'accuracy': accuracy
    }

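This code trains one MLP classifier per label, treating the multi-class problem in a one-vs-rest fashion: for each label, the target is 1 for images of that class and 0 for all others. Because each positive class is much smaller than the rest, SMOTE is applied to the training split to handle the class imbalance. The positive-class precision, recall, and F1-score, together with the overall accuracy, are stored in a dictionary for later reporting. Oversampling exposes the classifier to more minority-class examples during training, which is intended to improve performance on imbalanced data.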
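To make the binary target construction concrete, here is a minimal sketch on a toy label column. The array `label_column` is a hypothetical stand-in for `df['label']`; only the `np.where` call mirrors the notebook.

```python
import numpy as np

# Hypothetical label column standing in for df['label']
label_column = np.array(['cat', 'car', 'cat', 'tiger', 'car', 'pizza', 'cat', 'train'])

counts = {}
for target in ['cat', 'car']:
    # 1 where the row belongs to the target class, 0 otherwise
    y_demo = np.where(label_column == target, 1, 0)
    n_pos = int(y_demo.sum())
    n_neg = int(len(y_demo) - n_pos)
    counts[target] = (n_pos, n_neg)
    print(f"{target}: positives={n_pos}, negatives={n_neg}")
```

With many classes, the positive class is always a small fraction of the rows, which is the imbalance that the oversampling step is meant to address.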

Synthetic Minority Over-sampling Technique
SMOTE is a resampling technique that generates synthetic examples of the minority class in an imbalanced dataset. Instead of simply duplicating existing instances of the minority class (which can lead to overfitting), SMOTE creates new instances by interpolating between existing ones.

1. Identify minority class instances: SMOTE first identifies the instances of the minority class in the dataset.
2. Select neighbors: For each minority class instance, SMOTE selects a certain number of its nearest neighbors (commonly using the Euclidean distance metric).
3. Generate synthetic instances: It randomly selects one of those neighbors and creates a new instance as a linear combination of the features of the instance and the selected neighbor, so the new example lies on the line segment connecting the two.

This process is repeated until the desired balance between classes is achieved.
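The interpolation step described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the imbalanced-learn implementation: the function name `smote_like_sample`, its parameters, and the toy points are all hypothetical.

```python
import numpy as np

def smote_like_sample(X_min, k=2, n_new=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))           # pick a minority sample
        nn = rng.choice(neighbors[i])          # pick one of its k nearest neighbors
        lam = rng.random()                     # interpolation factor in [0, 1)
        # New point on the segment between the sample and its neighbor
        synthetic[j] = X_min[i] + lam * (X_min[nn] - X_min[i])
    return synthetic

# Toy minority class: the four corners of the unit square
X_min_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like_sample(X_min_demo)
print(new_points)
```

Every synthetic point is a convex combination of two existing minority samples, so it stays inside the region those samples already cover rather than duplicating any one of them.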

Many machine learning algorithms perform poorly when trained on imbalanced datasets because they tend to be biased towards the majority class. SMOTE helps to alleviate this issue by increasing the number of instances of the minority class.
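A small worked example shows why this bias matters when reading metrics. The numbers are hypothetical (10 positives out of 100 samples), and the "model" simply predicts the majority class every time.

```python
import numpy as np

# Hypothetical test split: 10 positives out of 100 samples
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.zeros_like(y_true)    # always predict the majority class

accuracy = (y_true == y_pred).mean()
true_pos = int(((y_true == 1) & (y_pred == 1)).sum())
recall = true_pos / int((y_true == 1).sum())

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.90, recall=0.00
```

A classifier that never detects the minority class still reaches 90% accuracy here, so precision, recall, and F1-score for the positive class are more informative than accuracy alone.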

In [46]:
for label, metrics in results.items():
    print(f"Results for '{label}':")
    print(f"  Precision: {metrics['precision']:.2f}")
    print(f"  Recall: {metrics['recall']:.2f}")
    print(f"  F1 Score: {metrics['f1-score']:.2f}")
    print(f"  Accuracy: {metrics['accuracy']:.2f}\n")

Results for 'apple frits':
  Precision: 0.15
  Recall: 0.29
  F1 Score: 0.20
  Accuracy: 0.78

Results for 'banana':
  Precision: 0.17
  Recall: 0.20
  F1 Score: 0.18
  Accuracy: 0.88

Results for 'bicycle':
  Precision: 0.00
  Recall: 0.00
  F1 Score: 0.00
  Accuracy: 0.82

Results for 'car':
  Precision: 0.22
  Recall: 0.29
  F1 Score: 0.25
  Accuracy: 0.84

Results for 'cat':
  Precision: 0.21
  Recall: 0.50
  F1 Score: 0.30
  Accuracy: 0.81

Results for 'elephant':
  Precision: 0.67
  Recall: 0.44
  F1 Score: 0.53
  Accuracy: 0.90

Results for 'mango frits':
  Precision: 0.08
  Recall: 0.20
  F1 Score: 0.11
  Accuracy: 0.78

Results for 'person':
  Precision: 0.16
  Recall: 0.60
  F1 Score: 0.25
  Accuracy: 0.75

Results for 'pizza':
  Precision: 0.17
  Recall: 0.20
  F1 Score: 0.18
  Accuracy: 0.88

Results for 'tiger':
  Precision: 0.18
  Recall: 0.29
  F1 Score: 0.22
  Accuracy: 0.81

Results for 'train':
  Precision: 0.33
  Recall: 0.43
  F1 Score: 0.38
  Accuracy: 0.86

Result

Hidden layer with 5 neurons

In [49]:
for label in labels:
    y = np.where(df['label'] == label, 1, 0)    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    smote = SMOTE(random_state=42)
    X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
    model = MLPClassifier(hidden_layer_sizes=(15,), activation='relu', max_iter=1000, random_state=42)
    model.fit(X_train_res, y_train_res)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)
    accuracy = accuracy_score(y_test, y_pred)
    results[label] = {
        'precision': report['1']['precision'] if '1' in report else 0,
        'recall': report['1']['recall'] if '1' in report else 0,
        'f1-score': report['1']['f1-score'] if '1' in report else 0,
        'accuracy': accuracy
    }

Hidden layer with 15 neurons

In [51]:
for label, metrics in results.items():
    print(f"Results for '{label}':")
    print(f"  Precision: {metrics['precision']:.2f}")
    print(f"  Recall: {metrics['recall']:.2f}")
    print(f"  F1 Score: {metrics['f1-score']:.2f}")
    print(f"  Accuracy: {metrics['accuracy']:.2f}\n")

Results for 'apple frits':
  Precision: 0.50
  Recall: 0.43
  F1 Score: 0.46
  Accuracy: 0.90

Results for 'banana':
  Precision: 0.20
  Recall: 0.20
  F1 Score: 0.20
  Accuracy: 0.89

Results for 'bicycle':
  Precision: 0.00
  Recall: 0.00
  F1 Score: 0.00
  Accuracy: 0.90

Results for 'car':
  Precision: 0.00
  Recall: 0.00
  F1 Score: 0.00
  Accuracy: 0.85

Results for 'cat':
  Precision: 0.14
  Recall: 0.17
  F1 Score: 0.15
  Accuracy: 0.85

Results for 'elephant':
  Precision: 0.60
  Recall: 0.33
  F1 Score: 0.43
  Accuracy: 0.89

Results for 'mango frits':
  Precision: 0.25
  Recall: 0.20
  F1 Score: 0.22
  Accuracy: 0.90

Results for 'person':
  Precision: 0.15
  Recall: 0.40
  F1 Score: 0.22
  Accuracy: 0.81

Results for 'pizza':
  Precision: 0.17
  Recall: 0.20
  F1 Score: 0.18
  Accuracy: 0.88

Results for 'tiger':
  Precision: 0.40
  Recall: 0.29
  F1 Score: 0.33
  Accuracy: 0.89

Results for 'train':
  Precision: 0.50
  Recall: 0.43
  F1 Score: 0.46
  Accuracy: 0.90

Result

In [53]:
for label in labels:
    y = np.where(df['label'] == label, 1, 0)    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    smote = SMOTE(random_state=42)
    X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
    model = MLPClassifier(hidden_layer_sizes=(20,), activation='relu', max_iter=1000, random_state=42)
    model.fit(X_train_res, y_train_res)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)
    accuracy = accuracy_score(y_test, y_pred)
    results[label] = {
        'precision': report['1']['precision'] if '1' in report else 0,
        'recall': report['1']['recall'] if '1' in report else 0,
        'f1-score': report['1']['f1-score'] if '1' in report else 0,
        'accuracy': accuracy
    }

hidden layer with 20 neurons

In [55]:
for label, metrics in results.items():
    print(f"Results for '{label}':")
    print(f"  Precision: {metrics['precision']:.2f}")
    print(f"  Recall: {metrics['recall']:.2f}")
    print(f"  F1 Score: {metrics['f1-score']:.2f}")
    print(f"  Accuracy: {metrics['accuracy']:.2f}\n")

Results for 'apple frits':
  Precision: 0.33
  Recall: 0.43
  F1 Score: 0.38
  Accuracy: 0.86

Results for 'banana':
  Precision: 0.33
  Recall: 0.40
  F1 Score: 0.36
  Accuracy: 0.90

Results for 'bicycle':
  Precision: 0.00
  Recall: 0.00
  F1 Score: 0.00
  Accuracy: 0.90

Results for 'car':
  Precision: 0.33
  Recall: 0.29
  F1 Score: 0.31
  Accuracy: 0.88

Results for 'cat':
  Precision: 0.00
  Recall: 0.00
  F1 Score: 0.00
  Accuracy: 0.81

Results for 'elephant':
  Precision: 0.67
  Recall: 0.22
  F1 Score: 0.33
  Accuracy: 0.89

Results for 'mango frits':
  Precision: 0.12
  Recall: 0.20
  F1 Score: 0.15
  Accuracy: 0.85

Results for 'person':
  Precision: 0.17
  Recall: 0.40
  F1 Score: 0.24
  Accuracy: 0.82

Results for 'pizza':
  Precision: 0.33
  Recall: 0.40
  F1 Score: 0.36
  Accuracy: 0.90

Results for 'tiger':
  Precision: 0.14
  Recall: 0.14
  F1 Score: 0.14
  Accuracy: 0.84

Results for 'train':
  Precision: 0.43
  Recall: 0.43
  F1 Score: 0.43
  Accuracy: 0.89

Result

In this multilayer perceptron model we can observe that increasing the number of hidden neurons from 5 to 15 generally improves the results: accuracy rises for most labels (for example, 'apple frits' goes from 0.78 to 0.90), and precision, recall, and F1-score improve for several labels as well. The improvement does not continue indefinitely, however. Going from 15 to 20 neurons brings no further consistent gain, and overall the 15-neuron model works better than both the 5-neuron and the 20-neuron models.
This suggests that there is a threshold (optimal) number of hidden neurons for this kind of model, beyond which adding more neurons no longer helps.
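One simple way to compare the three configurations is to average the per-label accuracy printed above. The sketch below copies those printed values by hand; the averaging itself is an added summary, not something the notebook computes, and the variable names are illustrative.

```python
# Accuracy values copied from the printed results, in label order:
# apple frits, banana, bicycle, car, cat, elephant, mango frits,
# person, pizza, tiger, train
accuracy_by_size = {
    5:  [0.78, 0.88, 0.82, 0.84, 0.81, 0.90, 0.78, 0.75, 0.88, 0.81, 0.86],
    15: [0.90, 0.89, 0.90, 0.85, 0.85, 0.89, 0.90, 0.81, 0.88, 0.89, 0.90],
    20: [0.86, 0.90, 0.90, 0.88, 0.81, 0.89, 0.85, 0.82, 0.90, 0.84, 0.89],
}

# Mean accuracy across labels for each hidden layer size
mean_accuracy = {size: sum(vals) / len(vals) for size, vals in accuracy_by_size.items()}
best_size = max(mean_accuracy, key=mean_accuracy.get)

for size, value in mean_accuracy.items():
    print(f"{size} neurons: mean accuracy {value:.2f}")
print(f"best hidden layer size by mean accuracy: {best_size}")
```

The gaps between configurations are small and come from a single random split, so this summary is only a rough guide; the per-label precision, recall, and F1-score vary more than the averages suggest.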