<font color="red" size="6">Hybrid Model</font>
<p><font color="Yellow" size="5"><b>1_SMOTEENN (SMOTE + Edited Nearest Neighbors)</b> </font>

<font color="pink" size=4>SMOTEENN is a hybrid technique that combines two methods to address class imbalance in datasets:</font>
<ol>
    <li><font color="orange">SMOTE (Synthetic Minority Over-sampling Technique):</font> This technique creates synthetic samples for the minority class by interpolating between existing minority class samples.</li>
    <li><font color="orange">ENN (Edited Nearest Neighbors):</font> This is an under-sampling technique that removes a sample when its class disagrees with the classes of its nearest neighbors. Such samples are typically noisy or lie on the class boundary, so removing them cleans up regions that could confuse a classifier. Within SMOTEENN, imbalanced-learn applies this cleaning to all classes by default.</li></ol>
<p>
The combination of SMOTE and ENN aims to balance the dataset by generating synthetic minority samples (using SMOTE) and cleaning the dataset by removing noisy, borderline instances (using ENN).</p>
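<p>
The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is only an illustration, not the imbalanced-learn implementation: the helper <code>smote_like_sample</code>, its parameters and the toy array <code>X_min</code> are hypothetical names introduced here.</p>

```python
import numpy as np

def smote_like_sample(X_min, k=5, rng=None):
    """Return one synthetic point interpolated between a random minority
    sample and one of its k nearest minority neighbours (illustrative only)."""
    rng = np.random.default_rng(rng)
    i = rng.integers(len(X_min))
    # Euclidean distance from the chosen sample to every minority sample
    dists = np.linalg.norm(X_min - X_min[i], axis=1)
    # k nearest neighbours; position 0 after sorting is the sample itself
    neighbours = np.argsort(dists)[1:k + 1]
    j = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    # The new point lies on the segment between sample i and neighbour j
    return X_min[i] + lam * (X_min[j] - X_min[i])

# Toy minority class with four 2-D points (made-up values)
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_like_sample(X_min, k=2, rng=0)
print(synthetic)
```

<p>
Because the new point is a convex combination of two existing minority samples, it always stays inside the region already spanned by the minority class.</p>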

<font color="pink" size=4>How SMOTEENN Works:</font>
<ol>
    <li><font color="orange">Generate Synthetic Minority Samples (SMOTE):</font>
        <ol>The algorithm first applies SMOTE, which adds synthetic minority samples placed on the line segments between existing minority samples and their nearest minority neighbors.</ol></li>
    <li><font color="orange">Remove Noisy and Borderline Samples (ENN):</font>
        <ol>After the synthetic samples are generated, ENN is applied to the over-sampled data. Each sample is compared with its nearest neighbors (3 by default in imbalanced-learn) and is removed if its class disagrees with theirs.</ol></li>
    <li><font color="orange">Balanced Dataset:</font>
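        <ol>The result is a dataset where the minority class is augmented with synthetic samples, and both classes are cleaned of noisy, ambiguous samples. Because ENN removes a different number of samples from each class, the final class counts are usually close but not identical.</ol></li></ol>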

In [1]:
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, 
                            n_redundant=10, n_classes=2, weights=[0.9, 0.1], 
                            random_state=42)

# Step 2: Check the class distribution before applying SMOTEENN
print("Class distribution before SMOTEENN:", Counter(y))

# Step 3: Apply SMOTEENN to balance the dataset
smote_enn = SMOTEENN(random_state=42)
X_resampled, y_resampled = smote_enn.fit_resample(X, y)

# Step 4: Check the class distribution after applying SMOTEENN
print("Class distribution after SMOTEENN:", Counter(y_resampled))

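# Step 5: Split the data into training and testing sets
# Caveat: the split happens after resampling, so the test set also contains
# synthetic and ENN-cleaned samples and the scores below are likely optimistic.
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled,
                                                    test_size=0.3, random_state=42)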

# Step 6: Train a classifier (RandomForest) on the resampled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Step 7: Evaluate the classifier
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))


Class distribution before SMOTEENN: Counter({0: 898, 1: 102})
Class distribution after SMOTEENN: Counter({1: 892, 0: 746})
              precision    recall  f1-score   support

           0       0.97      0.97      0.97       235
           1       0.97      0.97      0.97       257

    accuracy                           0.97       492
   macro avg       0.97      0.97      0.97       492
weighted avg       0.97      0.97      0.97       492



<font color="pink" size=4>Advantages of SMOTEENN:</font>
<ol>
    <li><font color="orange">Hybrid Approach:</font> SMOTEENN combines the benefits of SMOTE (over-sampling) and ENN (under-sampling), leading to a more balanced and cleaner dataset.</li>
    <li><font color="orange">Noise Reduction:</font> The ENN step helps reduce noisy or ambiguous samples, leading to a cleaner dataset and potentially better model performance.</li>
    <li><font color="orange">Improved Performance:</font> By both increasing the minority class size and removing noisy samples near the class boundary, SMOTEENN can help improve the performance of classifiers on imbalanced datasets, particularly on the minority class.</li></ol>

<font color="pink" size=4>Drawbacks of SMOTEENN:</font>
<ol>
    <li><font color="orange">Computational Complexity:</font> Both the over-sampling and the under-sampling step rely on nearest-neighbor searches, so running them together can be computationally expensive, especially for large datasets.</li>
    <li><font color="orange">Loss of Information:</font> The ENN step can remove potentially useful samples, including real (non-synthetic) ones from either class, which might hurt performance in some cases.</li>
    <li><font color="orange">Not Always Effective:</font> If the dataset doesn't have a significant amount of noise or borderline samples, SMOTEENN might not provide substantial benefits over simpler techniques like SMOTE or RandomUnderSampler.</li></ol>