<font color="red" size="6">Techniques for handling imbalanced datasets</font>
<P><font color="yELLOW" size="5"><B>4_SVMSMOTE (SVM Synthetic Minority Over-sampling Technique)</B></font></P>

SVMSMOTE is a variant of SMOTE that uses a Support Vector Machine (SVM) to decide where synthetic minority-class samples are created. An SVM is fitted to the data, its support vectors (the points that define the decision boundary) are identified, and new samples are generated around the minority-class support vectors.

Unlike standard SMOTE, which interpolates from minority-class samples chosen at random, SVMSMOTE concentrates new samples near the decision boundary. That is where a classifier is most likely to confuse the classes, so samples added there can be more useful than samples placed deep inside the minority region.
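To make the idea of "minority-class support vectors" concrete, the sketch below fits a plain scikit-learn `SVC` on a toy imbalanced dataset and counts how many minority samples end up as support vectors. The RBF kernel, the default hyperparameters, and the variable names are assumptions chosen for illustration; this is not a reproduction of what `SVMSMOTE` does internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy imbalanced data: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)

# Fit an SVM; kernel and hyperparameters are illustrative assumptions
svm = SVC(kernel="rbf", gamma="scale")
svm.fit(X, y)

# support_ holds the row indices of all support vectors (both classes)
sv_idx = svm.support_

# Keep only the support vectors that belong to the minority class (label 1)
minority_sv_idx = sv_idx[y[sv_idx] == 1]

n_minority = int(np.sum(y == 1))
print("Minority samples:", n_minority)
print("Minority support vectors:", len(minority_sv_idx))
```

Only the rows in `minority_sv_idx` would serve as starting points for new samples; minority samples far from the boundary are not used as seeds.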

<font color="pink" size=4>How SVMSMOTE Works:</font>
<ol>
     <li><font color="orange">Support Vectors Identification</font>
<ol>
       <li>Support vectors are the data points that lie closest to the decision boundary between classes; they are the points that determine where the classifier places that boundary.</li> 
        <li>In SVMSMOTE, SVM is used to identify these support vectors.</li></ol></li> 
       <li><font color="orange">Synthetic Sample Generation:</font>
         Once the support vectors are identified, each synthetic sample is placed on the line joining a minority-class support vector to one of its nearest minority-class neighbors, rather than at a random position in the feature space.</li> 
      <li><font color="orange"> Focus on Hard-to-Classify Points:</font>
         Since support vectors are hard-to-classify points, SVMSMOTE focuses on generating synthetic samples that lie near these critical points, making it more likely to improve the classifier's ability to generalize.</li></ol>  
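The generation step described above can be sketched with NumPy alone. This is a simplified illustration, not the imbalanced-learn implementation: the function name `interpolate_from_support_vectors`, its parameters, and the toy data are all hypothetical, and the sketch only interpolates between points. The library version additionally looks at the majority-class neighbors of each support vector to decide between interpolating and extrapolating.

```python
import numpy as np

def interpolate_from_support_vectors(minority_X, sv_rows, n_new, k=5, seed=0):
    """Simplified sketch: create n_new points on segments that join minority
    support vectors to their k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        # Pick one minority support vector at random
        i = rng.choice(sv_rows)
        # Distances from that support vector to every minority sample
        dists = np.linalg.norm(minority_X - minority_X[i], axis=1)
        # k nearest neighbors, skipping position 0 (the point itself)
        neighbors = np.argsort(dists)[1:k + 1]
        j = rng.choice(neighbors)
        # Random position on the segment between the two points
        step = rng.uniform(0.0, 1.0)
        new_points.append(minority_X[i] + step * (minority_X[j] - minority_X[i]))
    return np.array(new_points)

# Toy 2-D minority class; rows 0 and 1 are assumed to be the support vectors
minority_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                       [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
synthetic = interpolate_from_support_vectors(minority_X, sv_rows=[0, 1],
                                             n_new=4, k=3)
print(synthetic.shape)  # (4, 2)
```

Every generated point is a convex combination of two existing minority samples, so it stays inside the region the minority class already occupies near the chosen support vectors.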

import numpy as np
from imblearn.over_sampling import SVMSMOTE
from sklearn.datasets import make_classification
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, 
                            n_redundant=10, n_classes=2, weights=[0.9, 0.1], 
                            random_state=42)

# Step 2: Check the class distribution before applying SVMSMOTE
print("Class distribution before SVMSMOTE:", Counter(y))

# Step 3: Apply SVMSMOTE to oversample the minority class
svm_smote = SVMSMOTE(random_state=42)
X_resampled, y_resampled = svm_smote.fit_resample(X, y)

# Step 4: Check the class distribution after applying SVMSMOTE
print("Class distribution after SVMSMOTE:", Counter(y_resampled))

# Step 5: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, 
                                                    test_size=0.3, random_state=42)

# Step 6: Train a classifier (RandomForest) on the resampled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Step 7: Evaluate the classifier
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))


Class distribution before SVMSMOTE: Counter({0: 898, 1: 102})
Class distribution after SVMSMOTE: Counter({0: 898, 1: 898})
              precision    recall  f1-score   support

           0       0.97      0.97      0.97       265
           1       0.97      0.97      0.97       274

    accuracy                           0.97       539
   macro avg       0.97      0.97      0.97       539
weighted avg       0.97      0.97      0.97       539



<font color="pink" size=4>Advantages of SVMSMOTE:</font>
<ol>
    <li><font color="orange">Focuses on Support Vectors:</font> By generating synthetic samples near the support vectors, SVMSMOTE targets the most important samples that are closest to the decision boundary.</li>
    <li><font color="orange">Improved Classifier Performance:</font> Because the new samples reinforce the minority class where it borders the majority class, SVMSMOTE can help the classifier generalize better on minority examples.</li>
    <li><font color="orange">Less Noise:</font> SMOTE can start from any minority sample, including isolated outliers, whereas SVMSMOTE starts only from minority-class support vectors, potentially reducing the number of noisy synthetic samples.</li></ol>

<font color="pink" size=4>Drawbacks of SVMSMOTE:</font>
<ol>
    <li><font color="orange">Computational Complexity:</font> SVMSMOTE is more expensive than plain SMOTE because an SVM classifier must be trained to identify the support vectors before any synthetic samples are generated.</li>
    <li><font color="orange">Risk of Overfitting:</font> Because generation is concentrated around the support vectors, the synthetic samples can over-represent a narrow region of the minority class, and the model may overfit to that region.</li>
    <li><font color="orange">Requires SVM:</font> The method depends on fitting an SVM to find the support vectors, which may not suit every scenario. SVM training time grows quickly with the number of samples, so large datasets are a particular concern.</li></ol>