## Support Vector Machine (SVM) with an RBF Kernel

The **SVM** (Support Vector Machine) is a supervised learning algorithm used for both classification and regression. For classification, its goal is to find the hyperplane that separates the classes in feature space with the largest margin.

When the data are not linearly separable in the original space, the SVM uses a **kernel** to implicitly map the data into a higher-dimensional space, where a linear separation is more likely to exist. The **RBF** (Radial Basis Function) kernel is widely used because it can model non-linear decision boundaries.
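To make the effect of the kernel concrete, the sketch below compares a linear SVM with an RBF SVM on two concentric rings, a case no straight line can separate. The dataset is synthetic (`make_circles`) and is not the notebook's data; the variable names are illustrative only.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, illustrative data: two concentric rings (not the notebook's dataset).
X_demo, y_demo = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Same data, two kernels: only the RBF kernel can bend the decision boundary.
linear_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr).score(X_te, y_te)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On data like this the linear kernel should stay close to chance level, while the RBF kernel should separate the rings almost perfectly.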

### RBF Kernel Formula
The RBF kernel is defined as:

\[
K(x_i, x_j) = \exp\left(-\gamma ||x_i - x_j||^2\right)
\]

Where:
- \(x_i, x_j\): Vectors in the input space.
- \(||x_i - x_j||^2\): Squared Euclidean distance between the vectors \(x_i\) and \(x_j\).
- \(\gamma\): Parameter that controls how far the influence of a single training example reaches. High values of \(\gamma\) make the influence local, so only nearby examples matter, while low values let more distant examples contribute.
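The formula can be checked numerically. The sketch below computes the kernel matrix directly from the definition and compares it with scikit-learn's `rbf_kernel`; the helper `rbf_kernel_manual`, the sample points, and the value of `gamma_demo` are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf_kernel_manual(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed with broadcasting."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


# Three illustrative points; the third one is far from the first two.
A = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
gamma_demo = 0.5

K_manual = rbf_kernel_manual(A, A, gamma_demo)
K_sklearn = rbf_kernel(A, A, gamma=gamma_demo)

print(np.round(K_manual, 4))
print("matches scikit-learn:", np.allclose(K_manual, K_sklearn))
```

Each point has similarity 1 with itself, and the similarity decays toward 0 as the squared distance grows; a larger `gamma_demo` makes that decay faster.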


In [1]:
import numpy as np
import pandas as pd
import optuna

from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

# Project-local metric helpers
from models.utils import f1_score, accuracy, precision, recall



In [2]:
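data = pd.read_csv('../data/processed/breast.csv')

# All columns except the last are features; the last column is the label.
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Hold out 30% of the samples as a test set; fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)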

In [3]:
# Exponentially spaced search grid: powers of two for both C and gamma.
param_grid = {
    'C': [2**i for i in range(-5, 16, 2)],
    'gamma': [2**i for i in range(-15, 4, 2)],
}
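A quick way to see how expensive this grid is: the sketch below counts the candidate combinations with scikit-learn's `ParameterGrid`. The name `param_grid_demo` is a copy of the grid above so the snippet runs on its own, and the fold count of 5 is an assumption matching the `cv=5` used in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Copy of the notebook's grid so this snippet is self-contained.
param_grid_demo = {
    "C": [2**i for i in range(-5, 16, 2)],      # 2^-5, 2^-3, ..., 2^15
    "gamma": [2**i for i in range(-15, 4, 2)],  # 2^-15, 2^-13, ..., 2^3
}

n_candidates = len(ParameterGrid(param_grid_demo))
n_folds = 5  # assumed: 5-fold cross-validation

print("C values:", len(param_grid_demo["C"]))          # 11
print("gamma values:", len(param_grid_demo["gamma"]))  # 10
print("candidates:", n_candidates)                     # 110
print("cross-validation fits:", n_candidates * n_folds)  # 550
```

With the default `refit=True`, `GridSearchCV` also performs one final fit on the whole training set using the best candidate.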

## SVM Grid Search

In [4]:
grid_search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

In [5]:
print("Best hyperparameters (Grid Search):", grid_search.best_params_)
print("Best accuracy (Grid Search):", grid_search.best_score_)

Best hyperparameters (Grid Search): {'C': 128, 'gamma': 3.0517578125e-05}
Best accuracy (Grid Search): 0.9520886075949366
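The reported best `gamma` equals 2^-15, the smallest value in the grid. When a best value sits on the edge of the search range, the true optimum may lie outside it, so extending the range in that direction is often worth trying. The sketch below is one possible sanity check; `on_grid_edge` is a hypothetical helper, and the grid and best parameters are copied from this notebook.

```python
# Copy of the notebook's grid and of the best parameters printed above.
param_grid_demo = {
    "C": [2**i for i in range(-5, 16, 2)],
    "gamma": [2**i for i in range(-15, 4, 2)],
}
best_params_reported = {"C": 128, "gamma": 3.0517578125e-05}


def on_grid_edge(grid, best):
    """Return the parameter names whose best value is the min or max of its grid."""
    return [
        name
        for name, value in best.items()
        if value in (min(grid[name]), max(grid[name]))
    ]


print(on_grid_edge(param_grid_demo, best_params_reported))  # ['gamma']
```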


In [6]:
y_pred = grid_search.best_estimator_.predict(X_test)
f1 = f1_score(y_test, y_pred)
accuracy_grid = accuracy(y_test, y_pred)
precision_grid = precision(y_test, y_pred)
recall_grid = recall(y_test, y_pred)

In [7]:
print(f'f1_score: {f1}')
print(f'accuracy: {accuracy_grid}')
print(f'precision: {precision_grid}')
print(f'recall: {recall_grid}')

f1_score: 0.9736842105263158
accuracy: 0.9649122807017544
precision: 0.9568965517241379
recall: 0.9910714285714286
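The helpers in `models.utils` are project-local and their implementation is not shown here. Assuming they follow the standard binary definitions with class 1 as the positive class, the reported values can be reproduced from confusion-matrix counts. The counts below are inferred from the reported metrics and the 171 test samples; they are a reconstruction, not output taken from the notebook.

```python
# Confusion-matrix counts inferred from the reported test metrics (assumption).
tp, fp, fn, tn = 111, 5, 1, 54

precision_demo = tp / (tp + fp)                  # of predicted positives, how many are right
recall_demo = tp / (tp + fn)                     # of actual positives, how many are found
accuracy_demo = (tp + tn) / (tp + fp + fn + tn)  # overall fraction correct
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(f"precision: {precision_demo:.4f}")
print(f"recall:    {recall_demo:.4f}")
print(f"accuracy:  {accuracy_demo:.4f}")
print(f"f1_score:  {f1_demo:.4f}")
```

The gap between recall (about 0.99) and precision (about 0.96) says the model misses very few positives but produces a handful of false positives.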


## SVM with Optuna

In [8]:
def objective(trial):
    # Sample C and gamma on a log scale over the same ranges as the grid search.
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform.
    C = trial.suggest_float('C', 2**-5, 2**15, log=True)
    gamma = trial.suggest_float('gamma', 2**-15, 2**3, log=True)

    model = SVC(kernel='rbf', C=C, gamma=gamma)

    # The mean 5-fold cross-validated accuracy is the value Optuna maximizes.
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    return scores.mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

print("\n Best hyperparameters (Optuna):", study.best_params)
print("\n Best accuracy (Optuna):", study.best_value)

# Refit on the training split with the best parameters, then predict on the test split.
best_model = SVC(kernel='rbf', **study.best_params)
best_model.fit(X_train, y_train)
y_pred_optuna = best_model.predict(X_test)

print("\n Classification report (Optuna):")
print(classification_report(y_test, y_pred_optuna))

[I 2025-01-19 19:53:20,711] A new study created in memory with name: no-name-447735e6-168f-479b-8261-92b0be842399
[I 2025-01-19 19:53:20,772] Trial 0 finished with value: 0.6171518987341772 and parameters: {'C': 2.362346090355948, 'gamma': 0.7115885421530479}. Best is trial 0 with value: 0.6171518987341772.
[I 2025-01-19 19:53:20,830] Trial 1 finished with value: 0.6171518987341772 and parameters: {'C': 8.445117354972814, 'gamma': 0.19361146649268154}. Best is trial 0 with value: 0.6171518987341772.
[I 2025-01-19 19:53:20,895] Trial 2 finished with value: 0.6171518987341772 and parameters: {'C': 2573.0197537171225, 'gamma': 1.6860455304562996}. Best is trial 0 with va


 Best hyperparameters (Optuna): {'C': 105.78699833878832, 'gamma': 5.352734121119584e-05}

 Best accuracy (Optuna): 0.9570886075949367

 Classification report (Optuna):
              precision    recall  f1-score   support

         0.0       0.98      0.92      0.95        59
         1.0       0.96      0.99      0.97       112

    accuracy                           0.96       171
   macro avg       0.97      0.95      0.96       171
weighted avg       0.97      0.96      0.96       171
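The objective above samples `C` and `gamma` on a log scale, which means each order of magnitude in the range is equally likely to be tried. The NumPy sketch below illustrates that idea for the `C` range; it is not Optuna's internal sampler, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 2.0**-5, 2.0**15  # same range as C in the notebook

# Log-uniform sampling: draw uniformly in log space, then exponentiate.
samples = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))

# The midpoint in log space is the geometric mean, 2^5 = 32.
geometric_mid = np.sqrt(low * high)
frac_below_mid = np.mean(samples < geometric_mid)

print(f"geometric midpoint: {geometric_mid:.0f}")
print(f"fraction of samples below it: {frac_below_mid:.2f}")
```

Roughly half of the samples fall below 32 even though 32 is a tiny fraction of the linear range up to 32768. A plain uniform draw would almost never try small values of `C`.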



## Optuna metrics

In [10]:
# Use distinct names so the imported metric helpers are not overwritten.
f1_optuna = f1_score(y_test, y_pred_optuna)
accuracy_optuna = accuracy(y_test, y_pred_optuna)
precision_optuna = precision(y_test, y_pred_optuna)
recall_optuna = recall(y_test, y_pred_optuna)

In [11]:
print(f'f1_score: {f1_optuna}')
print(f'accuracy: {accuracy_optuna}')
print(f'precision: {precision_optuna}')
print(f'recall: {recall_optuna}')

f1_score: 0.9736842105263158
accuracy: 0.9649122807017544
precision: 0.9568965517241379
recall: 0.9910714285714286
