<a href="https://colab.research.google.com/github/davibarbosabdj/Aulas-de-POO-/blob/main/Exercicios_SVM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Classification with k-Nearest Neighbors (KNN) using Scikit-Learn**

k-Nearest Neighbors (KNN) is a machine learning algorithm used for classification and regression problems. In classification, KNN assigns a data point the class held by the majority of its k nearest neighbors. The Support Vector Machine (SVM), which the exercises in this notebook use, is a powerful classification technique, especially when the classes are clearly separated. Choosing the kernel and the other hyperparameters is essential for adapting the model to the data.
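
The majority-vote rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Scikit-Learn's implementation: the helper name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points.

    Hypothetical helper for illustration only; uses Euclidean distance.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most frequent class label among those neighbors
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (assumed for the example): one feature, two classes
X_toy = np.array([[1], [2], [3], [4]])
y_toy = np.array([0, 0, 1, 1])

pred_low = knn_predict(X_toy, y_toy, [1.5], k=3)
pred_high = knn_predict(X_toy, y_toy, [3.5], k=3)
print(pred_low, pred_high)
```

With `k=3`, the point 1.5 has two class-0 neighbors and one class-1 neighbor, so the vote goes to class 0; the point 3.5 is the mirror case.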

## Scikit-Learn and KNN

Scikit-Learn provides an efficient implementation of KNN for classification problems through the `KNeighborsClassifier` class.

```python
# Import the required libraries
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Create example data
X = np.array([[1], [2], [3], [4]])   # Independent variable (feature)
y = np.array([0, 0, 1, 1])           # Dependent variable (class labels)

# Create and train the KNN model
modelo = KNeighborsClassifier(n_neighbors=3)
modelo.fit(X, y)
```


### **Importing Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV

### **Exercise 1 - Iris**

In [None]:
iris = sns.load_dataset('iris')

In [None]:
iris.keys()

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

In [None]:
iris['species']

0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object

In [None]:
df_target = pd.DataFrame(iris['species'], columns=['species'])
df_target.head()

  species
0  setosa
1  setosa
2  setosa
3  setosa
4  setosa


#### **Train Test Split**


In [None]:
X = iris.drop('species', axis=1)
y = iris['species']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [None]:
model = SVC()
model.get_params()

{'C': 1.0,
 'break_ties': False,
 'cache_size': 200,
 'class_weight': None,
 'coef0': 0.0,
 'decision_function_shape': 'ovr',
 'degree': 3,
 'gamma': 'scale',
 'kernel': 'rbf',
 'max_iter': -1,
 'probability': False,
 'random_state': None,
 'shrinking': True,
 'tol': 0.001,
 'verbose': False}

In [None]:
model.fit(X_train, y_train)

In [None]:
predictions = model.predict(X_test)

In [None]:
print(confusion_matrix(y_test, predictions))

[[13  0  0]
 [ 0 19  1]
 [ 0  0 12]]


In [None]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        13
  versicolor       1.00      0.95      0.97        20
   virginica       0.92      1.00      0.96        12

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
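
The figures in the report can be derived by hand from the confusion matrix printed above. The following is a small sketch of that arithmetic, assuming Scikit-Learn's convention that rows are true classes and columns are predicted classes; the variable names are chosen for this example only.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class; order: setosa, versicolor, virginica)
cm = np.array([[13, 0, 0],
               [0, 19, 1],
               [0, 0, 12]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as that class
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for name, p, r, f in zip(["setosa", "versicolor", "virginica"], precision, recall, f1):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The single misclassified sample (a versicolor predicted as virginica) is what lowers versicolor's recall to 19/20 and virginica's precision to 12/13.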



In [None]:
param_grid = {'C':[0.1,1,10,100,1000], 'gamma':[1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']}

In [None]:
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)

In [None]:
grid.fit(X_train, y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.905 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=1.000 total time=   0.0s
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.905 total time=   0.0s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.905 total time=   0.0s
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.952 total time=   0.0s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.857 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.714 total time=   0.0s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.857 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.810 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.810 total time=   0.0s
[CV 1/5] END .....C=0.1, gamma=0.01, kernel=rbf;, score=0.714 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=0.01, kernel=rbf

In [None]:
grid.best_params_

{'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}
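
Because the verbose log is long, it can help to look at the search results as a table instead. The sketch below uses the `cv_results_` attribute of `GridSearchCV` to list the top-ranked parameter combinations. It loads Iris with `sklearn.datasets.load_iris` rather than seaborn so that it runs without a download, which is an assumption of this example; the numbers may differ slightly from the run above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same split settings as the exercise, but on scikit-learn's bundled copy of Iris
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.3, random_state=101)

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}
search = GridSearchCV(SVC(), param_grid, refit=True)
search.fit(X_tr, y_tr)

# One row per parameter combination, with the cross-validated mean and spread
results = pd.DataFrame(search.cv_results_)
columns = ['param_C', 'param_gamma', 'mean_test_score', 'std_test_score', 'rank_test_score']
summary = results[columns].sort_values('rank_test_score').head(5)

print(summary)
print(search.best_params_, round(search.best_score_, 3))
```

Several combinations often tie or nearly tie on a small dataset like this one, so the table gives more context than `best_params_` alone.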

In [None]:
grid.best_estimator_

In [None]:
grid_predictions = grid.predict(X_test)

In [None]:
print(confusion_matrix(y_test, grid_predictions))

[[13  0  0]
 [ 0 19  1]
 [ 0  0 12]]


In [None]:
print(classification_report(y_test, grid_predictions))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        13
  versicolor       1.00      0.95      0.97        20
   virginica       0.92      1.00      0.96        12

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



### **Exercise 2 - Cancer**

In [None]:
data = load_breast_cancer()

In [None]:
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [None]:
df_target = pd.DataFrame(data['target'], columns = ['Cancer'])

In [None]:
df_target

     Cancer
0         0
1         0
2         0
3         0
4         0
..      ...
564       0
565       0
566       0
567       0


In [None]:
data_f = pd.DataFrame(data=data.data, columns=data.feature_names)

data_f.columns

Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension'],
      dtype='object')

#### **Train Test Split**


In [None]:
X_train, X_test, y_train, y_test = train_test_split(data_f, np.ravel(df_target), test_size=0.30, random_state=101)

In [None]:
model = SVC()

In [None]:
model.fit(X_train, y_train)

In [None]:
predictions = model.predict(X_test)

In [None]:
print(confusion_matrix(y_test, predictions))

[[ 56  10]
 [  3 102]]


In [None]:
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

           0       0.95      0.85      0.90        66
           1       0.91      0.97      0.94       105

    accuracy                           0.92       171
   macro avg       0.93      0.91      0.92       171
weighted avg       0.93      0.92      0.92       171



In [None]:
param_grid = {'C':[0.1,1,10,100,1000], 'gamma':[1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']}

In [None]:
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)

In [None]:
grid.fit(X_train, y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.625 total time=   0.0s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.637 total time=   0.0s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.625 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.633 total time=   0.0s
[CV 1/5] END .....C=0.1, gamma=0.01, kernel=rbf;, score=0.637 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=0.01, kernel=rbf

In [None]:
grid.best_params_

{'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
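
Many fold scores in the log above sit near 0.63, which is roughly the share of the majority class in the training split, and the selected `gamma` is the smallest value in the grid. One common explanation, offered here as an assumption that the notebook itself does not test, is that the RBF kernel is sensitive to feature scale and these 30 features span very different ranges. A minimal sketch of standardizing the features before fitting the SVM, using `StandardScaler` and `make_pipeline` (neither appears in the original notebook):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same dataset and split settings as the exercise
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_bc, y_bc, test_size=0.30, random_state=101)

# The pipeline fits the scaler on the training split only, then fits the SVM
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_tr, y_tr)

scaled_accuracy = scaled_svc.score(X_te, y_te)
print(f"Test accuracy with standardized features: {scaled_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline also means the same object could be passed to `GridSearchCV`, so that scaling is refit inside each cross-validation fold.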

In [None]:
grid.best_estimator_

In [None]:
grid_predictions = grid.predict(X_test)

In [None]:
print(confusion_matrix(y_test, grid_predictions))

[[ 59   7]
 [  4 101]]


In [None]:
print(classification_report(y_test, grid_predictions))

              precision    recall  f1-score   support

           0       0.94      0.89      0.91        66
           1       0.94      0.96      0.95       105

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.93       171
weighted avg       0.94      0.94      0.94       171

