
# Creating a **Support Vector Machine** (SVM) using Python scikit-learn


---


Tutorial link: <a href="https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python">https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python</a>

We'll be using the breast cancer dataset that ships with scikit-learn. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and they describe characteristics of the cell nuclei present in the image.

The dataset comprises 30 features:
  * mean radius,
  * mean texture, 
  * mean perimeter,
  * mean area, 
  * mean smoothness,
  * mean compactness,
  * mean concavity,
  * mean concave points,
  * mean symmetry,
  * mean fractal dimension,
  * radius error,
  * texture error,
  * perimeter error,
  * area error,
  * smoothness error,
  * compactness error,
  * concavity error,
  * concave points error,
  * symmetry error,
  * fractal dimension error,
  * worst radius,
  * worst texture,
  * worst perimeter,
  * worst area,
  * worst smoothness,
  * worst compactness,
  * worst concavity,
  * worst concave points,
  * worst symmetry, and
  * worst fractal dimension

and a target:
  * diagnosis (malignant or benign).

The target has two classes, malignant and benign. We will build a model that predicts which class a sample belongs to.
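As a quick check of the description above, the following sketch loads the dataset and prints its dimensions and the number of samples per class. The variable names `n_samples`, `n_features` and `class_counts` are illustrative and not part of the original tutorial.

```python
import numpy as np
from sklearn import datasets

cancer_set = datasets.load_breast_cancer()

# data holds one row per sample and one column per feature
n_samples, n_features = cancer_set.data.shape
print("Samples:", n_samples, "| Features:", n_features)

# Count how many samples fall into each class (0: malignant, 1: benign)
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer_set.target_names, np.bincount(cancer_set.target))
}
print("Class counts:", class_counts)
```

The class counts are worth a look before training, because accuracy alone can be misleading when one class dominates.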

# Loading the data

In [2]:
# Import scikit-learn datasets library
from sklearn import datasets

# Load dataset
cancer_set = datasets.load_breast_cancer()

# Understanding the dataset

In [None]:
# Print the names of the features
print("Features: ", cancer_set.feature_names)

# Print the names of the two classes ('malignant', 'benign')
print("Labels: ", cancer_set.target_names)

# Print the first 5 samples and their labels (0: 'malignant' | 1: 'benign')
print("Data: ", cancer_set.data[0:5], " | ", cancer_set.target[0:5])
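Printing raw rows makes it hard to see how differently the features are scaled. A minimal sketch that summarizes each feature by its minimum and maximum is shown below. `feature_min` and `feature_max` are names chosen here for illustration.

```python
from sklearn import datasets

cancer_set = datasets.load_breast_cancer()

# Per-feature minimum and maximum across all samples
feature_min = cancer_set.data.min(axis=0)
feature_max = cancer_set.data.max(axis=0)

# One line per feature: name, smallest value, largest value
for name, low, high in zip(cancer_set.feature_names, feature_min, feature_max):
    print(f"{name:<25} min={low:10.4f}  max={high:10.4f}")
```

Area-based features tend to be far larger in magnitude than smoothness-based ones, which can matter for distance-based models such as SVMs.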

# Splitting the data

In [7]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split data set into training set (70%) and testing set (30%)
data_train, data_test, target_train, target_test = train_test_split(cancer_set.data, cancer_set.target, test_size=0.3, random_state=109)
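To confirm what the split produced, this sketch repeats it and prints the size and class balance of each part. The `train_counts` and `test_counts` names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer_set = datasets.load_breast_cancer()
data_train, data_test, target_train, target_test = train_test_split(
    cancer_set.data, cancer_set.target, test_size=0.3, random_state=109
)

# Number of samples that ended up in each part
print("Training samples:", data_train.shape[0])
print("Testing samples: ", data_test.shape[0])

# Class balance in each part (index 0: malignant, index 1: benign)
train_counts = np.bincount(target_train, minlength=2)
test_counts = np.bincount(target_test, minlength=2)
print("Training class counts:", train_counts)
print("Testing class counts: ", test_counts)
```

If the two parts turn out noticeably unbalanced, `train_test_split` also accepts a `stratify` argument that preserves class proportions. The tutorial does not use it, so enabling it would change the results reported here.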

# Building the SVM

In [11]:
# Import SVM model
from sklearn import svm

# Create an SVM classifier with a linear kernel
clf = svm.SVC(kernel="linear")

# Train the model using the training set
clf.fit(data_train, target_train)

# Predict the labels for the test set
target_pred = clf.predict(data_test)
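With a linear kernel, the fitted model is a hyperplane: a sample is assigned to class 1 when `w · x + b` is positive. The sketch below, which is an illustration and not part of the original tutorial, recomputes that score from `clf.coef_` and `clf.intercept_` and compares it with `clf.decision_function`. The name `manual_scores` is hypothetical.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

cancer_set = datasets.load_breast_cancer()
data_train, data_test, target_train, target_test = train_test_split(
    cancer_set.data, cancer_set.target, test_size=0.3, random_state=109
)

clf = svm.SVC(kernel="linear")
clf.fit(data_train, target_train)

# coef_ is only available for the linear kernel: one weight per feature
print("Weight vector shape:", clf.coef_.shape)
print("Support vectors per class:", clf.n_support_)

# Signed score of each test sample relative to the separating hyperplane
manual_scores = data_test @ clf.coef_.ravel() + clf.intercept_[0]
library_scores = clf.decision_function(data_test)
print("Scores match:", np.allclose(manual_scores, library_scores, atol=1e-5))

# A positive score maps to class 1 (benign), otherwise class 0 (malignant)
target_pred = clf.predict(data_test)
print("Sign rule matches predict:",
      np.array_equal(target_pred, (library_scores > 0).astype(int)))
```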

# Evaluating the model

In [14]:
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Accuracy: the fraction of test samples the classifier labels correctly
print("Accuracy: ", metrics.accuracy_score(target_test, target_pred))

# Precision: of the samples predicted positive, the fraction that really are positive
# (by default the positive class is label 1, which is 'benign' here)
print("Precision: ", metrics.precision_score(target_test, target_pred))

# Recall: of the truly positive samples, the fraction the classifier finds
print("Recall: ", metrics.recall_score(target_test, target_pred))

Accuracy:  0.9649122807017544
Precision:  0.9811320754716981
Recall:  0.9629629629629629
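The three scores above are consistent with 104 true positives, 2 false positives and 4 false negatives among the 171 test samples (104/106 ≈ 0.9811, 104/108 ≈ 0.9630, 165/171 ≈ 0.9649). A minimal sketch for checking this with a confusion matrix is shown below. It is not part of the original tutorial, and the exact counts may vary with the scikit-learn version.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

cancer_set = datasets.load_breast_cancer()
data_train, data_test, target_train, target_test = train_test_split(
    cancer_set.data, cancer_set.target, test_size=0.3, random_state=109
)

clf = svm.SVC(kernel="linear")
clf.fit(data_train, target_train)
target_pred = clf.predict(data_test)

# Rows are true labels, columns are predicted labels (0: malignant, 1: benign)
cm = metrics.confusion_matrix(target_test, target_pred)
tn, fp, fn, tp = cm.ravel()
print("Confusion matrix:\n", cm)

# Recompute the three scores by hand from the four counts
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("Accuracy: ", accuracy)
print("Precision: ", precision)
print("Recall: ", recall)
```

Because label 1 is benign, the false positives here are malignant samples predicted as benign. To score the malignant class instead, `precision_score` and `recall_score` accept `pos_label=0`.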
