## Support Vector Machine training and evaluation

In this notebook, you will train and evaluate a Support Vector Machine (SVM) classifier, using the dataset provided and the features extracted previously.
The implementation we'll be using is the SVC class from scikit-learn (sklearn).

To get started, we have to import some tools to train, test, save and load the model.

In [None]:
import pickle
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### Loading the training data and creating a train/test split

Using the pickle library, we can load the features we saved in the feature extraction stage.
The data is stored as a tuple (X, y) where:
- X is the list of feature vectors.
- y is the list of labels.
- The label y[i] corresponds to the feature vector X[i].
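
As a small illustration of this format, the sketch below pickles a toy (X, y) tuple in memory and reads it back. The names X_demo and y_demo and their values are made up for the example; they are not the features from the extraction stage.

```python
import pickle

# Hypothetical toy data: three 2-dimensional feature vectors and their labels
X_demo = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.5]]
y_demo = [1, 0, 1]

# Serialize the (X, y) tuple to bytes and read it back,
# mirroring what is stored in the .pkl file
blob = pickle.dumps((X_demo, y_demo))
X_loaded, y_loaded = pickle.loads(blob)

# Each label lines up with the feature vector at the same index
pairs = list(zip(X_loaded, y_loaded))
for features, label in pairs:
    print(features, "->", label)
```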

After loading, we create a train/test split. The split is random, but we set random_state so that it is deterministic and you get the same split every time you run this cell. 20% of the data is held out for testing and the remaining 80% is used for training.
We will use the training set to fit the models and the test set to evaluate them.
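
To see what the split does, here is a minimal sketch on toy data. The arrays X_toy and y_toy are made up for the example; the point is that the same random_state yields the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, alternating labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# First split: 20% of the samples are held out for testing
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Second split with the same seed: it should select the same samples
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

print("train size:", len(Xa_train), "test size:", len(Xa_test))
same_split = np.array_equal(Xa_test, Xb_test)
print("same test set on both runs:", same_split)
```

If the classes in your data are imbalanced, passing stratify=y to train_test_split keeps the class proportions similar in both sets; whether you need it depends on your dataset.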

In [None]:
training_vectors = "training_vectors.pkl"  # Path for the .pkl containing the extracted training data

with open(training_vectors, "rb") as f:
    data = pickle.load(f)

# Assuming the pickle contains a tuple: (X, y)
X, y = data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Support Vector Machine Training

Now we create a support vector machine (SVM) with a fixed set of parameters and train it using the fit() method.
Changing the parameters can improve the quality of your model, so try several combinations.
For each model you train, make sure to change output_model_path so that every model is saved to a different file.
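
One possible way to organise this is sketched below: a loop over a few parameter settings that saves each trained model under its own file name. The data is synthetic (make_classification), and the names configs, out_dir and saved_paths, as well as the parameter values, are assumptions made for this example rather than recommended settings.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features (not the real dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Hypothetical parameter settings, keyed by the file name each model is saved under
configs = {
    "svm_rbf_C1": {"kernel": "rbf", "C": 1.0, "gamma": "scale"},
    "svm_rbf_C10": {"kernel": "rbf", "C": 10.0, "gamma": "scale"},
    "svm_linear_C1": {"kernel": "linear", "C": 1.0},
}

out_dir = tempfile.mkdtemp()  # temporary folder, used only for this sketch
saved_paths = {}
for name, params in configs.items():
    model = SVC(**params)
    model.fit(Xd_train, yd_train)

    # One file per model, so no model overwrites another
    path = os.path.join(out_dir, f"{name}.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    saved_paths[name] = path
    print(name, "->", path)
```

Encoding the parameters in the file name makes it easier to tell the models apart when you compare them later.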

In [None]:
output_model_path = "trained_svm_model.pkl"    # File the trained model is saved to

# You can adjust C, kernel, degree, gamma and coef0.
# Note: SVC only uses random_state when probability=True; otherwise it is ignored.
svm = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm.fit(X_train, y_train)

with open(output_model_path, "wb") as f:
    pickle.dump(svm, f)

### Support Vector Machine Evaluation

Now that you have trained some models, you can evaluate them and compare their metrics.
Accuracy is shown as an example, but we encourage you to explore other metrics as well, such as precision, recall and F1-score, and to inspect the confusion matrix or calculate the AUC.

To do so, we load the model and test it by:
- Using the model to predict the labels of the feature vectors in X_test.
- Comparing the actual labels (y_test) with the predicted labels (y_pred) using the chosen metrics.
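
As a rough sketch of what a broader evaluation could look like, the example below computes several metrics on synthetic data. It assumes a binary classification problem; with more than two classes, precision, recall and F1 need an average argument, and the AUC computation changes as well. All variable names here are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem standing in for the real features
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
Xm_train, Xm_test, ym_train, ym_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(Xm_train, ym_train)

ym_pred = clf.predict(Xm_test)             # hard class predictions
ym_score = clf.decision_function(Xm_test)  # signed distance to the boundary, used for AUC

metrics = {
    "accuracy": accuracy_score(ym_test, ym_pred),
    "precision": precision_score(ym_test, ym_pred),
    "recall": recall_score(ym_test, ym_pred),
    "f1": f1_score(ym_test, ym_pred),
    "auc": roc_auc_score(ym_test, ym_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(ym_test, ym_pred)
print(cm)
```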

In [None]:
output_model_path = "trained_svm_model.pkl"    # File of the saved model to evaluate
with open(output_model_path, "rb") as f:
    svm = pickle.load(f)

In [None]:
y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.2%}")