# k-Fold Cross Validation

A common problem when building machine learning models is the _variance problem_: a model's measured accuracy changes depending on which subset of the data was used for training and which for testing. The __k-Fold Cross Validation__ method addresses this by evaluating the model on several different train/test splits, which gives a less optimistic estimate of the accuracy and shows how much that accuracy varies from split to split.

## Algorithm

* Split the training set into _k_ sections (folds) of roughly equal size.
* For each section, use that section as a test set and train the model on all other sections.
* Summarize the _k_ resulting evaluation scores, typically by their mean and standard deviation.
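The steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: the helper names (`k_fold_indices`, `cross_validate`) and the toy majority-class model are hypothetical, and the folds here are shuffled rather than stratified.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

def cross_validate(X, y, k, fit, predict, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return one accuracy per fold."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Hypothetical toy model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

X = [[i] for i in range(20)]
y = [0] * 14 + [1] * 6
scores = cross_validate(X, y, k=5, fit=fit_majority, predict=predict_majority)
mean_score = sum(scores) / len(scores)
print(scores)
print(mean_score)
```

Each sample lands in the test set exactly once, so every observation contributes to the final estimate.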

<hr>

## Code

__Performing Gaussian Kernel SVM Classification:__

```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Advertisements.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting classifier to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

print(y_test)
print(y_pred)
cm
```

```text
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0
 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
 0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 0 0 1 0 1 1 1]
[0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 1 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1]

array([[64,  4],
       [ 3, 29]], dtype=int64)
```
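As a quick check, the accuracy on this single test split can be read off the confusion matrix: correct predictions sit on the diagonal. The sketch below hard-codes the matrix printed above and assumes scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = [[64, 4],
      [3, 29]]

correct = cm[0][0] + cm[1][1]            # diagonal entries are correct predictions
total = sum(sum(row) for row in cm)      # every test sample appears exactly once
accuracy = correct / total

print(f"{correct} of {total} correct, accuracy = {accuracy:.2f}")
```

This is the score from one particular train/test split; a different split could give a different number, which is what cross validation measures.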

<hr>

__Performing 10-Fold Cross Validation:__

```python
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)

print(accuracies)
print('Mean: ' + str(accuracies.mean()))
print('Standard Deviation: ' + str(accuracies.std()))
```

```text
[0.8        0.96666667 0.8        0.96666667 0.86666667 0.86666667
 0.9        0.93333333 1.         0.93333333]
Mean: 0.9033333333333333
Standard Deviation: 0.06574360974438671
```
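The summary statistics can be reproduced from the ten fold scores using only the standard library. This sketch copies the rounded scores from the output above; note that NumPy's `std()` defaults to the population standard deviation, which corresponds to `statistics.pstdev`, while `statistics.stdev` gives the slightly larger sample estimate.

```python
import statistics

# Fold accuracies copied (rounded) from the cross_val_score output above.
scores = [0.8, 0.96666667, 0.8, 0.96666667, 0.86666667,
          0.86666667, 0.9, 0.93333333, 1.0, 0.93333333]

mean = statistics.mean(scores)
std = statistics.pstdev(scores)        # population std, like NumPy's default std()
sample_std = statistics.stdev(scores)  # sample std (divides by n - 1)

print(f"Mean: {mean:.4f}")
print(f"Population std: {std:.4f}")
print(f"Sample std: {sample_std:.4f}")
print(f"Worst fold: {min(scores):.2f}, best fold: {max(scores):.2f}")
```

Reporting the range alongside the mean makes the spread visible: individual folds here vary between 80 and 100 percent accuracy.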


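__Results:__ Across the 10 folds of the training data, the model reaches a mean accuracy of about 90 percent with a standard deviation of about 6.6 percentage points; individual folds range from 80 to 100 percent. This suggests the model is reasonably accurate and that its performance is fairly stable across different subsets of the data, though these numbers alone do not show that it is the best possible model.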