# K-Fold Cross-Validation

**In this example we use K-fold cross-validation to evaluate models on several train/test splits of the data, so that every sample is used for testing exactly once and the scores do not depend on a single split. We then compare four models (Logistic Regression, SVM, Decision Tree and Random Forest) to see which performs best, and finally try to improve the best one.**
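To make the idea concrete, here is a minimal sketch of what a 5-fold evaluation does under the hood. It assumes the iris data used below and a default `SVC`; the names `splitter`, `fold_scores` and `times_tested` are only illustrative. For classifiers, `cross_val_score` with an integer `cv` uses stratified folds without shuffling, so `StratifiedKFold(n_splits=5)` should mirror `cv=5`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Five stratified folds, no shuffling (what cv=5 does for a classifier).
splitter = StratifiedKFold(n_splits=5)

fold_scores = []
times_tested = np.zeros(len(y), dtype=int)  # how often each sample is in a test fold

for train_idx, test_idx in splitter.split(X, y):
    model = SVC()                            # fresh, untrained model per fold
    model.fit(X[train_idx], y[train_idx])    # train on the other four folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on held-out fold
    times_tested[test_idx] += 1

fold_scores = np.array(fold_scores)
print(fold_scores, fold_scores.mean())
```

Each sample ends up in exactly one test fold, which is why the mean of the five scores is a more stable estimate than a single train/test split.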

In [1]:
import pandas as pd
from sklearn.datasets import load_iris
import numpy as np
db = load_iris()
dir(db)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [2]:
data = pd.DataFrame(db.data)
data[:2]

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |


In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

In [4]:
from sklearn.model_selection import cross_val_score

In [5]:
LR_Model = cross_val_score(LogisticRegression(solver='liblinear',multi_class='ovr'), data, db.target, cv=5)
LR_Model

array([1.        , 0.96666667, 0.93333333, 0.9       , 1.        ])

In [6]:
SVC_Model = cross_val_score(SVC(), data, db.target, cv=5)
SVC_Model

array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])

In [7]:
Florest_Model = cross_val_score(RandomForestClassifier(n_estimators=8), data, db.target, cv=5)
Florest_Model

array([0.96666667, 0.96666667, 0.93333333, 0.93333333, 1.        ])

In [8]:
Tree_Model = cross_val_score(DecisionTreeClassifier(), data, db.target, cv=5)
Tree_Model

array([0.96666667, 0.96666667, 0.9       , 0.96666667, 1.        ])

In [9]:
np.average(LR_Model)

0.9600000000000002

In [10]:
np.average(SVC_Model)

0.9666666666666666

In [11]:
np.average(Florest_Model)

0.96

In [12]:
np.average(Tree_Model)

0.9600000000000002
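The same comparison can be written more compactly as a loop. This is only a sketch with a few assumptions that differ from the cells above: the dictionary `models` and the `results` variable are illustrative names, `random_state=0` is added to the tree-based models so the run is repeatable, and Logistic Regression uses the default solver with a higher `max_iter` instead of `liblinear`, so the exact scores may not match the outputs shown.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models; the settings here are assumptions for the sketch.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=8, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # one accuracy per fold
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Printing the standard deviation next to the mean helps judge whether a small gap between two models is larger than the fold-to-fold variation.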

**With a mean score of 96.7%, the SVC model performed slightly better than its competitors. Let's now try to improve it with some hyperparameter tuning:**

In [13]:
SVC_Model_1 = cross_val_score(SVC(C=1.5, kernel='linear'), data, db.target, cv=5)
SVC_Model_1

array([0.96666667, 1.        , 0.96666667, 0.96666667, 1.        ])

In [14]:
SVC_Model_2 = cross_val_score(SVC(C=2.5, kernel='poly'), data, db.target, cv=5)
SVC_Model_2

array([1.        , 1.        , 0.9       , 0.93333333, 1.        ])

In [15]:
SVC_Model_3 = cross_val_score(SVC(C=0.5, kernel='linear'), data, db.target, cv=5)
SVC_Model_3

array([0.96666667, 1.        , 1.        , 0.96666667, 1.        ])

In [16]:
np.average(SVC_Model_1)

0.9800000000000001

In [17]:
np.average(SVC_Model_2)

0.9666666666666666

In [18]:
np.average(SVC_Model_3)

0.9866666666666667

**For the best model (SVC) we tried a few parameter combinations, and C=0.5 with a 'linear' kernel was the best one. This improved the model's average score from 96.7% to 98.7%.**
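Trying combinations by hand can also be automated. The sketch below uses scikit-learn's `GridSearchCV` over the same `C` values and kernels tried above, plus the default `'rbf'` kernel; the grid itself and the names `param_grid` and `search` are assumptions for illustration, not part of the original experiment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical grid: every C value is combined with every kernel.
param_grid = {
    "C": [0.5, 1.5, 2.5],
    "kernel": ["linear", "poly", "rbf"],
}

# Each combination is scored with the same 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)   # best combination found in the grid
print(search.best_score_)    # its mean cross-validated accuracy
```

Because the same folds are used both to pick the parameters and to report the score, the best score tends to be slightly optimistic; a held-out test set or nested cross-validation gives a cleaner estimate.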