# K-Fold Cross Validation

Use cross_val_score to measure the performance of each of the following models, then identify the model with the best performance:
1. Logistic Regression
2. SVM
3. KNN
4. Decision Tree
5. Random Forest
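
A compact way to run this comparison is to loop over the five models and score each one on exactly the same folds. The sketch below makes some assumptions that the notebook does not: a shuffled `StratifiedKFold` with 5 splits and `random_state=42` shared by every model, `max_iter=1000` for logistic regression to give the solver room to converge, and fixed seeds for the two tree-based models. The `models` dictionary is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# One splitter shared by every model, so all scores come from identical folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=40, random_state=0),
}

# Mean accuracy across the 5 folds for each model
mean_scores = {
    name: cross_val_score(model, iris.data, iris.target, cv=cv).mean()
    for name, model in models.items()
}

# Print models from best to worst mean accuracy
for name, score in sorted(mean_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {score:.4f}")

best_model = max(mean_scores, key=mean_scores.get)
print("Best:", best_model)
```

Because every model sees the same train/test splits, differences in the printed means reflect the models rather than the luck of the split.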

In [32]:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

In [33]:
# Suppress Warnings for clean notebook
import warnings
warnings.filterwarnings('ignore')

In [34]:
iris = load_iris()

**Logistic Regression model performance using cross_val_score**

In [35]:
l_scores = cross_val_score(LogisticRegression(), iris.data, iris.target, cv=3)
l_scores

array([0.98, 0.96, 0.98])

In [36]:
np.average(l_scores)

0.9733333333333333
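
When cross_val_score receives an integer `cv` and a classifier, it splits the data with `StratifiedKFold` (no shuffling). It then fits a fresh copy of the estimator on each training split and scores it on the held-out split. A minimal sketch of that loop, reusing `cv=3` from the cell above (the `max_iter=1000` setting is an added assumption to help the solver converge):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(max_iter=1000)

# Same splitter that cross_val_score builds for cv=3 on a classifier
skf = StratifiedKFold(n_splits=3)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    # Default classifier score is accuracy on the held-out fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(model, X, y, cv=3)
print(manual_scores)
print(library_scores)
```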

**SVM model performance using cross_val_score**

In [37]:
s_scores = cross_val_score(SVC(), iris.data, iris.target)
s_scores

array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])

In [38]:
np.average(s_scores)

0.9666666666666666
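
np.average collapses the fold scores into a single number and hides how much they vary. In the SVM run above they range from about 0.933 to 1.0. Reporting the standard deviation alongside the mean makes it easier to judge whether a gap between two models is meaningful. A short sketch for the SVM scores, using the default 5 folds as in the cell above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
s_scores = cross_val_score(SVC(), iris.data, iris.target)  # default 5 folds

# Mean and spread of the per-fold accuracies
print(f"SVM accuracy: {s_scores.mean():.3f} +/- {s_scores.std():.3f}")
```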

**KNN model performance using cross_val_score**

In [39]:
np.average(cross_val_score(KNeighborsClassifier(n_neighbors=3), iris.data, iris.target))

0.9666666666666668
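
The KNN cell fixes `n_neighbors=3`. You could also choose this value with the cross-validation approach the notebook later applies to the random forest. The sketch below tries odd values from 1 to 15, an arbitrary candidate range, and keeps the one with the highest mean 5-fold accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Mean 5-fold accuracy for each candidate neighbourhood size
knn_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), iris.data, iris.target, cv=5).mean()
    for k in range(1, 16, 2)
}

best_k = max(knn_scores, key=knn_scores.get)
print(knn_scores)
print("best n_neighbors:", best_k)
```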

**Decision Tree model performance using cross_val_score**

In [40]:
d_scores = cross_val_score(DecisionTreeClassifier(), iris.data, iris.target)
d_scores

array([0.96666667, 0.96666667, 0.9       , 0.93333333, 1.        ])

In [41]:
np.average(d_scores)

0.9533333333333334

**Random Forest performance using cross_val_score**

In [42]:
r_scores = cross_val_score(RandomForestClassifier(n_estimators=40), iris.data, iris.target)
r_scores

array([0.96666667, 0.96666667, 0.93333333, 0.93333333, 1.        ])

In [43]:
np.average(r_scores)

0.96
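
RandomForestClassifier bootstraps the training rows and samples candidate features at random. Without a fixed `random_state`, the fold scores can therefore change each time the cell runs. Passing a seed makes the result repeatable, which matters when you compare settings whose scores differ by a single sample. A sketch, with the seed 0 chosen arbitrarily:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()

def rf_scores(seed):
    # Same seed -> same bootstrap samples and feature draws in every fold
    model = RandomForestClassifier(n_estimators=40, random_state=seed)
    return cross_val_score(model, iris.data, iris.target, cv=5)

first_run = rf_scores(0)
second_run = rf_scores(0)
print(first_run)
print(second_run)
print("identical:", np.array_equal(first_run, second_run))
```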

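Comparing the averages above, logistic regression scores highest (about 0.973), followed by SVM and KNN (about 0.967), random forest (0.96) and decision tree (about 0.953). Note that logistic regression was evaluated with cv=3 while the other models used the default of 5 folds. The comparison is therefore not on identical splits, and the gaps between models are only a few hundredths.

### Parameter tuning using k-fold cross validation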

In [44]:
scores1 = cross_val_score(RandomForestClassifier(n_estimators=5), iris.data, iris.target, cv=10)
np.average(scores1)

0.96

In [45]:
scores2 = cross_val_score(RandomForestClassifier(n_estimators=20), iris.data, iris.target, cv=10)
np.average(scores2)

0.96

In [46]:
scores3 = cross_val_score(RandomForestClassifier(n_estimators=30), iris.data, iris.target, cv=10)
np.average(scores3)

0.96

In [47]:
scores4 = cross_val_score(RandomForestClassifier(n_estimators=40), iris.data, iris.target, cv=10)
np.average(scores4)

0.9533333333333334

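Here we used cross_val_score to tune the number of trees in the random forest. With 10-fold cross validation, 5, 20 and 30 trees each averaged 0.96, while 40 trees averaged about 0.953. That is a difference of one misclassified sample out of 150. On iris, the number of trees therefore makes little practical difference. Because no random_state was set, rerunning these cells may also shift the scores slightly.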
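
The four nearly identical cells can be replaced by scikit-learn's `GridSearchCV`, which runs the same k-fold evaluation for every candidate value and records the best one. The sketch below assumes the same candidate tree counts and 10 folds as the cells above, plus a fixed `random_state`, which the original cells do not set:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()

# Candidate tree counts taken from the cells above
param_grid = {"n_estimators": [5, 20, 30, 40]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,  # 10-fold CV for each candidate, as in the manual cells
)
search.fit(iris.data, iris.target)

# Mean test accuracy per candidate, then the selected setting
for n, score in zip(param_grid["n_estimators"], search.cv_results_["mean_test_score"]):
    print(n, round(score, 4))
print("best:", search.best_params_, round(search.best_score_, 4))
```

By default `GridSearchCV` also refits the best configuration on the full dataset, so `search.best_estimator_` is ready to use afterwards.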