# K-Fold Cross-Validation

K-fold cross-validation assesses a model by splitting the data into K folds, training the model on K-1 of them,
and evaluating it on the remaining fold. This repeats K times, so each fold serves as the test set exactly once.

Here the per-fold scores of three models (Logistic Regression, Random Forest, SVM) on the digits dataset are averaged,
which gives a more reliable estimate of their performance than a single train-test split.
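
To make the fold mechanics concrete, here is a minimal sketch on six toy samples. The array `X` and the use of plain `KFold` without shuffling are assumptions for illustration only; the notebook itself works on the digits dataset with `StratifiedKFold`.

```python
import numpy as np
from sklearn.model_selection import KFold

# Six toy samples with two features each (illustrative data, not from the notebook)
X = np.arange(12).reshape(6, 2)

# Without shuffling, the test folds are contiguous blocks of indices
kf = KFold(n_splits=3)
splits = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={train_idx} test={test_idx}")
# Fold 1: train=[2, 3, 4, 5] test=[0, 1]
# Fold 2: train=[0, 1, 4, 5] test=[2, 3]
# Fold 3: train=[0, 1, 2, 3] test=[4, 5]
```

Every sample lands in a test fold exactly once, which is why the averaged score uses all of the data for evaluation.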

In [19]:
#Import the necessary libraries

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
import pandas as pd
import numpy as np

In [20]:
#Load the digits dataset

digits=load_digits()

In [21]:
#Create instances for the three different models

model_svc=SVC()
model_logistic=LogisticRegression()
model_RF=RandomForestClassifier()

In [22]:
dir(digits)

['DESCR', 'data', 'feature_names', 'frame', 'images', 'target', 'target_names']

In [23]:
#Define a function to calculate the score of a model

def get_score(model,x_train,x_test,y_train,y_test):
    model.fit(x_train,y_train)
    return model.score(x_test,y_test)

In [24]:
#Perform Stratified K-Fold Cross Validation

from sklearn.model_selection import StratifiedKFold

In [25]:
Kf=StratifiedKFold(n_splits=3)
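
`StratifiedKFold` differs from plain `KFold` in that it tries to keep the class proportions roughly equal across folds. The sketch below is one way to check this on the digits data by counting how many samples of each digit fall into each test fold. The variable names `skf` and `fold_counts` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold

digits = load_digits()
skf = StratifiedKFold(n_splits=3)

fold_counts = []
for _, test_index in skf.split(digits.data, digits.target):
    # Count how many samples of each digit (0-9) land in this test fold
    fold_counts.append(np.bincount(digits.target[test_index], minlength=10))

# One row per fold, one column per digit class
fold_counts = np.array(fold_counts)
print(fold_counts)
```

Each column should hold nearly identical counts, meaning every fold sees about the same mix of digits.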

In [26]:
score_logistics=[]
score_rf=[]
score_svm=[]

In [27]:
#Train and score each model on every fold
for train_index,test_index in Kf.split(digits.data,digits.target):
    x_train,x_test=digits.data[train_index],digits.data[test_index]
    y_train,y_test=digits.target[train_index],digits.target[test_index]
    score_logistics.append((get_score(model_logistic,x_train,x_test,y_train,y_test)))
    score_rf.append((get_score(model_RF,x_train,x_test,y_train,y_test)))
    score_svm.append((get_score(model_svc,x_train,x_test,y_train,y_test)))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

In [28]:
#Print the per-fold scores for each model
print(score_logistics)
print(score_rf)
print(score_svm)

[0.9215358931552587, 0.9415692821368948, 0.9165275459098498]
[0.9398998330550918, 0.9549248747913188, 0.9198664440734557]
[0.9649415692821369, 0.9799666110183639, 0.9649415692821369]
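
The `STOP: TOTAL NO. of ITERATIONS REACHED LIMIT` messages printed during the loop are convergence warnings from `LogisticRegression`. The warning text itself suggests raising `max_iter` or scaling the data. A minimal sketch of doing both follows; the pipeline, the `StandardScaler` step, and the value `max_iter=1000` are assumptions chosen for illustration, not settings from the notebook, and the resulting scores will differ from those shown above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()

# Scale features, then fit logistic regression with a higher iteration cap.
# max_iter=1000 is an illustrative value, not a tuned one.
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# The scaler is re-fit on the training folds only, so no test data leaks into it
scores = cross_val_score(scaled_logistic, digits.data, digits.target, cv=3)
print(scores, scores.mean())
```

Putting the scaler inside the pipeline matters: scaling the whole dataset before splitting would let statistics from the test fold influence training.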


In [29]:
#Function for calculating the average score of each model
def avg(scores):
    return sum(scores)/len(scores)

In [30]:
print("Logistic :",avg(score_logistics))
print("RF       :",avg(score_rf))
print("SVM      :",avg(score_svm))

Logistic : 0.9265442404006677
RF       : 0.9382303839732887
SVM      : 0.9699499165275459
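
An average alone hides how much the scores vary between folds. As a small sketch, the per-fold accuracies printed earlier can be summarized with both mean and standard deviation using NumPy. The dictionary layout and the name `summary` are assumptions for illustration; the numbers are copied from the output above.

```python
import numpy as np

# Per-fold accuracies copied from the printed output of the manual loop
scores = {
    "Logistic": [0.9215358931552587, 0.9415692821368948, 0.9165275459098498],
    "RF": [0.9398998330550918, 0.9549248747913188, 0.9198664440734557],
    "SVM": [0.9649415692821369, 0.9799666110183639, 0.9649415692821369],
}

# Mean and standard deviation across folds for each model
summary = {name: (np.mean(vals), np.std(vals)) for name, vals in scores.items()}

for name, (mean, std) in summary.items():
    print(f"{name:<9}: {mean:.4f} +/- {std:.4f}")
```

With only three folds the spread is a rough indicator at best, but it helps judge whether a gap between two averages is meaningful.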


# Using the cross_val_score API

In [31]:
from sklearn.model_selection import cross_val_score

In [32]:
score_l2=[]
score_rf2=[]
score_svm2=[]

In [33]:
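#cross_val_score runs the split, fit, and score loop internally
#Note: the SVM is scored with cv=10 here, while the other two models use cv=3
score_l2.append((cross_val_score(model_logistic,digits.data,digits.target,cv=3)))
score_rf2.append((cross_val_score(model_RF,digits.data,digits.target,cv=3)))
score_svm2.append((cross_val_score(model_svc,digits.data,digits.target,cv=10)))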

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

In [34]:
print(score_l2)
print(score_rf2)
print(score_svm2)

[array([0.92153589, 0.94156928, 0.91652755])]
[array([0.93322204, 0.96160267, 0.93489149])]
[array([0.94444444, 0.98888889, 0.92777778, 0.96666667, 0.98333333,
       0.98888889, 0.98888889, 0.99441341, 0.96089385, 0.95530726])]


In [35]:
print(np.average(score_l2))
print(np.average(score_rf2))
print(np.average(score_svm2))

0.9265442404006677
0.9432387312186977
0.9699503414028554
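
For a like-for-like comparison, every model should be scored on the same folds. When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds, which is why the logistic regression scores match the manual loop. The sketch below passes one shared `StratifiedKFold` object to each call instead; the `models` dictionary and `random_state=0` are assumptions added for illustration and reproducibility.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

digits = load_digits()

# One splitter shared by all models, so each is evaluated on identical folds
cv = StratifiedKFold(n_splits=3)

# random_state=0 is an illustrative choice to make the forest reproducible
models = {"RF": RandomForestClassifier(random_state=0), "SVM": SVC()}

results = {
    name: cross_val_score(model, digits.data, digits.target, cv=cv)
    for name, model in models.items()
}

for name, fold_scores in results.items():
    print(f"{name}: {fold_scores} mean={fold_scores.mean():.4f}")
```

Without a fixed `random_state`, the random forest scores change from run to run, which explains why its average differs slightly between the two sections above.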


# K-Fold Cross-Validation on the Iris Dataset


In [36]:
from sklearn.datasets import load_iris

In [37]:
iris=load_iris()

In [38]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn import tree

In [39]:
logistic=LogisticRegression()
Random_forest=RandomForestClassifier()
svm=SVC()
decision_tree=tree.DecisionTreeClassifier()

In [40]:
from sklearn.model_selection import cross_val_score

In [41]:
logistic_score=[]
Random_forest_score=[]
svm_score=[]
decision_tree_score=[]

In [42]:
logistic_score.append(cross_val_score(logistic,iris.data,iris.target,cv=3))
Random_forest_score.append(cross_val_score(Random_forest,iris.data,iris.target,cv=3))
svm_score.append(cross_val_score(svm,iris.data,iris.target,cv=3))
decision_tree_score.append(cross_val_score(decision_tree,iris.data,iris.target,cv=3))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [43]:
print("Logistic_Score     :",logistic_score)
print("Random_forest_score:",Random_forest_score)
print("svm_score          :",svm_score)
print("decision_tree_score:",decision_tree_score)

Logistic_Score     : [array([0.98, 0.96, 0.98])]
Random_forest_score: [array([0.98, 0.94, 0.98])]
svm_score          : [array([0.96, 0.98, 0.94])]
decision_tree_score: [array([0.98, 0.94, 0.98])]


In [44]:
print("Avg Logistic Score",np.average(logistic_score))
print("Avg Random_forest_score",np.average(Random_forest_score))
print("Avg svm_score",np.average(svm_score))
print("Avg decision_tree_score",np.average(decision_tree_score))

Avg Logistic Score 0.9733333333333333
Avg Random_forest_score 0.9666666666666667
Avg svm_score 0.96
Avg decision_tree_score 0.9666666666666667
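
The four near-identical blocks above can be condensed into a loop over a dictionary of models. This is a sketch of one possible refactoring, not the notebook's own code: the `models` and `mean_scores` names are hypothetical, and `max_iter=1000` is an assumed value added to avoid the convergence warning.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# max_iter=1000 is an assumption; the notebook uses the default settings
models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(),
    "SVM": SVC(),
    "Decision tree": tree.DecisionTreeClassifier(),
}

mean_scores = {}
for name, model in models.items():
    # 3-fold cross-validation, matching cv=3 used in the cells above
    fold_scores = cross_val_score(model, iris.data, iris.target, cv=3)
    mean_scores[name] = fold_scores.mean()
    print(f"{name:<14}: {fold_scores} mean={mean_scores[name]:.4f}")
```

Adding another model then only requires one more dictionary entry. Scores for the tree-based models may vary slightly between runs because no `random_state` is set.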
