**Cross-validation** is a technique used in machine learning to estimate how well a model will perform on unseen data.

It involves dividing the available data into **multiple folds** (subsets), holding out one of these folds as a validation set, and training the model on the remaining folds.

This process is repeated once per fold, each time using a **different fold** as the validation set, and the per-fold scores are then averaged.

The main purpose of cross-validation is to detect **overfitting**, which occurs when a model fits the training data so closely that it performs poorly on new, unseen data. Averaging over several splits also makes the estimate less dependent on any single train/test split.
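
The procedure described above can be sketched by hand before reaching for any helper function. This is a minimal illustration, not the only way to do it: the names `X_demo`, `y_demo`, and `fold_scores` are hypothetical, and the data is random toy data of the same shape used in this notebook.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hypothetical toy data: 25 samples, 5 features, random binary labels
rng = np.random.default_rng(0)
X_demo = rng.random(size=(25, 5))
y_demo = rng.integers(low=0, high=2, size=25)

fold_scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X_demo):
    model = KNeighborsClassifier()                   # fresh model for every fold
    model.fit(X_demo[train_idx], y_demo[train_idx])  # train on the other 4 folds
    preds = model.predict(X_demo[val_idx])           # predict the held-out fold
    fold_scores.append(accuracy_score(y_demo[val_idx], preds))

print(fold_scores)           # one accuracy per fold
print(np.mean(fold_scores))  # the cross-validated estimate
```

Because the labels here are random, the scores themselves are not meaningful; the point is only the train/validate rotation.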

In [None]:
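import numpy as np

# Toy dataset: 25 samples with 5 random features and random binary labels
x = np.random.random(size=(25, 5))
y = np.random.randint(low=0, high=2, size=25)  # high is exclusive, so labels are 0 or 1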

In [None]:
x

array([[0.70694171, 0.85755582, 0.1668661 , 0.84917179, 0.679359  ],
       [0.05115987, 0.0635987 , 0.61576352, 0.16435614, 0.4448402 ],
       [0.6686221 , 0.45765603, 0.99804345, 0.26090886, 0.50677642],
       [0.64468474, 0.07393022, 0.17446172, 0.69208036, 0.80863243],
       [0.43209156, 0.78362291, 0.91977137, 0.02375418, 0.66648558],
       [0.40851355, 0.70956925, 0.46158207, 0.10671268, 0.48284999],
       [0.50573875, 0.75454238, 0.98629286, 0.01730493, 0.58955159],
       [0.9652771 , 0.78290148, 0.47215798, 0.32194541, 0.16243117],
       [0.889827  , 0.46104248, 0.6029787 , 0.83129087, 0.00115186],
       [0.42267335, 0.88148862, 0.0918693 , 0.81189779, 0.63651084],
       [0.67620134, 0.62218601, 0.1263444 , 0.32291956, 0.52287917],
       [0.97663429, 0.43646731, 0.96823527, 0.77810318, 0.70718614],
       [0.50791908, 0.80473291, 0.96456509, 0.33603748, 0.4022684 ],
       [0.12675379, 0.85662202, 0.55167856, 0.26437727, 0.48203425],
       [0.97411248, 0.92234617, 0.

In [None]:
y

array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1,
       1, 0, 0])

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hold out 20% of the data for testing; try random_state = 1, 3, 4, 6, 7 to see the score change
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
accuracy_score(y_test, y_pred)

0.8

The accuracy is **different** for a **different** random state, because each value of `random_state` puts different samples into the test set.
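
One way to see this dependence on the seed is to loop over several values and record the score for each. The sketch below assumes random toy data (`X_demo`, `y_demo` are hypothetical names) and reuses the seeds mentioned in the code comment above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hypothetical toy data with the same shape as in this notebook
rng = np.random.default_rng(0)
X_demo = rng.random(size=(25, 5))
y_demo = rng.integers(low=0, high=2, size=25)

split_scores = {}
for seed in [1, 3, 4, 6, 7]:
    # Each seed selects a different set of 5 test samples
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    model = KNeighborsClassifier().fit(X_tr, y_tr)
    split_scores[seed] = accuracy_score(y_te, model.predict(X_te))

print(split_scores)  # accuracy per seed
```

With only 5 test samples, each score can only be a multiple of 0.2, which is one reason a single split gives such an unstable estimate.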

To understand how **random_state** **works**, you must first understand how **K-Fold** works.

In [None]:
from sklearn.model_selection import KFold

kf = KFold(n_splits=5)  # 5 folds of 5 samples each
for train_index, test_index in kf.split(x):
    print('train', train_index, 'test', test_index)

train [ 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] test [0 1 2 3 4]
train [ 0  1  2  3  4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] test [5 6 7 8 9]
train [ 0  1  2  3  4  5  6  7  8  9 15 16 17 18 19 20 21 22 23 24] test [10 11 12 13 14]
train [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 20 21 22 23 24] test [15 16 17 18 19]
train [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] test [20 21 22 23 24]


Now **overcome** this problem with **cross_val_score()**, which scores the model on every fold instead of on a single split.

In [None]:
# For a classification problem
from sklearn.model_selection import cross_val_score

knn = KNeighborsClassifier()
cross_val_score(knn, x, y, cv=10, scoring='accuracy')  # one accuracy score per fold

array([1.        , 0.33333333, 1.        , 1.        , 0.66666667,
       1.        , 1.        , 0.5       , 1.        , 1.        ])
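
The per-fold scores are usually summarised by their mean and standard deviation. The following is a small sketch on hypothetical toy data (`X_demo`, `y_demo`); for a classifier with an integer `cv`, scikit-learn uses stratified folds.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data with the same shape as in this notebook
rng = np.random.default_rng(0)
X_demo = rng.random(size=(25, 5))
y_demo = rng.integers(low=0, high=2, size=25)

scores = cross_val_score(KNeighborsClassifier(), X_demo, y_demo,
                         cv=5, scoring='accuracy')

print(scores)         # one accuracy per fold
print(scores.mean())  # overall estimate of performance
print(scores.std())   # how much the estimate varies between folds
```

A large standard deviation relative to the mean suggests the folds are too small to give a stable estimate, which is the case for a 25-sample dataset.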

In [None]:
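# For a regression problem, use scoring='r2'.
# Note: this cell reuses the classifier and the binary labels only to show the scoring option;
# a real regression task would use a regressor (e.g. KNeighborsRegressor) and a continuous target.
from sklearn.model_selection import cross_val_score

knn = KNeighborsClassifier()
cross_val_score(knn, x, y, cv=10, scoring='r2')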



array([ 1. , -2. ,  1. ,  1. , -0.5,  1. ,  1. , -1. ,  1. ,  1. ])
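
For an actual regression task, the same call would normally be made with a regressor and a continuous target. The sketch below assumes a made-up target `y_reg` that is a noisy linear function of the features; the coefficients and the names `X_reg`, `y_reg`, and `knn_reg` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_reg = rng.random(size=(25, 5))
# Hypothetical continuous target: linear in the features plus a little noise
y_reg = X_reg @ np.array([1.0, 2.0, 0.5, -1.0, 3.0]) + rng.normal(scale=0.1, size=25)

knn_reg = KNeighborsRegressor()
r2_scores = cross_val_score(knn_reg, X_reg, y_reg, cv=5, scoring='r2')

print(r2_scores)         # one R^2 value per fold (1.0 is a perfect fit)
print(r2_scores.mean())  # average R^2 across folds
```

R^2 can be negative on a fold when the model predicts worse than simply using that fold's mean target value.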

After **selecting** a **CV** value, fit the model on a train/test split again:

In [None]:
# Same split-and-score steps as before, this time with random_state=3
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=3)
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
accuracy_score(y_test, y_pred)

1.0