# Cross Validation Code Example

In [6]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score


In [7]:
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [8]:
# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names) 

In [9]:
dataset

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width,Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica


#### Example of train-test split 
- Split the data into 2 groups: a training set and a test set

In [10]:
X = dataset.drop(['Class'], axis=1)
y = dataset['Class']

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,random_state=42)  

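**Stratified** - the percentage of samples from each class is preserved in every group.

- Here the data is shuffled and split into 2 groups according to the stated `test_size` (30% of the rows go to the test set).
- The call above does not pass `stratify=y`, so the class proportions of the two groups are not guaranteed to match the full dataset. Pass `stratify=y` to `train_test_split` to get a stratified split.
- The shuffling can be seen from the row indices in the output below.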
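A quick way to check the class balance of a split is to count the labels in each test set. The sketch below is only an illustration: it assumes `load_iris()` from scikit-learn as a stand-in for the CSV download used in this notebook, and the `*_demo`, `y_test_plain` and `y_test_strat` names are made up for the comparison.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris() stands in for the CSV read from the UCI URL.
X_demo, y_demo = load_iris(return_X_y=True)

# Default call: rows are shuffled, class proportions are left to chance.
_, _, _, y_test_plain = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

# stratify=y_demo keeps the class ratio of the full data in both groups.
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

plain_counts = Counter(y_test_plain.tolist())
strat_counts = Counter(y_test_strat.tolist())
print("plain test set     :", sorted(plain_counts.items()))
print("stratified test set:", sorted(strat_counts.items()))
```

Because Iris has 50 rows per class, the stratified test set of 45 rows holds 15 rows of each class, while the counts of the plain split depend on the random seed.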


In [12]:
X_train.head(5)

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width
81,5.5,2.4,3.7,1.0
133,6.3,2.8,5.1,1.5
137,6.4,3.1,5.5,1.8
75,6.6,3.0,4.4,1.4
109,7.2,3.6,6.1,2.5


In [13]:
X_test.head(5)

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width
73,6.1,2.8,4.7,1.2
18,5.7,3.8,1.7,0.3
118,7.7,2.6,6.9,2.3
78,6.0,2.9,4.5,1.5
76,6.8,2.8,4.8,1.4


#### Running Machine Learning Model 

In [14]:
classifier = KNeighborsClassifier(n_neighbors=1)  
classifier.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=1)

In [15]:
y_pred = classifier.predict(X_test)  


In [16]:
acc = accuracy_score(y_test, y_pred)
acc

1.0

# Example of k-Fold Cross Validation 

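Using the same classifier as above, `cross_val_score` splits the full dataset into `cv` folds, trains on all folds but one, and tests on the held-out fold, repeating until every fold has served as the test set once. For a classifier and an integer `cv`, scikit-learn builds the folds in a stratified way.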

In [17]:
scores = cross_val_score(classifier, X, y, cv=5)

In [18]:
print("Cross Validation Scores:", scores)
print("Average cross validation score",scores.mean())

Cross Validation Scores: [0.96666667 0.96666667 0.93333333 0.93333333 1.        ]
Average cross validation score 0.96
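To make the procedure behind `cross_val_score` more concrete, the loop below reproduces it by hand. It is a minimal sketch, assuming `load_iris()` as a stand-in for the CSV data and using illustrative names (`knn`, `manual_scores`, `library_scores`); it relies on the documented behaviour that an integer `cv` with a classifier uses `StratifiedKFold` without shuffling.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load_iris() stands in for the CSV read from the UCI URL.
X_demo, y_demo = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=1)

# Same splitter that cv=5 selects for a classifier (no shuffling).
skf = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    model = clone(knn)  # fresh, unfitted copy for every fold
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # score() of a classifier returns the accuracy on the held-out fold
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(knn, X_demo, y_demo, cv=5)
print("manual :", manual_scores)
print("library:", library_scores)
```

Both arrays should agree, since the manual loop uses the same folds and the same accuracy metric as the library call.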


#### Exercise 1
- Test the model after reducing `cv` to `1`
- Test the model after increasing `cv` to `100`
- Test the model after increasing `cv` to `50`
- Explain what happens in each case

In [19]:
scores_1 = cross_val_score(classifier,X,y, cv= 1)

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1.

In [20]:
scores_2 = cross_val_score(classifier,X,y, cv= 100)

ValueError: n_splits=100 cannot be greater than the number of members in each class.

In [23]:
scores_3 = cross_val_score(classifier,X,y, cv= 50)


In [24]:
print("Cross Validation Scores:", scores_3)
print("Average cross validation score",scores_3.mean())

Cross Validation Scores: [1.         1.         1.         1.         1.         1.
 0.66666667 1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         0.66666667 0.66666667 1.         0.66666667 1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.33333333 1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         1.        ]
Average cross validation score 0.96
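The three results can be traced back to the fold sizes. With `cv=1` there is no held-out fold at all, so scikit-learn rejects it. With `cv=100` the stratified splitter would need more folds than there are members in each class (50 per class here). With `cv=50` every test fold holds only 3 rows, so each fold score can only be 0, 1/3, 2/3 or 1. The sketch below checks this; it assumes `load_iris()` as a stand-in for the CSV data, and `fold_test_sizes` is a hypothetical helper written for this illustration.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

# Assumption: load_iris() stands in for the CSV read from the UCI URL.
X_demo, y_demo = load_iris(return_X_y=True)

class_counts = Counter(y_demo.tolist())
smallest_class = min(class_counts.values())
print("rows per class:", dict(class_counts))

def fold_test_sizes(n_splits):
    """Hypothetical helper: test-fold sizes for a stratified k-fold split."""
    skf = StratifiedKFold(n_splits=n_splits)
    return [len(test_idx) for _, test_idx in skf.split(X_demo, y_demo)]

sizes_5 = fold_test_sizes(5)    # 5 folds of 30 rows
sizes_50 = fold_test_sizes(50)  # 50 folds of 3 rows (one per class)
print("cv=5  test fold sizes:", set(sizes_5))
print("cv=50 test fold sizes:", set(sizes_50))

# cv=1 and cv=100 are rejected with a ValueError.
failures = {}
for bad_cv in (1, 100):
    try:
        fold_test_sizes(bad_cv)
    except ValueError as err:
        failures[bad_cv] = str(err)
        print(f"cv={bad_cv}: {err}")
```

Small test folds make individual scores very coarse, which is why single folds drop to 0.67 or 0.33 even though the average stays close to the 5-fold result.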


# Simple video explaining Cross Validation
- [K-Fold Cross Validation](https://www.youtube.com/watch?v=TIgfjmp-4BA&ab_channel=Udacity)

### Reference:
[Scikit-Learn Cross Validation](https://scikit-learn.org/stable/modules/cross_validation.html)