# K-fold cross validation

Implement a random k-fold cross validation algorithm from scratch.

Your algorithm should:
- load the iris dataset and split its columns into features and target
- split the dataset into k folds to perform cross validation
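
As a rough illustration of the splitting step, the sketch below partitions shuffled sample indices into k disjoint folds. The helper name `fold_indices` and its `seed` parameter are hypothetical, chosen only for this example. It assumes NumPy's `np.array_split`, which allows the sample count not to be a multiple of k.

```python
import numpy as np

def fold_indices(n_samples, k, seed=0):
    """Return k disjoint index arrays that together cover range(n_samples).

    Hypothetical helper for illustration; not part of the exercise template.
    """
    rng = np.random.default_rng(seed)
    permuted = rng.permutation(n_samples)  # random order of 0..n_samples-1
    # array_split tolerates n_samples not divisible by k:
    # the first n_samples % k folds receive one extra index.
    return np.array_split(permuted, k)

folds = fold_indices(150, 4)
print([len(f) for f in folds])  # [38, 38, 37, 37]
```

With 150 samples and k = 4, the two leftover samples go to the first two folds, so every sample still lands in exactly one fold.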

You can use the code below as a starting point, or implement the algorithm yourself from scratch.



In [11]:
# we will implement a k-fold cross validation from scratch
# we will use the iris dataset

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

# load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

def k_fold_cross_validation(X, y, k, model):
    # X is the feature matrix
    # y is the target vector
    # k is the number of folds
    # model is any estimator that exposes fit and predict
    # we will use accuracy as the metric
    # and return the mean accuracy over the k folds
    
    #################################
    # shuffle the data and create X and y ready to be used to fit the model.
     
    # In the context of machine learning, it is essential to shuffle data,
    # to ensure that the model does not learn any order or sequence biases that may exist in the dataset.
    X, y = shuffle(X, y, random_state=42)
    # split the shuffled data into k folds, so that folds_X[i] and folds_y[i] hold the i-th fold
    # np.array_split also handles len(X) not divisible by k, so no sample is dropped
    folds_X = np.array_split(X, k)
    folds_y = np.array_split(y, k)
    #################################
    
    # we need a for loop that iterates over the folds so that each fold is used as the test set exactly once
    # inside this loop we fit the model and compute its accuracy for the current fold
    # X_train, y_train, X_test, y_test are built on each iteration from the folds created above
    
    accuracies = []
    #######Your for loop here
    for i in range(k):
        # create the train and test sets (X_train, y_train, X_test, y_test) for this fold
        X_test = folds_X[i]
        y_test = folds_y[i]
        
        # concatenate remaining folds for training
        X_train = np.concatenate(folds_X[:i] + folds_X[i+1:])
        y_train = np.concatenate(folds_y[:i] + folds_y[i+1:])
        
        # we will fit the model on the train data
        model.fit(X_train, y_train)
        
        # we will predict on the test data
        y_pred = model.predict(X_test)
        
        # we will compute the accuracy
        accuracy = np.mean(y_pred == y_test)
        
        # we will append the accuracy to the list
        accuracies.append(accuracy)
    
    # we will return the mean accuracy
    return np.mean(accuracies)
    


In [12]:
#You can use the code below to test your function

#import the random forest model
from sklearn.ensemble import RandomForestClassifier

# we will use the random forest model
model = RandomForestClassifier()

# we will use the k_fold_cross_validation function
k_fold_cross_validation(X, y, 5, model)

0.9600000000000002
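
As a sanity check, the from-scratch result can be compared with scikit-learn's own cross-validation utilities. The snippet below is a minimal sketch, assuming `KFold` and `cross_val_score` from `sklearn.model_selection`. The `random_state=42` values are assumptions made here for reproducibility and are not part of the original cell. Because the shuffling differs from `sklearn.utils.shuffle`, the folds are not identical to the ones above, so the mean accuracy may differ slightly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# shuffle=True mirrors the shuffling step of the from-scratch version
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(random_state=42)

# one accuracy value per fold; cross_val_score fits a fresh clone of the model each time
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

If the two approaches report mean accuracies that are far apart, that usually points to a bug in how the folds are built or recombined.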