# K-Folds Validation
In this notebook, we explore how to make efficient use of small datasets with **k-folds validation**. K-folds validation splits a training dataset into k smaller subsets called folds. One fold is reserved as the validation dataset while the model trains on the remaining folds, and the process repeats until every fold has served as the validation dataset once.
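To make the fold rotation concrete, here is a minimal sketch using Scikit-Learn's `KFold` on ten toy samples. The toy array, the choice of `n_splits=5`, and the `random_state` value are assumptions made for illustration only; they are not settings taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples, used only to illustrate how the indices are partitioned
samples = np.arange(10)

# n_splits=5 is an illustrative choice; shuffle + random_state make the folds reproducible
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Track how often each sample lands in a validation fold
validation_counts = np.zeros(len(samples), dtype=int)
fold_sizes = []

for fold_number, (train_idx, val_idx) in enumerate(kfold.split(samples), start=1):
    validation_counts[val_idx] += 1
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f'Fold {fold_number}: train={train_idx}, validation={val_idx}')
```

Each of the five folds trains on eight samples and validates on two, and every sample is used for validation exactly once across the folds.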

## Project Setup

In [29]:
# Importing the necessary Python libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [2]:
# Getting the Iris dataset from Scikit-Learn
iris = datasets.load_iris()

In [15]:
# Loading the target values (y) and the feature columns (X) as Pandas DataFrames
X = pd.DataFrame(data = iris['data'], columns = iris['feature_names'])
y = pd.DataFrame(data = iris['target'], columns = ['target'])

## Performing a Typical Split
Before we jump into k-folds validation, let's do a quick refresher on how we typically split a dataset using a traditional `train_test_split`. We'll later contrast this method with k-folds validation.

In [18]:
# Performing a train_test_split on the dataset
X_train, X_val, y_train, y_val = train_test_split(X, y)
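With its default arguments, `train_test_split` shuffles the rows and holds out 25% of them, so the exact split (and therefore the metrics) can change from run to run. The sketch below shows one way to make the split reproducible and class-balanced. The `_demo` variable names, `random_state=42`, and the use of `stratify` are assumptions added for illustration and are not part of the original cells.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Reloading Iris as plain arrays so this sketch is self-contained
iris = datasets.load_iris()
X_demo, y_demo = iris['data'], iris['target']

# test_size=0.25 matches the default; random_state fixes the shuffle,
# and stratify keeps the class proportions similar in both subsets
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print(X_train_demo.shape, X_val_demo.shape)
```

On the 150-row Iris dataset this leaves 112 rows for training and 38 for validation.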

In [21]:
# Instantiating a RandomForestClassifier model
rfc_model = RandomForestClassifier()

In [22]:
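# Fitting the RandomForestClassifier model to the X_train and y_train datasets
# (y_train is a single-column DataFrame, so we flatten it to the 1-D array that fit expects)
rfc_model.fit(X_train, y_train.values.ravel())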

RandomForestClassifier()

In [23]:
# Generating predictions for the validation dataset
val_preds = rfc_model.predict(X_val)

In [28]:
# Generating validation metrics by comparing the predictions (val_preds) to the actuals (y_val)
val_accuracy = accuracy_score(y_val, val_preds)
val_confusion_matrix = confusion_matrix(y_val, val_preds)

In [30]:
# Printing out the validation metrics
print(f'Accuracy Score: {val_accuracy}')
print(f'Confusion Matrix: \n{val_confusion_matrix}')

Accuracy Score: 0.9210526315789473
Confusion Matrix: 
[[14  0  0]
 [ 0  7  0]
 [ 0  3 14]]
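The accuracy score can be recovered directly from the confusion matrix. In Scikit-Learn's convention, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The short check below re-enters the matrix printed above by hand; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true class, columns = predicted class)
cm = np.array([[14, 0, 0],
               [0, 7, 0],
               [0, 3, 14]])

correct = np.trace(cm)   # correct predictions sit on the diagonal
total = cm.sum()         # total number of validation samples
accuracy_from_cm = correct / total

print(f'{correct} correct out of {total}: {accuracy_from_cm}')
```

Here 35 of the 38 validation samples are classified correctly, which matches the accuracy score above. The three errors are all samples of the third class predicted as the second class.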


## Training with K-Folds Validation
Now that we have trained a basic model using a traditional `train_test_split`, we are ready to train one using k-folds validation.