# K-Fold cross-validation using HARDy

To cross-validate a Convolutional Neural Network (CNN), `HARDy` exposes arguments that enable k-fold cross-validation. The data-splitting scheme for k-fold cross-validation is shown in the image below:

<img src="../images/kfoldvalidation.gif" />

For each fold, the data is split differently. The red box marks the validation set, which changes with each k-fold iteration. The model is therefore trained and validated on a different partition of the data in every iteration, which gives the user a more objective estimate of its performance.
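
The rotation of the validation set can be sketched in plain Python. This illustrates the general k-fold scheme only and is not `HARDy`'s internal implementation; the function name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        # The validation window slides forward; everything else is training data
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for fold, (train_idx, val_idx) in enumerate(kfold_indices(10, 5), start=1):
    print(f"fold {fold}: validation={val_idx}, training={train_idx}")
```

Every sample appears in exactly one validation set, so each fold validates on data that was excluded from training in that iteration.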

As a further check, after reporting the average accuracy across the k folds, `HARDy` evaluates the `CNN` on a held-out test set that the model never saw during training or validation.
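
A minimal sketch of this two-stage idea, assuming the test set is held out before any folds are formed. The helper names `split_test_set` and `average_accuracy` are hypothetical, and the per-fold accuracies are placeholder numbers, not results from `HARDy`.

```python
import random
from statistics import mean


def split_test_set(samples, n_test, seed=5):
    """Shuffle a copy of samples and hold out n_test of them as the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # The first n_test shuffled items become the test set; the rest feed k-fold
    return shuffled[n_test:], shuffled[:n_test]


def average_accuracy(fold_accuracies):
    """Return the mean validation accuracy over all folds."""
    return mean(fold_accuracies)


remaining, test_set = split_test_set(range(100), n_test=20)
# Placeholder per-fold validation accuracies, made up for illustration only
fold_accuracies = [0.90, 0.92, 0.88, 0.91, 0.89]

print(f"{len(remaining)} samples for k-fold, {len(test_set)} held out for testing")
print(f"average k-fold accuracy: {average_accuracy(fold_accuracies):.2f}")
```

Because the test samples never enter any fold, the final test accuracy is independent of the k-fold average.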

~~~
The k-fold cross-validation is available for running on `CNN` only
~~~

To use k-fold cross-validation in `HARDy`, the following steps are required:

## Step 1: Defining the path variables

Defining the path to the `.csv` data files

~~~
raw_data_path = 'path/to/raw/data/'
~~~

Defining the CNN configuration path. This configuration will be validated using k-folds

~~~
classifier_config_path = './configuration/forCNN/'
~~~

Defining the transformation configuration path

~~~
tform_config_path = './configuration/forTransformation/tform_config.yaml'
~~~

## Step 2: Defining attributes for transformations

~~~
scale = 0.2
target_size = (100, 100)
iterator_mode = 'arrays'
classes = ['class1', 'class2', 'class3']
n_threads = 1
~~~

Details on setting these parameters are available on the `Getting Started` page.

## Step 3: Defining attributes for k-fold cross-validation

The following parameters control k-fold cross-validation:

~~~
k_fold = True             # enable k-fold cross-validation
k = 5                     # number of folds to use
classifier_mode = 'cnn'   # run the CNN directly, without the hyperparameter tuner
~~~
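
Since k-fold cross-validation is available for the CNN only, these settings can be checked before a long run is started. The sketch below is a hypothetical helper, not part of `HARDy`; it assumes the flag is spelled `k_fold` (a valid Python identifier) and that at least two folds are wanted.

```python
def check_kfold_args(k_fold, k, classifier):
    """Raise ValueError if the k-fold settings are inconsistent."""
    if not k_fold:
        return  # nothing to check when k-fold is disabled
    if classifier != 'cnn':
        raise ValueError("k-fold cross-validation requires classifier='cnn'")
    if not isinstance(k, int) or k < 2:
        raise ValueError("k must be an integer of at least 2")


# Passes silently for the settings used in this guide
check_kfold_args(k_fold=True, k=5, classifier='cnn')
```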

## Step 4: Running `HARDy`

The following script can then be executed to run the k-fold cross-validation:

~~~
run.hardy_main(raw_data_path, tform_config_path, classifier_config_path, batch_size=64,
               scale=0.2, num_test_files_class=750, target_size=(100, 100), iterator_mode='arrays',
               classifier='cnn', n_threads=1, classes=['class_1', 'class_2', 'class_3'],
               k_fold=True, k=5, project_name='my_project_name')
~~~

This runs the k-fold cross-validation and reports the average accuracy across the folds.

The run also produces a report comprising the trained model, a summary of the model evaluation on the test set, and the hyperparameter space. These reports are created under `raw_data_path/project_name/transformation_name`.
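
The report location can be assembled from the same variables used in the run. This is a small sketch; `'my_transformation'` is a hypothetical placeholder for whatever transformation name appears in your transformation configuration.

```python
import os

raw_data_path = 'path/to/raw/data/'
project_name = 'my_project_name'
transformation_name = 'my_transformation'  # hypothetical placeholder

# Join the three components in the order described above
report_dir = os.path.join(raw_data_path, project_name, transformation_name)
print(report_dir)

# List the report files only if the run has already created the folder
if os.path.isdir(report_dir):
    for name in sorted(os.listdir(report_dir)):
        print(name)
```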

___