## An Empirical Study on the Effectiveness of Transferring Software Performance Prediction Machine Learning Models Between Compile-Time Configurations via Linear Transformation

### By Harry Spiers, supervised by Dr. Tao Chen

Contained within this notebook is the code to run the experiments laid out in the project report. The cells are laid out in the following format:

1. Initialisation cell - Contains code to import all required modules, initialise results tables, and select datasets
1. Research question 1 cell
1. Research question 2 cell
1. Research question 3 cell

To run the experiments, run the initialisation cell first and then run the research question cells in any order using the run button (see below). Output for the experiments will be written to `<project-root>/results`.

<img src="https://i.imgur.com/Hlv9pNb.png" align="left"><br><br>

Note - If the code won't run, then ensure that you have installed the virtual environment correctly. You should see a little .venv string like this one in the top right corner of the screen:

<img src="https://i.imgur.com/oI0FRz6.png" align="left">

In [None]:
import os

import pandas as pd

import constants
from data import Dataset, split_transfer_dataset, get_random_datasets
from learner import PredictorLearner, TransferLearner
from analysis import *

# define training set sizes to be used in experiments
TRAINING_SET_SIZES = [0.2, 0.4, 0.6, 0.8]
SUBJECT_SYSTEMS = constants.SUBJECT_SYSTEMS
REPETITIONS = constants.EXPERIMENT_REPS

print('Initialising results tables')
# initialise results columns
rq1_results_fields = constants.RESULTS_DATAFRAME_COLUMN_NAMES[0]
rq1_results = pd.DataFrame(columns=rq1_results_fields)
rq2_results_fields = constants.RESULTS_DATAFRAME_COLUMN_NAMES[1]
rq2_results = pd.DataFrame(columns=rq2_results_fields)
rq3_results_fields = constants.RESULTS_DATAFRAME_COLUMN_NAMES[2]
rq3_results = pd.DataFrame(columns=rq3_results_fields)
print('Results tables initialised\n')

print('Selecting random datasets')
# randomly select a source and target dataset for each subject system
datasets = get_random_datasets(reproducibility_mode=True)
print('Datasets selected')

## Research Question 1

### How much accuracy is lost when transferring a performance prediction model between compile-time configurations via linear transformation compared to training a new model?<br/>

**Null hypothesis – It will be more accurate to train a new model for each compile-time configuration**

**Alternative hypothesis – It will be more accurate to train a transfer model for each compile-time configuration**<br/><br/>

This can be thought of as the main question that we wish to answer with this project. We’d like to know how the accuracy compares if you make use of transfer learning between compile-time configurations rather than learning a whole new model. This is important to know as, if there is a big loss in accuracy, then the transfer learning approach may be infeasible.

For this research question, we record the following measurements:

| Variable name             	| Measurement                                                             	|
|---------------------------	|-------------------------------------------------------------------------	|
| mape_accuracy_pred_no_cv  	| MAPE accuracy of predictor approach without hyperparameter optimisation 	|
| mape_accuracy_trans_no_cv 	| MAPE accuracy of transfer approach without hyperparameter optimisation  	|
| mape_accuracy_pred_cv     	| MAPE accuracy of predictor approach with hyperparameter optimisation    	|
| mape_accuracy_trans_cv    	| MAPE accuracy of transfer approach with hyperparameter optimisation     	|
| mse_accuracy_pred_no_cv   	| MSE accuracy of predictor approach without hyperparameter optimisation  	|
| mse_accuracy_trans_no_cv  	| MSE accuracy of transfer approach without hyperparameter optimisation   	|
| mse_accuracy_pred_cv      	| MSE accuracy of predictor approach with hyperparameter optimisation     	|
| mse_accuracy_trans_cv     	| MSE accuracy of transfer approach with hyperparameter optimisation      	|

We calculate the p-value and effect size between `mape_accuracy_pred_cv` and `mape_accuracy_trans_cv`
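
For readers unfamiliar with the effect size used here, the sketch below shows the standard definition of Cliff's delta in plain Python. It is an illustration only: the notebook's own `get_cliffs_delta` and `get_wilcoxon_p_value` are defined in `analysis.py` and may differ in detail, and the sample values below are made up. (A Wilcoxon signed-rank test is available in SciPy as `scipy.stats.wilcoxon`, which is one common way to obtain the p-value.)

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical MAPE values for five repetitions (illustrative only).
pred_mape = [4.1, 3.8, 5.0, 4.4, 4.7]
trans_mape = [5.2, 5.9, 4.9, 6.1, 5.5]

delta = cliffs_delta(pred_mape, trans_mape)
print("Cliff's delta:", delta)
```

The value ranges from -1 to 1. A value near -1 means the first sample is almost always smaller than the second, a value near 1 means the opposite, and a value near 0 means the two samples overlap heavily.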

In [None]:
### RQ1

predictor = PredictorLearner()
transferrer = TransferLearner()

for subject_system in SUBJECT_SYSTEMS:
    print('***********************************\nSubject system:', subject_system)
    
    # delete the results of previous subject system's test before entering next subject system's results
    mape_accuracy_pred_no_cv = []
    mape_accuracy_trans_no_cv = []
    mape_accuracy_pred_cv = []
    mape_accuracy_trans_cv = []
    mse_accuracy_pred_no_cv = []
    mse_accuracy_trans_no_cv = []
    mse_accuracy_pred_cv = []
    mse_accuracy_trans_cv = []

    # get the randomly selected target and source datasets for the current subject system
    subject_system_datasets = datasets[subject_system]

    for rep in range(REPETITIONS):
        print('Experiment repetition', rep+1)
        # grab a target and source dataset for this experiment repetition
        src_dataset, tgt_dataset = subject_system_datasets[rep]
        print('Source dataset path:', src_dataset.get_csv_path())
        print('Target dataset path:', tgt_dataset.get_csv_path(),'\n')

        # get optimised predictor model using hyperparameter optimisation
        X_train, X_validate, y_train, y_validate = tgt_dataset.get_split_dataset()
        optimised_model = predictor.get_optimal_params(X_validate, y_validate)

        # get accuracy of optimised model for predictor learner
        predictor.fit(X_train, y_train, premade_model=optimised_model)
        mape_accuracy_pred_cv.append(predictor.get_error(X_train, y_train, measure='mape'))
        mse_accuracy_pred_cv.append(predictor.get_error(X_train, y_train, measure='mse'))

        # get accuracy of non-optimised model for predictor learner
        predictor.fit(X_train, y_train)
        mape_accuracy_pred_no_cv.append(predictor.get_error(X_train, y_train, measure='mape'))
        mse_accuracy_pred_no_cv.append(predictor.get_error(X_train, y_train, measure='mse'))


        # get optimised transfer model using hyperparameter optimisation
        X_train, X_validate, y_train, y_validate = split_transfer_dataset(src_dataset, tgt_dataset)
        optimised_model = transferrer.get_optimal_params(X_validate, y_validate)

        # get accuracy of optimised model for transfer learner
        transferrer.fit(X_train, y_train, premade_model=optimised_model)
        mape_accuracy_trans_cv.append(transferrer.get_error(X_train, y_train, measure='mape'))
        mse_accuracy_trans_cv.append(transferrer.get_error(X_train, y_train, measure='mse'))
        make_transfer_model_scatter_plot(transferrer.get_model(), 
                                         X_train, 
                                         y_train, 
                                         1, 
                                         mape_accuracy_trans_cv[-1], 
                                         rep+1, 
                                         subject_system)

        # get accuracy of non-optimised model for transfer learner
        transferrer.fit(X_train, y_train)
        mape_accuracy_trans_no_cv.append(transferrer.get_error(X_train, y_train, measure='mape'))
        mse_accuracy_trans_no_cv.append(transferrer.get_error(X_train, y_train, measure='mse'))


    rq1_results['mse_accuracy_tgt_no_cv'] = mse_accuracy_pred_no_cv
    rq1_results['mape_accuracy_tgt_no_cv'] = mape_accuracy_pred_no_cv
    rq1_results['mse_accuracy_tgt_cv'] = mse_accuracy_pred_cv
    rq1_results['mape_accuracy_tgt_cv'] = mape_accuracy_pred_cv
    rq1_results['mse_accuracy_trans_no_cv'] = mse_accuracy_trans_no_cv
    rq1_results['mape_accuracy_trans_no_cv'] = mape_accuracy_trans_no_cv
    rq1_results['mse_accuracy_trans_cv'] = mse_accuracy_trans_cv
    rq1_results['mape_accuracy_trans_cv'] = mape_accuracy_trans_cv
    
    p_value = [get_wilcoxon_p_value(mape_accuracy_pred_cv, mape_accuracy_trans_cv)]
    cliffs_delta = [get_cliffs_delta(mape_accuracy_pred_cv, mape_accuracy_trans_cv)]
    print('Wilcoxon p value:', p_value)
    print('Cliff\'s delta:', cliffs_delta, '\n')

    save_results(rq1_results, subject_system.lower(), p_value, cliffs_delta, 1)

    # reset dataframe after results are saved to csv
    rq1_results = pd.DataFrame(columns=rq1_results.columns)
    
make_box_plots(1)
write_mean_min_max(1)

## Research Question 2

### How does the size of the training dataset impact the accuracy of our transfer model?<br/>

**Null hypothesis – The size of the training dataset will have no effect on the accuracy of the transfer model**

**Alternative hypothesis – The size of the dataset will have an effect on the transfer model’s accuracy**<br/><br/>

Research has shown that the accuracy of performance prediction models generally increases as the size of the training set increases [1, 4]. Valov et al. [14] also found that the accuracy of their transfer model, which transfers performance prediction models between hardware environments, increased with the training set size. As such, I believe that it is worth testing how much my transfer model’s accuracy depends on the training set size. It is also important to test how much data is required to attain an acceptable level of accuracy (if such accuracy is possible).

For this research question, we record the following measurements:

| Variable name             	| Measurement                                                            	|
|---------------------------	|------------------------------------------------------------------------	|
| mape_accuracy_pred_20pct  	| MAPE accuracy of predictor approach with 20% training set size         	|
| mape_accuracy_pred_40pct  	| MAPE accuracy of predictor approach with 40% training set size 	        |
| mape_accuracy_pred_60pct  	| MAPE accuracy of predictor approach with 60% training set size   	        |
| mape_accuracy_pred_80pct  	| MAPE accuracy of predictor approach with 80% training set size    	    |
| mape_accuracy_trans_20pct 	| MAPE accuracy of transfer approach with 20% training set size 	        |
| mape_accuracy_trans_40pct 	| MAPE accuracy of transfer approach with 40% training set size  	        |
| mape_accuracy_trans_60pct 	| MAPE accuracy of transfer approach with 60% training set size    	        |
| mape_accuracy_trans_80pct 	| MAPE accuracy of transfer approach with 80% training set size     	    |
| mse_accuracy_pred_20pct   	| MSE accuracy of predictor approach with 20% training set size             |
| mse_accuracy_pred_40pct   	| MSE accuracy of predictor approach with 40% training set size             |
| mse_accuracy_pred_60pct   	| MSE accuracy of predictor approach with 60% training set size             |
| mse_accuracy_pred_80pct   	| MSE accuracy of predictor approach with 80% training set size             |
| mse_accuracy_trans_20pct  	| MSE accuracy of transfer approach with 20% training set size              |
| mse_accuracy_trans_40pct  	| MSE accuracy of transfer approach with 40% training set size              |
| mse_accuracy_trans_60pct  	| MSE accuracy of transfer approach with 60% training set size              |
| mse_accuracy_trans_80pct  	| MSE accuracy of transfer approach with 80% training set size              |

We calculate the p-value and effect size between `mape_accuracy_trans_80pct` and `mape_accuracy_trans_20pct`

In [None]:
### RQ2

transferrer = TransferLearner()
predictor = PredictorLearner()

for subject_system in SUBJECT_SYSTEMS:
    print('***********************************\nSubject system:', subject_system)    
    # delete the results of previous subject system's test before entering next subject system's results
    mape_accuracy_pred_20pct = []
    mape_accuracy_pred_40pct = []
    mape_accuracy_pred_60pct = []
    mape_accuracy_pred_80pct = []
    mape_accuracy_trans_20pct = []
    mape_accuracy_trans_40pct = []
    mape_accuracy_trans_60pct = []
    mape_accuracy_trans_80pct = []
    mse_accuracy_pred_20pct = []
    mse_accuracy_pred_40pct = []
    mse_accuracy_pred_60pct = []
    mse_accuracy_pred_80pct = []
    mse_accuracy_trans_20pct = []
    mse_accuracy_trans_40pct = []
    mse_accuracy_trans_60pct = []
    mse_accuracy_trans_80pct = []

    # get the randomly selected target and source datasets for the current subject system
    subject_system_datasets = datasets[subject_system]

    for rep in range(REPETITIONS):
        print('Experiment repetition', rep+1)
        # grab a target and source dataset for this experiment repetition
        src_dataset, tgt_dataset = subject_system_datasets[rep]
        print('Source dataset path:', src_dataset.get_csv_path())
        print('Target dataset path:', tgt_dataset.get_csv_path(),'\n')
        
        X_train, X_validate, y_train, y_validate = tgt_dataset.get_split_dataset()
        optimised_pred_model = predictor.get_optimal_params(X_validate, y_validate)
        
        X_train, X_validate, y_train, y_validate = split_transfer_dataset(src_dataset, tgt_dataset)
        optimised_trans_model = transferrer.get_optimal_params(X_validate, y_validate)

        for train_size in TRAINING_SET_SIZES:
            X_train, X_validate, y_train, y_validate = tgt_dataset.get_split_dataset(train_size=train_size)

            # get accuracy of predictor model for current training set size
            predictor.fit(X_train, y_train, premade_model=optimised_pred_model)
            mape_accuracy_pred = predictor.get_error(X_train, y_train, measure='mape')
            mse_accuracy_pred = predictor.get_error(X_train, y_train, measure='mse')


            X_train, X_validate, y_train, y_validate = split_transfer_dataset(src_dataset, 
                                                      tgt_dataset, 
                                                      train_size=train_size)
            # get accuracy of transfer model for each training set size
            transferrer.fit(X_train, y_train, premade_model=optimised_trans_model)
            mape_accuracy_trans = transferrer.get_error(X_train, y_train, measure='mape')
            mse_accuracy_trans = transferrer.get_error(X_train, y_train, measure='mse')


            # record accuracy in appropriate results column
            if train_size == 0.2:
                mape_accuracy_pred_20pct.append(mape_accuracy_pred)
                mape_accuracy_trans_20pct.append(mape_accuracy_trans)
                mse_accuracy_pred_20pct.append(mse_accuracy_pred)
                mse_accuracy_trans_20pct.append(mse_accuracy_trans)
                make_transfer_model_scatter_plot(transferrer.get_model(), 
                                 X_train, 
                                 y_train, 
                                 2, 
                                 mape_accuracy_trans_20pct[-1], 
                                 rep+1, 
                                 subject_system,
                                 dataset_size=train_size*100)
            elif train_size == 0.4:
                mape_accuracy_pred_40pct.append(mape_accuracy_pred)
                mape_accuracy_trans_40pct.append(mape_accuracy_trans)
                mse_accuracy_pred_40pct.append(mse_accuracy_pred)
                mse_accuracy_trans_40pct.append(mse_accuracy_trans)
                make_transfer_model_scatter_plot(transferrer.get_model(), 
                                 X_train, 
                                 y_train, 
                                 2, 
                                 mape_accuracy_trans_40pct[-1], 
                                 rep+1, 
                                 subject_system,
                                 dataset_size=train_size*100)
            elif train_size == 0.6:
                mape_accuracy_pred_60pct.append(mape_accuracy_pred)
                mape_accuracy_trans_60pct.append(mape_accuracy_trans)
                mse_accuracy_pred_60pct.append(mse_accuracy_pred)
                mse_accuracy_trans_60pct.append(mse_accuracy_trans)
                make_transfer_model_scatter_plot(transferrer.get_model(), 
                                 X_train, 
                                 y_train, 
                                 2, 
                                 mape_accuracy_trans_60pct[-1], 
                                 rep+1, 
                                 subject_system,
                                 dataset_size=train_size*100)
            elif train_size == 0.8:
                mape_accuracy_pred_80pct.append(mape_accuracy_pred)
                mape_accuracy_trans_80pct.append(mape_accuracy_trans)
                mse_accuracy_pred_80pct.append(mse_accuracy_pred)
                mse_accuracy_trans_80pct.append(mse_accuracy_trans)
                make_transfer_model_scatter_plot(transferrer.get_model(), 
                                 X_train, 
                                 y_train, 
                                 2, 
                                 mape_accuracy_trans_80pct[-1], 
                                 rep+1, 
                                 subject_system,
                                 dataset_size=train_size*100)


    rq2_results['mape_accuracy_pred_20pct'] = mape_accuracy_pred_20pct
    rq2_results['mape_accuracy_pred_40pct'] = mape_accuracy_pred_40pct
    rq2_results['mape_accuracy_pred_60pct'] = mape_accuracy_pred_60pct
    rq2_results['mape_accuracy_pred_80pct'] = mape_accuracy_pred_80pct
    rq2_results['mse_accuracy_pred_20pct'] = mse_accuracy_pred_20pct
    rq2_results['mse_accuracy_pred_40pct'] = mse_accuracy_pred_40pct
    rq2_results['mse_accuracy_pred_60pct'] = mse_accuracy_pred_60pct
    rq2_results['mse_accuracy_pred_80pct'] = mse_accuracy_pred_80pct
    rq2_results['mape_accuracy_trans_20pct'] = mape_accuracy_trans_20pct
    rq2_results['mape_accuracy_trans_40pct'] = mape_accuracy_trans_40pct
    rq2_results['mape_accuracy_trans_60pct'] = mape_accuracy_trans_60pct
    rq2_results['mape_accuracy_trans_80pct'] = mape_accuracy_trans_80pct
    rq2_results['mse_accuracy_trans_20pct'] = mse_accuracy_trans_20pct
    rq2_results['mse_accuracy_trans_40pct'] = mse_accuracy_trans_40pct
    rq2_results['mse_accuracy_trans_60pct'] = mse_accuracy_trans_60pct
    rq2_results['mse_accuracy_trans_80pct'] = mse_accuracy_trans_80pct
    
    p_value = [get_wilcoxon_p_value(mape_accuracy_trans_80pct, mape_accuracy_trans_20pct)]
    cliffs_delta = [get_cliffs_delta(mape_accuracy_trans_80pct, mape_accuracy_trans_20pct)]
    print('Wilcoxon p value:', p_value)
    print('Cliff\'s delta:', cliffs_delta, '\n')
    
    save_results(rq2_results, subject_system.lower(), p_value, cliffs_delta, 2)
    
    # reset dataframe after results are saved to csv
    rq2_results = pd.DataFrame(columns=rq2_results.columns)
    
make_box_plots(2)
write_mean_min_max(2)

## Research Question 3

### How does the training time for a transfer model compare to training a new predictor model for each compile-time configuration?<br/>

**Null hypothesis – The training time for training a new predictor model for each compile-time configuration will be faster than training a transfer model for each compile-time configuration**

**Alternative hypothesis – The training time for training a transfer model for each compile-time configuration will be faster than training a new predictor model for each compile-time configuration**<br/><br/>

This question tests how feasible our approach is in terms of training time. For the most part, training a performance prediction model doesn’t take very long, usually under 100ms. This is because the amount of input data required to achieve good accuracy is small, usually around 15-100 measured configurations depending on the quality of the training data. However, I believe that it is worth testing how the training time compares to learning a new predictor model each time because, if a user needs to test lots of compile-time configurations, it could be more efficient to learn a transfer model rather than a predictor model. By answering this question, readers can get an idea of whether it’d be more beneficial for them to make use of transfer learning, or just learn a new model for each compile-time configuration.

For this research question, we record the following measurements (in milliseconds):

| Variable name             	| Measurement                                                            	|
|---------------------------	|------------------------------------------------------------------------	|
| training_time_pred_20pct_no_cv  	| Training time of predictor approach with 20% training set size without hyperparameter optimisation         	|
| training_time_pred_40pct_no_cv  	| Training time of predictor approach with 40% training set size without hyperparameter optimisation 	        |
| training_time_pred_60pct_no_cv  	| Training time of predictor approach with 60% training set size without hyperparameter optimisation   	        |
| training_time_pred_80pct_no_cv  	| Training time of predictor approach with 80% training set size without hyperparameter optimisation    	    |
| training_time_pred_20pct_cv 	| Training time of predictor approach with 20% training set size with hyperparameter optimisation 	        |
| training_time_pred_40pct_cv 	| Training time of predictor approach with 40% training set size with hyperparameter optimisation  	        |
| training_time_pred_60pct_cv 	| Training time of predictor approach with 60% training set size with hyperparameter optimisation    	        |
| training_time_pred_80pct_cv 	| Training time of predictor approach with 80% training set size with hyperparameter optimisation     	    |
| training_time_trans_20pct_no_cv   	| Training time of transfer approach with 20% training set size without hyperparameter optimisation             |
| training_time_trans_40pct_no_cv   	| Training time of transfer approach with 40% training set size without hyperparameter optimisation             |
| training_time_trans_60pct_no_cv   	| Training time of transfer approach with 60% training set size without hyperparameter optimisation             |
| training_time_trans_80pct_no_cv   	| Training time of transfer approach with 80% training set size without hyperparameter optimisation             |
| training_time_trans_20pct_cv  	| Training time of transfer approach with 20% training set size with hyperparameter optimisation              |
| training_time_trans_40pct_cv  	| Training time of transfer approach with 40% training set size with hyperparameter optimisation              |
| training_time_trans_60pct_cv  	| Training time of transfer approach with 60% training set size with hyperparameter optimisation              |
| training_time_trans_80pct_cv  	| Training time of transfer approach with 80% training set size with hyperparameter optimisation              |

We calculate the p-value and effect size between `training_time_pred_80pct_no_cv` and `training_time_trans_80pct_no_cv`, and between `training_time_pred_80pct_cv` and `training_time_trans_80pct_cv`.

In [None]:
### RQ3

transferrer = TransferLearner()
predictor = PredictorLearner()

for subject_system in SUBJECT_SYSTEMS:
    print('***********************************\nSubject system:', subject_system)
    # delete the results of previous subject system's test before entering next subject system's results
    training_time_pred_20pct_no_cv = []
    training_time_pred_40pct_no_cv = []
    training_time_pred_60pct_no_cv = []
    training_time_pred_80pct_no_cv = []
    training_time_pred_20pct_cv = []
    training_time_pred_40pct_cv = []
    training_time_pred_60pct_cv = []
    training_time_pred_80pct_cv = []
    training_time_trans_20pct_no_cv = []
    training_time_trans_40pct_no_cv = []
    training_time_trans_60pct_no_cv = []
    training_time_trans_80pct_no_cv = []
    training_time_trans_20pct_cv = []
    training_time_trans_40pct_cv = []
    training_time_trans_60pct_cv = []
    training_time_trans_80pct_cv = []

    # get the randomly selected target and source datasets for the current subject system
    subject_system_datasets = datasets[subject_system]

    for rep in range(REPETITIONS):
        print('Experiment repetition', rep+1)
        # grab a target and source dataset for this experiment repetition
        src_dataset, tgt_dataset = subject_system_datasets[rep]
        print('Source dataset path:', src_dataset.get_csv_path())
        print('Target dataset path:', tgt_dataset.get_csv_path(),'\n')
        
        X_train, X_validate, y_train, y_validate = tgt_dataset.get_split_dataset()
        optimised_pred_model = predictor.get_optimal_params(X_validate, y_validate)
        X_train, X_validate, y_train, y_validate = split_transfer_dataset(src_dataset, tgt_dataset)
        optimised_trans_model = transferrer.get_optimal_params(X_validate, y_validate)
        

        for train_size in TRAINING_SET_SIZES:

            X_train, X_validate, y_train, y_validate = tgt_dataset.get_split_dataset(train_size=train_size)
            # fit the predictor using the previously optimised model
            predictor.fit(X_train, y_train, premade_model=optimised_pred_model)

            # gather results
            training_time_pred_no_cv = predictor.get_training_time()
            training_time_pred_cv = predictor.get_training_time(include_optimisation_time=True)


            X_train, X_validate, y_train, y_validate = split_transfer_dataset(src_dataset, tgt_dataset, train_size=train_size)
            # fit the transfer learner using the previously optimised model
            transferrer.fit(X_train, y_train, premade_model=optimised_trans_model)

            # gather results
            training_time_trans_no_cv = transferrer.get_training_time()
            training_time_trans_cv = transferrer.get_training_time(include_optimisation_time=True)


            # record training time in appropriate results column
            if train_size == 0.2:
                training_time_pred_20pct_no_cv.append(training_time_pred_no_cv)
                training_time_pred_20pct_cv.append(training_time_pred_cv)
                training_time_trans_20pct_no_cv.append(training_time_trans_no_cv)
                training_time_trans_20pct_cv.append(training_time_trans_cv)
            elif train_size == 0.4:
                training_time_pred_40pct_no_cv.append(training_time_pred_no_cv)
                training_time_pred_40pct_cv.append(training_time_pred_cv)
                training_time_trans_40pct_no_cv.append(training_time_trans_no_cv)
                training_time_trans_40pct_cv.append(training_time_trans_cv)
            elif train_size == 0.6:
                training_time_pred_60pct_no_cv.append(training_time_pred_no_cv)
                training_time_pred_60pct_cv.append(training_time_pred_cv)
                training_time_trans_60pct_no_cv.append(training_time_trans_no_cv)
                training_time_trans_60pct_cv.append(training_time_trans_cv)
            elif train_size == 0.8:
                training_time_pred_80pct_no_cv.append(training_time_pred_no_cv)
                training_time_pred_80pct_cv.append(training_time_pred_cv)
                training_time_trans_80pct_no_cv.append(training_time_trans_no_cv)
                training_time_trans_80pct_cv.append(training_time_trans_cv)

    rq3_results['training_time_pred_20pct_no_cv'] = training_time_pred_20pct_no_cv
    rq3_results['training_time_pred_40pct_no_cv'] = training_time_pred_40pct_no_cv
    rq3_results['training_time_pred_60pct_no_cv'] = training_time_pred_60pct_no_cv
    rq3_results['training_time_pred_80pct_no_cv'] = training_time_pred_80pct_no_cv
    rq3_results['training_time_pred_20pct_cv'] = training_time_pred_20pct_cv
    rq3_results['training_time_pred_40pct_cv'] = training_time_pred_40pct_cv
    rq3_results['training_time_pred_60pct_cv'] = training_time_pred_60pct_cv
    rq3_results['training_time_pred_80pct_cv'] = training_time_pred_80pct_cv
    rq3_results['training_time_trans_20pct_no_cv'] = training_time_trans_20pct_no_cv
    rq3_results['training_time_trans_40pct_no_cv'] = training_time_trans_40pct_no_cv
    rq3_results['training_time_trans_60pct_no_cv'] = training_time_trans_60pct_no_cv
    rq3_results['training_time_trans_80pct_no_cv'] = training_time_trans_80pct_no_cv
    rq3_results['training_time_trans_20pct_cv'] = training_time_trans_20pct_cv
    rq3_results['training_time_trans_40pct_cv'] = training_time_trans_40pct_cv
    rq3_results['training_time_trans_60pct_cv'] = training_time_trans_60pct_cv
    rq3_results['training_time_trans_80pct_cv'] = training_time_trans_80pct_cv
    
    p_value_no_cv = get_wilcoxon_p_value(training_time_pred_80pct_no_cv, training_time_trans_80pct_no_cv)
    p_value_cv = get_wilcoxon_p_value(training_time_pred_80pct_cv, training_time_trans_80pct_cv)
    cliffs_delta_no_cv = get_cliffs_delta(training_time_pred_80pct_no_cv, training_time_trans_80pct_no_cv)
    cliffs_delta_cv = get_cliffs_delta(training_time_pred_80pct_cv, training_time_trans_80pct_cv)
    p_values = [p_value_cv, p_value_no_cv]
    cliffs_deltas = [cliffs_delta_cv, cliffs_delta_no_cv]
    
    print('Wilcoxon p value for no CV:', p_value_no_cv)
    print('Wilcoxon p value for CV:', p_value_cv)
    print('Cliff\'s delta for no CV:', cliffs_delta_no_cv)
    print('Cliff\'s delta for CV:', cliffs_delta_cv, '\n')

    save_results(rq3_results, subject_system.lower(), p_values, cliffs_deltas, 3)

    # reset dataframe after results are saved to csv
    rq3_results = pd.DataFrame(columns=rq3_results.columns)
    
make_box_plots(3)
write_mean_min_max(3)