# Randomization test
This notebook uses a randomization test to evaluate the change in prediction error obtained by replacing one general model with seven regional models. Under the null hypothesis, regional models defined on the seven geographical regions of Turkey do not reduce prediction error relative to one general model for all of Turkey. Under that hypothesis, any random assignment of stations to seven clusters could produce a similar reduction in prediction error. The alternative hypothesis states that using these specific regions to build regional models does reduce prediction error.

The steps of this evaluation method are described below:
* Weather stations are randomly assigned to 7 clusters, where the number of datapoints in each cluster is determined by sampling from a Dirichlet distribution in order to obtain cluster dataset sizes similar to the dataset size in each of the seven regions used in this study. (See creating test_permutations notebook for more details)
* Stations in each cluster are split into train and test stations.
* A polynomial regression model is trained for each cluster.
* A general model is trained on all training stations combined, and the RMSE of its predictions is computed.
* The combined predictions of the 7 cluster models are used to compute the cluster model RMSE.
* The change in RMSE between the general and cluster models is computed using the following equation:
$$ RMSE_{change} = \frac{RMSE_{cluster} - RMSE_{general}}{RMSE_{general}} $$
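* The previous steps are repeated 1000 times.
* A p-value is computed as the proportion of iterations in which the reduction in prediction error exceeded that of the studied regional models. Since a negative $RMSE_{change}$ means the cluster models have a lower RMSE, this is the proportion of iterations with a more negative $RMSE_{change}$ than the one observed for the studied models.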
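
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in *ETProject/RandomizationTest*: the helper names (`sample_cluster_sizes`, `rmse_change`, `empirical_p_value`), the `concentration` parameter, and all numbers are hypothetical and chosen only to show the mechanics.

```python
import numpy as np

def sample_cluster_sizes(region_sizes, concentration=100.0, rng=None):
    """Draw cluster sizes whose proportions scatter around the regional ones.

    `concentration` is a hypothetical tuning knob: larger values keep the
    sampled proportions closer to the original regional proportions.
    """
    rng = np.random.default_rng() if rng is None else rng
    region_sizes = np.asarray(region_sizes, dtype=float)
    total = int(region_sizes.sum())
    alpha = concentration * region_sizes / region_sizes.sum()
    proportions = rng.dirichlet(alpha)
    sizes = np.floor(proportions * total).astype(int)
    # Flooring can lose a few datapoints; give them to the largest cluster.
    sizes[np.argmax(sizes)] += total - sizes.sum()
    return sizes

def rmse_change(rmse_cluster, rmse_general):
    """Relative change in RMSE; negative values mean the cluster models did better."""
    return (rmse_cluster - rmse_general) / rmse_general

def empirical_p_value(null_changes, observed_change):
    """Proportion of random clusterings with a larger reduction than observed."""
    null_changes = np.asarray(null_changes, dtype=float)
    return float(np.mean(null_changes < observed_change))

# Made-up numbers, for illustration only.
region_sizes = [100, 200, 300, 150, 250, 120, 80]
sizes = sample_cluster_sizes(region_sizes, rng=np.random.default_rng(0))
print(sizes, sizes.sum())  # seven sizes that sum to 1200

null_changes = [-0.2, -0.05, 0.0, 0.1]
observed = rmse_change(rmse_cluster=0.9, rmse_general=1.0)
p_value = empirical_p_value(null_changes, observed)
print(p_value)  # 0.25: one of the four random clusterings beat the observed change
```

The strict inequality mirrors the wording "exceeded" in the last step. A common, more conservative variant counts ties and adds one to both numerator and denominator, which prevents a p-value of exactly zero.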


### Readings:
* [On different randomization tests](https://stats.stackexchange.com/questions/104040/resampling-simulation-methods-monte-carlo-bootstrapping-jackknifing-cross)
* **Book**: Permutation, Parametric and Bootstrap Tests of Hypotheses
* [On bootstrap test power](https://stats.stackexchange.com/questions/420959/why-is-power-of-a-hypothesis-test-a-concern-when-we-can-bootstrap-any-representa)
* [On how to compute power by simulation](https://nickch-k.github.io/EconometricsSlides/Week_08/Power_Simulations.html)
* [On multiple test corrections](https://www.stat.berkeley.edu/~mgoldman/Section0402.pdf)
* [Simulation based power analysis](https://osf.io/n62hg/)


**Note**:
* The functions used in this notebook are found in *ETProject/RandomizationTest*.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from pathlib import Path
import yaml
from scipy.stats import dirichlet
import time

main_dir_path = Path.cwd().parents[0]
os.chdir(main_dir_path)

from ETProject.MyModel import PolynomialRegressor
from ETProject.RandomizationTest import *

### Loading data

In [7]:
# Loading previously created cluster permutations
path = 'randomization_test'

file_name = 'cluster_permutations.csv'
permutations = pd.read_csv(os.path.join(path, file_name))

# Loading ET data
all_data = pd.read_csv('processed_data/et_data.csv')

### Defining scoring function

In [8]:
def rmse(y_true, y_pred):
    """Root-mean-square error between 1-D arrays of observed and predicted values."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
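
A quick sanity check of the scoring function on a hand-computable case can catch shape or alignment mistakes before a long run. The function is restated here so the block is self-contained; for 1-D inputs, `np.mean` is equivalent to dividing the sum by `y_true.shape[0]`. The arrays are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Same quantity as the notebook's scoring function for 1-D inputs.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

# Errors are 0, 0, 0, -2; squared errors sum to 4; their mean is 1; sqrt(1) = 1.
score = rmse(y_true, y_pred)
print(score)  # 1.0
```

If the inputs are pandas Series rather than NumPy arrays, subtraction aligns on index labels, so two Series with different indexes would produce NaN entries. Passing `.to_numpy()` values avoids that.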

### Running test

In [42]:
input_combo = 3
regressor = PolynomialRegressor
score_fun = rmse


rand_test = RandomizationTest(all_data,
                              permutations,
                              input_combo,
                              regressor,
                              score_fun)

t1 = time.time()
df = rand_test.run_test()
t2 = time.time()
test_time = t2 - t1

# Output file for the test results (saving is disabled; uncomment to write the CSV)
path = f'combo_{input_combo}_test.csv'
# df.to_csv(path)

print(f'\nTest time: {test_time}')

Started randomization test
Number of iterations: 1000


_________________________________________________
#################################################
Test time: 614.5390510559082
