In [None]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17/", "mp")

## Resampling Methods

In [None]:
%%stata
use ../Data/breathe, clear
quietly do ../Do/no2

### Cross-Validation
#### Validation Set Approach

In [None]:
%%stata
splitsample , generate(sample) split(.80 .20) rseed(52)
label define slabel 1 "Training" 2 "Validation"
label values sample slabel
tabulate sample

In [None]:
%%stata
quietly regress react no2_class $cc i.$fc if sample==1
estimates store ols
lassogof ols, over(sample)
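
To make the mechanics of the validation set approach explicit, here is a minimal NumPy sketch: split the observations at random, fit OLS on the training part only, and compare the mean squared error on both parts. The data are synthetic (not the `breathe` data), the helper name `validation_set_mse` is hypothetical, and the random split will not reproduce the one drawn by `splitsample`.

```python
import numpy as np

def validation_set_mse(X, y, train_share=0.8, seed=52):
    """Fit OLS on a random training subset; return (training MSE, validation MSE)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    n_train = int(round(train_share * n))
    train, valid = idx[:n_train], idx[n_train:]
    Xc = np.column_stack([np.ones(n), X])            # add an intercept column
    beta, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
    resid = y - Xc @ beta                            # residuals for every observation
    return np.mean(resid[train] ** 2), np.mean(resid[valid] ** 2)

# Synthetic data, for illustration only (assumption: linear model with unit-variance noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 1.0 + X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=500)

mse_train, mse_valid = validation_set_mse(X, y)
print(f"training MSE: {mse_train:.3f}, validation MSE: {mse_valid:.3f}")
```

The validation MSE is the quantity of interest, since it is computed on observations that played no role in estimation.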

#### Leave-One-Out Cross-Validation

The following code relies on the user-written package ```loocv```, which must be installed first with ```ssc install loocv```:

In [None]:
%%stata
loocv regress react no2_class $cc i.$fc

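Given the original sample $\{Y_1,\ldots,Y_n\}$ and the LOOCV predictions $\{\widehat{Y}_1,\ldots,\widehat{Y}_n\}$, where $\widehat{Y}_i$ is obtained from the model fitted without observation $i$, the fit measures are defined as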
$$
\begin{aligned}
\text{Root Mean Squared Errors} &= \sqrt{n^{-1}\sum_{i=1}^n(Y_i-\widehat{Y}_i)^2}\\
\text{Mean Absolute Errors} &= n^{-1}\sum_{i=1}^n|Y_i-\widehat{Y}_i|\\
\text{Pseudo-R2} &= \widehat{\text{corr}}(Y_i,\widehat{Y}_i)^2
\end{aligned}
$$
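
The three measures can be sketched directly in NumPy. The example below is a minimal illustration on synthetic data (not the `breathe` data), assuming a plain OLS fit with an intercept. The helper names `loocv_predictions` and `fit_metrics` are hypothetical and are not part of the `loocv` package.

```python
import numpy as np

def loocv_predictions(X, y):
    """Predict each Y_i from an OLS fit that excludes observation i."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])            # add an intercept column
    y_hat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                     # drop observation i
        beta, *_ = np.linalg.lstsq(Xc[keep], y[keep], rcond=None)
        y_hat[i] = Xc[i] @ beta
    return y_hat

def fit_metrics(y, y_hat):
    """Return (RMSE, MAE, pseudo-R2) as defined in the formulas above."""
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    pseudo_r2 = np.corrcoef(y, y_hat)[0, 1] ** 2     # squared sample correlation
    return rmse, mae, pseudo_r2

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.8, -0.3]) + rng.normal(size=100)

y_hat = loocv_predictions(X, y)
rmse, mae, pseudo_r2 = fit_metrics(y, y_hat)
print(f"RMSE: {rmse:.3f}  MAE: {mae:.3f}  pseudo-R2: {pseudo_r2:.3f}")
```

The explicit loop refits the model $n$ times, which keeps the sketch close to the definition. For OLS the same predictions can also be obtained from a single fit through the leverage values.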

#### _k_-Fold Cross-Validation

The following code relies on the user-written package ```crossfold```, which must be installed first with ```ssc install crossfold```:

In [None]:
%%stata
crossfold regress react no2_class $cc i.$fc, k(5) stub(fold)

Display the OLS estimates from the 3rd fold:

In [None]:
%%stata -eret steret
estimates restore fold3

In [None]:
steret['e(b)']

In [None]:
import pandas as pd
from pystata import stata
from sfi import Scalar, Matrix

# Default fit measure (RMSE); sum(..., []) flattens the nested list from Matrix.get
stata.run('qui crossfold regress react no2_class $cc i.$fc, k(5) stub(fold)')
df_rmse = pd.DataFrame(sum(Matrix.get('r(fold)'),[]))
rows = Matrix.getRowNames('r(fold)')

# Mean absolute error for each fold
stata.run('qui crossfold regress react no2_class $cc i.$fc, k(5) stub(fold) mae')
df_mae = pd.DataFrame(sum(Matrix.get('r(fold)'),[]))

# Pseudo-R2 for each fold
stata.run('qui crossfold regress react no2_class $cc i.$fc, k(5) stub(fold) r2')
df_r2 = pd.DataFrame(sum(Matrix.get('r(fold)'),[]))

# Combine the three measures into one DataFrame, one row per fold
result = pd.concat([df_rmse,df_mae,df_r2],axis=1)
result.columns = ['RMSE','MAE','pseudo R2']
result.index = rows
print(result)

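In this case, $\sqrt{CV_{(5)}}$, where $CV_{(5)}=\frac{1}{5}\sum_{j=1}^{5}\text{MSE}_j$ is the average of the fold-level mean squared errors, equals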

In [None]:
import math
import statistics as st
# Square the fold RMSEs to recover the MSEs, average them, then take the square root
print(math.sqrt(st.mean(result['RMSE']**2)))
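
For reference, the whole $k$-fold computation can be sketched from scratch in NumPy: partition the observations into $k$ folds, fit OLS on the remaining $k-1$ folds, record the RMSE on the held-out fold, and combine the fold results into $\sqrt{CV_{(k)}}$ by averaging the squared RMSEs. This is a minimal illustration on synthetic data (not the `breathe` data). The helper name `kfold_rmse` is hypothetical, and the random partition used here need not match how `crossfold` assigns folds.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of an OLS fit for each of k random folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)    # k roughly equal-sized folds
    Xc = np.column_stack([np.ones(n), X])            # add an intercept column
    rmse = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out) # all observations outside the fold
        beta, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        err = y[held_out] - Xc[held_out] @ beta
        rmse.append(np.sqrt(np.mean(err ** 2)))
    return np.array(rmse)

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 2.0 + X @ np.array([1.0, -0.5]) + rng.normal(size=200)

fold_rmse = kfold_rmse(X, y, k=5)
root_cv = np.sqrt(np.mean(fold_rmse ** 2))           # sqrt of the average fold MSE
print("fold RMSEs:", np.round(fold_rmse, 3))
print(f"sqrt(CV_(5)): {root_cv:.3f}")
```

Because the fold MSEs are averaged before taking the square root, the result is never smaller than the plain average of the fold RMSEs.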