```python
import stata_setup
# Point pystata at the local Stata 17 installation and select the MP edition
stata_setup.config("C:/Program Files/Stata17/", "mp")
```

## Preparing the data

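```stata
%%stata
use ../Data/breathe, clear
* no2.do defines the global macros used below:
* $cc holds the continuous controls, $fc the factor-variable controls
quietly do ../Do/no2
display "$cc"
display "$fc"
```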

We use ```splitsample``` with the option ```split(.75 .25)``` to generate the variable ```sample```, which equals 1 for a randomly chosen 75% of the observations (the training sample) and 2 for the remaining 25% (the validation sample). Which observations end up in each group is random, but the ```rseed()``` option fixes the seed of the random-number generator, so the split is reproducible.

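```stata
%%stata
* Randomly assign 75% of observations to sample==1 and 25% to sample==2
splitsample , generate(sample) split(.75 .25) rseed(52)
* Attach descriptive value labels and check the group sizes
label define slabel 1 "Training" 2 "Validation"
label values sample slabel
tabulate sample
```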
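The idea behind a seeded random split can be sketched in a few lines of Python. This is an illustration only, not Stata's algorithm: the function name `split_sample` is hypothetical, and because Python's generator differs from Stata's, `seed=52` here will not reproduce the assignment made by `rseed(52)`. What it does show is that fixing the seed fixes the shuffle, and therefore the split.

```python
import random

def split_sample(n, shares=(0.75, 0.25), seed=52):
    """Assign each of n observations to group 1, 2, ... in the given shares.

    Illustrative only: shuffle the observation indices with a seeded
    generator, then cut the shuffled list into consecutive pieces.
    """
    rng = random.Random(seed)  # same seed -> same shuffle -> same split
    order = list(range(n))
    rng.shuffle(order)
    labels = [0] * n
    start = 0
    for group, share in enumerate(shares, start=1):
        # The last group takes whatever is left, so no observation is dropped
        stop = n if group == len(shares) else start + round(share * n)
        for i in order[start:stop]:
            labels[i] = group
        start = stop
    return labels

sample = split_sample(1000)
print(sample.count(1), sample.count(2))  # 750 250
```

Calling `split_sample(1000)` a second time returns exactly the same labels; changing `seed` gives a different, equally sized split.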

## OLS

```stata
%%stata
* OLS on the training sample, as a benchmark for the penalized estimators
quietly regress react no2_class $cc i.($fc) if sample==1
estimates store ols
```

## Ridge

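```stata
%%stata
* Ridge regression is elasticnet with alpha(0); lambda() lists the grid of
* penalty values searched by cross-validation with folds(781) folds
quietly elasticnet linear react no2_class $cc i.($fc) if sample==1, alpha(0) lambda(0.1(.005)0.3) folds(781) nolog
estimates store ridge
```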
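To make the roles of `lambda()` and `folds()` concrete, here is a minimal Python sketch of K-fold cross-validation over a ridge penalty grid. It assumes a single centered predictor, no intercept, and the objective (1/2N)·Σ(y − b·x)² + λ·b²/2; the data are synthetic and hypothetical, folds are assigned deterministically rather than at random, and fold subsets are not re-centered. Stata standardizes many covariates and handles the intercept itself, so this shows only the mechanics.

```python
import random

def ridge_slope(x, y, lam):
    """Ridge slope for one centered predictor and a centered outcome.

    Minimizes (1/2N) * sum((y - b*x)**2) + lam * b**2 / 2.
    """
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) / n
    sxx = sum(a * a for a in x) / n
    return sxy / (sxx + lam)

def cv_mse(x, y, lam, folds):
    """K-fold cross-validated mean squared prediction error."""
    n = len(x)
    total = 0.0
    for k in range(folds):
        held_out = set(range(k, n, folds))  # fold k: observations k, k+folds, ...
        x_fit = [x[i] for i in range(n) if i not in held_out]
        y_fit = [y[i] for i in range(n) if i not in held_out]
        b = ridge_slope(x_fit, y_fit, lam)  # fold subsets are not re-centered
        total += sum((y[i] - b * x[i]) ** 2 for i in held_out)
    return total / n  # every observation is held out exactly once

# Synthetic (hypothetical) data, centered so no intercept is needed
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
mx, my = sum(x) / len(x), sum(y) / len(y)
x = [xi - mx for xi in x]
y = [yi - my for yi in y]

# Grid 0.1, 0.105, ..., 0.3, mirroring lambda(0.1(.005)0.3)
grid = [round(0.1 + 0.005 * j, 3) for j in range(41)]
best = min(grid, key=lambda lam: cv_mse(x, y, lam, folds=10))
print(best, ridge_slope(x, y, 0.0), ridge_slope(x, y, best))
```

Setting `folds` equal to the number of observations turns this into leave-one-out cross-validation. The chosen slope is smaller in magnitude than the OLS slope (λ = 0) but never exactly zero.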

## Lasso

```stata
%%stata
* Lasso with 20-fold cross-validation to choose lambda; rseed() makes the folds reproducible
quietly lasso linear react no2_class $cc i.($fc) if sample==1, folds(20) rseed(52) nolog
estimates store lasso
```
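What distinguishes the lasso from ridge is that it can set coefficients exactly to zero, which is how it selects variables. A minimal sketch, assuming standardized covariates with an orthonormal design and ignoring Stata's penalty loadings: the lasso coefficient is then the OLS coefficient passed through a soft-thresholding function. The coefficient values below are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso coefficient for one standardized predictor under an orthonormal design.

    Minimizes (1/2) * (b - z)**2 + lam * |b|, where z is the OLS coefficient:
    z is pulled toward zero by lam and set exactly to zero when |z| <= lam.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical OLS coefficients on standardized covariates
ols = [0.80, -0.35, 0.05, -0.02]
lasso = [soft_threshold(b, 0.10) for b in ols]
selected = [j for j, b in enumerate(lasso) if b != 0.0]
print(lasso, selected)
```

The two large coefficients survive, shrunk by 0.10, while the two small ones are dropped, so only variables 0 and 1 are selected.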

## Elastic Net

```stata
%%stata
* Elastic net: cross-validate over alpha = .02, .04, ..., .1 and over lambda
quietly elasticnet linear react no2_class $cc i.($fc) if sample==1, alpha(.02(.02).1) folds(20) rseed(52) nolog
estimates store elasticnet
```
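The elastic-net penalty mixes the two previous penalties, with `alpha` setting the mix: `alpha(0)` is ridge, as in the ridge cell, and alpha = 1 is the lasso. The sketch below assumes the penalty λ·Σ[(1 − α)/2·b² + α·|b|], which we believe matches Stata's documented objective up to penalty loadings. Under an orthonormal design the solution soft-thresholds the OLS coefficient `z` by λα and then divides by 1 + λ(1 − α). The function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def enet_penalty(b, lam, alpha):
    """lam * sum((1 - alpha)/2 * b_j**2 + alpha * |b_j|)."""
    return lam * sum((1 - alpha) / 2 * bj ** 2 + alpha * abs(bj) for bj in b)

def enet_coef(z, lam, alpha):
    """Elastic-net coefficient for one standardized predictor, orthonormal design.

    Minimizes (1/2) * (b - z)**2 + lam * ((1 - alpha)/2 * b**2 + alpha * |b|):
    the lasso part soft-thresholds z, the ridge part divides by 1 + lam*(1 - alpha).
    """
    return soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))

alphas = [round(0.02 * k, 2) for k in range(1, 6)]  # .02, .04, ..., .1
print([enet_coef(0.5, 0.2, a) for a in alphas])
```

With `alpha=0.0` the function reduces to the ridge shrinkage z/(1 + λ); with `alpha=1.0` it reduces to lasso soft-thresholding. The small alpha values in the grid above therefore keep the fit close to ridge while still allowing some coefficients to become exactly zero.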

## In- and Out-of-Sample Prediction

```stata
%%stata
* Goodness of fit of each stored model, separately for the training and validation samples
lassogof ols ridge lasso elasticnet, over(sample)
```
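Comparing models "over the sample" means computing fit statistics separately for the training and the validation observations. A minimal sketch, assuming MSE = SSR/N and R² = 1 − SSR/SST computed within each group (Stata's exact formulas may differ in detail). The outcomes and predictions are hypothetical.

```python
def gof_by_sample(y, yhat, sample):
    """MSE, R-squared, and number of observations within each sample group."""
    results = {}
    for g in sorted(set(sample)):
        idx = [i for i, s in enumerate(sample) if s == g]
        ybar = sum(y[i] for i in idx) / len(idx)
        ssr = sum((y[i] - yhat[i]) ** 2 for i in idx)  # squared prediction errors
        sst = sum((y[i] - ybar) ** 2 for i in idx)     # variation around the group mean
        results[g] = {"MSE": ssr / len(idx), "R2": 1 - ssr / sst, "Obs": len(idx)}
    return results

# Hypothetical outcomes and predictions; 1 = training, 2 = validation
y = [1, 2, 3, 4, 5, 6]
yhat = [1.1, 1.9, 3.0, 4.2, 4.5, 6.8]
sample = [1, 1, 1, 1, 2, 2]
for g, stats in gof_by_sample(y, yhat, sample).items():
    print(g, stats)
```

In this toy example the in-sample R² is 0.988, while the out-of-sample R² is −0.78. Out-of-sample R² can be negative when predictions are worse than simply using the group mean, which is why the validation rows are the ones to compare across models.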

<strong>Postselection</strong> coefficients should not be used with <em>elasticnet</em> and, in particular, not with <em>ridge regression</em>. Ridge works by shrinking the coefficient estimates toward zero, and it is these shrunken (penalized) estimates that should be used for prediction. Postselection coefficients are the OLS coefficients obtained by refitting the model on the selected variables only. Because ridge never sets a coefficient exactly to zero, it always selects all variables, so its postselection coefficients are simply the OLS coefficients for all potential variables. Using them for prediction would discard the shrinkage that motivated ridge in the first place.
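The argument can be checked on a tiny example. A minimal sketch, assuming two centered predictors, no intercept, and the ridge objective (1/2N)·‖y − Xb‖² + λ·‖b‖²/2; the data and the helper `fit2` are hypothetical. Ridge shrinks both coefficients but keeps both variables, so refitting OLS on the "selected" variables simply recovers the unpenalized OLS fit.

```python
def fit2(x1, x2, y, lam):
    """Coefficients for two centered predictors (no intercept).

    Solves (X'X/N + lam*I) b = X'y/N by Cramer's rule; lam = 0 gives OLS,
    lam > 0 gives ridge with penalty lam * ||b||**2 / 2.
    """
    n = len(y)
    s11 = sum(a * a for a in x1) / n
    s22 = sum(a * a for a in x2) / n
    s12 = sum(a * b for a, b in zip(x1, x2)) / n
    r1 = sum(a * c for a, c in zip(x1, y)) / n
    r2 = sum(b * c for b, c in zip(x2, y)) / n
    a11, a22 = s11 + lam, s22 + lam
    det = a11 * a22 - s12 * s12
    return [(r1 * a22 - s12 * r2) / det, (a11 * r2 - s12 * r1) / det]

# Small hypothetical data set, already centered
x1 = [-2, -1, 0, 1, 2]
x2 = [1, -2, 0, 0, 1]
y = [-3, -1, 0, 2, 2]

ols = fit2(x1, x2, y, lam=0.0)
ridge = fit2(x1, x2, y, lam=0.5)
selected = [j for j, b in enumerate(ridge) if b != 0.0]  # ridge keeps both variables
post = fit2(x1, x2, y, lam=0.0)  # refit on the selected variables = OLS on all of them
print(ols, ridge, selected, post)
```

Here the ridge coefficients are roughly 1.06 and −0.13, against OLS values of about 1.36 and −0.29. The postselection coefficients are identical to OLS, so the shrinkage is lost.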
