# Ames House Price Data - Elastic Net

> Jupyter notebook, running a Julia 0.5.2 kernel, with the help of machine learning modules written by the author

*We build an elastic net model using all features. A by-product of this model build is an alternative list of important features (alternative to a list obtained with the categoricals not one-hot encoded).*

Estimated RMSL error of Sale Price predictions for the tuned model (mean ± standard deviation of the errors over 12 cross-validation folds):

    0.106 ± 0.022

## Loading and transforming the data
Linear models generally perform better when the input features are approximately normally distributed. We therefore apply Box-Cox transformations to the non-negative numerical features, except where the required transformation would be too extreme. We also standardize (rescale) the numerical features to zero mean and unit variance. Finally, we one-hot encode the categorical features.

First, let's load in the data:

In [19]:
```julia
push!(LOAD_PATH, pwd()) # allow loading of modules from the current directory
addprocs(3) # for parallel processing
using Preprocess
import DataFrames: head, readtable, writetable
using Regressors, Validation
import TreeCollections: DataTable

df = readtable("2.cleaned/train_randomized.csv")
const y = collect(df[:target])
x = df[2:end-1]; # untransformed inputs
```

If we want to make predictions with our trained model later on (e.g. during validation or testing), we must record each transformation we apply to the training inputs. The same transformations can then be applied to new inputs before they are passed to the model. The objects encoding a given transformation are here called *schemes*. Each transformation therefore takes two steps: (i) compute and record the appropriate scheme using the training data; and (ii) transform the training data according to that scheme. It is bad practice to compute transformations from the training and validation/test data together, because this causes data leakage. Use only training data to compute your transformation schemes!
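The two-step pattern can be shown with a minimal sketch in Julia 1.x (the notebook itself runs 0.5.2). The `SimpleStandardizer` type and the `fit_scheme`/`apply_scheme` functions are hypothetical stand-ins and are not the API of the author's `Preprocess` module. The scheme records the training mean and standard deviation, and those recorded values are then reused unchanged on new data.

```julia
using Statistics  # mean, std

# Hypothetical, simplified "scheme": the statistics needed to repeat a
# transformation on new data (not the author's StandardizationScheme).
struct SimpleStandardizer
    mu::Float64
    sigma::Float64
end

# Step (i): compute and record the scheme from *training* data only
fit_scheme(xtrain::AbstractVector{<:Real}) =
    SimpleStandardizer(mean(xtrain), std(xtrain))

# Step (ii): transform any data (training or new) with the recorded scheme
apply_scheme(s::SimpleStandardizer, x::AbstractVector{<:Real}) =
    (x .- s.mu) ./ s.sigma

xtrain = [1.0, 2.0, 3.0, 4.0, 5.0]
xtest  = [3.0, 6.0]

s = fit_scheme(xtrain)            # mu = 3.0, sigma = sqrt(2.5)
ztrain = apply_scheme(s, xtrain)
ztest  = apply_scheme(s, xtest)   # reuses the training mu and sigma
```

Because `xtest` never enters `fit_scheme`, no information from the test data leaks into the transformation.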

We begin with the Box-Cox transformations: 

In [3]:
```julia
q = BoxCoxScheme(x, shift=true) # allow shifts for non-negative data including 0 as value
xx = transform(q, x);
```

```
Box-Cox transformations:
  :LotFrontage    lambda=0.32  shift=0.0
  :LotArea    lambda=0.02  shift=0.0
  :OverallQual    (*not* transformed, too few values)
  :OverallCond    (*not* transformed, too few values)
  :YearBuilt    (*not* transformed, lambda too extreme)
  :YearRemodAdd    (*not* transformed, lambda too extreme)
  :MasVnrArea    lambda=0.24  shift=20.305357142857144
  :BsmtFinSF1    lambda=0.7  shift=87.39821428571429
  :BsmtFinSF2    lambda=0.0  shift=9.33543956043956
  :BsmtUnfSF    lambda=0.38  shift=113.39807692307693
  :TotalBsmtSF    lambda=0.86  shift=210.13173076923078
  :x1stFlrSF    lambda=0.0  shift=0.0
  :x2ndFlrSF    lambda=1.14  shift=68.7065934065934
  :LowQualFinSF    lambda=-0.22  shift=1.1721153846153847
  :GrLivArea    lambda=0.12  shift=0.0
  :BsmtFullBath    (*not* transformed, too few values)
  :BsmtHalfBath    (*not* transformed, too few values)
  :FullBath    (*not* transformed, too few values)
  :HalfBath    (*not* transformed, too few values)
  :B
```
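The printed `lambda` and `shift` values appear consistent with the standard Box-Cox formula applied to the shifted input x + c: ((x + c)^λ - 1)/λ for λ ≠ 0, and log(x + c) for λ = 0. The Julia 1.x function below is a sketch of that formula under this assumption. It does not reproduce how the author's `BoxCoxScheme` chooses `lambda` and `shift`, or how it decides that a feature has too few values or an extreme `lambda`.

```julia
# Box-Cox transform with an optional shift c, so that zero values are allowed:
#   ((x + c)^λ - 1) / λ   if λ ≠ 0
#   log(x + c)            if λ == 0
function boxcox(x::Real, λ::Real; shift::Real = 0.0)
    z = x + shift
    z > 0 || throw(DomainError(z, "x + shift must be positive"))
    return λ == 0 ? log(z) : (z^λ - 1) / λ
end

# λ = 0 reduces to the log transform (as reported for x1stFlrSF above)
boxcox(1000.0, 0.0)

# a zero-valued input is fine once shifted (cf. the MasVnrArea shift above)
boxcox(0.0, 0.24; shift = 20.305357142857144)
```

As λ tends to 0, ((x + c)^λ - 1)/λ tends to log(x + c), so the two branches join continuously.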

Standardization of the numerical features is next:

In [4]:
```julia
s = StandardizationScheme(xx)
xxx = transform(s, xx);
```

```
Features standarized:
  :LotFrontage    mu=8.965332891471617  sigma=1.3042539504552757
  :LotArea    mu=9.992871325426014  sigma=0.6155782882565797
  :OverallQual    mu=6.0885989010989015  sigma=1.3696691706201316
  :OverallCond    mu=5.576236263736264  sigma=1.1139656335816104
  :YearBuilt    mu=1971.1854395604396  sigma=30.201589946070243
  :YearRemodAdd    mu=1984.819368131868  sigma=20.652142559919664
  :MasVnrArea    mu=7.138807945380177  sigma=3.691174565964794
  :BsmtFinSF1    mu=104.2882805108197  sigma=66.04467846122878
  :BsmtFinSF2    mu=2.6391565255269476  sigma=1.1571104708924427
  :BsmtUnfSF    mu=27.117852140291152  sigma=7.888313476590563
  :TotalBsmtSF    mu=534.7731648596779  sigma=152.26001232809912
  :x1stFlrSF    mu=7.004710141674975  sigma=0.3133438765165076
  :x2ndFlrSF    mu=911.1724307987926  sigma=1027.319507873004
  :LowQualFinSF    mu=0.21050981870739582  sigma=0.40495905493068596
  :GrLivArea    mu=11.607592877641274  sigma=0.7833644957416361
  :BsmtFullBa
```

And finally we one-hot encode the categoricals:

In [5]:
```julia
t = HotEncodingScheme(xxx)
const X = transform(t, xxx) # inputs after all transformations
head(X)
```

Unnamed: 0,MSSubClass___120,MSSubClass___160,MSSubClass___180,MSSubClass___190,MSSubClass___20,MSSubClass___30,MSSubClass___40,MSSubClass___45,MSSubClass___50,MSSubClass___60,MSSubClass___70,MSSubClass___75,MSSubClass___80,MSSubClass___85,MSSubClass___90,MSZoning__C (all),MSZoning__FV,MSZoning__RH,MSZoning__RL,MSZoning__RM,LotFrontage,LotArea,Street__Grvl,Street__Pave,LotShape__IR1,LotShape__IR2,LotShape__IR3,LotShape__Reg,LandContour__Bnk,LandContour__HLS,LandContour__Low,LandContour__Lvl,LotConfig__Corner,LotConfig__CulDSac,LotConfig__FR2,LotConfig__FR3,LotConfig__Inside,LandSlope__Gtl,LandSlope__Mod,LandSlope__Sev,Neighborhood__Blmngtn,Neighborhood__Blueste,Neighborhood__BrDale,Neighborhood__BrkSide,Neighborhood__ClearCr,Neighborhood__CollgCr,Neighborhood__Crawfor,Neighborhood__Edwards,Neighborhood__Gilbert,Neighborhood__IDOTRR,Neighborhood__MeadowV,Neighborhood__Mitchel,Neighborhood__NAmes,Neighborhood__NPkVill,Neighborhood__NWAmes,Neighborhood__NoRidge,Neighborhood__NridgHt,Neighborhood__OldTown,Neighborhood__SWISU,Neighborhood__Sawyer,Neighborhood__SawyerW,Neighborhood__Somerst,Neighborhood__StoneBr,Neighborhood__Timber,Neighborhood__Veenker,Condition1__Artery,Condition1__Feedr,Condition1__Norm,Condition1__PosA,Condition1__PosN,Condition1__RRAe,Condition1__RRAn,Condition1__RRNe,Condition1__RRNn,Condition2__Artery,Condition2__Feedr,Condition2__Norm,Condition2__PosA,Condition2__PosN,Condition2__RRAe,Condition2__RRAn,Condition2__RRNn,BldgType__1Fam,BldgType__2fmCon,BldgType__Duplex,BldgType__Twnhs,BldgType__TwnhsE,HouseStyle__1.5Fin,HouseStyle__1.5Unf,HouseStyle__1Story,HouseStyle__2.5Fin,HouseStyle__2.5Unf,HouseStyle__2Story,HouseStyle__SFoyer,HouseStyle__SLvl,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle__Flat,RoofStyle__Gable,RoofStyle__Gambrel,RoofStyle__Hip,RoofStyle__Mansard,RoofStyle__Shed,RoofMatl__CompShg,RoofMatl__Membran,RoofMatl__Metal,RoofMatl__Roll,RoofMatl__Tar&Grv,RoofMatl__WdShake,RoofMatl__WdShngl,Exterior1st__AsbShng,Exterior1st__AsphShn,Exterior1st__BrkComm,Exterior1st__BrkFace,Exterior1st__CBlock,Exterior1st__CemntBd,Exterior1st__HdBoard,Exterior1st__ImStucc,Exterior1st__MetalSd,Exterior1st__Plywood,Exterior1st__Stone,Exterior1st__Stucco,Exterior1st__VinylSd,Exterior1st__Wd Sdng,Exterior1st__WdShing,Exterior2nd__AsbShng,Exterior2nd__AsphShn,Exterior2nd__Brk Cmn,Exterior2nd__BrkFace,Exterior2nd__CBlock,Exterior2nd__CmentBd,Exterior2nd__HdBoard,Exterior2nd__ImStucc,Exterior2nd__MetalSd,Exterior2nd__Other,Exterior2nd__Plywood,Exterior2nd__Stone,Exterior2nd__Stucco,Exterior2nd__VinylSd,Exterior2nd__Wd Sdng,Exterior2nd__Wd Shng,MasVnrType__BrkCmn,MasVnrType__BrkFace,MasVnrType__None,MasVnrType__Stone,MasVnrType___NA,MasVnrArea,ExterQual__Ex,ExterQual__Fa,ExterQual__Gd,ExterQual__TA,ExterCond__Ex,ExterCond__Fa,ExterCond__Gd,ExterCond__Po,ExterCond__TA,Foundation__BrkTil,Foundation__CBlock,Foundation__PConc,Foundation__Slab,Foundation__Stone,Foundation__Wood,BsmtQual__Ex,BsmtQual__Fa,BsmtQual__Gd,BsmtQual__TA,BsmtQual___NA,BsmtCond__Fa,BsmtCond__Gd,BsmtCond__Po,BsmtCond__TA,BsmtCond___NA,BsmtExposure__Av,BsmtExposure__Gd,BsmtExposure__Mn,BsmtExposure__No,BsmtExposure___NA,BsmtFinType1__ALQ,BsmtFinType1__BLQ,BsmtFinType1__GLQ,BsmtFinType1__LwQ,BsmtFinType1__Rec,BsmtFinType1__Unf,BsmtFinType1___NA,BsmtFinSF1,BsmtFinType2__ALQ,BsmtFinType2__BLQ,BsmtFinType2__GLQ,BsmtFinType2__LwQ,BsmtFinType2__Rec,BsmtFinType2__Unf,BsmtFinType2___NA,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating__Floor,Heating__GasA,Heating__GasW,Heating__Grav,Heating__OthW,Heating__Wall,HeatingQC__Ex,HeatingQC__Fa,HeatingQC__Gd,HeatingQC__Po,HeatingQC__TA,CentralAir__N,CentralAir__Y,Electrical__FuseA,Electrical__FuseF,Electrical__FuseP,Electrical__Mix,Electrical__SBrkr,Electrical___NA,x1stFlrSF,x2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual__Ex,KitchenQual__Fa,KitchenQual__Gd,KitchenQual__TA,TotRmsAbvGrd,Functional__Maj1,Functional__Maj2,Functional__Min1,Functional__Min2,Functional__Mod,Functional__Sev,Functional__Typ,Fireplaces,FireplaceQu__Ex,FireplaceQu__Fa,FireplaceQu__Gd,FireplaceQu__None,FireplaceQu__Po,FireplaceQu__TA,GarageType__2Types,GarageType__Attchd,GarageType__Basment,GarageType__BuiltIn,GarageType__CarPort,GarageType__Detchd,GarageType___NA,GarageYrBlt,GarageFinish__Fin,GarageFinish__RFn,GarageFinish__Unf,GarageFinish___NA,GarageCars,GarageArea,GarageQual__Ex,GarageQual__Fa,GarageQual__Gd,GarageQual__Po,GarageQual__TA,GarageQual___NA,GarageCond__Ex,GarageCond__Fa,GarageCond__Gd,GarageCond__Po,GarageCond__TA,GarageCond___NA,PavedDrive__N,PavedDrive__P,PavedDrive__Y,WoodDeckSF,OpenPorchSF,EnclosedPorch,x3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType__COD,SaleType__CWD,SaleType__Con,SaleType__ConLD,SaleType__ConLI,SaleType__ConLw,SaleType__New,SaleType__Oth,SaleType__WD,SaleCondition__Abnorml,SaleCondition__AdjLand,SaleCondition__Alloca,SaleCondition__Family,SaleCondition__Normal,SaleCondition__Partial
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3882190888692983,-0.6126581030207808,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.794789664869238,2.175797585847309,0.35807917592655,0.8803266690312199,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,-0.7376906618785439,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.935490267876247,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3503024754207659,-1.7576870586752815,-0.5512823050315486,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.9583585613649228,-0.7817315352963173,-0.1344380971970907,-1.6729788397725982,1.1136719802050763,-0.239734925648083,-1.025800100747542,-0.7585204729541204,-1.0599385551152638,-0.2116841151858936,0.0,0.0,0.0,1.0,-1.554895766107958,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-0.9513465954961076,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.2720815571230804,1.0,0.0,0.0,0.0,0.3156958663890339,1.6315042055861677,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,-0.8441116991343465,-0.9499888176599136,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,-0.1209018573474471,0.1374252746946536,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.76981414576793,0.4591227380874145,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.3955202759186676,-0.5172836992139112,1.1196284864453063,1.0255900474577375,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0930888980672215,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2.095769544161945,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3503024754207659,-0.175851821092536,1.92658934829022,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.9470468988212613,-0.7817315352963173,-0.1344380971970907,1.072420407324476,1.1136719802050763,-0.239734925648083,0.800073917208963,-0.7585204729541204,-1.0599385551152638,-0.2116841151858936,0.0,0.0,1.0,0.0,0.3063771049141196,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.171901754092016,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.1055567247460616,0.0,1.0,0.0,0.0,1.6557926053465657,1.9369628462206256,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.7434317273653064,0.7090039689491661,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,-1.2326898844919716,-1.3670198377515044,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.2484622021478217,-0.2699507272652216,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.6654169623226991,-0.5172836992139112,1.0534068072697624,0.8803266690312199,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,-0.7376906618785439,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-1.1062442593733643,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3503024754207659,0.6462316366647728,-0.6335725482882738,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.9937531789890316,0.7811538685804759,-0.1344380971970907,0.1479074218262232,-0.8189935778146988,-0.239734925648083,0.800073917208963,1.2313999029399323,0.165852180585947,-0.2116841151858936,0.0,0.0,1.0,0.0,0.9268013952548122,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.6102775792979542,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0222092079837637,1.0,0.0,0.0,0.0,0.3156958663890339,-0.3747414852056253,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,-0.8441116991343465,0.8290803176036513,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,0.2496941517007277,-1.3670198377515044,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.4291126935119562,0.2418387851538892,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.0646863512732694,0.3804100624731624,0.2918574967510059,-0.2333592322387483,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.848427276796167,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.546995728667441,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3.74775871172558,-1.7576870586752815,1.5137719063420143,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.3576266484480497,-0.7817315352963173,-0.1344380971970907,0.4909510940182057,1.1136719802050763,-0.239734925648083,0.800073917208963,-0.7585204729541204,0.165852180585947,-0.2116841151858936,0.0,0.0,0.0,1.0,0.3063771049141196,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.6102775792979542,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0637127652173351,0.0,0.0,1.0,0.0,0.3156958663890339,0.6608842068911813,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0361998649708006,1.4205679829213278,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,-0.4914978663956219,-1.3670198377515044,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0608851980689457,-0.1439592587714001,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.794789664869238,-0.5172836992139112,1.1858501656208504,1.07401117359991,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,-0.7376906618785439,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.4464781710318216,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3503024754207659,0.8812408215071788,0.1909056791522569,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.05225512237927,-0.7817315352963173,-0.1344380971970907,-0.7518441816129947,-0.8189935778146988,-0.239734925648083,-1.025800100747542,-0.7585204729541204,0.165852180585947,-0.2116841151858936,0.0,0.0,0.0,1.0,-0.3140471854265729,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-0.9513465954961076,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.022039006836186,0.0,0.0,0.0,1.0,-2.364497611526029,-2.2036560283354523,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,-0.8441116991343465,0.0865232086939936,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,2.1026741969416016,-0.6147972815284254,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
6,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.0252672896824413,-0.2583836915234851,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.794789664869238,1.2781038241602358,-0.6021351721188386,1.07401117359991,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,-0.7376906618785439,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.7659923338303275,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.3503024754207659,-0.4021952914588755,0.0392448431385222,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,-0.1325066769018222,-0.7817315352963173,-0.1344380971970907,-0.9228724759952976,1.1136719802050763,-0.239734925648083,-1.025800100747542,-0.7585204729541204,0.165852180585947,-0.2116841151858936,0.0,0.0,1.0,0.0,-0.9344714757672656,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-0.9513465954961076,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,-1.0614787110736894,0.0,0.0,1.0,0.0,-1.0244008725684977,-1.064819860326507,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,-0.8441116991343465,0.4997371809644559,-0.3953028540815998,-0.1289599728639647,-0.2884191934755677,-0.05809532844067,-0.1902253931180565,-1.9738819025883207,1.6418703871408116,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
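The column names above follow the pattern `Feature__Level`, with one 0/1 column for each level of each categorical feature. Below is a minimal Julia 1.x sketch of such an encoder. `onehot_scheme` and `onehot_transform` are hypothetical names. Sending a level that was unseen during training to an all-zero row is our own design choice and not necessarily what `HotEncodingScheme` does.

```julia
# Hypothetical one-hot encoder for a single categorical column.
# Step (i): record the levels seen in the training data (the "scheme").
function onehot_scheme(name::AbstractString, xtrain::AbstractVector)
    levels = sort(unique(xtrain))
    colnames = [string(name, "__", lev) for lev in levels]
    return levels, colnames
end

# Step (ii): encode any data against the recorded levels, so the output
# columns always match those seen in training.
function onehot_transform(levels::AbstractVector, x::AbstractVector)
    M = zeros(Float64, length(x), length(levels))
    for (i, v) in enumerate(x)
        j = findfirst(==(v), levels)      # `nothing` if v was unseen in training
        j === nothing || (M[i, j] = 1.0)  # unseen level -> all-zero row
    end
    return M
end

lev, cols = onehot_scheme("Street", ["Pave", "Grvl", "Pave"])
M = onehot_transform(lev, ["Grvl", "Pave", "Pave"])
```

Here `cols` comes out as `["Street__Grvl", "Street__Pave"]`, which matches the naming seen in the header above.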


## Tuning parameters
The elastic net has two main parameters. The first, `0 < alpha ≤ 1`, determines the mix of L1 and L2 regularization. `alpha = 1` gives an L1 penalty only (the lasso case), and `alpha = 0` would give an L2 penalty only (the ridge case). (If one suspects that `alpha = 0` is optimal, one should use a ridge regression algorithm instead.) The second parameter, `lambda ≥ 0`, sets the amount of regularization. If `lambda` is left at zero (the default), training automatically optimizes `lambda` by cross-validation for the chosen `alpha`. For full details, query the docstring with `?ElasticNetRegressor`.
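For reference, the elastic net objective in the glmnet-style parametrization is given below. We assume `ElasticNetRegressor` follows this convention: its `n_lambdas`, `lambda_min_ratio` and `criterion` hyperparameters suggest a glmnet-like path algorithm, but this is not confirmed here.

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^\top\beta\right)^2 \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
$$

A direct Julia 1.x evaluation of this objective makes the two limiting cases easy to check. The function `elasticnet_objective` is hypothetical and is not part of the author's `Regressors` module.

```julia
using LinearAlgebra  # norm

# Elastic net objective (glmnet-style parametrization, assumed):
# α = 1 gives the lasso penalty only, α = 0 gives the ridge penalty only.
function elasticnet_objective(β0::Real, β::AbstractVector, X::AbstractMatrix,
                              y::AbstractVector, α::Real, λ::Real)
    N = length(y)
    r = y .- β0 .- X * β                          # residuals
    penalty = (1 - α) / 2 * sum(abs2, β) + α * norm(β, 1)
    return sum(abs2, r) / (2N) + λ * penalty
end

X = [1.0 0.0; 0.0 1.0]
y = [1.0, 2.0]
elasticnet_objective(0.0, [1.0, 2.0], X, y, 1.0, 0.5)  # → 1.5 (zero residuals, L1 penalty 3.0)
```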

From a search not published in this notebook, we obtained an approximately optimal value of `alpha = 0.92`. We now fine-tune both parameters with cross-validation. We start with a fit at this value of `alpha`, in which `lambda` is optimized automatically:

In [6]:
```julia
rgs = ElasticNetRegressor(X, y, alpha=0.921) # lambda left at 0, so it is optimized by cross-validation
```

```
Optimizing regularization parameter using 10-fold cross-validation.
fold number = 1  lambda = 0.0012134317578285233
fold number = 2  lambda = 0.00014331110331509343
fold number = 3  lambda = 0.00025120933486664083
fold number = 4  lambda = 0.0009477406488931978
fold number = 5  lambda = 0.0004258124466593651
fold number = 6  lambda = 0.0019250899731813723
fold number = 7  lambda = 0.0004781328521813045
fold number = 8  lambda = 0.0006272948354375241
fold number = 9  lambda = 0.00045674544911826606
fold number = 10  lambda = 0.0006867497786499316

ElasticNetRegressor@...5716
```

In [7]:
```julia
@more
```

```
ElasticNetRegressor@...5716
Hyperparameters:
Dict{Symbol,Any} with 7 entries:
  :alpha            => 0.921
  :standardize      => false
  :n_lambdas        => 100
  :lambda           => 0.0
  :criterion        => :coef
  :lambda_min_ratio => 0.0
  :max_n_coefs      => 0
Regularization is to be set automatically (lambda = 0.0)
Post-fit parameters:
Dict{Symbol,Any} with 6 entries:
  :intercept         => 11.8542
  :loglambdaopt_stde => 0.752439
  :cv_rmse           => 0.106615
  :lambdaopt         => 0.000562883
  :cv_rmse_stde      => 0.0112994
  :loglambdaopt      => -7.48244
                                        Non-zero coefficients
                              ┌────────────────────────────────────────┐
        MSZoning__C (all) (-) │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 0.274 │
    Neighborhood__Crawfor (+) │▪▪▪▪▪▪▪▪▪▪▪▪ 0.099                      │
                GrLivArea (+) │▪▪▪▪▪▪▪▪▪ 0.071                         │
          Functional__Typ (+) │▪▪▪▪▪▪▪▪ 0.07                           │
    Neighborhood__StoneBr (+) │▪▪▪▪▪▪▪▪ 0.07                           │
```


The training above returns an optimal value for `log(lambda)`, together with a cross-validation estimate of its standard error. We now use these to obtain a range of `lambda` values for our own explicit optimization of the two parameters:

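In [8]:
```julia
m = rgs.loglambdaopt          # optimal log(lambda) found above
delta = rgs.loglambdaopt_stde # its standard error
lambdas = exp(linspace(m - delta, m + delta, 10)) # the range for lambda (evenly spaced in log(lambda))

alphas = linspace(0.9, 0.99, 10); # the range for alpha
```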
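In Julia 1.x, `linspace` no longer exists and `exp` must be broadcast over a range with a dot. The sketch below rebuilds the same grids from the rounded values printed by `@more` above. Those rounded inputs are an approximation, so the results match the notebook only to a few significant figures.

```julia
# Julia 1.x version of the grid cell above, using the rounded values
# m = loglambdaopt and delta = loglambdaopt_stde printed by @more.
m, delta = -7.48244, 0.752439

lambdas = exp.(range(m - delta, stop = m + delta, length = 10))  # log-spaced lambdas
alphas  = range(0.9, stop = 0.99, length = 10)                  # 0.90, 0.91, ..., 0.99
```

Because the grid is symmetric in `log(lambda)`, the geometric mean of its end points is `exp(m)`. Also, `lambdas[4]` comes out at about 0.000438, the value selected later in the notebook.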

Next, we search this grid for the parameters minimizing the cross-validation error:

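In [9]:
```julia
# @getfor is the author's macro; here it evaluates cv_error over the α × λ grid
alphas, lambdas, rmserrors = @getfor α alphas λ lambdas cv_error(
    ElasticNetRegressor(alpha=α, lambda=λ), X, y;
    parallel=true, verbose=false, n_folds=10);
```

```
α=0.99 λ=0.00119453362806364346
```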

In [10]:
```julia
indmin(rmserrors) # linear index of the minimum error
```

In [11]:
```julia
ind2sub(size(rmserrors), 36) # subscripts for that position (36 is the index returned above)
```

```
(6,4)
```

In [12]:
```julia
α, λ = (alphas[6], lambdas[4]) # the optimal parameter values
```

```
(0.95,0.00043801763437244256)
```
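The search and the `indmin`/`ind2sub` steps can be sketched in plain Julia 1.x. This assumes that `@getfor` (the author's macro) evaluates the error at every (α, λ) pair, with α indexing the rows, as the `(6,4)` result suggests. `grid_search` and `toy_err` are hypothetical, and in practice `err` would wrap a cross-validation call such as `cv_error`. In Julia 1.x, `argmin` on a matrix returns a `CartesianIndex` directly.

```julia
# Plain-loop grid search: rows index α, columns index λ.
function grid_search(err, alphas, lambdas)
    E = [err(α, λ) for α in alphas, λ in lambdas]  # length(alphas) × length(lambdas)
    I = argmin(E)                                   # CartesianIndex(i, j) of the minimum
    return alphas[I[1]], lambdas[I[2]], E
end

# hypothetical error surface with its minimum at α = 0.95, λ = 4e-4
toy_err(α, λ) = (α - 0.95)^2 + (log(λ) - log(4e-4))^2

α, λ, E = grid_search(toy_err, 0.90:0.01:0.99, [1e-4, 4e-4, 1e-3])

# the Julia 1.x counterpart of ind2sub((10, 10), 36) is
CartesianIndices((10, 10))[36]   # CartesianIndex(6, 4)
```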

## Cross-validation of tuned model

In [13]:
```julia
errors_elastic = cv_errors(ElasticNetRegressor(alpha=α, lambda=λ), X, y;
    n_folds=12, parallel=true, verbose=false);
```

In [14]:
```julia
string(mean(errors_elastic), " ± ", std(errors_elastic))
```

```
"0.10629478180351216 ± 0.021605771568457277"
```

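The `±` figure above is the standard deviation of the 12 fold errors, which measures their spread from fold to fold. It is not the uncertainty of the mean error. The Julia 1.x sketch below computes both. `summarize_folds` and its input are hypothetical stand-ins for `errors_elastic`. The normal-approximation 95% interval `mean ± 1.96·σ/√n` should be read with caution, because cross-validation folds share training data and are not independent. With the notebook's σ ≈ 0.0216 over 12 folds, σ/√12 ≈ 0.0062.

```julia
using Statistics  # mean, std

# Summaries of per-fold cross-validation errors.
function summarize_folds(fold_errors::AbstractVector{<:Real})
    n  = length(fold_errors)
    μ  = mean(fold_errors)
    σ  = std(fold_errors)        # spread of individual fold errors
    se = σ / sqrt(n)             # standard error of the mean error
    return (mean = μ, std = σ, se = se, ci95 = (μ - 1.96se, μ + 1.96se))
end

summarize_folds([0.10, 0.12, 0.09, 0.11])  # hypothetical fold errors
```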
In [22]:
```julia
import JLD: jldopen, write, close
file = jldopen("cv_errors.jld", "r+") # open the existing file read/write, so new data is added to it
write(file, "errors_elastic", errors_elastic)
close(file)
```

## Ranking of features based on final model

In [20]:
```julia
elastic = ElasticNetRegressor(X, y, alpha=α, lambda=λ) # train on all training patterns
head(coefs(elastic))
```

|   | feature | name | coef |
|---|---|---|---|
| 1 | 16 | MSZoning__C (all) | -0.2961189418702098 |
| 2 | 47 | Neighborhood__Crawfor | 0.1036580807842533 |
| 3 | 63 | Neighborhood__StoneBr | 0.0799829018662005 |
| 4 | 238 | Functional__Typ | 0.0741081152337582 |
| 5 | 220 | GrLivArea | 0.0704656648331386 |
| 6 | 96 | OverallQual | 0.0669696374417104 |
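The table above appears to be sorted by the magnitude of the coefficients, ignoring sign. The Julia 1.x sketch below reproduces that ranking on three of its rows. The variable names are hypothetical. Magnitudes are only roughly comparable across feature types: a one-hot coefficient is the effect of switching a 0/1 level on, whereas a numerical coefficient is the effect of a one-standard-deviation change in the (Box-Cox transformed) feature.

```julia
# Rank features by the absolute size of their coefficients.
feature_names = ["Neighborhood__Crawfor", "MSZoning__C (all)", "GrLivArea"]
coef_values   = [0.1036580807842533, -0.2961189418702098, 0.0704656648331386]

order  = sortperm(abs.(coef_values); rev = true)  # largest magnitude first
ranked = collect(zip(feature_names[order], coef_values[order]))
```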


In [21]:
```julia
showall(elastic)
```

```
ElasticNetRegressor@...7170
Hyperparameters:
Dict{Symbol,Any} with 7 entries:
  :alpha            => 0.95
  :standardize      => false
  :n_lambdas        => 100
  :lambda           => 0.000438018
  :criterion        => :coef
  :lambda_min_ratio => 0.0
  :max_n_coefs      => 0
Post-fit parameters:
Dict{Symbol,Any} with 3 entries:
  :intercept    => 11.8591
  :lambdaopt    => 7.72782e228
  :loglambdaopt => 7.09036e159
                                        Non-zero coefficients
                              ┌────────────────────────────────────────┐
        MSZoning__C (all) (-) │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 0.297 │
    Neighborhood__Crawfor (+) │▪▪▪▪▪▪▪▪▪▪▪ 0.103                       │
    Neighborhood__StoneBr (+) │▪▪▪▪▪▪▪▪▪ 0.079                         │
          Functional__Typ (+) │▪▪▪▪▪▪▪▪ 0.074                          │
                GrLivArea (+) │▪▪▪▪▪▪▪▪ 0.07                           │
```

According to the Ames House Price documentation, the top feature `MSZoning__C (all)` is hot (equal to one) for a house in a commercial zone. Not surprisingly, our model regards a house in a commercial zone as highly undesirable. Homes in the neighborhoods labelled `Crawfor` and `StoneBr` are likely to be valued highly (as, to a lesser extent, are those in `BrkSide` and `Edwards`). However, a combination of attractive features could keep homes in other areas competitive, especially typical home functionality (`Functional__Typ`), above-grade living area (`GrLivArea`) and overall quality (`OverallQual`).
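To put the coefficient sizes in more familiar terms, assume that `target` is the natural logarithm of the sale price. The intercept of about 11.86 (exp(11.86) ≈ 141,000) and the RMSL error metric both suggest this. Under that assumption, a coefficient b on a 0/1 feature multiplies the predicted price by exp(b), with all other inputs held fixed. The helper `pct_effect` below is a hypothetical Julia 1.x sketch of that conversion.

```julia
# Percentage change in predicted sale price when a 0/1 feature switches from
# 0 to 1, all other inputs held fixed. Assumes the target is log(SalePrice).
pct_effect(b::Real) = 100 * (exp(b) - 1)

pct_effect(-0.2961189418702098)  # MSZoning__C (all): roughly -25.6%
pct_effect(0.1036580807842533)   # Neighborhood__Crawfor: roughly +10.9%
```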