# Ames House Price Data - Extraction of Important Features
>Jupyter notebook, running a Julia 0.5.2 kernel, with the help of machine learning modules written by the author

*Here we rank the features by importance using the decision tree regularization method of [Deng and Runger (2012)](https://arxiv.org/abs/1201.1587v3). Another way to extract important features, using the Lasso-regularized linear model, is described in [The Elastic Net Model](ElasticNet.ipynb).*

## Transforming the data

The data is read in as a `DataFrame`, but the regularized decision tree we build requires its input in `DataTable` form, a type defined in the `TreeCollections` module. The query `?DataTable` describes this data structure in detail.

Note that we have no need to one-hot encode categoricals as our decision tree algorithms handle mixed data types.
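
To see why one-hot encoding is unnecessary, it may help to picture a tree whose splits come in two kinds: a threshold test for ordinal features and a set-membership test for categorical ones. The sketch below is only an illustration of that idea. The tuple encoding, the function `goes_left`, and the sample values are all assumptions made for this example and are not taken from `TreeCollections`.

```julia
# Illustrative sketch only: this split encoding is an assumption and is
# not taken from the `TreeCollections` module.

# A split is a tuple (kind, feature index, criterion):
#   :ordinal     -> criterion is a threshold; go left when value <= threshold
#   :categorical -> criterion is a set of categories; go left when value is in it
function goes_left(split, pattern)
    kind, j, criterion = split
    if kind == :ordinal
        return pattern[j] <= criterion
    else
        return pattern[j] in criterion
    end
end

# A made-up pattern: one numeric value and one category label
pattern = Any[1710.0, "CollgCr"]

area_split = (:ordinal, 1, 1500.0)
hood_split = (:categorical, 2, Set(["CollgCr", "NoRidge"]))

goes_left(area_split, pattern)   # false, since 1710.0 > 1500.0
goes_left(hood_split, pattern)   # true, since "CollgCr" is in the set
```

Because the categorical test works directly on the raw labels, no indicator columns are needed.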


In [None]:
push!(LOAD_PATH, pwd()) # Allow loading of modules from current directory 
using ADBUtilities, Preprocess, Regressors, TreeCollections
import DataFrames: DataFrame, head, readtable, writetable

df = readtable("2.cleaned/train_randomized.csv")

const X = DataTable(df[2:end-1]) # drop the identifying feature :Id and the target
const y = collect(df[:target]);

## Ranking the features

To build a Deng-Runger regularized tree we simply give the basic decision tree model a `penalty` keyword argument. We use the default value suggested by the authors:

In [2]:
tree = TreeRegressor(X,y,penalty=0.5)

TreeRegressor@...3115
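
For intuition, the Deng-Runger idea can be sketched as follows: when candidate splits are compared at a node, the gain of a feature not yet used anywhere in the tree is multiplied by a coefficient in (0, 1], while features already used keep their full gain. The sketch assumes that `penalty` plays the role of that coefficient; how `TreeRegressor` applies it internally is not shown in this notebook. The helper names `regularized_gain` and `best_feature` are hypothetical.

```julia
# Illustrative sketch only: `regularized_gain` and `best_feature` are
# hypothetical helpers, not functions from the author's modules.

# Gain of splitting on `feature`, discounted by `penalty` unless the
# feature has already been used in an earlier split of the same tree.
function regularized_gain(gain, feature, used_features, penalty)
    return feature in used_features ? gain : penalty * gain
end

# Choose the feature with the largest regularized gain.
# `gains` maps feature index => raw (unpenalized) gain at the current node.
function best_feature(gains, used_features, penalty)
    best, best_gain = 0, -Inf
    for j in sort(collect(keys(gains)))   # fixed order so ties resolve reproducibly
        g = regularized_gain(gains[j], j, used_features, penalty)
        if g > best_gain
            best, best_gain = j, g
        end
    end
    return best
end

used = Set([2])                               # feature 2 already appears in the tree
gains = Dict(1 => 0.30, 2 => 0.20, 3 => 0.10) # made-up gains for illustration

best_feature(gains, used, 1.0)   # no discount: feature 1 wins on raw gain
best_feature(gains, used, 0.5)   # discounted: feature 2 wins, since 0.20 > 0.5 * 0.30
```

The effect is that a new feature enters the tree only when it is clearly better than those already in use, which keeps the set of consulted features small.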

In [3]:
@more # shorthand for `showall(ans)`

Dict{Symbol,Any} with 7 entries:
  :max_features             => 0
  :extreme                  => false
  :regularization           => 0.0
  :min_patterns_split       => 2
  :popularity_given_feature => Dict(68=>35,2=>131,11=>28,46=>6,25=>13,55=>26,42…
  :cutoff                   => 0
  :penalty                  => 0.5

TreeRegressor@...3115
  Hyperparameters:
                            Feature importance
                 ┌────────────────────────────────────────┐ 
     OverallQual │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 0.218 │ 
       GrLivArea │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 0.158          │ 
    Neighborhood │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 0.125               │ 
       x1stFlrSF │▪▪▪▪▪▪▪▪▪▪▪▪ 0.08                       │ 
     TotalBsmtSF │▪▪▪▪▪▪▪▪▪▪▪ 0.074                       │ 
      BsmtFinSF1 │▪▪▪▪▪▪▪▪▪ 0.06                          │ 

The *popularity* of a feature `j` for a pattern `p` is the number of times `p[j]` is consulted as `p` is run down the tree to reach its prediction node. The overall *popularity* of a feature is the sum of these counts over all patterns. Popularities are supplied by the `popularity_given_feature` attribute of the tree, a dictionary keyed on feature index. The *importance* of a feature is its popularity, normalised by the most popular feature's popularity. The call `importance(tree)` returns the corresponding dictionary of importances keyed on the *name* of the feature; this is what the chart above displays.
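
The definitions above can be made concrete with a toy example. The sketch below follows the description in the text literally, dividing by the largest popularity. The tree encoding and all function names are assumptions made for illustration and are not the data structures used by `TreeRegressor`. The module's own normalisation may also differ, since the chart above peaks at 0.218 rather than 1.

```julia
# Illustrative sketch only: the tree encoding and function names are
# assumptions, not the data structures used by `TreeRegressor`.

# An internal node is a tuple (feature index, threshold, left, right);
# a leaf is simply a number (the prediction).
toy_tree = (1, 5.0,
            (2, 0.5, 10.0, 20.0),   # reached when p[1] <= 5.0
            30.0)                   # reached when p[1] >  5.0

# Run pattern `p` down the tree, adding 1 to `counts[j]` each time
# feature `j` is consulted; return the prediction.
function predict!(counts, node, p)
    while isa(node, Tuple)
        j, threshold, left, right = node
        counts[j] = get(counts, j, 0) + 1
        node = p[j] <= threshold ? left : right
    end
    return node
end

# Overall popularity: accumulate the per-pattern counts over all patterns.
function popularity(tree, patterns)
    counts = Dict{Int,Int}()
    for p in patterns
        predict!(counts, tree, p)
    end
    return counts
end

# Importance as described in the text: popularity divided by the
# popularity of the most popular feature.
function importance_from_popularity(counts)
    top = maximum(values(counts))
    return Dict(j => c / top for (j, c) in counts)
end

patterns = [[3.0, 0.2], [4.0, 0.9], [7.0, 0.1], [9.0, 0.4]]
popularities = popularity(toy_tree, patterns)           # feature 1 => 4, feature 2 => 2
importances = importance_from_popularity(popularities)  # feature 1 => 1.0, feature 2 => 0.5
```

Feature 1 sits at the root, so every pattern consults it; feature 2 is consulted only by the two patterns that branch left.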

In [4]:
importance_given_name = importance(tree)
important_features = reverse(collect(keys_ordered_by_values(importance_given_name)))[1:40]

LoadError: UndefVarError: importance not defined
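
The ranking step can also be written in plain Julia, without `keys_ordered_by_values`. The sketch below assumes that helper returns keys in ascending order of value, which would explain why the cell above reverses its result. The function name `keys_by_descending_value` is hypothetical, and the dictionary reuses four importances from the chart purely as sample input.

```julia
# Plain-Julia stand-in for the ranking step. The name
# `keys_by_descending_value` is hypothetical.
function keys_by_descending_value(d)
    return sort(collect(keys(d)), by = k -> d[k], rev = true)
end

# Sample input: four importances copied from the chart
sample_importance = Dict(:GrLivArea => 0.158,
                         :OverallQual => 0.218,
                         :x1stFlrSF => 0.08,
                         :Neighborhood => 0.125)

ranked = keys_by_descending_value(sample_importance)
top_two = ranked[1:min(2, length(ranked))]   # guard against rankings shorter than requested
```

The `min` guard matters when fewer features are available than the number requested; an unguarded slice such as `[1:40]` throws a bounds error in that case.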

Roughly half of the 40 most important features are ordinal; the rest are categorical:

In [35]:
sum(X[important_features].scheme.is_ordinal)

22

## Writing results to file

In [36]:
dg = DataFrame([important_features,],[:field,])

writetable("3.important_features/important_features.csv", dg)