In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from sklearn.preprocessing import StandardScaler, scale
import seaborn as sns

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

<img style="float:right" src="https://www.washington.edu/brand/files/2014/09/W-Logo_Purple_Hex.png" width="60px"/>
# Traits and Range Shifts: Feature Engineering + Regularization 

## <small>work by [Tony Cannistra](http://www.github.com/acannistra) and the [Buckley Lab](http://faculty.washington.edu/lbuckley) at the University of Washington</small>

There is a tradeoff in this modeling pursuit between the interpretability and the predictive power of the models we use. Many purely statistical machine learning models aren't particularly interpretable: we're unable to tease out the direct influence of individual predictors on the response variable (in this case, geographic shift).

In an attempt to both capture the nonlinearity of the relationships between many of our predictors and the response and to enhance the model's interpretability, we combine feature engineering with a regression approach. 

[featEng]: http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/ "Source"

## Feature Engineering
> Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data. [source][featEng]

This applies to our problem in several ways. We have already done most of the simplest feature engineering work: encoding each categorical feature as a set of binary features, one per category. This is known as *one-hot* encoding.
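As a minimal sketch of one-hot encoding, the snippet below uses `pandas.get_dummies` on a toy table. The column names (`habitat`, `body_mass`) and values are hypothetical placeholders, not the project's actual trait data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
traits = pd.DataFrame({
    "habitat": ["forest", "alpine", "forest", "wetland"],
    "body_mass": [12.0, 3.5, 8.1, 20.4],
})

# Replace the categorical column with one binary column per category.
encoded = pd.get_dummies(traits, columns=["habitat"], dtype=int)
print(encoded)
```

Each row has a 1 in exactly one of the new `habitat_*` columns, and the numeric column passes through unchanged.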

A much more interesting approach to feature engineering is the creation of new features by combining existing features in some way, for example by taking pairwise products. 

The challenge is that creating pairwise products of, say, 20 features takes us from 20 features to as many as 400 (20 × 20 ordered pairs, of which 210 are unique once duplicates like $x_i x_j$ and $x_j x_i$ are merged). Fitting a model on that many features is likely to lead to overfitting, especially in a linear model. We can combat this with regularization. 
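To make the feature count concrete, here is a small sketch that builds pairwise products from a random 20-feature matrix. The data is synthetic and the shapes are chosen only for illustration.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))  # synthetic: 50 observations, 20 features

# Every ordered pair (i, j): 20 * 20 products, with x_i*x_j and x_j*x_i duplicated.
all_products = np.einsum("ni,nj->nij", X, X).reshape(len(X), -1)

# Unique pairs only (i <= j), which keeps the squared terms x_i * x_i.
pairs = list(combinations_with_replacement(range(X.shape[1]), 2))
unique_products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])

print(all_products.shape)     # (50, 400)
print(unique_products.shape)  # (50, 210)
```

Either way, the number of candidate features now exceeds the number of observations in this toy example, which is the situation where an unregularized linear model can fit the training data perfectly and generalize poorly.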

There are two interesting forms of regularization that we can use for this project: ridge regression and LASSO regression. 
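The practical difference between the two is the penalty: ridge penalizes the squared size of the coefficients (L2), which shrinks them toward zero, while LASSO penalizes their absolute size (L1), which can set some of them exactly to zero. The sketch below illustrates this with scikit-learn on synthetic data. The data, the `alpha` values, and the choice of three "true" predictors are all assumptions made for illustration, not results from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 100 observations, 20 features,
# only the first three of which actually drive the response.
X = rng.normal(size=(100, 20))
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# Both penalties act on coefficient size, so put features on a common scale first.
X_std = StandardScaler().fit_transform(X)

# The alpha values are arbitrary placeholders; in practice they would be tuned,
# for example by cross-validation.
ridge = Ridge(alpha=1.0).fit(X_std, y)
lasso = Lasso(alpha=0.2).fit(X_std, y)

print("nonzero ridge coefficients:", np.count_nonzero(ridge.coef_))
print("nonzero lasso coefficients:", np.count_nonzero(lasso.coef_))
```

Because LASSO can drop features entirely, it doubles as a feature selection step, which helps interpretability when the engineered feature set is large.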