# Feature engineering with Python
After this 2-hour practical, you will be able to:
- preprocess the main data types (numerical, various kinds of categorical, time data)
- understand the challenges of high dimension
- evaluate and optimize your feature engineering

The content is quite dense. You can run it in a "black-box" way, but you are encouraged to come back to it later to deepen your understanding. Think of today as a toolbox with working examples. We have tried to use real-world datasets as much as possible.

#### What is feature engineering?
It is a vague notion: roughly, every transformation that takes you from raw data to the input of your final ML pipeline. These can be standard transformations or ad hoc, creative ones. Several goals and constraints shape your feature engineering:
- extract meaningful information from the data
- transform your data to respect the mathematical constraints of algorithms
- reduce the number of dimensions

Let's see how it works in practice. It may seem a bit chaotic, but there are actually some rules that help! 
<img src="machine_learning_2x.png" style="width: 300px;">

# Table of contents
1. [Date features](#dates)
    1. [Turn timestamps into categorical features](#dateToCategorical)
    2. [How to best deal with categorical features](#CategoricalEncoding)
    3. [How about big data?](#BigData)
2. [Numerical features](#numerical)
3. [Dimensionality reduction](#PCA)

There are a few questions along the way to help you interpret the results!

In [None]:
# setup your notebook
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as spstats
from sklearn.linear_model import Ridge, LinearRegression, LogisticRegression
from sklearn.metrics import mean_absolute_error, roc_curve
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

## 1. Date features <a name="dates"></a>

A word about the data.
The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of on-field deployed air quality chemical sensor device responses. Ground truth hourly averaged concentrations for CO, Non-Methanic Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located reference certified analyzer.

Many things can be done with this dataset. For this practical, we only consider a time series problem: predicting the true CO concentration CO(GT) from the timestamp and from other parameters that are easy to acquire, like temperature (T), relative humidity (RH) and absolute humidity (AH). This way, we could monitor the CO level without a dedicated sensor.


In [None]:
# first get the data: ';'-separated, ',' as decimal mark, -200 encodes missing values
airdata = pd.read_csv('AirQualityUCI.csv', sep=';', decimal=',', na_values=-200)

In [None]:
# have a look
airdata.head(5)

In [None]:
# have a more informative look!
airdata.describe(include='all')

The first two columns are categorical (Date, Time); the others are numerical. Let's look at the correlations between the numerical columns.

In [None]:
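# correlation matrix of the numerical columns (columns[2:-2] skips Date, Time and the two empty trailing columns of the CSV)
sns.clustermap(airdata[airdata.columns[2:-2]].corr(), cmap="vlag", vmin=-1, vmax=1, annot=True)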

### A. Turn timestamps into categorical features <a name="dateToCategorical"></a>
To exploit the timestamps, we use the [pandas time series functions](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) to extract meaningful categorical features like the day of the week, the month, etc. We could also consider more complex variables, like the season or whether the day is a holiday.
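A minimal, self-contained sketch of such calendar features on two made-up rows laid out like `AirQualityUCI.csv` (assumed: `dd/mm/yyyy` dates and `hh.mm.ss` times). It also illustrates the "more complex" variables mentioned above with a season column and a weekend flag; the season boundaries (meteorological, northern hemisphere) and the names `raw`, `SEASONS` and `features` are illustrative choices, not part of the notebook.

```python
import pandas as pd

# Two hypothetical raw rows in the layout of AirQualityUCI.csv (dd/mm/yyyy, hh.mm.ss)
raw = pd.DataFrame({'Date': ['10/03/2004', '25/12/2004'],
                    'Time': ['18.00.00', '06.00.00']})

# An explicit format removes the day/month ambiguity of strings like '10/03/2004'
ts = pd.to_datetime(raw['Date'] + ' ' + raw['Time'], format='%d/%m/%Y %H.%M.%S')

# Meteorological seasons, northern hemisphere (an illustrative choice)
SEASONS = {12: 'winter', 1: 'winter', 2: 'winter',
           3: 'spring', 4: 'spring', 5: 'spring',
           6: 'summer', 7: 'summer', 8: 'summer',
           9: 'autumn', 10: 'autumn', 11: 'autumn'}

features = pd.DataFrame({
    'day_week': ts.dt.dayofweek,                   # Monday=0 ... Sunday=6
    'week': ts.dt.isocalendar().week.astype(int),  # ISO week number
    'month': ts.dt.month,
    'hour': ts.dt.hour,
    'season': ts.dt.month.map(SEASONS),
    'is_weekend': ts.dt.dayofweek >= 5,
})
print(features)
```

Passing `format=` explicitly is a deliberate choice: without it, a day-first string such as `10/03/2004` may be read as October 3.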

In [None]:
# we drop Nan values for the variables we care about
airdata_nona = airdata.dropna(subset=['CO(GT)', 'Date', 'Time', 'T', 'AH', 'RH'])

In [None]:
# some conversion work: parse the dates (dd/mm/yyyy in this file) and extract calendar features
airdata_nona = airdata_nona.assign(date_format=pd.to_datetime(airdata_nona.Date, format='%d/%m/%Y'))
airdata_nona.index = airdata_nona.date_format
airdata_nona = airdata_nona.assign(day_year=airdata_nona.date_format.dt.dayofyear)
airdata_nona = airdata_nona.assign(day_week=airdata_nona.date_format.dt.dayofweek)
# .dt.week was removed in pandas 2.0; isocalendar() gives the ISO week number
airdata_nona = airdata_nona.assign(weekname=airdata_nona.date_format.dt.isocalendar().week.astype(int).to_numpy())
airdata_nona = airdata_nona.assign(month=airdata_nona.date_format.dt.month)
# Time looks like '18.00.00': keep the hour part
airdata_nona = airdata_nona.assign(hour=airdata_nona.Time.str[:-6].astype(int))

In [None]:
# we restrict ourselves to a time window in which the measurements are dense.
airdata_nona_small = airdata_nona[(airdata_nona.date_format>='2004-04-01')&(airdata_nona.date_format<='2005-04-30')]


In [None]:
# some sanity check.
print(airdata.shape, airdata_nona.shape, airdata_nona_small.shape)

##### Evaluation setup
To go further, we need a train-test split. The goal is to see how good preprocessing of the inputs leads to better predictions.
The data is a time series, so we cannot just split the rows randomly into train and test! Indeed, if the training set contains the CO level on some day at 3pm, a model that is complex enough could exploit it to predict the CO level at 4pm the same day in the test set. This is data leakage, and it would make the test score optimistically biased. Instead, we use 12 random weeks as a test set, so that whole weeks are held out.

We also have to choose an evaluation metric. Let's go for the [mean absolute error](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html), and a regression algorithm: simple [linear ridge regression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We could use something fancier and get better performance, but we are interested in feature engineering here.
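The split in the next cells draws week numbers with `np.random.choice`. The same "hold out whole weeks" idea can be sketched with scikit-learn's `GroupShuffleSplit`, which keeps every member of a group on the same side of the split. This is an alternative, not the notebook's method, shown here on synthetic hourly timestamps. One detail worth checking: the time window spans both 2004 and 2005, so a bare ISO week number may merge two different calendar weeks. The sketch therefore uses (ISO year, ISO week) as the group key; the names `held_out_groups` and `train_groups` are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic hourly timestamps over 20 weeks (stand-in for the air quality data)
ts = pd.Series(pd.date_range('2004-04-01', periods=20 * 7 * 24, freq='60min'))
iso = ts.dt.isocalendar()
# Group key = ISO year * 100 + ISO week, so week 14 of 2004 and week 14 of 2005 stay distinct
groups = iso['year'].astype(int) * 100 + iso['week'].astype(int)

# Hold out 4 whole weeks: all hours of a given week end up on the same side
splitter = GroupShuffleSplit(n_splits=1, test_size=4, random_state=4697)
train_idx, test_idx = next(splitter.split(ts, groups=groups))

train_groups = set(groups.iloc[train_idx])
held_out_groups = set(groups.iloc[test_idx])
print(len(held_out_groups), train_groups & held_out_groups)
```

With an integer `test_size`, `GroupShuffleSplit` interprets it as a number of groups (here, weeks), not a number of rows.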

In [None]:
# get train test split.
np.random.seed(4697)
test_weeks = np.random.choice(airdata_nona_small.weekname.unique(), 12, replace=False)

In [None]:
test_weeks

In [None]:
training_set = airdata_nona_small[~airdata_nona_small.weekname.isin(test_weeks)]
testing_set = airdata_nona_small[airdata_nona_small.weekname.isin(test_weeks)]

Let's train a first model! This will be our performance reference.

In [None]:
variable_list = ['T', 'AH', 'RH', 'month', 'day_week', 'hour']
X_train = training_set[variable_list]
# .as_matrix() was removed in pandas 1.0; .to_numpy() replaces it
Y_train = training_set['CO(GT)'].to_numpy().ravel()
X_test = testing_set[variable_list]
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
clf = Ridge(alpha=1)
clf.fit(X_train, Y_train)
print('train MAE\t', mean_absolute_error(Y_train, clf.predict(X_train)), '\t; test MAE\t', mean_absolute_error(Y_test, clf.predict(X_test)))

Now let's do models adding one variable at a time.

In [None]:
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
for i in range(len(variable_list)):
    used = variable_list[:i+1]  # the first i+1 variables of the list
    X_train = training_set[used]
    X_test = testing_set[used]
    clf = Ridge(alpha=1)
    clf.fit(X_train, Y_train)
    print(used, '\ttrain MAE\t', mean_absolute_error(Y_train, clf.predict(X_train)), '\t; test MAE\t', mean_absolute_error(Y_test, clf.predict(X_test)))

**Question** Why have we done this last experiment? What does it teach us?

Answer here




### B. How to best deal with categorical features <a name="CategoricalEncoding"></a>
So far we have not really thought about it: we are lucky, because our categorical data have a natural numeric representation (Monday is 0, January is 1, etc.). Using these integers directly as features is called [Label encoding](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html). But there are other possibilities, like [one hot encoding](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder), also called [dummy encoding](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html), which creates one binary column per category.
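A small sketch contrasting the two encodings on a toy day-of-week column (the values are made up). It uses scikit-learn's `OneHotEncoder` and the pandas shortcut `pd.get_dummies`; `handle_unknown='ignore'` is an optional setting chosen here so that a category unseen during `fit` becomes an all-zero row instead of raising an error.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy day-of-week column (Monday=0 ... Sunday=6), as produced by .dt.dayofweek
day_week = np.array([0, 2, 6, 2]).reshape(-1, 1)

# Label encoding: the integer itself is the feature, so a linear model
# learns a single weight for the whole variable
print(day_week.ravel())

# One hot encoding: one binary column per category seen during fit
enc = OneHotEncoder(handle_unknown='ignore')
onehot = enc.fit_transform(day_week).toarray()  # default output is a sparse matrix
print(enc.categories_)
print(onehot)

# pandas equivalent, convenient on DataFrames
dummies = pd.get_dummies(pd.Series(day_week.ravel()), prefix='day_week', dtype=int)
print(dummies.columns.tolist())
```

Only the categories present during `fit` get a column, which is why the encoder should be fitted on the training set and then reused on the test set.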

In [None]:
sns.boxplot(x='day_week', y='CO(GT)', data=training_set, showfliers=False)

In [None]:
sns.boxplot(x='hour', y='CO(GT)', data=training_set, showfliers=False)

In [None]:
sns.boxplot(x='month', y='CO(GT)', data=training_set, showfliers=False)



Let's sum up. We have transformed raw timestamps, which a model cannot use as-is, into several useful categorical features: day of the week, hour of the day and month. They naturally come with a numerical value (January is 1, the hour of the day is already a number), which we have used directly. This is called [Label encoding](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).

**Question** Given we consider a linear model, briefly explain why one hot encoding might work better for this dataset.

Answer here


Let's look at each variable among day_week, hour and month, and see if it works better with label encoding or one hot encoding.

In [None]:
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
for variable in ['day_week', 'hour', 'month']:
    # label encoding: the integer value itself is the (single) feature
    X_train = training_set[variable].to_numpy().reshape(-1, 1)
    X_test = testing_set[variable].to_numpy().reshape(-1, 1)
    # one hot encoding, fitted on the training set only
    enc = OneHotEncoder(handle_unknown='ignore')
    X_train_hot = enc.fit_transform(X_train)
    X_test_hot = enc.transform(X_test)
    clf = Ridge(alpha=1)
    clf.fit(X_train, Y_train)
    clf_hot = Ridge(alpha=1)
    clf_hot.fit(X_train_hot, Y_train)
    print(variable, '\ttrain MAE  ', np.round(mean_absolute_error(Y_train, clf.predict(X_train)), 2), '; test MAE  ', np.round(mean_absolute_error(Y_test, clf.predict(X_test)),2),  '\t; dummy train MAE  ', np.round(mean_absolute_error(Y_train, clf_hot.predict(X_train_hot)), 2), '\t; dummy test MAE  ', np.round(mean_absolute_error(Y_test, clf_hot.predict(X_test_hot)), 2))

**Question** What do you think of these results?

Answer here



In [None]:
# final best performance, with all 3 variables one hot encoded
X = pd.get_dummies(airdata_nona_small, columns=['day_week', 'hour', 'month'], dtype=float)
# dummy columns are named like 'hour_7'; drop the other columns containing '_'
col_list = [c for c in X.columns if '_' in c]
col_list.remove('day_year')
col_list.remove('date_format')
col_list += ['T', 'RH', 'AH']
X_train = X[~X.weekname.isin(test_weeks)][col_list]
X_test = X[X.weekname.isin(test_weeks)][col_list]
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
clf = Ridge(alpha=1)
clf.fit(X_train, Y_train)
print('train MAE\t', mean_absolute_error(Y_train, clf.predict(X_train)), '\t; test MAE\t', mean_absolute_error(Y_test, clf.predict(X_test)))

**Question** Are you convinced?

Answer here

Further understanding can be gained by looking at the weights learned by the model with label encoding and with one hot encoding.
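A sketch of how such weights can be read, on synthetic data rather than the air quality set: the hourly profile with two "rush hour" peaks is a made-up assumption chosen to mimic a daily pattern. With label encoding, Ridge has one weight for `hour`; with one hot encoding, `coef_` holds one weight per hour, which we align with the dummy column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=2000)
# Hypothetical daily profile with peaks around 8h and 19h, plus noise
profile = np.exp(-(hour - 8) ** 2 / 4) + np.exp(-(hour - 19) ** 2 / 4)
y = 1 + 2 * profile + rng.normal(0, 0.1, size=hour.size)

# Label encoding: a single weight for the whole variable
label_model = Ridge(alpha=1).fit(hour.reshape(-1, 1), y)
print('label-encoded weight:', label_model.coef_)

# One hot encoding: one weight per hour, which can follow the peaks
X_hot = pd.get_dummies(pd.Series(hour), prefix='hour', dtype=float)
hot_model = Ridge(alpha=1).fit(X_hot, y)
weights = pd.Series(hot_model.coef_, index=X_hot.columns)
print(weights.round(2))
```

In the notebook, the same `pd.Series(clf.coef_, index=X_train.columns)` pattern should work on the fitted one hot Ridge model, since `X_train` there is a DataFrame.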



### C. How about big data? <a name="BigData"></a>
OK, great performance! But our categorical variables were easy: none of them has thousands of possible values. If one did, one hot encoding would create thousands of mostly empty columns. What can we do then?
<img src="https://cdn-images-1.medium.com/max/1600/0*FwubnnoNlt6Coo9j.png">

Let's imagine a categorical variable with many possible values. Real-world examples include airports in flight delay data (there are more than 5,000 public airports in the US!) or IP addresses.

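Let's pretend it is not possible to binarize the hours of the day (imagine we had a record every minute, for instance). We could group the minutes into bins. Or we could replace each value with the average of the target variable for that value: this is called mean (or target) encoding. Let's do it for the time of day, the day of the week and the month. It may seem weird, but as long as the averages are computed on the training set only, no information from the test set leaks into the features.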
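The cells below use plain per-category means. A common variant, sketched here as an alternative rather than the notebook's method, smooths each category's mean towards the global mean according to how many training rows it has, and falls back to the global mean for categories never seen in training. The function name `target_encode` and the `smoothing` parameter are hypothetical.

```python
import pandas as pd

def target_encode(train_cat, train_y, other_cat, smoothing=10.0):
    """Smoothed mean target encoding, computed on the training data only.

    Each category gets weight * (its mean target) + (1 - weight) * (global mean),
    with weight = count / (count + smoothing). Unseen categories get the global mean.
    """
    global_mean = train_y.mean()
    stats = train_y.groupby(train_cat).agg(['mean', 'count'])
    weight = stats['count'] / (stats['count'] + smoothing)
    encoding = weight * stats['mean'] + (1 - weight) * global_mean
    return train_cat.map(encoding), other_cat.map(encoding).fillna(global_mean)

# Usage on toy data: category 'c' does not appear in the training set
train_enc, test_enc = target_encode(pd.Series(['a', 'a', 'b']),
                                    pd.Series([1.0, 3.0, 10.0]),
                                    pd.Series(['a', 'c']),
                                    smoothing=0.0)
print(train_enc.tolist(), test_enc.tolist())
```

With `smoothing=0` this reduces to the plain means used below; a larger value pulls rare categories towards the global mean, which limits overfitting when a category has only a handful of rows.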

In [None]:
# Let's build our averages. We are very careful to use only the training set!
X_train = airdata_nona_small[~airdata_nona_small.weekname.isin(test_weeks)]
match_time = X_train.groupby('Time')['CO(GT)'].mean().to_frame()
match_day = X_train.groupby('day_week')['CO(GT)'].mean().to_frame()
match_month = X_train.groupby('month')['CO(GT)'].mean().to_frame()

In [None]:
# add the new features by matching each row's Time, day_week and month with the training averages
X_time = pd.merge(airdata_nona_small, match_time, left_on='Time', right_index=True, suffixes=['', '_time'])
X_time_day = pd.merge(X_time, match_day, left_on='day_week', right_index=True, suffixes=['', '_day'])
X_time_day_month = pd.merge(X_time_day, match_month, left_on='month', right_index=True, suffixes=['', '_month'])

In [None]:
# let's test our brand new features!
# select the target-encoded columns (the merges added the suffixes _time, _day, _month)
tmp_list = [c for c in X_time_day_month.columns if '_' in c]
tmp_list.remove('date_format')
tmp_list.remove('day_year')
tmp_list.remove('day_week')
col_list = tmp_list + ['T', 'RH', 'AH']
# and then the usual performance measuring setup
X_train = X_time_day_month[~X_time_day_month.weekname.isin(test_weeks)][col_list]
X_test = X_time_day_month[X_time_day_month.weekname.isin(test_weeks)][col_list]
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
clf = Ridge(alpha=1)
clf.fit(X_train, Y_train)
print('train MAE \t', mean_absolute_error(Y_train, clf.predict(X_train)), '\t; test MAE\t', mean_absolute_error(Y_test, clf.predict(X_test)))

**Question** What do you think of this result? Can we expect that result if we have a look back at the boxplots made earlier?

Answer here


Getting no improvement from a feature engineering idea happens all the time. Feature engineering is an art, and most of the time what you try does not work. Just keep this trick in mind for the day it might apply. Other ideas (non-exhaustive):
- only keep the N most frequent values for one hot encoding, and group the rest into an "other" category
- pre-cluster the values, either using a natural structure (like day/night for hours, seasons for months), or by running your favorite clustering algorithm on other features
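A sketch of the first two ideas above. The airport codes are made-up data, `keep_top_n` is a hypothetical helper, and the hour boundaries for the periods of the day are an arbitrary choice; `pd.cut` does the binning.

```python
import pandas as pd

def keep_top_n(cat, n, other='other'):
    """Replace every value outside the n most frequent ones with a single 'other' level."""
    top = cat.value_counts().nlargest(n).index
    return cat.where(cat.isin(top), other)

# Top-N + "other" before one hot encoding (hypothetical airport data)
airports = pd.Series(['JFK', 'LAX', 'JFK', 'ORD', 'SFO', 'JFK', 'LAX', 'BOS'])
reduced = keep_top_n(airports, n=2)
dummies = pd.get_dummies(reduced, prefix='airport', dtype=int)
print(reduced.tolist())
print(dummies.columns.tolist())

# Pre-clustering with a natural structure: hours of the day -> coarse periods
hours = pd.Series([2, 8, 13, 19, 23])
period = pd.cut(hours, bins=[-1, 5, 11, 17, 23],
                labels=['night', 'morning', 'afternoon', 'evening'])
print(period.tolist())
```

In both cases the number of one hot columns no longer grows with the number of distinct raw values; in practice `n` (or the bins) would be chosen on the training set only.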


## 2. Numerical features <a name="numerical"></a>
So far we have not looked at the numerical features at all, and there may be some work to do there as well! The first step is to plot their distributions and look for potential outliers. Let's look at our target variable, CO(GT).
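One simple, widely used rule for flagging candidate outliers is Tukey's fences: values below Q1 - k·IQR or above Q3 + k·IQR, with k = 1.5 by convention. A minimal sketch on made-up values follows; `iqr_outliers` is a hypothetical helper. On a strongly skewed variable, this rule can flag many perfectly legitimate high values, so it is a prompt for inspection rather than an automatic deletion criterion.

```python
import pandas as pd

def iqr_outliers(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = x.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

x = pd.Series([1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 9.0])
mask = iqr_outliers(x)
print(x[mask].tolist())
```

On the notebook's data, `iqr_outliers(airdata_nona_small['CO(GT)']).mean()` would give the fraction of rows the rule flags.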

In [None]:
# sns.distplot is deprecated since seaborn 0.11; histplot with a KDE overlay replaces it
sns.histplot(airdata_nona_small['CO(GT)'], kde=True)

**Question** What can you say about the distribution? Are there outliers? What probability law can you recognize (or not)? Why? Can you think of a transformation we could apply to data? 

Answer here

In [None]:

sns.histplot(np.log(airdata_nona_small['CO(GT)']), kde=True)

We went for the log. You can read more about the [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution). Let's see whether performance improves when the target variable is closer to normally distributed.
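The next cell fits on `np.log(Y_train)` and applies `np.exp` to the predictions by hand. scikit-learn can wrap this pattern with `TransformedTargetRegressor`, sketched here on synthetic log-normal-like data (not the notebook's data). Because `exp(E[log y | x])` estimates something closer to the conditional median than the mean, the back-transformed predictions suit a median-oriented metric like MAE.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Synthetic strictly positive, right-skewed target
y = np.exp(0.5 + X @ np.array([0.4, -0.3, 0.2]) + 0.2 * rng.normal(size=500))

# Fit Ridge on log(y); predict() applies np.exp automatically
model = TransformedTargetRegressor(regressor=Ridge(alpha=1), func=np.log, inverse_func=np.exp)
model.fit(X, y)
pred = model.predict(X)
print(mean_absolute_error(y, pred))
```

Wrapping the transform inside the estimator keeps it attached to the model, so cross-validation or a pipeline cannot forget to invert it.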

In [None]:
# final best performance, now with a log-transformed target
X = pd.get_dummies(airdata_nona_small, columns=['day_week', 'hour', 'month'], dtype=float)
col_list = [c for c in X.columns if '_' in c]
col_list.remove('day_year')
col_list.remove('date_format')
col_list += ['T', 'RH', 'AH']
X_train = X[~X.weekname.isin(test_weeks)][col_list]
X_test = X[X.weekname.isin(test_weeks)][col_list]
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
clf = Ridge(alpha=1)
# fit on log(CO), then map the predictions back with exp before computing the MAE
clf.fit(X_train, np.log(Y_train))
print('train MAE log target \t', mean_absolute_error(Y_train, np.exp(clf.predict(X_train))), '\t; test MAE log target\t', mean_absolute_error(Y_test, np.exp(clf.predict(X_test))))

And a small improvement here as well. Linear models fitted with a squared loss, like Ridge, tend to work better when the target is roughly symmetric rather than heavily skewed. Another possible transformation is the [Box-Cox transform](https://en.wikipedia.org/wiki/Power_transform). It generalizes the logarithm (λ = 0 gives the log) with a parameter λ that is fitted to make the data as close to normal as possible. In our case, it performs about the same as the log, but it is worth keeping in your toolbox.
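For reference, the Box-Cox transform of a strictly positive y is $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$; its inverse is $(\lambda z + 1)^{1/\lambda}$, which is what the `invboxcox` helper in the next cell computes. The sketch below checks this on synthetic log-normal data (an assumption, not the CO data), using `scipy.stats.boxcox` and SciPy's built-in inverse `scipy.special.inv_boxcox`.

```python
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.5, sigma=0.6, size=1000)  # synthetic, positive, right-skewed

z, lam = stats.boxcox(y)            # transformed data and maximum-likelihood lambda
manual = (y ** lam - 1) / lam       # definition for lambda != 0
back = special.inv_boxcox(z, lam)   # SciPy's inverse transform

print(round(lam, 3), round(stats.skew(y), 2), round(stats.skew(z), 2))
```

For log-normal data, the fitted λ should land close to 0, i.e. close to the plain log; Box-Cox earns its keep when the data are skewed in a way the log does not fix.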

In [None]:

# inverse of the Box-Cox transform: (ld*y + 1)**(1/ld), or exp(y) when ld == 0
def invboxcox(y, ld):
    if ld == 0:
        return np.exp(y)
    else:
        return np.exp(np.log(ld * y + 1) / ld)

# same features as before
X = pd.get_dummies(airdata_nona_small, columns=['day_week', 'hour', 'month'], dtype=float)
col_list = [c for c in X.columns if '_' in c]
col_list.remove('day_year')
col_list.remove('date_format')
col_list += ['T', 'RH', 'AH']
X_train = X[~X.weekname.isin(test_weeks)][col_list]
X_test = X[X.weekname.isin(test_weeks)][col_list]
Y_train = training_set['CO(GT)'].to_numpy().ravel()
Y_test = testing_set['CO(GT)'].to_numpy().ravel()
# Box-Cox needs strictly positive values; lambda is fitted on the training targets only
l, opt_lambda = spstats.boxcox(Y_train)
clf = Ridge(alpha=1)
clf.fit(X_train, l)
print('train MAE box-cox target\t', mean_absolute_error(Y_train, invboxcox(clf.predict(X_train), opt_lambda)), '\t; test MAE box-cox target\t', mean_absolute_error(Y_test, invboxcox(clf.predict(X_test), opt_lambda)))

In [None]:
# let's look at the distribution of the Box-Cox-transformed target
sns.histplot(l, kde=True)

**Question** What can you say about this distribution? Which of Box-Cox and log looks closer to a Gaussian? Does Box-Cox improve performance here, or decrease it?

Answer here

## 3. Dimensionality reduction <a name="PCA"></a>
Another case where feature engineering is crucial is when you have too many features. They are costly to store, and both training and prediction become slow. Moreover, there may be noise in your data, and dimensionality reduction can even improve your performance! There are many methods, either to learn a new space of smaller dimension, called a [manifold](http://scikit-learn.org/stable/modules/manifold.html#locally-linear-embedding), in which your data lives, or to [reduce the number of dimensions](http://scikit-learn.org/stable/modules/unsupervised_reduction.html). Here we detail an example with PCA.

So let's switch to a new dataset. It consists of mutants of p53, a key tumor suppressor protein in cancer: each instance is a p53 protein carrying a modification. The goal is to predict from physical measurements whether the mutant is still active or not. There are 5409 attributes per instance.

- Attributes 1-4826 represent 2D electrostatic and surface based features. 
- Attributes 4827-5408 represent 3D distance based features. 
- Attribute 5409 is the class attribute, which is either active or inactive. 

In [None]:
# let's read the data. 
active = pd.read_csv('K9_active.csv', header=None, na_values='?')
inactive = pd.read_csv('K9_inactive3000.csv', header=None, na_values='?')
data = pd.concat((active, inactive), axis=0)

In [None]:
data.shape

In [None]:
data.head()

In [None]:
# drop rows with missing feature values, then check the class balance
data = data.dropna(subset=list(range(5408)))
data[5408].value_counts()

In [None]:
# prepare data: features are columns 0..5407, the label is column 5408 (1 = inactive, 0 = active)
X = data[list(range(5408))].to_numpy()
Y = (data[5408] == 'inactive').astype(int).to_numpy()
# stratify=Y keeps the same class proportions in train and test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, stratify=Y)

In [None]:
# train a logistic regression model, and display results
# (the default max_iter=100 may not converge with 5408 unscaled features)
logistic = LogisticRegression(max_iter=1000)
logistic.fit(X_train, Y_train)


In [None]:
# performance evaluation - accuracy
logistic.score(X_test, Y_test)

**Question** What other metrics can we use to evaluate a classification? Are they more thorough?

Answer here

In [None]:
# performance evaluation - ROC curve
fpr, tpr, _ = roc_curve(Y_test, logistic.decision_function(X_test))

plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='linear classif')
plt.legend(loc='best')



We have a lot of dimensions. Are they all useful? Let's apply a Principal Component Analysis (PCA). You can find a good introduction to PCA [here](https://web.stanford.edu/~hastie/Papers/ESLII.pdf), in section 14.5. Very important: always standardize your variables before applying PCA. PCA looks for the directions of maximal variance, so without standardization the features with the largest scales would dominate the components. To do it cleanly, we use scikit-learn's [pipelines](http://scikit-learn.org/stable/modules/pipeline.html#pipeline).
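A small synthetic experiment (made-up data, not the p53 features) shows why standardization matters. Two features are strongly correlated, and a third, independent one is simply expressed in a unit 1000 times larger. Without scaling, the first principal component is almost entirely the large-scale feature; after `StandardScaler`, it captures the shared direction of the two correlated features instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = a + 0.3 * rng.normal(size=n)       # strongly correlated with a
c = rng.normal(size=n)                 # independent of a and b
X = np.column_stack([a, b, 1000 * c])  # c expressed in a much larger unit

raw_pca = PCA(n_components=1).fit(X)
std_pca = make_pipeline(StandardScaler(), PCA(n_components=1)).fit(X)

raw_pc1 = np.abs(raw_pca.components_[0])
std_pc1 = np.abs(std_pca.named_steps['pca'].components_[0])
print(raw_pc1.round(3), std_pc1.round(3))
```

The `named_steps['pca']` access is the same one the notebook uses below to retrieve the fitted PCA from its pipeline.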

In [None]:
# Let's apply PCA
target_names = Y_train

# standardize, then project onto the first 500 principal components (both fitted on the training set)
std_clf = make_pipeline(StandardScaler(), PCA(n_components=500))
X_r = std_clf.fit_transform(X_train)

pca_std = std_clf.named_steps['pca']

In [None]:
# How do we choose the number of components? Plot the cumulative fraction of variance explained
plt.figure(1, figsize=(4, 3))
plt.clf()
plt.axes([.2, .2, .7, .7])
plt.plot(np.cumsum(pca_std.explained_variance_ratio_), linewidth=2)
plt.axis('tight')
plt.xlabel('n_components')
plt.ylabel('cumulative explained variance ratio')

**Question** We have represented the cumulative explained variance as a function of the number of retained components. Does it keep improving at the same pace? When does the improvement slow down? What can you deduce for the number of informative components? What other criteria could be used to choose the number of components?

Answer here

In [None]:
# PCA can also be used to visualize data: training set projected on pairs of components
colors = ['darkorange', 'navy']
plt.figure(2, figsize=(10, 4))
for k, (pc_x, pc_y) in enumerate([(0, 1), (7, 2)]):  # 0-based component indices
    plt.subplot(1, 2, k + 1)
    for color, i in zip(colors, [1, 0]):  # 1 = inactive, 0 = active
        plt.scatter(X_r[Y_train == i, pc_x], X_r[Y_train == i, pc_y], color=color, alpha=.8, lw=2,
                    label=str(i))
    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.xlabel('PCA{}'.format(pc_x + 1))
    plt.ylabel('PCA{}'.format(pc_y + 1))
    plt.title('PCA of p53 dataset (PC{} vs PC{})'.format(pc_y + 1, pc_x + 1))

plt.tight_layout()

**Question** Comment on these plots.

Answer here

Now let's see how it affects classification performance.

In [None]:
# let's choose a number of components to use in classification (try to vary it and look how it affects results.)
N_features = 90

# let's run the same logistic regression as before, with our new input features
logistic = LogisticRegression(max_iter=1000)
logistic.fit(X_r[:,:N_features], Y_train)

X_test_r = std_clf.transform(X_test)[:,:N_features]
fpr_r, tpr_r, _ = roc_curve(Y_test, logistic.decision_function(X_test_r))

print(logistic.score(X_test_r, Y_test))

plt.figure(1)
plt.plot([0, 1], [0, 1], 'k--')

plt.plot(fpr, tpr, label='linear classif')
plt.plot(fpr_r, tpr_r, label='PCA{} + linear classif'.format(N_features))
plt.legend(loc='best')

Depending on the random split, you may already get very good performance with the original features. With PCA, you typically get results at least as good with about 60 times fewer dimensions (90 components instead of 5408)! As an exercise, you can also try fitting 2 distinct PCAs on the 2 types of features in the dataset (2D and 3D), as they may have different variances. Here the results are already satisfying, but it is a trick worth remembering.
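One way to set up that exercise is scikit-learn's `ColumnTransformer`, which applies a separate standardize-then-PCA pipeline to each block of columns and concatenates the results. The sketch below runs on small synthetic data so that it stays self-contained. For the p53 data, the blocks would presumably be `list(range(4826))` and `list(range(4826, 5408))` (attributes 1-4826 and 4827-5408, 0-indexed), and the numbers of components per block are arbitrary choices to tune.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 6:] *= 50  # second block on a different scale (hypothetical)
y = (X[:, 0] + X[:, 6] / 50 > 0).astype(int)

block_2d = list(range(0, 6))   # p53: list(range(4826))
block_3d = list(range(6, 10))  # p53: list(range(4826, 5408))

# One scaler + PCA per block; the outputs are concatenated column-wise
two_pcas = ColumnTransformer([
    ('pca_2d', make_pipeline(StandardScaler(), PCA(n_components=3)), block_2d),
    ('pca_3d', make_pipeline(StandardScaler(), PCA(n_components=2)), block_3d),
])
model = make_pipeline(two_pcas, LogisticRegression(max_iter=1000))
model.fit(X, y)

reduced = model[0].transform(X)  # first pipeline step = fitted ColumnTransformer
score = model.score(X, y)
print(reduced.shape, score)
```

Keeping both PCAs inside a single pipeline ensures that they are fitted on the training set only when the model is fitted.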

There are many other dimensionality reduction methods. See chapter 14 of *The Elements of Statistical Learning* (Hastie, Tibshirani, Friedman) and the scikit-learn documentation [here](http://scikit-learn.org/stable/modules/decomposition.html) or [here](http://scikit-learn.org/stable/modules/manifold.html).

## Datasets origins
The **P53 dataset** is adapted from [UCI](https://archive.ics.uci.edu/ml/datasets/p53+Mutants), originally extracted from the following papers

Danziger, S.A., Baronio, R., Ho, L., Hall, L., Salmon, K., Hatfield, G.W., Kaiser, P., and Lathrop, R.H. (2009) Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning, PLOS Computational Biology, 5(9), e1000498 

Danziger, S.A., Zeng, J., Wang, Y., Brachmann, R.K. and Lathrop, R.H. (2007) Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants, Bioinformatics, 23(13), 104-114. 

Danziger, S.A., Swamidass, S.J., Zeng, J., Dearth, L.R., Lu, Q., Chen, J.H., Cheng, J., Hoang, V.P., Saigo, H., Luo, R., Baldi, P., Brachmann, R.K. and Lathrop, R.H. (2006) Functional census of mutation sequence spaces: the example of p53 cancer rescue mutants, IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 3, 114-125. 


The **Air quality dataset** is used as is from [UCI](https://archive.ics.uci.edu/ml/datasets/Air+Quality#), originally extracted from the following paper

S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005. 

## A few references to go further, but mostly practice ;)

https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.04-Feature-Engineering.ipynb 

https://github.com/dipanjanS/practical-machine-learning-with-python/blob/master/notebooks/Ch04_Feature_Engineering_and_Selection/Feature%20Engineering%20on%20Numeric%20Data.ipynb

https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

https://towardsdatascience.com/understanding-feature-engineering-part-4-deep-learning-methods-for-text-data-96c44370bbfa (series of 4 blog articles)