# Wide and Deep Neural Networks with Keras
## For general machine-learning problems

#### Summary

In their paper [Wide & Deep Learning for Recommender Systems](https://arxiv.org/pdf/1606.07792.pdf), researchers at Google outline a neural network architecture that combines the benefits of two different models. In the wide model, an extremely sparse matrix consisting primarily of cross-product transformations of features is linked directly to an output neuron. In the deep model, the inputs are passed through Embedding layers, followed by consecutive densely connected hidden layers, before finally arriving at the output. The combined Wide & Deep model retains the benefits of both (good memorization and good generalization respectively) by combining the two outputs and training both parts jointly, in this form:

![wad](wad.PNG)

This notebook demonstrates how to build a wide-and-deep model architecture in Keras and use it to make predictions on the [Adult Census](https://archive.ics.uci.edu/ml/datasets/adult) dataset from the UCI Machine Learning Repository. Note that this is not necessarily the best method, and certainly not the easiest. Google in fact provides a pre-made classifier as part of TensorFlow called the [DNNLinearCombinedClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedClassifier), which should be your first port of call if you want to use such a technique. My intent in doing this manually was simply to solidify my understanding of the method, and I hope that it can do the same for others too.

Let's delve straight in with the imports. I'm going to be using the functional API for Keras, so that's what I've imported. I'm also loading the dataset and then calling `.info()` and `.head()` to show what it looks like.

In [1]:
import pandas as pd
from keras.models import Model, Sequential
from keras.layers import Dense, Embedding, Input, Flatten, Reshape, Concatenate
from keras.utils import plot_model
import numpy as np
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


In [2]:
df = pd.read_csv('./presentation/adult.csv')

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
age               32561 non-null int64
workclass         32561 non-null object
fnlwgt            32561 non-null int64
education         32561 non-null object
education-num     32561 non-null int64
marital-status    32561 non-null object
occupation        32561 non-null object
relationship      32561 non-null object
race              32561 non-null object
sex               32561 non-null object
capital-gain      32561 non-null int64
capital-loss      32561 non-null int64
hours-per-week    32561 non-null int64
native-country    32561 non-null object
income            32561 non-null object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB


In [4]:
df.head()

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


The target is the 'income' column. We can see we have a number of continuous and categorical features, and we'll deal with these in different ways.

#### The Deep Model

This will comprise the continuous features in unaltered form, since they're already in an appropriate format. It will also include all of the categorical features, each of which will be passed through its own [Embedding](https://www.tensorflow.org/guide/embedding) layer. I won't go into Embedding too much (follow the link to learn more), but suffice it to say that it's essentially a means of encoding each categorical feature into $N$ new floating-point features. $N$ is a number chosen by the user, and the actual values that represent each category are learned as part of the training process, just like the weights of the Dense layers. In my case I'm copying Google's example and setting $N$ to 32. There are 8 categorical features, each embedded into 32 values, plus 6 continuous ones, which means the deep model will have $8 \times 32 + 6 = 262$ inputs in total.
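To make the idea of an embedding a little more concrete, here is a minimal NumPy sketch of the lookup that an Embedding layer performs. The category count of 9 and the random table are assumptions made purely for illustration; in the real model the table values are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 9   # hypothetical number of distinct values in one categorical column
N = 32             # embedding width, as chosen in the text

# An embedding is essentially a lookup table with one row per category.
# The rows are random here; in a real model they are learned during training.
embedding_table = rng.normal(size=(n_categories, N))

# Integer-encoded category values for a batch of four rows.
codes = np.array([0, 3, 3, 8])

# The forward pass is just row indexing: equal codes get identical vectors.
embedded = embedding_table[codes]
print(embedded.shape)  # (4, 32)

# Width of the deep input: 8 embedded categorical features plus 6 continuous ones.
deep_input_width = 8 * N + 6
print(deep_input_width)  # 262
```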

#### The Wide Model

This will comprise two things:

1. Each of the 8 categorical features, one-hot encoded.
2. A cross-product transformation of each pair of the features created in step 1.

The cross-product is where things become especially interesting. This effectively creates new binary features of the form `and(featureA, featureB)`, `and(featureA, featureC)`, `and(featureB, featureC)` and so on. Remember, though, that the features here are not the original "workclass" / "occupation" etc. but rather their one-hot encoded children, which means the features created will be of the form `and(workclass_State-gov, occupation_Adm-clerical)`, `and(workclass_Private, occupation_Handlers-cleaners)` and so on. As you can imagine, this makes for an extremely large and sparse matrix, some 10,500 new features in our case.
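As a small illustration of what a cross-product transformation produces, here is a sketch on a made-up three-row frame. The toy data and the helper names (`toy`, `groups`, `crosses`) are hypothetical, and this version crosses each pair of distinct columns exactly once, which is just one reasonable way to do it.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data with two categorical columns.
toy = pd.DataFrame({
    'workclass': ['Private', 'State-gov', 'Private'],
    'sex': ['Male', 'Female', 'Female'],
})

# One-hot encode; dtype=bool keeps the columns boolean so `&` is a logical AND.
one_hot = pd.get_dummies(toy, columns=list(toy.columns), dtype=bool)

# Map each original column to the one-hot columns derived from it.
groups = {c: [o for o in one_hot.columns if o.startswith(c + '_')] for c in toy.columns}

# Cross every one-hot child of one column with every one-hot child of the other.
crosses = {}
for col_a, col_b in combinations(toy.columns, 2):
    for a in groups[col_a]:
        for b in groups[col_b]:
            crosses[a + '_x_' + b] = one_hot[a] & one_hot[b]

crossed = pd.DataFrame(crosses)
print(crossed.astype(int))
```

Each row ends up with exactly one active cross feature per pair of original columns, which is why the resulting matrix is so sparse.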

We'll first define which columns fall into which bracket:

In [5]:
num_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
deep_cols = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
wide_cols = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']

And then define a function that one-hot encodes the wide columns and subsequently performs the cross-product transformations.

In [6]:
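def build_wide_features(df, wide_cols):
    # One-hot encode every wide column and append the dummies to the frame.
    for col in wide_cols:
        df = pd.concat([df, pd.get_dummies(df[col], prefix=col)], axis=1)

    # Cross every one-hot child of `col` with every one-hot child of `next_col`.
    # Note: `wide_cols` is a plain list, so `wide_cols==col` evaluates to False and
    # np.delete removes nothing. Each column is therefore also crossed with itself,
    # and every pair of columns is crossed in both orders.
    for col in wide_cols:
        for next_col in np.delete(wide_cols, np.argwhere(wide_cols==col)):
            for col_val in df[col].unique():
                for next_col_val in df[next_col].unique():
                    df[col + '_' + col_val + '_x_' + next_col + '_' + next_col_val] = df[col + '_' + col_val] & df[next_col + '_' + next_col_val]

    # Drop the original (un-encoded) categorical columns.
    df.drop(wide_cols, axis=1, inplace=True)

    return df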

wide = build_wide_features(
    df=df.drop(
        [
            'age',
            'fnlwgt',
            'education-num',
            'capital-gain',
            'capital-loss',
            'hours-per-week',
            'income'
        ], axis=1),
    wide_cols=wide_cols
)

wide.shape

10,506 features. Oof. Next step is to do some preprocessing on the dataframe. We will...

1. Label-Encode the categorical columns; a necessary precursor to the Embedding
2. Strip out the target feature
3. Concatenate on our 10,506 new wide features so everything's in one dataframe

In [8]:
wide_x_cols = wide.columns

for col in deep_cols:
    df[col] = df[col].astype('category').cat.codes

y = df.income.astype('category').cat.codes
df = pd.concat([df, wide], axis=1).drop('income', axis=1)
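To see what the label encoding actually produces, here is a toy example using the same `astype('category').cat.codes` idiom on a hypothetical four-value Series. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical categorical values.
s = pd.Series(['Private', 'State-gov', 'Private', 'Self-emp-inc'])

cat = s.astype('category')

# Categories are sorted, and each value is replaced by the position of its category.
mapping = dict(enumerate(cat.cat.categories))
codes = cat.cat.codes.tolist()

print(mapping)  # {0: 'Private', 1: 'Self-emp-inc', 2: 'State-gov'}
print(codes)    # [0, 2, 0, 1]
```

With no missing values, the codes run from 0 to one less than the number of categories, so they can be used directly as row indices into an embedding table.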

And now for the actual magic, the definition of the model:

In [41]:
# First, the easy-peasy Input for our continuous features. We have 6 numeric features,
# so we just instantiate an Input of shape 6.
continuous_inputs = Input(shape=(6,), name='continuous-inputs')

# Next, we create an Input - Embed - Flatten system for each of the  categorical deep columns, and then merge them together
deep_inputs = []
deep_outputs = []

for col in deep_cols:
    i = Input(shape=(1,), name=col+'-inputs')
    deep_inputs.append(i)
    e = Embedding(df[col].nunique()+1, 32, input_length=1, name=col+'-embedding')(i)
    f = Flatten(name=col+'-flatten')(e)
    deep_outputs.append(f)
    
deep_merge = Concatenate(name='deep-merge')([continuous_inputs] + deep_outputs)

# Then add the three deep layers

d1 = Dense(64, activation='relu', name='deep-1024')(deep_merge)
d2 = Dense(32, activation='tanh', name='deep-512')(d1)
d3 = Dense(16, activation='relu', name='deep-256')(d2)

# Next we add the wide bit: a simple input matching the 10,506 dimensions of that
# part of our dataset. It has no output neuron of its own; it is fed straight into
# the final merge below.

wide_input = Input(shape=(wide.shape[1],), name='wide-inputs')

# Merge the deep output with the wide input and pass both to a single sigmoid unit
wad_merge = Concatenate(name='wad-merge')([d3, wide_input])
wado = Dense(1, activation='sigmoid', name='wide-and-deep-out')(wad_merge)

# And finally, build and compile the model
model = Model(inputs=[continuous_inputs] + deep_inputs + [wide_input], outputs=wado)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
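It may not be obvious why concatenating the deep output with the wide input and feeding both into one sigmoid unit gives a "wide plus deep" model. The NumPy sketch below, using small made-up sizes and random weights (all hypothetical), shows that the single unit computes the same thing as summing a linear logit over the wide features and a linear logit over the last deep layer.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, deep_width, wide_width = 4, 16, 10   # small hypothetical sizes

deep_out = rng.normal(size=(batch, deep_width))                        # stands in for d3
wide_in = rng.integers(0, 2, size=(batch, wide_width)).astype(float)   # stands in for wide_input

# Weights and bias of the single output unit (random here, learned in practice).
w = rng.normal(size=(deep_width + wide_width,))
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# What a concatenation followed by a one-unit sigmoid layer computes.
merged = np.concatenate([deep_out, wide_in], axis=1)
p_concat = sigmoid(merged @ w + b)

# The same thing written as two separate logits that are summed.
w_deep, w_wide = w[:deep_width], w[deep_width:]
p_sum = sigmoid(deep_out @ w_deep + wide_in @ w_wide + b)

print(np.allclose(p_concat, p_sum))  # True
```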

We can print a summary to take a look at the structure we created:

In [42]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
workclass-inputs (InputLayer)   (None, 1)            0                                            
__________________________________________________________________________________________________
education-inputs (InputLayer)   (None, 1)            0                                            
__________________________________________________________________________________________________
marital-status-inputs (InputLay (None, 1)            0                                            
__________________________________________________________________________________________________
occupation-inputs (InputLayer)  (None, 1)            0                                            
__________________________________________________________________________________________________
relationsh

And yeah, that's looking good. It's also useful to take a look at a plot of the model, which we can obtain with `plot_model(model, to_file='model.png')`:

![plot](model.png)

And if we compare that to the diagram of the final model architecture in Google's paper, we can see that it's a close match:

![gplot](google_model.png)

In [43]:
model.fit([df[num_cols]] + [df[[i]] for i in deep_cols] + [df[wide_x_cols]], y, batch_size=128, epochs=5, validation_split=0.25)

Train on 24420 samples, validate on 8141 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x287335b8f60>