# Introduction
The wide & deep architecture has proven to be an effective deep learning application that combines memorization and generalization in areas such as search and recommendation. Google released its [wide&deep learning](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) in 2016.

* wide part: memorizes past behaviour for specific feature combinations
* deep part: embeds features into a low-dimensional space, which helps to discover new user and product combinations

Later, building on wide & deep learning, [deepfm](https://arxiv.org/abs/1703.04247) was developed. It combines a DNN model with factorization machines to further address the interactions among the features.

## wide & deep model
![wide&deep learning](https://1.bp.blogspot.com/-Dw1mB9am1l8/V3MgtOzp3uI/AAAAAAAABGs/mP-3nZQCjWwdk6qCa5WraSpK8A7rSPj3ACLcB/s640/image04.png)
## deepFM model
![deepfm learning](https://www.researchgate.net/profile/Huifeng_Guo/publication/318829508/figure/fig1/AS:522607722467328@1501610798143/Wide-deep-architecture-of-DeepFM-The-wide-and-deep-component-share-the-same-input-raw.png)

## Comparison
The wide part of wide & deep learning is a logistic regression, which requires a lot of manual feature engineering to generate the large-scale feature set it consumes. The deepfm model instead shares its embedding layers between the deep and fm parts, and the dot products between embedded features capture the feature interactions.
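
To make the contrast concrete, here is a minimal sketch of the two ways an interaction between a user and a movie could be scored. It is not taken from either model's reference code: the `cross_feature` helper, the toy IDs, the hand-set weight and the random embedding values are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide part: the interaction is engineered by hand as a cross feature,
# and every (uid, mid) pair needs its own weight.
def cross_feature(uid, mid):
    return f"uid={uid}&mid={mid}"

wide_weights = {cross_feature(1, 1193): 0.8}  # hypothetical learned weight

def wide_score(uid, mid):
    # Pairs that have no weight contribute 0.
    return wide_weights.get(cross_feature(uid, mid), 0.0)

# FM part: each ID gets a k-dimensional embedding, so any pair can be
# scored through a dot product, even a pair that was never observed.
k = 4
user_embed = {1: rng.normal(size=k), 2: rng.normal(size=k)}
movie_embed = {1193: rng.normal(size=k), 661: rng.normal(size=k)}

def fm_score(uid, mid):
    return float(np.dot(user_embed[uid], movie_embed[mid]))

print(wide_score(1, 1193), wide_score(2, 661))
print(fm_score(1, 1193), fm_score(2, 661))
```

In this toy setup the wide score for the unseen pair `(2, 661)` is zero, while the embedding dot product still produces a value for it.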

## deepFM model in detail
* 1st order factorization machines (summation of all embedding layers)
    + numeric features with shape (None, 1) => dense layer => map to shape (None, 1)
    + categorical features (single level) with shape (None, 1) => embedding layer (latent_dim = 1) => map to shape (None, 1)
    + categorical features (multi level) with shape (None, L) => embedding layer (latent_dim = 1) => map to shape (None, L)
    + the output is the summation of all embedded features, resulting in a tensor with shape (None, 1)
* 2nd order factorization machines (summation of dot products between embedding layers)
    + numeric features => dense layer => map to shape (None, 1, k)
    + categorical features (single level) => embedding layer (latent_dim = k) => map to shape (None, 1, k)
    + categorical features (multi level) with shape (None, L) => embedding layer (latent_dim = k) => map to shape (None, L, k)
    + the shared embedding layer is the concatenation of all embedded features
    + shared embedding layer => dot layer => 2nd order of fm part
* deep part (DNN model on the shared embedding layer)
    + shared embedding layer => dense layer => deep part
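
The shapes in the list above can be traced with a small NumPy sketch. It uses random arrays in place of trained layers, and the toy sizes (`batch`, `L`, `k`) are assumptions. The Keras code later in this document averages the genre vectors into a single (None, 1, k) tensor before concatenating, whereas this sketch keeps all L rows as the list describes.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, L, k = 2, 3, 4  # toy batch size, genre slots per movie, latent dim

# 1st order: every feature is mapped to scalar weights per sample.
num_1d = rng.normal(size=(batch, 1))    # stands in for the dense layer output
uid_1d = rng.normal(size=(batch, 1))    # embedding with latent_dim = 1
mid_1d = rng.normal(size=(batch, 1))
genre_1d = rng.normal(size=(batch, L))  # one weight per genre slot
y_fm_1d = num_1d + uid_1d + mid_1d + genre_1d.sum(axis=1, keepdims=True)

# 2nd order: every feature is mapped to k-dimensional vectors.
num_2d = rng.normal(size=(batch, 1, k))
uid_2d = rng.normal(size=(batch, 1, k))
mid_2d = rng.normal(size=(batch, 1, k))
genre_2d = rng.normal(size=(batch, L, k))
shared = np.concatenate([num_2d, uid_2d, mid_2d, genre_2d], axis=1)

# Sum of dot products over all distinct pairs of rows in `shared`.
n = shared.shape[1]
y_fm_2d = np.zeros((batch, 1))
for i in range(n):
    for j in range(i + 1, n):
        y_fm_2d[:, 0] += np.sum(shared[:, i] * shared[:, j], axis=1)

# Deep part: the shared embeddings are flattened before the dense layers.
deep_in = shared.reshape(batch, -1)

print(y_fm_1d.shape, shared.shape, y_fm_2d.shape, deep_in.shape)
```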
    

## preprocess data

The dataset used to implement deepfm is the MovieLens (ml-1m) data.  
To add more features to the ratings in `ratings.dat`, I joined the user features and movie features.
The features used are listed below:
* numeric feature: user_fea3
* categorical feature (single level): uid, mid
* categorical feature (multi level): movie_genre
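
The multi-level feature needs one extra step before it can be fed to an embedding layer: each `|`-separated genre string has to become a fixed-length list of integer IDs. The following pure-Python sketch shows the idea only. The `encode_genres` helper is hypothetical, it assigns IDs in first-seen order and truncates at the end, so its IDs will generally differ from those produced by the Keras `Tokenizer` used in the cell below.

```python
def encode_genres(genre_strings, max_len=3):
    """Map '|'-separated genre strings to fixed-length integer sequences."""
    vocab = {}  # genre -> integer ID; 0 is reserved for padding
    encoded = []
    for text in genre_strings:
        ids = []
        for genre in text.lower().split('|'):
            if genre not in vocab:
                vocab[genre] = len(vocab) + 1
            ids.append(vocab[genre])
        ids = ids[:max_len]                # truncate long lists
        ids += [0] * (max_len - len(ids))  # pad short lists with 0
        encoded.append(ids)
    return encoded, vocab

seqs, vocab = encode_genres(
    ["Animation|Children's|Comedy", "Comedy|Romance", "Comedy"]
)
print(seqs)  # [[1, 2, 3], [3, 4, 0], [3, 0, 0]]
```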

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def load_ratings():
    COL_NAME = ['uid','mid','rating','timestamp']
    df = pd.read_csv('./dataset/ml-1m/ratings.dat',sep='::', header=None, engine='python', names=COL_NAME)
    return df

def load_movies():
    COL_NAME = ['mid','movie_name','movie_genre']
    df = pd.read_csv('./dataset/ml-1m/movies.dat',sep='::', header=None, engine='python', names=COL_NAME)
    return df

def load_users():
    COL_NAME = ['uid','user_fea1','user_fea2','user_fea3','user_fea4']
    df = pd.read_csv('./dataset/ml-1m/users.dat',sep='::', header=None, engine='python', names=COL_NAME)
    return df

def text2seq(text, n_genre):
    """ Use a tokenizer to encode the multi-level categorical feature
    as zero-padded integer sequences of length 3.
    """
    tokenizer = Tokenizer(lower=True, split='|',filters='', num_words=n_genre)
    tokenizer.fit_on_texts(text)
    seq = tokenizer.texts_to_sequences(text)
    seq = pad_sequences(seq, maxlen=3,padding='post')
    return seq

n_genre = 15

ratings = load_ratings()
movies = load_movies()
users = load_users()

print("====== rating.dat ======")
print(ratings.head())
print("===== movies.dat ======")
print(movies.head())
print("====== users.dat ======")
print(users.head())

movies['movie_genre'] = text2seq(movies.movie_genre.values, n_genre=n_genre).tolist()

ratings = ratings.join(movies.set_index('mid'), on = 'mid', how = 'left')
ratings = ratings.join(users.set_index('uid'), on = 'uid', how = 'left')
print("====== preprocessed data =======")
(ratings.head())

   uid   mid  rating  timestamp
0    1  1193       5  978300760
1    1   661       3  978302109
2    1   914       3  978301968
3    1  3408       4  978300275
4    1  2355       5  978824291
   mid                          movie_name                   movie_genre
0    1                    Toy Story (1995)   Animation|Children's|Comedy
1    2                      Jumanji (1995)  Adventure|Children's|Fantasy
2    3             Grumpier Old Men (1995)                Comedy|Romance
3    4            Waiting to Exhale (1995)                  Comedy|Drama
4    5  Father of the Bride Part II (1995)                        Comedy
   uid user_fea1  user_fea2  user_fea3 user_fea4
0    1         F          1         10     48067
1    2         M         56         16     70072
2    3         M         25         15     55117
3    4         M         45          7     02460
4    5         M         25         20     55455

| | uid | mid | rating | timestamp | movie_name | movie_genre | user_fea1 | user_fea2 | user_fea3 | user_fea4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo's Nest (1975) | [1, 0, 0] | F | 1 | 10 | 48067 |
| 1 | 1 | 661 | 3 | 978302109 | James and the Giant Peach (1996) | [9, 13, 0] | F | 1 | 10 | 48067 |
| 2 | 1 | 914 | 3 | 978301968 | My Fair Lady (1964) | [13, 5, 0] | F | 1 | 10 | 48067 |
| 3 | 1 | 3408 | 4 | 978300275 | Erin Brockovich (2000) | [1, 0, 0] | F | 1 | 10 | 48067 |
| 4 | 1 | 2355 | 5 | 978824291 | Bug's Life, A (1998) | [9, 2, 0] | F | 1 | 10 | 48067 |


## Construct model

* define input layers

``` python
# numeric features (this input is fed with user_fea3 from the ratings data)
fea4_input = Input((1,), name = 'input_fea4')
num_inputs = [fea4_input]
# single level categorical features
uid_input = Input((1,), name = 'input_uid')
mid_input = Input((1,), name= 'input_mid')
cat_sl_inputs = [uid_input, mid_input]

# multi level categorical features (with 3 genres at most)
genre_input = Input((3,), name = 'input_genre')
cat_ml_inputs = [genre_input]

inputs = num_inputs + cat_sl_inputs + cat_ml_inputs
```

* 1st order factorization machines

```python
# single-level tensors are reshaped to (None, 1); multi-level tensors are flattened to (None, L)
num_dense_1d = [Dense(1, name = 'num_dense_1d_fea4')(fea4_input)]
cat_sl_embed_1d = [Embedding(n_uid + 1, 1, name = 'cat_embed_1d_uid')(uid_input),
                    Embedding(n_mid + 1, 1, name = 'cat_embed_1d_mid')(mid_input)]
cat_ml_embed_1d = [Embedding(n_genre + 1, 1, name = 'cat_embed_1d_genre')(genre_input)]

cat_sl_embed_1d = [Reshape((1,))(i) for i in cat_sl_embed_1d]
cat_ml_embed_1d = [Flatten()(i) for i in cat_ml_embed_1d]

# add all tensors
y_fm_1d = Add(name = 'fm_1d_output')(num_dense_1d + cat_sl_embed_1d + cat_ml_embed_1d)
```
![title](./image/fm_model_1d.png)

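* 2nd order factorization machines

The 2nd order fm term can be simplified. With $x_i$ denoting the k-dimensional embedded vector of feature $i$ and $x_{i,f}$ its $f$-th component:
\begin{equation*}
\sum_{i<j} \langle x_i, x_j \rangle = \frac{1}{2} \sum_{f=1}^{k} \left( \left(\sum_{i} x_{i,f}\right)^2 - \sum_{i} x_{i,f}^2 \right)
\end{equation*}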
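
The identity above can be checked numerically. The sketch below compares the explicit sum over all pairs with the "square of sum minus sum of squares" form on random vectors; the sizes `n` and `k` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 4
x = rng.normal(size=(n, k))  # n embedded feature vectors of dimension k

# Left-hand side: explicit sum of dot products over all pairs i < j.
pairwise = sum(
    np.dot(x[i], x[j]) for i in range(n) for j in range(i + 1, n)
)

# Right-hand side: square of sum minus sum of squares, summed over k.
square_of_sum = np.sum(x, axis=0) ** 2
sum_of_square = np.sum(x ** 2, axis=0)
simplified = 0.5 * np.sum(square_of_sum - sum_of_square)

print(np.isclose(pairwise, simplified))  # True
```

The simplified form avoids the double loop over feature pairs, which is why the Keras code below only needs sum, square and subtract operations.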

```python
# bring all tensors to shape (None, 1, k)
num_dense_2d = [Dense(k, name = 'num_dense_2d_fea4')(fea4_input)]
cat_sl_embed_2d = [Embedding(n_uid + 1, k, name = 'cat_embed_2d_uid')(uid_input), 
                   Embedding(n_mid + 1, k, name = 'cat_embed_2d_mid')(mid_input)]
cat_ml_embed_2d = [Embedding(n_genre + 1, k, name = 'cat_embed_2d_genre')(genre_input)]
cat_ml_embed_2d = [Lambda(lambda x: K.mean(x, axis=1), name = 'embed_2d_mean')(i) for i in cat_ml_embed_2d]

num_dense_2d = [Reshape((1,k))(i) for i in num_dense_2d]
cat_ml_embed_2d = [Reshape((1,k))(i) for i in cat_ml_embed_2d]

embed_2d = Concatenate(axis=1, name = 'concat_embed_2d')(num_dense_2d + cat_sl_embed_2d + cat_ml_embed_2d)

# calculate the interactions with the simplification
# sum over i<j of <xi, xj> = 0.5 * sum of [(sum of xi)^2 - sum of (xi^2)]
tensor_sum = Lambda(lambda x: K.sum(x, axis = 1), name = 'sum_of_tensors')
tensor_square = Lambda(lambda x: K.square(x), name = 'square_of_tensors')

sum_of_embed = tensor_sum(embed_2d)
square_of_embed = tensor_square(embed_2d)

square_of_sum = Multiply()([sum_of_embed, sum_of_embed])
sum_of_square = tensor_sum(square_of_embed)

sub = Subtract()([square_of_sum, sum_of_square])
sub = Lambda(lambda x: x*0.5)(sub)
y_fm_2d = Reshape((1,), name = 'fm_2d_output')(tensor_sum(sub))
```
![title](./image/fm_model_2d.png)

* deep part

```python
# flatten the shared embedding layer from a 3D to a 2D tensor
y_dnn = Flatten(name = 'flat_embed_2d')(embed_2d)
for h in dnn_dim:
    y_dnn = Dropout(dnn_dr)(y_dnn)
    y_dnn = Dense(h, activation='relu')(y_dnn)
y_dnn = Dense(1, activation='relu', name = 'deep_output')(y_dnn)

# combine the deep and fm parts
y = Concatenate()([y_fm_1d, y_fm_2d, y_dnn])
y = Dense(1, name = 'deepfm_output')(y)

fm_model_1d = Model(inputs, y_fm_1d)
fm_model_2d = Model(inputs, y_fm_2d)
deep_model = Model(inputs, y_dnn)
deep_fm_model = Model(inputs, y)
```
![title](./image/deep_model.png)

Putting all parts together:

![title](./image/deep_fm_model.png)

In [10]:
import tensorflow.keras.backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import *

def deep_fm_model(n_uid, n_mid, n_genre, k, dnn_dim, dnn_dr):
    # numeric features (this input is fed with user_fea3 from the ratings data)
    fea4_input = Input((1,), name = 'input_fea4')
    num_inputs = [fea4_input]
    # single level categorical features
    uid_input = Input((1,), name = 'input_uid')
    mid_input = Input((1,), name= 'input_mid')
    cat_sl_inputs = [uid_input, mid_input]

    # multi level categorical features (with 3 genres at most)
    genre_input = Input((3,), name = 'input_genre')
    cat_ml_inputs = [genre_input]

    inputs = num_inputs + cat_sl_inputs + cat_ml_inputs

    # first order fm
    # single-level tensors are reshaped to (None, 1); multi-level tensors are flattened to (None, L)
    num_dense_1d = [Dense(1, name = 'num_dense_1d_fea4')(fea4_input)]
    cat_sl_embed_1d = [Embedding(n_uid + 1, 1, name = 'cat_embed_1d_uid')(uid_input),
                        Embedding(n_mid + 1, 1, name = 'cat_embed_1d_mid')(mid_input)]
    cat_ml_embed_1d = [Embedding(n_genre + 1, 1, name = 'cat_embed_1d_genre')(genre_input)]

    cat_sl_embed_1d = [Reshape((1,))(i) for i in cat_sl_embed_1d]
    cat_ml_embed_1d = [Flatten()(i) for i in cat_ml_embed_1d]

    # add all tensors
    y_fm_1d = Add(name = 'fm_1d_output')(num_dense_1d + cat_sl_embed_1d + cat_ml_embed_1d)


    # second order fm
    # bring all tensors to shape (None, 1, k)
    num_dense_2d = [Dense(k, name = 'num_dense_2d_fea4')(fea4_input)]
    cat_sl_embed_2d = [Embedding(n_uid + 1, k, name = 'cat_embed_2d_uid')(uid_input), 
                       Embedding(n_mid + 1, k, name = 'cat_embed_2d_mid')(mid_input)]
    cat_ml_embed_2d = [Embedding(n_genre + 1, k, name = 'cat_embed_2d_genre')(genre_input)]
    cat_ml_embed_2d = [Lambda(lambda x: K.mean(x, axis=1), name = 'embed_2d_mean')(i) for i in cat_ml_embed_2d]

    num_dense_2d = [Reshape((1,k))(i) for i in num_dense_2d]
    cat_ml_embed_2d = [Reshape((1,k))(i) for i in cat_ml_embed_2d]

    embed_2d = Concatenate(axis=1, name = 'concat_embed_2d')(num_dense_2d + cat_sl_embed_2d + cat_ml_embed_2d)

    # calculate the interactions with the simplification
    # sum over i<j of <xi, xj> = 0.5 * sum of [(sum of xi)^2 - sum of (xi^2)]
    tensor_sum = Lambda(lambda x: K.sum(x, axis = 1), name = 'sum_of_tensors')
    tensor_square = Lambda(lambda x: K.square(x), name = 'square_of_tensors')

    sum_of_embed = tensor_sum(embed_2d)
    square_of_embed = tensor_square(embed_2d)

    square_of_sum = Multiply()([sum_of_embed, sum_of_embed])
    sum_of_square = tensor_sum(square_of_embed)

    sub = Subtract()([square_of_sum, sum_of_square])
    sub = Lambda(lambda x: x*0.5)(sub)
    y_fm_2d = Reshape((1,), name = 'fm_2d_output')(tensor_sum(sub))


    # dnn part
    y_dnn = Flatten(name = 'flat_embed_2d')(embed_2d)
    for h in dnn_dim:
        y_dnn = Dropout(dnn_dr)(y_dnn)
        y_dnn = Dense(h, activation='relu')(y_dnn)
    y_dnn = Dense(1, activation='relu', name = 'deep_output')(y_dnn)

    # combine the deep and fm parts
    y = Concatenate()([y_fm_1d, y_fm_2d, y_dnn])
    y = Dense(1, name = 'deepfm_output')(y)

    fm_model_1d = Model(inputs, y_fm_1d)
    fm_model_2d = Model(inputs, y_fm_2d)
    deep_model = Model(inputs, y_dnn)
    deep_fm_model = Model(inputs, y)
    
    return fm_model_1d, fm_model_2d, deep_model, deep_fm_model

In [11]:
params = {
    'n_uid': ratings.uid.max(),
    'n_mid': ratings.mid.max(),
    'n_genre': 14,
    'k':20,
    'dnn_dim':[64,64],
    'dnn_dr': 0.5
}

fm_model_1d, fm_model_2d, deep_model, deep_fm_model = deep_fm_model(**params)

## Split Data

In [12]:
def df2xy(ratings):
    x = [ratings.user_fea3.values, 
         ratings.uid.values, 
         ratings.mid.values, 
         np.concatenate(ratings.movie_genre.values).reshape(-1,3)]
    y = ratings.rating.values
    return x,y

in_train_flag = np.random.random(len(ratings)) <= 0.9
train_data = ratings.loc[in_train_flag,]
valid_data = ratings.loc[~in_train_flag,]
train_x, train_y = df2xy(train_data)
valid_x, valid_y = df2xy(valid_data)
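
The split above draws a fresh random mask on every run. If a repeatable split is wanted, one option is to seed the generator first. This is a minimal sketch on a toy index range; the seed value and `n_rows` are assumptions standing in for `len(ratings)`.

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed (assumed) for a repeatable split
n_rows = 1000                    # toy row count standing in for len(ratings)

# Boolean mask: roughly 90% of rows go to training, the rest to validation.
in_train = rng.random_sample(n_rows) <= 0.9
row_ids = np.arange(n_rows)
train_ids, valid_ids = row_ids[in_train], row_ids[~in_train]

print(len(train_ids), len(valid_ids))
```

Because each row is assigned independently, the two parts are disjoint and cover every row, but the 90/10 proportion is only approximate.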

## Train Model

In [13]:
from tensorflow.keras.callbacks import TensorBoard, EarlyStopping, ModelCheckpoint
# train the model
deep_fm_model.compile(loss = 'MSE', optimizer='adam')
early_stop = EarlyStopping(monitor='val_loss', patience=3)
callbacks = [early_stop]
deep_fm_model.fit(train_x, train_y, epochs=30, batch_size=2048, validation_split=0.1, callbacks = callbacks)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 810082 samples, validate on 90010 samples
Epoch 1/1


<tensorflow.python.keras.callbacks.History at 0x7f9eeaa83978>
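
The `fit` call above validates on a slice of the training data through `validation_split`, so the held-out `valid_x` and `valid_y` from the split step are not used. One way to score them afterwards is a root mean squared error on the model's predictions. The `rmse` helper below is a hypothetical sketch run on toy numbers, not on the notebook's actual predictions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of ratings."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values; in the notebook these would be valid_y and
# deep_fm_model.predict(valid_x), which are not reproduced here.
print(rmse([5, 3, 4], [4.5, 3.5, 4.0]))
```

The `ravel()` calls matter because Keras predictions come back with shape (n, 1), which would otherwise broadcast against a 1-D target array.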

## Model Architecture

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(fm_model_1d, to_file='./image/fm_model_1d.png',show_shapes=True, show_layer_names=True)
plot_model(fm_model_2d, to_file='./image/fm_model_2d.png',show_shapes=True, show_layer_names=True)
plot_model(deep_model, to_file='./image/deep_model.png',show_shapes=True, show_layer_names=True)
plot_model(deep_fm_model, to_file='./image/deep_fm_model.png',show_shapes=True, show_layer_names=True)