# xDeepFM

## Introduction

The name xDeepFM suggests a model built from three parts: 
**x**, **deep**, and **fm**. 
In the context of deep neural networks, 
**deep** stands for **multilayer perceptron** and 
**fm** refers to **factorization machine**. 
**x** stands for extreme. 

The name suggests an enhanced version of DeepFM, 
but in fact its closest relative is the Deep & Cross Network (DCN), followed by DeepFM.

The structure of DeepFM is very intuitive. 
One of the main take-aways is that **FM** and **Deep** share the same input Embeddings.
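
As a rough illustration of what sharing the input embeddings means, the sketch below feeds one embedding matrix to both an FM-style second-order term and a single MLP layer. The embedding values, the layer width of 3, and the random placeholder weights are assumptions made for this example only; they are not taken from DeepFM itself.

```python
import numpy as np

# Hypothetical embeddings for two fields, one row per field (illustrative values).
embeddings = np.array([[1.0, 2.0, 3.0, 4.0],
                       [5.0, 6.0, 7.0, 8.0]])  # shape (m=2, D=4)

# FM part: sum of pairwise inner products between field embeddings,
# computed with the 0.5 * ((sum)^2 - sum of squares) identity.
square_of_sum = embeddings.sum(axis=0) ** 2
sum_of_squares = (embeddings ** 2).sum(axis=0)
fm_second_order = 0.5 * float((square_of_sum - sum_of_squares).sum())

# Deep part: the very same embeddings, flattened, feed the MLP.
rng = np.random.default_rng(0)
deep_input = embeddings.reshape(-1)                          # shape (8,)
hidden_weights = rng.normal(size=(deep_input.size, 3))       # placeholder weights
deep_hidden = np.maximum(deep_input @ hidden_weights, 0.0)   # one ReLU layer

print(fm_second_order)    # 70.0, the inner product of the two rows
print(deep_hidden.shape)  # (3,)
```

Both branches read from `embeddings`, so in training a gradient from either branch would update the same embedding table.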

## Motivations

So what problem does xDeepFM mainly solve? 
The primary role of DCN's cross layer is to construct higher-order features automatically, but those interactions happen in a bit-wise manner. 
For example, assume there are two fields, Age and Occupation. 
Let the embedding vector of the Age field be <a1, b1, c1>, 
and the embedding vector of the Occupation field be <a2, b2, c2>. 

In DCN, the cross layer takes <a1,b1,c1,a2,b2,c2> directly as input, 
which is simply the concatenation of all bits in the embedding layer. 
The information about which field each bit belongs to is ignored entirely. 

The cross layer uses the "bit" as the finest granularity for learning, 
while FM uses the "vector" as the finest unit for learning feature-to-feature interactions, i.e., it is vector-wise. 
xDeepFM is motivated by the question of how to bring FM-style vector-wise learning into the cross layer.
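
To make the two granularities concrete, the sketch below contrasts them in NumPy. The embedding values are made up, and the DCN cross weight `w` is set to all ones purely as a placeholder (in a real DCN it is learned and a bias term is added), so treat this as an illustration rather than a faithful DCN layer.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings for the two fields (illustrative values).
age = np.array([1.0, 2.0, 3.0])         # <a1, b1, c1>
occupation = np.array([4.0, 5.0, 6.0])  # <a2, b2, c2>

# Bit-wise (DCN style): the fields are concatenated into one flat vector and
# the cross layer computes x0 * (x0 . w) + x0. Every bit is scaled by a scalar
# that mixes all bits, so field boundaries play no role.
x0 = np.concatenate([age, occupation])  # shape (6,)
w = np.ones_like(x0)                    # placeholder for a learned weight
bitwise_cross = x0 * (x0 @ w) + x0      # shape (6,)

# Vector-wise (FM / CIN style): whole embedding vectors interact.
fm_interaction = float(age @ occupation)  # FM: inner product, a scalar
cin_interaction = age * occupation        # CIN: Hadamard product, shape (3,)

print(bitwise_cross)    # [ 22.  44.  66.  88. 110. 132.]
print(fm_interaction)   # 32.0
print(cin_interaction)  # [ 4. 10. 18.]
```

In the bit-wise case, a1 interacts with b1 and c1 from its own field exactly as it does with bits of the other field. In the vector-wise case, only Age-with-Occupation products appear.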

## Models
We design a new cross network, named Compressed Interaction
Network (CIN), with the following considerations: (1) interactions
are applied at vector-wise level, not at bit-wise level; (2) high-order
feature interactions are measured explicitly; (3) the complexity of
network will not grow exponentially with the degree of interactions.
Since an embedding vector is regarded as a unit for vector-wise
interactions, hereafter we formulate the output of field embedding
as a matrix $X^0 \in \mathbb{R}^{m \times D}$, where the $i$-th row in $X^0$ is the embedding
vector of the $i$-th field: $X^0_{i,*} = e_i$, and $D$ is the dimension of the field
embedding. The output of the $k$-th layer in CIN is also a matrix
$X^k \in \mathbb{R}^{H_k \times D}$, where $H_k$ denotes the number of (embedding) feature
vectors in the $k$-th layer and we let $H_0 = m$. For each layer, $X^k$ are

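
In the xDeepFM paper, each CIN layer is computed from the previous layer's output and the input embedding matrix. In the notation used here, the update rule can be written as follows, where $W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ is the parameter matrix of the $h$-th feature vector and $\circ$ denotes the Hadamard (element-wise) product:

$$
X^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( X^{k-1}_{i,*} \circ X^0_{j,*} \right), \qquad 1 \le h \le H_k
$$

Each row of $X^k$ is therefore a weighted sum of all pairwise Hadamard products between rows of $X^{k-1}$ and rows of $X^0$, which keeps the interaction vector-wise and raises its order by one per layer.
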
Network Architecture: 
![xDeepFM](images/xDeepFM_arch.jpg)
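
A single CIN layer takes the Hadamard product of every row of the previous layer's output with every row of the input embedding matrix, then compresses those products into a smaller set of vectors with learned weights. The NumPy sketch below shows that computation under stated assumptions: the function name `cin_layer`, the random weights, and the sizes `m`, `D`, `H1` are all illustrative and not taken from any particular implementation.

```python
import numpy as np

def cin_layer(x_prev, x0, weights):
    """One CIN layer (sketch).

    x_prev:  (H_prev, D) output of the previous layer
    x0:      (m, D) field embedding matrix
    weights: (H_next, H_prev, m) one weight matrix per output vector
    returns: (H_next, D)
    """
    # Hadamard product of every row pair: z[i, j, :] = x_prev[i] * x0[j]
    z = np.einsum("id,jd->ijd", x_prev, x0)
    # Compress the H_prev * m interaction vectors into H_next vectors.
    return np.einsum("hij,ijd->hd", weights, z)

rng = np.random.default_rng(0)
m, D, H1 = 2, 4, 3                    # fields, embedding size, layer width
x0 = rng.normal(size=(m, D))          # stand-in for real field embeddings
w1 = rng.normal(size=(H1, m, m))      # stand-in for learned weights
x1 = cin_layer(x0, x0, w1)            # first layer: previous output is x0
print(x1.shape)  # (3, 4)
```

The embedding dimension `D` is never mixed across positions, which is what makes the interaction vector-wise.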


In [3]:
import tensorflow as tf
import pandas as pd
import numpy as np
from recman.layers import CIN

In [None]:
tf.one_hot

In [11]:
cin_cross_layer_units = (16, 16)
embedding_size = 4
final_results = []

field_age = tf.constant([[1, 2, 3, 4]], dtype=tf.float32)
field_occ = tf.constant([[5, 6, 7, 8]], dtype=tf.float32)
field_size_0 = 2

field_input = tf.concat([field_age, field_occ], axis=1)
field_input = tf.reshape(field_input, shape=(1, 2, 4))  # 1 * 2 * 4

split_tensor_0 = tf.split(field_input, embedding_size * [1], axis=2)  # 4 * (1 * 2 * 1)
split_tensor = tf.split(field_input, embedding_size * [1], axis=2)  # 4 * (1 * 2 * 1)

dot_result_m_0 = tf.matmul(
    split_tensor_0, split_tensor, transpose_b=True
)  # 4 * 1 * 2 * 2
dot_result_o_0 = tf.reshape(
    dot_result_m_0, shape=[embedding_size, -1, field_size_0 * field_size_0]
)  # 4 * 1 * 4
dot_result_0 = tf.transpose(dot_result_o_0, perm=[1, 0, 2])  # 1 * 4 * 4

filter_0 = tf.ones(
    shape=(1, field_size_0 * field_size_0, cin_cross_layer_units[0])
)  # 1 * 4 * 16; the last dim is the cross-layer size
feat_map_0 = tf.nn.conv1d(
    dot_result_0, filters=filter_0, stride=1, padding="VALID"
)  # 1 * 4 * 16
# swap the last two axes: 1 * 16 * 4
feat_map_transpose_0 = tf.transpose(feat_map_0, perm=[0, 2, 1])
# split the 16 feature maps into two halves of 8:
# one half feeds the next layer, the other goes straight to the output
next_hidden_0, direct_connect_0 = tf.split(
    feat_map_transpose_0, 2 * [cin_cross_layer_units[0] // 2], 1
)
field_size_1 = cin_cross_layer_units[0] // 2
final_results.append(direct_connect_0)

split_tensor_1 = tf.split(next_hidden_0, embedding_size * [1], axis=2)  # 4 * (1 * 8 * 1)
dot_result_m_1 = tf.matmul(split_tensor_0, split_tensor_1, transpose_b=True)  # 4 * 1 * 2 * 8
dot_result_o_1 = tf.reshape(dot_result_m_1, shape=[embedding_size, -1, field_size_0 * field_size_1])  # 4 * 1 * 16
dot_result_1 = tf.transpose(dot_result_o_1, perm=[1, 0, 2])  # 1 * 4 * 16

filter_1 = tf.ones(
    shape=(1, field_size_1 * field_size_0, cin_cross_layer_units[1])
)  # 1 * 16 * 16
feat_map_1 = tf.nn.conv1d(dot_result_1, filters=filter_1, stride=1, padding="VALID")  # 1 * 4 * 16

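# last layer: all feature maps go to the output, 1 * 16 * 4
direct_connect_1 = tf.transpose(feat_map_1, perm=[0, 2, 1])
final_results.append(direct_connect_1)

result = tf.concat(final_results, axis=1)  # 1 * 24 * 4
# sum pooling over the embedding dimension: 1 * 24
result = tf.reduce_sum(result, axis=-1, keepdims=False)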
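
The TensorFlow cell above can be cross-checked without a session. The sketch below redoes the same two layers in NumPy for the same inputs and the same all-ones filters. The helper name `cin_layer_all_ones` is made up for this example, and the batch dimension is dropped, so the result here has shape (24,) where the TensorFlow `result` has shape 1 * 24.

```python
import numpy as np

x0 = np.array([[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0]])  # (fields=2, embedding_size=4)

def cin_layer_all_ones(x_prev, x0, n_maps):
    # Pairwise products per embedding dimension: z[i, j, d] = x_prev[i, d] * x0[j, d]
    z = np.einsum("id,jd->ijd", x_prev, x0)
    # An all-ones filter sums over every (i, j) pair, so all maps are identical.
    return np.tile(z.sum(axis=(0, 1)), (n_maps, 1))  # (n_maps, D)

feat_map_0 = cin_layer_all_ones(x0, x0, 16)                    # (16, 4)
next_hidden_0, direct_connect_0 = feat_map_0[:8], feat_map_0[8:]
direct_connect_1 = cin_layer_all_ones(next_hidden_0, x0, 16)   # (16, 4)

# Concatenate the output feature maps and sum-pool over the embedding dimension.
result = np.concatenate([direct_connect_0, direct_connect_1]).sum(axis=-1)
print(result.shape)  # (24,)
print(result[:8])    # eight copies of 344.0
print(result[8:])    # sixteen copies of 27648.0
```

With all-ones filters every feature map in a layer is the same vector: [36, 64, 100, 144] in the first layer and [1728, 4096, 8000, 13824] in the second. That makes the pooled values easy to verify by hand.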

In [10]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    r_field_input = sess.run(field_input)
    r_split_tensor_0 = sess.run(split_tensor_0)  # list of 4 arrays, each 1 * 2 * 1
    r_dot_result_m_0 = sess.run(dot_result_m_0)
    r_dot_result_o_0 = sess.run(dot_result_o_0)
    r_dot_result_0 = sess.run(dot_result_0)
    r_feat_map_0 = sess.run(feat_map_0)
    r_feat_map_transpose_0 = sess.run(feat_map_transpose_0)
    r_next_hidden_0 = sess.run(next_hidden_0)
    r_split_tensor_1 = sess.run(split_tensor_1)
    r_dot_result_m_1 = sess.run(dot_result_m_1)
    r_dot_result_o_1 = sess.run(dot_result_o_1)
    r_dot_result_1 = sess.run(dot_result_1)
    r_feat_map_1 = sess.run(feat_map_1)
    r_direct_connect_1 = sess.run(direct_connect_1)
    r_result = sess.run(result)  # 1 * 24

In [None]:
print([[[1, 2, 3, 4]] * 4])




