# DeepFM

- Paper
  - DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
- TensorFlow implementation ([link](https://github.com/shenweichen/DeepCTR))
- PyTorch implementation ([link](https://github.com/shenweichen/DeepCTR-Torch))
- [torchfm link](https://pypi.org/project/torchfm/): lets you try various models, including Factorization Machines

## DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

![deep47](./image/deep47.PNG)

* **Combines deep learning with Factorization Machines.**
    - How does it differ from **Matrix Factorization**?
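
One way to see the relationship with Matrix Factorization: a known property of FMs is that when the input holds only a one-hot user field and a one-hot item field, the FM equation reduces to biased MF, `w0 + w_u + w_i + <v_u, v_i>`. The NumPy sketch below checks this numerically with random parameters; the toy sizes and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3          # toy sizes (assumption)
n = n_users + n_items                  # one-hot user block followed by one-hot item block

w0 = 0.5                               # global bias
w = rng.normal(size=n)                 # first-order weights
V = rng.normal(size=(n, k))            # one latent vector per feature


def fm(x):
    """Plain second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    pairwise = sum(V[i] @ V[j] * x[i] * x[j]
                   for i in range(n) for j in range(i + 1, n))
    return w0 + w @ x + pairwise


def biased_mf(u, i):
    """Biased MF with the same parameters: bias + user bias + item bias + <user vec, item vec>."""
    return w0 + w[u] + w[n_users + i] + V[u] @ V[n_users + i]


u, i = 2, 3
x = np.zeros(n)
x[u] = 1.0                # user u
x[n_users + i] = 1.0      # item i
print(np.isclose(fm(x), biased_mf(u, i)))  # True
```

FM generalizes this by letting any number of additional fields join the same pairwise term, and DeepFM then adds a deep network on top of the same embeddings.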

## Abstract

![deep48](./image/deep48.PNG)

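* **A higher CTR means the customer is more interested, so recommend items with a high CTR!**
    - More precisely, the paper works with clicks, and **a click does not necessarily mean the user clicked because they liked the item!**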

* It can learn both low-order and high-order feature interactions.
    - Wide & Deep can also do this: its Wide part captures low-order interactions and its Deep part captures high-order interactions.
    <br><br>
    - So what is **the difference between Wide & Deep and DeepFM**?
        + DeepFM uses raw features as-is, without extra feature engineering.
            - **Wide & Deep**: cross features must be built by hand, which requires human domain knowledge.
                + The Wide (low-order interaction) part needs feature engineering.
            - **DeepFM**
                + The FM (low-order interaction) part needs no feature engineering.
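
A rough sketch of why the FM part can skip hand-made cross features, assuming two categorical fields (user, movie) with the sizes that appear in the KMRD example later (466 and 532) and an illustrative embedding size `k = 16`. A Wide-style cross feature `user AND movie` needs its own weight for every pair, and a pair that never occurs in training receives no gradient; FM scores a pair with the inner product of per-value latent vectors, so every pair gets a weight built from vectors learned on other pairs. The helper names are hypothetical.

```python
import numpy as np

n_users, n_movies, k = 466, 532, 16   # field sizes from the KMRD example; k is an assumption

# Wide & Deep style: one weight per (user, movie) cross-feature value
cross_weights = np.zeros(n_users * n_movies)


def cross_weight(user, movie):
    """Hypothetical helper: weight of the hand-crafted cross feature for this pair."""
    return cross_weights[user * n_movies + movie]


# FM style: one latent vector per feature value; the pair weight is their inner product
rng = np.random.default_rng(0)
V_user = rng.normal(scale=0.1, size=(n_users, k))
V_movie = rng.normal(scale=0.1, size=(n_movies, k))


def fm_pair_weight(user, movie):
    """Hypothetical helper: FM interaction weight <v_user, v_movie>."""
    return float(V_user[user] @ V_movie[movie])


print(cross_weights.size)            # 247912 weights for the cross feature
print(V_user.size + V_movie.size)    # 15968 weights for the FM latent vectors
```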
    

## Introduction (1)

![deep49](./image/deep49.PNG)

## Introduction (2): Summary of Recommendation Research So Far

![deep50](./image/deep50.PNG)

* **Generalized Linear Models**
    - e.g., **Matrix Factorization**: **hard to capture high-order feature interactions.**
  

* **Motivation: build a model that needs no feature engineering and can learn both low- and high-order interactions!**

## Contributions

![deep51](./image/deep51.PNG)

* Learns both low-order and high-order interactions
* Efficient training!

## Our Approach

![deep52](./image/deep52.PNG)

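* The FM and Deep components share the same input (sparse features) and the same dense embeddings.
* The FM layer and the hidden layers are trained jointly, and their outputs are combined into the final output.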
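
Written as formulas (following the paper's notation, where $x$ is the $d$-dimensional sparse input, $w$ the first-order weights, and $V_j$ the latent vector of feature $j$), both components read the shared embeddings and their outputs are summed before the sigmoid; $y_{DNN}$ is the output of the feed-forward network over the concatenated field embeddings:

$$
\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})
$$

$$
y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} x_{j_2}
$$

In the torchfm code in the practice section, `self.linear(x)` plays the role of $\langle w, x \rangle$ (with an added bias), `self.fm(embed_x)` the double sum, and `self.mlp(...)` $y_{DNN}$.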

## DeepFM

![deep53](./image/deep53.PNG)

## FM Component의 핵심

![deep54](./image/deep54.PNG)

## Deep Component

![deep55](./image/deep55.PNG)

## Relationship with the other Neural Networks

![deep56](./image/deep56.PNG)

## Experiments

![deep57](./image/deep57.PNG)

## Efficiency Comparison

![deep58](./image/deep58.PNG)

## Effectiveness Comparison

![deep59](./image/deep59.PNG)

## Hyper-Parameter Study - (1) (worth revisiting when working on a project)

![deep60](./image/deep60.PNG)

## Hyper-Parameter Study - (2)  

![deep61](./image/deep61.PNG)

## Conclusions

![deep62](./image/deep62.PNG)

# [Practice] DeepFM


## torchfm

- Install it directly with `pip install torchfm`.
- The modules needed to implement DeepFM are summarized below.

```python
# !pip install torchfm
```

```python
import numpy as np
import torch
import torch.nn.functional as F
```



```python
class FeaturesLinear(torch.nn.Module):
    """First-order term: one scalar weight per feature value plus a global bias."""

    def __init__(self, field_dims, output_dim=1):
        super().__init__()
        self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
        # Start index of each field inside the shared table.
        # np.int64 instead of np.long, which fails on NumPy 1.24-1.26.
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        # Shift each field's index into its own block of the shared table
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return torch.sum(self.fc(x), dim=1) + self.bias


class FeaturesEmbedding(torch.nn.Module):
    """Dense embedding per feature value, shared by the FM and the deep component."""

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = torch.nn.Embedding(sum(field_dims), embed_dim)
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        torch.nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        :return: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)


class FactorizationMachine(torch.nn.Module):
    """Second-order FM interaction: sum of <v_i, v_j> over all field pairs i < j."""

    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        # 0.5 * ((sum of vectors)^2 - sum of squared vectors) equals the pairwise sum,
        # computed in O(num_fields * embed_dim) instead of looping over all pairs
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix


class MultiLayerPerceptron(torch.nn.Module):
    """Deep component: [Linear -> BatchNorm -> ReLU -> Dropout] per hidden layer, then a Linear to 1 output."""

    def __init__(self, input_dim, embed_dims, dropout, output_layer=True):
        super().__init__()
        layers = list()
        for embed_dim in embed_dims:
            layers.append(torch.nn.Linear(input_dim, embed_dim))
            layers.append(torch.nn.BatchNorm1d(embed_dim))
            layers.append(torch.nn.ReLU())
            layers.append(torch.nn.Dropout(p=dropout))
            input_dim = embed_dim
        if output_layer:
            layers.append(torch.nn.Linear(input_dim, 1))
        self.mlp = torch.nn.Sequential(*layers)

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, embed_dim)``
        """
        return self.mlp(x)
```
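
`FactorizationMachine.forward` avoids a double loop over field pairs by using the identity `sum_{i<j} <v_i, v_j> = 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]`, which costs O(k·n) instead of O(k·n²) for n fields and embedding size k. The NumPy check below compares both forms on random embeddings shaped like the module's input `(batch_size, num_fields, embed_dim)`; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
batch_size, num_fields, embed_dim = 3, 5, 4
x = rng.normal(size=(batch_size, num_fields, embed_dim))   # one embedding per field


def pairwise_naive(x):
    """Sum of <v_i, v_j> over all field pairs i < j, using an explicit double loop."""
    out = np.zeros(x.shape[0])
    for i in range(x.shape[1]):
        for j in range(i + 1, x.shape[1]):
            out += np.sum(x[:, i] * x[:, j], axis=1)
    return out


def pairwise_trick(x):
    """Same quantity via 0.5 * (square of sum - sum of squares), as in FactorizationMachine."""
    square_of_sum = np.sum(x, axis=1) ** 2
    sum_of_square = np.sum(x ** 2, axis=1)
    return 0.5 * np.sum(square_of_sum - sum_of_square, axis=1)


print(np.allclose(pairwise_naive(x), pairwise_trick(x)))  # True
```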


```python
class DeepFactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of DeepFM.

    Reference:
        H Guo, et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017.
    """

    def __init__(self, field_dims, embed_dim, mlp_dims, dropout):
        super().__init__()
        self.linear = FeaturesLinear(field_dims)                   # first-order term
        self.fm = FactorizationMachine(reduce_sum=True)            # second-order term
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)  # shared embeddings
        self.embed_output_dim = len(field_dims) * embed_dim
        self.mlp = MultiLayerPerceptron(self.embed_output_dim, mlp_dims, dropout)  # deep component

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        embed_x = self.embedding(x)  # (batch_size, num_fields, embed_dim), fed to both FM and MLP
        x = self.linear(x) + self.fm(embed_x) + self.mlp(embed_x.view(-1, self.embed_output_dim))
        return torch.sigmoid(x.squeeze(1))  # click probability per sample
```

## Load dataset and Train model

```python
from google.colab import drive
drive.mount('/content/drive')
data_path = '/content/drive/MyDrive/data/kmrd/kmr_dataset/datafile/kmrd-small'
```

```
Mounted at /content/drive
```


```python
import os

import numpy as np
import pandas as pd
import torch.utils.data


class KMRDDataset(torch.utils.data.Dataset):
    def __init__(self, data_path):
        # Use only the first 10,000 ratings to keep the example fast
        data = pd.read_csv(os.path.join(data_path, 'rates.csv'))[:10000]

        # Re-map raw user/movie IDs to contiguous 0-based indices
        user_to_index = {original: idx for idx, original in enumerate(data.user.unique())}
        movie_to_index = {original: idx for idx, original in enumerate(data.movie.unique())}
        data['user'] = data['user'].apply(lambda x: user_to_index[x])
        data['movie'] = data['movie'].apply(lambda x: movie_to_index[x])
        # [user, movie, rate] -> (user, movie, rate)
        data = data.to_numpy()[:, :3]

        # Already 0-based after the re-mapping above (np.int was removed in NumPy 1.24)
        self.items = data[:, :2].astype(np.int64)
        self.targets = self.__preprocess_target(data[:, 2]).astype(np.float32)
        self.field_dims = np.max(self.items, axis=0) + 1  # number of distinct values per field
        self.user_field_idx = np.array((0,), dtype=np.int64)
        self.item_field_idx = np.array((1,), dtype=np.int64)

    def __len__(self):
        return self.targets.shape[0]

    def __getitem__(self, index):
        return self.items[index], self.targets[index]

    def __preprocess_target(self, target):
        # Binarize ratings in place: > 9 -> 1 (positive), <= 9 -> 0
        target[target <= 9] = 0
        target[target > 9] = 1
        return target
```
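
`__preprocess_target` turns each rating into a click-like binary label: only ratings above 9 become positives. A standalone check of the same threshold on a few made-up ratings (the values and the name `binarize` are illustrative only):

```python
import numpy as np


def binarize(target):
    """Same rule as KMRDDataset.__preprocess_target, written with np.where (no in-place mutation)."""
    return np.where(target > 9, 1, 0).astype(np.float32)


ratings = np.array([1, 5, 9, 10, 10])
labels = binarize(ratings)
print(labels)          # [0. 0. 0. 1. 1.]
print(labels.mean())   # 0.4
```

Because the original mutates `target` in place, it also overwrites the rating column of the NumPy array it was sliced from; the `np.where` form avoids that side effect.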

```python
import pandas as pd
import os
dataset = KMRDDataset(data_path=data_path)
```

```python
print(dataset.item_field_idx)
print(dataset.field_dims)
print(sum(dataset.field_dims))
print(torch.nn.Embedding(sum(dataset.field_dims), 16))
print(torch.nn.Parameter(torch.zeros((1,))))
print(np.array((0, *np.cumsum(dataset.field_dims)[:-1]), dtype=np.int64))
```

```
[1]
[466 532]
998
Embedding(998, 16)
Parameter containing:
tensor([0.], requires_grad=True)
[  0 466]
```
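
The last line printed above, `[0 466]`, is the offset vector used by `FeaturesEmbedding` and `FeaturesLinear`: all fields share one table of `sum(field_dims) = 998` rows, and each field's indices are shifted to the start of its own block. A small NumPy illustration with the same `field_dims` (the sample rows are made up):

```python
import numpy as np

field_dims = np.array([466, 532])                                      # users, movies (from the output above)
offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)  # [0, 466]

batch = np.array([[3, 7],       # user 3, movie 7
                  [465, 531]])  # last user, last movie
rows = batch + offsets[np.newaxis, :]   # same as x + offsets.unsqueeze(0) in the torch code
print(rows.tolist())   # [[3, 473], [465, 997]]
```

Every row index stays below 998, so user and movie embeddings never collide in the shared table.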


```python
train_length = int(len(dataset) * 0.8)
valid_length = int(len(dataset) * 0.1)
test_length = len(dataset) - train_length - valid_length
train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(
    dataset, (train_length, valid_length, test_length))
```
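
With the 10,000 ratings loaded above, this split gives 8,000 / 1,000 / 1,000 samples (consistent with the 500 training batches of 16 shown later). Computing the test size as the remainder guarantees the three lengths add up to `len(dataset)`, which `random_split` requires. The split changes on every run; passing `generator=torch.Generator().manual_seed(...)` to `random_split` makes it reproducible. The arithmetic as a plain function (the name `split_lengths` is hypothetical):

```python
def split_lengths(n, train_frac=0.8, valid_frac=0.1):
    """Return (train, valid, test) sizes computed the same way as the cell above."""
    train = int(n * train_frac)
    valid = int(n * valid_frac)
    test = n - train - valid          # remainder, so the sizes always add up to n
    return train, valid, test


print(split_lengths(10000))   # (8000, 1000, 1000)
print(split_lengths(10007))   # (8005, 1000, 1002)
```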

```python
from torch.utils.data import DataLoader

train_data_loader = DataLoader(train_dataset, batch_size=16)
valid_data_loader = DataLoader(valid_dataset, batch_size=16)
test_data_loader = DataLoader(test_dataset, batch_size=1)
```

```python
print(dataset.items)
print(dataset.targets)
```

```
[[  0   0]
 [  0   1]
 [  0   2]
 ...
 [465  15]
 [465  15]
 [465 338]]
[0. 0. 0. ... 0. 0. 0.]
```


```python
model = DeepFactorizationMachineModel(dataset.field_dims, embed_dim=16, mlp_dims=(16, 16), dropout=0.2)
model
```

```
DeepFactorizationMachineModel(
  (linear): FeaturesLinear(
    (fc): Embedding(998, 1)
  )
  (fm): FactorizationMachine()
  (embedding): FeaturesEmbedding(
    (embedding): Embedding(998, 16)
  )
  (mlp): MultiLayerPerceptron(
    (mlp): Sequential(
      (0): Linear(in_features=32, out_features=16, bias=True)
      (1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Dropout(p=0.2, inplace=False)
      (4): Linear(in_features=16, out_features=16, bias=True)
      (5): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): Dropout(p=0.2, inplace=False)
      (8): Linear(in_features=16, out_features=1, bias=True)
    )
  )
)
```
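
The printed architecture makes the model size easy to tally; in PyTorch, `sum(p.numel() for p in model.parameters())` gives the total directly. The same count in plain Python, assuming `field_dims = [466, 532]`, `embed_dim = 16`, and `mlp_dims = (16, 16)` as above (BatchNorm's running mean and variance are buffers, not parameters, so only its affine weight and bias are counted):

```python
num_features = 466 + 532              # sum(field_dims)
embed_dim, num_fields = 16, 2

linear = num_features * 1 + 1         # FeaturesLinear: Embedding(998, 1) + bias
embedding = num_features * embed_dim  # FeaturesEmbedding: Embedding(998, 16)


def dense(n_in, n_out):
    return n_in * n_out + n_out       # torch.nn.Linear weight + bias


def batchnorm(n):
    return 2 * n                      # affine weight + bias


mlp = (dense(num_fields * embed_dim, 16) + batchnorm(16)
       + dense(16, 16) + batchnorm(16)
       + dense(16, 1))

print(linear, embedding, mlp, linear + embedding + mlp)   # 999 15968 881 17848
```

Almost all parameters sit in the embedding table that the FM and the deep component share.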

```python
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001, weight_decay=1e-6)
```
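
`BCELoss` expects probabilities (the model already applies `torch.sigmoid`) and averages `-[y·log(p) + (1-y)·log(1-p)]` over the batch; this is the Logloss metric the paper reports alongside AUC. A NumPy version of the formula with made-up predictions (clipping `p` is a simplification; PyTorch instead clamps the log terms):

```python
import numpy as np


def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy over the batch, same formula as torch.nn.BCELoss (mean reduction)."""
    p = np.clip(p, eps, 1 - eps)     # avoid log(0)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


y = np.array([1.0, 0.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.1, 0.6])
print(round(bce(p, y), 4))   # 0.2362
```

If the model returned raw logits instead of probabilities, `torch.nn.BCEWithLogitsLoss` would be the numerically safer choice.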

```python
import tqdm
log_interval = 100

model.train()
total_loss = 0
tk0 = tqdm.tqdm(train_data_loader, smoothing=0, mininterval=1.0)
for i, (fields, target) in enumerate(tk0):
    # fields, target = fields.to(device), target.to(device)
    y = model(fields)
    loss = criterion(y, target.float())
    model.zero_grad()
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    if (i + 1) % log_interval == 0:
        tk0.set_postfix(loss=total_loss / log_interval)
        total_loss = 0
```

```
100%|██████████| 500/500 [00:01<00:00, 314.28it/s, loss=0.601]
```
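
The cell above trains for one epoch and stops; `valid_data_loader` and `test_data_loader` are never used. The paper evaluates with AUC and Logloss. One way to evaluate would be to collect `model(fields)` outputs under `model.eval()` and `torch.no_grad()`, then compute AUC. A minimal NumPy AUC, defined as the probability that a random positive is scored above a random negative with ties counting half (the function name and toy scores are hypothetical):

```python
import numpy as np


def auc(y_true, y_score):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]          # every positive vs every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y_true = [1, 0, 0, 1, 0]
y_score = [0.8, 0.3, 0.5, 0.4, 0.1]
print(auc(y_true, y_score))   # 5 of 6 pairs ranked correctly
```

Comparing every positive with every negative uses O(P·N) memory, which is fine for a 1,000-sample test split; a rank-based formula scales better for large sets.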
