# DeepFM Model for Click-Through Rate Prediction

## Problem Statement
You are given a dataset for **Click-Through Rate (CTR) prediction**, which includes both **categorical and numerical features**. Your task is to **implement the DeepFM model** and train it on the provided dataset.

**DeepFM** is a recommendation model that combines:
- **Factorization Machines (FM)** for capturing **low-order feature interactions**.
- **A deep neural network (DNN)** for capturing **high-order feature interactions**.

### **Mathematical Formulation of DeepFM**
Given an input feature vector \( x \), the DeepFM model consists of:

#### **1. Linear Component (FM Part)**
The **linear term** of FM captures independent feature contributions:
$$y_{\text{linear}} = w_0 + \sum_{i=1}^{n} w_i x_i$$
where \( w_0 \) is the global bias, and \( w_i \) is the weight for feature \( x_i \).

#### **2. Factorization Machine (FM) Component**
The FM component models pairwise feature interactions using latent factor embeddings:
$$y_{\text{FM}} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
where \( v_i \) and \( v_j \) are the learned \( k \)-dimensional embedding vectors for features \( i \) and \( j \).

An equivalent formulation computes the same second-order interactions in \( O(nk) \) time instead of \( O(n^2 k) \):
$$y_{\text{FM}} = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$
where \( v_{i,f} \) is the \( f \)-th component of \( v_i \).
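
The two ways of writing the FM interaction term can be checked against each other numerically. The following is a minimal sketch, assuming NumPy and small random inputs; the sizes `n` and `k` and the names `naive`, `fast`, `sum_then_square`, and `square_then_sum` are illustrative and not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4                      # n features, k latent dimensions (illustrative sizes)
x = rng.normal(size=n)           # one input vector
V = rng.normal(size=(n, k))      # row i holds the embedding v_i

# Direct pairwise sum over all i < j: O(n^2 * k) work
naive = sum(
    (V[i] @ V[j]) * x[i] * x[j]
    for i in range(n)
    for j in range(i + 1, n)
)

# "Square of sum minus sum of squares", one value per latent dimension: O(n * k) work
sum_then_square = (x @ V) ** 2           # shape (k,)
square_then_sum = (x ** 2) @ (V ** 2)    # shape (k,)
fast = 0.5 * np.sum(sum_then_square - square_then_sum)

# Both expressions agree up to floating-point error
assert np.isclose(naive, fast)
```

The `FM.forward` method later in this notebook applies the same identity to a whole batch at once by using matrix products in place of the per-sample dot products.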

#### **3. Deep Neural Network (DNN) Component**
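The **DNN** component models higher-order feature interactions by passing the input through a stack of fully connected layers with nonlinear activations, producing a scalar output \( y_{\text{DNN}} \).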
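
One way to write this component, chosen here to match the `DNN` class later in this notebook (ReLU hidden layers followed by a single linear output unit), is sketched below. The symbols \( a^{(l)} \), \( W^{(l)} \), \( b^{(l)} \), and the number of hidden layers \( L \) are notation introduced for illustration only.

$$a^{(0)} = x, \qquad a^{(l)} = \mathrm{ReLU}\left( W^{(l)} a^{(l-1)} + b^{(l)} \right) \quad \text{for } l = 1, \dots, L, \qquad y_{\text{DNN}} = W^{(L+1)} a^{(L)} + b^{(L+1)}$$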


#### **4. Final Prediction Layer**
The final output of DeepFM is computed as:
$$\hat{y} = \sigma(y_{\text{linear}} + y_{\text{FM}} + y_{\text{DNN}})$$
where \( \sigma \) is the **sigmoid function** that outputs the probability of a click.
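
As a small worked example of the final layer, the sketch below adds three made-up component outputs, applies the sigmoid, and evaluates the binary cross-entropy that the training loop in this notebook uses as its loss. The helper names `sigmoid`, `predict_click_probability`, and `log_loss`, and the numeric inputs, are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_click_probability(y_linear: float, y_fm: float, y_dnn: float) -> float:
    """Sum the three component outputs and squash the result with the sigmoid."""
    return sigmoid(y_linear + y_fm + y_dnn)

def log_loss(p: float, label: int) -> float:
    """Binary cross-entropy for a single example, clipped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# The three scores sum to exactly 0, so the predicted probability is 0.5
p = predict_click_probability(0.5, -1.0, 0.5)
loss_if_clicked = log_loss(p, 1)   # equals ln(2) for p = 0.5
```

Because the sigmoid is monotonic, a larger combined score always means a higher predicted click probability.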

## **Instructions**
You need to complete the following sections in the given Python script:

1. **Implement the DeepFM model architecture** (`DeepFM` class).
2. **Implement the training loop** (`train_model` function).
3. **Ensure the model is trainable on the given dataset**.

Your implementation should:
- Implement the **FM and DNN components**.
- Combine both components in a final **prediction layer**.
- Train the model on the given dataset.
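
A quick way to confirm that a model is trainable before the real dataset is available is a smoke test on synthetic data. The sketch below is one possible approach, not the required solution: `TinyCTRModel` is a hypothetical stand-in for the `DeepFM` class, and the random features, labels, learning rate, and epoch count are arbitrary assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 256 rows, 8 features, label determined by the first feature
X = torch.rand(256, 8)
y = (X[:, 0] > 0.5).float()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

class TinyCTRModel(nn.Module):
    """Hypothetical placeholder: maps (batch, input_dim) to (batch,) probabilities."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x)).view(-1)

model = TinyCTRModel(input_dim=8)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def run_epoch() -> float:
    """Run one pass over the loader and return the mean batch loss."""
    model.train()
    total = 0.0
    for X_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

losses = [run_epoch() for _ in range(20)]
# A trainable model should end with a lower mean loss than it started with
assert losses[-1] < losses[0]
```

Swapping `TinyCTRModel` for the real model only requires that its `forward` return one probability per row, which is the shape `nn.BCELoss` expects alongside the float labels.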

🚀 **Now, implement the missing parts and build the DeepFM model for CTR prediction!**


In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import pandas as pd
import numpy as np

from torchkeras import summary, kerasmodel

In [17]:
class DNN(nn.Module):
    def __init__(self, input_size, hidden_layers, output_size=1):
        super(DNN, self).__init__()
        # initiate layers
        layers = []
        prev_size = input_size
        
        # iterate hidden_layers and activations, append to layers
        for i in range(len(hidden_layers)):
            layers.append(nn.Linear(prev_size, hidden_layers[i]))
            layers.append(nn.ReLU())
            prev_size = hidden_layers[i]

        # append the output layer, then wrap all layers in nn.Sequential
        layers.append(nn.Linear(prev_size, output_size))
        self.network = nn.Sequential(*layers)
        
    def forward(self, x):
        return self.network(x)

class FM(nn.Module):
    def __init__(self, feature_size, latent_size):
        super(FM, self).__init__()
        self.latent_size = latent_size

        self.b = nn.Parameter(torch.zeros([1, ]))
        self.w1 = nn.Parameter(torch.randn([feature_size, 1]))
        self.w2 = nn.Parameter(torch.randn([feature_size, latent_size]))

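    def forward(self, x):
        # first-order term: w_0 + sum_i w_i * x_i
        first_order = torch.matmul(x, self.w1) + self.b
        # second-order term via "square of sum minus sum of squares"
        sum_then_square = torch.pow(torch.matmul(x, self.w2), 2)
        square_then_sum = torch.matmul(torch.pow(x, 2), torch.pow(self.w2, 2))
        second_order = 0.5 * torch.sum(sum_then_square - square_then_sum, dim=1, keepdim=True)
        return first_order + second_order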
        

# DeepFM Model (Fill in the missing parts)
class DeepFM(nn.Module):
    def __init__(self, input_dim, hidden_units, latent_dim):
        super(DeepFM, self).__init__()
        # FM already contains the bias and first-order (linear) term,
        # so no separate nn.Linear layer is needed here.
        self.DNN = DNN(input_dim, hidden_units)
        self.FM = FM(input_dim, latent_dim)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        # TODO: Implement forward pass
        # the concatenated categorical and numerical features are the shared input to FM and DNN
        # wide
        wide_outputs = self.FM(x)
        # deep
        deep_outputs = self.DNN(x)
        output = self.sigmoid(torch.add(wide_outputs, deep_outputs)).view(-1)
        return output
        

# Training Function (Fill in the missing parts)
def train_model(model, train_loader, num_epochs=5, lr=0.001):
    # Binary cross-entropy on the sigmoid output, optimized with Adam
    criterion = nn.BCELoss()
    optimizer = optim.Adam(params=model.parameters(), lr=lr)
    
    for epoch in range(num_epochs):
        model.train()
        epoch_loss = 0.0
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            predictions = model(X_batch)
            # compare predicted click probabilities with the 0/1 labels
            loss = criterion(predictions, y_batch)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {epoch_loss / len(train_loader):.4f}")
    
    return model

In [18]:
input_dim = 128
hidden_units = [64, 32]
latent_dim = 10
model = DeepFM(input_dim, hidden_units, latent_dim)

In [19]:
summary(model, input_shape=(input_dim,))

--------------------------------------------------------------------------
Layer (type)                            Output Shape              Param #
FM-1                                         [-1, 1]                1,409
Linear-2                                    [-1, 64]                8,256
ReLU-3                                      [-1, 64]                    0
Linear-4                                    [-1, 32]                2,080
ReLU-5                                      [-1, 32]                    0
Linear-6                                     [-1, 1]                   33
Sigmoid-7                                    [-1, 1]                    0
Total params: 11,778
Trainable params: 11,778
Non-trainable params: 0
--------------------------------------------------------------------------
Input size (MB): 0.000488
Forward/backward pass size (MB): 0.001488
Params size (MB): 0.044930
Estimated Total Size (MB): 0.046906
---------------------------------------------------------



In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import pandas as pd
import numpy as np

# Load dataset
def load_data(train_path):
    train_df = pd.read_csv(train_path)
    
    # Drop ID column
    train_df.drop(columns=['Id'], inplace=True)
    
    return train_df

# Preprocessing function
def preprocess_data(train_df):
    # Separate labels
    y_train = train_df['Label'].fillna(0).astype(int).values
    train_df = train_df.drop(columns=['Label'])
    
    # Fill missing numerical values with 0
    num_features = [col for col in train_df.columns if col.startswith("I")]
    train_df[num_features] = train_df[num_features].fillna(0)
    
    # Normalize numerical features
    scaler = MinMaxScaler()
    train_df[num_features] = scaler.fit_transform(train_df[num_features])
    
    # Encode categorical features
    cat_features = [col for col in train_df.columns if col.startswith("C")]
    for col in cat_features:
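        # assign back instead of using inplace on a column selection, and
        # cast to str so LabelEncoder sees a single type
        train_df[col] = train_df[col].fillna("missing").astype(str)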
        le = LabelEncoder()
        train_df[col] = le.fit_transform(train_df[col])
    
    return train_df, y_train

# PyTorch Dataset Class
class CTRDataset(Dataset):
    def __init__(self, X, y=None):
        self.X = torch.tensor(X.values, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32) if y is not None else None
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        if self.y is not None:
            return self.X[idx], self.y[idx]
        return self.X[idx]


# Load and preprocess data
train_df = load_data("/mnt/data/train.csv")
train_df, y_train = preprocess_data(train_df)

# Create DataLoaders
train_dataset = CTRDataset(train_df, y_train)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Initialize model
input_dim = train_df.shape[1]
hidden_units = [64, 32]
latent_dim = 10
model = DeepFM(input_dim, hidden_units, latent_dim)

# Train model
model = train_model(model, train_loader, num_epochs=5, lr=0.001)


FileNotFoundError: [Errno 2] No such file or directory: '/mnt/data/train.csv'