# Project Scenario

- My project is a Product Recommendation System that uses Neural Collaborative Filtering (NCF) to automatically suggest relevant products to customers based on their past purchases.

In real business terms, this system can:
- Increase cross-selling and upselling opportunities.
- Help customers discover new products they’re likely to buy.
- Improve customer retention by offering personalized shopping experiences.
- Reduce manual work of creating static product bundles.

# Result Measure
- This project delivers personalized product recommendations by learning hidden patterns in customer purchase behavior, increasing the likelihood of repeat purchases and larger basket sizes.

# Quantifiable Measure: Where My Project Helps
- Developed a Neural Collaborative Filtering recommendation engine that analyzes customer purchase history to suggest top-N relevant products, with the goal of increasing average order value by 10–15%.


In [1]:
import pandas as pd
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Load and Preprocess Data

In [4]:
# Read the Excel file
df = pd.read_excel('Online Retail.xlsx')
df.head(5)

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


In [5]:
# Drop rows with missing CustomerID
df = df.dropna(subset=['CustomerID'])

# Map CustomerID and StockCode to indices
- I can’t feed raw IDs like CustomerID = 12345 or StockCode = 'ABC123' directly into an embedding-based recommendation model.

# Why?
- An embedding layer is a lookup table: row k stores the learned vector for index k, so it needs integer indices starting from 0.
- For N users and M items, the embedding layers expect inputs 0, 1, ..., N−1 and 0, 1, ..., M−1.
- Real CustomerID and StockCode values are arbitrary and non-contiguous (e.g. 17850, '85123A'), so I convert them to sequential indices.
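The mapping cell below uses dictionary comprehensions. `pandas.factorize` is an alternative that returns the same contiguous, 0-based codes (in order of first appearance). It also returns the array of unique raw IDs, which doubles as the reverse lookup. This is a minimal sketch; the toy IDs are hypothetical.

```python
import pandas as pd

# Hypothetical toy purchases with arbitrary, non-contiguous IDs
toy = pd.DataFrame({
    "CustomerID": [17850, 13047, 17850, 12583],
    "StockCode": ["85123A", "71053", "71053", "22728"],
})

# factorize assigns codes 0..n-1 in order of first appearance;
# the second return value holds the unique raw IDs
toy["customer_idx"], customer_ids = pd.factorize(toy["CustomerID"])
toy["product_idx"], product_ids = pd.factorize(toy["StockCode"])

print(toy)

# Reverse lookup: embedding index -> raw ID
print(customer_ids[toy.loc[1, "customer_idx"]])
```

Because both approaches number IDs by first appearance, they produce identical indices. The array returned by `factorize` avoids a separate reverse dictionary.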


In [6]:
# Map CustomerID and StockCode to indices
# Ensure CustomerID is integer
df['CustomerID'] = df['CustomerID'].astype(int)

# Create a mapping: real CustomerID -> index
customer_indexes = {cid: idx for idx, cid in enumerate(df['CustomerID'].unique())}

# Create a mapping: real StockCode -> index
product_indexes = {pid: idx for idx, pid in enumerate(df['StockCode'].unique())}

In [7]:
# Implicit interaction: if Quantity > 0 => 1
df['interaction'] = (df['Quantity'] > 0).astype(int)

In [10]:
df['customer_idx'] = df['CustomerID'].map(customer_indexes)
df['product_idx'] = df['StockCode'].map(product_indexes)

In [12]:
# Drop duplicate customer-product pairs, keep sum of Quantity or max
df = df.groupby(['customer_idx', 'product_idx']).agg({'interaction': 'max'}).reset_index()

In [13]:
num_customers = len(customer_indexes)
num_products = len(product_indexes)
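After the `max` aggregation, a pair has label 0 only if every transaction for it had Quantity ≤ 0 (returns). It is therefore worth checking the label balance before training. This is a minimal sketch on hypothetical toy rows; on the real data, the same `value_counts` call applies to the aggregated `df`.

```python
import pandas as pd

# Hypothetical raw rows: (customer_idx, product_idx, Quantity); negative Quantity = return
toy_rows = pd.DataFrame({
    "customer_idx": [0, 0, 0, 1, 1, 2],
    "product_idx":  [0, 0, 1, 1, 2, 0],
    "Quantity":     [6, -6, 3, -2, 5, 1],
})
toy_rows["interaction"] = (toy_rows["Quantity"] > 0).astype(int)

# Same aggregation as the notebook: one row per pair, label = max over its transactions
pairs = (toy_rows.groupby(["customer_idx", "product_idx"])
         .agg({"interaction": "max"})
         .reset_index())

# Share of positive (1) vs. negative (0) labels
balance = pairs["interaction"].value_counts(normalize=True)
print(balance)
```

If the share of 1s is close to 100%, the model can minimize BCE simply by predicting 1 for every pair. That would be consistent with the near-zero training loss and the identical top-5 lists later in this notebook.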

# Create the Dataset and DataLoader

In [None]:
class PurchaseDataset(Dataset):
    def __init__(self, df):

        # Convert the customer_idx column (a NumPy array) to a LongTensor; embedding lookups require integer indices
        self.users = torch.tensor(df['customer_idx'].values, dtype=torch.long)

        # Convert the product_idx column to a LongTensor for the item embedding lookup
        self.items = torch.tensor(df['product_idx'].values, dtype=torch.long)

        # Convert the interaction labels to float32, since BCELoss expects float targets
        self.labels = torch.tensor(df['interaction'].values, dtype=torch.float32)

    def __len__(self):
        return len(self.users) # Number of (customer, product) samples, not the number of unique customers

    # Return a single sample: (user index, item index, label)
    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.labels[idx]

# Create the dataset instance
dataset = PurchaseDataset(df)

# Wrap the dataset in a PyTorch DataLoader: batches of 1024 samples, reshuffled every epoch
loader = DataLoader(dataset, batch_size=1024, shuffle=True)

# Define the NCF Model: Why Use NCF?

## Why Neural Collaborative Filtering
- I have a customer–product purchase matrix (users × items).
- It’s sparse: most customers haven’t bought most products.
- I want to predict which products each customer is likely to buy next, so I can recommend the top-N products.

## What is NCF
- Neural Collaborative Filtering (NCF) is a deep learning version of collaborative filtering.
- Instead of relying only on a dot product (as in classic matrix factorization), NCF learns user–item interactions with a neural network.
- NCF learns latent vectors and passes them through multiple non-linear layers, so it can learn more complex matching functions than a dot product.
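This sketch contrasts the two scoring functions on random, untrained embeddings. Matrix factorization scores a pair with a fixed dot product, while the NCF-style MLP learns its own matching function from the concatenated vectors. All sizes are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_users, n_items, dim = 4, 6, 8  # toy sizes, not the notebook's

user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)

toy_users = torch.tensor([0, 1, 2])
toy_items = torch.tensor([3, 4, 5])
u_vec, i_vec = user_emb(toy_users), item_emb(toy_items)  # each (3, dim)

# Classic matrix factorization: score = dot product of the two vectors
mf_score = (u_vec * i_vec).sum(dim=-1)  # (3,)

# NCF-style MLP: concatenate, then learn a non-linear matching function
mlp = nn.Sequential(nn.Linear(2 * dim, 16), nn.ReLU(), nn.Linear(16, 1))
ncf_score = mlp(torch.cat([u_vec, i_vec], dim=-1)).squeeze(-1)  # (3,)

print(mf_score.shape, ncf_score.shape)
```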

### Why NCF
- NCF learns non-linear user–item matching functions, so it can model:
  - User behavior patterns.
  - Item co-purchase relationships.
  - Context, if contextual features are added as extra inputs (the model below uses only customer and product indices).
- The output is a ranking score for each user–item pair → rank the products → recommend the top-N.

### Process
- Input: (customer, product) index pairs taken from the sparse customer–product matrix.
- Model: NCF in its MLP form (embedding layers + MLP).
- Output: For each customer, score all products → pick the top-N highest.

### Why am I using it
- I am using NCF because it improves over classic collaborative filtering by modeling non-linear interactions, which helps make better personalized recommendations.

In [15]:
# Define the NCF module
class NCF(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=50):
        super(NCF, self).__init__()

        # Embedding layer for users
        # Maps each user ID to a dense vector of size `embedding_dim`
        self.user_embedding = nn.Embedding(num_users, embedding_dim)

        # Embedding layer for items
        # Maps each item ID to a dense vector of size `embedding_dim`
        self.item_embedding = nn.Embedding(num_items, embedding_dim)

        # First Fully Connected Layer
        # Takes the concatenated user and item embeddings (size = embedding_dim * 2)
        # and transforms them to a hidden layer of size 128
        self.fc1 = nn.Linear(embedding_dim * 2, 128)

        # Second fully connected layer:
        # Further reduces the hidden layer size from 128 to 64
        self.fc2 = nn.Linear(128, 64)

        # Output layer:
        # Produces a single prediction score for the user-item pair
        self.output = nn.Linear(64, 1)

        # Activation function:
        # Applies a sigmoid to squash the output score to the range [0, 1]
        self.activation = nn.Sigmoid()

    def forward(self, user, item):

        # Look up user embeddings for the given user indices
        u = self.user_embedding(user) # Shape: (batch_size, embedding_dim)

        # Look up item embeddings for the given item indices
        i = self.item_embedding(item) # Shape: (batch_size, embedding_dim)

        # Concatenate user and item embeddings along the last dimension
        # So I have [user_emb | item_emb] → Shape: (batch_size, embedding_dim * 2)
        x = torch.cat([u, i], dim=-1)

        # Pass through first fully connected layer + ReLU non-linearity
        x = torch.relu(self.fc1(x))

        # Pass through second fully connected layer + ReLU non-linearity
        x = torch.relu(self.fc2(x))

        # Pass through output layer (linear transformation)
        x = self.output(x)

        # Apply sigmoid to squash score between 0 and 1
        return self.activation(x)
    
# Instantiate the model
# num_customers = total number of unique customers
# num_products = total number of unique products
model = NCF(num_customers, num_products)

# Process
- Embeddings:
1. Turn discrete IDs (user & item IDs) into dense continuous vectors.
2. Each embedding learns latent factors representing user preferences and item properties.

- MLP layers:
1. After combining the embeddings, the model passes them through fully connected (dense) layers with ReLU.
2. This lets the network learn complex, non-linear interactions between user and item features.

- Output:
1. A single score between 0 and 1, predicting the probability that the user will interact with the item (e.g., purchase it).

#### What happens during training?
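- Input: (user_id, item_id) pairs with labels (1 = purchased, 0 = not purchased). In this notebook, a label of 0 only occurs when every transaction for that customer–product pair was a return (Quantity ≤ 0); products a customer never bought are not included as negatives.
- Loss: binary cross-entropy (BCELoss), comparing the predicted probability with the true label.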
- The embeddings + dense layers learn to map each user–item pair to the likelihood of interaction.
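Products a customer never bought never appear as 0-labelled examples here. A common remedy, sketched below under stated assumptions, is negative sampling: for every purchased pair, draw a few products the customer has no record of and label them 0. The helper name `add_negative_samples`, the uniform sampling, and the default ratio of 4 negatives per positive are illustrative assumptions, not part of this notebook.

```python
import numpy as np
import pandas as pd

def add_negative_samples(pairs, num_items, num_neg=4, seed=42):
    """Hypothetical helper: append `num_neg` sampled negatives per positive pair.

    `pairs` must have columns customer_idx, product_idx, interaction.
    Negatives are drawn uniformly (with replacement) from products the
    customer has no recorded interaction with, and get label 0.
    """
    rng = np.random.default_rng(seed)
    # Products each customer already has a row for (any label): never sample these
    seen = pairs.groupby("customer_idx")["product_idx"].apply(set).to_dict()

    neg_rows = []
    for user in pairs.loc[pairs["interaction"] == 1, "customer_idx"]:
        user_seen = seen[user]
        if len(user_seen) >= num_items:
            continue  # nothing left to sample for this customer
        drawn = 0
        while drawn < num_neg:
            item = int(rng.integers(num_items))
            if item not in user_seen:
                neg_rows.append((user, item, 0))
                drawn += 1

    negatives = pd.DataFrame(neg_rows, columns=["customer_idx", "product_idx", "interaction"])
    return pd.concat([pairs, negatives], ignore_index=True)

# Hypothetical toy input: three positive pairs over 10 products
toy_pairs = pd.DataFrame({
    "customer_idx": [0, 0, 1],
    "product_idx": [0, 1, 2],
    "interaction": [1, 1, 1],
})
train_pairs = add_negative_samples(toy_pairs, num_items=10, num_neg=2)
print(train_pairs["interaction"].value_counts())
```

In the notebook, this kind of helper would be applied to the aggregated interaction table before building PurchaseDataset, for example `add_negative_samples(df, num_products)`. Uniform sampling is the simplest choice; sampling negatives in proportion to product popularity is a common alternative.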



# Training Loop
Loss function:
- The loss function BCELoss measures how well the model’s predicted probability matches the true label (1 = purchased, 0 = not purchased).
- It penalizes the model more when confident wrong predictions are made, guiding it to improve accuracy.
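A small worked example of that penalty: with a true label of 1, the loss is −log(p). It grows from about 0.105 at p = 0.9 to about 4.6 at p = 0.01. The sketch checks nn.BCELoss against the formula −[y·log(p) + (1 − y)·log(1 − p)]; the probabilities are arbitrary examples.

```python
import torch
from torch import nn

bce = nn.BCELoss()
target = torch.tensor([1.0])  # true label: purchased

# The loss grows quickly as the model becomes confidently wrong
for p in [0.9, 0.5, 0.1, 0.01]:
    bce_value = bce(torch.tensor([p]), target)
    print(f"predicted={p:<5} loss={bce_value.item():.4f}")

# Same value as the formula -[y*log(p) + (1 - y)*log(1 - p)]
pred = torch.tensor([0.1])
manual = -(target * torch.log(pred) + (1 - target) * torch.log(1 - pred))
print(torch.allclose(manual, bce(pred, target)))
```

PyTorch also provides nn.BCEWithLogitsLoss, which combines the sigmoid and the loss in one numerically stabler step. Using it would mean dropping the final Sigmoid from the model and applying torch.sigmoid only when probabilities are needed.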

In [16]:
# Loss function:
# Binary Cross-Entropy Loss for binary labels (1 = purchased, 0 = not purchased)
criterion = nn.BCELoss()

# Optimizer:
# Adam with a learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Number of training epochs:
# One epoch = one full pass through the training dataset
num_epochs = 50

# Training loop:
for epoch in range(num_epochs):

    # Put model in training mode (important if I have layers like dropout or batch norm)
    model.train()
    
    # Tracks total loss for this epoch
    total_loss = 0

    # Loop over training batches:
    for users, items, labels in loader:
        
        # Forward pass: predict interaction probability for each (user, item) pair
        # squeeze(-1) turns (batch, 1) into (batch,) and keeps the batch dimension even for a batch of size 1
        preds = model(users, items).squeeze(-1)

        # Compute binary cross-entropy loss between predictions and true labels
        loss = criterion(preds, labels)

        # Reset gradients left over from the previous step
        optimizer.zero_grad()

        # Compute gradients via backpropagation
        loss.backward()

        # Update model parameters using gradients
        optimizer.step()

        # Accumulate loss for reporting
        total_loss += loss.item()

    # Print the summed loss over all batches for this epoch (not a per-batch average)
    print(f"Epoch {epoch+1}/{num_epochs} - Loss: {total_loss:.4f}")

Epoch 1/50 - Loss: 13.0074
Epoch 2/50 - Loss: 4.3603
Epoch 3/50 - Loss: 3.7132
Epoch 4/50 - Loss: 3.1546
Epoch 5/50 - Loss: 2.7205
Epoch 6/50 - Loss: 2.3703
Epoch 7/50 - Loss: 2.0613
Epoch 8/50 - Loss: 1.7746
Epoch 9/50 - Loss: 1.5168
Epoch 10/50 - Loss: 1.2005
Epoch 11/50 - Loss: 0.9321
Epoch 12/50 - Loss: 0.6653
Epoch 13/50 - Loss: 0.4380
Epoch 14/50 - Loss: 0.2437
Epoch 15/50 - Loss: 0.1283
Epoch 16/50 - Loss: 0.0864
Epoch 17/50 - Loss: 0.0479
Epoch 18/50 - Loss: 0.0223
Epoch 19/50 - Loss: 0.0141
Epoch 20/50 - Loss: 0.0095
Epoch 21/50 - Loss: 0.0060
Epoch 22/50 - Loss: 0.0041
Epoch 23/50 - Loss: 0.0030
Epoch 24/50 - Loss: 0.0024
Epoch 25/50 - Loss: 0.0019
Epoch 26/50 - Loss: 0.0015
Epoch 27/50 - Loss: 0.0012
Epoch 28/50 - Loss: 0.0010
Epoch 29/50 - Loss: 0.0008
Epoch 30/50 - Loss: 0.0007
Epoch 31/50 - Loss: 0.0006
Epoch 32/50 - Loss: 0.0005
Epoch 33/50 - Loss: 0.0004
Epoch 34/50 - Loss: 0.0003
Epoch 35/50 - Loss: 0.0003
Epoch 36/50 - Loss: 0.0002
Epoch 37/50 - Loss: 0.0002
Epoch 38/

# Process
- BCELoss:
1. Binary Cross-Entropy is the right choice when the target is 0 or 1 (interaction or no interaction).
2. The model outputs probabilities [0,1] -> Sigmoid makes sure the output matches this range.

- Adam:
1. A popular optimizer for training deep networks.
2. It adapts the learning rate for each parameter, which often speeds up convergence.

- model.train():
1. Tells PyTorch you’re in training mode (important for layers like Dropout or BatchNorm, though I don’t have them here yet).

- for users, items, labels in loader:
1. loader is the DataLoader that yields mini-batches.
2. Each batch has:
   - users: tensor of user indices
   - items: tensor of item indices
   - labels: tensor of 0s and 1s

- .squeeze():
1. Removes the size-1 output dimension so predictions of shape (batch_size, 1) match labels of shape (batch_size,).

- .zero_grad() → .backward() → .step(), the classic PyTorch training cycle:
1. Clear gradients
2. Compute gradients with backprop
3. Update weights

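- total_loss & print:
1. total_loss sums the batch losses for each epoch, giving a rough view of training progress.
2. If the loss goes down over epochs, the model is fitting the training data; this alone does not show that it generalizes to unseen purchases.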
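The training cell prints a sum over batches, so the number depends on how many batches there are. The sketch below reports a per-sample average instead, weighting each batch by its size. It uses toy stand-in data and a toy model, not the notebook's. It also shows why squeezing only the last dimension is safer: with 10 samples and a batch size of 3, the last batch holds a single sample. A bare .squeeze() would turn that batch's (1, 1) output into a 0-d tensor that no longer matches its (1,) labels.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-in data (not the notebook's): 10 samples, 3 features, 0/1 labels
toy_x = torch.randn(10, 3)
toy_y = torch.randint(0, 2, (10,)).float()
# batch_size=3 leaves a final batch of size 1 (10 = 3 + 3 + 3 + 1)
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=3, shuffle=True)

toy_model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
toy_criterion = nn.BCELoss()  # returns the mean loss over each batch

sum_loss, total_samples = 0.0, 0
with torch.no_grad():
    for xb, yb in toy_loader:
        # squeeze(-1): (batch, 1) -> (batch,), even when batch == 1
        toy_preds = toy_model(xb).squeeze(-1)
        batch_loss = toy_criterion(toy_preds, yb)
        sum_loss += batch_loss.item() * yb.size(0)  # undo the per-batch mean
        total_samples += yb.size(0)

avg_loss = sum_loss / total_samples
print(f"Average loss per sample: {avg_loss:.4f}")
```

Weighting by batch size matters only because the final batch is smaller. With equal-sized batches, dividing the summed loss by the number of batches gives the same value.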


# Make recommendations

In [17]:
# Switch the model to evaluation mode (disables dropout/batch-norm training behavior)
model.eval()

NCF(
  (user_embedding): Embedding(4372, 50)
  (item_embedding): Embedding(3684, 50)
  (fc1): Linear(in_features=100, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (output): Linear(in_features=64, out_features=1, bias=True)
  (activation): Sigmoid()
)

# Recommend products function
- Takes a single raw CustomerID and returns the top-K recommended StockCodes for that customer
- Computes predicted scores for all products for that user, then picks the highest ones

In [18]:
def recommend_products(user_id, top_k=5):

    # Look up the internal embedding index for this customer
    user_idx = customer_indexes[user_id]

    # Create a tensor with all possible product indices (0 to num_products - 1)
    item_indices = torch.arange(num_products)

    # Create a tensor repeating the user index for each product
    # Shape: [num_products]
    user_tensor = torch.full((num_products,), user_idx, dtype=torch.long)

    # Disable gradient tracking for inference (faster, uses less memory)
    with torch.no_grad():
        # Predict scores for all (user, product) pairs
        scores = model(user_tensor, item_indices).squeeze(-1)

    # Pick top-K product indices with highest scores
    top_items = torch.topk(scores, top_k).indices.tolist()

    # Convert internal product indices back to original StockCodes with a reverse lookup
    index_to_product = {idx: pid for pid, idx in product_indexes.items()}
    recommended_products = [index_to_product[i] for i in top_items]

    return recommended_products

# What does it do
- It predicts which products a specific user will likely interact with next, scores all possible products, and returns the top-K with the highest predicted probabilities.
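In practice, a customer's top-K list usually excludes products they have already bought. The sketch below masks those scores before torch.topk. The function name `top_k_unseen` and the `purchased` set of product indices are hypothetical. In this notebook, such a set could, for example, be built from the aggregated interaction table before df is reloaded.

```python
import torch

def top_k_unseen(scores, purchased, k=5):
    """Hypothetical helper: top-k item indices, skipping already-purchased ones.

    scores: 1-D tensor of predicted scores, one per product index.
    purchased: set of product indices the customer already bought.
    """
    masked = scores.clone()
    if purchased:
        masked[list(purchased)] = float("-inf")  # can never be selected by topk
    k = min(k, masked.numel() - len(purchased))
    return torch.topk(masked, k).indices.tolist()

scores = torch.tensor([0.9, 0.8, 0.7, 0.6, 0.5])
print(top_k_unseen(scores, purchased={0, 2}, k=2))  # [1, 3]
```

Masking with −inf keeps the original indices intact, so the result can still be mapped back to StockCodes with the same reverse lookup.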

# Use the NCF Model for the 10 Best Customers

In [20]:
# Reload the raw transactions (df was overwritten by the aggregated interactions earlier)
df = pd.read_excel('Online Retail.xlsx')

In [21]:
# Spending per transaction line (Quantity × UnitPrice); returns have negative Quantity and reduce the total
df['TotalSpending'] = df['Quantity'] * df['UnitPrice']

In [23]:
# Sum spending per CustomerID and sort in descending order (result is a Series indexed by CustomerID)
customer_spending = df.groupby('CustomerID')['TotalSpending'].sum().sort_values(ascending=False)

In [24]:
# Get top 10 customer IDs
top_customers = customer_spending.head(10).index.tolist()

print("Top 10 customers:", top_customers)

Top 10 customers: [14646.0, 18102.0, 17450.0, 14911.0, 12415.0, 14156.0, 17511.0, 16684.0, 13694.0, 15311.0]


In [26]:
# Map each raw CustomerID to its embedding index
top_customer_indices = [customer_indexes[cid] for cid in top_customers]
print("Mapped indices:", top_customer_indices)

Mapped indices: [908, 450, 402, 67, 979, 181, 16, 863, 37, 8]


# Recommend top 5 products for each customer

In [28]:
print("Recommend Products Based On the Stock Code")
for cid in top_customers:
    recommended = recommend_products(cid, top_k=5)
    print(f"Customer {cid} -> Recommended: {recommended}")

Recommend Products Based On the Stock Code
Customer 14646.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 18102.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 17450.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 14911.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 12415.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 14156.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 17511.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 16684.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 13694.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']
Customer 15311.0 -> Recommended: ['84029G', '84029E', 71053, '85123A', '84406B']


# Back Up Quantifiable Claim

In [29]:
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

In [30]:
test_truth = test_df.groupby('CustomerID')['StockCode'].apply(set).to_dict()

In [31]:
# Evaluate top-5 recommendations
K = 5

# Evaluate on the first 100 customers in the test split
sample_customers = list(test_truth.keys())[:100]

precisions = []
recalls = []

for cid in sample_customers:
    true_items = test_truth[cid]
    if not true_items:
        continue

    rec_items = set(recommend_products(cid, top_k=K))

    num_hits = len(true_items & rec_items)

    precision = num_hits / K
    recall = num_hits / len(true_items)

    precisions.append(precision)
    recalls.append(recall)


In [32]:
avg_precision_at_k = sum(precisions) / len(precisions)
avg_recall_at_k = sum(recalls) / len(recalls)

print(f"Precision@{K}: {avg_precision_at_k:.4f}")
print(f"Recall@{K}: {avg_recall_at_k:.4f}")


Precision@5: 0.0040
Recall@5: 0.0010
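The model above was trained on every customer–product pair before train_test_split was applied, so the test purchases were already seen during training. A stricter protocol splits the data first and trains only on the training part. The sketch below assumes a per-customer "leave last out" split that holds out each customer's purchases from their most recent InvoiceDate. The helper names and toy data are hypothetical.

```python
import pandas as pd

def leave_last_out(raw):
    """Hypothetical helper: hold out each customer's rows from their latest InvoiceDate."""
    last_date = raw.groupby("CustomerID")["InvoiceDate"].transform("max")
    is_test = raw["InvoiceDate"] == last_date
    return raw[~is_test], raw[is_test]

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and Recall@k for one customer (relevant must be non-empty)."""
    hits = len(set(recommended[:k]) & relevant)
    return hits / k, hits / len(relevant)

# Hypothetical toy transactions
toy_raw = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2],
    "StockCode": ["A", "B", "C", "A", "D"],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-01", "2010-12-01", "2011-01-05", "2010-12-03", "2011-02-01"]
    ),
})
toy_train, toy_test = leave_last_out(toy_raw)
toy_truth = toy_test.groupby("CustomerID")["StockCode"].apply(set).to_dict()
print(precision_recall_at_k(["C", "A"], toy_truth[1], k=2))  # (0.5, 1.0)
```

Under this rule, customers with a single invoice date end up entirely in the test set. One option is to evaluate only customers with at least two distinct dates.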
