# Bundle Recommendation Model

## Objective
This notebook implements the core product bundle recommendation logic
using precomputed co-occurrence and time-aware features.

The model is:
- Rule-based
- Explainable
- Production-friendly

No evaluation or serving is performed here.


In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

In [2]:
BASE_DIR = Path().resolve().parent
FEATURE_PATH = BASE_DIR / "data" / "features"

co_df = pd.read_parquet(FEATURE_PATH / "co_occurrence.parquet")
weighted_co_df = pd.read_parquet(FEATURE_PATH / "weighted_co_occurrence.parquet")
popularity_df = pd.read_parquet(FEATURE_PATH / "product_popularity.parquet")

co_df.head()

| | product_a | product_b | co_count |
|---|---|---|---|
| 0 | 21730 | 22752 | 26 |
| 1 | 21730 | 71053 | 29 |
| 2 | 21730 | 84029E | 24 |
| 3 | 21730 | 84029G | 26 |
| 4 | 21730 | 84406B | 23 |


In [3]:
co_df.shape, weighted_co_df.shape, popularity_df.shape

((3750528, 3), (3750528, 3), (3922, 2))

To enable fast recommendation lookup, we convert feature tables
into dictionary-based structures.

In [4]:
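# Popularity lookup: StockCode -> popularity
popularity_map = dict(
    zip(popularity_df["StockCode"], popularity_df["popularity"])
)

# Time-weighted co-occurrence lookup: product -> {co-purchased product: weighted score}
co_map = {}

# zip over columns avoids the per-row Series overhead of iterrows() on ~3.75M rows
for a, b, score in zip(
    weighted_co_df["product_a"],
    weighted_co_df["product_b"],
    weighted_co_df["weighted_score"],
):
    # Store each pair in both directions so either product can be the lookup key
    co_map.setdefault(a, {})[b] = score
    co_map.setdefault(b, {})[a] = score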
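
To make the shape of the lookup concrete, here is a minimal sketch on three made-up pairs. The product IDs and scores are hypothetical, and plain tuples stand in for the rows of `weighted_co_df`.

```python
# Hypothetical (product_a, product_b, weighted_score) rows; not real data.
toy_rows = [
    ("A", "B", 2.0),
    ("A", "C", 1.0),
    ("B", "C", 0.5),
]

toy_co_map = {}
for a, b, score in toy_rows:
    # Each pair is stored in both directions, so the map is symmetric.
    toy_co_map.setdefault(a, {})[b] = score
    toy_co_map.setdefault(b, {})[a] = score

# All partners of "B" are available with a single dictionary lookup.
print(toy_co_map["B"])
```

Because every pair is written twice, the nested dictionary holds roughly two entries per row of the source table. That trades memory for constant-time access to a product's candidates.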

## Scoring Strategy

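Each candidate product is scored using two signals:
- Time-aware co-occurrence strength with the target product
- Popularity normalization, which damps candidates that co-occur with many products simply because they sell a lot

Final score:
score = co_occurrence_score / log(1 + product_popularity)

Here product_popularity is the popularity of the candidate product, not the target.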
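
As a worked example of the formula, the sketch below scores two candidates that have the same co-occurrence score but very different popularity. All names and numbers are hypothetical and chosen only to illustrate the effect of the logarithmic denominator.

```python
import math

# Hypothetical candidates: identical co-occurrence, different popularity.
toy_candidates = {
    "niche_item": {"co_score": 10.0, "popularity": 50},
    "bestseller": {"co_score": 10.0, "popularity": 5000},
}

toy_scores = {}
for name, feats in toy_candidates.items():
    # score = co_occurrence_score / log(1 + product_popularity)
    toy_scores[name] = feats["co_score"] / math.log1p(feats["popularity"])

for name, score in toy_scores.items():
    print(f"{name}: {score:.3f}")
```

The niche item ends up with the higher score. Because the denominator grows only logarithmically, a 100x gap in popularity lowers the score by a factor of roughly two rather than a hundred.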


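In [5]:
def score_product(target_product, candidate_product):
    """Weighted co-occurrence of the pair, damped by the candidate's popularity."""
    co_score = co_map.get(target_product, {}).get(candidate_product, 0)
    # Candidates missing from popularity_map default to 1, so the
    # denominator is log(2) rather than log1p(0) == 0
    pop = popularity_map.get(candidate_product, 1)

    return co_score / np.log1p(pop)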

In [6]:
def recommend_bundle(product_id, top_k=5):
    if product_id not in co_map:
        return []
    
    candidates = co_map[product_id]
    
    scored = [
        (prod, score_product(product_id, prod))
        for prod in candidates
    ]
    
    scored = sorted(scored, key=lambda x: x[1], reverse=True)
    
    return scored[:top_k]


In [7]:
example_product = next(iter(co_map))

recommend_bundle(example_product, top_k=5)


[('71477', np.float64(1.7948887125479207)),
 ('85123A', np.float64(1.573257266242966)),
 ('23313', np.float64(1.3029139004888437)),
 ('23322', np.float64(1.2740848082979412)),
 ('23355', np.float64(1.238314017860616))]

## Cold-Start Strategy

If a product has no co-occurrence history,
we fall back to globally popular products.


In [8]:
def cold_start_recommendation(top_k=5):
    popular = sorted(
        popularity_map.items(),
        key=lambda x: x[1],
        reverse=True
    )
    return popular[:top_k]
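
The fallback above sorts the full popularity map on every call. With a few thousand products that is cheap, but if the catalogue grows, one possible alternative is `heapq.nlargest` from the standard library, which selects the top entries without a full sort. The sketch below uses a hypothetical `toy_popularity_map` and a hypothetical helper name, `cold_start_top_k`.

```python
import heapq

# Hypothetical popularity counts standing in for popularity_map.
toy_popularity_map = {"A": 120, "B": 950, "C": 40, "D": 610}


def cold_start_top_k(pop_map, top_k=5):
    """Return the top_k (product, popularity) pairs, most popular first."""
    # nlargest scans the items once and tracks only the top_k largest.
    return heapq.nlargest(top_k, pop_map.items(), key=lambda x: x[1])


toy_top = cold_start_top_k(toy_popularity_map, top_k=2)
print(toy_top)  # [('B', 950), ('D', 610)]
```

Since the popularity map does not change between calls, computing the ranked list once and slicing it per request would be another reasonable option.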


In [9]:
def get_bundle_recommendation(product_id, top_k=5):
    if product_id in co_map:
        return recommend_bundle(product_id, top_k)
    else:
        return cold_start_recommendation(top_k)
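
The dispatch can be exercised end to end on toy data. The sketch below is a condensed stand-in for the functions above, using hypothetical maps and a hypothetical `toy_recommend` helper, to show what each branch returns.

```python
import math

# Hypothetical lookups; "D" is popular but has no co-occurrence history.
toy_co_map = {"A": {"B": 2.0, "C": 1.0}, "B": {"A": 2.0}, "C": {"A": 1.0}}
toy_popularity_map = {"A": 100, "B": 400, "C": 20, "D": 900}


def toy_recommend(product_id, top_k=5):
    if product_id in toy_co_map:
        # Known product: rank co-purchased candidates by normalized score.
        scored = [
            (cand, co / math.log1p(toy_popularity_map.get(cand, 1)))
            for cand, co in toy_co_map[product_id].items()
        ]
        return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
    # Unknown product: fall back to globally popular items.
    popular = sorted(toy_popularity_map.items(), key=lambda x: x[1], reverse=True)
    return popular[:top_k]


known = toy_recommend("A", top_k=2)    # normalized scores
unknown = toy_recommend("Z", top_k=2)  # raw popularity values
print(known)
print(unknown)
```

Note that the second element of each tuple means different things in the two branches: a normalized co-occurrence score for known products, and a raw popularity value for the fallback. Callers should not compare these numbers across branches.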


In [10]:
# Map each StockCode to its most frequent Description (for display only)
clean_df = pd.read_parquet(
    BASE_DIR / "data" / "processed" / "clean_transactions.parquet"
)

product_name_map = (
    clean_df.groupby("StockCode")["Description"]
    .agg(lambda x: x.mode().iloc[0])
    .to_dict()
)


In [11]:
product_id = example_product

recommendations = get_bundle_recommendation(product_id, top_k=5)

[
    (product_name_map.get(pid, pid), round(score, 4))
    for pid, score in recommendations
]


[('COLOUR GLASS. STAR T-LIGHT HOLDER', np.float64(1.7949)),
 ('WHITE HANGING HEART T-LIGHT HOLDER', np.float64(1.5733)),
 ('VINTAGE CHRISTMAS BUNTING', np.float64(1.3029)),
 ('LARGE WHITE HEART OF WICKER', np.float64(1.2741)),
 ('HOT WATER BOTTLE KEEP CALM', np.float64(1.2383))]

## Model Summary

- Bundle recommendations based on co-purchase behavior
- Time-aware weighting favors recent trends
- Popularity normalization reduces bias toward best-selling products
- Cold-start handled via popularity fallback

This model is suitable for offline evaluation and API deployment.
