# 🛍️ H&M Personalized Fashion Recommendations - Vanilla Baseline


This notebook provides a **baseline recommendation system** for the H&M Personalized Fashion Recommendations Kaggle competition.

✅ Goals of this notebook:
- Use recent item popularity (last 7 days)
- Personalize for known customers using their past purchases
- Fill up to 12 recommendations per customer

We keep it **vanilla (non-model-based)** to establish a strong starting point.


In [None]:
import pandas as pd

# Load data
transactions = pd.read_csv('transactions_train/transactions_train.csv', parse_dates=['t_dat'])
sample_sub = pd.read_csv('sample_submission/sample_submission.csv')


### Step 1: Compute Recent Popular Articles

In [2]:

# Recent popularity: count purchases from (last_date - 7 days) through last_date, inclusive at both ends
last_date = transactions['t_dat'].max()
recent_transactions = transactions[transactions['t_dat'] >= last_date - pd.Timedelta(days=7)]
# Zero-pad IDs to 10 digits (leading zeros are lost when article_id is read as an integer)
popular_articles = recent_transactions['article_id'].astype(str).str.zfill(10).value_counts().index.tolist()

# Get top 50 to be used for filling predictions
top_articles = popular_articles[:50]
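To see what the window filter and the zero-padding do, here is a minimal sketch on a hypothetical six-row toy frame (the data and the `recent_popular` helper are illustrative, not part of the original notebook). Because the filter uses `>=`, the window is inclusive at both ends, so a purchase exactly 7 days before the last date still counts.

```python
import pandas as pd

# Hypothetical toy transactions with the same column names as transactions_train.csv
toy = pd.DataFrame({
    "t_dat": pd.to_datetime([
        "2020-09-10", "2020-09-15", "2020-09-16",
        "2020-09-20", "2020-09-22", "2020-09-22",
    ]),
    "customer_id": ["a", "b", "c", "a", "b", "c"],
    "article_id": [111, 222, 222, 333, 222, 333],
})

def recent_popular(df, days=7, top_n=50):
    """Rank article IDs by purchase count in [last_date - days, last_date]."""
    last_date = df["t_dat"].max()
    window = df[df["t_dat"] >= last_date - pd.Timedelta(days=days)]
    # str.zfill restores leading zeros lost when IDs are parsed as integers
    ids = window["article_id"].astype(str).str.zfill(10)
    return ids.value_counts().index.tolist()[:top_n]

print(recent_popular(toy))  # → ['0000000222', '0000000333']
```

Article `111` (bought on 2020-09-10) falls outside the window, while the 2020-09-15 purchase of `222` is included.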


In [3]:
top_articles

['0924243001',
 '0924243002',
 '0923758001',
 '0918522001',
 '0909370001',
 '0866731001',
 '0751471001',
 '0915529003',
 '0915529005',
 '0448509014',
 '0762846027',
 '0714790020',
 '0865799006',
 '0918292001',
 '0850917001',
 '0919273002',
 '0896169005',
 '0929275001',
 '0894780001',
 '0751471043',
 '0673677002',
 '0889550002',
 '0935541001',
 '0934835001',
 '0573085028',
 '0918525001',
 '0706016001',
 '0788575004',
 '0573085042',
 '0863583001',
 '0928206001',
 '0910601003',
 '0930380001',
 '0863646001',
 '0929165002',
 '0915526001',
 '0715624001',
 '0863595006',
 '0898692006',
 '0852584001',
 '0909059002',
 '0923340001',
 '0762846006',
 '0791587001',
 '0788575002',
 '0881942001',
 '0706016003',
 '0906352001',
 '0873279005',
 '0827968001']

### Step 2: Get Customer Purchase History

In [4]:

# Unique articles per customer, ordered from most to least recently bought
customer_last_articles = (
    transactions
    .sort_values("t_dat", ascending=False)
    .drop_duplicates(subset=["customer_id", "article_id"])
    .groupby("customer_id")["article_id"]
    .apply(list)
)

# Convert article IDs to 10-digit zero-padded strings to match the submission format
customer_last_articles = customer_last_articles.apply(lambda x: [str(i).zfill(10) for i in x])
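The same sort, deduplicate and group chain can be checked on a hypothetical four-row toy frame. It shows that `drop_duplicates` keeps the newest copy of a repeated purchase and that `groupby` preserves the newest-first order inside each customer's list.

```python
import pandas as pd

# Hypothetical toy data: customer "a" buys article 1 twice
toy = pd.DataFrame({
    "t_dat": pd.to_datetime(["2020-09-01", "2020-09-05", "2020-09-10", "2020-09-03"]),
    "customer_id": ["a", "a", "a", "b"],
    "article_id": [1, 2, 1, 3],
})

history = (
    toy
    .sort_values("t_dat", ascending=False)                   # newest purchases first
    .drop_duplicates(subset=["customer_id", "article_id"])   # keep the newest copy of each pair
    .groupby("customer_id")["article_id"]
    .apply(list)                                             # row order within a group follows the sort
)
history = history.apply(lambda ids: [str(i).zfill(10) for i in ids])
print(history.to_dict())
# → {'a': ['0000000001', '0000000002'], 'b': ['0000000003']}
```

The default sort is not stable, so several purchases on the same day may come out in any order; passing `kind="stable"` to `sort_values` keeps their file order if that matters.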


### Step 3: Generate Vanilla Recommendations

In [5]:

predictions = {}

for cust_id in sample_sub['customer_id']:
    cust_articles = customer_last_articles.get(cust_id, [])
    # Order-preserving dedup (a safeguard; Step 2 already dropped repeated articles)
    cust_articles = list(dict.fromkeys(cust_articles))

    recs = cust_articles.copy()

    for art in top_articles:
        if len(recs) >= 12:
            break
        if art not in recs:
            recs.append(art)

    predictions[cust_id] = ' '.join(recs[:12])
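The per-customer logic of this loop can be factored into a pure function that is easy to test on its own. This is a minimal sketch; the `recommend` name and the toy IDs are hypothetical. It should return the same string as the loop body: past purchases first, then popular articles not already listed, capped at `k`.

```python
def recommend(history, popular, k=12):
    """Past purchases first (newest first), then unseen popular articles, capped at k."""
    # dict.fromkeys keeps the first occurrence of each ID and preserves order
    recs = list(dict.fromkeys(history))[:k]
    seen = set(recs)
    for art in popular:
        if len(recs) >= k:
            break
        if art not in seen:
            recs.append(art)
            seen.add(art)
    return " ".join(recs)

popular = ["p1", "p2", "p3", "p4"]
print(recommend(["h1", "p2", "h1"], popular, k=4))  # → h1 p2 p1 p3
print(recommend([], popular, k=3))                   # → p1 p2 p3
```

Tracking membership in a set keeps each check constant-time. With `k = 12` the gain is small, but the loop runs once per customer (1,371,980 rows in the submission).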



In [6]:
predictions

{'00000dbacae5abe5e23885899a1fa44253a17956c6d1c3d25f88aa139fdfc657': '0568601043 0841260003 0887593002 0890498002 0795440001 0859416011 0694736004 0785710001 0812683013 0785186005 0797065001 0656719005',
 '0000423b00ade91418cceaf3b26c6af3dd342b51fd051eec9c12fb36984420fa': '0826211002 0351484002 0811925005 0811927004 0599580083 0559630026 0723529001 0811835004 0599580055 0751628002 0599580049 0759871002',
 '000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318': '0794321007 0858883002 0851400006 0750424014 0870304002 0852643001 0852643003 0727808007 0727808001 0723529001 0351484002 0578020002',
 '00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2c5feb1ca5dff07c43e': '0742079001 0732413001 0924243001 0924243002 0923758001 0918522001 0909370001 0866731001 0751471001 0915529003 0915529005 0448509014',
 '00006413d8573cd20ed7128e53b7b13819fe5cfc2d801fe7fc0f26dd8d65a85a': '0791587015 0927530004 0730683050 0896152002 0818320001 0827971001 0589440005 0399061015 0698286003 0707704003 0677

### Step 4: Create Submission File

In [None]:

submission = sample_sub.copy()
submission['prediction'] = submission['customer_id'].map(predictions)

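# Guard: Step 3 already covers every sample_sub customer (those without history get the top 12), so this fillna should change nothing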
fallback = ' '.join(top_articles[:12])
submission['prediction'] = submission['prediction'].fillna(fallback)

# Save to file
submission.to_csv("vanilla_hm_submission.csv", index=False)
print("✅ Submission file saved: vanilla_hm_submission.csv")


✅ Submission file saved: vanilla_hm_submission.csv
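The `map` + `fillna` pair only falls back for customer IDs that are missing from the predictions dict. A toy sketch with a hypothetical three-customer frame makes this visible.

```python
import pandas as pd

# Hypothetical toy submission frame and predictions dict
sub = pd.DataFrame({"customer_id": ["a", "b", "c"]})
preds = {"a": "0000000001 0000000002", "b": "0000000003"}
fallback = "0000000009"

sub["prediction"] = sub["customer_id"].map(preds)  # "c" is not a key, so it maps to NaN
print(sub["prediction"].isna().sum())              # → 1
sub["prediction"] = sub["prediction"].fillna(fallback)
print(sub["prediction"].tolist())
# → ['0000000001 0000000002', '0000000003', '0000000009']
```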


### Step 5: Pre-check

In [8]:
# Check number of rows
print("Number of rows:", submission.shape[0])

# Check number of columns
print("Number of columns:", submission.shape[1])

# Preview header & first few rows
print("\nHeader:", submission.columns.tolist())
print("\nFirst 5 rows:")
print(submission.head())


Number of rows: 1371980
Number of columns: 2

Header: ['customer_id', 'prediction']

First 5 rows:
                                         customer_id  \
0  00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...   
1  0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...   
2  000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...   
3  00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...   
4  00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...   

                                          prediction  
0  0568601043 0841260003 0887593002 0890498002 07...  
1  0826211002 0351484002 0811925005 0811927004 05...  
2  0794321007 0858883002 0851400006 0750424014 08...  
3  0742079001 0732413001 0924243001 0924243002 09...  
4  0791587015 0927530004 0730683050 0896152002 08...  
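The checks above confirm the shape and header by eye. A few more assertions can catch format mistakes before uploading. The `check_submission` helper below is hypothetical. Its 10-digit rule is based on the ID format shown in the outputs above.

```python
import re
import pandas as pd

def check_submission(df, k=12):
    """Hypothetical helper: return a list of format problems (empty list = all checks passed)."""
    problems = []
    if list(df.columns) != ["customer_id", "prediction"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if df["customer_id"].duplicated().any():
        problems.append("duplicate customer_id values")
    if df["prediction"].isna().any():
        problems.append("missing predictions")
    # Split each prediction string into its article IDs (NaN treated as empty)
    tokens = df["prediction"].fillna("").str.split()
    if (tokens.map(len) > k).any():
        problems.append(f"more than {k} articles in a row")
    if tokens.map(lambda ids: len(ids) != len(set(ids))).any():
        problems.append("repeated article within a row")
    # Article IDs in this notebook's outputs are 10-digit zero-padded strings
    bad_ids = tokens.explode().dropna().map(lambda t: re.fullmatch(r"\d{10}", t) is None)
    if bad_ids.any():
        problems.append("article IDs not formatted as 10 digits")
    return problems

ok = pd.DataFrame({"customer_id": ["a"], "prediction": ["0924243001 0924243002"]})
broken = pd.DataFrame({"customer_id": ["a", "a"], "prediction": ["123", None]})
print(check_submission(ok))      # → []
print(check_submission(broken))
```

On the real file, `check_submission(submission)` should return an empty list.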



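## Summary
- This notebook builds a **vanilla baseline** from two signals only: each customer's purchase history and article popularity over the most recent week of transactions.
- **Customers with purchase history** get up to 12 of their most recently bought articles, topped up with popular ones. **Customers without history** get the 12 most popular recent articles.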
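One way to back up the baseline's strength is to score it locally with MAP@12, the metric commonly used for this competition. The sketch below is a hedged version of that formulation: AP@k sums precision at each hit and divides by `min(len(actual), k)`. The helper names are hypothetical, and skipping customers with no actual purchases is an assumption. A typical local check holds out the final week of `transactions`, rebuilds the recommendations from the earlier weeks, and splits each space-separated prediction string into a list before scoring.

```python
def ap_at_k(actual, predicted, k=12):
    """Average precision at k for one customer."""
    if not actual:
        return 0.0
    actual = set(actual)
    hits, score = 0, 0.0
    for i, p in enumerate(predicted[:k]):
        # Count each correct article once, at its first position
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)  # precision at this cut-off, added only on a hit
    return score / min(len(actual), k)

def map_at_k(actual_by_customer, predicted_by_customer, k=12):
    """Mean AP@k over customers with at least one actual purchase (assumed scoring rule)."""
    scores = [
        ap_at_k(actual, predicted_by_customer.get(cust, []), k)
        for cust, actual in actual_by_customer.items()
        if actual
    ]
    return sum(scores) / len(scores) if scores else 0.0

actual = {"c1": ["a", "b"], "c2": ["z"]}
predicted = {"c1": ["a", "b"], "c2": ["y", "z"]}
print(map_at_k(actual, predicted))  # → 0.75
```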


