# <b>Task 2</b>: Lookalike Model

Build a <b>Lookalike Model</b> that takes a user's information as input and recommends <b>3 similar
customers</b> based on their profile and transaction history. The model should:
- Use both <b>customer</b> and product information.
- Assign a <b>similarity</b> score to each recommended customer.

## Loading the Dataset

In [1]:
import pandas as pd

customers = pd.read_csv("Customers.csv")
products = pd.read_csv("Products.csv")
transactions = pd.read_csv("Transactions.csv")

## Merging the files

In [2]:
merged = transactions.merge(customers, on="CustomerID", how="left").merge(products, on="ProductID", how="left")
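
A left merge keeps every transaction even when its `CustomerID` or `ProductID` has no match, which leaves missing values in the joined columns. The following is a minimal sketch, on hypothetical toy frames rather than the real CSV files, of how the `validate` and `indicator` arguments of `DataFrame.merge` can surface such rows:

```python
import pandas as pd

# Hypothetical toy frames standing in for Transactions.csv and Customers.csv
toy_transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C2", "C9"],  # C9 has no customer record
})
toy_customers = pd.DataFrame({
    "CustomerID": ["C1", "C2"],
    "Region": ["Asia", "Europe"],
})

checked = toy_transactions.merge(
    toy_customers,
    on="CustomerID",
    how="left",
    validate="many_to_one",  # raises MergeError if CustomerID repeats on the right
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Transactions whose customer was not found in the lookup table
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["CustomerID"].tolist())
```

The same check could be repeated for the product merge before aggregating.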

## Aggregating the data

1. <b>Profile Information</b>:
   - Region.
   - Signup date (converted to years since signup).
2. <b>Transaction Behavior</b>:
   - Total spending (TotalValue).
   - Total quantity purchased (Quantity).
3. <b>Product Preferences</b>:
   - Most frequently purchased product category.

In [3]:
import numpy as np

# Aggregate transaction data for customers
customer_features = merged.groupby('CustomerID').agg({
    'TotalValue': 'sum',
    'Quantity': 'sum',
    'Category': lambda x: x.value_counts().index[0],  # Most purchased product category
    'Region': 'first', 
    'SignupDate': 'first',  
}).reset_index()

# Convert 'SignupDate' to years since signup
customer_features['SignupDate'] = pd.to_datetime(customer_features['SignupDate'])
customer_features['YearsSinceSignup'] = (pd.Timestamp.now() - customer_features['SignupDate']).dt.days / 365

# One-hot encode region and category
customer_features = pd.get_dummies(customer_features, columns=['Region', 'Category'], drop_first=True)
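
Keeping only the most purchased category discards how a customer's purchases are spread across the other categories. As one possible alternative (not what this notebook uses), the sketch below computes each customer's share of quantity per category on a hypothetical toy frame; the column names mirror the merged data, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy transactions; values are illustrative only
toy = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2", "C2"],
    "Category": ["Books", "Books", "Clothing", "Electronics", "Books"],
    "Quantity": [2, 1, 1, 3, 1],
})

# Quantity per customer and category, with 0 where a customer never bought a category
qty = toy.groupby(["CustomerID", "Category"])["Quantity"].sum().unstack(fill_value=0)

# Divide each row by its total so every customer's shares sum to 1
category_share = qty.div(qty.sum(axis=1), axis=0)
print(category_share)
```

Share columns like these could replace the one-hot `Category` columns if a finer-grained preference signal is wanted.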

## Defining Similarity

- <b>Demographic features</b>: Region, YearsSinceSignup.
- <b>Behavioral features</b>: Total spending, total quantity purchased, category preferences.

In [4]:
# Standardize features
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_features = scaler.fit_transform(customer_features.drop(['CustomerID', 'SignupDate'], axis=1))
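
Standardization matters here because `TotalValue` is on a much larger scale than the 0/1 dummy columns and would otherwise dominate the similarity. A small sketch on a hypothetical array, showing that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: a large-scale column next to a small-scale one
toy = np.array([
    [100.0, 1.0],
    [200.0, 2.0],
    [300.0, 3.0],
])

scaled = StandardScaler().fit_transform(toy)

# Manual z-score using the population standard deviation (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(np.allclose(scaled, manual))
```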

## Building the Similarity (Lookalike) Model

Similarity score used: <b>Cosine Similarity</b>, computed between the standardized feature vectors of every pair of customers.

In [5]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity
similarity_matrix = cosine_similarity(scaled_features)

# Convert to DataFrame for easier lookup
similarity_df = pd.DataFrame(similarity_matrix, index=customer_features['CustomerID'], columns=customer_features['CustomerID'])
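
Cosine similarity is the dot product of two vectors divided by the product of their norms, so it measures direction rather than magnitude. Because the features are standardized (centered on zero), scores can range from -1 to 1 rather than 0 to 1. A minimal sketch with hypothetical vectors, where the helper `cosine` is introduced only for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])    # same direction as a, twice the length
c = np.array([-1.0, -2.0, 0.0])  # opposite direction to a

def cosine(u, v):
    """Cosine similarity of two 1-D vectors (illustrative helper)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairwise matrix from scikit-learn for comparison
sk = cosine_similarity(np.vstack([a, b, c]))

print(cosine(a, b), cosine(a, c))
```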

## Recommending Similar Customers / Lookalikes

In [6]:
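def recommend_similar_customers(customer_id, n=3):
    # Get similarity scores for the target customer and drop the customer's own entry
    # (dropping by label is safer than skipping the first row, which can be
    # another customer when scores are tied at 1.0)
    scores = similarity_df.loc[customer_id].drop(customer_id)

    # Keep the top N most similar customers, highest score first
    top_customers = scores.sort_values(ascending=False).head(n)

    return top_customers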

# Recommend for the first customer in the customer_features DataFrame
example_customer_id = customer_features['CustomerID'].iloc[0]
recommendations = recommend_similar_customers(customer_id=example_customer_id, n=3)
print(recommendations)

CustomerID
C0184    0.996783
C0118    0.982304
C0107    0.971308
Name: C0001, dtype: float64


### Generating "Lookalike.csv" for the first 20 customers, containing each customer's top 3 lookalikes with their similarity scores

In [7]:
lookalike_map = {}

for customer_id in customers['CustomerID'][:20]:  # First 20 customers
    # Note: a customer with no transactions has no row in similarity_df,
    # so this lookup would raise a KeyError for such a customer
    scores = similarity_df.loc[customer_id].drop(customer_id)  # Exclude the target customer

    # Keep the 3 highest-scoring customers
    top_similar = scores.sort_values(ascending=False).head(3)

    # Add to lookalike map; float() keeps NumPy scalar reprs out of the saved string
    lookalike_map[customer_id] = [(similar_id, round(float(score), 4)) for similar_id, score in top_similar.items()]

# Prepare Lookalike Map as a CSV
lookalike_rows = [
    {"cust_id": cust_id, "lookalikes": str(lookalikes)}
    for cust_id, lookalikes in lookalike_map.items()
]

lookalike_df = pd.DataFrame(lookalike_rows)

# Save to CSV
lookalike_df.to_csv("AkulBharadwaj_BH_Lookalike.csv", index=False)
print("Lookalike.csv has been created successfully!")

Lookalike.csv has been created successfully!
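
Because the `lookalikes` column is stored as the string form of a Python list, it has to be parsed when the file is read back. The sketch below round-trips a hypothetical two-row map through an in-memory buffer (instead of a file on disk) and restores the lists with `ast.literal_eval`. It assumes the scores were written as plain Python floats; NumPy scalars may be written as `np.float64(...)` under NumPy 2 and would not parse this way.

```python
import ast
import io

import pandas as pd

# Hypothetical map in the same shape as lookalike_map; IDs and scores are made up
toy_map = {
    "X1": [("X2", 0.9), ("X3", 0.8), ("X4", 0.7)],
    "X2": [("X1", 0.9), ("X4", 0.6), ("X3", 0.5)],
}

# Write the map the same way the notebook does, but to an in-memory buffer
buffer = io.StringIO()
pd.DataFrame(
    [{"cust_id": k, "lookalikes": str(v)} for k, v in toy_map.items()]
).to_csv(buffer, index=False)
buffer.seek(0)

# Read it back and turn each string into a list of (id, score) tuples
loaded = pd.read_csv(buffer)
loaded["lookalikes"] = loaded["lookalikes"].apply(ast.literal_eval)
restored = dict(zip(loaded["cust_id"], loaded["lookalikes"]))

print(restored["X1"])
```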
