# Lookalike Model for Customer Similarity


## Overview
This notebook builds a **Lookalike Model** to find similar customers based on profile and transaction history.

### Steps:
1. Load and merge the datasets (`Customers.csv`, `Products.csv`, `Transactions.csv`).
2. Create customer profile embeddings from each customer's total spend per product category.
3. Compute pairwise customer similarity using **cosine similarity**.
4. Recommend the top 3 most similar customers for each of the first 20 customers (`C0001 - C0020`).
5. Save the results to `Lookalike.csv`.
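
To make the similarity step concrete, here is a minimal sketch of what cosine similarity measures, written with NumPy only. The helper `cosine_sim` and the spend vectors are made up for illustration and are not part of the notebook's pipeline, which uses `sklearn.metrics.pairwise.cosine_similarity`.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between two vectors: (u . v) / (|u| * |v|)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical spend vectors over two categories: [Books, Electronics]
a = [100.0, 0.0]
b = [200.0, 0.0]   # same category mix as a, twice the total spend
c = [0.0, 50.0]    # no category in common with a

sim_ab = cosine_sim(a, b)  # vectors point the same way
sim_ac = cosine_sim(a, c)  # vectors are orthogonal
print(sim_ab, sim_ac)
```

On raw spend, `a` and `b` score 1 because only the direction of the vector matters, not its length, while `a` and `c` score 0. The notebook standardizes each column before computing similarity, so its scores compare customers by how they deviate from the average customer rather than by raw spend.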

---


In [None]:

# Import required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Load datasets
customers = pd.read_csv('Customers.csv')
products = pd.read_csv('Products.csv')
transactions = pd.read_csv('Transactions.csv')

# Convert dates to datetime format
customers['SignupDate'] = pd.to_datetime(customers['SignupDate'])
transactions['TransactionDate'] = pd.to_datetime(transactions['TransactionDate'])

# Merge transactions with customers and products
data = transactions.merge(customers, on='CustomerID').merge(products, on='ProductID')

# Display sample data
data.head()
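
The merges above use pandas' default inner join, so any transaction whose `CustomerID` or `ProductID` is missing from the lookup table is dropped without warning. The sketch below shows one way to check for that on made-up miniature tables; the `toy_*` names and values are illustrative only.

```python
import pandas as pd

# Made-up miniature tables; the column names mirror the ones used above.
toy_customers = pd.DataFrame({"CustomerID": ["C1", "C2"]})
toy_transactions = pd.DataFrame(
    {"CustomerID": ["C1", "C1", "C3"], "TotalValue": [10.0, 5.0, 7.0]}
)

checked = toy_transactions.merge(
    toy_customers,
    on="CustomerID",
    how="left",              # keep every transaction, matched or not
    validate="many_to_one",  # raise MergeError if a CustomerID repeats in toy_customers
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Transactions that found no matching customer record
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(checked)} transactions have no customer record")
```

If the same check on the real tables reports zero unmatched rows, the inner join in the cell above loses nothing.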


In [None]:

# Create customer profile embeddings

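# Total spend per customer and product category
# (one row per customer, one column per category, 0 where nothing was bought)
customer_features = data.groupby(['CustomerID', 'Category'])['TotalValue'].sum().unstack(fill_value=0)

# Standardize each category column to zero mean and unit variance
scaler = StandardScaler()
customer_features_scaled = scaler.fit_transform(customer_features)

# Convert to DataFrame, keeping customer IDs as the index and category names as columns
customer_embeddings = pd.DataFrame(customer_features_scaled, index=customer_features.index, columns=customer_features.columns)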

# Display sample embeddings
customer_embeddings.head()
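
The embedding above is built from category spend alone. If demographic fields are wanted as well, one option is to one-hot encode them and standardize them together with the spend columns. This is a sketch on made-up data, not the notebook's method: the `Region` column is an assumption about what `Customers.csv` contains, and all `toy_*` names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data; the Region column is an assumption about Customers.csv.
toy_customers = pd.DataFrame(
    {"CustomerID": ["C1", "C2", "C3"], "Region": ["Asia", "Europe", "Asia"]}
).set_index("CustomerID")
toy_spend = pd.DataFrame(
    {"Books": [100.0, 0.0, 40.0], "Electronics": [0.0, 250.0, 60.0]},
    index=pd.Index(["C1", "C2", "C3"], name="CustomerID"),
)

# One-hot encode the categorical demographic column
region_dummies = pd.get_dummies(toy_customers["Region"], prefix="Region", dtype=float)

# Align on CustomerID, keeping only customers that have spend data
toy_features = toy_spend.join(region_dummies, how="left")

# Standardize all columns together, keeping readable labels
toy_embeddings = pd.DataFrame(
    StandardScaler().fit_transform(toy_features),
    index=toy_features.index,
    columns=toy_features.columns,
)
print(toy_embeddings.round(2))
```

Standardizing the dummy columns alongside spend puts both on the same scale, so neither group dominates the cosine similarity purely because of its units. Whether region should carry as much weight as spend is a modelling choice this sketch does not settle.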


In [None]:

# Compute cosine similarity between customers
similarity_matrix = cosine_similarity(customer_embeddings)
similarity_df = pd.DataFrame(similarity_matrix, index=customer_embeddings.index, columns=customer_embeddings.index)

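# Function to get the top n most similar customers, excluding the customer itself
def get_top_lookalikes(customer_id, n=3):
    # Drop the self-match by label: with tied scores it is not guaranteed to sort first
    similar_customers = similarity_df[customer_id].drop(customer_id).nlargest(n)
    return list(zip(similar_customers.index, similar_customers.values))

# Generate lookalikes for customers C0001 - C0020 (IDs without any transactions are skipped)
target_ids = [f'C{i:04d}' for i in range(1, 21)]
lookalike_dict = {cust: get_top_lookalikes(cust) for cust in target_ids if cust in similarity_df.index}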

# Convert to DataFrame: one row per customer, flattening its three (lookalike, score) pairs
lookalike_df = pd.DataFrame(
    [(cust, *(value for pair in pairs for value in pair)) for cust, pairs in lookalike_dict.items()],
    columns=['CustomerID', 'Lookalike1', 'Score1', 'Lookalike2', 'Score2', 'Lookalike3', 'Score3'],
)

# Save to CSV
lookalike_df.to_csv('Lookalike.csv', index=False)

# Display lookalike results
lookalike_df.head()
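
Before relying on `Lookalike.csv`, it may be worth running a few structural checks on the table. The helper below is a hypothetical sketch, not part of the notebook: `check_lookalikes` is an illustrative name, and the toy rows are made-up values that only mimic the column layout used above.

```python
import pandas as pd

def check_lookalikes(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise AssertionError if the table looks malformed."""
    id_cols = ["Lookalike1", "Lookalike2", "Lookalike3"]
    score_cols = ["Score1", "Score2", "Score3"]
    # No customer should be listed as their own lookalike
    assert not df[id_cols].eq(df["CustomerID"], axis=0).any().any()
    # Cosine similarity is bounded by -1 and 1 (small tolerance for rounding)
    assert df[score_cols].abs().le(1 + 1e-9).all().all()
    # Scores should run from most to least similar
    assert (df["Score1"] >= df["Score2"]).all() and (df["Score2"] >= df["Score3"]).all()
    # Each customer's three lookalikes should be distinct
    assert df[id_cols].nunique(axis=1).eq(3).all()

# Made-up rows in the same layout as lookalike_df
toy = pd.DataFrame(
    {
        "CustomerID": ["C0001", "C0002"],
        "Lookalike1": ["C0005", "C0007"], "Score1": [0.9, 0.8],
        "Lookalike2": ["C0009", "C0003"], "Score2": [0.7, 0.8],
        "Lookalike3": ["C0004", "C0010"], "Score3": [0.5, 0.1],
    }
)
check_lookalikes(toy)  # passes silently when the table is well formed
```

In the notebook, the same call could be made as `check_lookalikes(lookalike_df)` right after the table is built.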
