**Task 2: Lookalike Model**

The goal is to find three similar customers for each customer based on profile and transaction history.



**1. Data Preprocessing**

In [1]:
import pandas as pd

# Load the datasets
customers_df = pd.read_csv("Customers.csv")
transactions_df = pd.read_csv("Transactions.csv")

# Convert date columns to datetime format
customers_df["SignupDate"] = pd.to_datetime(customers_df["SignupDate"])
transactions_df["TransactionDate"] = pd.to_datetime(transactions_df["TransactionDate"])

# Aggregate transaction data per customer
customer_transactions = transactions_df.groupby("CustomerID").agg(
    total_spent=("TotalValue", "sum"),
    total_purchases=("TransactionID", "count"),
    avg_purchase_value=("TotalValue", "mean"),
).reset_index()

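# Merge with customer profiles. Customers without transactions get zeros in the
# aggregate columns only; a blanket fillna(0) would also overwrite missing
# profile fields such as SignupDate.
customer_features = customers_df.merge(customer_transactions, on="CustomerID", how="left")
agg_cols = ["total_spent", "total_purchases", "avg_purchase_value"]
customer_features[agg_cols] = customer_features[agg_cols].fillna(0)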
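
As a sanity check on the aggregate-then-left-merge step, here is a minimal sketch on a tiny hand-made dataset. The toy frames and their values are hypothetical; only the column names mirror the real files. It illustrates that a customer with no transactions survives the left join and ends up with zeros in the aggregate columns.

```python
import pandas as pd

# Hypothetical toy data (not taken from Customers.csv / Transactions.csv)
toy_customers = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"]})
toy_transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C1", "C2"],
    "TotalValue": [100.0, 50.0, 30.0],
})

# Same aggregation pattern as above: one row per customer
toy_agg = toy_transactions.groupby("CustomerID").agg(
    total_spent=("TotalValue", "sum"),
    total_purchases=("TransactionID", "count"),
    avg_purchase_value=("TotalValue", "mean"),
).reset_index()

# Left join keeps C3, who has no transactions; fill only the aggregate columns
toy_features = toy_customers.merge(toy_agg, on="CustomerID", how="left")
agg_cols = ["total_spent", "total_purchases", "avg_purchase_value"]
toy_features[agg_cols] = toy_features[agg_cols].fillna(0)
print(toy_features)
```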


**2. Compute Customer Similarity**

We'll use cosine similarity to find customers who are most similar based on transaction behavior.

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Select numerical features for similarity calculation
features = ["total_spent", "total_purchases", "avg_purchase_value"]

# Normalize the data
scaler = StandardScaler()
customer_features_scaled = scaler.fit_transform(customer_features[features])

# Compute similarity matrix
similarity_matrix = cosine_similarity(customer_features_scaled)
similarity_df = pd.DataFrame(similarity_matrix, index=customer_features["CustomerID"], columns=customer_features["CustomerID"])
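
To make the similarity measure concrete: the cosine similarity of two feature vectors a and b is a·b / (‖a‖‖b‖), so it compares direction rather than magnitude. The sketch below uses made-up vectors (an assumption for illustration, not real customer data) and checks scikit-learn's result against the formula computed directly with NumPy.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical, already-scaled feature vectors: one row per customer
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.0],    # same direction as row 0, twice the magnitude
    [-1.0, 0.5, 2.0],
])

sim = cosine_similarity(X)

# Manual computation: normalise each row to unit length, then take dot products
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
manual = X_unit @ X_unit.T

print(np.round(sim, 3))
```

Rows 0 and 1 point in the same direction, so their similarity is 1 even though their magnitudes differ. The same can happen with real customers whose standardized features are proportional to each other.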


**3. Find the 3 Most Similar Customers**

For each customer, find the top 3 most similar customers.

In [4]:
lookalike_results = {}

for customer_id in customer_features["CustomerID"]:
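    # Get top 3 similar customers, dropping the customer itself explicitly
    # (relying on iloc[1:4] breaks when another customer ties at similarity 1.0)
    similar_customers = similarity_df[customer_id].drop(customer_id).nlargest(3)
    # Store plain Python floats so the CSV contains readable scores
    lookalike_results[customer_id] = list(zip(similar_customers.index, similar_customers.round(4).tolist()))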

# Save as CSV
import csv

with open("Abhishek_Bitling_Lookalike.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["CustomerID", "LookalikeCustomers"])
    for key, value in lookalike_results.items():
        writer.writerow([key, value])
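
An alternative to the per-customer loop is to select the neighbours for all customers at once. The sketch below is one possible approach, not the notebook's method: it masks the diagonal so that a customer can never be returned as its own lookalike, then ranks each row with `np.argsort`. The helper name `top_k_lookalikes` and the toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_lookalikes(similarity_df: pd.DataFrame, k: int = 3) -> dict:
    """Return {customer_id: [(other_id, score), ...]} with the k highest scores."""
    sim = similarity_df.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    # Column indices of the k largest scores per row, in descending order
    top_idx = np.argsort(-sim, axis=1)[:, :k]
    ids = similarity_df.columns.to_numpy()
    return {
        cust: [(ids[j], float(sim[i, j])) for j in top_idx[i]]
        for i, cust in enumerate(similarity_df.index)
    }

# Hypothetical 4x4 similarity matrix for illustration
toy_ids = ["C1", "C2", "C3", "C4"]
toy_sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.5],
     [0.9, 1.0, 0.1, 0.4],
     [0.2, 0.1, 1.0, 0.8],
     [0.5, 0.4, 0.8, 1.0]],
    index=toy_ids, columns=toy_ids,
)
result = top_k_lookalikes(toy_sim, k=3)
print(result["C1"])
```

Masking the diagonal avoids depending on sort order to skip the customer's own entry, which matters when several customers tie at the maximum similarity.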