Task 2: Lookalike Model

Create a Similarity Model: The goal is to calculate the similarity between customers based on their transaction history and product preferences. Each customer is represented by a vector of the quantities they purchased of each product, and we use cosine similarity to compare these vectors.

Save the Results: After computing the top 3 similar customers for each of the first 20 customers, we store the results in a CSV file (Lookalike.csv).
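
To make the similarity measure concrete, here is a minimal sketch of cosine similarity computed by hand on three made-up quantity vectors. The customer names and quantities are hypothetical and not taken from the dataset, and returning 0.0 for an all-zero vector is a convention chosen for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for a customer with no purchases
    return dot / (norm_u * norm_v)

# Hypothetical quantities bought of three products (illustrative only)
alice = [2, 0, 1]
bob = [4, 0, 2]    # same product mix as alice, twice the volume
carol = [0, 3, 0]  # no products in common with alice

sim_ab = cosine(alice, bob)
sim_ac = cosine(alice, carol)
print(sim_ab, sim_ac)
```

Because cosine similarity depends only on the direction of the vectors, a customer who buys the same mix in larger volume still scores close to 1, while customers with no products in common score 0.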

In [1]:
import pandas as pd

customers = pd.read_csv('/content/Customers.csv')
products = pd.read_csv('/content/Products.csv')
transactions = pd.read_csv('/content/Transactions.csv')

In [2]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Merge customer and transaction data
customer_transactions = pd.merge(transactions, customers, on='CustomerID')

# Create a matrix of product preferences per customer (using pivot table)
customer_profile = customer_transactions.pivot_table(index='CustomerID', columns='ProductID', values='Quantity', aggfunc='sum', fill_value=0)

# Calculate cosine similarity between customers
similarity_matrix = cosine_similarity(customer_profile)

# Create a DataFrame to store similarity scores
similarity_df = pd.DataFrame(similarity_matrix, index=customer_profile.index, columns=customer_profile.index)

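# Function to get the top N most similar customers for a given customer
def get_similar_customers(customer_id, top_n=3):
    # Drop the customer's own entry by label: slicing off the first row is unreliable
    # when another customer with an identical purchase mix also scores 1.0
    similar_customers = similarity_df[customer_id].drop(customer_id).sort_values(ascending=False).head(top_n)
    return similar_customers.index.tolist(), similar_customers.values.tolist()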

similar_customers, scores = get_similar_customers('C0001')
print(similar_customers, scores)


['C0097', 'C0194', 'C0199'] [0.5477225575051661, 0.469668218313862, 0.4381780460041329]
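
A customer should never appear in their own lookalike list. Excluding them by label is safer than by position, because another customer with an identical purchase mix also scores about 1.0 and may sort ahead of them. The sketch below illustrates this on a small made-up similarity table. The IDs, scores, and the helper name `top_lookalikes` are hypothetical and not part of the task data.

```python
import pandas as pd

# Hypothetical similarity table; IDs and scores are made up for illustration.
ids = ['A', 'B', 'C', 'D']
toy_similarity = pd.DataFrame(
    [[1.0, 1.0, 0.2, 0.5],
     [1.0, 1.0, 0.2, 0.5],
     [0.2, 0.2, 1.0, 0.1],
     [0.5, 0.5, 0.1, 1.0]],
    index=ids, columns=ids,
)

def top_lookalikes(sim, customer_id, top_n=3):
    """Return the top_n most similar customers, never the customer itself."""
    ranked = sim[customer_id].drop(customer_id).sort_values(ascending=False)
    return ranked.head(top_n)

# 'A' and 'B' have identical profiles, so both score 1.0 in column 'B'.
result = top_lookalikes(toy_similarity, 'B', top_n=2)
print(result)
```

Here `A` ties with `B` at 1.0, so a positional slice could drop either of them. Dropping `B` by label keeps `A` as the closest match regardless of how the tie is ordered.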


In [3]:
lookalike_data = []
for customer_id in customer_profile.index[:20]:  # For first 20 customers
    similar_customers, scores = get_similar_customers(customer_id)
    for cust, score in zip(similar_customers, scores):
        lookalike_data.append({'CustomerID': customer_id, 'LookalikeID': cust, 'Score': score})

lookalike_df = pd.DataFrame(lookalike_data)
lookalike_df.to_csv('Lookalike.csv', index=False)