**Task 2: Lookalike Model**

Build a Lookalike Model that takes a user's information as input and recommends 3 similar
customers based on their profile and transaction history.

The model should:

● Use both customer and product information.
● Assign a similarity score to each recommended customer.
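To use both kinds of information, each customer can be represented as one numeric vector that concatenates encoded profile attributes with product-derived spending. The sketch below does this on toy data. It assumes a categorical `Region` column in Customers.csv, which this section never names, so treat that column and all values as hypothetical.

```python
import pandas as pd

# Toy stand-ins (hypothetical values): a customer profile attribute (Region, assumed)
# and per-category spending derived from transactions joined with products.
customers = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3"],
    "Region": ["Asia", "Europe", "Asia"],
})
category_spending = pd.DataFrame({
    "CustomerID": ["C1", "C2", "C3"],
    "Books": [120.0, 0.0, 80.0],
    "Electronics": [0.0, 300.0, 40.0],
})

# One-hot encode the categorical profile attribute so it can sit next to numeric spend.
region_dummies = pd.get_dummies(customers["Region"], prefix="Region", dtype=float)
profile = pd.concat([customers[["CustomerID"]], region_dummies], axis=1)

# One row per customer, mixing customer information and product information.
combined = profile.merge(category_spending, on="CustomerID", how="left")
print(combined)
```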

Deliverables:

● Give the top 3 lookalikes with their similarity scores for the first 20 customers
(CustomerID: C0001 - C0020) in Customers.csv. Form a “Lookalike.csv” that holds
just one map: Map<cust_id, List<cust_id, score>>.
● A Jupyter Notebook/Python script explaining your model development.
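One way to store a `Map<cust_id, List<cust_id, score>>` in a CSV is one row per key, with the list of pairs JSON-encoded in a single column so it can be parsed back reliably. This is a sketch of that layout, not a format the task prescribes; the IDs and scores are made up.

```python
import csv
import io
import json

# Hypothetical lookalike map with made-up scores: cust_id -> list of (cust_id, score).
lookalikes = {
    "C0001": [("C0002", 0.91), ("C0003", 0.88), ("C0004", 0.85)],
    "C0002": [("C0001", 0.91), ("C0004", 0.87), ("C0003", 0.80)],
}

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["cust_id", "lookalikes"])
for cust_id, pairs in lookalikes.items():
    # JSON has no tuple type, so each (cust_id, score) pair is stored as a 2-element list.
    writer.writerow([cust_id, json.dumps([[other, score] for other, score in pairs])])

# Read the CSV text back and rebuild the map.
buffer.seek(0)
reader = csv.reader(buffer)
next(reader)  # skip the header row
restored = {row[0]: [tuple(pair) for pair in json.loads(row[1])] for row in reader}
print(buffer.getvalue())
```

The csv module quotes the JSON field automatically because it contains commas and double quotes.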

Evaluation Criteria:

●
Model accuracy and logic.

●
Quality of recommendations and similarity scores.

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

In [3]:
# Load datasets
customers = pd.read_csv("Customers.csv")
products = pd.read_csv("Products.csv")
transactions = pd.read_csv("Transactions.csv")

In [4]:
# Merge datasets for feature engineering
merged_data = transactions.merge(products, on='ProductID', how='left')
merged_data = merged_data.merge(customers, on='CustomerID', how='left')
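A left merge keeps every transaction, but it can silently duplicate rows if a key repeats in the lookup table, or leave NaNs if a key is missing. A minimal check on toy tables (hypothetical values, shaped like Transactions.csv and Products.csv) uses pandas' `validate` and `indicator` options:

```python
import pandas as pd

# Toy tables with hypothetical values.
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C1", "C2"],
    "ProductID": ["P1", "P2", "P9"],  # P9 has no matching product row
    "TotalValue": [50.0, 20.0, 35.0],
})
products = pd.DataFrame({
    "ProductID": ["P1", "P2"],
    "Category": ["Books", "Electronics"],
})

# validate="many_to_one" raises MergeError if ProductID is duplicated in products;
# indicator=True adds a _merge column recording whether each row found a match.
merged = transactions.merge(products, on="ProductID", how="left",
                            validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "TransactionID"].tolist()
print(unmatched)  # ['T3']
```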

In [5]:
# Feature Engineering
# 1. Lifetime value (total spending)
lifetime_value = merged_data.groupby('CustomerID')['TotalValue'].sum().reset_index()
lifetime_value.columns = ['CustomerID', 'LifetimeValue']
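On a toy frame (hypothetical values), the lifetime-value step reduces to a per-customer sum of `TotalValue`:

```python
import pandas as pd

# Hypothetical merged rows: C1 buys twice, C2 once.
merged_data = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "TotalValue": [50.0, 20.0, 35.0],
})

# Sum spending per customer, then give the columns explicit names.
lifetime_value = merged_data.groupby("CustomerID")["TotalValue"].sum().reset_index()
lifetime_value.columns = ["CustomerID", "LifetimeValue"]
print(lifetime_value)  # C1 -> 70.0, C2 -> 35.0
```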

In [6]:
# 2. Product preferences (category-wise spending)
category_spending = merged_data.groupby(['CustomerID', 'Category'])['TotalValue'].sum().unstack(fill_value=0).reset_index()
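The `groupby(...).sum().unstack(fill_value=0)` chain builds a customer-by-category spending matrix. The sketch below, on hypothetical data, checks that `pivot_table` produces the same matrix, which some readers may find easier to follow:

```python
import pandas as pd

# Hypothetical merged rows.
merged_data = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2"],
    "Category": ["Books", "Books", "Electronics", "Electronics"],
    "TotalValue": [10.0, 15.0, 40.0, 30.0],
})

# Approach used in the notebook: group, sum, then pivot categories into columns.
via_unstack = (merged_data.groupby(["CustomerID", "Category"])["TotalValue"]
               .sum().unstack(fill_value=0))

# Equivalent single call; customers with no spend in a category get 0.
via_pivot = merged_data.pivot_table(index="CustomerID", columns="Category",
                                    values="TotalValue", aggfunc="sum", fill_value=0)
print(via_pivot)
```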


In [7]:
# 3. Frequency of purchases (number of transactions)
transaction_frequency = merged_data.groupby('CustomerID')['TransactionID'].count().reset_index()
transaction_frequency.columns = ['CustomerID', 'TransactionFrequency']
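`count()` counts rows, so if an upstream merge ever fanned out (for example, a duplicated ProductID in the product table), the same transaction would be counted more than once. A small illustration on hypothetical rows, comparing `count()` with `nunique()`:

```python
import pandas as pd

# Hypothetical merged rows where T1 appears twice, as it would after a fan-out merge.
merged_data = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2"],
    "TransactionID": ["T1", "T1", "T2", "T3"],
})

row_count = merged_data.groupby("CustomerID")["TransactionID"].count()        # counts rows
distinct_count = merged_data.groupby("CustomerID")["TransactionID"].nunique()  # counts distinct IDs
print(row_count["C1"], distinct_count["C1"])  # 3 2
```

When the product keys are unique, the two agree and `count()` is fine.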

In [8]:
# Combine features into a single DataFrame
features = customers.merge(lifetime_value, on='CustomerID', how='left')
features = features.merge(transaction_frequency, on='CustomerID', how='left')
features = features.merge(category_spending, on='CustomerID', how='left')

In [9]:
# Fill missing values with 0
features.fillna(0, inplace=True)
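The NaNs come from the left merges: a customer with no transactions has no row in the aggregated tables. A toy example with hypothetical values:

```python
import pandas as pd

# Hypothetical data: C3 never made a purchase.
customers = pd.DataFrame({"CustomerID": ["C1", "C2", "C3"]})
lifetime_value = pd.DataFrame({"CustomerID": ["C1", "C2"], "LifetimeValue": [70.0, 35.0]})

features = customers.merge(lifetime_value, on="CustomerID", how="left")
missing_before = int(features["LifetimeValue"].isna().sum())  # C3 gets NaN
features = features.fillna(0)                                 # no purchases -> 0 spend
print(features)
```

If the frame also contains text columns, applying `fillna(0)` only to the engineered numeric columns would leave any missing text fields untouched.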

In [10]:
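# Normalize features for similarity calculation
# Select the engineered features by name: slicing select_dtypes(...).columns[2:] drops the
# first two numeric columns (LifetimeValue and TransactionFrequency if Customers.csv has none).
scaler = MinMaxScaler()
category_columns = [col for col in category_spending.columns if col != 'CustomerID']
feature_columns = ['LifetimeValue', 'TransactionFrequency'] + category_columns
features_normalized = pd.DataFrame(scaler.fit_transform(features[feature_columns]), columns=feature_columns, index=features.index)
features_normalized['CustomerID'] = features['CustomerID']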
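Min-max scaling rescales each column independently to [0, 1] using (x − min) / (max − min), so a large-valued feature such as total spend does not dominate a small one such as transaction count. A toy check with hypothetical values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two hypothetical features on very different scales: spend and transaction count.
X = np.array([[100.0, 1.0],
              [300.0, 3.0],
              [500.0, 5.0]])

scaled = MinMaxScaler().fit_transform(X)
# Same transformation written out by hand, column by column.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(scaled)  # both columns become [0.0, 0.5, 1.0]
```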

In [11]:
# Calculate similarity using cosine similarity
similarity_matrix = cosine_similarity(features_normalized.drop('CustomerID', axis=1))
similarity_df = pd.DataFrame(similarity_matrix, index=features['CustomerID'], columns=features['CustomerID'])
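Cosine similarity is cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below computes it by hand on two toy vectors and compares it with scikit-learn's `cosine_similarity`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

# (a . b) / (||a|| * ||b||) = 1 / (sqrt(2) * sqrt(2)) = 0.5
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

# Cosine compares direction only: same mix, different magnitude still scores ~1.0.
same_direction = cosine_similarity([[0.5, 0.5]], [[1.0, 1.0]])[0, 0]
print(manual, library, same_direction)
```

Because magnitude is ignored, a light buyer and a heavy buyer with the same category mix can score as near-identical. Also, a customer sitting at the minimum of every feature becomes an all-zero vector after min-max scaling, and scikit-learn assigns it a similarity of 0 to everyone.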

In [12]:
# Generate Lookalike Recommendations
lookalike_results = {}
target_ids = [f"C{i:04d}" for i in range(1, 21)]  # The first 20 customers (C0001 to C0020)
for customer_id in target_ids:
    # Drop the customer itself instead of assuming it sorts first (ties at 1.0 can break that), then take the top 3
    similar_customers = similarity_df[customer_id].drop(customer_id).nlargest(3)
    lookalike_results[customer_id] = [(cid, round(float(score), 4)) for cid, score in similar_customers.items()]
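Taking `iloc[1:4]` after sorting assumes the customer's own score (1.0) comes first, which may not hold when another customer has an identical vector and ties at 1.0. A sketch of top-k selection that excludes the customer explicitly, on a hypothetical similarity matrix; `top_k_lookalikes` is an illustrative helper name, not part of the original notebook:

```python
import pandas as pd

# Hypothetical similarity matrix: C1 and C2 have identical vectors, so their
# mutual score ties with each one's self-similarity of 1.0.
ids = ["C1", "C2", "C3", "C4", "C5"]
similarity_df = pd.DataFrame(
    [[1.0, 1.0, 0.4, 0.7, 0.2],
     [1.0, 1.0, 0.4, 0.7, 0.2],
     [0.4, 0.4, 1.0, 0.5, 0.9],
     [0.7, 0.7, 0.5, 1.0, 0.3],
     [0.2, 0.2, 0.9, 0.3, 1.0]],
    index=ids, columns=ids)

def top_k_lookalikes(sim, customer_id, k=3):
    """Return the k most similar other customers as (id, score) pairs."""
    scores = sim[customer_id].drop(customer_id)  # never recommend the customer itself
    top = scores.nlargest(k)
    return [(cid, float(score)) for cid, score in top.items()]

print(top_k_lookalikes(similarity_df, "C2"))  # [('C1', 1.0), ('C4', 0.7), ('C3', 0.4)]
```

Converting scores with `float()` also keeps NumPy scalar wrappers out of the stored lists.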

In [13]:
# Create Lookalike.csv
lookalike_df = pd.DataFrame({
    'CustomerID': lookalike_results.keys(),
    'Lookalikes': [str(value) for value in lookalike_results.values()]
})
lookalike_df.to_csv("Lookalike.csv", index=False)

print("Lookalike recommendations saved to Lookalike.csv")


Lookalike recommendations saved to Lookalike.csv
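Because each row's `Lookalikes` cell is written with `str(...)`, it can be parsed back with `ast.literal_eval`, provided the scores were plain Python floats. Under NumPy 2.x, `str()` on a list of NumPy floats writes `np.float64(...)`, which `literal_eval` rejects. A sketch using a hypothetical one-row file in the same layout:

```python
import ast
import io

import pandas as pd

# Hypothetical one-row file in the Lookalike.csv layout, with plain float scores.
csv_text = "CustomerID,Lookalikes\nC0001,\"[('C0002', 0.91), ('C0003', 0.88)]\"\n"

df = pd.read_csv(io.StringIO(csv_text))
# ast.literal_eval safely parses the stored Python literal back into a list of tuples.
lookalike_map = {row.CustomerID: ast.literal_eval(row.Lookalikes) for row in df.itertuples()}
print(lookalike_map)
```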
