<h2><center>Lookalike Model Development</center></h2>

<b>Task 2: Recommend the top 3 most similar customers for each of the first 20 customers based on their profile and transaction history.</b>


<h3>Introduction</h3>

The goal is to develop a Lookalike Model that recommends three similar customers for each of the first 20 customers in the dataset. The model combines customer profile data (region) with features derived from the transaction data: total spending, transaction count, and most purchased product category. Customer similarity is measured with cosine similarity, and the recommendations are saved to a CSV file in a structured format for easy interpretation.

<h3>Data Loading and Preprocessing</h3>


In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity

# Load the datasets
customers = pd.read_csv("Customers.csv")
products = pd.read_csv("Products.csv")
transactions = pd.read_csv("Transactions.csv")


Display the first few rows of each dataset to understand its structure.

In [2]:
print("Customers Dataset Preview:")
display(customers.head())

Customers Dataset Preview:


  CustomerID        CustomerName         Region  SignupDate
0      C0001    Lawrence Carroll  South America  2022-07-10
1      C0002      Elizabeth Lutz           Asia  2022-02-13
2      C0003      Michael Rivera  South America  2024-03-07
3      C0004  Kathleen Rodriguez  South America  2022-10-09
4      C0005         Laura Weber           Asia  2022-08-15


In [3]:
print("Products Dataset Preview:")
display(products.head())

Products Dataset Preview:


  ProductID              ProductName     Category   Price
0      P001     ActiveWear Biography        Books   169.3
1      P002    ActiveWear Smartwatch  Electronics   346.3
2      P003  ComfortLiving Biography        Books   44.12
3      P004            BookWorld Rug   Home Decor   95.69
4      P005          TechPro T-Shirt     Clothing  429.31


In [4]:
print("Transactions Dataset Preview:")
display(transactions.head())



Transactions Dataset Preview:


  TransactionID  CustomerID  ProductID      TransactionDate  Quantity  TotalValue   Price
0        T00001       C0199       P067  2024-08-25 12:38:23         1      300.68  300.68
1        T00112       C0146       P067  2024-05-27 22:23:54         1      300.68  300.68
2        T00166       C0127       P067  2024-04-25 07:38:55         1      300.68  300.68
3        T00272       C0087       P067  2024-03-26 22:55:37         2      601.36  300.68
4        T00363       C0070       P067  2024-03-21 15:10:10         3      902.04  300.68


<h3>Feature Engineering</h3>

In this step, three features are derived from the transaction history for each customer: Total Spending (TotalSpending), the sum of the customer's transaction values; Transaction Count (TransactionCount), the number of transactions the customer made; and Top Category (TopCategory), the product category the customer purchased most frequently. Together with the customer's region, these features summarize each customer's purchasing behavior for the model.
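The same three features can be sketched more compactly with pandas named aggregation in a single `groupby` over the merged transactions and products. This is an alternative formulation, not the notebook's code: the toy frames (`toy_transactions`, `toy_products`) are hypothetical stand-ins that keep only the columns used here, and `"size"` counts rows per customer, matching `groupby('CustomerID').size()`.

```python
import pandas as pd

# Hypothetical toy data using the same column names as Transactions.csv and Products.csv
toy_transactions = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2"],
    "ProductID": ["P1", "P2", "P1", "P3"],
    "TotalValue": [100.0, 50.0, 120.0, 80.0],
})
toy_products = pd.DataFrame({
    "ProductID": ["P1", "P2", "P3"],
    "Category": ["Books", "Clothing", "Electronics"],
})

# Attach each transaction's product category
toy_merged = toy_transactions.merge(toy_products, on="ProductID", how="left")

# Named aggregation: one row per customer, one column per engineered feature
toy_features = toy_merged.groupby("CustomerID").agg(
    TotalSpending=("TotalValue", "sum"),
    TransactionCount=("TotalValue", "size"),
    TopCategory=("Category", lambda s: s.value_counts().idxmax()),
)
print(toy_features)
```

As with the separate cells below, only customers who appear in the transactions get a row, so customers without transactions still have to be joined back and filled.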









In [5]:

# Calculate total spending per customer
spending = transactions.groupby('CustomerID')['TotalValue'].sum().rename('TotalSpending')
spending

CustomerID
C0001    3354.52
C0002    1862.74
C0003    2725.38
C0004    5354.88
C0005    2034.24
          ...   
C0196    4982.88
C0197    1928.65
C0198     931.83
C0199    1979.28
C0200    4758.60
Name: TotalSpending, Length: 199, dtype: float64

In [6]:
# Calculate total transaction count per customer
transaction_count = transactions.groupby('CustomerID').size().rename('TransactionCount')
transaction_count

CustomerID
C0001    5
C0002    4
C0003    4
C0004    8
C0005    3
        ..
C0196    4
C0197    3
C0198    2
C0199    4
C0200    5
Name: TransactionCount, Length: 199, dtype: int64

In [7]:
# Get the most purchased product category per customer
merged = transactions.merge(products, on='ProductID', how='left')
top_category = (
    merged.groupby('CustomerID')['Category']
    .apply(lambda x: x.value_counts().idxmax())
    .rename('TopCategory')
)
top_category

CustomerID
C0001    Electronics
C0002     Home Decor
C0003     Home Decor
C0004          Books
C0005    Electronics
            ...     
C0196     Home Decor
C0197    Electronics
C0198    Electronics
C0199    Electronics
C0200       Clothing
Name: TopCategory, Length: 199, dtype: object

In [8]:
# Combine features into a single DataFrame
customer_features = customers.set_index('CustomerID')
customer_features = customer_features.join([spending, transaction_count, top_category])


In [9]:
# Drop columns not used as similarity features (Region is kept and encoded below)
customer_features = customer_features.drop(['CustomerName', 'SignupDate'], axis=1, errors='ignore')


In [10]:
# Encode categorical features
encoder = LabelEncoder()
customer_features['Region'] = encoder.fit_transform(customer_features['Region'])
customer_features['TopCategory'] = encoder.fit_transform(customer_features['TopCategory'].fillna('Unknown'))
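`LabelEncoder` maps each distinct label to an integer in sorted (alphabetical) order, as the hypothetical example below shows. Because the Region and TopCategory codes are then treated as ordinary numbers, categories that happen to be far apart alphabetically look more different than those that are close, even though the categories have no natural order. One-hot encoding (for example `pd.get_dummies`) is a common alternative that avoids this, at the cost of producing different similarity scores.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical region labels, for illustration only
toy_regions = ["South America", "Asia", "Europe", "Asia"]

region_encoder = LabelEncoder()
codes = region_encoder.fit_transform(toy_regions)

# Classes are sorted alphabetically and mapped to 0..n_classes-1
print(region_encoder.classes_.tolist())  # ['Asia', 'Europe', 'South America']
print(codes.tolist())                    # [2, 0, 1, 0]

# inverse_transform recovers the original labels from the codes
print(region_encoder.inverse_transform([2, 0]).tolist())  # ['South America', 'Asia']
```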


In [11]:
# One customer has no transactions (199 of 200 customers appear above), so their spending and count are NaN; treat them as 0
customer_features.fillna(0, inplace=True)


<h3>Scaling Features</h3>

In [12]:

# Scale features
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(customer_features)

print("Scaled Features Shape:", scaled_features.shape)



Scaled Features Shape: (200, 4)
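`MinMaxScaler` rescales each column independently with $x' = (x - \min x) / (\max x - \min x)$, so total spending (hundreds to thousands) and transaction counts (single digits) end up on the same [0, 1] range before similarity is computed. The sketch below checks that formula by hand on a hypothetical two-column matrix (spending, count).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw features: column 0 = spending, column 1 = transaction count
toy_raw = np.array([
    [0.0, 0.0],
    [500.0, 2.0],
    [1000.0, 8.0],
])

toy_scaled = MinMaxScaler().fit_transform(toy_raw)

# Same result by hand: (x - column min) / (column max - column min)
toy_manual_scaled = (toy_raw - toy_raw.min(axis=0)) / (toy_raw.max(axis=0) - toy_raw.min(axis=0))

print(toy_scaled)  # rows: [0, 0], [0.5, 0.25], [1, 1]
print(np.allclose(toy_scaled, toy_manual_scaled))  # True
```

Every scaled value lies in [0, 1], which matters for how the cosine similarity scores behave in the next step.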


<h3>Model Development</h3>

Cosine similarity is used to compute pairwise similarity between customers as the cosine of the angle between their feature vectors. In general it ranges from -1 to 1, but because every min-max scaled feature is non-negative, the scores here fall between 0 (least similar) and 1 (feature vectors pointing in the same direction). Customers with the highest scores have the most similar feature profiles and are recommended as lookalikes.
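For two feature vectors $\mathbf{a}$ and $\mathbf{b}$, cosine similarity is $\cos(\mathbf{a}, \mathbf{b}) = \dfrac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$. The sketch below computes it by hand with NumPy on hypothetical scaled vectors and checks the result against scikit-learn's `cosine_similarity`, the function used in the next cell.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical min-max scaled feature vectors for three customers
toy_vectors = np.array([
    [1.0, 0.5, 0.2, 0.0],
    [0.9, 0.6, 0.1, 0.0],
    [0.0, 0.1, 0.9, 1.0],
])

# Normalize each row to unit length; dot products of unit vectors are cosines
toy_unit = toy_vectors / np.linalg.norm(toy_vectors, axis=1, keepdims=True)
toy_cos = toy_unit @ toy_unit.T

print(np.round(toy_cos, 3))
print(np.allclose(toy_cos, cosine_similarity(toy_vectors)))  # True
```

Customers 0 and 1 have similar profiles and score close to 1, while customer 2 points in a different direction and scores much lower against both. Since cosine similarity ignores vector length, two customers whose features are proportional would also score 1.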









In [13]:
# Pairwise cosine similarity between all customers (a 200 x 200 matrix)
similarity_matrix = cosine_similarity(scaled_features)



<h3>Generating Lookalike Recommendations</h3>

In [14]:
# Generate top 3 lookalike customers for each of the first 20 customers
lookalikes = {}
for idx, customer_id in enumerate(customer_features.index[:20]):
    # Pair every other customer's position with its similarity to the current customer,
    # excluding the customer itself explicitly instead of assuming it sorts first
    similarity_scores = [
        (i, score) for i, score in enumerate(similarity_matrix[idx]) if i != idx
    ]
    # Sort by similarity score in descending order
    similarity_scores.sort(key=lambda x: x[1], reverse=True)
    # Keep the three most similar customers as (CustomerID, score) pairs
    lookalikes[customer_id] = [
        (customer_features.index[i], score) for i, score in similarity_scores[:3]
    ]
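The selection loop can also be sketched with NumPy alone: mask each query customer's own entry, then sort every row at once. `top_k_lookalikes` is a hypothetical helper, not part of the notebook, and it assumes (as the loop does) that the first `n_queries` rows of the matrix belong to the first `n_queries` customers.

```python
import numpy as np

def top_k_lookalikes(similarity_matrix, ids, n_queries, k=3):
    """Hypothetical helper: for each of the first n_queries rows of a square
    similarity matrix, return the k most similar other rows as (id, score)
    pairs, highest score first."""
    # np.array copies the query rows, so the caller's matrix is not modified
    block = np.array(similarity_matrix[:n_queries], dtype=float)
    rows = np.arange(n_queries)
    # Row i corresponds to column i, so mask the diagonal to exclude self-matches
    block[rows, rows] = -np.inf
    # Negate for a descending sort; kind="stable" keeps ties in original column order
    best = np.argsort(-block, axis=1, kind="stable")[:, :k]
    return {
        ids[i]: [(ids[j], float(block[i, j])) for j in best[i]]
        for i in range(n_queries)
    }

# Toy 4x4 similarity matrix (hypothetical values)
toy_ids = ["C1", "C2", "C3", "C4"]
toy_sim = np.array([
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.5, 0.4, 0.8, 1.0],
])
print(top_k_lookalikes(toy_sim, toy_ids, n_queries=2, k=2))
# {'C1': [('C2', 0.9), ('C4', 0.5)], 'C2': [('C1', 0.9), ('C4', 0.4)]}
```

Applied to this notebook, the call would be `top_k_lookalikes(similarity_matrix, customer_features.index, n_queries=20, k=3)`. Masking the diagonal guarantees a customer never appears in its own list, even when another customer ties with it at a score of 1.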



In [15]:
# Save results to Nandhana_Rajeev_Lookalike.csv
lookalike_df = pd.DataFrame({
    "CustomerID": list(lookalikes.keys()),
    "Lookalikes": [
        # float() converts NumPy scalars to plain Python floats so the CSV text stays parseable
        [{"CustomerID": cust_id, "Score": float(score)} for cust_id, score in value]
        for value in lookalikes.values()
    ]
})
lookalike_df.to_csv("Nandhana_Rajeev_Lookalike.csv", index=False)
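`to_csv` writes each list of dictionaries as its Python `repr` string, so reading the file back means parsing that string, for example with `ast.literal_eval`. This works only if the scores are plain Python floats: under NumPy 2.x the `repr` of an `np.float64` is `np.float64(...)`, which `literal_eval` rejects. The sketch below round-trips hypothetical toy results (`toy_lookalikes`) through an in-memory buffer instead of the real file.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical results in the notebook's shape: {CustomerID: [(lookalike_id, score), ...]}
toy_lookalikes = {
    "C0001": [("C0107", np.float64(0.99)), ("C0050", np.float64(0.95))],
    "C0002": [("C0186", np.float64(0.97)), ("C0010", np.float64(0.90))],
}

toy_df = pd.DataFrame({
    "CustomerID": list(toy_lookalikes.keys()),
    "Lookalikes": [
        # float() turns NumPy scalars into plain Python floats before writing
        [{"CustomerID": cust_id, "Score": float(score)} for cust_id, score in value]
        for value in toy_lookalikes.values()
    ],
})

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Each Lookalikes cell comes back as a string; parse it into a list of dicts
loaded["Lookalikes"] = loaded["Lookalikes"].apply(ast.literal_eval)
print(loaded.loc[0, "Lookalikes"][0])  # {'CustomerID': 'C0107', 'Score': 0.99}
```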




<h3>Results</h3>

In [16]:
# Display a preview of the lookalike recommendations
print("Lookalike Recommendations for the First 20 Customers:")

print(lookalike_df.head())

Lookalike Recommendations for the First 20 Customers:
  CustomerID                                         Lookalikes
0      C0001  [{'CustomerID': 'C0107', 'Score': 0.9998102200...
1      C0002  [{'CustomerID': 'C0186', 'Score': 0.9960609898...
2      C0003  [{'CustomerID': 'C0133', 'Score': 0.9999395988...
3      C0004  [{'CustomerID': 'C0132', 'Score': 0.9986039151...
4      C0005  [{'CustomerID': 'C0186', 'Score': 0.9991655638...


<h3>Conclusion</h3>

The Lookalike Model recommends the top 3 most similar customers for each of the first 20 customers, based on their profile (region) and three features derived from their transaction history: total spending, transaction count, and most purchased product category. Customers are compared with cosine similarity on min-max scaled features, and the recommendations, together with their similarity scores, are saved to Nandhana_Rajeev_Lookalike.csv for straightforward interpretation and analysis.







