### I am building a Lookalike Model to recommend 3 similar customers based on their profiles and transaction histories. The model uses both customer and product information and assigns a similarity score to each recommended customer.

In [3]:
# Import the libraries used in this notebook
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

In [4]:
#The datasets used in the model are:
customers_df = pd.read_csv("Customers.csv") #Contains customer information such as CustomerID, Region, and other demographic details.
products_df = pd.read_csv("Products.csv") #Contains product information like ProductID, Category, etc.
transactions_df = pd.read_csv("Transactions.csv") #Contains transaction data, including CustomerID, ProductID, TransactionID, and TotalValue.

Data Preprocessing:
To build the Lookalike Model, I perform the following preprocessing steps:

1) Merging the data: I combine Customers.csv, Products.csv, and Transactions.csv into a single dataset that holds customer information, product information, and transaction details.

2) Feature Engineering: I create customer-level features such as:

Total money spent by the customer (total_spent).
Number of transactions (total_transactions).
Number of distinct products purchased (unique_products).
Average transaction value (avg_transaction_value), which can be derived as total_spent / total_transactions; the cells that follow do not add it as a separate column.
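
The customer-level features listed above can be sketched on a small, made-up table. This is only an illustration under assumptions: the rows are hypothetical, the column names mirror those of Transactions.csv, and `toy_transactions` and `toy_features` are names introduced here rather than part of the notebook.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror Transactions.csv.
toy_transactions = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "TransactionID": ["T1", "T2", "T3"],
    "TotalValue": [100.0, 50.0, 30.0],
})

# One row per customer: total spend and number of transactions.
toy_features = toy_transactions.groupby("CustomerID").agg(
    total_spent=("TotalValue", "sum"),
    total_transactions=("TransactionID", "count"),
).reset_index()

# Average transaction value derived from the two aggregates.
toy_features["avg_transaction_value"] = (
    toy_features["total_spent"] / toy_features["total_transactions"]
)
print(toy_features)
```

Deriving the average from the two aggregates keeps the three columns consistent with each other by construction.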

In [None]:
# Attach product details (e.g. Category) to each transaction
transactions_products_df = transactions_df.merge(products_df, on="ProductID", how="left")

In [4]:
# Attach customer details (e.g. Region) to each transaction row
full_data_df = transactions_products_df.merge(customers_df, on="CustomerID", how="left")
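
A left merge silently keeps transactions whose ProductID or CustomerID has no match, filling the joined columns with NaN. One way to check for that, sketched here on hypothetical toy frames (`toy_tx` and `toy_products` are illustrative names, not part of the notebook), is to use the `validate` and `indicator` parameters of `DataFrame.merge`:

```python
import pandas as pd

# Hypothetical toy data: P9 does not exist in the product table.
toy_tx = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "ProductID": ["P1", "P2", "P9"],
})
toy_products = pd.DataFrame({
    "ProductID": ["P1", "P2"],
    "Category": ["Books", "Toys"],
})

merged = toy_tx.merge(
    toy_products,
    on="ProductID",
    how="left",
    validate="many_to_one",  # raises if ProductID is duplicated in toy_products
    indicator=True,          # adds a _merge column describing each row's origin
)

# Rows that found no matching product.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

If `unmatched` is non-empty on the real data, the affected rows would carry NaN in Category and would need to be handled before encoding.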


In [5]:
# Aggregate per-customer purchase statistics from the transaction data
customer_purchase_df = full_data_df.groupby("CustomerID").agg(
    total_spent=pd.NamedAgg(column="TotalValue", aggfunc="sum"),
    total_transactions=pd.NamedAgg(column="TransactionID", aggfunc="count"),
    unique_products=pd.NamedAgg(column="ProductID", aggfunc="nunique")
).reset_index()


In [6]:
# One-hot encode each customer's region; drop_duplicates leaves one row per customer
region_encoded = pd.get_dummies(full_data_df[["CustomerID", "Region"]], columns=["Region"]).drop_duplicates()
# One-hot encode product categories and sum them to get the number of purchases per category for each customer
category_encoded = pd.get_dummies(full_data_df[["CustomerID", "Category"]], columns=["Category"]).groupby("CustomerID").sum().reset_index()
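
To make the shape of the two encoded tables concrete, here is a small sketch on hypothetical rows. The `toy_*` names are illustrative, and `dtype=int` is an optional choice made here so the indicator columns come out as 0/1 integers regardless of pandas version; the notebook's cell does not set it.

```python
import pandas as pd

# Hypothetical merged rows: one row per transaction.
toy = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "Region": ["Asia", "Asia", "Europe"],
    "Category": ["Books", "Toys", "Books"],
})

# Region is constant per customer, so dropping duplicates leaves one row each.
toy_region = pd.get_dummies(
    toy[["CustomerID", "Region"]], columns=["Region"], dtype=int
).drop_duplicates()

# Category varies per transaction, so the indicators are summed into counts.
toy_category = (
    pd.get_dummies(toy[["CustomerID", "Category"]], columns=["Category"], dtype=int)
    .groupby("CustomerID")
    .sum()
    .reset_index()
)
print(toy_region)
print(toy_category)
```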


In [7]:
# Join the purchase statistics, region indicators, and category counts into one table keyed by CustomerID
customer_features_df = customer_purchase_df.merge(region_encoded, on="CustomerID", how="left").merge(category_encoded, on="CustomerID", how="left")


In [8]:
# Drop the CustomerID column and convert the remaining numerical features to a matrix
customer_matrix = customer_features_df.drop(columns=["CustomerID"]).values
# Compute the pairwise cosine similarity between all customers
similarity_matrix = cosine_similarity(customer_matrix)
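
Cosine similarity computed on unscaled features tends to be dominated by the column with the largest magnitude (here most likely total_spent), which would push nearly every pair of customers toward a score of 1. A common remedy, shown as a sketch and not as what this notebook does, is to standardize the columns first. The toy matrix is hypothetical, and `StandardScaler` is a swapped-in preprocessing step from scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows: [total_spent, total_transactions, unique_products]
toy_matrix = np.array([
    [3000.0, 2.0, 1.0],
    [3100.0, 9.0, 8.0],
    [2900.0, 5.0, 4.0],
])

# Raw features: the large spend column dominates every dot product.
raw_sim = cosine_similarity(toy_matrix)

# Standardized features: each column has zero mean and unit variance.
scaled_sim = cosine_similarity(StandardScaler().fit_transform(toy_matrix))

print(raw_sim[0, 1])     # very close to 1
print(scaled_sim[0, 1])  # much lower once columns are comparable
```

With scaling, customers that differ in transaction count or product variety are separated more clearly, so the resulting ranking depends on more than total spend.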

In [9]:
# Wrap the similarity matrix in a DataFrame labeled by CustomerID on both rows and columns for easy lookup of similarity scores
customer_similarity_df = pd.DataFrame(similarity_matrix, index=customer_features_df["CustomerID"], columns=customer_features_df["CustomerID"])


In [10]:
# Select the first 20 CustomerIDs (by row order, not by rank) to generate lookalike recommendations for
first_20_customers = customer_features_df["CustomerID"].head(20)


In [11]:
# For each of the first 20 customers, store the 3 most similar other customers with their scores
lookalike_results = {}
for customer in first_20_customers:
    # Drop the customer's own entry (self-similarity is 1), then take the 3 highest scores
    top_similar = customer_similarity_df[customer].drop(index=customer).nlargest(3)
    # List of (CustomerID, score) pairs, highest score first
    lookalike_results[customer] = list(top_similar.items())
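
The lookup step can be tried in isolation on a small similarity table. The sketch below uses hypothetical scores and a helper named `top_lookalikes`, which is introduced here for illustration and is not part of the notebook.

```python
import pandas as pd

# Hypothetical symmetric similarity table for four customers.
ids = ["C1", "C2", "C3", "C4"]
toy_similarity = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.5],
     [0.9, 1.0, 0.4, 0.1],
     [0.2, 0.4, 1.0, 0.8],
     [0.5, 0.1, 0.8, 1.0]],
    index=ids, columns=ids,
)

def top_lookalikes(similarity, customer, k=3):
    """Return the k most similar other customers as (id, score) pairs."""
    scores = similarity[customer].drop(index=customer)  # exclude self-match
    return list(scores.nlargest(k).items())

result = top_lookalikes(toy_similarity, "C1", k=2)
print(result)
```

Dropping the customer's own row before calling `nlargest` matters because the diagonal of a cosine similarity matrix is always 1 and would otherwise take the first slot.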

In [12]:
# Build a DataFrame with one row per customer and the top 3 lookalikes, each formatted as "CustomerID (Score: value)"
lookalike_df = pd.DataFrame.from_dict(
    {customer: [f"{sim_customer} (Score: {score})" for sim_customer, score in lookalikes]
     for customer, lookalikes in lookalike_results.items()},
    orient="index", columns=["Lookalike_1", "Lookalike_2", "Lookalike_3"]
)

In [16]:
#Saves the lookalike DataFrame to a CSV file and then reads it back into a DataFrame.
lookalike_df.to_csv("Shruti_S_Lookalike.csv", index_label="CustomerID")
lookalike_df = pd.read_csv("Shruti_S_Lookalike.csv")
lookalike_df


CustomerID,Lookalike_1,Lookalike_2,Lookalike_3
C0001,C0112 (Score: 0.9999999009076495),C0190 (Score: 0.999999882230633),C0035 (Score: 0.9999998626594053)
C0002,C0134 (Score: 0.9999998529498997),C0103 (Score: 0.9999996089281792),C0106 (Score: 0.9999995689803689)
C0003,C0195 (Score: 0.9999999771328825),C0129 (Score: 0.9999999131825837),C0039 (Score: 0.999999912415006)
C0004,C0113 (Score: 0.9999999498747234),C0075 (Score: 0.9999999166630951),C0039 (Score: 0.9999999034280684)
C0005,C0146 (Score: 0.99999988617527),C0007 (Score: 0.9999998703086884),C0162 (Score: 0.9999998068548354)
C0006,C0082 (Score: 0.9999999765472458),C0187 (Score: 0.9999999604096683),C0185 (Score: 0.9999999050024936)
C0007,C0140 (Score: 0.9999999389319292),C0045 (Score: 0.9999999171740697),C0005 (Score: 0.9999998703086884)
C0008,C0098 (Score: 0.9999997918884213),C0116 (Score: 0.999999758017654),C0189 (Score: 0.999999704567475)
C0009,C0049 (Score: 0.9999990366562563),C0111 (Score: 0.9999979723595955),C0010 (Score: 0.9999979138683003)
C0010,C0034 (Score: 0.9999994508185107),C0111 (Score: 0.9999994266540899),C0030 (Score: 0.9999993608300406)
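
When the saved file is read back, `pd.read_csv` treats CustomerID as an ordinary column unless told otherwise. A minimal sketch of restoring it as the index, using an in-memory buffer and a hypothetical one-row table in place of the real file:

```python
import io
import pandas as pd

# Hypothetical one-row lookalike table indexed by customer.
toy_lookalikes = pd.DataFrame(
    {"Lookalike_1": ["C2 (Score: 0.9)"], "Lookalike_2": ["C4 (Score: 0.5)"]},
    index=["C1"],
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy_lookalikes.to_csv(buffer, index_label="CustomerID")
buffer.seek(0)

# index_col restores CustomerID as the row index on the way back in.
restored = pd.read_csv(buffer, index_col="CustomerID")
print(restored.loc["C1", "Lookalike_1"])
```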


### I have built a Lookalike Model that recommends the top 3 similar customers based on their transaction history and profile information. The results were saved in a CSV file that lists the lookalikes along with their similarity scores. This model can be used to identify customers with similar behaviors, which is useful for targeted marketing and personalized recommendations.