**LOOKALIKE MODEL CODE**

•	DATA LOADING
The customers, products, and transactions datasets were loaded from the given URLs with pandas. Together, these datasets serve as the foundation for constructing the lookalike model:

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
customers_url = "https://drive.google.com/uc?id=1bu_--mo79VdUG9oin4ybfFGRUSXAe-WE"
products_url = "https://drive.google.com/uc?id=1IKuDizVapw-hyktwfpoAoaGtHtTNHfd0"
transactions_url = "https://drive.google.com/uc?id=1saEqdbBB-vuk2hxoAf4TzDEsykdKlzbF"
customers = pd.read_csv(customers_url)
products = pd.read_csv(products_url)
transactions = pd.read_csv(transactions_url)


•	DATA MERGING
A lookalike model needs all the data in one dataset, so here we create a comprehensive dataset, ‘data’, by merging transactions with the customer dataset on ‘CustomerID’ and with the product dataset on ‘ProductID’:


In [None]:
# Merge transactions with customer and product details
data = transactions.merge(customers, on='CustomerID').merge(products, on='ProductID')
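
An inner merge drops any transaction whose CustomerID or ProductID has no match, and a duplicated key in a lookup table duplicates rows. The following is a minimal sketch of a guarded merge on made-up toy frames; the `_toy` names, the `Region` and `Category` columns, and all values are assumptions for illustration, not the real data:

```python
import pandas as pd

# Toy stand-ins for the three datasets; every value here is made up.
transactions_toy = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C2", "C1"],
    "ProductID": ["P1", "P1", "P2"],
    "TotalValue": [10.0, 20.0, 5.0],
})
customers_toy = pd.DataFrame({"CustomerID": ["C1", "C2"], "Region": ["Asia", "Europe"]})
products_toy = pd.DataFrame({"ProductID": ["P1", "P2"], "Category": ["Books", "Toys"]})

# validate="many_to_one" raises a MergeError if a key is duplicated on the right side.
merged_toy = (
    transactions_toy
    .merge(customers_toy, on="CustomerID", validate="many_to_one")
    .merge(products_toy, on="ProductID", validate="many_to_one")
)

# With unique right-side keys, equal lengths mean no transaction was dropped.
rows_dropped = len(transactions_toy) - len(merged_toy)
print(rows_dropped)
```

If `rows_dropped` is non-zero on the real data, some transactions refer to customers or products missing from the lookup tables.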

•	CUSTOMER PRODUCT MATRIX
A pivot table was created to hold the total amount each customer spent on each product, with 0 for products a customer never bought. This serves as the foundation for computing customer similarity:


In [None]:
# customer-product matrix
customer_product_matrix = data.pivot_table(index='CustomerID', columns='ProductID', values='TotalValue', aggfunc='sum').fillna(0)

This matrix represents each customer as a vector of their spending across different products. It is crucial for calculating similarities as it aligns customers in a structured format based on their purchasing behavior.
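
To make the shape of this matrix concrete, here is a small sketch on a made-up toy frame (the data and the `toy` names are illustrative assumptions). Repeat purchases of the same product are summed, and missing customer-product pairs become 0:

```python
import pandas as pd

# Toy transactions: C1 buys P1 twice and P2 once; C2 buys only P2.
toy = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2"],
    "ProductID": ["P1", "P1", "P2", "P2"],
    "TotalValue": [10.0, 5.0, 20.0, 8.0],
})

# One row per customer, one column per product, summed spend in each cell.
toy_matrix = toy.pivot_table(
    index="CustomerID", columns="ProductID", values="TotalValue", aggfunc="sum"
).fillna(0)
print(toy_matrix)
```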

•	NORMALIZATION
Each customer's transaction data was normalized by dividing each value by the sum of the row to account for differences in total spending:


In [None]:
# Normalize data
normalized_matrix = customer_product_matrix.div(customer_product_matrix.sum(axis=1), axis=0)

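With a magnitude-sensitive measure, customers with higher spending would otherwise dominate similarity calculations. Normalization turns each row into shares of total spending, so the focus is on purchasing patterns rather than absolute spending. Cosine similarity, used in the next step, is already insensitive to the overall scale of each row, so for that measure the normalization does not change the scores; it mainly makes the rows directly readable as spending shares.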
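
Row normalization divides by the row total, which is undefined for a row that sums to zero. Every customer in the matrix comes from at least one transaction, so this is unlikely here, but a guarded variant can be sketched as follows. The toy matrix and the choice to map zero-total rows to all zeros are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Toy matrix: C1 and C2 share a pattern at different scales; C3 has no spend.
toy_matrix = pd.DataFrame(
    [[15.0, 20.0, 0.0], [150.0, 200.0, 0.0], [0.0, 0.0, 0.0]],
    index=["C1", "C2", "C3"],
    columns=["P1", "P2", "P3"],
)

row_totals = toy_matrix.sum(axis=1)

# Dividing by NaN gives NaN, which fillna(0) then turns into an all-zero row.
toy_normalized = toy_matrix.div(row_totals.replace(0, np.nan), axis=0).fillna(0)
print(toy_normalized)
```

After normalization, C1 and C2 have identical rows, which is the intended effect of removing spending magnitude.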

•	COSINE SIMILARITY COMPUTATION
Cosine similarity was calculated to measure the similarity between customer vectors:


In [None]:
# Computing cosine similarity
similarity_matrix = cosine_similarity(normalized_matrix)
similarity_df = pd.DataFrame(similarity_matrix, index=normalized_matrix.index, columns=normalized_matrix.index)

Cosine similarity depends only on the angle between two vectors, not on their lengths. This makes it well suited to high-dimensional data and to comparing patterns of purchasing behavior regardless of magnitude.
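
For two spending vectors a and b, cosine similarity is the dot product a·b divided by the product of their lengths. The sketch below computes it by hand in NumPy on made-up numbers; `cosine_matrix` is a hypothetical helper written for illustration and is not part of scikit-learn. It also checks that dividing each row by its sum leaves the scores unchanged:

```python
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X (rows must be non-zero)."""
    X = np.asarray(X, dtype=float)
    # Scale every row to unit length; the dot products are then the cosines.
    unit_rows = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit_rows @ unit_rows.T

# Rows 0 and 1 have the same pattern at different scales; row 2 shares no product.
spend = np.array([[15.0, 20.0, 0.0], [150.0, 200.0, 0.0], [0.0, 0.0, 7.0]])
shares = spend / spend.sum(axis=1, keepdims=True)

sim_raw = cosine_matrix(spend)
sim_shares = cosine_matrix(shares)
print(np.round(sim_raw, 2))
```

Rows with the same purchasing pattern score 1, rows with no products in common score 0, and the raw and share-based matrices agree.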

•	GENERATING THE SIMILAR CUSTOMERS
For the first 20 customers (CustomerIDs C0001 - C0020), the top 3 most similar customers were identified:


In [None]:
# Get the top 3 most similar customers for each of the first 20 customers
lookalike_results = {}

for customer_id in similarity_df.index[:20]:
    # Drop the customer's own entry by label, then keep the 3 highest scores
    similar_customers = similarity_df.loc[customer_id].drop(customer_id).nlargest(3)
    lookalike_results[customer_id] = [(sim_id, round(score, 2)) for sim_id, score in similar_customers.items()]

   CustomerID SimilarCustomerID  Score
0       C0001             C0050   0.53
1       C0001             C0100   0.53
2       C0001             C0105   0.52
3       C0002             C0109   0.54
4       C0002             C0079   0.53
5       C0002             C0117   0.52
6       C0003             C0181   0.62
7       C0003             C0186   0.56
8       C0003             C0067   0.55
9       C0004             C0063   0.46
10      C0004             C0070   0.44
11      C0004             C0076   0.38
12      C0005             C0096   0.67
13      C0005             C0192   0.64
14      C0005             C0072   0.63
15      C0006             C0058   0.68
16      C0006             C0040   0.66
17      C0006             C0046   0.43
18      C0007             C0020   0.58
19      C0007             C0031   0.44
20      C0007             C0079   0.42
21      C0008             C0165   0.48
22      C0008             C0169   0.42
23      C0008             C0143   0.38
24      C0009            
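
Excluding the customer by label rather than by sorted position stays correct even when another customer has an identical purchasing pattern and ties at a score of 1.0. A toy check, where `top_n_lookalikes` is a hypothetical helper and the similarity values are made up:

```python
import pandas as pd

def top_n_lookalikes(similarity_df, customer_id, n=3):
    """Return the n most similar other customers as (id, rounded score) pairs."""
    # Remove the customer's own column entry before ranking.
    scores = similarity_df.loc[customer_id].drop(customer_id)
    return [(sim_id, round(float(score), 2)) for sim_id, score in scores.nlargest(n).items()]

ids = ["C1", "C2", "C3", "C4"]
# C1 and C2 tie at 1.0, mimicking two customers with identical patterns.
toy_similarity = pd.DataFrame(
    [[1.0, 1.0, 0.4, 0.1],
     [1.0, 1.0, 0.3, 0.2],
     [0.4, 0.3, 1.0, 0.5],
     [0.1, 0.2, 0.5, 1.0]],
    index=ids,
    columns=ids,
)

top_two = top_n_lookalikes(toy_similarity, "C1", n=2)
print(top_two)
```

In this sketch C2 is kept as a lookalike of C1 despite the tie, and C1 never appears in its own list.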

•	SAVING THE RESULT

In [None]:
# Save to CSV
lookalike_df = pd.DataFrame(
    [(cust, *sim) for cust, sims in lookalike_results.items() for sim in sims],
    columns=['CustomerID', 'SimilarCustomerID', 'Score']
)
lookalike_df.to_csv('FirstName_LastName_Lookalike.csv', index=False)
# Display the results
print(lookalike_df)