# Lookalike Model: Customer Similarity Based on Profile and Transaction History

The objective of this task is to build a Lookalike Model that assigns each customer a similarity score against every other customer.

### 1. Importing Required Libraries

We begin by importing the necessary libraries:

In [18]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

### 2. Loading Data

Next, we load the datasets:


In [19]:
df_customers = pd.read_csv('Customers.csv')
df_products = pd.read_csv('Products.csv')
df_transactions = pd.read_csv('Transactions.csv')

### 3. Merging Data

To build a comprehensive dataset for similarity calculations, we merge the transactions, customer details, and product information.

In [20]:
merged_data = df_transactions.merge(df_customers, on='CustomerID')
merged_data = merged_data.merge(df_products, on='ProductID')
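
Because `merge` defaults to an inner join, any transaction whose `CustomerID` or `ProductID` has no match is silently dropped. The sketch below shows one way to make that visible. It uses small made-up frames rather than the real CSV files, and the `Region` and `Category` columns are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the three CSV files (hypothetical values).
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "CustomerID": ["C1", "C1", "C9"],   # C9 has no customer record
    "ProductID": ["P1", "P2", "P1"],
    "TotalValue": [10.0, 20.0, 5.0],
})
customers = pd.DataFrame({"CustomerID": ["C1", "C2"], "Region": ["Asia", "Europe"]})
products = pd.DataFrame({"ProductID": ["P1", "P2"], "Category": ["Books", "Toys"]})

# validate= raises pandas.errors.MergeError if the right-hand keys are not unique,
# which guards against accidentally duplicating transaction rows.
merged = transactions.merge(customers, on="CustomerID", how="inner", validate="many_to_one")
merged = merged.merge(products, on="ProductID", how="inner", validate="many_to_one")

# Count how many transactions were lost to unmatched keys.
dropped = len(transactions) - len(merged)
print(f"{dropped} transaction(s) dropped by the inner joins")
```

If the count is non-zero on the real data, it may be worth checking the unmatched keys before computing features.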

### 4. Extracting Customer Features  

To compute similarity between customers, we extract three behavioral features per customer from the merged dataset: total spending, number of distinct transactions, and average transaction value.

In [21]:
customer_features = merged_data.groupby('CustomerID').agg(
    total_spending=('TotalValue', 'sum'),
    transaction_count=('TransactionID', 'nunique'),
    avg_spending=('TotalValue', 'mean'),
).reset_index()
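
To make the aggregation concrete, here is the same named-aggregation pattern applied to a tiny made-up table. The values are hypothetical and only illustrate what each feature column contains.

```python
import pandas as pd

# Three toy transactions: two for C1, one for C2 (hypothetical values).
toy = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "TransactionID": ["T1", "T2", "T3"],
    "TotalValue": [10.0, 30.0, 5.0],
})

toy_features = toy.groupby("CustomerID").agg(
    total_spending=("TotalValue", "sum"),           # sum of TotalValue per customer
    transaction_count=("TransactionID", "nunique"),  # distinct transactions per customer
    avg_spending=("TotalValue", "mean"),            # mean TotalValue per row
).reset_index()

print(toy_features)
```

Here `C1` ends up with a total of 40.0 over 2 transactions (average 20.0), and `C2` with 5.0 over 1 transaction.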

### 5. Scaling Customer Features  

So that no feature dominates the similarity calculation purely because of its scale, we standardize each feature to zero mean and unit variance using **StandardScaler**.

In [22]:
scaler = StandardScaler()
customer_features[['total_spending', 'transaction_count', 'avg_spending']] = scaler.fit_transform(
    customer_features[['total_spending', 'transaction_count', 'avg_spending']]
)
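
As a quick check of what standardization does, the following sketch compares `StandardScaler` with the explicit formula `(x - mean) / std` on a small made-up matrix. The numbers are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy features on very different scales (hypothetical values).
X = np.array([[100.0, 1.0],
              [200.0, 3.0],
              [300.0, 5.0]])

X_scaled = StandardScaler().fit_transform(X)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for std().
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, each column has mean 0 and standard deviation 1, so values can be negative.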

### 6. Computing Customer Similarity  

We calculate the similarity between customers using **cosine similarity**, which measures the cosine of the angle between feature vectors.

In [23]:
cos_sim = cosine_similarity(customer_features[['total_spending', 'transaction_count', 'avg_spending']])
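
For intuition, cosine similarity between two vectors is their dot product divided by the product of their norms. The sketch below computes it by hand on made-up vectors and compares the result with `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Four toy feature vectors (hypothetical values).
X = np.array([[1.0, 0.0],
              [2.0, 0.0],    # same direction as row 0, larger magnitude
              [0.0, 3.0],    # orthogonal to row 0
              [-1.0, 0.0]])  # opposite direction to row 0

norms = np.linalg.norm(X, axis=1, keepdims=True)
manual_sim = (X @ X.T) / (norms @ norms.T)  # pairwise dot products / norm products
sk_sim = cosine_similarity(X)

print(np.allclose(manual_sim, sk_sim))
print(manual_sim[0])
```

Two points follow from this: cosine similarity ignores magnitude (rows 0 and 1 score 1.0), and because standardized features can be negative, scores range from -1 to 1 rather than 0 to 1.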

### 7. Creating the Lookalike Recommendations File  

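After computing customer similarity, we generate a structured dataset that maps each customer to their top lookalikes along with their similarity scores. The cell below expects `flattened_data` to hold one `(CustomerID, Lookalike_ID, Similarity_Score)` row per customer and lookalike pair; its construction is not shown in this section.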

In [29]:
lookalike_df = pd.DataFrame(flattened_data, columns=['CustomerID', 'Lookalike_ID', 'Similarity_Score'])
lookalike_df.to_csv('Lookalike.csv', index=False, header=True)
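
One plausible way to obtain rows in that three-column shape is to flatten a mapping from customer to a list of `(lookalike, score)` pairs. The sketch below is an assumption about how such a list could be built, not the notebook's actual code. The names `toy_lookalike_map` and `flattened`, and the IDs and scores, are hypothetical.

```python
import pandas as pd

# Hypothetical mapping: customer -> list of (lookalike ID, similarity score).
toy_lookalike_map = {
    "C0001": [("C0007", 0.99), ("C0003", 0.95)],
    "C0002": [("C0005", 0.90)],
}

# One output row per (customer, lookalike) pair.
flattened = [
    (customer_id, lookalike_id, score)
    for customer_id, matches in toy_lookalike_map.items()
    for lookalike_id, score in matches
]

toy_df = pd.DataFrame(flattened, columns=["CustomerID", "Lookalike_ID", "Similarity_Score"])
print(toy_df)
```

This long format keeps one score per row, which is easier to filter and join than a single column holding a list.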

### 8. Finding the Top 3 Lookalike Customers  

For each of the first 20 customers, we identify their 3 most similar customers based on cosine similarity.  

In [32]:
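lookalike_map = {}  # CustomerID -> list of (lookalike CustomerID, similarity score)

for i in range(20):
    customer_id = customer_features.iloc[i]['CustomerID']
    similarity_scores = cos_sim[i]

    # Rank every other customer by similarity to customer i and keep the top 3.
    # float() converts NumPy scalars to plain Python floats before rounding.
    similar_customers = sorted(
        [(customer_features.iloc[j]['CustomerID'], round(float(similarity_scores[j]), 6))
         for j in range(len(similarity_scores)) if i != j],
        key=lambda x: x[1], reverse=True
    )[:3]

    lookalike_map[customer_id] = similar_customers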
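
The same top-k selection can also be done with NumPy instead of a Python-level sort. The sketch below is an alternative under stated assumptions, not the approach used in the cell above: it masks the diagonal so a customer never matches itself, then uses `np.argsort` on a made-up 4x4 similarity matrix. The names `sim`, `ids`, and `top_k` are hypothetical.

```python
import numpy as np

# Toy symmetric similarity matrix and matching customer IDs (hypothetical values).
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.3],
    [0.5, 0.8, 0.3, 1.0],
])
ids = ["C1", "C2", "C3", "C4"]
k = 2

masked = sim.copy()
np.fill_diagonal(masked, -np.inf)  # exclude self-matches

# Sorting the negated matrix ascending gives indices in descending similarity.
top_idx = np.argsort(-masked, axis=1)[:, :k]

top_k = {
    ids[i]: [(ids[j], float(masked[i, j])) for j in top_idx[i]]
    for i in range(len(ids))
}
print(top_k["C1"])
```

For a few hundred customers the difference is negligible, but this form scales better and extends to all customers rather than the first 20.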

### 9. Saving the Lookalike Data

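After identifying the top 3 similar customers for each of the first 20 customers, we store the results in a CSV file with one row per customer. Note that this writes to the same `Lookalike.csv` path as Section 7, so it overwrites that file.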



In [33]:
lookalike_df = pd.DataFrame({'CustomerID': list(lookalike_map.keys()), 'Lookalikes': list(lookalike_map.values())})
lookalike_df.to_csv('Lookalike.csv', index=False)
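
Because the `Lookalikes` column holds Python lists, `to_csv` writes each one as its string representation. The sketch below shows one way to read such a column back into lists with `ast.literal_eval`, using an in-memory buffer and made-up values instead of the real file. It assumes the scores are plain Python floats; depending on the NumPy version, NumPy scalars may be written in a form that `literal_eval` cannot parse.

```python
import ast
import io

import pandas as pd

# Hypothetical mapping in the same shape as lookalike_map.
toy_map = {"C0001": [("C0007", 0.99), ("C0003", 0.95)]}

df = pd.DataFrame({
    "CustomerID": list(toy_map.keys()),
    "Lookalikes": list(toy_map.values()),
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# On reload the list column is a plain string; parse it back safely.
restored = pd.read_csv(buffer)
restored["Lookalikes"] = restored["Lookalikes"].apply(ast.literal_eval)

print(restored.loc[0, "Lookalikes"])
```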