In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from datetime import datetime

In [None]:
customers = pd.read_csv("Customers.csv")
products = pd.read_csv("Products.csv")
transactions = pd.read_csv("Transactions.csv")

In [None]:
transactions['TransactionDate'] = pd.to_datetime(transactions['TransactionDate'])
customers['SignupDate'] = pd.to_datetime(customers['SignupDate'], format='%d-%m-%Y')

merged_data = transactions.merge(customers, on='CustomerID').merge(products, on='ProductID')

In [None]:
customer_profile = merged_data.pivot_table(index='CustomerID', 
                                            columns='Category', 
                                            values='TotalValue', 
                                            aggfunc='sum', 
                                            fill_value=0)

customer_profile['Total_Spending'] = merged_data.groupby('CustomerID')['TotalValue'].sum()
customer_profile['Transaction_Count'] = merged_data.groupby('CustomerID')['TransactionID'].count()

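last_purchase_date = merged_data.groupby('CustomerID')['TransactionDate'].max()
# Measure recency against the latest transaction in the data so the values are reproducible
reference_date = merged_data['TransactionDate'].max()
recency_days = (reference_date - last_purchase_date).dt.days
customer_profile['Recency'] = recency_days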

In [None]:
scaler = StandardScaler()
scaled_features = scaler.fit_transform(customer_profile)

cosine_similarities = cosine_similarity(scaled_features)

similarity_df_cosine = pd.DataFrame(cosine_similarities, index=customer_profile.index, columns=customer_profile.index)

In [None]:
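# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
clusters = kmeans.fit_predict(scaled_features)
# Added after scaling, so the cluster label is not part of the similarity features
customer_profile['Cluster'] = clusters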

In [None]:
lookalike_map_cosine = {}

for customer_id in customer_profile.index[:20]:
    # Drop the customer's own entry by label, then take the 3 highest cosine similarities.
    # This avoids assuming that the self-similarity always sorts first when scores tie.
    top_lookalikes_cosine = similarity_df_cosine[customer_id].drop(customer_id).nlargest(3)
    lookalike_map_cosine[customer_id] = list(zip(top_lookalikes_cosine.index, top_lookalikes_cosine.values))

In [10]:
lookalike_list_cosine = []

for cust_id, lookalikes in lookalike_map_cosine.items():
    for lookalike in lookalikes:
        lookalike_list_cosine.append({'cust_id': cust_id, 'lookalike_cust_id': lookalike[0], 'similarity_score': lookalike[1]})

pd.DataFrame(lookalike_list_cosine).to_csv("Lookalike.csv", index=False)

In this project, I developed a Lookalike Model to identify customers with similar characteristics and purchasing behaviors based on their profiles and transaction histories. Here’s a detailed breakdown of the steps I took:

### 1. Data Loading
I began by loading three datasets: **Customers**, **Products**, and **Transactions**. Together they supply the customer demographics, product details, and transaction history used in the later steps.

### 2. Data Exploration
Next, I explored the datasets to understand their structure and contents. I examined the first few rows of each dataset and checked data types and missing values to assess data quality. I then parsed the date columns (`TransactionDate` and `SignupDate`) and merged the three tables on `CustomerID` and `ProductID` into a single transaction-level table.
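
The inspection checks described above can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: the `summarize` function and the toy frame with made-up values are assumptions.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })

# Toy stand-in for one of the three datasets (values are made up).
toy_customers = pd.DataFrame({
    "CustomerID": ["C0001", "C0002", "C0003"],
    "Region": ["Asia", None, "Europe"],
})

print(toy_customers.head())   # first few rows
summary = summarize(toy_customers)
print(summary)                # dtype and missing count per column
```

The same helper could be called on each of the three loaded tables before merging them.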

### 3. Feature Engineering
I built a profile for each customer by aggregating transaction data: spending per product category, total spending, number of transactions, and recency (days since the most recent purchase). These features summarize how much, how often, and how recently each customer buys, which gives a basis for comparing customers.
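
To make the aggregation concrete, here is a small sketch on a toy transaction table. The values are made up, and measuring recency against the latest transaction in the data is an assumption chosen so the example is reproducible; any fixed reference date would work.

```python
import pandas as pd

# Toy transactions (made-up values) using the notebook's column names.
toy = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2"],
    "TransactionID": ["T1", "T2", "T3"],
    "Category": ["Books", "Electronics", "Books"],
    "TotalValue": [10.0, 30.0, 5.0],
    "TransactionDate": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-01"]),
})

# Spending per category: one row per customer, one column per category.
profile = toy.pivot_table(index="CustomerID", columns="Category",
                          values="TotalValue", aggfunc="sum", fill_value=0)

grouped = toy.groupby("CustomerID")
profile["Total_Spending"] = grouped["TotalValue"].sum()
profile["Transaction_Count"] = grouped["TransactionID"].count()

# Recency in days, relative to the latest transaction in the toy data.
reference_date = toy["TransactionDate"].max()
profile["Recency"] = (reference_date - grouped["TransactionDate"].max()).dt.days

print(profile)
```

Each row of `profile` is then one customer's feature vector, ready for scaling.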

### 4. Similarity Calculations
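To find similar customers, I used the **cosine similarity** metric on the standardized profile features. Cosine similarity measures the angle between two customers' feature vectors, so two customers score close to 1 when their standardized profiles point in the same direction, that is, when they deviate from the average customer in the same way.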
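
As a small worked example, cosine similarity between vectors `a` and `b` is their dot product divided by the product of their norms. The sketch below checks one entry of scikit-learn's result against that definition; the three feature rows are made-up values, not data from the project.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Made-up feature rows: [total spending, transaction count, recency in days].
features = np.array([
    [100.0, 5, 10],
    [110.0, 6, 12],
    [20.0, 1, 200],
])

# Standardize each column, then compare every pair of rows.
scaled = StandardScaler().fit_transform(features)
sim = cosine_similarity(scaled)

# Manual check of one entry: dot product divided by the product of norms.
a, b = scaled[0], scaled[1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.round(sim, 3))
print(round(float(manual), 3))
```

In this toy data the first two customers have similar profiles, so their score is higher than either one's score against the third.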

### 5. Clustering
I applied K-means clustering (5 clusters) to the standardized profiles to group customers into segments. The cluster label is stored on each customer profile as a `Cluster` column for reference; the lookalike selection itself relies on cosine similarity alone.
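
The segmentation step can be illustrated on a toy example. This is a sketch under stated assumptions: the four profiles are made-up values forming two obvious groups, and `same_segment` is a hypothetical helper matrix that is not part of the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up profiles: [total spending, transaction count]; two obvious groups.
profiles = np.array([
    [100.0, 10],
    [110.0, 11],
    [5.0, 1],
    [8.0, 2],
])

scaled = StandardScaler().fit_transform(profiles)

# n_init and random_state are fixed so repeated runs give the same grouping.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(scaled)

# Boolean matrix: True where two customers share a segment.
same_segment = labels[:, None] == labels[None, :]

print(labels)
print(same_segment)
```

A mask like `same_segment` could be used to restrict lookalike candidates to a customer's own segment, which is one possible design option rather than a required step.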

### 6. Generating Lookalikes
For each of the first 20 customers in the dataset, I identified the top three lookalikes by cosine similarity, excluding the customer themselves. This gives each of these customers a short ranked list of the most similar other customers, along with their similarity scores.
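
The selection step can be sketched as a small function. The name `top_lookalikes` and the toy similarity matrix are assumptions for illustration; the function drops the customer's own entry by label before ranking, so it does not depend on the self-similarity sorting first.

```python
import pandas as pd

def top_lookalikes(similarity_df, customer_id, n=3):
    """Return the n most similar other customers as (id, score) pairs."""
    scores = similarity_df[customer_id].drop(customer_id)  # exclude self
    top = scores.nlargest(n)
    return list(zip(top.index, top.values))

# Toy symmetric similarity matrix with made-up scores.
ids = ["C1", "C2", "C3", "C4", "C5"]
toy_sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.7, -0.1],
     [0.9, 1.0, 0.3, 0.6, 0.0],
     [0.2, 0.3, 1.0, 0.1, 0.8],
     [0.7, 0.6, 0.1, 1.0, 0.4],
     [-0.1, 0.0, 0.8, 0.4, 1.0]],
    index=ids, columns=ids,
)

result = top_lookalikes(toy_sim, "C1")
print(result)
```

For `C1`, the ranking is `C2`, then `C4`, then `C3`, following the scores in the first column of the toy matrix.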

### 7. Output Generation
Finally, I wrote the results to `Lookalike.csv`, with one row per customer and lookalike pair and the columns `cust_id`, `lookalike_cust_id`, and `similarity_score`. The file can be used to inform future marketing strategies or customer engagement efforts.

### Conclusion
Through this structured approach, I built a Lookalike Model that identifies similar customers based on their profiles and transaction histories. Cosine similarity on standardized features ranks the lookalikes, and the K-means segments add a coarser grouping of the same customers. Together they support understanding customer behavior and improving targeting strategies.