# Lookalike Model for Customer Recommendation

## Introduction
In this task, we are asked to build a Lookalike Model to recommend customers who are similar to a given customer based on their purchase history and profile information. The goal is to identify the top 3 lookalike customers for each of the first 20 customers in the dataset.

We use cosine similarity to compare customers on three transaction-derived features: total spend, purchase frequency, and average transaction value.
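As a minimal sketch of what the measure computes, the snippet below implements cosine similarity for two vectors with NumPy. The helper name `cosine_sim` and the three example vectors are hypothetical and stand in for scaled customer features; they are not taken from the dataset.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical scaled feature vectors:
# (total_spend, purchase_frequency, avg_transaction_value)
u = [1.0, 2.0, 0.5]
v = [2.0, 4.0, 1.0]    # same direction as u, twice the magnitude
w = [-1.0, 0.5, 2.0]   # different direction

same_direction = cosine_sim(u, v)        # equals 1 up to floating-point error
different_direction = cosine_sim(u, w)   # much lower, the vectors point elsewhere
```

Because the measure depends only on direction, `u` and `v` count as identical even though `v` is twice as large. That property is worth keeping in mind when interpreting scores close to 1.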

### Step 1: Importing Libraries
We begin by importing the libraries needed to process the data, scale the features, and compute similarities.


In [10]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

### Step 2: Loading and Preparing the Data
We load the three datasets (`Customers.csv`, `Products.csv`, `Transactions.csv`), merge them into a unified dataset, and aggregate it into per-customer features.

In [11]:
# Load datasets
customers = pd.read_csv('Customers.csv')
products = pd.read_csv('Products.csv')
transactions = pd.read_csv('Transactions.csv')


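We merge the data with left joins to combine transaction details with customer and product information. A left join keeps every transaction, even when no matching customer or product record exists. Because the merge starts from the transactions, customers without any transaction do not appear in the result.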


In [12]:
merged_data = transactions.merge(customers, on='CustomerID', how='left').merge(products, on='ProductID', how='left')
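A quick way to check that a left join behaved as expected is to use the `validate` and `indicator` parameters of `DataFrame.merge`. The sketch below uses small made-up frames rather than the real CSV files. The `_demo` names and the `Region` column are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the real CSVs; 'Region' is a hypothetical profile column.
transactions_demo = pd.DataFrame({
    'TransactionID': ['T1', 'T2', 'T3'],
    'CustomerID': ['C1', 'C2', 'C9'],   # C9 has no customer record
    'TotalValue': [10.0, 20.0, 30.0],
})
customers_demo = pd.DataFrame({
    'CustomerID': ['C1', 'C2'],
    'Region': ['Asia', 'Europe'],
})

checked = transactions_demo.merge(
    customers_demo, on='CustomerID', how='left',
    validate='many_to_one',   # raises MergeError if CustomerID repeats in customers_demo
    indicator=True,           # adds a _merge column: 'both' or 'left_only'
)

# Transactions whose CustomerID found no match on the right-hand side
unmatched = checked.loc[checked['_merge'] == 'left_only', 'CustomerID'].tolist()
```

An empty `unmatched` list on the real data would confirm that every transaction has a matching customer record. The same check can be repeated for the product join.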


In [13]:
# Create additional features
customer_data = merged_data.groupby('CustomerID').agg(
    total_spend=('TotalValue', 'sum'),
    purchase_frequency=('TransactionID', 'nunique'),
    avg_transaction_value=('TotalValue', 'mean')
).reset_index()

In [14]:
# Check the first few rows of the new dataset
customer_data.head()

| | CustomerID | total_spend | purchase_frequency | avg_transaction_value |
|---|---|---|---|---|
| 0 | C0001 | 3354.52 | 5 | 670.904 |
| 1 | C0002 | 1862.74 | 4 | 465.685 |
| 2 | C0003 | 2725.38 | 4 | 681.345 |
| 3 | C0004 | 5354.88 | 8 | 669.36 |
| 4 | C0005 | 2034.24 | 3 | 678.08 |


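### Step 3: Scaling Features and Applying PCA
We standardize the numerical features so that each has zero mean and unit variance and contributes comparably to the similarity. PCA is applied only as an optional reduction to two dimensions for visualization. The similarity computation uses the scaled features, not the PCA components.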

In [15]:
# Scaling the features
scaler = StandardScaler()
customer_data_scaled = scaler.fit_transform(customer_data[['total_spend', 'purchase_frequency', 'avg_transaction_value']])

# Optional: Apply PCA for dimensionality reduction (if needed for visualization)
pca = PCA(n_components=2)  # Reduce to 2 dimensions for easier visualization
customer_data_pca = pca.fit_transform(customer_data_scaled)

# Add PCA components to the dataset
customer_data['pca1'] = customer_data_pca[:, 0]
customer_data['pca2'] = customer_data_pca[:, 1]

# Check the transformed data
customer_data[['CustomerID', 'pca1', 'pca2']].head()


| | CustomerID | pca1 | pca2 |
|---|---|---|---|
| 0 | C0001 | -0.078946 | -0.050559 |
| 1 | C0002 | -1.273037 | -0.492051 |
| 2 | C0003 | -0.567253 | 0.249271 |
| 3 | C0004 | 1.481057 | -0.847656 |
| 4 | C0005 | -1.103176 | 0.502019 |
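To see what scaling and PCA do to the features, it can help to inspect the column statistics after `StandardScaler` and the share of variance the two principal components retain. The sketch below runs on synthetic data, not on the real customer table. The generated values and the names `X`, `pca_demo`, and `retained` are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the three features:
# total spend is built as frequency * average transaction value
freq = rng.integers(1, 10, size=200).astype(float)
avg_value = rng.normal(700, 150, size=200)
X = np.column_stack([freq * avg_value, freq, avg_value])

# After standardization each column has mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# Fraction of total variance kept by the first two principal components
pca_demo = PCA(n_components=2).fit(X_scaled)
retained = pca_demo.explained_variance_ratio_.sum()
```

On the real data, `pca.explained_variance_ratio_` gives the same information for the fitted `pca` object. A value close to 1 would mean the 2-D projection loses little of the variance in the three features.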


### Step 4: Computing Cosine Similarity
We now compute the cosine similarity between customers from the scaled features. For each customer, the highest-scoring other customers are their lookalikes.

In [16]:
# Compute cosine similarity between customers
cosine_sim = cosine_similarity(customer_data_scaled)

# Return the top N lookalikes (IDs and scores) for one customer
def get_top_lookalikes(customer_id, top_n=3):
    # customer_data has a default RangeIndex, so idx is also the row position in cosine_sim
    idx = customer_data[customer_data['CustomerID'] == customer_id].index[0]
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Exclude the customer itself explicitly; assuming it sorts first fails
    # when another customer ties with a similarity of 1.0
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    top_lookalikes = [customer_data.iloc[i]['CustomerID'] for i, _ in sim_scores]
    top_scores = [score for _, score in sim_scores]

    return top_lookalikes, top_scores

Example for CustomerID `C0001`:

In [17]:
get_top_lookalikes('C0001')

(['C0137', 'C0152', 'C0121'],
 [np.float64(0.9993600788417096),
  np.float64(0.9956575062125335),
  np.float64(0.9930123335059389)])
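The same lookup can be done for all customers at once with array operations. The following is a sketch of one possible vectorized variant, not the method used in this notebook. The function name `top_n_neighbors` and the small `demo` matrix are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_n_neighbors(features, n=3):
    """Return row positions and scores of the n most similar other rows."""
    sim = cosine_similarity(features)
    np.fill_diagonal(sim, -np.inf)             # a row can never match itself
    order = np.argsort(-sim, axis=1)[:, :n]    # highest similarity first
    scores = np.take_along_axis(sim, order, axis=1)
    return order, scores

# Rows 0 and 1 are identical, row 2 is close to them, rows 3 and 4 are not
demo = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbors, neighbor_scores = top_n_neighbors(demo, n=2)
```

Masking the diagonal before sorting means the result stays correct even when two rows are exact duplicates, as rows 0 and 1 are here. The returned positions can be mapped back to IDs with `customer_data['CustomerID'].to_numpy()[neighbors]`.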

### Step 5: Generating Lookalike Recommendations
We generate the top 3 lookalikes for the first 20 customers and save them to `Lookalike.csv`.

In [18]:
# Generate top 3 lookalikes for the first 20 customers in customer_data
# (C0001 to C0020, provided each of them has at least one transaction)
lookalike_results = {}

for customer_id in customer_data['CustomerID'][:20]:
    lookalikes, scores = get_top_lookalikes(customer_id)
    lookalike_results[customer_id] = {
        'Lookalike1': lookalikes[0], 'Score1': scores[0],
        'Lookalike2': lookalikes[1], 'Score2': scores[1],
        'Lookalike3': lookalikes[2], 'Score3': scores[2]
    }

# Convert the results to a DataFrame
lookalike_df = pd.DataFrame.from_dict(lookalike_results, orient='index')
lookalike_df.to_csv('Lookalike.csv', index_label='CustomerID')  # Save the recommendations with a named ID column

# Check the first few recommendations
lookalike_df.head()


| CustomerID | Lookalike1 | Score1 | Lookalike2 | Score2 | Lookalike3 | Score3 |
|---|---|---|---|---|---|---|
| C0001 | C0137 | 0.99936 | C0152 | 0.995658 | C0121 | 0.993012 |
| C0002 | C0029 | 0.999638 | C0199 | 0.998867 | C0010 | 0.998831 |
| C0003 | C0005 | 0.999894 | C0178 | 0.999565 | C0144 | 0.999217 |
| C0004 | C0067 | 0.999991 | C0021 | 0.999658 | C0075 | 0.999288 |
| C0005 | C0003 | 0.999894 | C0073 | 0.999495 | C0063 | 0.999259 |


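## Model Evaluation
The top cosine similarity scores in the sample above are all higher than 0.99. High scores alone do not show that the model is accurate, however. The model uses only three features, and in the sample above total spend equals purchase frequency times average transaction value, so the features are not independent. In such a low-dimensional space, most customers have a few neighbors whose feature vectors point in almost the same direction. A score close to 1 means that two scaled feature vectors share nearly the same direction, not that the two customers have identical profiles. A stronger evaluation would check whether the recommended lookalikes also agree on attributes that were not used as features, such as region or product categories purchased.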
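One way to put the scores in context is to compare them against a random baseline. The sketch below draws 200 random points in three dimensions and records each point's best cosine similarity to any other point. The sample size of 200 is an assumption, chosen only because the customer IDs shown above run up to roughly C0199. The data is synthetic and unrelated to the real customers.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(42)

# 200 random "customers" with 3 unrelated standard-normal features
random_features = rng.standard_normal((200, 3))

sim = cosine_similarity(random_features)
np.fill_diagonal(sim, -np.inf)     # ignore self-similarity

best_match = sim.max(axis=1)       # top-1 score for every random point
baseline_mean = best_match.mean()  # typical top-1 score under pure noise
```

If `baseline_mean` turns out close to the scores reported for the real customers, the high values mostly reflect the geometry of a three-feature space rather than the quality of the matches.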

## Business Insights

By recommending customers with similar purchasing behaviors, businesses can create targeted marketing campaigns, offer personalized discounts, and increase customer retention.


## Conclusion
In this task, we built a lookalike model that recommends the top 3 similar customers based on purchasing behavior. We calculated similarity scores using cosine similarity on three scaled transaction features and saved the results to `Lookalike.csv` as per the assignment instructions. The recommendations can serve as a starting point for targeted marketing or customer retention strategies.

