Explanation:

Data Loading and Preprocessing:

Load the Customers.csv, Products.csv, and Transactions.csv files into Pandas DataFrames.
Join the transactions with customer and product information to create a comprehensive DataFrame.
Create a pivot table to represent customer-product interactions, where rows represent customers and columns represent products, and the values represent the quantity of each product purchased by each customer.
Feature Scaling:

Scale the customer-product matrix using MinMaxScaler to ensure all features have the same scale, which is crucial for cosine similarity calculations.
Cosine Similarity Calculation:

Calculate the cosine similarity between customers based on their purchase history using sklearn.metrics.pairwise.cosine_similarity. Cosine similarity measures the cosine of the angle between two vectors, representing the similarity between customer purchase patterns.
Finding Lookalike Customers:

Define a function find_lookalikes to find the top N lookalike customers for a given customer ID.
Sort the similarity scores in descending order and select the top N lookalikes (excluding the customer itself).
Generating Lookalike Recommendations:

Iterate through the first 20 customers in the Customers.csv file.
For each customer, find their top 3 lookalikes using the find_lookalikes function.
Store the lookalike customer IDs and their similarity scores in a dictionary.
Saving Results to CSV:

Convert the dictionary of lookalikes into a Pandas DataFrame.
Save the DataFrame to a CSV file named "Lookalike.csv".


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Load the Datasets: First, we load the Customers.csv, Products.csv, and Transactions.csv files.

In [2]:

customers_df = pd.read_csv('Customers.csv')
products_df = pd.read_csv('Products.csv')
transactions_df = pd.read_csv('Transactions.csv')


Merge the Data: We need to combine customer and transaction data to get insights into the transaction history for each customer.

In [22]:
# 1. Data Preprocessing
# Join transactions with customer and product information
df = transactions_df.merge(customers_df, on='CustomerID')
df = df.merge(products_df, on='ProductID')

# Create a pivot table to represent customer-product interactions
customer_product_matrix = pd.pivot_table(df, index='CustomerID', columns='ProductID', values='Quantity', fill_value=0)


Feature Engineering: Create features based on customer profiles and transaction histories.

Customer Profile: Region, Signup Date, and CustomerID.

Transaction History: Number of transactions, total amount spent, product categories purchased, etc.

In [23]:

# 2. Feature Scaling
# Scale the customer-product matrix to a range of 0-1
scaler = MinMaxScaler()
scaled_matrix = scaler.fit_transform(customer_product_matrix)

To find similar customers, we use Cosine Similarity to measure the similarity between customers' profiles and transaction histories.

In [24]:
cosine_sim = cosine_similarity(scaled_matrix)


In [26]:
def find_lookalikes(customer_id, cosine_sim, top_n=3):
    """
    Finds the top N lookalike customers based on cosine similarity.

    Args:
        customer_id: The ID of the customer to find lookalikes for.
        cosine_sim: The cosine similarity matrix.
        top_n: The number of lookalike customers to return.

    Returns:
        A list of tuples, where each tuple contains the lookalike customer ID and their similarity score.
    """
    idx = customer_product_matrix.index.get_loc(customer_id)
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    lookalikes = sim_scores[1:top_n+1]  # Exclude the customer itself
    return lookalikes

In [25]:
# 5. Generate Lookalike Recommendations
lookalike_dict = {}
for customer_id in customers_df['CustomerID'][:20]:
    lookalikes = find_lookalikes(customer_id, cosine_sim)
    lookalike_dict[customer_id] = [(lookalike[0], lookalike[1]) for lookalike in lookalikes]

In [28]:
# 6. Save Results to CSV
lookalike_df = pd.DataFrame.from_dict(lookalike_dict, orient='index')  # Remove extra columns argument
lookalike_df.index.name = 'CustomerID'
lookalike_df.to_csv("Lookalike.csv")

# Print results for the first 5 customers
for i, customer_id in enumerate(customers_df['CustomerID'][:20]):
    print(f"Customer {customer_id}:")
    for lookalike in lookalike_dict[customer_id]:
        print(f"  - Lookalike: {lookalike[0]}, Similarity Score: {lookalike[1]}")
    print()

Customer C0001:
  - Lookalike: 96, Similarity Score: 0.5477225575051661
  - Lookalike: 192, Similarity Score: 0.469668218313862
  - Lookalike: 64, Similarity Score: 0.4208425284295062

Customer C0002:
  - Lookalike: 90, Similarity Score: 0.3801987652174059
  - Lookalike: 29, Similarity Score: 0.37282185960072
  - Lookalike: 70, Similarity Score: 0.329914439536929

Customer C0003:
  - Lookalike: 133, Similarity Score: 0.5199469468957452
  - Lookalike: 179, Similarity Score: 0.5175973113765044
  - Lookalike: 143, Similarity Score: 0.39999999999999997

Customer C0004:
  - Lookalike: 69, Similarity Score: 0.4525885718428285
  - Lookalike: 131, Similarity Score: 0.3765764544157698
  - Lookalike: 62, Similarity Score: 0.32930655646860507

Customer C0005:
  - Lookalike: 95, Similarity Score: 0.6482037235521645
  - Lookalike: 54, Similarity Score: 0.5144957554275265
  - Lookalike: 63, Similarity Score: 0.3328770246548891

Customer C0006:
  - Lookalike: 57, Similarity Score: 0.6488856845230502
