# Customer Lookalike Model Development

This notebook explains the development of a customer lookalike model that recommends similar customers based on their profiles and transaction history.

## 1. Data Loading and Preprocessing

The model uses three main data sources:
1. `customers.csv`: Customer profile information
2. `products.csv`: Product catalog information
3. `transactions.csv`: Customer transaction history

Let's examine the structure of each dataset:

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Load datasets
customers_df = pd.read_csv('customers.csv')
products_df = pd.read_csv('products.csv')
transactions_df = pd.read_csv('transactions.csv')

print("Customers Dataset:")
print(customers_df.head())
print("\nProducts Dataset:")
print(products_df.head())
print("\nTransactions Dataset:")
print(transactions_df.head())

## 2. Feature Engineering

The model combines various features to capture customer behavior:

### 2.1 Customer Profile Features
- Days on platform (calculated from signup date)
- Region (used for context but not in similarity calculation)

In [None]:
# Convert dates to datetime
customers_df['SignupDate'] = pd.to_datetime(customers_df['SignupDate'])
transactions_df['TransactionDate'] = pd.to_datetime(transactions_df['TransactionDate'])

# Calculate days on platform
current_date = pd.Timestamp('2025-01-27')
customers_df['DaysOnPlatform'] = (current_date - customers_df['SignupDate']).dt.days

### 2.2 Transaction Features
- Number of transactions
- Total quantity purchased
- Total amount spent
- Average transaction value
- Purchase frequency

In [None]:
# Aggregate transaction data
customer_transactions = transactions_df.groupby('CustomerID').agg({
    'TransactionID': 'count',
    'Quantity': 'sum',
    'TotalValue': 'sum',
    'TransactionDate': ['min', 'max']
}).reset_index()

# Calculate average transaction value and purchase frequency
customer_transactions['AvgTransactionValue'] = (
    customer_transactions['TotalValue'] / customer_transactions['TransactionID']
)

customer_transactions['PurchaseFrequencyDays'] = np.where(
    customer_transactions['TransactionID'] > 1,
    (customer_transactions['TransactionDate']['max'] - 
     customer_transactions['TransactionDate']['min']).dt.days / 
    (customer_transactions['TransactionID'] - 1),
    0
)

### 2.3 Category Preferences
Calculate spending distribution across product categories

In [None]:
# Merge transactions with products to get categories
trans_with_categories = pd.merge(
    transactions_df,
    products_df[['ProductID', 'Category']],
    on='ProductID'
)

# Calculate category preferences
category_spending = trans_with_categories.pivot_table(
    index='CustomerID',
    columns='Category',
    values='TotalValue',
    aggfunc='sum',
    fill_value=0
)

## 3. Similarity Calculation

The model uses cosine similarity to find similar customers based on their feature vectors:

In [None]:
def find_similar_customers(customer_features, customer_id, n_recommendations=3):
    """Find similar customers using cosine similarity."""
    # Get customer index
    customer_idx = customer_features[
        customer_features['CustomerID'] == customer_id
    ].index[0]
    
    # Calculate similarities
    similarities = cosine_similarity(
        customer_features_scaled[customer_idx].reshape(1, -1),
        customer_features_scaled
    )[0]
    
    # Get top similar customers
    similar_indices = np.argsort(similarities)[::-1][1:n_recommendations + 1]
    
    return [(customer_features.iloc[idx]['CustomerID'], similarities[idx])
            for idx in similar_indices]

## 4. Model Evaluation

The model's effectiveness can be evaluated by examining the recommendations:

In [None]:
# Example: Find similar customers for C0001
from lookalike_model import LookalikeModel

model = LookalikeModel()
model.load_data()
model.prepare_features()

target_profile, recommendations = model.find_similar_customers('C0001')
print("Target Customer:")
print(f"Name: {target_profile['CustomerName']}")
print(f"Region: {target_profile['Region']}")
print(f"Total Spent: ${target_profile['TotalSpent']:.2f}")
print("\nTop 3 Similar Customers:")
for rec in recommendations:
    print(f"\nCustomer: {rec['CustomerName']}")
    print(f"Similarity Score: {rec['Similarity']:.4f}")
    print(f"Region: {rec['Region']}")
    print(f"Total Spent: ${rec['TotalSpent']:.2f}")

## 5. Key Findings

1. The model successfully identifies customers with similar:
   - Shopping patterns (transaction frequency and value)
   - Category preferences
   - Overall spending behavior

2. Regional patterns emerge in the recommendations, suggesting geographical clustering of customer behavior

3. The model balances multiple factors:
   - Recent vs. historical behavior
   - Category-specific spending
   - Transaction patterns

4. Recommendations are provided with similarity scores, allowing for confidence-based filtering