**TASK-2**

Step 1: Understand the Data
To build the Lookalike Model, we first load the customer, product, and transaction datasets and merge them into a single table.


In [2]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler


In [3]:
# Load datasets
customers = pd.read_csv('Customers.csv')
products = pd.read_csv('Products.csv')
transactions = pd.read_csv('Transactions.csv')

# Merge datasets for complete information
customer_transactions = pd.merge(transactions, customers, on='CustomerID', how='left')
merged_data = pd.merge(customer_transactions, products, on='ProductID', how='left')

# Filter for the first 20 customers (C0001 to C0020)
selected_customers = customers[customers['CustomerID'].isin([f'C{str(i).zfill(4)}' for i in range(1, 21)])]
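
A left merge keeps every transaction even when its CustomerID or ProductID has no match, which leaves NaN in the joined profile columns. A minimal sketch of how that could be checked, using tiny made-up frames (the rows below are illustrative and are not taken from the CSV files) and the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy data for illustration only; not taken from Customers.csv or Transactions.csv
toy_customers = pd.DataFrame({'CustomerID': ['C0001', 'C0002'],
                              'Region': ['Asia', 'Europe']})
toy_transactions = pd.DataFrame({'CustomerID': ['C0001', 'C0001', 'C0003'],
                                 'TotalValue': [10.0, 20.0, 5.0]})

# indicator=True adds a '_merge' column recording where each row was matched
checked = pd.merge(toy_transactions, toy_customers,
                   on='CustomerID', how='left', indicator=True)

# Rows marked 'left_only' are transactions with no matching customer record
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched['CustomerID'].tolist())  # ['C0003']
```

Running the same check on the real merge would show whether any transaction refers to a customer or product that is missing from the lookup tables.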

Step 3: Feature Engineering
We create a feature matrix for customer similarity:

Transaction Features:
    -Total spending per customer.
    -Total quantity purchased.
    -Most frequently purchased category.
Customer Profile Features:
    -One-hot encoding for Region.
    -Time since signup (days_since_signup).

In [4]:
from datetime import datetime

# Calculate total spending and quantity per customer
customer_features = merged_data.groupby('CustomerID').agg({
    'TotalValue': 'sum',
    'Quantity': 'sum',
    'Category': lambda x: x.value_counts().index[0],  # Most purchased category
}).reset_index()

In [5]:
# Add customer profile features
customer_features = pd.merge(customer_features, customers, on='CustomerID')
customer_features['SignupDate'] = pd.to_datetime(customer_features['SignupDate'])
customer_features['days_since_signup'] = (datetime.now() - customer_features['SignupDate']).dt.days

In [6]:
# One-hot encode 'Region'
customer_features = pd.get_dummies(customer_features, columns=['Region'], drop_first=True)

# Normalize numerical features
scaler = MinMaxScaler()
numerical_features = ['TotalValue', 'Quantity', 'days_since_signup']
customer_features[numerical_features] = scaler.fit_transform(customer_features[numerical_features])
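
Min-max scaling maps each column to [0, 1] via (x - min) / (max - min), so that spending, quantity, and signup age contribute on comparable scales. A plain-Python sketch of that formula on made-up values follows; the helper name `min_max_scale` is hypothetical, and `MinMaxScaler` with its default `feature_range` applies the same formula column by column.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up spending totals, for illustration only
toy_spending = [100.0, 250.0, 400.0]
scaled = min_max_scale(toy_spending)
print(scaled)  # [0.0, 0.5, 1.0]
```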

We compute pairwise cosine similarity between customer feature vectors and keep the three most similar customers for each of the first 20 customers.
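
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it compares the direction of two feature vectors rather than their magnitude. A plain-Python sketch on made-up vectors follows; the function name `cosine` is hypothetical and stands in for a single entry of the matrix that `cosine_similarity` returns.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen in this sketch for all-zero vectors
    return dot / (norm_u * norm_v)

# Made-up feature vectors: [spending, quantity, region flag]
a = [1.0, 0.0, 1.0]
b = [2.0, 0.0, 2.0]   # same direction as a, twice the magnitude
c = [0.0, 1.0, 0.0]   # shares no non-zero feature with a

print(round(cosine(a, b), 6))  # 1.0
print(cosine(a, c))            # 0.0
```

Because only direction matters, `a` and `b` score 1.0 even though `b` is twice as large, while `a` and `c` score 0.0.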

In [7]:
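# Create a similarity matrix; one-hot encode Category so it contributes to the scores
feature_matrix = customer_features.drop(['CustomerID', 'CustomerName', 'SignupDate'], axis=1)
feature_matrix = pd.get_dummies(feature_matrix, columns=['Category']).astype(float)
similarity_matrix = cosine_similarity(feature_matrix)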

# Map similarity scores for the first 20 customers (C0001 to C0020)
lookalike_map = {}
customer_ids = customer_features['CustomerID'].tolist()
target_ids = set(selected_customers['CustomerID'])

for i, cust_id in enumerate(customer_ids):
    if cust_id not in target_ids:
        continue
    # Pair every other customer's index with its score, skipping the customer itself
    similarities = [(j, score) for j, score in enumerate(similarity_matrix[i]) if j != i]
    # Keep the three highest-scoring customers
    sorted_similarities = sorted(similarities, key=lambda x: x[1], reverse=True)[:3]
    # Map top 3 similar customers with scores
    lookalike_map[cust_id] = [(customer_ids[j], score) for j, score in sorted_similarities]

# Convert to DataFrame for Lookalike.csv
lookalike_df = pd.DataFrame([
    {'CustomerID': cust_id, 'Lookalike_1': top_3[0][0], 'Score_1': top_3[0][1],
     'Lookalike_2': top_3[1][0], 'Score_2': top_3[1][1],
     'Lookalike_3': top_3[2][0], 'Score_3': top_3[2][1]}
    for cust_id, top_3 in lookalike_map.items()
])

lookalike_df.to_csv('Lookalike.csv', index=False)
print("Lookalike.csv generated successfully.")


Lookalike.csv generated successfully.
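
When two customers have identical feature vectors, their similarity equals the self-similarity of 1.0, so position in a sorted list does not reliably identify the customer itself. A sketch of top-k selection that skips the self index by position, on a made-up similarity row (the helper name `top_k_excluding_self` is hypothetical):

```python
def top_k_excluding_self(sim_row, self_index, k=3):
    """Return (index, score) pairs for the k highest scores, skipping self_index."""
    candidates = [(j, s) for j, s in enumerate(sim_row) if j != self_index]
    # sorted() is stable, so equal scores keep their original index order
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up similarity row for customer 1; customer 0 is an exact duplicate (score 1.0)
toy_row = [1.0, 1.0, 0.2, 0.9, 0.5]
print(top_k_excluding_self(toy_row, self_index=1))
# [(0, 1.0), (3, 0.9), (4, 0.5)]
```

With this row, sorting every entry and dropping only the first one would discard customer 0 and keep customer 1 itself, which is why the sketch filters by index instead.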


Evaluation Criteria

1. Model Accuracy and Logic:

    Used customer profile and transaction history for meaningful comparisons.
    Normalized numerical features and applied cosine similarity to compare customers.

2. Quality of Recommendations:

    Assigned similarity scores so that recommendations can be ranked.
    Focused on both quantitative (spending, quantity) and qualitative (region, category) features.