**PARSHAV SHARMA**

**ZEOTAP ASSIGNMENT**

Task 2: Lookalike Model
Build a Lookalike Model that takes a user's information as input and recommends 3 similar
customers based on their profile and transaction history. The model should:

● Use both customer and product information.

● Assign a similarity score to each recommended customer.
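To assign a similarity score, each customer must be reduced to one numeric vector that mixes customer attributes (such as region) with product-derived attributes (such as spending and category). Below is a minimal sketch that assumes cosine similarity as the score, the same measure the notebook below uses. The toy vectors are hypothetical: the first two entries stand for standardized spending features and the last two for a one-hot encoded region.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Treat an all-zero profile as dissimilar to everything instead of dividing by zero
    return float(a @ b / denom) if denom else 0.0

# Hypothetical toy profiles: [scaled total_spent, scaled avg_transaction_value, region_A, region_B]
profile_1 = [1.0, 0.5, 1.0, 0.0]
profile_2 = [0.8, 0.6, 1.0, 0.0]   # similar spending, same region
profile_3 = [-1.0, -0.4, 0.0, 1.0]  # opposite spending, different region

print(round(cosine_score(profile_1, profile_2), 3))  # close to 1
print(round(cosine_score(profile_1, profile_3), 3))  # negative
```

Scores lie in [-1, 1], where values near 1 mean the two profiles point in nearly the same direction. Because the numeric features are standardized first, a negative score indicates opposite behaviour relative to the average customer.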

Deliverables:

● Give the top 3 lookalikes, with their similarity scores, for the first 20 customers
(CustomerID: C0001 to C0020) in Customers.csv. Create a “Lookalike.csv” file that contains
just one map: Map<cust_id, List<cust_id, score>>

● A Jupyter Notebook/Python script explaining your model development
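In Python, the required Map<cust_id, List<cust_id, score>> fits naturally into a dict that maps each customer ID to a list of (lookalike ID, score) pairs. The sketch below shows one possible way to store that single map in a CSV and read it back. It assumes one row per source customer, with the list encoded as JSON. The column names, the JSON encoding, and the helper names `write_lookalike_map` / `read_lookalike_map` are hypothetical choices, not part of the brief.

```python
import csv
import io
import json

def write_lookalike_map(lookalike_map, fh):
    """Write Map<cust_id, List<(cust_id, score)>> as CSV: one row per customer, list as JSON."""
    writer = csv.writer(fh)
    writer.writerow(["cust_id", "lookalikes"])
    for cust_id, pairs in lookalike_map.items():
        # Each (id, score) tuple becomes a 2-element JSON array; scores rounded for readability
        encoded = json.dumps([[other_id, round(score, 4)] for other_id, score in pairs])
        writer.writerow([cust_id, encoded])

def read_lookalike_map(fh):
    """Inverse of write_lookalike_map: rebuild the dict of (cust_id, score) tuples."""
    reader = csv.DictReader(fh)
    return {
        row["cust_id"]: [tuple(pair) for pair in json.loads(row["lookalikes"])]
        for row in reader
    }

# Placeholder values for illustration only, not real model output
example = {"C0001": [("C0002", 0.9), ("C0003", 0.8), ("C0004", 0.7)]}

buffer = io.StringIO()
write_lookalike_map(example, buffer)
buffer.seek(0)
print(read_lookalike_map(buffer))
```

JSON is used here instead of Python's `str()` so the scores stay plain numbers that any language can parse back. The csv module quotes the field automatically because the JSON text contains commas.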

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Load the data (the /content/ paths assume the files were uploaded to a Google Colab session)
customers_df = pd.read_csv('/content/Customers.csv')
products_df = pd.read_csv('/content/Products.csv')
transactions_df = pd.read_csv('/content/Transactions.csv')

# Merge datasets: attach product information (e.g. Category) to every transaction
transactions_with_products = transactions_df.merge(products_df, on='ProductID', how='left')

# Summarize each customer's transaction history
transaction_profiles = transactions_with_products.groupby('CustomerID').agg(
    total_spent=('TotalValue', 'sum'),
    total_transactions=('TransactionID', 'count'),
    avg_transaction_value=('TotalValue', 'mean'),
    unique_categories=('Category', 'nunique'),
    most_frequent_category=('Category', lambda x: x.mode().iloc[0] if not x.mode().empty else 'Unknown'),
).reset_index()

# Build profiles from Customers.csv so customers without transactions are kept (zero activity)
customer_profiles = customers_df[['CustomerID', 'Region']].merge(transaction_profiles, on='CustomerID', how='left')
numeric_features = ['total_spent', 'total_transactions', 'avg_transaction_value', 'unique_categories']
customer_profiles[numeric_features] = customer_profiles[numeric_features].fillna(0)
customer_profiles['most_frequent_category'] = customer_profiles['most_frequent_category'].fillna('Unknown')

# Standardize numerical features so large-valued columns (e.g. total_spent) do not dominate
scaler = StandardScaler()
scaled_numeric = pd.DataFrame(
    scaler.fit_transform(customer_profiles[numeric_features]),
    columns=numeric_features,
    index=customer_profiles.index,
)

# One-hot encode customer information (Region) and product information (most frequent Category)
categorical = pd.get_dummies(
    customer_profiles[['Region', 'most_frequent_category']],
    prefix=['region', 'category'],
).astype(float)

feature_matrix = pd.concat([scaled_numeric, categorical], axis=1)

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(feature_matrix)
similarity_df = pd.DataFrame(similarity_matrix, index=customer_profiles['CustomerID'], columns=customer_profiles['CustomerID'])

# Function to get the top N similar customers, explicitly excluding the customer itself
def get_top_n_similar(customer_id, n=3):
    scores = similarity_df[customer_id].drop(customer_id)
    top = scores.sort_values(ascending=False).head(n)
    # Convert NumPy floats to plain floats so the saved scores are readable
    return [(other_id, round(float(score), 4)) for other_id, score in top.items()]

# Generate lookalike data for customers C0001 to C0020
target_ids = [f'C{i:04d}' for i in range(1, 21)]
lookalike_data = {
    customer_id: get_top_n_similar(customer_id, n=3)
    for customer_id in target_ids
}

# Save the results to Lookalike.csv (one row per customer; the list is stored as a string)
lookalike_output = []
for cust_id, lookalikes in lookalike_data.items():
    lookalike_output.append({
        "cust_id": cust_id,
        "lookalikes": str(lookalikes)
    })

lookalike_df = pd.DataFrame(lookalike_output)
lookalike_df.to_csv('Lookalike.csv', index=False)

print("Lookalike recommendations have been saved to Lookalike.csv")


Lookalike recommendations have been saved to Lookalike.csv
