# **Lookalike Model**

Build a lookalike model that takes a customer's information as input and recommends the 3 most similar customers based on their profile and transaction history. The model should use both customer and product information, and assign a similarity score to each recommended customer.



### **Technique:**

Candidate modelling techniques include K-Means clustering, hierarchical clustering, collaborative filtering (e.g. Singular Value Decomposition), graph neural networks, and similarity-based lookalike models. Here we prefer a similarity-based lookalike model for the following reasons:
1. It models patterns directly from the feature values.
2. It does not require labelled data, i.e. it is an unsupervised approach.
3. It compares customer vectors with a similarity measure, in our case a simple cosine similarity. Cosine similarity depends on the direction of the feature vectors rather than their magnitude. We want to find customers with a similar spending pattern, which is captured by behavioural attributes, so a direction-based measure is a natural fit.
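
To make the "direction rather than magnitude" point concrete, here is a minimal sketch using only the standard library. The three spending vectors are hypothetical and are not taken from the dataset; the `cosine` helper is written out by hand for illustration and is not the scikit-learn function used later.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors: [spend on books, spend on electronics, spend on clothing]
light_spender = [10.0, 20.0, 5.0]
heavy_spender = [100.0, 200.0, 50.0]  # same mix as light_spender, 10x the volume
different_mix = [50.0, 5.0, 200.0]    # spend concentrated in another category

same_pattern = cosine(light_spender, heavy_spender)
other_pattern = cosine(light_spender, different_mix)

print(f"same mix, different volume: {same_pattern:.4f}")
print(f"different mix:              {other_pattern:.4f}")
```

The two customers with the same category mix score as nearly identical even though one spends ten times more, while the customer with a different mix scores much lower.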

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd  # needed here: this cell runs before the main import cell

customers = pd.read_csv('drive/MyDrive/Zeotap Assignment/Data/Customers.csv')
transactions = pd.read_csv('drive/MyDrive/Zeotap Assignment/Data/Transactions.csv')
products = pd.read_csv('drive/MyDrive/Zeotap Assignment/Data/Products.csv')

In [None]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.compose import ColumnTransformer

# Attach product attributes to each transaction.
transactions = pd.merge(
    transactions,
    products[['ProductID', 'ProductName', 'Category', 'Price']],
    on='ProductID',
    how='left',
    suffixes=('_transaction', '_product')
)

# Price appears in both tables; keep the one from Products.
transactions = transactions.drop(columns=['Price_transaction'])
transactions = transactions.rename(columns={'Price_product': 'Price'})

# One row per customer: total spend, total quantity, mean price, most frequent category.
customer_transactions = transactions.groupby('CustomerID').agg({
    'TotalValue': 'sum',
    'Quantity': 'sum',
    'Price': 'mean',
    'Category': lambda x: x.mode().iloc[0]  # on ties, the first mode in sorted order
}).reset_index()

# Inner join: customers without any transaction are dropped.
customer_data = pd.merge(customers, customer_transactions, on='CustomerID')
# Replace the signup date with the account age in days (column name is kept).
customer_data['SignupDate'] = (pd.to_datetime('today') - pd.to_datetime(customer_data['SignupDate'])).dt.days
customer_data = customer_data.drop(columns=['CustomerName'])

In [None]:
numerical_features = ['SignupDate', 'TotalValue', 'Quantity', 'Price']
categorical_features = ['Region', 'Category']

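preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), numerical_features),    # rescale numeric columns to [0, 1]
        ('cat', OneHotEncoder(), categorical_features)  # one indicator column per category
    ],
    sparse_threshold=0  # always return a dense array so it can be wrapped in a DataFrame
)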

customer_features = preprocessor.fit_transform(customer_data)
feature_names = numerical_features + list(preprocessor.named_transformers_['cat'].get_feature_names_out(categorical_features))
customer_features_df = pd.DataFrame(customer_features, columns=feature_names, index=customer_data['CustomerID'])
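
The scaling step matters for cosine similarity: on raw values, a large-range column such as `TotalValue` dominates the direction of every vector. The sketch below illustrates this with the standard library only. The three rows are hypothetical `[TotalValue, Quantity]` pairs, and `min_max_scale` is a hand-written stand-in for the default behaviour of `MinMaxScaler` that assumes no column is constant.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def min_max_scale(rows):
    """Rescale each column to [0, 1]; assumes every column has max > min."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical customers: [TotalValue, Quantity]
raw = [
    [5000.0, 2.0],   # A: high spend, very few items
    [5200.0, 40.0],  # B: high spend, many items
    [300.0, 3.0],    # C: low spend, few items
]
scaled = min_max_scale(raw)

raw_sim = cosine(raw[0], raw[1])           # A vs B on raw values
scaled_sim = cosine(scaled[0], scaled[1])  # A vs B after scaling

print(f"A vs B, raw:    {raw_sim:.5f}")
print(f"A vs B, scaled: {scaled_sim:.5f}")
```

On raw values A and B look almost identical because `TotalValue` swamps `Quantity`; after scaling, the difference in quantity shows up in the score.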

In [None]:
similarity_matrix = cosine_similarity(customer_features_df)
similarity_df = pd.DataFrame(similarity_matrix, index=customer_data['CustomerID'], columns=customer_data['CustomerID'])

In [None]:
def get_top_lookalikes(customer_id, similarity_df, top_n=3):
    # Drop the customer's own entry explicitly: skipping the first sorted row
    # can remove the wrong customer when another one also has similarity 1.0.
    similarities = similarity_df[customer_id].drop(customer_id)
    # nlargest returns the top_n scores in descending order.
    return similarities.nlargest(top_n)

# Build the lookalike list for the first 20 customers.
lookalike_map = {}
for customer_id in customer_data['CustomerID'][:20]:
    top_lookalikes = get_top_lookalikes(customer_id, similarity_df)
    # Store plain [CustomerID, score] pairs; float() keeps NumPy scalar reprs
    # (e.g. "np.float64(...)" on NumPy 2.x) out of the CSV.
    lookalike_map[customer_id] = [[cid, float(score)] for cid, score in top_lookalikes.items()]

lookalike_df = pd.DataFrame(list(lookalike_map.items()), columns=['CustomerID', 'Lookalikes'])

lookalike_df.to_csv('Akanksha_Joshi_Lookalike.csv', index=False)
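
Because the `Lookalikes` column holds Python lists, the CSV stores each cell as the string form of a list. A minimal sketch of reading it back with the standard library follows. The sample row is hypothetical (the IDs and scores are made up), and it assumes the scores were written as plain floats.

```python
import ast
import csv
import io

# Hypothetical file contents mimicking the layout of the exported CSV.
sample_csv = '''CustomerID,Lookalikes
C0001,"[['C0042', 0.98], ['C0107', 0.95], ['C0013', 0.91]]"
'''

lookalikes = {}
for row in csv.DictReader(io.StringIO(sample_csv)):
    # literal_eval safely parses the list literal; it never executes code.
    lookalikes[row['CustomerID']] = ast.literal_eval(row['Lookalikes'])

best_id, best_score = lookalikes['C0001'][0]
print(best_id, best_score)
```

For the real file, `open('Akanksha_Joshi_Lookalike.csv', newline='')` would take the place of `io.StringIO(sample_csv)`.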