# Task 2: Lookalike Model

### 1. Load the Data

In [1]:
import pandas as pd

# Load datasets
customers = pd.read_csv('Customers.csv')
products = pd.read_csv('Products.csv')
transactions = pd.read_csv('Transactions.csv')

# Display the first few rows of each dataframe
print(customers.head())
print(products.head())
print(transactions.head())


  CustomerID        CustomerName         Region  SignupDate
0      C0001    Lawrence Carroll  South America  2022-07-10
1      C0002      Elizabeth Lutz           Asia  2022-02-13
2      C0003      Michael Rivera  South America  2024-03-07
3      C0004  Kathleen Rodriguez  South America  2022-10-09
4      C0005         Laura Weber           Asia  2022-08-15
  ProductID              ProductName     Category   Price
0      P001     ActiveWear Biography        Books  169.30
1      P002    ActiveWear Smartwatch  Electronics  346.30
2      P003  ComfortLiving Biography        Books   44.12
3      P004            BookWorld Rug   Home Decor   95.69
4      P005          TechPro T-Shirt     Clothing  429.31
  TransactionID CustomerID ProductID      TransactionDate  Quantity  \
0        T00001      C0199      P067  2024-08-25 12:38:23         1   
1        T00112      C0146      P067  2024-05-27 22:23:54         1   
2        T00166      C0127      P067  2024-04-25 07:38:55         1   
3       

### 2. Data Preparation
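* Merge the transaction data with the customer table on CustomerID so that each transaction carries the customer's details (such as Region).
* Aggregate the transactions per customer into behavioral features: total spending (TotalValue) and total quantity purchased (Quantity).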
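The merge-then-aggregate pattern can be sketched on a tiny hypothetical dataset (the IDs and values below are made up for illustration). The sketch also shows a side effect worth knowing: `merge` defaults to an inner join, so a customer with no transactions (here `C0003`) drops out of the aggregated profiles.

```python
import pandas as pd

# Hypothetical toy data using the same column names as this notebook
customers = pd.DataFrame({
    'CustomerID': ['C0001', 'C0002', 'C0003'],
    'Region': ['Asia', 'Europe', 'Asia'],
})
transactions = pd.DataFrame({
    'CustomerID': ['C0001', 'C0001', 'C0002'],
    'Quantity': [2, 1, 4],
    'TotalValue': [100.0, 50.0, 80.0],
})

# Inner join (the default): only customers present in both tables survive
merged = transactions.merge(customers, on='CustomerID')

# One row per customer: summed spend and quantity, plus their region
profiles = merged.groupby('CustomerID').agg({
    'TotalValue': 'sum',
    'Quantity': 'sum',
    'Region': 'first',
}).reset_index()

print(profiles)
```

If customers without purchases should still receive a profile, `customers.merge(transactions, on='CustomerID', how='left')` keeps them; their missing totals become 0 after the `sum` aggregation.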

In [2]:
# Merge transactions with customer data to create customer profiles
customer_profiles = transactions.merge(customers, on='CustomerID')

# Aggregate transaction data to create a profile for each customer
customer_profile_data = customer_profiles.groupby('CustomerID').agg({
    'TotalValue': 'sum',       # Total spending
    'Quantity': 'sum',         # Total quantity purchased
    'Region': 'first'          # Keep region information
}).reset_index()

# Display the aggregated customer profile data
print(customer_profile_data.head())


  CustomerID  TotalValue  Quantity         Region
0      C0001     3354.52        12  South America
1      C0002     1862.74        10           Asia
2      C0003     2725.38        14  South America
3      C0004     5354.88        23  South America
4      C0005     2034.24         7           Asia


### 3. Calculate Similarity Scores
* Standardize the features (zero mean, unit variance) so that TotalValue and Quantity are on a comparable scale.
* Calculate cosine similarity scores between every pair of customers based on their standardized, aggregated transaction data.
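Cosine similarity is the dot product of two feature vectors divided by the product of their lengths, cos(a, b) = a·b / (‖a‖ ‖b‖), so it ranges from -1 to 1. The sketch below checks that formula by hand against scikit-learn's `cosine_similarity`; the two vectors are hypothetical standardized (TotalValue, Quantity) profiles, not rows from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical customers in standardized (TotalValue, Quantity) space
a = np.array([1.0, 0.5])
b = np.array([0.8, 1.2])

# Manual cosine similarity: dot product over the product of vector norms
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn expects 2-D input (one row per sample) and returns a matrix
sk = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(round(manual, 4), round(sk, 4))
```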

In [3]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Standardize the feature columns for similarity calculation
features = customer_profile_data[['TotalValue', 'Quantity']]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Calculate cosine similarity matrix
similarity_matrix = cosine_similarity(scaled_features)

# Convert similarity matrix to a DataFrame for easier lookup
similarity_df = pd.DataFrame(similarity_matrix, index=customer_profile_data['CustomerID'], columns=customer_profile_data['CustomerID'])


### 4. Generate Lookalike Recommendations
* For each of the first 20 customers (C0001 to C0020), identify the top three lookalike customers based on similarity scores.
* Store each customer's three lookalike IDs and their similarity scores in a DataFrame.
* Save the recommendations and their corresponding similarity scores into a CSV file named "Lookalike_Customers.csv".
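A minimal sketch of the selection step on a hypothetical 4×4 similarity matrix; `top_k_lookalikes` is an illustrative helper, not part of the notebook. Dropping the customer's own entry before `nlargest` avoids assuming that self-similarity ranks first (a tie at 1.0 with another customer could otherwise push the customer out of first place), and each score is read from the target customer's column, i.e. sim(customer, lookalike).

```python
import pandas as pd

# Hypothetical 4-customer similarity matrix (symmetric, 1.0 on the diagonal)
ids = ['C0001', 'C0002', 'C0003', 'C0004']
sim = pd.DataFrame(
    [[1.00, 0.95, 0.20, 0.90],
     [0.95, 1.00, 0.10, 0.70],
     [0.20, 0.10, 1.00, 0.30],
     [0.90, 0.70, 0.30, 1.00]],
    index=ids, columns=ids,
)

def top_k_lookalikes(sim_df, customer, k=3):
    """Return [(lookalike_id, score), ...] for the k most similar other customers."""
    # Remove the customer's own entry explicitly
    scores = sim_df[customer].drop(customer)
    # Keep sim(customer, lookalike), not the lookalike's self-similarity
    return list(scores.nlargest(k).items())

print(top_k_lookalikes(sim, 'C0001'))
```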

In [4]:
lookalike_rows = []

# Loop through each of the first 20 customers (C0001 to C0020) to find their lookalikes
for customer in customer_profile_data['CustomerID'].head(20):
    # Drop the customer's own entry so they can never be their own lookalike,
    # then keep the three highest similarity scores
    top3 = similarity_df[customer].drop(customer).nlargest(3)

    # Record each lookalike ID next to its similarity to *this* customer
    row = {'CustomerID': customer}
    for rank, (lookalike_id, score) in enumerate(top3.items(), start=1):
        row[f'Lookalike{rank}'] = lookalike_id
        row[f'Score{rank}'] = score
    lookalike_rows.append(row)

# One row per customer: three lookalike IDs and their similarity scores
lookalike_df = pd.DataFrame(lookalike_rows).set_index('CustomerID')

# Save to CSV file
lookalike_df.to_csv('Lookalike_Customers.csv', index=True)


## Evaluation Criteria

### 1. Model Accuracy and Logic:
* The model uses cosine similarity to measure how closely related two customers are based on their purchasing behavior (total value and quantity).
* Standardizing the features before computing similarity puts TotalValue (a currency amount) and Quantity (a unit count) on a common scale, so neither feature dominates the score merely because of its units.
* Aggregating transactions per customer gives each customer a single behavioral profile to compare. Only total spend and total quantity enter the similarity calculation; Region is carried in the profile but not used in it.
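Using the five aggregated profiles printed in step 2 (C0001 to C0005), the sketch below compares cosine similarity on raw and on standardized features. Without scaling, TotalValue dominates and every pair scores almost exactly 1, so the ranking carries little information; after scaling, the scores spread out and some pairs become negative. The scaler here is fit on only these five rows, so the exact values differ from the notebook's full-data fit.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# (TotalValue, Quantity) for C0001-C0005, from the aggregated profile output in step 2
X = np.array([
    [3354.52, 12],
    [1862.74, 10],
    [2725.38, 14],
    [5354.88, 23],
    [2034.24, 7],
])

# Raw features: TotalValue is hundreds of times larger than Quantity,
# so every vector points almost straight along the TotalValue axis
raw_sim = cosine_similarity(X)

# Standardized features: each column rescaled to mean 0 and unit variance
scaled_sim = cosine_similarity(StandardScaler().fit_transform(X))

print('raw min similarity:   ', round(raw_sim.min(), 6))
print('scaled min similarity:', round(scaled_sim.min(), 3))
```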

### 2. Quality of Recommendations and Similarity Scores:
* Each customer's recommendations are the three other customers with the highest cosine similarity. Scores range from -1 to 1, and values close to 1 indicate closely matching standardized spending and quantity profiles.
* For example, if customer C0001's lookalikes all score above 0.99, their purchasing patterns point in nearly the same direction as C0001's, making them strong candidates for marketing strategies aimed at customers like C0001.
* The model surfaces the most similar customers rather than any similar customers, which gives targeted campaigns a concrete starting list; whether these lookalikes actually respond better to campaigns is not measured here and would need to be validated with campaign response data.
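Because each profile has only two standardized features, cosine similarity compares the direction of the vectors and ignores their length. In the hypothetical example below, two customers are above average in the same proportion: they score 1.0 even though one is much further from the mean. Reporting a distance such as Euclidean distance (scikit-learn's `euclidean_distances`) alongside the score is one way to check whether a near-1 lookalike is also close in absolute terms; this is a suggested complement, not part of the original model.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Hypothetical standardized (TotalValue, Quantity) profiles:
# A is slightly above average, B is far above average in the same proportion
A = np.array([[0.5, 0.5]])
B = np.array([[3.0, 3.0]])

cos = cosine_similarity(A, B)[0, 0]     # same direction, so the score is 1
dist = euclidean_distances(A, B)[0, 0]  # yet the profiles are far apart

print(f'cosine similarity:  {cos:.3f}')
print(f'euclidean distance: {dist:.3f}')
```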

## Business Insights from the Model
* The model provides insights into which customers share similar purchasing behaviors, allowing targeted marketing efforts.
* Recommendations can enhance customer acquisition strategies by focusing on high-value lookalikes.
* Understanding similarity scores helps prioritize engagement with customers most likely to convert.


## Conclusion
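The Lookalike Model identifies similar customers by aggregating each customer's transactions into total spend and total quantity, standardizing these features, and comparing the resulting profiles with cosine similarity. Because the features are derived from actual transactions, the recommendations reflect observed purchasing behavior, which makes them a relevant starting point for targeted marketing efforts.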