**1. Data Preparation**

**Load Data:** Import the Customers.csv, Products.csv, and Transactions.csv files into your Python environment using pandas.

**Merge Datasets:** Combine the datasets to create a comprehensive view of each customer's transaction history.

In [1]:
import pandas as pd

# Load datasets
customers = pd.read_csv('/content/Customers.csv')
products = pd.read_csv('/content/Products.csv')
transactions = pd.read_csv('/content/Transactions.csv')

# Merge datasets
merged_data = transactions.merge(customers, on='CustomerID', how='left')
merged_data = merged_data.merge(products, on='ProductID', how='left')

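A left join keeps every transaction even when its key has no match, filling the joined columns with NaN. The following is a minimal sketch of how such rows could be detected. It assumes toy frames with made-up values (`toy_transactions` and `toy_customers` are hypothetical and not taken from the actual CSV files) and uses the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the real CSV files)
toy_transactions = pd.DataFrame({
    'TransactionID': ['T1', 'T2', 'T3'],
    'CustomerID': ['C1', 'C2', 'C9'],   # 'C9' has no customer record
    'TotalValue': [100.0, 50.0, 25.0],
})
toy_customers = pd.DataFrame({
    'CustomerID': ['C1', 'C2'],
    'Region': ['Asia', 'Europe'],
})

# validate='many_to_one' raises MergeError if CustomerID is duplicated on the right side;
# indicator=True adds a '_merge' column recording where each row came from
checked = toy_transactions.merge(
    toy_customers, on='CustomerID', how='left',
    validate='many_to_one', indicator=True,
)

# Transactions that found no matching customer
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['TransactionID', 'CustomerID']])
```

The same check could be applied to the product merge by swapping in `ProductID` as the key.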

**2. Feature Engineering**

**Customer Profile Features:**

**Region:** Encode the 'Region' feature using one-hot encoding.

**Signup Date:** Extract features such as 'Signup Year' and 'Signup Month'.


**Transaction History Features:**

**Total Spend:** Calculate the total amount spent by each customer.

**Purchase Frequency:** Determine the number of transactions made by each customer.

**Average Order Value:** Compute the average transaction value for each customer.

**Product Categories Purchased:** Identify the variety of product categories each customer has purchased.

In [2]:
# Convert 'SignupDate' to datetime
customers['SignupDate'] = pd.to_datetime(customers['SignupDate'])

# Extract 'Signup Year' and 'Signup Month'
customers['SignupYear'] = customers['SignupDate'].dt.year
customers['SignupMonth'] = customers['SignupDate'].dt.month

# One-hot encode 'Region'
customers = pd.get_dummies(customers, columns=['Region'])

# Calculate transaction-based features
# Named aggregation yields flat, descriptively named columns directly,
# so no separate renaming step is needed
customer_transactions = merged_data.groupby('CustomerID').agg(
    TotalSpend=('TotalValue', 'sum'),
    PurchaseFrequency=('TotalValue', 'count'),
    AvgOrderValue=('TotalValue', 'mean'),
    UniqueCategoriesPurchased=('Category', 'nunique'),
).reset_index()

# Merge with customer profile data
customer_profiles = customers.merge(customer_transactions, on='CustomerID', how='left')

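The transaction features described in this step can be illustrated on a toy table. This is only a sketch under assumed data: the frames `toy`, `toy_features`, and `toy_profiles` are hypothetical and their values are invented. It also shows why a left merge leaves NaN for a customer who has no transactions.

```python
import pandas as pd

# Toy transactions (hypothetical values): C1 bought three times, C2 once
toy = pd.DataFrame({
    'CustomerID': ['C1', 'C1', 'C1', 'C2'],
    'TotalValue': [10.0, 20.0, 30.0, 40.0],
    'Category': ['Books', 'Books', 'Toys', 'Toys'],
})

# One row per customer: total spend, number of purchases,
# average order value, and number of distinct categories
toy_features = toy.groupby('CustomerID').agg(
    TotalSpend=('TotalValue', 'sum'),
    PurchaseFrequency=('TotalValue', 'count'),
    AvgOrderValue=('TotalValue', 'mean'),
    UniqueCategoriesPurchased=('Category', 'nunique'),
).reset_index()

# C3 exists as a customer but has no transactions,
# so its feature columns are NaN after the left merge
toy_profiles = pd.DataFrame({'CustomerID': ['C1', 'C2', 'C3']})
toy_merged = toy_profiles.merge(toy_features, on='CustomerID', how='left')
print(toy_merged)
```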

**3. Similarity Computation**

**Normalize Features:** Standardize the numerical features to have a mean of 0 and a standard deviation of 1.

**Compute Similarity:** Use the cosine similarity metric to measure the similarity between customers.

In [3]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Select features for similarity computation
features = ['SignupYear', 'SignupMonth', 'TotalSpend', 'PurchaseFrequency', 'AvgOrderValue', 'UniqueCategoriesPurchased'] + \
           [col for col in customer_profiles.columns if col.startswith('Region_')]

# Fill missing values with 0 (e.g. customers with no transactions get NaN from the left merge)
customer_profiles[features] = customer_profiles[features].fillna(0)

# Standardize features
scaler = StandardScaler()
standardized_features = scaler.fit_transform(customer_profiles[features])

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(standardized_features)

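To make the two operations in this step concrete, here is a small NumPy-only sketch of z-score standardization followed by cosine similarity. The matrix `X` is a hypothetical toy example, and the arithmetic is written out by hand rather than through scikit-learn. It uses the population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows.

```python
import numpy as np

# Toy feature matrix: 3 customers x 2 features (hypothetical values)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Z-score each column: subtract the column mean, divide by the
# population standard deviation (NumPy's default, ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Cosine similarity: dot product of each pair of rows
# divided by the product of their Euclidean norms
norms = np.linalg.norm(Z, axis=1, keepdims=True)
S = (Z @ Z.T) / (norms @ norms.T)
print(np.round(S, 3))
```

The resulting matrix is symmetric with ones on the diagonal, since every customer is perfectly similar to itself. That is why the self-score has to be excluded before picking lookalikes.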

**4. Generate Lookalike Recommendations**

**Identify Top 3 Lookalikes:** For each target customer, find the top 3 most similar customers based on the similarity scores.

In [8]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Assuming customer_profiles DataFrame is already created and preprocessed

# Select features for similarity computation
features = ['SignupYear', 'SignupMonth', 'TotalSpend', 'PurchaseFrequency', 'AvgOrderValue', 'UniqueCategoriesPurchased'] + \
           [col for col in customer_profiles.columns if col.startswith('Region_')]

# Fill missing values with 0
customer_profiles[features] = customer_profiles[features].fillna(0)

# Standardize features
scaler = StandardScaler()
standardized_features = scaler.fit_transform(customer_profiles[features])

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(standardized_features)

# Collect one dictionary per customer and build the DataFrame once at the end;
# this avoids concatenating onto an empty DataFrame inside the loop
rows = []

# Iterate over the first 20 customers (or all of them if there are fewer than 20)
for idx in range(min(20, len(customer_profiles))):
    customer_id = customer_profiles['CustomerID'].iloc[idx]
    # Copy the row so that masking the self-score does not alter similarity_matrix
    similarity_scores = similarity_matrix[idx].copy()
    # Exclude the customer itself; -inf cannot tie with a genuine cosine score of -1
    similarity_scores[idx] = -np.inf
    # Get indices of the top 3 similar customers, highest score first
    top_indices = np.argsort(similarity_scores)[-3:][::-1]
    # Get corresponding customer IDs and similarity scores
    lookalikes = customer_profiles['CustomerID'].iloc[top_indices].values
    scores = similarity_scores[top_indices]
    rows.append({
        'CustomerID': customer_id,
        'Lookalike1': lookalikes[0], 'Score1': scores[0],
        'Lookalike2': lookalikes[1], 'Score2': scores[1],
        'Lookalike3': lookalikes[2], 'Score3': scores[2],
    })

# Create the DataFrame of lookalike recommendations
lookalike_recommendations = pd.DataFrame(rows, columns=['CustomerID', 'Lookalike1', 'Score1', 'Lookalike2', 'Score2', 'Lookalike3', 'Score3'])

# Save to 'Lookalike.csv'
lookalike_recommendations.to_csv('Lookalike.csv', index=False)

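The top-3 selection can also be factored into a small helper that leaves the similarity matrix unmodified. The sketch below is one possible formulation, not the notebook's own code: the function name `top_k_lookalikes` and the matrix `toy_similarity` are hypothetical and used for illustration only.

```python
import numpy as np

def top_k_lookalikes(similarity, idx, k=3):
    """Return (indices, scores) of the k rows most similar to row idx."""
    scores = similarity[idx].copy()   # copy so the caller's matrix is untouched
    scores[idx] = -np.inf             # a customer is never its own lookalike
    k = min(k, len(scores) - 1)       # cannot return more than n - 1 others
    top = np.argsort(scores)[::-1][:k]  # highest score first
    return top, scores[top]

# Toy symmetric similarity matrix for 4 customers (hypothetical values)
toy_similarity = np.array([
    [1.0, 0.2, 0.9, -0.5],
    [0.2, 1.0, 0.1, 0.7],
    [0.9, 0.1, 1.0, 0.3],
    [-0.5, 0.7, 0.3, 1.0],
])

indices, scores = top_k_lookalikes(toy_similarity, idx=0, k=3)
print(indices, scores)
```

Masking the self-score with `-np.inf` instead of `-1` avoids a tie with a customer whose cosine similarity really is `-1`.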

