# Cross-Category Product Recommendation System

This notebook builds a recommendation system that suggests products across different categories using collaborative filtering. Because the dataset has no per-user purchase records, user interactions are simulated from product popularity; the system then recommends complementary products based on those purchase patterns, ratings, and price points.

## Key Features:
- Recommends products from different categories that are frequently purchased together
- Considers user purchase history, product ratings, and price in making recommendations
- Uses matrix factorization for collaborative filtering
- Provides both similar product recommendations and cross-category recommendations

## 1. Import Libraries

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds
import random
from collections import defaultdict

# For more advanced modeling (requires the scikit-surprise package; not used in the cells below)
from surprise import Dataset, Reader
from surprise import SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

## 2. Data Loading and Exploration

In [None]:
# Load the dataset
df = pd.read_csv('product_dataset.csv')

# Display the first few rows
print("Dataset Preview:")
df.head()

In [None]:
# Check the dataset info
print("\nDataset Info:")
df.info()

# Check for missing values
print("\nMissing Values:")
df.isnull().sum()

In [None]:
# Explore the distribution of categories
print("\nCategory Distribution:")
category_counts = df['Category'].value_counts()
print(category_counts)

# Visualize category distribution
plt.figure(figsize=(12, 6))
sns.barplot(x=category_counts.index, y=category_counts.values, palette='viridis')
plt.title('Number of Products by Category')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

In [None]:
# Explore basic statistics of the dataset
print("\nBasic Statistics:")
df.describe()

## 3. Data Preprocessing

In [None]:
# Create a unique identifier for each product
if 'Product ID' not in df.columns:
    df['Product ID'] = df.index

# Drop rows with missing values; copy so later column assignments act on an independent frame
df = df.dropna().copy()

# Convert relevant columns to appropriate data types
df['Product ID'] = df['Product ID'].astype(str)
df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')
df['Users Purchased'] = pd.to_numeric(df['Users Purchased'], errors='coerce')
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

# Drop rows whose numeric fields could not be parsed (coerced to NaN above)
# Copy so that new columns can be added without a SettingWithCopyWarning
df_clean = df.dropna().copy()

# Display the processed data
print("Processed Data:")
df_clean.head()

## 4. Feature Engineering

We'll create a matrix representation for collaborative filtering and also develop features for cross-category recommendations.

In [None]:
# Build a composite popularity score from rating, purchase count, and price
# The score is used later to weight the simulated user purchases,
# since the dataset has no actual user IDs

# First, min-max normalize each input to the range [0, 1]
scaler = MinMaxScaler()
df_clean['normalized_rating'] = scaler.fit_transform(df_clean[['Rating']].values)
df_clean['normalized_purchases'] = scaler.fit_transform(df_clean[['Users Purchased']].values)
df_clean['normalized_price'] = 1 - scaler.fit_transform(df_clean[['Price']].values)  # Lower price is better

# Create a composite score
df_clean['popularity_score'] = (df_clean['normalized_rating'] * 0.4 + 
                              df_clean['normalized_purchases'] * 0.4 + 
                              df_clean['normalized_price'] * 0.2)

df_clean.head()
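
As a worked example of the composite score, the sketch below applies the same min-max scaling and the same 0.4 / 0.4 / 0.2 weights to a three-product toy catalog. The toy values and the `min_max` helper are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy catalog; the values are made up for illustration only.
toy = pd.DataFrame({
    "Rating": [3.0, 4.0, 5.0],
    "Users Purchased": [100, 300, 200],
    "Price": [10.0, 30.0, 20.0],
})

def min_max(col):
    """Scale a column to [0, 1]; a constant column maps to 0 to avoid 0/0."""
    span = col.max() - col.min()
    return (col - col.min()) / span if span else col * 0.0

norm_rating = min_max(toy["Rating"])              # 0.0, 0.5, 1.0
norm_purchases = min_max(toy["Users Purchased"])  # 0.0, 1.0, 0.5
norm_price = 1 - min_max(toy["Price"])            # 1.0, 0.0, 0.5 (cheaper is better)

toy["popularity_score"] = (
    0.4 * norm_rating + 0.4 * norm_purchases + 0.2 * norm_price
)
print(toy)  # scores are 0.2, 0.6, 0.7 up to floating-point rounding
```

The mid-priced, top-rated product ends up ahead of the best seller because rating and purchases carry equal weight and price only breaks the tie.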

In [None]:
# Create simulated user data
# This is needed since our dataset doesn't have user-specific interactions
num_users = 1000  # Simulate 1000 users

# Collect user-item interactions as a list of records
interactions = []

# For each user, generate synthetic purchase data
for user_id in range(1, num_users + 1):
    # Each user buys a random number of items (between 5 and 20)
    num_purchases = random.randint(5, 20)
    
    # User has preferred categories (randomly choose 2-3 categories)
    num_preferred_categories = random.randint(2, 3)
    preferred_categories = random.sample(list(df_clean['Category'].unique()), num_preferred_categories)
    
    # 70% chance to buy from preferred categories
    for _ in range(num_purchases):
        if random.random() < 0.7:
            # Buy from preferred category with higher probability
            category = random.choice(preferred_categories)
            category_items = df_clean[df_clean['Category'] == category]
            
            # If the category has items, select one with probability weighted by popularity
            if len(category_items) > 0:
                # Use popularity score to weight the selection
                weights = category_items['popularity_score'].values / category_items['popularity_score'].sum()
                product_idx = np.random.choice(category_items.index, p=weights)
                product_id = df_clean.loc[product_idx, 'Product ID']
                
                # Add the interaction with a rating between 3-5 (since this is from preferred category)
                rating = random.uniform(3.0, 5.0)
                interactions.append({'User ID': user_id, 'Product ID': product_id, 'Rating': rating})
        else:
            # Buy random product
            product_idx = random.choice(df_clean.index)
            product_id = df_clean.loc[product_idx, 'Product ID']
            
            # Add the interaction with a broader rating range
            rating = random.uniform(1.0, 5.0)
            interactions.append({'User ID': user_id, 'Product ID': product_id, 'Rating': rating})

# Create a dataframe from the interactions
interactions_df = pd.DataFrame(interactions)

# Display the generated interactions
print(f"Generated {len(interactions_df)} user-product interactions")
interactions_df.head()
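
The popularity-weighted selection inside the loop can be checked in isolation. The sketch below is a minimal illustration that assumes three hypothetical scores and uses NumPy's `default_rng` generator rather than the legacy `np.random.choice` call in the cell above; it confirms that items are drawn roughly in proportion to their share of the total score.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical popularity scores for three products in one category
scores = np.array([0.2, 0.6, 0.7])

# Probabilities must sum to 1, so divide by the total score
weights = scores / scores.sum()

# Draw many purchases and compare empirical frequencies with the weights
draws = rng.choice(len(scores), size=10_000, p=weights)
freq = np.bincount(draws, minlength=len(scores)) / draws.size

print("weights:    ", np.round(weights, 3))
print("frequencies:", np.round(freq, 3))
```

If every score in a category were zero, the division would produce NaN weights and the draw would fail, so a real pipeline would probably fall back to uniform weights in that case.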

In [None]:
# Merge interactions with product data
merged_df = interactions_df.merge(df_clean, on='Product ID', how='left')
merged_df.head()

In [None]:
# Create a pivot table for user-item matrix
user_item_matrix = interactions_df.pivot_table(
    index='User ID', 
    columns='Product ID', 
    values='Rating'
).fillna(0)

print(f"User-Item Matrix Shape: {user_item_matrix.shape}")
user_item_matrix.iloc[:5, :5]  # Display a small part of the matrix
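
To see what the pivot does, here is a small sketch on made-up interactions (the IDs and ratings are assumptions for illustration). `pivot_table` averages repeat purchases by default, and `fillna(0)` then encodes "no interaction" as a rating of 0.

```python
import pandas as pd

# Made-up interactions; user 2 rated product A twice
toy_interactions = pd.DataFrame({
    "User ID": [1, 1, 2, 2, 2],
    "Product ID": ["A", "B", "A", "A", "C"],
    "Rating": [4.0, 3.0, 5.0, 3.0, 2.0],
})

toy_matrix = toy_interactions.pivot_table(
    index="User ID", columns="Product ID", values="Rating"
).fillna(0)
print(toy_matrix)

# Share of cells with no interaction
sparsity = (toy_matrix.values == 0).mean()
print(f"Sparsity: {sparsity:.2f}")
```

User 2's two ratings for product A (5.0 and 3.0) collapse to their mean, 4.0. The zeros are indistinguishable from very low ratings once the matrix is factorized, which is worth keeping in mind when reading the predicted scores.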

## 5. Collaborative Filtering Model Training

In [None]:
# Work on the dense NumPy array behind the user-item matrix
user_item_matrix_np = user_item_matrix.values

# Apply truncated SVD (Singular Value Decomposition)
# This helps in reducing dimensionality and finding latent factors
# svds requires k < min(n_users, n_products), so cap the number of factors
num_latent_factors = min(50, min(user_item_matrix_np.shape) - 1)
U, sigma, Vt = svds(user_item_matrix_np, k=num_latent_factors)

# Turn the vector of singular values into a diagonal matrix
sigma_diag = np.diag(sigma)

# Make the prediction matrix
predicted_ratings = np.dot(np.dot(U, sigma_diag), Vt)

# Convert back to a DataFrame
predictions_df = pd.DataFrame(
    predicted_ratings, 
    index=user_item_matrix.index, 
    columns=user_item_matrix.columns
)

predictions_df.iloc[:5, :5]  # Display a part of the predictions
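
The following sketch shows the same factorization on a 4x4 toy matrix, chosen (as an assumption for illustration) so that two groups of users rate two disjoint groups of products. Keeping `k=2` factors recovers one latent factor per group and smooths the ratings within each block.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Toy ratings: users 0-1 rate products 0-1, users 2-3 rate products 2-3
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
])

# With the default solver, svds needs 1 <= k <= min(n_rows, n_cols) - 1
k = min(2, min(ratings.shape) - 1)
U, sigma, Vt = svds(ratings, k=k)

# The two largest singular values of this matrix are 9 and 7;
# sort before printing because the returned order should not be relied on
print("singular values:", np.round(np.sort(sigma), 3))

# Rank-k reconstruction: the predicted ratings
approx = U @ np.diag(sigma) @ Vt
print(np.round(approx, 2))
```

Each block is replaced by its average level (4.5 for the first group, 3.5 for the second), while the cross-block entries stay at zero. On real data the blocks overlap, and that overlap is what lets the reconstruction assign non-zero scores to products a user has not yet bought.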

## 6. Creating Product Similarity Matrix for Cross-Category Recommendations

In [None]:
# Create a product-user matrix (transpose of user-item matrix)
product_user_matrix = user_item_matrix.T

# Calculate cosine similarity between products
product_similarity = cosine_similarity(product_user_matrix)

# Create a DataFrame for product similarity
product_similarity_df = pd.DataFrame(
    product_similarity,
    index=product_user_matrix.index,
    columns=product_user_matrix.index
)

print("Product Similarity Matrix:")
product_similarity_df.iloc[:5, :5]
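
For intuition, cosine similarity between two products is the dot product of their user-rating vectors divided by the product of their lengths. The sketch below computes that quantity directly with NumPy on a made-up product-by-user matrix; for rows that are not all zeros it should match what `cosine_similarity` returns.

```python
import numpy as np

# Rows are products, columns are users (made-up ratings)
product_user = np.array([
    [5.0, 0.0, 4.0],   # product A
    [4.0, 0.0, 5.0],   # product B: rated by the same users as A
    [0.0, 3.0, 0.0],   # product C: rated by a different user
])

# Normalize each row to unit length, then take pairwise dot products
norms = np.linalg.norm(product_user, axis=1, keepdims=True)
unit = product_user / norms
similarity = unit @ unit.T

print(np.round(similarity, 4))
```

A and B score 40/41 (about 0.976) because the same users rated both, while C scores 0 against either since it shares no users with them. Similarity here reflects who bought a product, not what the product is, which is why it can link items from different categories.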

In [None]:
# Create a mapping between product IDs and their categories
product_category_map = dict(zip(df_clean['Product ID'], df_clean['Category']))

# Create a function to get product details
def get_product_details(product_id):
    product = df_clean[df_clean['Product ID'] == product_id].iloc[0]
    return {
        'Product ID': product_id,
        'Category': product['Category'],
        'Rating': product['Rating'],
        'Users Purchased': product['Users Purchased'],
        'Price': product['Price']
    }

## 7. Building Recommendation Functions

In [None]:
def recommend_similar_products(product_id, n=5):
    """
    Recommend products similar to a given product
    """
    if product_id not in product_similarity_df.columns:
        print(f"Product ID {product_id} not found in the dataset.")
        return pd.DataFrame()
    
    # Get similarity scores
    similarity_scores = product_similarity_df[product_id]
    
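    # Drop the product itself, then sort the rest by similarity score
    # (skipping the first row is unreliable when another product ties at 1.0)
    similar_products = similarity_scores.drop(product_id).sort_values(ascending=False).head(n)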
    
    # Get details of the recommended products
    recommendations = []
    for rec_product_id, score in similar_products.items():
        product_details = get_product_details(rec_product_id)
        product_details['Similarity Score'] = score
        recommendations.append(product_details)
    
    return pd.DataFrame(recommendations)

In [None]:
def recommend_cross_category_products(product_id, n=5, exclude_categories=None):
    """
    Recommend products from different categories
    """
    if product_id not in product_similarity_df.columns:
        print(f"Product ID {product_id} not found in the dataset.")
        return pd.DataFrame()
    
    # Get the category of the input product
    input_category = product_category_map.get(product_id)
    if not input_category:
        print(f"Category for Product ID {product_id} not found.")
        return pd.DataFrame()
    
    # Build the exclusion set from a copy so the caller's list is not mutated
    exclude_categories = set(exclude_categories or [])
    
    # Always exclude the input product's category
    exclude_categories.add(input_category)
    
    # Get similarity scores
    similarity_scores = product_similarity_df[product_id]
    
    # Filter out products from the same category
    other_category_products = []
    for rec_product_id, score in similarity_scores.items():
        if rec_product_id != product_id:
            rec_category = product_category_map.get(rec_product_id)
            if rec_category and rec_category not in exclude_categories:
                other_category_products.append((rec_product_id, score))
    
    # Sort and get the top n
    other_category_products.sort(key=lambda x: x[1], reverse=True)
    top_products = other_category_products[:n]
    
    # Get details of the recommended products
    recommendations = []
    for rec_product_id, score in top_products:
        product_details = get_product_details(rec_product_id)
        product_details['Similarity Score'] = score
        recommendations.append(product_details)
    
    return pd.DataFrame(recommendations)
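
The filtering step can also be written without an explicit loop. The sketch below is one possible vectorized variant on toy data; the function name `top_cross_category`, the product IDs, and the scores are all hypothetical and only illustrate the idea of masking out blocked categories before taking the top `n`.

```python
import pandas as pd

# Hypothetical similarity scores of every product to product "P1"
similarity_to_p1 = pd.Series(
    {"P1": 1.0, "P2": 0.9, "P3": 0.7, "P4": 0.4, "P5": 0.2}
)
# Hypothetical product -> category mapping
category_of = pd.Series({
    "P1": "Watches", "P2": "Watches", "P3": "Wallets",
    "P4": "Sunglasses", "P5": "Wallets",
})

def top_cross_category(scores, categories, product_id, n=2, exclude=()):
    """Return the n most similar products outside the blocked categories."""
    # Build a new set so the caller's `exclude` argument is left untouched
    blocked = set(exclude) | {categories[product_id]}
    candidates = scores.drop(product_id)
    in_blocked = categories.reindex(candidates.index).isin(blocked)
    return candidates[~in_blocked].nlargest(n)

print(top_cross_category(similarity_to_p1, category_of, "P1"))
print(top_cross_category(similarity_to_p1, category_of, "P1", exclude=["Wallets"]))
```

The first call returns P3 and P4; the second, with wallets excluded as well, returns only P4 because no other candidate is left outside the blocked categories.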

In [None]:
def recommend_for_user(user_id, n=10, include_cross_category=True):
    """
    Recommend products for a specific user
    """
    # Check if the user exists in our dataset
    if user_id not in predictions_df.index:
        print(f"User ID {user_id} not found in the dataset.")
        return pd.DataFrame()
    
    # Get this user's predicted ratings
    user_ratings = predictions_df.loc[user_id]
    
    # Get the products this user has already rated
    already_rated = interactions_df[interactions_df['User ID'] == user_id]['Product ID']
    
    # Remove already rated products
    recommendations = user_ratings.drop(already_rated, errors='ignore')
    
    # Sort by predicted rating
    recommendations = recommendations.sort_values(ascending=False).head(n)
    
    # Get details of recommended products
    rec_details = []
    for product_id, predicted_rating in recommendations.items():
        product_details = get_product_details(product_id)
        product_details['Predicted Rating'] = predicted_rating
        rec_details.append(product_details)
    
    rec_df = pd.DataFrame(rec_details)
    
    # If cross-category recommendations are requested
    if include_cross_category and not rec_df.empty:
        # Get the most common category in the recommendations
        top_category = rec_df['Category'].value_counts().index[0]
        
        # Get a sample product from this category
        sample_product = rec_df[rec_df['Category'] == top_category]['Product ID'].iloc[0]
        
        # Get cross-category recommendations
        cross_category_recs = recommend_cross_category_products(sample_product, n=3)
        
        # Add an identifier for cross-category recommendations
        if not cross_category_recs.empty:
            cross_category_recs['Recommendation Type'] = 'Cross-Category'
            rec_df['Recommendation Type'] = 'Standard'
            
            # Combine both types of recommendations under a fresh index
            rec_df = pd.concat([rec_df, cross_category_recs], ignore_index=True)
    
    return rec_df

## 8. Testing and Evaluating the Recommendations

In [None]:
# Test the similar products recommendation function
# Use the first product ID in the dataset
test_product_id = df_clean['Product ID'].iloc[0]
test_product_details = get_product_details(test_product_id)

print(f"Selected Product: {test_product_details}")
print("\nSimilar Products:")
similar_products = recommend_similar_products(test_product_id, n=5)
similar_products

In [None]:
# Test the cross-category recommendation function
print("\nCross-Category Recommendations:")
cross_category_recs = recommend_cross_category_products(test_product_id, n=5)
cross_category_recs

In [None]:
# Test the user recommendation function
test_user_id = interactions_df['User ID'].iloc[0]
print(f"Selected User ID: {test_user_id}")
print("\nRecommendations for User:")
user_recommendations = recommend_for_user(test_user_id, n=10)
user_recommendations
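
The cells above inspect recommendations by eye. A quantitative check would hold out some interactions before building the user-item matrix and compare their ratings with the model's predictions. The sketch below shows only the scoring step, on toy values; the helper `holdout_rmse` and its inputs are assumptions for illustration, and because the ratings in this notebook are simulated at random, such a score would say little about real-world quality.

```python
import numpy as np
import pandas as pd

def holdout_rmse(predictions, held_out):
    """RMSE between predicted and actual ratings for held-out (user, product, rating) triples."""
    errors = [
        predictions.loc[user, product] - rating
        for user, product, rating in held_out
    ]
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy predicted-rating table: rows are users, columns are products
toy_predictions = pd.DataFrame(
    [[4.0, 2.0], [3.0, 5.0]], index=[1, 2], columns=["A", "B"]
)
# Toy held-out interactions that the model did not see during training
toy_held_out = [(1, "A", 5.0), (2, "B", 4.0), (1, "B", 2.0)]

rmse = holdout_rmse(toy_predictions, toy_held_out)
print(f"Hold-out RMSE: {rmse:.4f}")
```

Here the errors are -1, +1, and 0, so the RMSE is the square root of 2/3, roughly 0.8165.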

## 9. Creating Special Cross-Category Recommendation Pairs

In [None]:
def create_special_recommendation_pairs():
    """
    Create specific cross-category pairs based on purchase patterns
    """
    # Find the pairs of categories that are frequently purchased together
    # First, group the interactions by user and collect the set of purchased categories
    user_categories = defaultdict(set)
    for _, row in merged_df.iterrows():
        user_categories[row['User ID']].add(row['Category'])
    
    # Count co-occurrences of categories
    category_pairs = defaultdict(int)
    for user, user_cats in user_categories.items():
        # Only consider users who bought from multiple categories
        if len(user_cats) > 1:
            # Count each unordered pair of categories once per user
            for cat1 in user_cats:
                for cat2 in user_cats:
                    # cat1 < cat2 visits each pair exactly once, already in sorted order
                    # (checking cat1 != cat2 would count every pair twice)
                    if cat1 < cat2:
                        category_pairs[(cat1, cat2)] += 1
    
    # Convert to DataFrame for easy visualization
    pairs_df = pd.DataFrame([(cat1, cat2, count) for (cat1, cat2), count in category_pairs.items()],
                          columns=['Category 1', 'Category 2', 'Co-occurrence'])
    
    # Sort by co-occurrence count
    pairs_df = pairs_df.sort_values(by='Co-occurrence', ascending=False)
    
    print("Top Category Pairs:")
    print(pairs_df.head(10))
    
    # Create and return the top pairs
    top_pairs = []
    for _, row in pairs_df.head(10).iterrows():
        cat1, cat2 = row['Category 1'], row['Category 2']
        
        # Get a popular product from each category
        product1 = df_clean[df_clean['Category'] == cat1].sort_values(
            by='popularity_score', ascending=False).iloc[0]
        product2 = df_clean[df_clean['Category'] == cat2].sort_values(
            by='popularity_score', ascending=False).iloc[0]
        
        top_pairs.append({
            'Category Pair': f"{cat1} + {cat2}",
            'Product 1': product1['Product ID'],
            'Product 1 Details': get_product_details(product1['Product ID']),
            'Product 2': product2['Product ID'],
            'Product 2 Details': get_product_details(product2['Product ID']),
            'Co-occurrence': row['Co-occurrence']
        })
    
    return top_pairs

In [None]:
# Create special recommendation pairs
special_pairs = create_special_recommendation_pairs()

# Display a few example pairs
for pair in special_pairs[:3]:
    print(f"\nRecommended Pair: {pair['Category Pair']}")
    print(f"Product 1: {pair['Product 1 Details']['Category']} - ID: {pair['Product 1']}")
    print(f"Product 2: {pair['Product 2 Details']['Category']} - ID: {pair['Product 2']}")
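
One compact way to count each unordered category pair exactly once per user is `itertools.combinations` over the sorted category set. The sketch below uses a made-up `user_categories` mapping to illustrate the counting; the user IDs and categories are assumptions.

```python
from collections import Counter
from itertools import combinations

# Made-up mapping: user ID -> set of categories the user bought from
user_categories = {
    1: {"Watches", "Wallets", "Sunglasses"},
    2: {"Watches", "Wallets"},
    3: {"Shoes"},  # a single category contributes no pairs
}

pair_counts = Counter()
for cats in user_categories.values():
    # Sorting first makes ("Wallets", "Watches") the only spelling of that pair
    pair_counts.update(combinations(sorted(cats), 2))

for pair, count in pair_counts.most_common():
    print(pair, count)
```

The pair (Wallets, Watches) is counted twice, once for each of the two users who bought from both categories, and every other pair is counted once.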

## 10. Function to Generate Recommendations for a Selected Product

In [None]:
def recommend_for_product(product_id, include_cross_category=True):
    """
    Generate recommendations based on a selected product
    """
    if product_id not in product_similarity_df.columns:
        print(f"Product ID {product_id} not found in the dataset.")
        return None
    
    # Get the product details
    product_details = get_product_details(product_id)
    product_category = product_details['Category']
    
    print(f"Selected Product: {product_details}")
    
    # Get similar products (from the same category)
    similar_products = []
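    # Drop the product itself explicitly; it is not guaranteed to sort first when scores tie
    similarity_scores = product_similarity_df[product_id].drop(product_id).sort_values(ascending=False)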
    
    for similar_id, score in similarity_scores.items():
        similar_details = get_product_details(similar_id)
        if similar_details['Category'] == product_category:
            similar_details['Similarity Score'] = score
            similar_products.append(similar_details)
            if len(similar_products) >= 5:
                break
    
    # Recommend cross-category products
    cross_category_products = []
    if include_cross_category:
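        # Find the special pairs that contain this product's category
        # (relies on the global special_pairs built in Section 9)
        related_categories = set()
        for pair in special_pairs:
            cats = pair['Category Pair'].split(' + ')
            # Match on the split names; a substring test would also match
            # e.g. 'Shoes' inside 'Running Shoes'
            if product_category in cats:
                related_categories.update(cat for cat in cats if cat != product_category)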
        
        # If we have related categories from special pairs, use those first
        if related_categories:
            for similar_id, score in similarity_scores.items():
                similar_details = get_product_details(similar_id)
                if similar_details['Category'] in related_categories:
                    similar_details['Similarity Score'] = score
                    similar_details['Recommendation Type'] = 'Cross-Category (Special Pair)'
                    cross_category_products.append(similar_details)
                    if len(cross_category_products) >= 3:
                        break
        
        # If we still need more cross-category recommendations
        if len(cross_category_products) < 3:
            for similar_id, score in similarity_scores.items():
                similar_details = get_product_details(similar_id)
                if similar_details['Category'] != product_category and \
                   not any(p['Product ID'] == similar_id for p in cross_category_products):
                    similar_details['Similarity Score'] = score
                    similar_details['Recommendation Type'] = 'Cross-Category'
                    cross_category_products.append(similar_details)
                    if len(cross_category_products) >= 3:
                        break
    
    # Convert lists to DataFrames
    similar_df = pd.DataFrame(similar_products)
    cross_df = pd.DataFrame(cross_category_products)
    
    return {
        'product_details': product_details,
        'similar_products': similar_df,
        'cross_category_products': cross_df
    }

## 11. Testing the Final Product Recommendation System

In [None]:
# Test the recommendation system with a random product
test_product = df_clean['Product ID'].sample(1).values[0]
recommendations = recommend_for_product(test_product)

if recommendations:
    print("\nSimilar Products in the Same Category:")
    print(recommendations['similar_products'])
    
    print("\nCross-Category Recommendations:")
    print(recommendations['cross_category_products'])

## 12. Example Use Cases

In [None]:
# Example 1: Recommend sunglasses and wallet with watches
# First, find a popular watch
watch_product = None
try:
    watch_product = df_clean[df_clean['Category'] == 'Watches'].sort_values(
        by='popularity_score', ascending=False).iloc[0]['Product ID']
    print(f"Selected Watch Product ID: {watch_product}")
    
    # Generate recommendations
    watch_recs = recommend_for_product(watch_product)
    if watch_recs:
        print("\nCross-Category Recommendations for Watch:")
        print(watch_recs['cross_category_products'])
except IndexError:  # .iloc[0] on an empty selection: the category is absent
    print("No 'Watches' category found in the dataset. Please check the actual categories.")

In [None]:
# Example 2: Recommend shoes with bags
# First, find a popular shoe
shoe_product = None
try:
    shoe_product = df_clean[df_clean['Category'] == 'Shoes'].sort_values(
        by='popularity_score', ascending=False).iloc[0]['Product ID']
    print(f"Selected Shoe Product ID: {shoe_product}")
    
    # Generate recommendations
    shoe_recs = recommend_for_product(shoe_product)
    if shoe_recs:
        print("\nCross-Category Recommendations for Shoes:")
        print(shoe_recs['cross_category_products'])
except IndexError:  # .iloc[0] on an empty selection: the category is absent
    print("No 'Shoes' category found in the dataset. Please check the actual categories.")

In [None]:
# Example 3: Recommend earbuds with smartphones
# First, find a popular smartphone
smartphone_product = None
try:
    smartphone_product = df_clean[df_clean['Category'] == 'Smartphones'].sort_values(
        by='popularity_score', ascending=False).iloc[0]['Product ID']
    print(f"Selected Smartphone Product ID: {smartphone_product}")
    
    # Generate recommendations
    smartphone_recs = recommend_for_product(smartphone_product)
    if smartphone_recs:
        print("\nCross-Category Recommendations for Smartphone:")
        print(smartphone_recs['cross_category_products'])
except IndexError:  # .iloc[0] on an empty selection: the category is absent
    print("No 'Smartphones' category found in the dataset. Please check the actual categories.")

## 13. Function to Get Recommendations for a New Product Search

In [None]:
def get_recommendations(category, num_recommendations=5):
    """
    Get recommendations for a specific category, including cross-category recommendations
    """
    if category not in df_clean['Category'].values:
        print(f"Category '{category}' not found in the dataset. Available categories are:")
        print(df_clean['Category'].unique())
        return None
    
    # Get popular products in this category
    category_products = df_clean[df_clean['Category'] == category].sort_values(
        by='popularity_score', ascending=False).head(num_recommendations)
    
    print(f"Top {num_recommendations} recommended products in {category}:")
    print(category_products[['Product ID', 'Category', 'Rating', 'Users Purchased', 'Price']])
    
    # Choose the most popular product for cross-category recommendations
    top_product = category_products.iloc[0]['Product ID']
    
    # Get cross-category recommendations
    print(f"\nCross-category recommendations to complement {category}:")
    cross_recs = recommend_cross_category_products(top_product, n=3)
    
    return {
        'top_products': category_products,
        'cross_category': cross_recs
    }

In [None]:
# Example usage: Get recommendations for electronics
try:
    electronics_recs = get_recommendations('Electronics')
    if electronics_recs:
        print("\nCross-Category Recommendations:")
        print(electronics_recs['cross_category'])
except Exception as e:
    print(f"Error: {e}")
    print("\nAvailable categories in the dataset:")
    print(df_clean['Category'].unique())

## 14. Conclusion and Next Steps

In this notebook, we've built a cross-category recommendation system that can suggest complementary products from different categories based on simulated user purchase patterns, ratings, and price considerations.

Key accomplishments:

1. Created a collaborative filtering model using matrix factorization (SVD)
2. Built a product similarity matrix for finding similar products
3. Developed specialized functions for cross-category recommendations
4. Identified frequently co-purchased category pairs
5. Implemented a recommendation system that can suggest both similar products and complementary products from different categories

Next steps to enhance this system:

1. Replace the simulated interactions with real user data when available, then retrain and tune the model
2. Add more features like product descriptions, images, or seasonal factors
3. Implement A/B testing to evaluate the effectiveness of recommendations
4. Develop a mechanism to refresh recommendations periodically as new data becomes available
5. Create an API service to integrate these recommendations into an e-commerce platform