# Product Recommendation System

## Project Overview

This project implements a collaborative filtering recommendation system that suggests products to users based on their ratings. The system analyzes customer reviews from Amazon's Fine Food Reviews dataset to deliver personalized product recommendations.

## Tools & Libraries Used

### Core Libraries:
- **Pandas** - Data manipulation and analysis
- **NumPy** - Numerical computing and array operations
- **Scikit-learn** - Machine learning implementation
  - `TfidfVectorizer` - Text feature extraction
  - `KMeans` - Clustering algorithm
  - `cosine_similarity` - Similarity computation
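`TfidfVectorizer` and `KMeans` are imported in the notebook but not used in the cells below. As a hedged sketch of how they could add a text-based signal, the following vectorizes a few hypothetical review summaries (the notebook would use `df['Summary']` or `df['Text']`), compares them with cosine similarity, and groups them into clusters. The summaries and cluster count are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical review summaries standing in for df['Summary']
summaries = [
    "bold coffee roast",
    "bold roast coffee flavor",
    "chewy dog treats",
    "healthy chewy dog treats",
]

# Turn each summary into a TF-IDF vector (English stop words removed)
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(summaries)  # sparse matrix, shape (4, n_terms)

# Content similarity between reviews; TF-IDF rows are L2-normalised by default
text_similarity = cosine_similarity(tfidf)

# Group reviews into 2 clusters of similar wording (KMeans accepts sparse input)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

print(np.round(text_similarity, 2))
print(labels)
```

Reviews with no shared terms get a similarity of 0, so text features like these could complement rating-based similarity for products with few ratings.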

### Techniques Applied:
- **Collaborative Filtering** - User-based recommendation, plus an item-item similarity matrix
- **Cosine Similarity** - Computing similarity between users and products
- **Pivot Tables** - Creating user-item rating matrices
- **Data Filtering** - Rating threshold application (minimum score of 3)
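The pivot-table and cosine-similarity steps can be illustrated on a tiny hypothetical ratings table in the same long format the notebook uses (`UserId`, `ProductId`, `Score`). Cosine similarity between two users' rating vectors a and b is dot(a, b) / (||a|| * ||b||), so users who rated no products in common score 0.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings in long format: one row per (user, product) rating
ratings = pd.DataFrame({
    "UserId":    ["u1", "u1", "u2", "u2", "u3"],
    "ProductId": ["p1", "p2", "p1", "p3", "p2"],
    "Score":     [5, 3, 4, 5, 2],
})

# Pivot into a user x product matrix; products a user has not rated become 0
matrix = ratings.pivot_table(index="UserId", columns="ProductId", values="Score", fill_value=0)

# Cosine similarity between user rows, labelled with user IDs
user_sim = pd.DataFrame(cosine_similarity(matrix), index=matrix.index, columns=matrix.index)

print(matrix)
print(user_sim.round(3))
```

Here u1 and u2 share only `p1`, while u2 and u3 share nothing, so their similarity is exactly 0.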

## Dataset

- **Source**: Amazon Fine Food Reviews (Reviews.csv)
- **Size**: 422,393 total reviews (subset of 10,000 used for analysis)
- **Key Columns**: `Id` (review ID), `ProductId`, `UserId`, `Score` (rating, 1-5), `Summary`, `Text`

## Project Outcomes

✅ **User-Based Recommendations**: Successfully identifies top-k similar users and recommends products they liked

✅ **Item Similarity Matrix**: Computes a cosine similarity matrix between all products in the dataset

✅ **Pivot-Table Structure**: Builds the user-item rating matrix with a pandas pivot table, which feeds every similarity computation

✅ **Personalized Results**: Filters recommendations based on user preferences and excludes already-rated products

✅ **Rating-Based Filtering**: Counts a product as "liked" only when its score is ≥ 3
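The user-based steps listed above (find the top-k similar users, keep products they scored ≥ 3, skip products the target user already rated, rank by how many neighbours liked each one) can be sketched as a single function. `recommend_user_based` and the toy matrix are hypothetical; this is a compact illustration under those assumptions, not the notebook's own code.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_user_based(matrix, user_id, k=2, min_score=3, n=5):
    """Rank unrated products by how many of the k most similar users liked them.

    matrix: user x product DataFrame where 0 means "not rated" (hypothetical helper).
    """
    # Cosine similarity between the target user's row and every row
    sims = pd.Series(cosine_similarity(matrix.loc[[user_id]], matrix)[0], index=matrix.index)
    # Drop the user itself by ID, then keep the k most similar users
    neighbours = sims.drop(user_id).sort_values(ascending=False).head(k).index

    own = matrix.loc[user_id]
    already_rated = set(own[own > 0].index)

    counts = {}
    for other in neighbours:
        row = matrix.loc[other]
        for product in row[row >= min_score].index:
            if product not in already_rated:
                counts[product] = counts.get(product, 0) + 1
    # Most-liked first; ties broken by product ID for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical ratings: rows are users, columns are products, 0 = not rated
matrix = pd.DataFrame(
    {"p1": [5, 4, 0, 4], "p2": [3, 0, 2, 0], "p3": [0, 5, 0, 3], "p4": [0, 4, 5, 0]},
    index=["u1", "u2", "u3", "u4"],
)
print(recommend_user_based(matrix, "u1", k=2))  # [('p3', 2), ('p4', 1)]
```

For `u1`, the two nearest users are `u4` and `u2`; both liked `p3`, only `u2` liked `p4`, and `p1` is skipped because `u1` already rated it.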

## Key Findings

- The system identifies similar users based on their rating patterns using cosine similarity
- Products are recommended based on what similar users with comparable tastes have rated highly
- The collaborative filtering approach works directly on the sparse user-item matrix, where unrated products are filled with 0
- New users and products can be incorporated by rebuilding the pivot table and recomputing the similarity matrices
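A zero-filled pivot table stores every user-product pair, even though most are empty. One way to keep only the observed ratings, assuming SciPy (already a scikit-learn dependency) is available, is a CSR sparse matrix; `cosine_similarity` accepts sparse input directly. The toy data below is hypothetical and assumes at most one rating per (user, product) pair, since `csr_matrix` sums duplicate entries.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings; assumes one rating per (user, product) pair
ratings = pd.DataFrame({
    "UserId":    ["u1", "u1", "u2", "u2", "u3"],
    "ProductId": ["p1", "p2", "p1", "p3", "p2"],
    "Score":     [5, 3, 4, 5, 2],
})

# Map string IDs to integer row/column positions (categories are sorted)
users = ratings["UserId"].astype("category")
products = ratings["ProductId"].astype("category")

# Only observed ratings are stored; missing entries are implicit zeros
sparse_matrix = csr_matrix(
    (
        ratings["Score"].to_numpy(dtype=float),
        (users.cat.codes.to_numpy(), products.cat.codes.to_numpy()),
    ),
    shape=(len(users.cat.categories), len(products.cat.categories)),
)

# dense_output=False keeps the user-user similarity matrix sparse as well
user_sim = cosine_similarity(sparse_matrix, dense_output=False)

print(sparse_matrix.nnz, "stored ratings")
print(user_sim.toarray().round(3))
```

Row `i` of `sparse_matrix` corresponds to `users.cat.categories[i]`, so recommendations can be mapped back to user and product IDs through the category lists.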

In [None]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv('/content/sample_data/Reviews.csv', on_bad_lines='skip', encoding='utf-8', engine='python')

df.head()

In [None]:
df.shape


In [None]:
# Keep the first 10,000 reviews for analysis
df = df.head(10000)
df.shape

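In [None]:
# Keep the columns needed for recommendation; UserId identifies the reviewer
# (Id is a per-review row ID, so it cannot group ratings by user)
df = df[['Id', 'ProductId', 'UserId', 'Score', 'Summary', 'Text']]
display(df.head())

# Distribution of rating scores (1-5)
df['Score'].value_counts().sort_index().plot(kind='bar')

In [None]:
# Select the required columns for recommendation: one row per (user, product) rating
ratings_df = df[['UserId', 'ProductId', 'Score']]
ratings_df

In [None]:
# User-item matrix: one row per user, one column per product, 0 = not rated
# (if a user rated the same product more than once, pivot_table averages the scores)
pivot_table = ratings_df.pivot_table(index='UserId', columns='ProductId', values='Score', fill_value=0)
pivot_table

In [None]:
# Item-item similarity: compare product columns, so transpose the user-item matrix
items_similarity = cosine_similarity(pivot_table.T)
items_similarity

In [None]:
# Example: get top-k recommendations for a given user
# Pick the most active reviewer so the example user has several ratings
user_id = ratings_df['UserId'].value_counts().index[0]
k = 5

# The user's rating vector as a 1 x n_products array
user_ratings = pivot_table.loc[user_id, :].to_numpy().reshape(1, -1)

# Cosine similarity between this user and every user (row) in the matrix
user_item_similarity = cosine_similarity(user_ratings, pivot_table)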



In [None]:
# Rank all users by similarity to the target user and drop the user itself by ID
# (user_item_similarity holds user-user similarities; dropping by ID stays correct
# even when another user ties with the target at similarity 1.0)
user_similarities = pd.Series(user_item_similarity[0], index=pivot_table.index).drop(user_id)

# Get the IDs of the top-k most similar users
similar_user_ids = user_similarities.sort_values(ascending=False).head(k).index

# Placeholder for recommended products
recommended_product_candidates = {}

# Get products already rated by the target user
user_rated_products = set(pivot_table.columns[pivot_table.loc[user_id] > 0])

for sim_user_id in similar_user_ids:
    # Get products rated by the similar user (score >= 3 as a threshold for 'liked')
    sim_user_liked_products = pivot_table.columns[pivot_table.loc[sim_user_id] >= 3].tolist()

    for product_id in sim_user_liked_products:
        # Recommend only products not yet rated by the target user
        if product_id not in user_rated_products:
            recommended_product_candidates[product_id] = recommended_product_candidates.get(product_id, 0) + 1

# Sort products by the number of similar users who liked them
top_recommendations = sorted(recommended_product_candidates.items(), key=lambda item: item[1], reverse=True)[:k]

print("Recommended Product IDs (User-Based):")
for product_id, count in top_recommendations:
    print(f"Product ID: {product_id} (Liked by {count} similar users)")
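The cells above rank products only through similar users; the item similarity matrix is not used to score products. As a hedged sketch of item-based scoring, the following predicts a score for each unrated product as the similarity-weighted average of the user's own ratings. The helper `recommend_item_based` and the toy matrix are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x product ratings, 0 = not rated
matrix = pd.DataFrame(
    {"p1": [5, 4, 0, 4], "p2": [3, 0, 2, 0], "p3": [0, 5, 0, 3], "p4": [0, 4, 5, 0]},
    index=["u1", "u2", "u3", "u4"],
)

# Item-item similarity: transpose so each product column becomes a row vector
item_sim = pd.DataFrame(cosine_similarity(matrix.T), index=matrix.columns, columns=matrix.columns)


def recommend_item_based(matrix, item_sim, user_id, n=5):
    """Score unrated products by the similarity-weighted average of the
    user's own ratings (hypothetical helper)."""
    own = matrix.loc[user_id]
    rated = own[own > 0]
    weights = item_sim.loc[:, rated.index]  # all products x products the user rated
    denom = weights.sum(axis=1)
    # Ratings are non-negative, so similarities are >= 0; products unrelated to
    # anything the user rated get NaN and are dropped below
    scores = weights.dot(rated) / denom.where(denom > 0)
    scores = scores.drop(rated.index).dropna()
    return scores.sort_values(ascending=False).head(n)


print(recommend_item_based(matrix, item_sim, "u1"))
```

`p3` is similar only to `p1` (which `u1` rated 5), so its predicted score is 5; `p4` blends `u1`'s ratings of `p1` and `p2` and lands between them.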

In [None]:

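def recommend_items(ratings_df, min_score=3, n=50):
    """Non-personalized list: up to n distinct products from ratings with Score >= min_score."""
    # Keep only ratings at or above the threshold, one row per product
    filtered_recommendations = (
        ratings_df[ratings_df['Score'] >= min_score]
        .drop_duplicates(subset='ProductId')
        .head(n)
    )

    return filtered_recommendations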


recommendations = recommend_items(ratings_df)

if not recommendations.empty:
    for index, row in recommendations.iterrows():
        print("Product ID:", row['ProductId'], "Score:", row['Score'])
else:
    print("No recommendations found.")
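`recommend_items` works on individual review rows rather than per-product statistics. A non-personalized alternative, sketched here with a hypothetical helper `popular_products` and toy data, aggregates each product's mean score and rating count and keeps products that clear both thresholds (the `min_ratings` cut-off is an assumption).

```python
import pandas as pd


def popular_products(ratings_df, min_score=3.0, min_ratings=2, n=50):
    """Products whose mean Score >= min_score and that have >= min_ratings ratings."""
    stats = ratings_df.groupby("ProductId")["Score"].agg(["mean", "count"])
    stats = stats[(stats["mean"] >= min_score) & (stats["count"] >= min_ratings)]
    # Most-rated first, then highest mean
    return stats.sort_values(["count", "mean"], ascending=False).head(n)


# Hypothetical ratings
ratings = pd.DataFrame({
    "ProductId": ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4"],
    "Score":     [5, 4, 2, 3, 1, 5, 4, 3],
})
print(popular_products(ratings))
```

`p2` is dropped for its low mean (2.0) and `p3` for having a single rating, which guards against one enthusiastic review dominating the list. A list like this can serve as a fallback for users with no ratings yet.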