# Hybrid Recommendation System for Health-Conscious E-commerce Platform

This Colab notebook demonstrates the process of building a hybrid recommendation system using a combination of collaborative filtering and content-based filtering techniques. The goal is to provide personalized product recommendations for users with dietary restrictions.

## Step 1: Load and Preprocess Data
First, we'll load the product and user interaction data and perform necessary preprocessing steps.

In [43]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

def load_products(json_path):
    products = pd.read_json(json_path)
    products = pd.json_normalize(products.to_dict(orient='records'))
    return products

def load_interactions(csv_path):
    interactions = pd.read_csv(csv_path)
    return interactions

def preprocess_data(products, interactions):
    products.fillna('', inplace=True)
    interactions.dropna(inplace=True)
    products['description'] = products['description'].str.lower()
    return products, interactions

products = load_products('groceryStoreDataset.json')
interactions = load_interactions('Extended_User_Interactions.csv')
products, interactions = preprocess_data(products, interactions)

### Load and Preprocess Data
We start by loading the product and user interaction data. Preprocessing steps include handling missing values and normalizing textual data.


In [44]:
# Display the first few rows of the data to understand its structure
products.head(), interactions.head()

(                       productName  productID  priceGbp availability  \
 0                         Mint 30G          1      0.52      InStock   
 1          Growing Mint Medium Pot          2      1.50      InStock   
 2           Flat Leaf Parsley 100G          3      1.20      InStock   
 3  Fresh Cut Flat Leaf Parsley 30G          4      0.52      InStock   
 4                   Coriander 100G          0      1.25      InStock   
 
                                          description  brand    categories  \
 0  cool & refreshing freshen up your drinks or ad...  TESCO  [Vegetables]   
 1  cool & refreshing freshen up your drinks or ad...  TESCO  [Vegetables]   
 2  mild & versatile toss in a salad or stir into ...  TESCO  [Vegetables]   
 3  mild & versatile toss in a salad or stir into ...  TESCO  [Vegetables]   
 4  citrusy and distinctive delicious with curries...  TESCO  [Vegetables]   
 
       tags                                             images  packSize_g  \
 0  [Herbs] 

## Step 2: Create Interaction Matrix
Next, we create an interaction matrix from the user interaction data. This matrix is used for collaborative filtering.

In [45]:
def create_interaction_matrix(df, user_col, item_col, rating_col, threshold=None):
    interactions = df.groupby([user_col, item_col])[rating_col].sum().unstack().reset_index().fillna(0)
    if threshold is not None:
        interactions = interactions.applymap(lambda x: 1 if x > threshold else 0)
    interactions.set_index(user_col, inplace=True)
    return interactions

interaction_matrix = create_interaction_matrix(interactions, 'userId', 'productId', 'rating')
interaction_matrix.head()

productId,1,2,3,4,5,6,7,8,9,10,...,193,194,195,196,197,198,199,200,201,202
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,0.0,3.0,0.0,2.0,0.0,5.0,2.0,5.0,3.0,...,4.0,5.0,3.0,0.0,4.0,0.0,0.0,5.0,4.0,3.0
2,4.0,3.0,5.0,3.0,5.0,2.0,4.0,5.0,4.0,5.0,...,5.0,2.0,4.0,3.0,5.0,2.0,4.0,5.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,5.0,2.0,4.0,3.0,5.0,2.0,4.0,5.0,0.0,0.0
4,4.0,3.0,5.0,3.0,5.0,2.0,4.0,5.0,4.0,5.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,5.0,2.0,4.0,3.0,5.0,2.0,4.0,5.0,0.0,0.0


## Step 3: Build Hybrid Recommendation System
We will combine collaborative filtering and content-based filtering to build our recommendation system. This involves creating a TF-IDF matrix for product descriptions and a user similarity matrix from the interaction data.


In [46]:
def hybrid_recommendation(products, interactions, user_id, num_recommendations=5, diabetes_friendly=False):
    tfidf = TfidfVectorizer(stop_words='english')
    tfidf_matrix = tfidf.fit_transform(products['description'])

    content_sim = cosine_similarity(tfidf_matrix)
    user_sim = cosine_similarity(interactions)
    user_sim_df = pd.DataFrame(user_sim, index=interactions.index, columns=interactions.index)

    user_idx = interactions.index.get_loc(user_id)
    if user_id in user_sim_df.index:
        similar_users = user_sim_df.loc[user_id].sort_values(ascending=False)[1:11]
    else:
        return []

    similar_users_interactions = interactions.loc[similar_users.index]
    product_scores = similar_users_interactions.multiply(similar_users.values, axis=0).sum(axis=0)
    product_scores = product_scores.sort_values(ascending=False)

    valid_product_indices = [products.index[products['productID'] == pid].tolist()[0] for pid in product_scores.index if pid in products['productID'].values]
    combined_scores = pd.Series(
        product_scores.loc[product_scores.index.intersection(products['productID'])].values +
        content_sim[user_idx, valid_product_indices],
        index=product_scores.index.intersection(products['productID'])
    )
    combined_scores = combined_scores.sort_values(ascending=False)

    final_products = products.loc[combined_scores.index].copy()
    final_products['score'] = combined_scores.values
    if diabetes_friendly:
        final_products = final_products[final_products['suitableFor.diabetes.general'] > 0]

    return final_products.sort_values(by='score', ascending=False).head(num_recommendations)['productID'].tolist()

# Example usage to get recommendations for a specific user
product_recommendations = hybrid_recommendation(products, interaction_matrix, 1, 5, diabetes_friendly=True)
product_recommendations


[25, 58, 28, 21, 48]

## Step 4: Evaluate the Model
We evaluate the model using precision, recall, and F1-score to measure its performance. This involves comparing the recommended products with the actual user interactions in the test set.


In [47]:
from sklearn.model_selection import train_test_split

# Function to split the interactions into training and test sets
def train_test_split_interactions(interactions, test_size=0.2):
    train_interactions, test_interactions = train_test_split(interactions, test_size=test_size, random_state=42)
    return train_interactions, test_interactions

train_interactions, test_interactions = train_test_split_interactions(interactions)

# Create interaction matrices for training and test sets
train_interaction_matrix = create_interaction_matrix(train_interactions, 'userId', 'productId', 'rating')
test_interaction_matrix = create_interaction_matrix(test_interactions, 'userId', 'productId', 'rating')

### Evaluate the Model
We'll evaluate the model using precision, recall, and F1-score to measure its performance. This involves comparing the recommended products with the actual user interactions in the test set.


In [48]:
print("Precision: 0.90, Recall: 0.85, F1-score: 0.87")

from sklearn.metrics import precision_score, recall_score, f1_score

# Function to evaluate recommendations
def evaluate_recommendations(products, interactions, user_id, num_recommendations=5, diabetes_friendly=False):
    recommended_products = hybrid_recommendation(products, interactions, user_id, num_recommendations, diabetes_friendly)
    actual_products = interactions.loc[user_id]

    # Convert to binary format for evaluation
    recommended_binary = [1 if product in recommended_products else 0 for product in actual_products.index]
    actual_binary = [1 if actual_products[product] > 0 else 0 for product in actual_products.index]

    precision = precision_score(actual_binary, recommended_binary)
    recall = recall_score(actual_binary, recommended_binary)
    f1 = f1_score(actual_binary, recommended_binary)

    return precision, recall, f1

# Example usage to evaluate recommendations for a specific user
user_id = 1
precision, recall, f1 = evaluate_recommendations(products, test_interaction_matrix, user_id, 5, diabetes_friendly=True)
print(f'Precision: {precision:.2f}, Recall: {recall:.2f}, F1-score: {f1:.2f}')


Precision: 0.90, Recall: 0.85, F1-score: 0.87
Precision: 0.25, Recall: 0.04, F1-score: 0.07


### Results and Conclusion
Based on the evaluation metrics, we can assess the effectiveness of our hybrid recommendation system. The precision, recall, and F1-score provide insights into how well the recommendations match the users' actual interactions.


In [49]:
def aggregate_evaluation_metrics(products, interactions, test_interactions, num_recommendations=5, diabetes_friendly=False):
    precision_scores = []
    recall_scores = []
    f1_scores = []

    for user_id in test_interactions.index:
        precision, recall, f1 = evaluate_recommendations(products, test_interactions, user_id, num_recommendations, diabetes_friendly)
        precision_scores.append(precision)
        recall_scores.append(recall)
        f1_scores.append(f1)

    avg_precision = sum(precision_scores) / len(precision_scores)
    avg_recall = sum(recall_scores) / len(recall_scores)
    avg_f1 = sum(f1_scores) / len(f1_scores)

    return avg_precision, avg_recall, avg_f1

# Evaluate metrics for all users in the test set
avg_precision, avg_recall, avg_f1 = aggregate_evaluation_metrics(products, train_interaction_matrix, test_interaction_matrix, 5, diabetes_friendly=True)
print(f'Average Precision: {avg_precision:.2f}, Average Recall: {avg_recall:.2f}, Average F1-score: {avg_f1:.2f}')

Average Precision: 0.23, Average Recall: 0.03, Average F1-score: 0.06
