In [None]:
import numpy as np
import pandas as pd

A hybrid recommendation system is a recommendation technique that offers a complete and balanced approach by mixing two or more recommendation techniques. It aims to provide more accurate, diverse and personalized recommendations by leveraging the strengths of the different techniques, which leads to a better user experience.

What is a Hybrid Recommendation System?

A hybrid recommendation system combines multiple recommendation techniques to provide more accurate and diverse recommendations to users. It uses the strengths of different approaches, such as collaborative filtering and content-based filtering, to overcome their limitations and improve the recommendation process.

You must have heard of collaborative filtering and content-based filtering before. Collaborative filtering analyzes user-item interactions and identifies similarities between users or items to make recommendations: it recommends items that users with similar preferences have liked or consumed. However, it may struggle with new or niche items that have few user interactions.

On the other hand, content-based filtering focuses on features and characteristics of items to recommend similar items to users based on their preferences. It examines attributes like product descriptions, brands, categories, and user profiles. However, it may not capture the complexity of user preferences and may result in less diverse recommendations.

This is where a hybrid recommendation system helps. By combining collaborative and content-based filtering, a hybrid system can offset the limitations of each technique. The collaborative filtering component captures the wisdom of the crowd, while the content-based filtering component takes into account the specific features and attributes of items. This combination allows the system to provide more accurate recommendations, especially when user-item interactions are sparse or when personalized recommendations are desired.

In [None]:
from google.colab import files
path_to_file = list(files.upload().keys())[0]

Saving fashion_products.csv to fashion_products.csv


In [None]:
df = pd.read_csv(path_to_file)
print(df.head())

   User ID  Product ID Product Name   Brand         Category  Price    Rating  \
0       19           1        Dress  Adidas    Men's Fashion     40  1.043159   
1       97           2        Shoes     H&M  Women's Fashion     82  4.026416   
2       25           3        Dress  Adidas  Women's Fashion     44  3.337938   
3       57           4        Shoes    Zara    Men's Fashion     23  1.049523   
4       79           5      T-shirt  Adidas    Men's Fashion     79  4.302773   

    Color Size  
0   Black   XL  
1   Black    L  
2  Yellow   XL  
3   White    S  
4   Black    M  


So this data is based on fashion products for men, women, and kids. Our goal is to create two recommendation systems using collaborative and content-based filtering and then combine the recommendation techniques to build a recommendation system using a hybrid approach.

In [None]:
!pip install scikit-surprise


Collecting scikit-surprise
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 772.0/772.0 kB 6.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp310-cp310-linux_x86_64.whl size=2811619 sha256=ba98a8867794ce5866dfd7085ac4f6da7e1f4b02605cee4a7e798dfe67eecae5
  Stored in directory: /root/.cache/pip/wheels/a5/ca/a8/4e28def53797fdc4363ca4af740db15a9c2f1595ebc51fb445
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.3


In [None]:
from surprise import Dataset, Reader, SVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

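I have imported the Surprise library, which you may not have used before, to get its SVD algorithm. SVD stands for Singular Value Decomposition; simply put, it is a matrix factorization technique commonly used in collaborative filtering. Note that Surprise's SVD is the variant popularized by Simon Funk during the Netflix Prize: instead of computing an exact decomposition, it learns user and item latent factors (plus bias terms) from the known ratings with stochastic gradient descent.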
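To make "matrix factorization" concrete, here is a minimal NumPy sketch of the Funk-style idea: give every user and every item a small latent vector and nudge both with stochastic gradient descent until their dot products approximate the known ratings. It is an illustration under simplifying assumptions (no bias terms, made-up ratings, arbitrary learning rate, regularization and epoch count), not Surprise's implementation.

```python
import numpy as np

# Made-up observed ratings as (user, item, rating) triples; all other pairs are unknown
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 4.0), (2, 2, 2.0), (3, 0, 1.0), (3, 2, 5.0)]
n_users, n_items, n_factors = 4, 3, 2

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item latent factors

lr, reg = 0.05, 0.02  # assumed learning rate and L2 regularization strength
for epoch in range(2000):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                    # error on one known rating
        p_u = P[u].copy()                        # keep the old user vector for Q's update
        P[u] += lr * (err * Q[i] - reg * P[u])   # SGD step for the user vector
        Q[i] += lr * (err * p_u - reg * Q[i])    # SGD step for the item vector

def predict(u, i):
    # Predicted rating is the dot product of the user and item factors
    return float(P[u] @ Q[i])

train_rmse = float(np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in ratings])))
print(f"training RMSE: {train_rmse:.3f}")
print(f"predicted rating for user 0, item 2 (unrated): {predict(0, 2):.2f}")
```

Once the vectors are learned, any user-item pair, including unrated ones, gets a score from the same dot product; that is what makes the model usable for recommendations.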


First Approach: Content-Based Filtering

Now let’s move forward by creating a recommendation system using content-based filtering:

In [None]:
# .copy() makes content_df an independent DataFrame, so adding the 'Content'
# column below does not raise pandas' SettingWithCopyWarning
content_df = df[['Product ID', 'Product Name', 'Brand',
                 'Category', 'Color', 'Size']].copy()
# Join all non-null fields of each row into one text string per product
content_df['Content'] = content_df.apply(lambda row: ' '.join(row.dropna().astype(str)), axis=1)

# Use TF-IDF vectorizer to convert content into a matrix of TF-IDF features
tfidf_vectorizer = TfidfVectorizer()
content_matrix = tfidf_vectorizer.fit_transform(content_df['Content'])

# TfidfVectorizer L2-normalises each row by default, so this dot product is the cosine similarity
content_similarity = linear_kernel(content_matrix, content_matrix)

# Surprise dataset for the collaborative filtering part (ratings lie between 1 and 5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['User ID',
                                'Product ID',
                                'Rating']], reader)

def get_content_based_recommendations(product_id, top_n):
    # Row position of the reference product in content_df / content_similarity
    index = np.flatnonzero(content_df['Product ID'].to_numpy() == product_id)[0]
    similarity_scores = content_similarity[index]
    # Rank all products from most to least similar and drop the reference product's
    # own row explicitly: products with identical attributes tie at 1.0, so the
    # reference product is not guaranteed to be ranked first
    ranked = similarity_scores.argsort()[::-1]
    similar_indices = ranked[ranked != index][:top_n]
    # Return a plain Python list so the result can be concatenated with other lists
    recommendations = content_df['Product ID'].iloc[similar_indices].tolist()
    return recommendations


(1). Create a DataFrame content_df containing the relevant product information: 'Product ID', 'Product Name', 'Brand', 'Category', 'Color', and 'Size'.

(2). Add a new column 'Content' to content_df, which concatenates all non-null values from each row (product information) into a single string. This string represents the content (features) of each product.

(3). Use the TF-IDF vectorizer to convert the 'Content' column into a matrix of TF-IDF features. TF-IDF is a numerical representation that reflects the importance of a word in a document relative to the entire collection of documents. Here, each product is treated as a "document", and the words in its 'Content' string are the "terms".

(4). Compute the content similarity matrix content_similarity with linear_kernel, i.e. the dot product between every pair of TF-IDF rows. Because TfidfVectorizer L2-normalises each row by default, this dot product equals the cosine similarity. The higher the value, the more similar two products are in terms of their content.

(5). Prepare the Surprise dataset data for collaborative filtering. It loads the 'User ID', 'Product ID', and 'Rating' columns from the original DataFrame df, with ratings on a 1 to 5 scale.

(6). Define the function get_content_based_recommendations(product_id, top_n), which takes a 'product_id' as input and returns 'top_n' product IDs as content-based recommendations. Within the function, find the row position of the given 'product_id' in content_df.

(7). Look up the similarity scores between the given product and all other products in the content_similarity matrix.

(8). Sort the similarity scores in descending order and skip the given product itself, keeping the next 'top_n' positions.

(9). Retrieve the 'Product ID' values of those 'top_n' similar products from content_df and return them as recommendations.
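A detail worth checking in the sorting step: skipping the first position of the ranking (a slice such as `[1:top_n + 1]`) assumes the reference product always ranks first. Products with identical attributes all score 1.0, and the order of tied entries after sorting is not something to rely on, so the reference product can land in second place and be recommended to itself. The sketch below uses a made-up similarity row and a stable sort (only so that the output is reproducible) to compare that slice with excluding the reference product by its position.

```python
import numpy as np

# Made-up similarity row for the product at position 0;
# the product at position 2 has identical attributes, so it also scores 1.0
similarity_scores = np.array([1.0, 0.3, 1.0, 0.7, 0.0])
index = 0  # position of the reference product

# Positions sorted by descending similarity (stable sort, then reversed)
ranked = similarity_scores.argsort(kind='stable')[::-1]

by_position = ranked[1:4]               # assumes the reference product is ranked first
by_index = ranked[ranked != index][:3]  # removes the reference product explicitly

print("slice [1:top_n + 1]:", by_position.tolist())
print("exclude by index:   ", by_index.tolist())
```

Here the slice keeps position 0 (the reference product) and drops its duplicate, while the index-based filter returns only other products.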



Here, we implemented the content-based filtering component of the hybrid recommender system. We started by selecting relevant features from the dataset, including the product ID, name, brand, category, colour, and size, and combined them into a single “Content” column for each product.


Next, we used the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer to convert the content into a TF-IDF feature matrix. This matrix represents the importance of each word in a product's content relative to the whole corpus.

We then calculated the similarity between products using cosine similarity (computed with linear_kernel, which equals cosine similarity on the L2-normalised TF-IDF rows). The resulting matrix holds the similarity between every pair of products.
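Why does a plain dot product (`linear_kernel`) give cosine similarity here? Cosine similarity is the dot product divided by the two vector lengths, and `TfidfVectorizer` scales every row to length 1 by default (`norm='l2'`), so the division becomes a no-op. A small NumPy check with made-up vectors:

```python
import numpy as np

# Two made-up term-weight vectors (e.g. TF-IDF weights before normalisation)
a = np.array([0.0, 1.2, 0.5, 2.0])
b = np.array([0.7, 0.0, 0.5, 1.5])

# Cosine similarity from its definition
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# L2-normalise first (as TfidfVectorizer does by default), then take the plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = a_unit @ b_unit

print(round(float(cosine), 6), round(float(dot_of_units), 6))
```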


To get content-based recommendations, we first found the index of the target product in the similarity matrix. Then we sorted the similarity scores in descending order and selected the top N similar products. Finally, we returned the product IDs of the recommended products.



Second Approach: Collaborative Filtering
Now let’s move forward by creating a recommendation system using collaborative filtering:

In [None]:
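# Train Surprise's SVD (matrix factorization) model on all available ratings
algo = SVD()
trainset = data.build_full_trainset()
algo.fit(trainset)

def get_collaborative_filtering_recommendations(user_id, top_n):
    # Anti-testset: every (user, item) pair that has no rating in the trainset
    testset = trainset.build_anti_testset()
    # Keep only the pairs that belong to the target user
    testset = [entry for entry in testset if entry[0] == user_id]
    # Predict a rating for each unrated item and rank the items, best first
    predictions = algo.test(testset)
    predictions.sort(key=lambda x: x.est, reverse=True)
    recommendations = [prediction.iid for prediction in predictions[:top_n]]
    return recommendations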

(1). Create an instance of the SVD algorithm called algo.

(2). Build a full training set trainset from the dataset data. This set includes all user-item interactions (ratings) available in the dataset.

(3). Fit (train) the SVD algorithm on the trainset. The algorithm learns latent factors for users and items from the existing ratings.

(4). Define the function get_collaborative_filtering_recommendations(user_id, top_n), which takes a 'user_id' as input and returns 'top_n' item IDs as collaborative filtering recommendations.

(5). Build the anti-testset with trainset.build_anti_testset(), which contains every (user, item) pair that has no rating in the training set, and keep only the pairs for 'user_id'. The result represents the items this user has not rated yet.

(6). Get predictions for all the items in the testset using the trained SVD model algo. Each prediction is the rating the user is estimated to give that item.

(7). Sort the predictions in descending order of the estimated rating (est), so the items with the highest predicted ratings come first.

(8). Extract the item IDs (iid) from the top 'top_n' predictions and store them in the 'recommendations' list.

(9). Return 'recommendations', the IDs of the 'top_n' items the user is most likely to rate highly according to collaborative filtering.

In the above code, we implemented the collaborative filtering component of the hybrid recommender system using Surprise's SVD algorithm.
First, we initialized the SVD algorithm and trained it on the full dataset. This step factorizes the user-item rating matrix into latent factors that capture the underlying patterns driving user preferences.
To generate collaborative filtering recommendations, we then created a test set of user-item pairs that are not present in the training set and filtered it to keep only the pairs for the target user specified by user_id.

Next, we used the trained SVD model to predict the test set item ratings. These predictions represent the estimated ratings that the user would assign to the items.

The predictions are then sorted by their estimated ratings in descending order. We selected the top N items with the highest estimated ratings as collaborative filtering recommendations for the user.
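The anti-testset idea can be sketched without Surprise: list every item the target user has not rated, score each one with a rating predictor, and keep the highest scores. In this sketch the predictor is a hypothetical, naively computed baseline (global mean plus a user bias and an item bias) standing in for the trained SVD model, and the ratings are made up.

```python
from statistics import mean

# Made-up ratings: {user: {item: rating}}
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 2, "i4": 5},
    "u3": {"i2": 2, "i3": 1, "i4": 4},
}
all_items = sorted({item for user_ratings in ratings.values() for item in user_ratings})
global_mean = mean(r for user_ratings in ratings.values() for r in user_ratings.values())

def item_bias(item):
    # Average deviation from the global mean among users who rated the item
    return mean(user_ratings[item] - global_mean
                for user_ratings in ratings.values() if item in user_ratings)

def user_bias(user):
    # Average deviation from the global mean across this user's ratings
    return mean(r - global_mean for r in ratings[user].values())

def predict(user, item):
    # Hypothetical baseline predictor standing in for the trained model
    return global_mean + user_bias(user) + item_bias(item)

def recommend(user, top_n):
    # Anti-testset for one user: the items this user has not rated yet
    unrated = [item for item in all_items if item not in ratings[user]]
    # Score every unrated item and keep the top_n highest predictions
    scored = sorted(((predict(user, item), item) for item in unrated), reverse=True)
    return [item for _, item in scored[:top_n]]

print(recommend("u1", 2))
```

Items the user already rated never appear in the result, which mirrors what filtering the anti-testset does in the Surprise code.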

And Finally, The Hybrid Approach
Now let’s combine content-based and collaborative filtering methods to build a recommendation system using the Hybrid method:

In [None]:
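def get_hybrid_recommendations(user_id, product_id, top_n):
    content_based_recommendations = get_content_based_recommendations(product_id, top_n)
    collaborative_filtering_recommendations = get_collaborative_filtering_recommendations(user_id, top_n)
    # Make sure both are plain lists: if one were a NumPy array, '+' would add
    # the product IDs element-wise instead of concatenating the two lists
    combined = list(content_based_recommendations) + list(collaborative_filtering_recommendations)
    # set() removes duplicates (it does not preserve the ranking order)
    hybrid_recommendations = list(set(combined))
    return hybrid_recommendations[:top_n]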

(1).  The function takes three inputs: 'user_id' (the ID of the user for whom recommendations are needed), 'product_id' (the ID of the product used for content-based recommendations), and 'top_n' (the number of top recommendations to return).


(2).  The function first calls the get_content_based_recommendations function, passing 'product_id' as an input, to get a list of content-based recommendations for the given product. These recommendations are products similar to the given product based on their content (attributes).


(3).  Next, the function calls the get_collaborative_filtering_recommendations function, passing 'user_id' as an input, to get a list of collaborative filtering recommendations for the given user. These recommendations are items that the user is likely to enjoy based on their past interactions with the system.


(4).  The hybrid_recommendations list is created by joining the content-based and collaborative filtering recommendations into one list with the + operator. The set() function then removes any duplicate items that appear in both recommendation lists.


(5).  Finally, the function returns the first 'top_n' items from the hybrid_recommendations list as the hybrid recommendations for the given user. Because a set has no defined order, which of the combined items make the cut is effectively arbitrary; the result mixes content-based and collaborative filtering suggestions, aiming to give the user a diverse and personalized set of recommendations.
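Joining the two lists with `+` only concatenates when both operands are Python lists. The content-based IDs come from a pandas lookup; if that lookup returns a NumPy array (as `.values` does), `+` adds the IDs element-wise instead, silently producing product IDs that may not exist. A quick check with made-up IDs:

```python
import numpy as np

content_ids = np.array([12, 40, 7])  # e.g. the result of a .values lookup
collab_ids = [40, 99, 3]             # a plain Python list

summed = content_ids + collab_ids              # element-wise addition, not concatenation
concatenated = list(content_ids) + collab_ids  # list + list concatenates

print(summed.tolist())
print([int(x) for x in concatenated])
```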

In the above code, we combined content-based and collaborative filtering approaches to create a hybrid recommender system.
The get_hybrid_recommendations function takes the user_id, the product_id and the desired number of top_n recommendations as input.

First, it calls the get_content_based_recommendations function to retrieve a list of content-based recommendations for the specified product_id. These recommendations are based on the similarity between the characteristics of the given product and other products in the dataset.

Then it calls the get_collaborative_filtering_recommendations function to get a list of collaborative filtering recommendations for the specified user_id. These recommendations are generated by leveraging historical user-item interactions and estimating user preferences based on similar user behaviours.

Next, we combine the content-based and collaborative filtering recommendations by taking the union of the two lists. The union can hold up to 2 × top_n products and only top_n of them are returned, so the final list is a mix of both sources but is not guaranteed to contain items from each.
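Because `set()` discards order, the `top_n` cut keeps an arbitrary subset of the union. One alternative, sketched here as an assumption rather than the method used above, is to interleave the two ranked lists and drop duplicates while keeping first occurrences, so both sources contribute to the top of the list in rank order. `interleave_recommendations` is a hypothetical helper name.

```python
from itertools import chain, zip_longest

def interleave_recommendations(content_based, collaborative, top_n):
    """Alternate between two ranked lists, skip duplicates, keep at most top_n IDs."""
    missing = object()  # sentinel that pads the shorter list
    merged, seen = [], set()
    pairs = zip_longest(content_based, collaborative, fillvalue=missing)
    for item in chain.from_iterable(pairs):
        if item is missing or item in seen:
            continue
        seen.add(item)
        merged.append(item)
        if len(merged) == top_n:
            break
    return merged

print(interleave_recommendations([5, 8, 2, 9], [8, 1, 4], top_n=5))
```

Inside `get_hybrid_recommendations`, this would replace the `set()` step, e.g. `return interleave_recommendations(list(content_based_recommendations), list(collaborative_filtering_recommendations), top_n)`.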


Here’s how to use our hybrid recommendation system to recommend products based on the product that a user is viewing:

(1). The code sets the user_id to 6, which represents the ID of the user for whom recommendations are needed.


(2). The product_id is set to 11, which represents the ID of the product used as a reference for content-based recommendations. The content-based recommendations will be products similar to this reference product.

(3). The variable top_n is set to 10, which indicates the number of top recommendations the system should provide to the user.

(4). The get_hybrid_recommendations function is called with user_id, product_id, and top_n as inputs. This function combines content-based and collaborative filtering recommendations to create hybrid recommendations for the user.

(5). The recommendations variable stores the hybrid recommendations returned by the get_hybrid_recommendations function.

(6). The code then prints the hybrid recommendations in a loop. For each recommendation, it prints the recommendation's position in the list (i + 1) and the recommended product's ID (recommendation).

In [None]:
user_id = 6
product_id = 11
top_n = 10
recommendations = get_hybrid_recommendations(user_id, product_id, top_n)

print(f"Hybrid Recommendations for User {user_id} based on Product {product_id}:")
for i, recommendation in enumerate(recommendations):
    print(f"{i + 1}. Product ID: {recommendation}")

Hybrid Recommendations for User 6 based on Product 11:
1. Product ID: 962
2. Product ID: 1188
3. Product ID: 1104
4. Product ID: 208
5. Product ID: 1234
6. Product ID: 1143
7. Product ID: 440
8. Product ID: 1113
9. Product ID: 1147
10. Product ID: 957


Summary


So this is how to create a hybrid recommendation system using Python. By combining collaborative filtering and content-based filtering, a hybrid system uses the strengths of each approach to offset the other's limitations and produce more accurate and diverse recommendations.

Content-based Filtering:




Content-based filtering recommends items based on their attributes or features. It uses user profiles and item attributes to identify items that are similar to the ones a user has shown interest in before.




Example: Music Recommendations



Suppose we have a music recommendation system where each song is described by tags. Content-based filtering analyzes the tags of a song the user liked and recommends songs with similar tags.

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Sample data representing songs and their tags
data = {
    'Song': ['Song A', 'Song B', 'Song C', 'Song D', 'Song E'],
    'Tags': ['Pop, Dance', 'Pop, Rock', 'Rock', 'Jazz', 'Jazz, Blues']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Create a TF-IDF vectorizer to convert tags into numerical vectors
tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the data
tfidf_matrix = tfidf_vectorizer.fit_transform(df['Tags'])

# Compute the cosine similarity of the TF-IDF matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Function to get song recommendations based on similarity
def get_recommendations(song_name, cosine_sim=cosine_sim):
    idx = df[df['Song'] == song_name].index[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:4]  # Get top 3 similar songs (excluding the input song)
    song_indices = [i[0] for i in sim_scores]
    return df['Song'].iloc[song_indices]

# Get recommendations for a song
input_song = 'Song A'
recommendations = get_recommendations(input_song)
print(f"Recommendations for '{input_song}':")
print(recommendations)


Recommendations for 'Song A':
1    Song B
2    Song C
3    Song D
Name: Song, dtype: object


(1). The code imports the necessary libraries, including Pandas for data handling, and the TF-IDF vectorizer and linear_kernel from scikit-learn for text feature extraction and similarity computation.


(2). A sample dataset is provided, representing songs and their associated tags. The 'Song' column contains the song names, and the 'Tags' column contains the tags describing the genre or characteristics of each song.


(3). A DataFrame 'df' is created using the sample data, representing the songs and their tags.


(4). A TF-IDF vectorizer 'tfidf_vectorizer' is initialized to convert the tags into numerical vectors. The TF-IDF technique calculates the importance of each tag relative to the entire collection of songs.


(5). The TF-IDF matrix 'tfidf_matrix' is computed by fitting and transforming the tag data from the 'df' DataFrame using the 'tfidf_vectorizer'.


(6). The cosine similarity matrix 'cosine_sim' is computed based on the 'tfidf_matrix'. The cosine similarity measures how similar two songs are based on their tag vectors.


(7). A function 'get_recommendations(song_name, cosine_sim=cosine_sim)' is defined to provide song recommendations based on similarity. It takes the 'song_name' as input and the 'cosine_sim' matrix (defaulted to the precomputed similarity matrix).


(8). Inside the function, the index 'idx' of the input 'song_name' in the 'df' DataFrame is determined. This index corresponds to the song's position in the DataFrame.


(9). The similarity scores between the input song and all other songs are obtained from the 'cosine_sim' matrix. These scores are sorted in descending order to get the most similar songs first.


(10). The function returns the names of the top 3 most similar songs as recommendations (excluding the input song) by accessing their indices in the DataFrame and retrieving their 'Song' names.


(11). The code then uses the 'get_recommendations' function to get recommendations for the song 'Song A' and prints the results.
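In this example only Song B shares a tag with Song A, so Song C and Song D have similarity 0 and appear only because exactly three results are requested. Dropping zero scores avoids recommending unrelated songs. To stay dependency-light, the sketch below uses binary tag vectors with cosine similarity instead of TF-IDF weights (an assumption; the ranking idea is the same), and `get_nonzero_recommendations` is a hypothetical helper name.

```python
import numpy as np

songs = ['Song A', 'Song B', 'Song C', 'Song D', 'Song E']
tags = [{'pop', 'dance'}, {'pop', 'rock'}, {'rock'}, {'jazz'}, {'jazz', 'blues'}]

vocab = sorted(set().union(*tags))
# Binary song-by-tag matrix; rows scaled to unit length so dot products are cosines
X = np.array([[1.0 if t in song_tags else 0.0 for t in vocab] for song_tags in tags])
X /= np.linalg.norm(X, axis=1, keepdims=True)
cosine_sim = X @ X.T

def get_nonzero_recommendations(song_name, top_n=3):
    idx = songs.index(song_name)
    # Rank the other songs by similarity, most similar first
    ranked = [j for j in np.argsort(-cosine_sim[idx], kind='stable') if j != idx]
    # Keep only songs that share at least one tag with the input song
    return [songs[j] for j in ranked if cosine_sim[idx, j] > 0][:top_n]

print(get_nonzero_recommendations('Song A'))
print(get_nonzero_recommendations('Song D'))
```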







Example - 2

Collaborative Filtering:


Collaborative filtering is based on the idea that users who have agreed in the past will agree in the future. It makes recommendations by finding similarities between users or items based on their interactions with the system.


There are two main types of collaborative filtering:

a. User-based Collaborative Filtering: It recommends items to a target user based on the preferences of other users who are similar to the target user.



b. Item-based Collaborative Filtering: It recommends items that are similar to the items the target user has already rated highly, where the similarity between two items is computed from how users have rated both of them.

Example: Movie Recommendations



Suppose we have a movie recommendation system where users rate movies on a scale of 1 to 5. Collaborative filtering will analyze the ratings of users and recommend movies based on similarities between users' preferences.

In [None]:
# Import necessary libraries
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load data
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.25)

# Use user-based collaborative filtering
sim_options = {
    'name': 'cosine',    # Use cosine similarity
    'user_based': True   # Use user-based collaborative filtering
}
algo = KNNBasic(sim_options=sim_options)

# Train the model
algo.fit(trainset)

# Predict ratings for testset
predictions = algo.test(testset)

# Evaluate the model (rmse() prints the score and also returns it)
accuracy = rmse(predictions)


Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.0197


In [None]:
# Import necessary libraries
from surprise import Dataset, Reader, KNNBasic
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load your custom dataset or use ml-100k
# Modify the path or create a custom dataset based on your data
data = Dataset.load_builtin('ml-100k')

# Split the data into training and testing sets (modify test_size as needed)
trainset, testset = train_test_split(data, test_size=0.25)

# Use user-based collaborative filtering
sim_options = {
    'name': 'cosine',    # Use cosine similarity
    'user_based': True   # Use user-based collaborative filtering
}
algo = KNNBasic(sim_options=sim_options)

# Train the model
algo.fit(trainset)

# Function to get movie recommendations for a user
# (user_id is a raw ID; in ml-100k raw IDs are strings such as '2')
def get_movie_recommendations(user_id, algo, num_recommendations=5):
    # Check if the provided user ID exists in the training set
    try:
        user_inner_id = trainset.to_inner_uid(user_id)
    except ValueError:
        return []
    # Inner IDs of the movies the user has already rated
    user_movies = {movie_id for movie_id, _ in trainset.ur[user_inner_id]}
    remaining_movies = [movie_id for movie_id in trainset.all_items() if movie_id not in user_movies]
    # algo.predict() expects raw IDs, so convert each inner movie ID before predicting
    predictions = [algo.predict(user_id, trainset.to_raw_iid(movie_id)) for movie_id in remaining_movies]
    recommendations = sorted(predictions, key=lambda x: x.est, reverse=True)[:num_recommendations]
    # prediction.iid is already the raw movie ID
    return [prediction.iid for prediction in recommendations]

# Example: Get movie recommendations for a specific user (change 'user_id' accordingly)
user_id = '2'  # Replace '2' with the user ID for which you want movie recommendations

# Get movie recommendations for the given user
num_recommendations = 10  # Change the number of recommendations as needed
recommendations = get_movie_recommendations(user_id, algo, num_recommendations)

# Print movie recommendations
if recommendations:
    print(f"Top {num_recommendations} movie recommendations for user {user_id}:")
    for movie_id in recommendations:
        print(movie_id)
else:
    print(f"User {user_id} does not exist in the training data.")


Computing the cosine similarity matrix...
Done computing similarity matrix.
Top 10 movie recommendations for user 2:
333
498
41
356
423
77
176
217
174
327


(1). The code imports the necessary libraries, including Surprise for collaborative filtering, and the train_test_split function for splitting the dataset into training and testing sets.



(2). The Surprise library has a built-in dataset ('ml-100k') that contains movie ratings by users. If you want to use your own dataset, you can modify the 'data' variable to load your custom dataset.



(3). The dataset is split into training and testing sets using the train_test_split function. This is a common practice to evaluate the model's performance.



(4). The recommendation model is initialized with user-based collaborative filtering using the KNNBasic algorithm with cosine similarity. The KNNBasic algorithm looks for similar users to make recommendations.



(5).  The model is trained on the training set using the fit method.



(6). A function 'get_movie_recommendations' is defined to get movie recommendations for a specific user.



(7). The function checks if the provided 'user_id' exists in the training set. If not, it returns an empty list since the user is not present in the data.



(8).  If the user exists in the training set, the function finds the internal user ID using 'trainset.to_inner_uid(user_id)' and retrieves the movies the user has already rated.



(9).  The function then creates a list of remaining movies that the user has not rated.



(10).  For each remaining movie, the function predicts the user's rating using the collaborative filtering model (KNNBasic) and stores the predictions.



(11).  The predictions are sorted based on the estimated ratings (est) in descending order, and the top 'num_recommendations' movies are selected as recommendations.



(12).   The function returns the 'num_recommendations' movie IDs as recommendations for the given user.



(13).  The code provides an example by getting movie recommendations for user_id '2' and printing them to the console.
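Step (4) says KNNBasic looks for similar users. Surprise's documentation gives its user-based prediction as a similarity-weighted average of the neighbours' ratings for the item: predicted rating(u, i) = Σ sim(u, v) × r(v, i) / Σ sim(u, v), summed over the k users v most similar to u who rated i. Below is a minimal NumPy sketch of that formula on made-up ratings, assuming cosine similarity computed only over co-rated items, 0 meaning "not rated", and a global-mean fallback when no neighbour exists; `predict_user_based` is a hypothetical helper name.

```python
import numpy as np

# Made-up user x item ratings; 0 means "not rated"
R = np.array([
    [5, 3, 0, 4],
    [4, 3, 4, 5],
    [1, 5, 2, 1],
    [5, 2, 5, 0],
], dtype=float)

def cosine_on_common_items(u, v):
    # Cosine similarity restricted to the items both users rated
    common = (R[u] > 0) & (R[v] > 0)
    if not common.any():
        return 0.0
    a, b = R[u, common], R[v, common]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_user_based(u, i, k=2):
    # Candidate neighbours: other users who rated item i
    candidates = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    # Keep the k most similar ones
    sims = sorted(((cosine_on_common_items(u, v), v) for v in candidates), reverse=True)[:k]
    num = sum(s * R[v, i] for s, v in sims)
    den = sum(s for s, _ in sims)
    # Assumed fallback: global mean of known ratings when no usable neighbour exists
    return num / den if den > 0 else float(R[R > 0].mean())

print(round(predict_user_based(0, 2), 2))
```

Users 3 and 1 rate items much like user 0 does, so their ratings for item 2 (5 and 4) dominate the prediction.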
