# Hybrid Recommender System

## Introduction
This notebook aims to create a hybrid recommender system by combining content-based filtering and collaborative filtering techniques. By leveraging the strengths of both methods, the hybrid system can provide more accurate and personalized recommendations.

### Content-Based Filtering
Content-based filtering recommends items based on the attributes of the items and the preferences of the user. It uses item features and user profiles to find items similar to those the user has liked in the past.
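
As a rough illustration of the idea, the sketch below ranks items by cosine similarity to a user profile built from liked items. The feature matrix, the `liked` list, and the helper `cosine_scores` are hypothetical and are not part of the model used later in this notebook.

```python
import numpy as np

# Hypothetical item feature matrix: rows are items, columns are genre flags
# (Comedy, Action, Fantasy). The values are illustrative only.
item_features = np.array([
    [1.0, 0.0, 1.0],   # item 0
    [1.0, 0.0, 0.0],   # item 1
    [0.0, 1.0, 0.0],   # item 2
])

# Items the user liked; the profile is the mean of their feature vectors.
liked = [0]
user_profile = item_features[liked].mean(axis=0)

def cosine_scores(profile, features):
    """Cosine similarity between one profile vector and every item row."""
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(profile)
    # Guard against division by zero for all-zero rows.
    return features @ profile / np.where(norms == 0, 1.0, norms)

scores = cosine_scores(user_profile, item_features)
ranking = np.argsort(scores)[::-1]  # most similar item first
print(ranking)
```

In practice the liked item itself would be filtered out of the ranking before recommending.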

### Collaborative Filtering
Collaborative filtering recommends items by analyzing patterns in user-item interactions. It uses historical user ratings to find similarities between users or items, making predictions based on the collective behavior of all users.
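
One simple variant of this idea is user-based neighbourhood filtering, sketched below on a toy matrix. This is only an illustration under stated assumptions: the ratings and the helper `predict_user_based` are hypothetical, and the notebook itself relies on a pre-trained factorization model rather than this method.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, NaN = not rated.
ratings = np.array([
    [5.0, 4.0, np.nan],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
])

def predict_user_based(ratings, user, item):
    """Predict a rating as a similarity-weighted mean of other users' ratings."""
    filled = np.nan_to_num(ratings)  # treat missing ratings as 0 for similarity
    target = filled[user]
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or np.isnan(ratings[other, item]):
            continue
        norm = np.linalg.norm(filled[other]) * np.linalg.norm(target)
        if norm == 0:
            continue
        sim = float(filled[other] @ target) / norm  # cosine similarity
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den > 0 else float('nan')

pred = predict_user_based(ratings, user=0, item=2)
print(round(pred, 2))
```

User 0 rates the first two items much like user 1, so the prediction lands closer to user 1's rating of item 2 than to user 2's.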

## Hybrid Approach
By integrating content-based filtering with collaborative filtering, the hybrid recommender system can:
- **Enhance Accuracy**: Utilize both item attributes and user interaction patterns to make more informed recommendations.
- **Address the Cold Start Problem**: Offset the weaknesses of each method: collaborative filtering struggles with users and items that have no interaction history, while content-based filtering depends on detailed item attributes.
- **Improve Personalization**: Combine user preferences with overall user behavior to provide highly personalized recommendations.
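
A common way to combine two scores is a weighted average, sketched below. The function `blend_scores`, the weight `alpha`, and the sample values are assumptions made for illustration. With `alpha=0.5` it reduces to the plain average used at the end of this notebook.

```python
import numpy as np

def blend_scores(collab, content, alpha=0.5):
    """Weighted average of two score arrays; alpha is the collaborative weight."""
    collab = np.asarray(collab, dtype=float)
    content = np.asarray(content, dtype=float)
    blended = alpha * collab + (1.0 - alpha) * content
    # Fall back to the content score where no collaborative score exists
    # (for example, an item with no ratings yet).
    return np.where(np.isnan(collab), content, blended)

collab = [8.0, np.nan, 6.0]   # hypothetical collaborative predictions
content = [10.0, 7.0, 4.0]    # hypothetical content-based predictions
print(blend_scores(collab, content))  # [9. 7. 5.]
```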

## Outline
1. **Initialization**: Import necessary libraries and set random seeds for reproducibility.
2. **Data Preprocessing**: Prepare the dataset, normalize ratings, and handle missing values.
3. **Content-Based Filtering**: Implement the content-based filtering approach using item attributes.
4. **Collaborative Filtering**: Implement the collaborative filtering approach using user-item interaction data.
5. **Hybrid Model**: Combine the results of both content-based and collaborative filtering to create the final hybrid recommendations.
6. **Evaluation**: Assess the performance of the hybrid recommender system using appropriate metrics.
7. **Conclusion**: Summarize the benefits and performance of the hybrid approach.

By following these steps, this notebook will demonstrate how to build a robust hybrid recommender system that leverages the advantages of both content-based and collaborative filtering techniques.

In [194]:
import numpy as np
import pandas as pd
import tensorflow as tf

# Collaborative Filtering

## Prepare the data

In [195]:
rating=pd.read_csv('/content/rating_sampled.csv')
anime = pd.read_csv('/content/anime.csv')


In [196]:
rating = rating.sample(n=10000,random_state=44)

In [197]:
rating_pivot = rating.pivot(index='anime_id', columns='user_id', values='rating')
binary_pivot = rating_pivot.notna().astype(int)
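
To make the layout concrete, here is a small sketch of the same pivot and mask on a hypothetical frame (`toy` and its values are made up). Rows end up as anime and columns as users. Note that `DataFrame.pivot` raises a `ValueError` if the same (anime, user) pair appears twice, so duplicates may need to be dropped first.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format ratings, mirroring the columns used above.
toy = pd.DataFrame({
    'user_id':  [1, 1, 2, 3],
    'anime_id': [10, 20, 10, 30],
    'rating':   [8, 6, 9, 7],
})

# Rows are anime, columns are users; unrated pairs become NaN.
toy_pivot = toy.pivot(index='anime_id', columns='user_id', values='rating')
toy_mask = toy_pivot.notna().astype(int)  # 1 where a rating exists

print(toy_pivot.shape)               # (3, 3)
print(int(toy_mask.values.sum()))    # 4
```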

In [198]:
Y = np.array(rating_pivot)
R = np.array(binary_pivot)

# Set dimensions (rows of Y are anime, columns are users)
num_movies, num_users = Y.shape
num_features = 10

# Initialize X, W, and b with random values
X = np.random.rand(num_movies, num_features)
W = np.random.rand(num_users, num_features)
b = np.random.rand(1, num_users)

# Display shapes to confirm the correct initialization
print("Shape of Y:", Y.shape)
print("Shape of R:", R.shape)
print("Shape of X:", X.shape)
print("Shape of W:", W.shape)
print("Shape of b:", b.shape)

Shape of Y: (3395, 9588)
Shape of R: (3395, 9588)
Shape of X: (3395, 10)
Shape of W: (9588, 10)
Shape of b: (1, 9588)


## Define Functions

In [199]:
# Load the pre-trained model variables from the .npz file
def load_model(file_path):
    data = np.load(file_path)
    W = tf.Variable(data['W'], dtype=tf.float64, name='W')
    X = tf.Variable(data['X'], dtype=tf.float64, name='X')
    b = tf.Variable(data['b'], dtype=tf.float64, name='b')
    Ymean = data['Ymean']
    return W, X, b, Ymean

# Mean-normalize each row of Y (one row per item), using only the entries marked as rated in R
def normalize_ratings(Y, R):
    Ymean = np.zeros((Y.shape[0], 1))
    Ynorm = np.zeros(Y.shape)
    for i in range(Y.shape[0]):
        idx = np.where(R[i] == 1)[0]
        if len(idx) > 0:
            Ymean[i] = np.mean(Y[i, idx])
            Ynorm[i, idx] = Y[i, idx] - Ymean[i]
    return Ynorm, Ymean

# Make recommendations
def recommend_movies(user_ratings, W, X, b, Ymean, movie_list, num_recommendations=10):
    # Predicted ratings for every (item, user) pair, with the item means added back
    p = np.matmul(X, W.T) + b
    pm = np.asarray(p + Ymean)

    # Predictions for the user stored in column 0 of the model
    my_predictions = pm[:, 0]

    # Item indices sorted by predicted rating, highest first
    ix = np.argsort(my_predictions)[::-1]

    # Collect the top unrated items, skipping anything the user already rated,
    # until the requested number of recommendations is reached
    recommendations = []
    for j in ix:
        if user_ratings[j] == 0:
            recommendations.append((movie_list[j], my_predictions[j]))
        if len(recommendations) == num_recommendations:
            break

    return recommendations, my_predictions
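
The prediction step relies on NumPy broadcasting: `b` has one entry per user (a row vector) and `Ymean` has one entry per item (a column vector). The sketch below reproduces the same expression on toy arrays; all sizes and `*_toy` names are hypothetical and unrelated to the pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

n_items, n_users, n_feat = 4, 3, 2           # hypothetical toy sizes
X_toy = rng.normal(size=(n_items, n_feat))   # item factors
W_toy = rng.normal(size=(n_users, n_feat))   # user factors
b_toy = rng.normal(size=(1, n_users))        # per-user bias
Ymean_toy = np.full((n_items, 1), 7.0)       # per-item mean rating

# Same expression as in recommend_movies: one row per item, one column per user.
pred_toy = X_toy @ W_toy.T + b_toy + Ymean_toy

print(pred_toy.shape)  # (4, 3)
```

Entry `pred_toy[i, j]` equals the dot product of item `i`'s and user `j`'s factors, plus user `j`'s bias, plus item `i`'s mean.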


In [200]:
# Load your model
W, X, b, Ymean = load_model('/content/cofi_model8.npz')
W=np.array(W)
X=np.array(X)

## Recommendation

In [201]:
# Example usage
user_ratings = np.array(Y.T[2026])
user_ratings = np.nan_to_num(user_ratings)

# Titles looked up by ID so they follow the row order of rating_pivot
movie_list = anime.set_index('MAL_ID')['Name'].reindex(rating_pivot.index).to_numpy()

In [202]:
recommendations,my_predictions = recommend_movies(user_ratings, W, X, b, Ymean, movie_list, num_recommendations=10)

print("Top recommendations for the user:")
# Use `pred` as the loop variable so the `rating` DataFrame is not overwritten
for movie, pred in recommendations:
    print(f"Predicting rating {pred:.2f} for movie {movie}")

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(user_ratings)):
    if user_ratings[i] > 0:
        print(f'Original {user_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movie_list[i]}')


Top recommendations for the user:
Predicting rating 8.67 for movie Gakuen Handsome
Predicting rating 8.67 for movie Sin: Nanatsu no Taizai Zange-roku Specials
Predicting rating 8.67 for movie Kaidan Restaurant
Predicting rating 8.67 for movie Solty Rei
Predicting rating 8.67 for movie Quanzhi Fashi III
Predicting rating 8.67 for movie Major: World Series
Predicting rating 8.67 for movie Shiofuki Mermaid
Predicting rating 8.67 for movie Dragon Ball Z Special 2: Zetsubou e no Hankou!! Nokosareta Chousenshi - Gohan to Trunks
Predicting rating 8.67 for movie Shinryaku!! Ika Musume
Predicting rating 8.67 for movie Gakuen Utopia Manabi Straight!


Original vs Predicted ratings:

Original 8.0, Predicted 5.92 for Demi-chan wa Kataritai: Demi-chan no Natsuyasumi
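
When a user has several rated items, a single summary number is easier to read than a line-by-line comparison. The sketch below computes the root mean squared error over rated entries only. The helper `rmse_on_rated` and the sample values are hypothetical, and it assumes, as above, that 0 marks an unrated item.

```python
import numpy as np

def rmse_on_rated(actual, predicted):
    """RMSE over entries where `actual` is > 0 (0 marks 'not rated' here)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rated = actual > 0
    if not rated.any():
        return float('nan')
    return float(np.sqrt(np.mean((actual[rated] - predicted[rated]) ** 2)))

# Hypothetical values: two rated items and one unrated item.
score = rmse_on_rated([8.0, 0.0, 6.0], [6.0, 9.0, 6.0])
print(round(score, 4))
```

With the notebook's variables this could be called as `rmse_on_rated(user_ratings, my_predictions)`.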


In [203]:
prediction = np.array(my_predictions).reshape(1,-1)
prediction

array([[7.81035529, 6.41745071, 6.73408533, ..., 8.66745036, 3.66745038,
        4.66745036]])

In [204]:
movie_list = movie_list.reshape(-1,1)

In [205]:
len(rating_pivot.index)

3395

In [206]:
print(movie_list.shape)
print(prediction.T.shape)

(3395, 1)
(3395, 1)


In [207]:
# Build the frame column by column so the predictions keep a numeric dtype
rec_collaborative = pd.DataFrame({'English name': movie_list.ravel(),
                                  'Pred_Collaborative': prediction.ravel()})

# Content-Based Filtering

## Prepare the data

In [208]:
rating = pd.read_csv('/content/rating_complete.csv')

In [209]:
# Create user dataframe
user=pd.DataFrame()
user['user_id']=rating['user_id'].unique()

In [210]:
# Generate random values for age, gender, and nationality
user['age'] = np.random.randint(18, 66, size=len(user))
user['gender'] = np.random.choice(['Male', 'Female'], size=len(user))
nationalities = ['American', 'Canadian', 'British', 'Australian', 'Indian', 'Chinese', 'German', 'French', 'Japanese', 'Brazilian']
user['nationality'] = np.random.choice(nationalities, size=len(user))

In [211]:
model = tf.keras.models.load_model('/content/model1.h5')

In [212]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

In [213]:
# Encode categorical variables
le_gender = LabelEncoder()
le_nationality = LabelEncoder()

user['gender'] = le_gender.fit_transform(user['gender'])
user['nationality'] = le_nationality.fit_transform(user['nationality'])

# Normalize age
scaler = MinMaxScaler()
user['age'] = scaler.fit_transform(user[['age']])


In [214]:
# Split genres and explode the dataframe
df_exploded = anime.assign(Genres=anime['Genres'].str.split(', ')).explode('Genres')

# Get the top 4 most frequent genres
top_genres = df_exploded['Genres'].value_counts().head(4).index

# One-hot encode top genres
for genre in top_genres:
    anime[genre] = anime['Genres'].apply(lambda x: 1 if genre in x else 0)
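
Note that `genre in x` is a substring test on the raw genre string. If exact matching matters, one option is to split each string into tokens first, as in the sketch below. The genre strings and the choice of the top two genres are hypothetical.

```python
from collections import Counter

# Hypothetical genre strings in the same ", "-separated format.
genres = ["Action, Sci-Fi", "Comedy", "Action, Comedy", "Slice of Life"]

# Split each string into a list of exact genre tokens.
tokens = [g.split(", ") for g in genres]

# Count tokens across all rows and keep the two most frequent.
counts = Counter(t for row in tokens for t in row)
top = [name for name, _ in counts.most_common(2)]

# Exact-token membership: "Action" would not match a longer label
# that merely contains it (a hypothetical "Live Action", for example).
encoded = [[int(name in row) for name in top] for row in tokens]

print(top)      # ['Action', 'Comedy']
print(encoded)  # [[1, 0], [0, 1], [1, 1], [0, 0]]
```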

In [215]:
anime['Type'] = anime['Type'].astype('category').cat.codes


In [216]:
# Convert from object to numeric
anime['Score'] = pd.to_numeric(anime['Score'], errors='coerce').fillna(0.0)
anime['Score-1'] = pd.to_numeric(anime['Score-1'], errors='coerce').fillna(0.0)
anime['Score-2'] = pd.to_numeric(anime['Score-2'], errors='coerce').fillna(0.0)
anime['MAL_ID'] = pd.to_numeric(anime['MAL_ID'], errors='coerce').fillna(0.0)

In [217]:
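# Features of user 0, repeated once per anime so both model inputs have the same length
user_0 = user[user.user_id == 0].drop('user_id', axis=1)
user_0 = np.tile(user_0, (len(anime), 1))

# Item features: scores, type code, then the four genre flags
item_features = np.array(anime[['Score','Score-1','Score-2','Type','Comedy','Action','Fantasy','Adventure']])
rec = model.predict([user_0, item_features])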
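
The `np.tile` call pairs a single user row with every item row so that a two-input model can score all (user, item) combinations in one batch. A minimal sketch with hypothetical feature values:

```python
import numpy as np

user_vec = np.array([[0.4, 1.0, 3.0]])   # hypothetical (age, gender, nationality)
item_feats = np.array([[7.5, 1.0],       # hypothetical item features,
                       [6.1, 0.0],       # one row per item
                       [8.2, 1.0]])

# Repeat the single user row once per item so both inputs have equal length.
user_rep = np.tile(user_vec, (len(item_feats), 1))

print(user_rep.shape, item_feats.shape)  # (3, 3) (3, 2)
```

If the selected user does not exist, the user frame is empty and the tiled array has zero rows, so checking its shape before calling `predict` can catch that case early.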



In [218]:
rec_merged=np.hstack([ np.array(anime[['MAL_ID','Score']]),rec])
rec_merged_df = pd.DataFrame(rec_merged, columns=['MAL_ID','Score','Pred_Content'])

In [219]:
rec_content = rec_merged_df.merge(anime, on='MAL_ID', how='inner')
rec_content=rec_content[['MAL_ID','English name','Pred_Content']]

In [220]:
rec_content.sort_values(by='Pred_Content',ascending=False).head(10)

| | MAL_ID | English name | Pred_Content |
|---|---|---|---|
| 17561 | 48492.0 | Unknown | 10.0 |
| 0 | 1.0 | Cowboy Bebop | 10.0 |
| 1 | 5.0 | Cowboy Bebop:The Movie | 10.0 |
| 2 | 6.0 | Trigun | 10.0 |
| 3 | 7.0 | Witch Hunter Robin | 10.0 |
| 4 | 8.0 | Beet the Vandel Buster | 10.0 |
| 5 | 15.0 | Unknown | 10.0 |
| 6 | 16.0 | Honey and Clover | 10.0 |
| 7 | 17.0 | Unknown | 10.0 |
| 8 | 18.0 | Unknown | 10.0 |


# Combine Recommenders

In [221]:
rec_collaborative.head()

| | English name | Pred_Collaborative |
|---|---|---|
| 0 | Cowboy Bebop | 7.810355 |
| 1 | Cowboy Bebop: Tengoku no Tobira | 6.417451 |
| 2 | Trigun | 6.734085 |
| 3 | Witch Hunter Robin | 8.66745 |
| 4 | Eyeshield 21 | 7.167449 |


In [222]:
rec_content.head()

| | MAL_ID | English name | Pred_Content |
|---|---|---|---|
| 0 | 1.0 | Cowboy Bebop | 10.0 |
| 1 | 5.0 | Cowboy Bebop:The Movie | 10.0 |
| 2 | 6.0 | Trigun | 10.0 |
| 3 | 7.0 | Witch Hunter Robin | 10.0 |
| 4 | 8.0 | Beet the Vandel Buster | 10.0 |


In [223]:
rec_combine = pd.merge(rec_collaborative, rec_content, on='English name')
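
Joining on a title can drop or duplicate rows when the two frames spell titles differently or when a title repeats. In the outputs above, the collaborative frame lists `Cowboy Bebop: Tengoku no Tobira` where the content frame lists `Cowboy Bebop:The Movie`, and several entries are `Unknown`. A possible alternative, sketched here on hypothetical frames, is to carry the ID through both frames and join on it. This assumes the collaborative frame also stores `MAL_ID`, which it does not in the cells above.

```python
import pandas as pd

# Hypothetical frames; the ID column name mirrors MAL_ID used above.
collab_df = pd.DataFrame({
    'MAL_ID': [1, 5, 6],
    'Pred_Collaborative': [7.8, 6.4, 6.7],
})
content_df = pd.DataFrame({
    'MAL_ID': [1, 5, 7],
    'English name': ['Cowboy Bebop', 'Cowboy Bebop:The Movie', 'Witch Hunter Robin'],
    'Pred_Content': [10.0, 10.0, 10.0],
})

# validate='one_to_one' raises if either side has duplicate keys.
merged = pd.merge(collab_df, content_df, on='MAL_ID', how='inner',
                  validate='one_to_one')

print(len(merged))  # 2
```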

In [224]:
rec_combine['Pred_Combine']=(rec_combine['Pred_Collaborative']+rec_combine['Pred_Content'])/2

In [225]:
rec_combine.sort_values('Pred_Combine',ascending=False).drop('MAL_ID',axis=1).head(10)

| | English name | Pred_Collaborative | Pred_Content | Pred_Combine |
|---|---|---|---|---|
| 444 | Gakuen Handsome | 8.667454 | 10.0 | 9.333727 |
| 126 | On Your Mark | 8.66745 | 10.0 | 9.333725 |
| 6 | Texhnolyze | 8.66745 | 10.0 | 9.333725 |
| 48 | Slam Dunk | 8.66745 | 10.0 | 9.333725 |
| 282 | Scan2Go | 8.66745 | 10.0 | 9.333725 |
| 150 | Sugar Sugar Rune | 8.66745 | 10.0 | 9.333725 |
| 92 | Fantastic Children | 8.66745 | 10.0 | 9.333725 |
| 221 | Shugo Chara!! Doki | 8.66745 | 10.0 | 9.333725 |
| 154 | Romeo x Juliet | 8.66745 | 10.0 | 9.333725 |
| 241 | Wolverine | 8.66745 | 10.0 | 9.333725 |
