<h1> Hybrid Recommendation System </h1>

<div style="width:100%; text-align: left;"> <img align=middle src="https://www.researchgate.net/profile/Xiangjie-Kong-2/publication/330077673/figure/fig5/AS:710433577107459@1546391972632/A-hybrid-paper-recommendation-system.png"> </div>

## Table of contents
* [General info](#general-info)
* [Dataset info](#dataset-info)
* [Project info](#project-info)
* [Technologies](#technologies)
* [Setup](#setup)
* [Developments](#developments)

## General info
Recommendation systems have been around for quite some time. YouTube, Facebook, Amazon, and many other services provide some form of recommendations to their users.

This project explores the relationship between pairs of items ("users who bought Y also bought Z"). A missing rating for an item is estimated from the ratings the same user has given to other items.

This approach, item-to-item collaborative filtering, was first invented and used by Amazon in 1998. Rather than matching the user to similar customers, it matches each of the user’s purchased and rated items to similar items, then combines those similar items into a recommendation list.
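
The idea of estimating a missing rating from the same user's other ratings can be sketched as below. This is a minimal illustration on made-up data, not the project's code: the `ratings` table, the user and item labels, and the `predict_rating` helper are all hypothetical, and ignoring non-positive similarities is an assumption made for simplicity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are users, columns are items, NaN = not rated.
ratings = pd.DataFrame(
    {
        "W": [5.0, 4.0, 1.0, 2.0, 5.0],
        "X": [4.0, 5.0, 2.0, 1.0, 4.0],
        "Y": [1.0, 2.0, 5.0, 4.0, 2.0],
        "Z": [5.0, 4.0, 2.0, 1.0, np.nan],  # user "u5" has not rated Z
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)

# Item-to-item similarity: Pearson correlation between item columns,
# computed over the users who rated both items.
item_sim = ratings.corr()


def predict_rating(user, item):
    """Similarity-weighted average of the user's ratings on similar items."""
    rated = ratings.loc[user].dropna()       # items this user has rated
    sims = item_sim.loc[item, rated.index]   # similarity of `item` to each of them
    sims = sims[sims > 0]                    # assumption: ignore dissimilar items
    if sims.empty:
        return float("nan")
    return float((sims * rated[sims.index]).sum() / sims.sum())


predicted = predict_rating("u5", "Z")
print(round(predicted, 2))  # 4.5
```

Here W and X are both positively correlated with Z, so the estimate for Z is pulled toward the ratings that "u5" gave to W and X.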



## Dataset info
The dataset is provided by MovieLens, a movie recommendation service. It contains the movies along with the rating scores given to them: 20,000,263 ratings across 27,278 movies. The ratings were created by 138,493 users between 09 January 1995 and 31 March 2015, and the data set was generated on October 17, 2016. Users were selected at random, and every selected user had rated at least 20 movies.

## Project info
The goal is to make recommendations for the user whose ID is given, using both the item-based and user-based recommender methods.
We build our own user-movie rating table from "movies.csv" and "ratings.csv" (link down below); in the Kaggle copy used by the code these files are named "movie.csv" and "rating.csv".  
For the item-based part, we calculate the correlation between the entered movie and every other movie, then suggest the most strongly correlated movies to the user.

## Technologies
The project was created with:
* PyCharm: 2021.3 
* Pandas: 1.3.4 (especially "corrwith")


	
## Setup
To run this project, set the dataset paths in Part 1, run the code from top to bottom, and call "item_based_recommender" as shown at the bottom of the code. That's it!

## Developments 
More precise results can be achieved by merging item-based and user-based recommendations, using a "Weighted Average Recommendation Score" together with correlations. This project builds on the previous projects. The result is closer to a hybrid recommendation system of the kind Netflix uses.
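
A small worked example may help make the score concrete. The sketch below uses made-up numbers: `neighbours` and `ratings` are hypothetical tables, and the normalized variant (dividing by the sum of correlations) is an alternative shown only for comparison, not something the project is stated to use.

```python
import pandas as pd

# Hypothetical similar users and their correlation with the target user.
neighbours = pd.DataFrame({"userId": [1, 2], "corr": [0.9, 0.7]})

# Hypothetical ratings given by those users.
ratings = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2],
        "movieId": [10, 20, 10, 30],
        "rating": [5.0, 3.0, 4.0, 5.0],
    }
)

# Weight each rating by the rater's correlation with the target user.
scored = neighbours.merge(ratings, on="userId", how="inner")
scored["weighted_rating"] = scored["corr"] * scored["rating"]

grouped = scored.groupby("movieId")

# Mean of corr * rating per movie.
mean_score = grouped["weighted_rating"].mean().sort_values(ascending=False)

# Alternative (assumption): a normalized weighted average per movie.
normalized = grouped["weighted_rating"].sum() / grouped["corr"].sum()

print(mean_score)
print(normalized)
```

For movie 10 the mean score is (0.9 * 5 + 0.7 * 4) / 2 = 3.65. Note that the plain mean scales every rating down by the correlation, so a fixed cutoff on it favours movies rated by the most strongly correlated users.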





In [1]:
import pandas as pd
import numpy as np

##### Part 1 #####

# Replace these paths with the location of your own copy of the dataset.
# The "movielens-20m-dataset" on Kaggle is used here.
movie = pd.read_csv('../input/movielens-20m-dataset/movie.csv')
rating = pd.read_csv('../input/movielens-20m-dataset/rating.csv')

def create_movie_df(nr_drop_movies):
    # Merge movie metadata with ratings (one row per rating).
    df = movie.merge(rating, how="left", on="movieId")
    # Number of ratings per title.
    rating_count = df["title"].value_counts()
    # nr_drop_movies: titles with this many ratings or fewer are dropped.
    rare_movies = rating_count[rating_count <= nr_drop_movies].index
    common_movies = df[~df["title"].isin(rare_movies)]
    # Rows: users, columns: titles, values: ratings (NaN where not rated).
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df

user_movie_df = create_movie_df(nr_drop_movies=10000)
# Pick one user at random (fixed seed so the choice is reproducible).
random_user = int(pd.Series(user_movie_df.index).sample(1, random_state=40).values[0])

user_movie_df.head()

userId,10 Things I Hate About You (1999),12 Angry Men (1957),2001: A Space Odyssey (1968),28 Days Later (2002),300 (2007),A.I. Artificial Intelligence (2001),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),Addams Family Values (1993),...,Wild Wild West (1999),William Shakespeare's Romeo + Juliet (1996),Willy Wonka & the Chocolate Factory (1971),Witness (1985),"Wizard of Oz, The (1939)","X-Files: Fight the Future, The (1998)",X-Men (2000),X2: X-Men United (2003),You've Got Mail (1998),Young Frankenstein (1974)
1.0,,,3.5,3.5,,,,,,,...,,,,,3.5,,,4.0,,4.0
2.0,,,5.0,,,,,,,,...,,,,,,,,,,
3.0,,,5.0,,,,3.0,,,,...,,,5.0,4.0,4.0,5.0,,,,5.0
4.0,,,,,,,,,3.0,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,2.0,,,,,,,


In [2]:
##### Part 2 #####

random_user_df = user_movie_df[user_movie_df.index == random_user]
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()

movies_watched

['American Beauty (1999)',
 'Austin Powers: The Spy Who Shagged Me (1999)',
 'Braveheart (1995)',
 'Casablanca (1942)',
 'Chicken Run (2000)',
 'Citizen Kane (1941)',
 'Exorcist, The (1973)',
 'Fight Club (1999)',
 'Forrest Gump (1994)',
 'Ghostbusters (a.k.a. Ghost Busters) (1984)',
 'L.A. Confidential (1997)',
 'Some Like It Hot (1959)',
 'Star Wars: Episode I - The Phantom Menace (1999)',
 'Star Wars: Episode VI - Return of the Jedi (1983)',
 'Titanic (1997)',
 'Unbreakable (2000)',
 'Who Framed Roger Rabbit? (1988)']

In [3]:
##### Part 3 #####

movies_watched_df = user_movie_df[movies_watched]
user_movie_count = movies_watched_df.T.notnull().sum()
user_movie_count = user_movie_count.reset_index()
user_movie_count.columns = ["userId", "movie_count"]
users_same_movies = user_movie_count[user_movie_count["movie_count"] > len(movies_watched)*0.6]["userId"]
final_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(users_same_movies)],
                      random_user_df[movies_watched]])

users_same_movies.head()

6        7.0
23      24.0
57      58.0
90      91.0
115    116.0
Name: userId, dtype: float64

In [4]:
final_df.head()

userId,American Beauty (1999),Austin Powers: The Spy Who Shagged Me (1999),Braveheart (1995),Casablanca (1942),Chicken Run (2000),Citizen Kane (1941),"Exorcist, The (1973)",Fight Club (1999),Forrest Gump (1994),Ghostbusters (a.k.a. Ghost Busters) (1984),L.A. Confidential (1997),Some Like It Hot (1959),Star Wars: Episode I - The Phantom Menace (1999),Star Wars: Episode VI - Return of the Jedi (1983),Titanic (1997),Unbreakable (2000),Who Framed Roger Rabbit? (1988)
7.0,3.0,3.0,,5.0,,,,4.0,4.0,4.0,3.0,,4.0,5.0,5.0,3.0,4.0
24.0,5.0,5.0,4.0,,,,,5.0,5.0,3.0,4.0,,4.0,5.0,4.0,4.0,
58.0,4.5,4.5,5.0,,4.5,5.0,5.0,5.0,4.5,4.0,5.0,,,,4.0,4.5,5.0
91.0,4.5,4.0,5.0,,4.0,,4.5,5.0,4.0,3.5,3.5,4.5,3.0,4.5,3.5,3.5,4.0
116.0,4.5,3.5,4.5,,,,1.0,5.0,4.0,2.5,4.0,,2.0,5.0,0.5,2.0,3.0


In [5]:
##### Part 4 #####

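# Pairwise Pearson correlation between users (the rows of final_df).
# Note: drop_duplicates() compares correlation values only, so different user
# pairs that happen to share exactly the same correlation collapse into one row.
corr_df = final_df.T.corr().unstack().sort_values().drop_duplicates()
corr_df = pd.DataFrame(corr_df, columns=["corr"])
corr_df.index.names = ['user_id_1', 'user_id_2']
corr_df = corr_df.reset_index()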

corr_df.head(20)

,user_id_1,user_id_2,corr
0,53799.0,71097.0,-1.0
1,78879.0,57365.0,-1.0
2,81204.0,78675.0,-1.0
3,79601.0,96593.0,-1.0
4,123681.0,109388.0,-1.0
5,123769.0,61839.0,-1.0
6,73049.0,41083.0,-1.0
7,104669.0,100366.0,-1.0
8,9709.0,63198.0,-1.0
9,73049.0,78167.0,-1.0
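
If only the correlations with one target user are needed, they can also be computed directly, without building the full user-by-user matrix. The following is a sketch on made-up data, offered as an alternative rather than as the method used above; the `ratings` table, the user IDs, and `target_user` are hypothetical.

```python
import pandas as pd

# Hypothetical user x movie ratings (rows: users, columns: movies).
ratings = pd.DataFrame(
    {
        "m1": [5.0, 4.0, 1.0],
        "m2": [4.0, 5.0, 2.0],
        "m3": [1.0, 2.0, 5.0],
        "m4": [2.0, 1.0, 4.0],
    },
    index=pd.Index([101, 102, 103], name="userId"),
)
target_user = 101

# Transpose so that each column is a user, then correlate every column
# with the target user's ratings (aligned on movie labels).
user_corr = ratings.T.corrwith(ratings.loc[target_user])

# Drop the target user's self-correlation and rank the remaining users.
user_corr = user_corr.drop(target_user).sort_values(ascending=False)
print(user_corr)
```

Because each remaining user appears exactly once in the result, no de-duplication step is required.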


In [6]:
top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"] >= 0.65)][
    ["user_id_2", "corr"]].reset_index(drop=True)
top_users = top_users.sort_values(by='corr', ascending=False)
top_users.rename(columns={"user_id_2": "userId"}, inplace=True)
top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how='inner')
top_users_ratings = top_users_ratings[top_users_ratings["userId"] != random_user]

top_users_ratings.head(20)

,userId,corr,movieId,rating
0,113381.0,0.941446,1,5.0
1,113381.0,0.941446,10,2.0
2,113381.0,0.941446,17,2.0
3,113381.0,0.941446,21,4.0
4,113381.0,0.941446,28,2.0
5,113381.0,0.941446,32,4.0
6,113381.0,0.941446,35,4.0
7,113381.0,0.941446,39,2.0
8,113381.0,0.941446,50,5.0
9,113381.0,0.941446,58,4.0


In [7]:
##### Part 5 #####

# Weight each neighbour's rating by that neighbour's correlation with random_user.
top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']

# Average weighted rating per movie.
recommendation_df = top_users_ratings.groupby('movieId').agg({"weighted_rating": "mean"})
recommendation_df = recommendation_df.reset_index()

# Keep movies scoring above 3.5, best first, and attach their titles.
user_movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > 3.5].sort_values("weighted_rating", ascending=False)
user_movies_to_be_recommend = user_movies_to_be_recommend.merge(movie[["movieId", "title"]])

user_movies_to_be_recommend

,movieId,weighted_rating,title
0,2776,4.707231,"Marcello Mastroianni: I Remember Yes, I Rememb..."
1,88022,4.673502,Hot Coffee (2011)
2,2086,4.334632,One Magic Christmas (1985)
3,7106,4.331753,Black and White in Color (Noirs et blancs en c...
4,26416,4.270934,Best Boy (1979)
...,...,...,...
243,68069,3.510723,Ride Lonesome (1959)
244,69988,3.507290,Humpday (2009)
245,62801,3.507050,Lone Wolf and Cub: Baby Cart to Hades (Kozure ...
246,8133,3.506060,"Inheritance, The (Arven) (2003)"


In [8]:
##### Part 6 #####

# Most recent movie that random_user rated 5.
movie_id = (
    rating[(rating["userId"] == random_user) & (rating["rating"] == 5)]
    .sort_values("timestamp", ascending=False)["movieId"]
    .values[0]
)
movie_title = movie[movie["movieId"] == movie_id]["title"].values[0]

def item_based_recommender(movie_name, user_movie_df):
    movie_na = user_movie_df[movie_name]
    return user_movie_df.corrwith(movie_na).sort_values(ascending=False)


################
item_based = item_based_recommender(movie_title, user_movie_df).reset_index()['title']
user_based = user_movies_to_be_recommend['title']


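# Position 0 of item_based is normally the reference movie itself, so take positions 1-10.
hybrid_recommendation = item_based.iloc[1:11].copy()
# Overwrite the last five slots with the user-based titles at positions 1-5
# (the top-ranked user-based title at position 0 is not used here).
hybrid_recommendation.iloc[5:10] = user_based.iloc[1:6].values
hybrid_recommendation.head(10)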

1                             Back to the Future (1985)
2                              Beverly Hills Cop (1984)
3     Raiders of the Lost Ark (Indiana Jones and the...
4                                       Gremlins (1984)
5                                  Lethal Weapon (1987)
6                                     Hot Coffee (2011)
7                            One Magic Christmas (1985)
8     Black and White in Color (Noirs et blancs en c...
9                                       Best Boy (1979)
10                                  Resurrection (1980)
Name: title, dtype: object
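
The merging step can also be written as a small reusable helper. The sketch below is one possible way to do it, not the approach used above: `merge_recommendations` and its `n_each` parameter are hypothetical names, and it assumes both inputs are ranked lists of titles with the reference movie already removed from the item-based list.

```python
def merge_recommendations(item_based, user_based, n_each=5):
    """Take the top `n_each` titles from each ranked list, item-based first,
    skipping titles that are already in the merged list."""
    merged = []
    for title in list(item_based[:n_each]) + list(user_based[:n_each]):
        if title not in merged:
            merged.append(title)
    return merged


# Hypothetical ranked title lists.
item_titles = ["A", "B", "C"]
user_titles = ["B", "D", "E"]

hybrid = merge_recommendations(item_titles, user_titles, n_each=2)
print(hybrid)  # ['A', 'B', 'D']
```

When the two lists overlap, the result is shorter than `2 * n_each`; whether to backfill from further down either list is a design choice left open here.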