<h2 style='background:#11489c; border:0; color:white'><center>Hybrid Recommendation Systems</center></h2>

This notebook is a detailed study of hybrid recommendation systems, with each step written as a reusable function.

<a href="https://ibb.co/fNjqWSX"><img src="https://i.ibb.co/vd0VWBc/e902a8f8-1343-4987-b5ae-25dc76f32a72.png" alt="e902a8f8-1343-4987-b5ae-25dc76f32a72" border="0"></a>

<h2 style='background:#11489c; border:0; color:white'><center>Business Problem</center></h2>

* Make recommendations for the user whose ID is given, using both item-based and user-based recommender methods

<h2 style='background:#11489c; border:0; color:white'><center>About the Dataset</center></h2>

* The dataset was provided by MovieLens, a movie recommendation service

* It includes the movies and the ratings given to them

* It contains 20,000,263 ratings across 27,278 movies

* The ratings were created by 138,493 users between January 09, 1995 and March 31, 2015. The dataset was published on October 17, 2016

* Users were selected at random. All selected users had rated at least 20 movies

<h2 style='background:#11489c; border:0; color:white'><center>Variables</center></h2>

<span style="color:blue">movie.csv</span>
* movieId – Unique movie number. (UniqueID)
* title – Movie name

<span style="color:blue">rating.csv</span>
* userId – Unique user number. (UniqueID)
* movieId – Unique movie number. (UniqueID)
* rating – The rating given to the movie by the user
* timestamp – Date and time the rating was given

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
from mlxtend.frequent_patterns import apriori, association_rules

def check_df(dataframe, head=5):
    print("########## SHAPE ##########")
    print(dataframe.shape)
    print("########## TYPES ##########")
    print(dataframe.dtypes)
    print("########## HEAD ##########")
    print(dataframe.head(head))
    print("########## TAIL ##########")
    print(dataframe.tail(head))
    print("########## NA ##########")
    print(dataframe.isnull().sum())
    print("########## QUANTILES ##########")
    # numeric_only=True skips text columns such as titles, which have no quantiles
    print(dataframe.quantile([0, 0.05, 0.50, 0.95, 0.99, 1], numeric_only=True).T)

def outlier_thresholds(dataframe, variable):
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

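def retail_data_prep(dataframe):
    # Drop missing rows without mutating the caller's frame
    dataframe = dataframe.dropna()
    # Invoices containing "C" are cancellations
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Copy the filtered frame so the in-place capping below does not write to a slice
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe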

def create_invoice_product_df(dataframe, id=True):
    # Build a binary invoice x product matrix: 1 if the product is on the invoice, else 0
    product_col = "StockCode" if id else "Description"
    basket = dataframe.groupby(['Invoice', product_col])['Quantity'].sum().unstack().fillna(0)
    return (basket > 0).astype(int)

def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)

def create_rules(dataframe, id=True, country="France"):
    dataframe = dataframe[dataframe['Country'] == country]
    dataframe = create_invoice_product_df(dataframe, id)
    frequent_itemsets = apriori(dataframe, min_support=0.01, use_colnames=True)
    rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
    return rules

def arl_recommender(rules_df, product_id, rec_count=1):
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for i, product in enumerate(sorted_rules["antecedents"]):
        for j in list(product):
            if j == product_id:
                recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

    return recommendation_list[0:rec_count]

In [2]:
import pandas as pd
pd.set_option('display.max_columns', 5)

def create_user_movie_df():
    movie = pd.read_csv('../input/movielens-20m-dataset/movie.csv')
    rating = pd.read_csv('../input/movielens-20m-dataset/rating.csv')
    df = movie.merge(rating, how="left", on="movieId")
    # Number of ratings per title; value_counts() returns a Series indexed by title
    rating_counts = df["title"].value_counts()
    # Titles with 1000 or fewer ratings are treated as rare and dropped
    rare_movies = rating_counts[rating_counts <= 1000].index
    common_movies = df[~df["title"].isin(rare_movies)]
    # Rows are users, columns are titles, values are ratings (NaN where unrated)
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df
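
The shape of the user-movie matrix is easier to see on a tiny example. The sketch below repeats the merge, rare-title filter, and pivot on made-up data; the toy frames, the `min_ratings` name, and its value of 1 are assumptions for illustration only (the notebook itself uses a cutoff of 1000 on the real CSV files).

```python
import pandas as pd

# Toy stand-ins for movie.csv and rating.csv (all values are made up).
movie = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
rating = pd.DataFrame({
    "userId":  [10, 10, 11, 11, 12],
    "movieId": [1, 2, 1, 2, 3],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.5],
})

df = movie.merge(rating, how="left", on="movieId")

# Keep only titles with more than `min_ratings` ratings (hypothetical cutoff).
min_ratings = 1
counts = df["title"].value_counts()
common = df[df["title"].isin(counts[counts > min_ratings].index)]

# Rows are users, columns are titles, cells are ratings.
toy_user_movie_df = common.pivot_table(index="userId", columns="title", values="rating")
print(toy_user_movie_df)
```

Title "C" has a single rating, so it is filtered out and only the columns "A" and "B" remain.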

In [3]:
user_movie_df = create_user_movie_df()

In [4]:
# Determine the movies watched by the user who will receive the suggestions

random_user = 108170
random_user_df = user_movie_df[user_movie_df.index == random_user]
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()

In [5]:
# Get the data and IDs of other users who watched the same movies

movies_watched_df = user_movie_df[movies_watched]
user_movie_count = movies_watched_df.T.notnull().sum()
user_movie_count = user_movie_count.reset_index()
user_movie_count.columns = ["userId", "movie_count"]
percent = len(movies_watched) * 60 / 100
users_same_movies = user_movie_count[user_movie_count["movie_count"] > percent]["userId"]
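
As a small illustration of the 60% overlap rule used in this cell, the sketch below counts, for each user in a made-up ratings matrix, how many of the target user's movies they also rated. The toy frame and the variable names `target`, `watched`, and `overlap` are assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy user x movie matrix; NaN means "not rated" (values are made up).
toy = pd.DataFrame(
    {"A": [4.0, 5.0, np.nan, 3.0],
     "B": [3.0, np.nan, np.nan, 4.0],
     "C": [5.0, 4.0, 2.0, np.nan]},
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

target = 1
# Movies the target user has rated.
watched = toy.columns[toy.loc[target].notna()].tolist()

# For each user, how many of those movies they also rated.
overlap = toy[watched].notna().sum(axis=1)

# Keep users who rated more than 60% of the target user's movies.
threshold = len(watched) * 60 / 100
similar_users = overlap[overlap > threshold].index.tolist()
print(similar_users)
```

Note that the target user always passes this filter, since they have rated 100% of their own movies.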

In [6]:
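# Find the users most similar to the user who will receive the suggestions

# users_same_movies holds userId values, so filter on those values rather than on its positional index.
# The target user has rated all of movies_watched, so they are already among the selected users.
final_df = movies_watched_df[movies_watched_df.index.isin(users_same_movies)]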

corr_df = final_df.T.corr().unstack().sort_values().drop_duplicates()
corr_df = pd.DataFrame(corr_df, columns=["corr"])
corr_df.index.names = ['user_id_1', 'user_id_2']
corr_df = corr_df.reset_index()

top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"] >= 0.65)][
    ["user_id_2", "corr"]].reset_index(drop=True)

top_users = top_users.sort_values(by='corr', ascending=False)

top_users.rename(columns={"user_id_2": "userId"}, inplace=True)

rating = pd.read_csv('../input/movielens-20m-dataset/rating.csv')
top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how='inner')
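
To make the user-to-user correlation step concrete, here is a minimal sketch on made-up ratings. It computes the full correlation matrix between users and then keeps the neighbours of one target user above the same 0.65 cutoff; the toy frame and the names `user_corr` and `neighbours` are assumptions for illustration.

```python
import pandas as pd

# Toy user x movie matrix (values are made up).
toy = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0], "B": [3.0, 3.0, 3.0], "C": [1.0, 2.0, 5.0]},
    index=pd.Index([1, 2, 3], name="userId"),
)

# Transpose so that users become columns, then correlate users with each other.
user_corr = toy.T.corr()

target = 1
# Correlation of every other user with the target user.
neighbours = user_corr[target].drop(target)

# Keep users whose taste correlates strongly with the target user's.
top = neighbours[neighbours >= 0.65]
print(top)
```

User 2 rates the movies in the same order as user 1 and is kept; user 3 rates them in the opposite order and is dropped.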

In [7]:
# Calculating Weighted Average Recommendation Score and Keeping Top 5 Movies

top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']

# Average the correlation-weighted ratings per movie
recommendation_df = top_users_ratings.groupby('movieId').agg({"weighted_rating": "mean"})
recommendation_df = recommendation_df.reset_index()
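
The weighted score is each rating multiplied by the rater's correlation with the target user, averaged per movie. A worked sketch on made-up numbers (the toy frame and the name `scores` are assumptions):

```python
import pandas as pd

# Ratings from two similar users, with their correlation to the target user (made up).
top = pd.DataFrame({
    "userId":  [2, 2, 4, 4],
    "corr":    [0.9, 0.9, 0.7, 0.7],
    "movieId": [10, 20, 10, 30],
    "rating":  [5.0, 3.0, 4.0, 5.0],
})

# Weight each rating by how similar the rater is to the target user.
top["weighted_rating"] = top["corr"] * top["rating"]

# Mean weighted rating per movie, best first.
scores = top.groupby("movieId")["weighted_rating"].mean().sort_values(ascending=False)
print(scores)
```

Movie 10 scores (0.9 × 5.0 + 0.7 × 4.0) / 2 = 3.65. Because correlations are at most 1, a weighted rating is never higher than the raw rating, so a cutoff on this score is stricter than the same cutoff on raw ratings.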

In [8]:
# Keep movies with a weighted_rating greater than 4 and take the top 5:
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > 4]. \
    sort_values("weighted_rating", ascending=False)[0:5]

movie = pd.read_csv('../input/movielens-20m-dataset/movie.csv')
movies_to_be_recommend.merge(movie[["movieId", "title"]]).index

Int64Index([0, 1, 2, 3, 4], dtype='int64')

In [9]:
###########################################
# Step 6: Item-Based Recommendation
###########################################

# Make item-based suggestions from the movie the user most recently gave the highest rating.
# Make 10 suggestions in total: 5 user-based and 5 item-based.

# Hint:

# user = 108170

# movie = pd.read_csv('datasets/movie_lens_dataset/movie.csv')
# rating = pd.read_csv('datasets/movie_lens_dataset/rating.csv')
#
# Get the ID of the movie that the target user most recently rated 5.0:
# movie_id = rating[(rating["userId"] == user) & (rating["rating"] == 5.0)]. \
# sort_values(by="timestamp", ascending=False)["movieId"][0:1].values[0]
#

user = 108170

movie = pd.read_csv('../input/movielens-20m-dataset/movie.csv')
rating = pd.read_csv('../input/movielens-20m-dataset/rating.csv')

# Get the ID of the movie that the target user most recently rated 5.0:
movie_id = rating[(rating["userId"] == user) & (rating["rating"] == 5.0)]. \
    sort_values(by="timestamp", ascending=False)["movieId"][0:1].values[0]

In [10]:
def item_based_recommender(movie_name, user_movie_df):
    movie = user_movie_df[movie_name]
    return user_movie_df.corrwith(movie).sort_values(ascending=False).head(10)


movies_from_item_based = item_based_recommender(movie[movie["movieId"] == movie_id]["title"].values[0], user_movie_df)
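
The item-based step relies on `DataFrame.corrwith`, which correlates every column of the user-movie matrix with one chosen column. A minimal sketch on made-up ratings follows; the column names "Seed", "Similar", and "Opposite" are hypothetical.

```python
import pandas as pd

# Toy user x movie matrix: each row is one user's ratings (values are made up).
toy = pd.DataFrame({
    "Seed":     [5.0, 4.0, 1.0, 2.0],
    "Similar":  [4.5, 4.0, 1.5, 2.0],
    "Opposite": [1.0, 2.0, 5.0, 4.0],
})

# Correlate every movie's rating column with the seed movie's column.
sims = toy.corrwith(toy["Seed"]).sort_values(ascending=False)

# The seed movie correlates perfectly with itself, so drop it by name.
recommendations = sims.drop("Seed")
print(recommendations)
```

Dropping the seed title by label, rather than by position, is one way to avoid depending on where the movie itself lands in the sorted result.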

In [11]:
# Take rows 1 to 5. Row 0 is the movie itself, so we leave it out.
movies_from_item_based[1:6]

title
My Science Project (1985)                0.570187
Mediterraneo (1991)                      0.538868
Old Man and the Sea, The (1958)          0.536192
National Lampoon's Senior Trip (1995)    0.533029
Clockwatchers (1997)                     0.483337
dtype: float64
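
The stated goal is a list of 10 suggestions, 5 user-based and 5 item-based. One possible way to combine the two lists is sketched below; `combine_recommendations` is a hypothetical helper that is not part of the notebook, and the titles are placeholders.

```python
def combine_recommendations(user_based, item_based, n_each=5):
    """Hypothetical helper: take up to n_each titles from each list,
    dropping duplicates while preserving order."""
    combined = []
    for title in list(user_based)[:n_each] + list(item_based)[:n_each]:
        if title not in combined:
            combined.append(title)
    return combined


# Placeholder titles standing in for the two recommenders' outputs.
user_based_titles = ["Movie U1", "Movie U2", "Shared Movie"]
item_based_titles = ["Shared Movie", "Movie I1"]

hybrid = combine_recommendations(user_based_titles, item_based_titles)
print(hybrid)
```

In this notebook, the user-based titles would come from merging `movies_to_be_recommend` with `movie` and reading the `title` column, and the item-based titles from `movies_from_item_based[1:6].index`.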