# <p style="padding: 20px;background-color:#e88d91;margin:0;color:white;font-family:'Trebuchet MS', sans-serif;font-size:150%;text-align:center;text-shadow: -1px 1px 0 #000,1px 1px 0 #000,1px -1px 0 #000,-1px -1px 0 #000;border-radius: 1000px 1000px;overflow:hidden;font-weight:500">Anime Recommender System</p>

<p style="text-align:center; ">
<img src="https://wallpaperaccess.com/full/39033.png" style='width: 600px; height: 300px;'>
</p>

<p><h1><center><ins>Contributors</ins><br><br>Alihan Demirel<br>Cagkan Gursoy<br>Melisa Gözet</center></h1></p>


**This project is a hybrid recommendation system for users who have rated different anime. It is based on both user preferences and anime ratings.**

## <ins>**Importing Libraries**</ins>

In [None]:
import pandas as pd
import numpy as np
import random
from mlxtend.frequent_patterns import apriori, association_rules
# Show every column and allow wide output when printing dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 300)

This dataset contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and rate them; this dataset is a compilation of those ratings.

$\rightarrow$ **anime.csv**

$\bigstar$ **anime_id :** myanimelist.net's unique id identifying an anime.  
$\bigstar$ **name :** full name of anime.  
$\bigstar$ **genre :** comma separated list of genres for this anime.  
$\bigstar$ **type :** movie, TV, OVA, etc.  
$\bigstar$ **episodes :** how many episodes in this show. (1 if movie).  
$\bigstar$ **rating :** average rating out of 10 for this anime.  
$\bigstar$ **members :** number of community members that are in this anime's "group".

$\rightarrow$ **rating.csv**

$\bigstar$ **user_id :** non-identifiable, randomly generated user id.  
$\bigstar$ **anime_id :** the anime that this user has rated.  
$\bigstar$ **rating :** rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).

In [None]:
anime_ = pd.read_csv("/kaggle/input/anime-recommendations-database/anime.csv")


In [None]:
anime = anime_.copy()

In [None]:
rating_ = pd.read_csv("/kaggle/input/anime-recommendations-database/rating.csv")

In [None]:
rating = rating_.copy()

In [None]:
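def general_info(dataframe):
    """Print the shape, dtypes, summary statistics, and null counts of a dataframe."""
    print(15*"-", "Shape", 15*"-")
    print(dataframe.shape, "\n")
    print(15*"-", "Variable Types", 15*"-")
    # info() prints its report itself and returns None, so it is not wrapped in print()
    dataframe.info()
    print()
    print(15*"-", "Statistics", 15*"-")
    print(dataframe.describe().T, "\n")
    print(15*"-", "Null Value", 15*"-")
    print(dataframe.isnull().sum(), "\n")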

In [None]:
general_info(anime)

In [None]:
general_info(rating)

$\nabla$ **A rating value of -1 means the user watched the anime but gave no rating.**

In [None]:
rating[rating["rating"] == -1].count()
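
As a minimal sketch of the same check on made-up data, a boolean mask gives both the number and the share of unrated entries. The `toy_rating` frame and its values are hypothetical and only mirror the columns of rating.csv.

```python
import pandas as pd

# Hypothetical toy data with the same columns as rating.csv
toy_rating = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "anime_id": [10, 20, 10, 30],
    "rating": [8, -1, 7, -1],
})

# True where the user watched the anime but did not rate it
unrated_mask = toy_rating["rating"] == -1

n_unrated = int(unrated_mask.sum())        # number of -1 entries
share_unrated = float(unrated_mask.mean()) # fraction of all rows
print(n_unrated, share_unrated)
```

Knowing the share helps judge how much data the later filtering step discards.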

$\nabla$ **The anime and rating datasets are merged on anime_id, which adds each anime's name to the ratings and forms a single dataset for analysis.**

In [None]:
anime_ratings = rating.merge(anime[["anime_id", "name"]], how = "inner" , on = "anime_id")

In [None]:
anime_ratings.head()
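
The effect of the inner join can be illustrated on made-up data: a rating whose `anime_id` has no match in the anime table is dropped. The toy frames below are hypothetical, and `validate="many_to_one"` is an optional safeguard that is not part of the original cell; it raises an error if an `anime_id` appears more than once on the anime side.

```python
import pandas as pd

# Hypothetical stand-ins for rating.csv and anime.csv
toy_rating = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "anime_id": [10, 20, 10, 99],  # 99 has no entry in toy_anime
    "rating": [8, -1, 7, 9],
})
toy_anime = pd.DataFrame({
    "anime_id": [10, 20, 30],
    "name": ["Alpha", "Beta", "Gamma"],
})

# Inner join on anime_id; many ratings may point to one anime
toy_merged = toy_rating.merge(
    toy_anime[["anime_id", "name"]],
    how="inner",
    on="anime_id",
    validate="many_to_one",
)
print(toy_merged)
```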

$\nabla$ **Rows without a rating (rating = -1) are dropped.**

In [None]:
anime_ratings = anime_ratings[anime_ratings["rating"] >= 0]

$\nabla$ **A pivot table is formed with user IDs as the index, anime names as the columns, and ratings as the values.**

In [None]:
anime_df = anime_ratings.pivot_table(index = "user_id", columns = "name", values = "rating")

In [None]:
anime_df.head()
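
A small sketch on made-up data shows what the pivot produces: one row per user, one column per title, `NaN` where a user has not rated a title, and the mean where the same user and title occur more than once (the default `aggfunc` of `pivot_table`). The `toy_ratings` frame is hypothetical.

```python
import pandas as pd

# Hypothetical ratings; user 2 has two entries for "Gamma"
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "name": ["Alpha", "Beta", "Alpha", "Gamma", "Gamma"],
    "rating": [8, 6, 7, 9, 5],
})

# Users on the index, titles on the columns, ratings as values
toy_pivot = toy_ratings.pivot_table(index="user_id", columns="name", values="rating")
print(toy_pivot)
```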

$\nabla$ **A random user is chosen to make recommendations.**

In [None]:
# Convert the index to a list so random.choice receives a plain Python sequence
random_user = random.choice(anime_df.index.to_list())
print(random_user)
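
Because the user is drawn at random, every run of the notebook produces different recommendations. If repeatable runs are wanted, one option is to seed the generator first. This is a sketch under that assumption; the seed value 42 and the `toy_index` are arbitrary and not part of the original notebook.

```python
import random
import pandas as pd

# Hypothetical user ids standing in for anime_df.index
toy_index = pd.Index([3, 7, 11, 42, 58], name="user_id")

# Seeding before each draw makes the choice repeatable
random.seed(42)
first_pick = random.choice(toy_index.to_list())

random.seed(42)
second_pick = random.choice(toy_index.to_list())

print(first_pick, second_pick)
```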

$\nabla$ **A dataframe, random_user_df, is defined that contains only the random user's row.**

In [None]:
random_user_df = anime_df[anime_df.index == random_user]
random_user_df.shape

In [None]:
random_user_df.head()

$\nabla$ **The anime rated by the random user are stored in a list.**

In [None]:
anime_watched = random_user_df.columns[random_user_df.notna().any()].to_list()
anime_watched[0:10]

In [None]:
len(anime_watched)

$\nabla$ **The anime_df dataframe is restricted to the anime rated by the random user.**

In [None]:
anime_watched_df = anime_df[anime_watched]
anime_watched_df.shape

$\nabla$ **A new dataframe is created that records, for each user, how many of the random user's anime that user has also rated.**

In [None]:
anime_movie_count = anime_watched_df.T.notnull().sum()
anime_movie_count = anime_movie_count.reset_index()
anime_movie_count.columns = ["user_id", "anime_count"]
anime_movie_count.head()
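
The count can also be computed row by row without transposing the dataframe, which gives the same numbers. The sketch below compares both forms on a hypothetical `toy_watched_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the pivot table: columns are the target user's titles
toy_watched_df = pd.DataFrame(
    {"Alpha": [8, 7, np.nan], "Beta": [6, np.nan, np.nan], "Gamma": [9, 10, 4]},
    index=pd.Index([1, 2, 3], name="user_id"),
)

# Form used in the notebook: transpose, then count non-null values per column
via_transpose = toy_watched_df.T.notnull().sum()

# Equivalent row-wise count without the transpose
row_wise = toy_watched_df.notna().sum(axis=1)

toy_counts = row_wise.rename("anime_count").reset_index()
print(toy_counts)
```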


In [None]:
anime_movie_count.describe([0.5,0.75,0.90,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]).T

$\nabla$ **A threshold is defined: other users are kept only if they have rated more than 25% of the anime rated by the random user.**

In [None]:
perc = len(anime_watched) * 25 / 100
users_same_movies = anime_movie_count[anime_movie_count["anime_count"] > perc]["user_id"]
len(users_same_movies)
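
A worked example with made-up numbers: if the random user has rated 40 anime, the threshold is 40 × 25 / 100 = 10, and because the comparison is strict, a user with exactly 10 titles in common is not kept. The names and counts below are hypothetical.

```python
import pandas as pd

n_watched = 40                    # hypothetical size of anime_watched
min_common = n_watched * 25 / 100 # same formula as the cell above

# Hypothetical overlap counts; user 1 plays the role of the random user
toy_counts = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "anime_count": [40, 10, 11, 3],
})

# Strict ">" excludes user 2, who sits exactly on the threshold
toy_similar = toy_counts.loc[toy_counts["anime_count"] > min_common, "user_id"]
print(min_common, toy_similar.tolist())
```

Note that the random user always passes this filter, since they have rated all of their own titles.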

$\nabla$ **anime_watched_df is filtered to keep only the users whose anime count is above the threshold.**

In [None]:
final_df = anime_watched_df[anime_watched_df.index.isin(users_same_movies)]
final_df.shape

In [None]:
final_df.head()

$\nabla$ **A new dataframe is defined to hold the pairwise correlations between users.**

In [None]:
corr_df = final_df.T.corr().unstack().sort_values()
corr_df = pd.DataFrame(corr_df, columns=["corr"])
corr_df.index.names = ['user_id_1', 'user_id_2']
corr_df = corr_df.reset_index()
corr_df.head()
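
Only the correlations with the random user are used later, so the full user-by-user matrix is more than is strictly needed. As a sketch of an alternative, each user's row can be correlated directly with the target user's row. The toy data and the names `toy_final_df` and `target_user` are hypothetical, and this is not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical filtered pivot table: users on rows, titles on columns
toy_final_df = pd.DataFrame(
    {
        "Alpha": [8, 7, 3],
        "Beta": [6, 5, 9],
        "Gamma": [9, 10, 4],
        "Delta": [7, np.nan, 8],
    },
    index=pd.Index([1, 2, 3], name="user_id"),
)
target_user = 1

# Approach from the notebook: full pairwise correlation matrix
full_matrix = toy_final_df.T.corr()

# Alternative: correlate every user's ratings with the target user's only.
# Series.corr ignores titles that either of the two users has not rated.
target_ratings = toy_final_df.loc[target_user]
direct = toy_final_df.apply(lambda row: row.corr(target_ratings), axis=1)

print(direct.sort_values(ascending=False))
```

On large user sets this avoids building a square matrix whose size grows with the square of the number of users.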

$\nabla$ **A new dataframe is defined to store the users whose correlation with the random user is at least 0.50.**

In [None]:
top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"] >= 0.50)][["user_id_2", "corr"]].reset_index(drop=True)
top_users = top_users.sort_values(by='corr', ascending=False)
top_users.rename(columns={"user_id_2": "user_id"}, inplace=True)
top_users.shape

In [None]:
top_users.head()

$\nabla$ **The top_users dataframe is merged with anime_ratings to get each similar user's anime_id and rating values.**

In [None]:
# Join explicitly on user_id, the only column the two frames share
top_users_ratings = top_users.merge(anime_ratings[["user_id", "anime_id", "rating"]], how='inner', on="user_id")
top_users_ratings = top_users_ratings[top_users_ratings["user_id"] != random_user]
top_users_ratings["user_id"].unique()
top_users_ratings.head()

$\nabla$ **A new variable is defined as the product of correlation and rating, so that ratings from more similar users carry more weight.**

In [None]:
top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']
top_users_ratings.head()

$\nabla$ **A dataframe is defined that holds, for each anime ID, the mean of its weighted ratings.**

In [None]:
recommendation_df = top_users_ratings.groupby('anime_id').agg({"weighted_rating": "mean"})
recommendation_df = recommendation_df.reset_index()
recommendation_df.head()
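
The mean of `corr * rating` is not on the original 1 to 10 scale, because each product is shrunk by a correlation below 1. A common alternative, shown here only as a sketch and not as the notebook's method, is a similarity-weighted average: the sum of `corr * rating` divided by the sum of `corr`. The toy values are made up.

```python
import pandas as pd

# Hypothetical similar users with their correlations and ratings
toy = pd.DataFrame({
    "user_id": [2, 2, 3],
    "corr": [0.9, 0.9, 0.6],
    "anime_id": [10, 20, 10],
    "rating": [8, 6, 10],
})
toy["weighted_rating"] = toy["corr"] * toy["rating"]

# Notebook's aggregation: plain mean of the products
mean_of_products = toy.groupby("anime_id")["weighted_rating"].mean()

# Alternative: divide by the summed correlations to stay on the rating scale
sums = toy.groupby("anime_id")[["weighted_rating", "corr"]].sum()
normalized = sums["weighted_rating"] / sums["corr"]

print(mean_of_products)
print(normalized)
```

For anime 10 the plain mean is (7.2 + 6.0) / 2 = 6.6, while the normalized score is 13.2 / 1.5 = 8.8, which lies between the two original ratings of 8 and 10.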

$\nabla$ **recommendation_df is filtered to keep the anime whose weighted rating is above 5, sorted from highest to lowest.**

In [None]:
# Keep anime whose mean weighted rating exceeds 5, best first
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > 5].sort_values("weighted_rating", ascending=False)

$\nabla$ **Lastly, the top 10 recommendations are selected for the random user, shown with each anime's name and average rating.**

In [None]:
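# Join on anime_id to attach names; "rating" here is the anime's average rating from anime.csv
recommendation = movies_to_be_recommend.merge(anime[["anime_id", "name", "rating"]], on="anime_id")[["name", "rating"]].head(10)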
recommendation
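
The cell above does not appear to remove titles the random user has already rated, so such titles may show up among the recommendations. A possible extra step, sketched here on made-up data, filters them out before taking the top 10. `toy_recs` and `toy_watched` are hypothetical stand-ins for the merged recommendations and `anime_watched`.

```python
import pandas as pd

# Hypothetical recommendation candidates, already sorted best first
toy_recs = pd.DataFrame({
    "name": ["Alpha", "Beta", "Gamma", "Delta"],
    "rating": [8.5, 7.9, 9.1, 8.0],
})

# Hypothetical list of titles the target user has already rated
toy_watched = ["Beta", "Delta"]

# Drop already-rated titles, then keep at most 10 rows
toy_unseen = toy_recs[~toy_recs["name"].isin(toy_watched)].head(10)
print(toy_unseen)
```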

$\nabla$ **The recommendations are exported to a .csv file.**

In [None]:
recommendation.to_csv("recommendation.csv",index=False)