# Problem Statement

This data set contains information on user preference data from 73,516 users on 12,294 anime. Each user is able to add anime to their completed list and give it a rating and this data set is a compilation of those ratings.

[This dataset can be downloaded from here.](https://www.kaggle.com/datasets/CooperUnion/anime-recommendations-database)

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/anime-recommendations-database/rating.csv
/kaggle/input/anime-recommendations-database/anime.csv


In [2]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from mlxtend.frequent_patterns import apriori, association_rules
import re
from ast import literal_eval
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

In [3]:
ratings= pd.read_csv('/kaggle/input/anime-recommendations-database/rating.csv')
anime = pd.read_csv('/kaggle/input/anime-recommendations-database/anime.csv')

In [4]:
anime.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [5]:
ratings.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


In [6]:
anime.shape, ratings.shape

((12294, 7), (7813737, 3))

### About the features

In Anime.csv

* **anime_id** - myanimelist.net's unique id identifying an anime.
* **name** - full name of anime.
* **genre** - comma separated list of genres for this anime.
* **type** - movie, TV, OVA, etc.
* **episodes** - how many episodes in this show. (1 if movie).
* **rating** - average rating out of 10 for this anime.
* **members** - number of community members that are in this anime's
"group".

In Rating.csv
 
* **user_id** - non identifiable randomly generated user id.
* **anime_id** - the anime that this user has rated.
* **rating** - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).

In [7]:
df = pd.merge(ratings,anime, on='anime_id')
df.head()

Unnamed: 0,user_id,anime_id,rating_x,name,genre,type,episodes,rating_y,members
0,1,20,-1,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,7.81,683297
1,3,20,8,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,7.81,683297
2,5,20,6,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,7.81,683297
3,6,20,-1,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,7.81,683297
4,10,20,-1,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,7.81,683297


# Content Based Recommendation System (Genre Based)

In [8]:
anime.isna().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [9]:
anime.dropna(inplace = True)

In [10]:
anime['genre'] = anime['genre'].transform(lambda x: ' '.join(x.split(', ')))

In [11]:
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(anime['genre'])
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

In [12]:
def recommend(string):
    
    index = anime[anime['name'].str.lower()==string.lower()].index
    all_anime = []

    for i in index:
        scores = list(enumerate(cosine_sim[i]))
        scores = sorted(scores,key=lambda x: x[1], reverse=True)[0:11]
        anime_list = [anime.iloc[n]['name'] for n,j in scores]
        all_anime.extend(anime_list)
    return all_anime

In [13]:
recommend('naruto')

['Boruto: Naruto the Movie',
 'Naruto: Shippuuden',
 'Naruto',
 'Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi',
 'Naruto x UT',
 'Naruto: Shippuuden Movie 4 - The Lost Tower',
 'Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono',
 'Naruto Shippuuden: Sunny Side Battle',
 'Naruto Soyokazeden Movie: Naruto to Mashin to Mitsu no Onegai Dattebayo!!',
 'Kyutai Panic Adventure!',
 'Naruto: Shippuuden Movie 6 - Road to Ninja']

In [14]:
recommend('Gintama')

['Gintama°',
 'Gintama&#039;',
 'Gintama Movie: Kanketsu-hen - Yorozuya yo Eien Nare',
 'Gintama&#039;: Enchousen',
 'Gintama',
 'Gintama: Yorinuki Gintama-san on Theater 2D',
 'Gintama Movie: Shinyaku Benizakura-hen',
 'Gintama: Shinyaku Benizakura-hen',
 'Gintama: Jump Festa 2014 Special',
 'Gintama: Nanigoto mo Saiyo ga Kanjin nano de Tasho Senobisuru Kurai ga Choudoyoi',
 'Gintama: Jump Festa 2015 Special']

In [15]:
recommend('Fullmetal Alchemist: Brotherhood')

['Fullmetal Alchemist: Brotherhood',
 'Fullmetal Alchemist',
 'Fullmetal Alchemist: The Sacred Star of Milos',
 'Fullmetal Alchemist: Brotherhood Specials',
 'Tales of Vesperia: The First Strike',
 'Tide-Line Blue',
 'Fullmetal Alchemist: Reflections',
 'Magi: The Kingdom of Magic',
 'Magi: The Labyrinth of Magic',
 'Magi: Sinbad no Bouken (TV)',
 'Magi: Sinbad no Bouken']

In [16]:
recommend('Shingeki no Kyojin')

['Shingeki no Kyojin',
 'Shingeki no Kyojin OVA',
 'Shingeki no Kyojin Movie 2: Jiyuu no Tsubasa',
 'Shingeki no Kyojin Movie 1: Guren no Yumiya',
 'Shingeki no Kyojin: Ano Hi Kara',
 'Saint Seiya: Meiou Hades Elysion-hen',
 'One Piece',
 'One Piece: Episode of Merry - Mou Hitori no Nakama no Monogatari',
 'One Piece: Episode of Nami - Koukaishi no Namida to Nakama no Kizuna',
 'One Piece: Episode of Sabo - 3 Kyoudai no Kizuna Kiseki no Saikai to Uketsugareru Ishi',
 'One Piece: Romance Dawn']

In [17]:
recommend('One Piece')

['One Piece',
 'One Piece: Episode of Merry - Mou Hitori no Nakama no Monogatari',
 'One Piece: Episode of Nami - Koukaishi no Namida to Nakama no Kizuna',
 'One Piece: Episode of Sabo - 3 Kyoudai no Kizuna Kiseki no Saikai to Uketsugareru Ishi',
 'One Piece Film: Strong World Episode 0',
 'One Piece: Episode of Luffy - Hand Island no Bouken',
 'One Piece Movie 4: Dead End no Bouken',
 'One Piece Movie 9: Episode of Chopper Plus - Fuyu ni Saku, Kiseki no Sakura',
 'One Piece: Adventure of Nebulandia',
 'One Piece Movie 5: Norowareta Seiken',
 'One Piece: Umi no Heso no Daibouken-hen']

In [18]:
recommend('Boku no hero academia')

['Boku no Hero Academia',
 'Boku no Hero Academia: Jump Festa 2016 Special',
 'Kill la Kill',
 'Kill la Kill Special',
 'Code:Breaker',
 'Katekyo Hitman Reborn!',
 'Baka to Test to Shoukanjuu: Matsuri',
 'Big Order',
 'Big Order (TV)',
 'Kyutai Panic Adventure Returns!',
 'Tokyo Ravens']

# Collaborative Filtering

I cannot process all the data since that would take a lot of processing time. Therefore, I will use only the anime that has a rating of 8.5 or above.

In [19]:
df[df['rating_y']>8.5].shape

(666794, 9)

In [20]:
df1 = df[df['rating_y']>8.5].drop_duplicates()

In [21]:
df1.name.nunique()

104

In [22]:
crosstab = pd.crosstab(df1['user_id'],df['name']).astype('bool')

In [23]:
freq_anime = apriori(crosstab,min_support=0.05,use_colnames=True)
rules = association_rules(freq_anime,metric='confidence',min_threshold=0.1)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Ano Hi Mita Hana no Namae wo Bokutachi wa Mad...,(Baccano!),0.263609,0.171726,0.077207,0.292885,1.705534,0.031939,1.171343
1,(Baccano!),(Ano Hi Mita Hana no Namae wo Bokutachi wa Mad...,0.171726,0.263609,0.077207,0.449593,1.705534,0.031939,1.337905
2,(Bakuman. 2nd Season),(Ano Hi Mita Hana no Namae wo Bokutachi wa Mad...,0.095621,0.263609,0.056901,0.595068,2.257393,0.031694,1.818556
3,(Ano Hi Mita Hana no Namae wo Bokutachi wa Mad...,(Bakuman. 2nd Season),0.263609,0.095621,0.056901,0.215854,2.257393,0.031694,1.15333
4,(Ano Hi Mita Hana no Namae wo Bokutachi wa Mad...,(Boku dake ga Inai Machi),0.263609,0.142407,0.080403,0.305008,2.1418,0.042863,1.23396


In [24]:
rules['antecedents'] = rules['antecedents'].apply(lambda x: list(x)[0])

In [25]:
rules['consequents'] = rules['consequents'].apply(lambda x: list(x)[0])

In [26]:
def search_apriori(string):
    search_df = rules[rules['antecedents'].str.lower()== string.lower()]
    search_df.sort_values(by='lift', ascending=False)
    return search_df[:10]['consequents'].to_list()

In [27]:
search_apriori('Shingeki no Kyojin')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Baccano!',
 'Bakuman. 2nd Season',
 'Bakuman. 3rd Season',
 'Boku dake ga Inai Machi',
 'Clannad: After Story',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Cowboy Bebop',
 'Death Note']

In [28]:
search_apriori('Fullmetal Alchemist: Brotherhood')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Baccano!',
 'Bakuman. 2nd Season',
 'Bakuman. 3rd Season',
 'Boku dake ga Inai Machi',
 'Clannad: After Story',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Cowboy Bebop',
 'Death Note']

In [29]:
search_apriori('One Punch Man')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Baccano!',
 'Boku dake ga Inai Machi',
 'Clannad: After Story',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Cowboy Bebop',
 'Death Note',
 'Fate/Zero',
 'Fate/Zero 2nd Season']

In [30]:
search_apriori('Haikyuu!!')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Boku dake ga Inai Machi',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Death Note',
 'Fullmetal Alchemist: Brotherhood',
 'Haikyuu!! Second Season',
 'Hunter x Hunter (2011)',
 'Kiseijuu: Sei no Kakuritsu',
 'Kuroko no Basket 2nd Season']

In [31]:
search_apriori('Hunter x Hunter (2011)')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Boku dake ga Inai Machi',
 'Clannad: After Story',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Death Note',
 'Fate/Zero',
 'Fate/Zero 2nd Season',
 'Fullmetal Alchemist: Brotherhood',
 'Haikyuu!!']

In [32]:
search_apriori('Death Note')

['Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.',
 'Baccano!',
 'Bakuman. 2nd Season',
 'Bakuman. 3rd Season',
 'Boku dake ga Inai Machi',
 'Clannad: After Story',
 'Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Cowboy Bebop',
 'Evangelion: 2.0 You Can (Not) Advance']

In [33]:
search_apriori('Gintama')

['Code Geass: Hangyaku no Lelouch',
 'Code Geass: Hangyaku no Lelouch R2',
 'Death Note',
 'Fullmetal Alchemist: Brotherhood',
 'Gintama&#039;',
 'Shingeki no Kyojin',
 'Code Geass: Hangyaku no Lelouch R2']