# Anime Recommendation: Content-Based System

## Overview
This project aims to build an anime recommendation system for new members and current subscribers of an anime streaming service. New members can use a content-based approach to receive recommendations based on a show they may have watched or heard of previously. For current subscribers, collaborative filtering compares users' ratings and returns shows that similar users have rated highly.

## Business Understanding
The anime industry is a rapidly growing market, with new shows released continually. This makes it difficult for fans to find new shows they will enjoy. In addition, many streaming services offer little personalization, so users can waste time scrolling through long lists of shows that do not interest them.
With this project, I aim to build a recommendation system that helps anime fans discover new shows they will enjoy. The system will draw on several factors, including the user's past viewing history, the user's ratings of other shows, and the user's genre preferences.
The result is a curated list, based on content preferences and on the picks of similar users, that saves time and offers an experience that is both novel and familiar.


## Content-Based
Content-based filtering is a technique that recommends items to users based on the content of those items. In the context of anime recommendation, this involves extracting features from each anime, such as genre, plot, characters, and art style, and then recommending anime similar to those a user has already watched. The models in this notebook use the genre tags as the content features.

In [1]:
import numpy as np
import pandas as pd
import re
import os
import warnings
warnings.filterwarnings('ignore')
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

In [3]:
# importing dataset
df = pd.read_csv('Data/anime_cleaned.csv', index_col=0)
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


## Content-Based With Vectorizer and Linear Kernel
A simple recommender that returns shows similar to the title the user enters.
- Suited to cold start, where the user has no rating history yet
- Takes one show the user has watched or liked and returns similar shows based on its genre tags

In [4]:
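# instantiating vectorizer: unigrams and bigrams over the genre text
# min_df=1 keeps every term (recent scikit-learn releases reject the integer 0)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')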

tfdif_matrix = tf.fit_transform(df['genre'])

tfdif_matrix.shape

(12017, 654)
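
With `analyzer='word'` and the default token pattern, multi-word or hyphenated genres such as `Martial Arts` and `Sci-Fi` are split into separate tokens, and bigrams can pair words from neighbouring genres, which helps explain the 654 features. The sketch below shows one possible alternative that treats each comma-separated genre as a single token. The `split_genres` helper and the three sample rows are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample in the same comma-separated format as df['genre']
sample = pd.Series([
    "Action, Comedy, Martial Arts, Shounen, Super Power",
    "Sci-Fi, Thriller",
    "Drama, Romance, School, Supernatural",
])

def split_genres(text):
    """Split a genre string on commas and normalise whitespace and case."""
    return [g.strip().lower() for g in text.split(",") if g.strip()]

# Passing a callable as `analyzer` bypasses the default word tokenizer,
# so every genre becomes exactly one feature.
genre_tf = TfidfVectorizer(analyzer=split_genres)
genre_matrix = genre_tf.fit_transform(sample)

print(sorted(genre_tf.vocabulary_))
print(genre_matrix.shape)
```

With this tokenizer the feature count equals the number of distinct genres, which makes the similarity scores easier to interpret, at the cost of losing the bigram signal.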

In [5]:
# linear kernel for similarity
cosine_sim = linear_kernel(tfdif_matrix, tfdif_matrix)
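
`linear_kernel` computes plain dot products. Because `TfidfVectorizer` L2-normalises each row by default (`norm='l2'`), those dot products equal cosine similarities, which is why the result can be stored in `cosine_sim`. A small check on made-up genre strings (the sample data is an assumption for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical genre strings
docs = [
    "Action, Comedy, Shounen",
    "Action, Adventure, Fantasy",
    "Drama, Romance, School",
]

vec = TfidfVectorizer()          # norm='l2' is the default
m = vec.fit_transform(docs)

dot = linear_kernel(m, m)        # raw dot products
cos = cosine_similarity(m, m)    # normalises rows, then takes dot products

print(np.allclose(dot, cos))     # True
```

`linear_kernel` skips the extra normalisation pass, so it is the cheaper of the two calls on rows that are already unit length.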

In [6]:
indices = pd.Series(df.index, index=df['name'])

In [7]:
indices

name
Kimi no Na wa.                                            0
Fullmetal Alchemist: Brotherhood                          1
Gintama°                                                  2
Steins;Gate                                               3
Gintama&#039;                                             4
                                                      ...  
Toushindai My Lover: Minami tai Mecha-Minami          12289
Under World                                           12290
Violence Gekiga David no Hoshi                        12291
Violence Gekiga Shin David no Hoshi: Inma Densetsu    12292
Yasuji no Pornorama: Yacchimae!!                      12293
Length: 12017, dtype: int64
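
The values in `indices` are index labels from the CSV, and they run up to 12293 although the frame has only 12017 rows. The similarity matrix, however, is addressed by row position. One way to keep the two aligned is to map each title to its position. The sketch below uses a small made-up frame with gaps in its index; `title_to_pos` is a hypothetical name.

```python
import pandas as pd

# Made-up frame whose index has gaps, like df after rows were dropped
toy = pd.DataFrame(
    {"name": ["Show A", "Show B", "Show C"]},
    index=[0, 5, 9],
)

# Map title -> row position (0..n-1) rather than index label
title_to_pos = pd.Series(range(len(toy)), index=toy["name"])
# Keep the first occurrence if a title appears more than once
title_to_pos = title_to_pos[~title_to_pos.index.duplicated(keep="first")]

pos = title_to_pos["Show C"]
print(pos)                    # 2 (position), not 9 (label)
print(toy["name"].iloc[pos])  # Show C
```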

In [8]:
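# function to show recommendations

def show_rec(name, cosine_sim=cosine_sim):
    """Print the 10 shows whose genre TF-IDF vectors are closest to `name`."""
    # indices holds index labels; convert to a row position, since cosine_sim is positional
    idx = df.index.get_loc(indices[name])
    print(f"Title: {df['name'].iloc[idx]} | Genre: {df['genre'].iloc[idx]}")
    # pair every row position with its similarity to the chosen show, highest first
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # skip the first entry (normally the show itself) and keep the next 10
    sim_scores = sim_scores[1:11]
    show_indices = [i[0] for i in sim_scores]
    print(df[['name','rating', 'episodes']].iloc[show_indices])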

In [9]:
show_rec('Naruto')

Title: Naruto | Genre: Action, Comedy, Martial Arts, Shounen, Super Power
                                                   name  rating episodes
615                                  Naruto: Shippuuden    7.94  Unknown
841                                              Naruto    7.81      220
1103  Boruto: Naruto the Movie - Naruto ga Hokage ni...    7.68        1
1343                                        Naruto x UT    7.58        1
1472        Naruto: Shippuuden Movie 4 - The Lost Tower    7.53        1
1573  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...    7.50        1
2458               Naruto Shippuuden: Sunny Side Battle    7.26        1
2997  Naruto Soyokazeden Movie: Naruto to Mashin to ...    7.11        1
7837                      Battle Spirits: Ryuuko no Ken    4.89        1
7628                            Kyutai Panic Adventure!    5.21        1


In [11]:
show_rec('Death Note')

Title: Death Note | Genre: Mystery, Police, Psychological, Supernatural, Thriller
                               name  rating episodes
778              Death Note Rewrite    7.84        2
981                 Mousou Dairinin    7.74       13
144   Higurashi no Naku Koro ni Kai    8.41       24
1383  Higurashi no Naku Koro ni Rei    7.56        5
334       Higurashi no Naku Koro ni    8.17       26
7986                   Bloody Night    4.26        1
1238                      Shigofumi    7.62       12
1861        Himitsu: The Revelation    7.42       26
3829       Hikari to Mizu no Daphne    6.87       24
38                          Monster    8.72       74


## Recommendation
This model returns the 10 shows most similar to the one entered, along with each show's row index, rating, and number of episodes.
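
Slicing `sim_scores[1:11]` assumes the queried show sorts first. When other shows share exactly the same genre list they tie at similarity 1.0, so the query can stay in the results, as with `Naruto` at row 841 above. A hedged sketch of a ranking step that excludes the query by position instead; `top_n_similar` and the toy matrix are illustrative assumptions:

```python
import numpy as np

def top_n_similar(sim_matrix, pos, n=10):
    """Return positions of the n most similar rows to `pos`, excluding `pos` itself."""
    scores = np.asarray(sim_matrix[pos], dtype=float)
    order = np.argsort(-scores, kind="stable")  # descending; ties keep row order
    order = order[order != pos]                 # drop the query explicitly
    return order[:n].tolist()

# Toy similarity matrix: rows 0 and 1 are identical shows (similarity 1.0)
toy_sim = np.array([
    [1.0, 1.0, 0.2, 0.7],
    [1.0, 1.0, 0.2, 0.7],
    [0.2, 0.2, 1.0, 0.1],
    [0.7, 0.7, 0.1, 1.0],
])

print(top_n_similar(toy_sim, pos=1, n=2))  # [0, 3]
```

The returned positions can be passed to `.iloc` in the same way as `show_indices`.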

## Content-Based With One-Hot Encoded Genres and Cosine Similarity
- Content-based option 2

In [12]:
genre_df = pd.read_csv('Data/one_hot_genre.csv')
genre_df

Unnamed: 0.1,Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,Unnamed: 9,",",...,n,o,p,r,s,t,u,v,w,y
0,0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,1,1,...,1,1,1,1,0,1,1,0,0,0
1,1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665,1,1,...,1,1,0,1,1,1,1,1,0,1
2,2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262,1,1,...,1,1,0,1,1,1,1,0,0,1
3,3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572,1,1,...,0,0,0,1,0,0,0,0,0,0
4,4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266,1,1,...,1,1,0,1,1,1,1,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211,0,0,...,1,0,0,0,0,1,0,0,0,0
12013,12290,5543,Under World,Hentai,OVA,1,4.28,183,0,0,...,1,0,0,0,0,1,0,0,0,0
12014,12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219,0,0,...,1,0,0,0,0,1,0,0,0,0
12015,12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175,0,0,...,1,0,0,0,0,1,0,0,0,0
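
The single-character column names in the output above (`n`, `o`, `p`, ...) suggest the genre strings were encoded character by character. One possible way to obtain one column per genre is pandas' `Series.str.get_dummies`, sketched here on made-up rows. This is an assumption about the intended encoding, not the code that produced `one_hot_genre.csv`.

```python
import pandas as pd

# Hypothetical rows in the same comma-separated genre format
toy = pd.DataFrame({
    "name": ["Show A", "Show B"],
    "genre": ["Action, Sci-Fi", "Drama, Sci-Fi, Slice of Life"],
})

# Split on ", " so multi-word genres stay intact; one 0/1 column per genre
genre_flags = toy["genre"].str.get_dummies(sep=", ")
encoded = pd.concat([toy, genre_flags], axis=1)

print(list(genre_flags.columns))
print(encoded)
```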


In [43]:
features = ['episodes', 'popularity', 'score', 'Action', 'Adventure', 'Cars',
       'Comedy', 'Dementia', 'Demons', 'Drama', 'Ecchi', 'Fantasy', 'Game',
       'Harem', 'Historical', 'Horror', 'Josei', 'Kids', 'Magic',
       'Martial Arts', 'Mecha', 'Military', 'Music', 'Mystery', 'Parody',
       'Police', 'Psychological', 'Romance', 'Samurai', 'School', 'Sci-Fi',
       'Seinen', 'Shoujo', 'Shoujo Ai', 'Shounen', 'Shounen Ai',
       'Slice of Life', 'Space', 'Sports', 'Super Power', 'Supernatural',
       'Thriller', 'Vampire']

In [44]:
content_df = genre_df[features]
content_df

Unnamed: 0,episodes,popularity,score,Action,Adventure,Cars,Comedy,Dementia,Demons,Drama,...,Shoujo Ai,Shounen,Shounen Ai,Slice of Life,Space,Sports,Super Power,Supernatural,Thriller,Vampire
0,25.0,141,8.82,0,0,0,1,0,0,1,...,0,1,0,0,0,1,0,0,0,0
1,22.0,28,8.83,0,0,0,0,0,0,1,...,0,1,0,0,0,0,0,0,0,0
2,13.0,98,8.83,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,64.0,4,9.23,1,1,0,1,0,0,1,...,0,1,0,0,0,0,0,0,0,0
4,1.0,502,8.83,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13849,1.0,2382,7.50,1,0,0,1,0,0,0,...,0,1,0,0,0,0,1,0,0,0
13850,12.0,1648,7.50,0,0,0,1,0,0,0,...,0,0,0,1,0,0,0,1,0,0
13851,12.0,1547,7.56,0,0,0,1,0,0,0,...,0,1,0,1,0,0,0,0,0,0
13852,1.0,2154,7.56,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [45]:
sim = cosine_similarity(content_df)

In [46]:
sim

array([[1.        , 0.86820285, 0.99837093, ..., 0.98427982, 0.98295126,
        0.98336835],
       [0.86820285, 1.        , 0.85200872, ..., 0.76759192, 0.76290634,
        0.76458632],
       [0.99837093, 0.85200872, 1.        , ..., 0.98854292, 0.98750135,
        0.98801588],
       ...,
       [0.98427982, 0.76759192, 0.98854292, ..., 1.        , 0.99997173,
        0.99996894],
       [0.98295126, 0.76290634, 0.98750135, ..., 0.99997173, 1.        ,
        0.9999837 ],
       [0.98336835, 0.76458632, 0.98801588, ..., 0.99996894, 0.9999837 ,
        1.        ]])

In [47]:
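def show_rec2(title, cosine_sim=sim):
    """Return the 10 shows closest to `title` in the one-hot feature space."""
    # sim was built from genre_df, so find the row position of the title there
    idx = int(np.flatnonzero(genre_df['title'].to_numpy() == title)[0])

    print(f"Title: {genre_df['title'].iloc[idx]} | Genre: {genre_df['genre'].iloc[idx]}")
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    show_indices = [i[0] for i in sim_scores]
    # read the output columns from genre_df as well, so rows line up with sim
    return pd.DataFrame({'Anime Name': genre_df['title'].iloc[show_indices].values,
                        'Score': genre_df['score'].iloc[show_indices].values,
                        'Number Of Episodes': genre_df['episodes'].iloc[show_indices].values})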


In [48]:
show_rec2('Naruto')

Title: Naruto | Genre: ['Action', 'Adventure', 'Comedy', 'Super Power', 'Martial Arts', 'Shounen']


Unnamed: 0,Anime Name,Score,Number Of Episodes
0,Naruto: Shippuuden,8.2,500.0
1,Bleach,7.87,366.0
2,Fairy Tail,7.93,175.0
3,Hunter x Hunter (2011),9.11,148.0
4,Fullmetal Alchemist: Brotherhood,9.23,64.0
5,Death Note,8.65,37.0
6,Dragon Ball Z,8.27,291.0
7,Sword Art Online,7.49,25.0
8,Shingeki no Kyojin,8.47,25.0
9,Steins;Gate,9.11,24.0


## Recommendation
This second option uses cosine similarity over one-hot genre flags plus numeric features (`episodes`, `popularity`, `score`) and returns the same fields for each show. Its selection differs from option 1 because it is not limited to matching genre text. With the first option, the recommendations for Naruto were mostly Naruto movies and Boruto, a sequel to Naruto; the second option returns other series such as Bleach, Fairy Tail, and Hunter x Hunter (2011).

The choice between the two content-based models depends on what the user wants. Option 1 suits a user who wants more of a show they enjoyed, such as sequels or movies that fill the gaps between seasons.
If the user is looking for something new but in the same vein as a comfort show such as Naruto, option 2 recommends different shows that share some of its themes and elements, with new characters and stories.