# Anime Recommendation: Content-Based System

## Overview
This project aims to build an anime recommendation system for new members and current subscribers of an anime streaming service. New members can use a content-based approach to receive recommendations based on a show they may have watched or heard of previously. For current subscribers, collaborative filtering compares users' ratings and returns shows that similar users have rated similarly.

## Business Understanding
The anime industry is a rapidly growing market, with new shows being released all the time. This can make it difficult for anime fans to find new shows to watch that they will enjoy. Additionally, most streaming services do not offer personalized recommendations, which can lead to users wasting time scrolling through an endless list of shows that they are not interested in.
With this project, I aim to build a recommendation system that will help anime fans discover new shows that they will enjoy. The recommendation system will use a variety of factors to make recommendations, including the user's past viewing history, the user's ratings of other shows, and the user's genre preferences. 
This recommendation system will give users a curated list based on their content preferences and the picks of similar users, saving them time and providing an experience that is both novel and familiar.


## Content Based
Content-based filtering is a technique that recommends items to users based on the content of those items. In the context of anime recommendation, this involves extracting features from each anime, such as the genre, plot, characters, and art style, and then recommending anime that are similar to ones the user has already watched. In this notebook, the features are the synopsis text (option 1) and one-hot encoded genres together with a few numeric columns (option 2).

In [79]:
import numpy as np
import pandas as pd
import re
import os
import warnings
warnings.filterwarnings('ignore')
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

In [106]:
# importing dataset
df = pd.read_csv('Data/anime_cleaned.csv')
df

Unnamed: 0.1,Unnamed: 0,uid,title,synopsis,genre,episodes,members,score
0,0,28891,Haikyuu!! Second Season,Following their participation at the Inter-Hig...,"['Comedy', 'Sports', 'Drama', 'School', 'Shoun...",25.0,489888,8.82
1,1,23273,Shigatsu wa Kimi no Uso,Music accompanies the path of the human metron...,"['Drama', 'Music', 'Romance', 'School', 'Shoun...",22.0,995473,8.83
2,2,34599,Made in Abyss,The Abyss—a gaping chasm stretching down into ...,"['Sci-Fi', 'Adventure', 'Mystery', 'Drama', 'F...",13.0,581663,8.83
3,3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, someth...","['Action', 'Military', 'Adventure', 'Comedy', ...",64.0,1615084,9.23
4,4,31758,Kizumonogatari III: Reiketsu-hen,After helping revive the legendary vampire Kis...,"['Action', 'Mystery', 'Supernatural', 'Vampire']",1.0,214621,8.83
...,...,...,...,...,...,...,...,...
15148,19002,10075,Naruto x UT,All-new animation offered throughout UNIQLO cl...,"['Action', 'Comedy', 'Super Power', 'Martial A...",1.0,34155,7.50
15149,19003,35828,Miira no Kaikata,High school student Sora Kashiwagi is accustom...,"['Slice of Life', 'Comedy', 'Supernatural']",12.0,61459,7.50
15150,19004,10378,Shinryaku!? Ika Musume,"After regaining her squid-like abilities, Ika ...","['Slice of Life', 'Comedy', 'Shounen']",12.0,67422,7.56
15151,19005,33082,Kingsglaive: Final Fantasy XV,"For years, the Niflheim Empire and the kingdom...",['Action'],1.0,41077,7.56


## Content-Based With Vectorizer and Linear Kernel
A simple recommender that returns shows similar to the title entered:
- suited to the cold-start case, where no rating history exists for the user
- takes one show the user has watched or liked and returns similar shows based on the show's synopsis

In [107]:
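# instantiating vectorizer
# min_df=1 keeps every term (same effect as 0); recent scikit-learn versions reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')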

tfdif_matrix = tf.fit_transform(df['synopsis'])

tfdif_matrix.shape

(15153, 443528)

In [108]:
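# linear kernel for similarity: TF-IDF rows are L2-normalized by default,
# so the dot product of two rows equals their cosine similarity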
cosine_sim = linear_kernel(tfdif_matrix, tfdif_matrix)
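
The cell above relies on the fact that `TfidfVectorizer` normalizes each row to unit length by default, so `linear_kernel` and `cosine_similarity` should give the same matrix. The sketch below checks this on a small, hypothetical set of synopses (the `toy_*` names and texts are illustrative and not part of the dataset).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy synopses, not taken from the dataset
toy_synopses = [
    "A young ninja dreams of becoming the leader of his village.",
    "A ninja returns to his village after years of training.",
    "A student finds a notebook that can kill anyone whose name is written in it.",
]

toy_tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), stop_words='english')
toy_matrix = toy_tf.fit_transform(toy_synopses)

# TfidfVectorizer defaults to norm='l2', so every row should have length 1
row_norms = np.sqrt(np.asarray(toy_matrix.multiply(toy_matrix).sum(axis=1)).ravel())

# With unit-length rows, the plain dot product is the cosine similarity
toy_linear = linear_kernel(toy_matrix, toy_matrix)
toy_cosine = cosine_similarity(toy_matrix, toy_matrix)
same = np.allclose(toy_linear, toy_cosine)

print(row_norms)
print(same)
```

`linear_kernel` skips the normalization step that `cosine_similarity` repeats, which is why it is often preferred for TF-IDF matrices.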

In [109]:
indices = pd.Series(df.index, index=df['title'])
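
One thing to watch with this lookup: if the dataset contains the same title more than once, `indices[name]` returns several positions instead of one, and the functions below would not behave as intended. Whether that happens in this dataset is not shown here, so the following is only a sketch on a hypothetical toy frame (`toy_df`, `toy_indices` and `unique_indices` are illustrative names) of how the first occurrence could be kept.

```python
import pandas as pd

# Hypothetical toy frame with a repeated title, standing in for df
toy_df = pd.DataFrame({'title': ['Naruto', 'Death Note', 'Naruto']})

# Same construction as above: a duplicated title maps to several positions
toy_indices = pd.Series(toy_df.index, index=toy_df['title'])
dup_lookup = toy_indices['Naruto']      # a Series, not a single integer

# Keep only the first row position for each title
unique_indices = toy_indices[~toy_indices.index.duplicated(keep='first')]
first_idx = unique_indices['Naruto']    # a single integer position

print(type(dup_lookup).__name__, first_idx)
```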

In [110]:
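# function to show recommendations

def show_rec(name, cosine_sim=cosine_sim):
    """Print the 10 shows whose synopses are most similar to `name`."""
    # row position of the requested title
    idx = indices[name]
    print(f"Title: {df['title'].iloc[idx]} | Genre: {df['genre'].iloc[idx]}")
    # pair every row position with its similarity to the requested show
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # skip the first entry (the show itself) and keep the next 10
    sim_scores = sim_scores[1:11]
    show_indices = [i[0] for i in sim_scores]
    print(df[['title','score', 'episodes']].iloc[show_indices])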
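
`show_rec` prints its result and assumes the requested show is always the first entry after sorting. As a hedged alternative, the sketch below returns a DataFrame, excludes the query row explicitly, and raises a clear error for unknown titles. The function `top_similar`, its parameters, and the toy catalog are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def top_similar(name, indices, sim_matrix, catalog, n=10):
    """Hypothetical variant of show_rec that returns a DataFrame."""
    if name not in indices.index:
        raise KeyError(f"'{name}' not found in the catalog")
    idx = indices[name]
    # Sort row positions by descending similarity to the query
    order = np.argsort(-sim_matrix[idx])
    # Drop the query row itself, then keep the top n
    order = order[order != idx][:n]
    result = catalog[['title', 'score', 'episodes']].iloc[order].copy()
    result['similarity'] = sim_matrix[idx][order]
    return result

# Toy data (illustrative values only)
toy_catalog = pd.DataFrame({'title': ['A', 'B', 'C', 'D'],
                            'score': [8.0, 7.0, 6.0, 5.0],
                            'episodes': [12.0, 24.0, 1.0, 13.0]})
toy_sim = np.array([[1.0, 0.2, 0.9, 0.5],
                    [0.2, 1.0, 0.1, 0.3],
                    [0.9, 0.1, 1.0, 0.4],
                    [0.5, 0.3, 0.4, 1.0]])
toy_lookup = pd.Series(toy_catalog.index, index=toy_catalog['title'])

toy_result = top_similar('A', toy_lookup, toy_sim, toy_catalog, n=2)
print(toy_result)
```

Returning the similarity alongside each title also makes it easier to judge how close a recommendation really is.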

In [94]:
show_rec('Naruto')

Title: Naruto | Genre: ['Action', 'Adventure', 'Comedy', 'Super Power', 'Martial Arts', 'Shounen']
                                                   title  score  episodes
472                                   Naruto: Shippuuden   8.20     500.0
904           Naruto: Shippuuden Movie 6 - Road to Ninja   7.77       1.0
3743                            Boruto: Naruto the Movie   7.71       1.0
8867         Naruto: Shippuuden Movie 4 - The Lost Tower   7.51       1.0
5633   Naruto: Dai Katsugeki!! Yuki Hime Shinobu Houj...   6.93       1.0
14850  Naruto: Shippuuden - Shippuu! "Konoha Gakuen" Den   7.23       1.0
14898   Naruto SD: Rock Lee no Seishun Full-Power Ninden   7.22      51.0
799                           The Last: Naruto the Movie   7.85       1.0
5084           Naruto: Shippuuden Movie 5 - Blood Prison   7.56       1.0
10183          Naruto: Akaki Yotsuba no Clover wo Sagase   6.56       1.0


In [95]:
show_rec('Death Note')

Title: Death Note | Genre: ['Mystery', 'Police', 'Psychological', 'Supernatural', 'Thriller', 'Shounen']
                              title  score  episodes
975             Death Note: Rewrite   7.78       2.0
200                      Soul Eater   7.99      51.0
4563                Yami no Matsuei   7.16      13.0
4828     YAT Anshin! Uchuu Ryokou 2   7.02      25.0
12539                Kite Liberator   6.40       1.0
1846    Ayatsuri Haramase DreamNote   6.23       2.0
5285           Shinigami no Ballad.   6.97       6.0
6912             Dia Horizon (Kabu)   4.68      12.0
15067  Bleach: Memories in the Rain   7.20       1.0
8562                Ghost Messenger   6.70       6.0


## Recommendation
This model returns the 10 shows most similar to the one entered, along with each show's row index, score, and number of episodes.

## One Hot Encoded genres, cosine similarity
- Content Based Option 2

In [96]:
genre_df = pd.read_csv('Data/one_hot_genre.csv')
genre_df

Unnamed: 0.1,Unnamed: 0,uid,title,synopsis,genre,aired,episodes,members,popularity,ranked,...,Shounen Ai,Slice of Life,Space,Sports,Super Power,Supernatural,Thriller,Vampire,Yaoi,Yuri
0,0,28891,Haikyuu!! Second Season,Following their participation at the Inter-Hig...,"['Comedy', 'Sports', 'Drama', 'School', 'Shoun...","Oct 4, 2015 to Mar 27, 2016",25.0,489888,141,25.0,...,0,0,0,1,0,0,0,0,0,0
1,1,23273,Shigatsu wa Kimi no Uso,Music accompanies the path of the human metron...,"['Drama', 'Music', 'Romance', 'School', 'Shoun...","Oct 10, 2014 to Mar 20, 2015",22.0,995473,28,24.0,...,0,0,0,0,0,0,0,0,0,0
2,2,34599,Made in Abyss,The Abyss—a gaping chasm stretching down into ...,"['Sci-Fi', 'Adventure', 'Mystery', 'Drama', 'F...","Jul 7, 2017 to Sep 29, 2017",13.0,581663,98,23.0,...,0,0,0,0,0,0,0,0,0,0
3,3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, someth...","['Action', 'Military', 'Adventure', 'Comedy', ...","Apr 5, 2009 to Jul 4, 2010",64.0,1615084,4,1.0,...,0,0,0,0,0,0,0,0,0,0
4,4,31758,Kizumonogatari III: Reiketsu-hen,After helping revive the legendary vampire Kis...,"['Action', 'Mystery', 'Supernatural', 'Vampire']","Jan 6, 2017",1.0,214621,502,22.0,...,0,0,0,0,0,1,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15148,19002,10075,Naruto x UT,All-new animation offered throughout UNIQLO cl...,"['Action', 'Comedy', 'Super Power', 'Martial A...","Jan 1, 2011",1.0,34155,2382,1728.0,...,0,0,0,0,1,0,0,0,0,0
15149,19003,35828,Miira no Kaikata,High school student Sora Kashiwagi is accustom...,"['Slice of Life', 'Comedy', 'Supernatural']","Jan 12, 2018 to Mar 30, 2018",12.0,61459,1648,1727.0,...,0,1,0,0,0,1,0,0,0,0
15150,19004,10378,Shinryaku!? Ika Musume,"After regaining her squid-like abilities, Ika ...","['Slice of Life', 'Comedy', 'Shounen']","Sep 27, 2011 to Dec 25, 2011",12.0,67422,1547,1548.0,...,0,1,0,0,0,0,0,0,0,0
15151,19005,33082,Kingsglaive: Final Fantasy XV,"For years, the Niflheim Empire and the kingdom...",['Action'],"Jul 9, 2016",1.0,41077,2154,1544.0,...,0,0,0,0,0,0,0,0,0,0


In [97]:
genre_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15153 entries, 0 to 15152
Data columns (total 54 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Unnamed: 0     15153 non-null  int64  
 1   uid            15153 non-null  int64  
 2   title          15153 non-null  object 
 3   synopsis       15153 non-null  object 
 4   genre          15153 non-null  object 
 5   aired          15153 non-null  object 
 6   episodes       15153 non-null  float64
 7   members        15153 non-null  int64  
 8   popularity     15153 non-null  int64  
 9   ranked         15153 non-null  float64
 10  score          15153 non-null  float64
 11  Action         15153 non-null  int64  
 12  Adventure      15153 non-null  int64  
 13  Cars           15153 non-null  int64  
 14  Comedy         15153 non-null  int64  
 15  Dementia       15153 non-null  int64  
 16  Demons         15153 non-null  int64  
 17  Drama          15153 non-null  int64  
 18  Ecchi 

In [98]:
genre_df.columns

Index(['Unnamed: 0', 'uid', 'title', 'synopsis', 'genre', 'aired', 'episodes',
       'members', 'popularity', 'ranked', 'score', 'Action', 'Adventure',
       'Cars', 'Comedy', 'Dementia', 'Demons', 'Drama', 'Ecchi', 'Fantasy',
       'Game', 'Harem', 'Hentai', 'Historical', 'Horror', 'Josei', 'Kids',
       'Magic', 'Martial Arts', 'Mecha', 'Military', 'Music', 'Mystery',
       'Parody', 'Police', 'Psychological', 'Romance', 'Samurai', 'School',
       'Sci-Fi', 'Seinen', 'Shoujo', 'Shoujo Ai', 'Shounen', 'Shounen Ai',
       'Slice of Life', 'Space', 'Sports', 'Super Power', 'Supernatural',
       'Thriller', 'Vampire', 'Yaoi', 'Yuri'],
      dtype='object')

In [99]:
features = ['episodes', 'popularity', 'score', 'Action', 'Adventure', 'Cars',
       'Comedy', 'Dementia', 'Demons', 'Drama', 'Ecchi', 'Fantasy', 'Game',
       'Harem', 'Historical', 'Horror', 'Josei', 'Kids', 'Magic',
       'Martial Arts', 'Mecha', 'Military', 'Music', 'Mystery', 'Parody',
       'Police', 'Psychological', 'Romance', 'Samurai', 'School', 'Sci-Fi',
       'Seinen', 'Shoujo', 'Shoujo Ai', 'Shounen', 'Shounen Ai',
       'Slice of Life', 'Space', 'Sports', 'Super Power', 'Supernatural',
       'Thriller', 'Vampire']

In [100]:
content_df = genre_df[features]
content_df

Unnamed: 0,episodes,popularity,score,Action,Adventure,Cars,Comedy,Dementia,Demons,Drama,...,Shoujo Ai,Shounen,Shounen Ai,Slice of Life,Space,Sports,Super Power,Supernatural,Thriller,Vampire
0,25.0,141,8.82,0,0,0,1,0,0,1,...,0,1,0,0,0,1,0,0,0,0
1,22.0,28,8.83,0,0,0,0,0,0,1,...,0,1,0,0,0,0,0,0,0,0
2,13.0,98,8.83,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,64.0,4,9.23,1,1,0,1,0,0,1,...,0,1,0,0,0,0,0,0,0,0
4,1.0,502,8.83,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15148,1.0,2382,7.50,1,0,0,1,0,0,0,...,0,1,0,0,0,0,1,0,0,0
15149,12.0,1648,7.50,0,0,0,1,0,0,0,...,0,0,0,1,0,0,0,1,0,0
15150,12.0,1547,7.56,0,0,0,1,0,0,0,...,0,1,0,1,0,0,0,0,0,0
15151,1.0,2154,7.56,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [102]:
sim = cosine_similarity(content_df)
sim

array([[1.        , 0.86820285, 0.99837093, ..., 0.98427982, 0.98295126,
        0.98336835],
       [0.86820285, 1.        , 0.85200872, ..., 0.76759192, 0.76290634,
        0.76458632],
       [0.99837093, 0.85200872, 1.        , ..., 0.98854292, 0.98750135,
        0.98801588],
       ...,
       [0.98427982, 0.76759192, 0.98854292, ..., 1.        , 0.99997173,
        0.99996894],
       [0.98295126, 0.76290634, 0.98750135, ..., 0.99997173, 1.        ,
        0.9999837 ],
       [0.98336835, 0.76458632, 0.98801588, ..., 0.99996894, 0.9999837 ,
        1.        ]])
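
Many of the similarity values above are very close to 1. One likely reason is that `episodes` and `popularity` are on a much larger scale than the 0/1 genre flags, so they can dominate the cosine similarity. The sketch below, on hypothetical toy rows, shows how rescaling the numeric columns with `MinMaxScaler` changes which rows count as most similar. This is an assumption about a possible refinement, not the method used in this notebook.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy rows standing in for content_df (illustrative values only)
# Row 0 and row 1 share a genre; row 0 and row 2 share a similar popularity
toy_content = pd.DataFrame({
    'episodes':   [12.0, 24.0, 13.0],
    'popularity': [2000, 150, 2100],
    'score':      [7.5, 7.6, 7.4],
    'Action':     [1, 1, 0],
    'Comedy':     [0, 0, 1],
})

# Unscaled: the large popularity values drive the result
raw_sim = cosine_similarity(toy_content)

# Scale only the numeric columns to [0, 1]; genre flags are already 0/1
numeric_cols = ['episodes', 'popularity', 'score']
toy_scaled = toy_content.astype(float)
toy_scaled[numeric_cols] = MinMaxScaler().fit_transform(toy_content[numeric_cols])
scaled_sim = cosine_similarity(toy_scaled)

print(raw_sim.round(3))
print(scaled_sim.round(3))
```

On this toy data, the unscaled matrix ranks row 2 (different genre, similar popularity) closest to row 0, while the scaled matrix ranks row 1 (same genre) closest.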

In [103]:
def show_rec2(title, cosine_sim=sim):
    """Return the 10 shows closest to `title` by genre and numeric features."""
    # `indices` and `df` come from option 1; this assumes df and genre_df share the same row order
    idx = indices[title]
    
    print(f"Title: {genre_df['title'].iloc[idx]} | Genre: {genre_df['genre'].iloc[idx]}")
    # rank every row by similarity, skip the show itself, keep the next 10
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    show_indices = [i[0] for i in sim_scores]
    return pd.DataFrame({'Anime Name': df['title'].iloc[show_indices].values,
                        'Score': df['score'].iloc[show_indices].values,
                        'Number Of Episodes': df['episodes'].iloc[show_indices].values})


In [104]:
show_rec2('Naruto')

Title: Naruto | Genre: ['Action', 'Adventure', 'Comedy', 'Super Power', 'Martial Arts', 'Shounen']


Unnamed: 0,Anime Name,Score,Number Of Episodes
0,Naruto: Shippuuden,8.2,500.0
1,Bleach,7.87,366.0
2,Fairy Tail,7.93,175.0
3,Hunter x Hunter (2011),9.11,148.0
4,Fullmetal Alchemist: Brotherhood,9.23,64.0
5,Death Note,8.65,37.0
6,Dragon Ball Z,8.27,291.0
7,Sword Art Online,7.49,25.0
8,Shingeki no Kyojin,8.47,25.0
9,Steins;Gate,9.11,24.0


## Recommendation
This second recommendation option uses cosine similarity and returns the same columns for each show, but with a different selection, because it relies on features other than the show's synopsis. With the first option, the recommendations for Naruto were Naruto: Shippuuden, Naruto movies, and Boruto, a sequel to Naruto.

The choice of content-based model depends on what the user is more interested in. Option 1 suits a user who wants more of a show they enjoyed, such as sequels, if there are any, or movies that fill the gaps between seasons.
If the user is looking for something new, but in the same vein as a comfort show such as Naruto, option 2 recommends different shows that share some of its themes and elements but have new characters and stories.