Uses data from https://www.kaggle.com/CooperUnion/anime-recommendations-database/version/1

# Anime Recommender

This model recommends anime based on the cosine similarity of anime names, genres, and synopses.

# Libraries

In [1]:
import os
import numpy as np
import pandas as pd

import warnings

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# Importing data

In [2]:
anime_synop_df = pd.read_csv('../input/anime-recommendation-database-2020/anime_with_synopsis.csv')
anime_complete_df = pd.read_csv('../input/anime-recommendation-database-2020/anime.csv')

## Getting to know the data

The synopsis dataset has columns for `MAL_ID`, `Name`, `Genres`, `Score`, and `sypnopsis` (the column name is misspelled in the source file). We are interested in `Name`, `Genres`, and `sypnopsis`.

In [3]:
anime_synop_df.head(2)

In [4]:
anime_synop_df.info()

In [5]:
anime_synop_df = anime_synop_df[['MAL_ID', 'Name', 'Genres', 'sypnopsis']]

In [6]:
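# join(on=...) matches MAL_ID against the other frame's row index, so merge on the shared column instead
anime_df = anime_synop_df.merge(anime_complete_df[['MAL_ID', 'Type']], on='MAL_ID', how='left')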

In [7]:
anime_df = anime_df[['Name', 'Genres', 'sypnopsis', 'Type']]

In [8]:
anime_df.columns = ['Name', 'Genres', 'Synopsis', 'Type']

In [9]:
anime_df

# Dealing with missing values

## Anime missing values

In [10]:
anime_df.isnull().sum().sort_values(ascending=False)

In [11]:
print(anime_df['Synopsis'].mode()[0])
print(anime_df['Type'].mode()[0])

The most common synopsis is the site's placeholder for a missing one: 'No synopsis information has been added to this title. Help improve our database by adding a synopsis here .'. We replace null values in `Synopsis` with this placeholder and fill missing `Type` values with the most common type, TV.

In [12]:
anime_df['Synopsis'] = anime_df['Synopsis'].fillna(
    anime_df['Synopsis'].dropna().mode().values[0]
)
anime_df['Type'] = anime_df['Type'].fillna(
    anime_df['Type'].dropna().mode().values[0]
)

Verify that no null values remain.

In [13]:
anime_df.isnull().sum()

We only care about TV series, so we keep the rows whose `Type` is `TV` and drop the rest.


In [15]:
anime_df = anime_df[anime_df['Type']=='TV']

In [18]:
# Reassign instead of dropping in place, since anime_df is a filtered slice
anime_df = anime_df.drop(columns='Type')
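
Boolean filtering keeps the original index labels, so after the `Type` filter the labels and the row positions can disagree. The sketch below illustrates this on a made-up toy frame (`toy_df` and its titles are placeholders, not rows from the dataset) and shows `reset_index(drop=True)` as one way to realign them.

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type': ['TV', 'Movie', 'TV', 'OVA'],
})

# Filtering keeps the labels of the surviving rows
tv_only = toy_df[toy_df['Type'] == 'TV']
print(list(tv_only.index))   # [0, 2]

# Renumber rows 0..n-1 so labels match row positions again
tv_reset = tv_only.reset_index(drop=True)
print(list(tv_reset.index))  # [0, 1]
```

This matters whenever an index value is later used as a row position in a NumPy array, such as a similarity matrix.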

# Building the recommender

We will be using the `Name`, `Genres`, and `Synopsis` of each anime. We apply TF-IDF to the names and to the synopses, and count term frequencies on the genres. Then we compute the cosine similarity of each of the three resulting matrices. Finally, we average the three similarity scores per anime to rank recommendations.

In [20]:
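# Map each name to its row position (not its index label), since the similarity matrices are positional
indices = pd.Series(np.arange(len(anime_df)), index=anime_df['Name'])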

In [21]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf2 = TfidfVectorizer(stop_words='english')
count = CountVectorizer(stop_words='english')

In [22]:
tfidf_matrix = tfidf.fit_transform(anime_df['Name'])
count_matrix = count.fit_transform(anime_df['Genres'])
tfidf2_matrix = tfidf2.fit_transform(anime_df['Synopsis'])

In [23]:
name_similarity = cosine_similarity(tfidf_matrix)
genre_similarity = cosine_similarity(count_matrix)
synopsis_similarity = cosine_similarity(tfidf2_matrix)
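
As a sanity check of the averaging idea, here is a minimal sketch on made-up toy data (the titles and genres are placeholders, not rows from the dataset). It uses only two features, names and genres, to stay short; the notebook itself averages three.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy titles and genres, not taken from the dataset
toy_names = ['Space Pirates', 'Space Pirates Returns', 'Quiet Garden']
toy_genres = ['Action, Sci-Fi', 'Action, Sci-Fi', 'Slice of Life']

# One square similarity matrix per feature
name_sim = cosine_similarity(TfidfVectorizer().fit_transform(toy_names))
genre_sim = cosine_similarity(CountVectorizer().fit_transform(toy_genres))

# Element-wise mean of the two similarity matrices
combined_sim = (name_sim + genre_sim) / 2

# Rank the other titles for row 0, most similar first
order = np.argsort(-combined_sim[0], kind='stable')
ranked = [toy_names[j] for j in order if j != 0]
print(ranked)  # ['Space Pirates Returns', 'Quiet Garden']
```

Note that the default tokenizer splits `Sci-Fi` into `sci` and `fi`, so hyphenated genres count as two terms.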

# Recommendations

In [26]:
def get_recommendations(anime, n=10):
    # Row of the query title in the similarity matrices
    idx = indices[anime]

    # Average the three similarity scores for every anime
    combined_score = (name_similarity[idx] + genre_similarity[idx] + synopsis_similarity[idx]) / 3

    # Row positions sorted by descending combined score
    ranked = np.argsort(-combined_score, kind='stable')

    anime_recs = []
    for pos in ranked:
        name = indices.index[pos]
        # Skip the query itself and any title that contains its name (sequels, spin-offs)
        if pos == idx or anime in name:
            continue
        anime_recs.append(name)
        if len(anime_recs) == n:
            break

    print(f'If you liked {anime}, you should try:')
    for rank, name in enumerate(anime_recs, start=1):
        print(f'{rank}. {name}')

In [27]:
get_recommendations('Dragon Ball')