# Training a Content-Based Filtering Model

* Importing the necessary libraries:

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

* Importing the dataset:

In [2]:
anime = pd.read_csv('data/cleaned/anime.csv')
rating = pd.read_csv('data/cleaned/rating.csv')

* TF-IDF Vectorizer: TF-IDF stands for Term Frequency-Inverse Document Frequency. It assigns each word a weight that reflects how important the word is to a document relative to the whole corpus: a word that occurs often in one document but rarely elsewhere gets a high weight. We use it to turn each anime's genre list into a numeric vector, which we can then compare to find similar titles.

In [3]:
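# Turn each genre list into a single string so the vectorizer can tokenize it.
genres_str = anime['genre'].str.split(',').astype(str)

# Build TF-IDF features from runs of 1 to 4 consecutive genre tokens.
# min_df=1 (the default) keeps every term, as min_df=0 would; an integer 0 may be rejected by recent scikit-learn releases.
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 4), min_df=1)
tfidf_matrix = tfidf.fit_transform(genres_str)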

tfidf_matrix.shape

(9211, 5155)
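To make the weighting concrete, here is a minimal sketch on a toy corpus. The three genre strings and the names `toy_genres`, `toy_tfidf`, and `toy_matrix` are assumptions for illustration and do not come from `anime.csv`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: one genre string per title (not taken from anime.csv).
toy_genres = [
    "action adventure",
    "action comedy",
    "action mystery",
]

toy_tfidf = TfidfVectorizer(analyzer="word")
toy_matrix = toy_tfidf.fit_transform(toy_genres)

vocab = toy_tfidf.vocabulary_               # maps each term to its column index
first_row = toy_matrix[0].toarray().ravel() # TF-IDF vector of the first title

# "action" appears in every document, "adventure" in only one,
# so "adventure" receives the larger weight in the first document.
print("action:   ", first_row[vocab["action"]])
print("adventure:", first_row[vocab["adventure"]])
```

The genre shared by every title carries less weight than the genre unique to one title, which is the behavior the similarity computation relies on.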

We use n-grams of up to 4 consecutive genre tokens to measure the similarity between titles. The default tokenizer splits hyphenated genres, so `Sci-Fi` shows up as the separate tokens `sci` and `fi`. Here are the first 10 features:

In [5]:
tfidf.get_feature_names_out()[:10]

array(['action', 'action adventure', 'action adventure cars',
       'action adventure cars comedy', 'action adventure cars mecha',
       'action adventure cars sci', 'action adventure comedy',
       'action adventure comedy demons', 'action adventure comedy drama',
       'action adventure comedy ecchi'], dtype=object)

* Calculating the cosine similarity between each anime pair:

In [6]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
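`linear_kernel` computes plain dot products. It can stand in for cosine similarity here because `TfidfVectorizer` scales every row to unit length by default (`norm='l2'`). A minimal check on a toy corpus, where `toy_genres` and `toy_matrix` are assumed names not present in the notebook:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy corpus (not taken from anime.csv).
toy_genres = ["action adventure", "action comedy", "romance drama"]
toy_matrix = TfidfVectorizer().fit_transform(toy_genres)

dot = linear_kernel(toy_matrix, toy_matrix)      # raw dot products
cos = cosine_similarity(toy_matrix, toy_matrix)  # explicit cosine similarity

# With unit-length rows the two matrices should match.
print(np.allclose(dot, cos))
```

If the vectorizer were created with `norm=None`, the two results would generally differ and `cosine_similarity` would be the appropriate call.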

* Creating a function to get recommendations: The following function takes the name of an anime and returns its 10 most similar titles. If `highest_rating` is `True`, those 10 titles are sorted by rating in descending order. If `similarity` is `True`, the result also includes each title's similarity score with the given anime.

In [9]:
# Map each anime name to its row position in the similarity matrix.
indices = pd.Series(anime.index, index=anime['name'])

def get_recommendations(title, highest_rating=False, similarity=False):
    idx = indices[title]
    # Pair every anime with its similarity to the given title, most similar first.
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Drop the queried anime itself (it may not sort first when another title
    # also scores 1.0), then keep the 10 best matches.
    sim_scores = [s for s in sim_scores if s[0] != idx][:10]

    anime_indices = [i[0] for i in sim_scores]

    result_df = pd.DataFrame({'Anime name': anime['name'].iloc[anime_indices].values,
                              'Type': anime['type'].iloc[anime_indices].values,
                              'Rating': anime['rating'].iloc[anime_indices].values})
    if similarity:
        # Show the similarity score right after the name.
        result_df.insert(1, 'Similarity', [i[1] for i in sim_scores])

    if highest_rating:
        return result_df.sort_values('Rating', ascending=False)
    return result_df
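The ranking step inside `get_recommendations` can also be looked at in isolation. The sketch below assumes a hypothetical 4x4 similarity matrix `toy_sim` and a helper `top_k_similar`, neither of which is part of the notebook. It swaps `sorted` for `np.argsort` and removes the queried item by position.

```python
import numpy as np

# Hypothetical similarity matrix: row i holds the similarity of item i to every item.
toy_sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def top_k_similar(sim, idx, k=2):
    """Return the positions of the k items most similar to item idx, excluding idx."""
    order = np.argsort(-sim[idx], kind="stable")  # positions by descending similarity
    order = order[order != idx]                   # drop the queried item itself
    return order[:k].tolist()

print(top_k_similar(toy_sim, 0))
```

Filtering on the position keeps the query out of the result even when another item is tied with it at a similarity of 1.0.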

* Here are some examples of the recommendations:

In [15]:
get_recommendations('Death Note', highest_rating=True, similarity=True)

| | Anime name | Similarity | Type | Rating |
|---|---|---|---|---|
| 7 | Monster | 0.291949 | TV | 8.72 |
| 1 | Higurashi no Naku Koro ni Kai | 0.468286 | TV | 8.41 |
| 2 | Higurashi no Naku Koro ni | 0.395048 | TV | 8.17 |
| 8 | Jigoku Shoujo Mitsuganae | 0.278119 | TV | 7.81 |
| 0 | Mousou Dairinin | 0.879472 | TV | 7.74 |
| 4 | Shigofumi | 0.350125 | TV | 7.62 |
| 3 | Higurashi no Naku Koro ni Rei | 0.380282 | OVA | 7.56 |
| 5 | Himitsu: The Revelation | 0.323786 | TV | 7.42 |
| 6 | Hikari to Mizu no Daphne | 0.291976 | TV | 6.87 |
| 9 | Saint Luminous Jogakuin | 0.278119 | TV | 6.17 |


In [25]:
get_recommendations('Ao Haru Ride', highest_rating=True, similarity=True)

| | Anime name | Similarity | Type | Rating |
|---|---|---|---|---|
| 1 | Kimi ni Todoke | 0.8127 | TV | 8.19 |
| 2 | Kimi ni Todoke 2nd Season | 0.8127 | TV | 8.17 |
| 9 | Hana yori Dango | 0.608665 | TV | 7.9 |
| 4 | Tonari no Kaibutsu-kun | 0.726015 | TV | 7.77 |
| 3 | Ao Haru Ride OVA | 0.8127 | OVA | 7.76 |
| 0 | Kareshi Kanojo no Jijou | 1.0 | TV | 7.66 |
| 5 | Nijiiro Days | 0.726015 | TV | 7.52 |
| 6 | Nijiiro Days OVA | 0.726015 | OVA | 6.73 |
| 7 | Chou Kuse ni Narisou | 0.726015 | TV | 6.59 |
| 8 | Good Morning Call | 0.721166 | OVA | 6.26 |
