<a href="https://colab.research.google.com/github/Sre-n/NeuroNexus/blob/main/recommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**ANIME RECOMMENDATION SYSTEM**



An anime recommendation system helps enthusiasts find titles worth watching in a very large catalog. Recommenders of this kind are usually built on one of two approaches:

1. Content Filtering
2. Collaborative Filtering

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
directory_path = '/content/drive/MyDrive/dataset/anime recommendation'

**About Dataset**

This dataset contains preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give each one a rating; the dataset is a compilation of those ratings.

**Anime.csv**

    anime_id - myanimelist.net's unique id identifying an anime.
    name - full name of anime.
    genre - comma separated list of genres for this anime.
    type - movie, TV, OVA, etc.
    episodes - how many episodes in this show. (1 if movie).
    rating - average rating out of 10 for this anime.
    members - number of community members that are in this anime's "group".


**Rating.csv**

    user_id - non identifiable randomly generated user id.
    anime_id - the anime that this user has rated.
    rating - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).
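
The -1 sentinel described above is not a real score, so it should be kept out of any average. The sketch below shows one way to handle it, assuming the three-column layout listed here; the sample rows and the `load_ratings` helper are hypothetical and are not taken from the real file.

```python
import csv
import io

# Hypothetical sample rows in the Rating.csv layout (not from the real file).
sample = io.StringIO(
    "user_id,anime_id,rating\n"
    "1,20,-1\n"
    "1,24,8\n"
    "2,20,10\n"
)

def load_ratings(handle):
    """Return (user_id, anime_id, rating) tuples, mapping -1 to None."""
    rows = []
    for row in csv.DictReader(handle):
        rating = int(row["rating"])
        rows.append((
            int(row["user_id"]),
            int(row["anime_id"]),
            None if rating == -1 else rating,  # watched but not rated
        ))
    return rows

ratings = load_ratings(sample)

# Average only the explicit ratings; the -1 rows would otherwise drag the mean down.
explicit = [r for (_, _, r) in ratings if r is not None]
mean_explicit = sum(explicit) / len(explicit)
print(mean_explicit)
```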


Load the anime dataset into a pandas DataFrame.

In [5]:
import pandas as pd
metadata = pd.read_csv('/content/drive/MyDrive/dataset/anime recommendation/anime.csv', low_memory=False)
metadata.head(3)

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262


In [6]:
# Calculate C, the mean rating across all anime, using the pandas .mean() function
C = metadata['rating'].mean()
print(C)

6.473901690981432


In [7]:
# Convert the 'episodes' column to numeric values
metadata['episodes'] = pd.to_numeric(metadata['episodes'], errors='coerce')

# Drop rows with NaN values in the 'episodes' column
metadata = metadata.dropna(subset=['episodes'])

# Calculate m, the minimum count required to be in the chart.
# Note: this uses the 90th percentile of 'episodes' as a stand-in for a vote count.
m = metadata['episodes'].quantile(0.90)
print(m)


26.0


In [8]:
# Keep only the anime that qualify (episodes >= m) in a new DataFrame
q_movies = metadata.loc[metadata['episodes'] >= m].copy()
q_movies.shape


(1659, 7)

In [9]:
metadata.shape


(11954, 7)

In [10]:
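# Function that computes the weighted rating of each anime
def weighted_rating(x, m=m, C=C):
    v = x['episodes']  # episode count, used here in place of a vote count
    R = x['rating']    # the anime's own average rating
    # Weighted average of R and the global mean C, based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)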


In [11]:
# Define a new feature 'score' and calculate its value with `weighted_rating()`
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)


In [12]:
# Sort anime by the score calculated above
q_movies = q_movies.sort_values('score', ascending=False)

# Show the top 20 anime
q_movies[['name', 'episodes', 'rating', 'score']].head(20)


Unnamed: 0,name,episodes,rating,score
12,Gintama,201.0,9.04,8.746086
6,Hunter x Hunter (2011),148.0,9.13,8.733112
7,Ginga Eiyuu Densetsu,110.0,9.11,8.60604
1,Fullmetal Alchemist: Brotherhood,64.0,9.26,8.455127
2,Gintama°,51.0,9.25,8.312616
4,Gintama&#039;,51.0,9.16,8.253006
20,Hajime no Ippo,75.0,8.83,8.22348
206,Dragon Ball Z,291.0,8.32,8.168585
175,Katekyo Hitman Reborn!,203.0,8.37,8.154722
69,Uchuu Kyoudai,99.0,8.59,8.149852
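
The score column above can be checked by hand. The sketch below re-implements the weighted rating as a plain function of four numbers (a different signature from the row-based `weighted_rating` in the notebook) and plugs in the values printed earlier: C, m, and the Gintama row. It also illustrates how the formula behaves at the threshold.

```python
def weighted_rating(v, R, m, C):
    """Blend an item's own rating R with the global mean C, weighted by v versus m."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Values taken from the notebook outputs above.
C = 6.473901690981432   # mean rating across all anime
m = 26.0                # 90th percentile of 'episodes'

# Gintama: 201 episodes, rating 9.04.
gintama = weighted_rating(v=201.0, R=9.04, m=m, C=C)
print(round(gintama, 6))

# When v equals m, both weights are 0.5, so the score is halfway between R and C.
at_threshold = weighted_rating(v=26.0, R=9.0, m=m, C=C)
print(at_threshold)
```

Because v is the episode count here, long-running series are pulled less strongly toward the global mean than short ones with the same rating.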


**Content-Based Recommender**

This recommender suggests anime that are similar to a given anime. To do so, compute pairwise cosine similarity scores between all anime based on their genres, then recommend the titles with the highest similarity to the query.


In [13]:
# Print the genre strings of the first 5 anime
metadata['genre'].head()


0                 Drama, Romance, School, Supernatural
1    Action, Adventure, Drama, Fantasy, Magic, Mili...
2    Action, Comedy, Historical, Parody, Samurai, S...
3                                     Sci-Fi, Thriller
4    Action, Comedy, Historical, Parody, Samurai, S...
Name: genre, dtype: object

**Natural Language Processing problem**

The genre strings are text, so features must be extracted from them before the similarity or dissimilarity between anime can be computed. To do this, compute a vector for each document (here, each anime's genre string). These vectors are numeric representations of the words in a document, so documents that share terms end up with similar vectors.





Compute **Term Frequency-Inverse Document Frequency** (TF-IDF) vectors for each document. This gives a matrix where each column represents a word in the vocabulary (all the words that appear in at least one document) and each row represents an anime. A TF-IDF score is the frequency of a word in a document, down-weighted by the number of documents in which that word occurs. This reduces the weight of words that appear in many documents (here, very common genres) and therefore their influence on the final similarity score.

scikit-learn provides a built-in TfidfVectorizer class for this.
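
To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF on a few genre strings. The sample strings and the comma-based `tokenize` helper are hypothetical simplifications. The IDF formula and L2 normalization follow scikit-learn's documented defaults, but this is an illustration, not the library's implementation.

```python
import math
from collections import Counter

# Hypothetical genre strings standing in for rows of the 'genre' column.
docs = [
    "Action, Comedy",
    "Action, Drama",
    "Drama, Romance",
]

def tokenize(text):
    # Simplified tokenizer for this sketch: split on commas and lowercase.
    return [t.strip().lower() for t in text.split(",") if t.strip()]

tokenized = [tokenize(d) for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
n_docs = len(docs)

# Document frequency: the number of documents each term appears in.
df = Counter(t for doc in tokenized for t in set(doc))

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

def tfidf_row(doc):
    """Term counts times IDF, scaled to unit L2 length."""
    counts = Counter(doc)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm if norm else 0.0 for x in raw]

# Rows are documents (anime), columns are vocabulary terms.
matrix = [tfidf_row(doc) for doc in tokenized]
print(vocab)
print(matrix[0])
```

In the first row, "comedy" receives a larger weight than "action" because it occurs in fewer documents.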

In [14]:
# Import TfidfVectorizer from scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer

# Define a TF-IDF vectorizer object. Remove English stop words such as 'the' and 'a'
tfidf = TfidfVectorizer(stop_words='english')

#Replace NaN with an empty string
metadata['genre'] = metadata['genre'].fillna('')

#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(metadata['genre'])

#Output the shape of tfidf_matrix
tfidf_matrix.shape


(11954, 46)

In [15]:
#Array mapping from feature integer indices to feature name.
# Assuming tfidf is your TfidfVectorizer object
feature_names = tfidf.get_feature_names_out()
# Print the first 10 feature names as an example
print(feature_names[:10])


['action' 'adventure' 'ai' 'arts' 'cars' 'comedy' 'dementia' 'demons'
 'drama' 'ecchi']
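
Features such as 'arts' suggest that multi-word genres (for example "Martial Arts") are being split into separate words. The sketch below approximates the default word-level tokenization with the regular expression scikit-learn documents as its default `token_pattern`, and contrasts it with a hypothetical comma-based alternative, `genre_tokens`. The sample string is made up for illustration.

```python
import re

# scikit-learn's documented default token_pattern: runs of two or more word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def default_tokens(text):
    """Approximate the default tokenization (before stop-word removal)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())

def genre_tokens(text):
    """Hypothetical alternative: treat each comma-separated genre as one token."""
    return [g.strip().lower() for g in text.split(",") if g.strip()]

sample = "Martial Arts, Sci-Fi, Slice of Life"
print(default_tokens(sample))  # ['martial', 'arts', 'sci', 'fi', 'slice', 'of', 'life']
print(genre_tokens(sample))    # ['martial arts', 'sci-fi', 'slice of life']
```

A function like `genre_tokens` could be passed to TfidfVectorizer through its `tokenizer` parameter, so that each genre becomes a single feature. That would change the vocabulary, and with it the matrix shape and similarity scores shown in this notebook.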


Because **TfidfVectorizer** L2-normalizes each row by default, the dot product between two TF-IDF vectors directly gives their cosine similarity score. Therefore, use sklearn's linear_kernel() instead of cosine_similarity(), since it skips the redundant normalization and is faster.
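
The equivalence between the dot product and cosine similarity can be checked on small vectors. This is a minimal sketch in plain Python; the two weight vectors are hypothetical and stand in for unnormalized TF-IDF rows.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical unnormalized weights for two anime over the same vocabulary.
a = [1.3, 1.7, 0.0, 0.0]
b = [1.3, 0.0, 1.3, 0.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# Once both vectors have unit length, the plain dot product is the cosine similarity.
print(dot(a_unit, b_unit))
print(cosine(a, b))
```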

In [16]:
# Import linear_kernel
from sklearn.metrics.pairwise import linear_kernel

# Compute the cosine similarity matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)


In [17]:
cosine_sim.shape


(11954, 11954)

In [18]:
cosine_sim[1]


array([0.14701722, 1.        , 0.17842736, ..., 0.        , 0.        ,
       0.        ])

In [19]:
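# Construct a reverse map from anime name to row position in metadata (and in cosine_sim).
# Positions are used because metadata.index has gaps after the earlier dropna().
indices = pd.Series(range(len(metadata)), index=metadata['name'])
indices = indices[~indices.index.duplicated(keep='first')]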


In [20]:
indices[:10]


name
Kimi no Na wa.                                               0
Fullmetal Alchemist: Brotherhood                             1
Gintama°                                                     2
Steins;Gate                                                  3
Gintama&#039;                                                4
Haikyuu!!: Karasuno Koukou VS Shiratorizawa Gakuen Koukou    5
Hunter x Hunter (2011)                                       6
Ginga Eiyuu Densetsu                                         7
Gintama Movie: Kanketsu-hen - Yorozuya yo Eien Nare          8
Gintama&#039;: Enchousen                                     9
dtype: int64

In [21]:
# Function that takes an anime name as input and outputs the most similar anime
def get_recommendations(name, cosine_sim=cosine_sim):
    # Get the index of the anime that matches the name
    idx = indices[name]

    # Get the pairwise similarity scores of all anime with that anime
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the anime based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the 10 most similar anime, skipping the first entry
    # (assumed to be the query itself, which may not hold when scores tie)
    sim_scores = sim_scores[1:11]

    # Get the anime indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar anime
    return metadata['name'].iloc[movie_indices]


In [22]:
get_recommendations('Gintama')


4                                          Gintama&#039;
8      Gintama Movie: Kanketsu-hen - Yorozuya yo Eien...
9                               Gintama&#039;: Enchousen
12                                               Gintama
63           Gintama: Yorinuki Gintama-san on Theater 2D
65                Gintama Movie: Shinyaku Benizakura-hen
216                     Gintama: Shinyaku Benizakura-hen
306                     Gintama: Jump Festa 2014 Special
380    Gintama: Nanigoto mo Saiyo ga Kanjin nano de T...
361                     Gintama: Jump Festa 2015 Special
Name: name, dtype: object
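
The list above contains Gintama itself (row 12) and omits Gintama° (row 2). A likely explanation, assuming several Gintama entries share an identical genre string, is that they all tie at the top similarity score. Python's sort is stable, so the earliest tied row comes first, and the `[1:11]` slice drops that row instead of the query. The sketch below shows one way to exclude the query by position; `top_similar` and the sample scores are hypothetical.

```python
def top_similar(scores, idx, k=10):
    """Return positions of the k highest scores, skipping the query position idx."""
    ranked = sorted(
        (pair for pair in enumerate(scores) if pair[0] != idx),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [pos for pos, _ in ranked[:k]]

# Hypothetical similarity row: positions 0 and 2 tie at 1.0, and position 2 is the query.
row = [1.0, 0.2, 1.0, 0.5]
print(top_similar(row, idx=2, k=2))  # [0, 3]
```

Filtering by position keeps every tied title in the results and removes only the anime that was asked about.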

In [23]:
get_recommendations('One Punch Man')


754                        One Punch Man Specials
770                   One Punch Man: Road to Hero
447               Tentai Senshi Sunred 2nd Season
580                          Tentai Senshi Sunred
4471           Tentai Senshi Sunred: Short Corner
557                                      Gungrave
7363     Sentou Yousei Shoujo Tasukete! Mave-chan
6995                   Ippatsu Hicchuu!! Devander
4785                                    Tokyo ESP
10657                           Ultraman Graffiti
Name: name, dtype: object

**Conclusion**

In anime, a recommender system is more than a technical feature: it acts as a curator that shapes what enthusiasts watch next. This notebook combined data science with fandom, first ranking titles with a weighted rating and then using genre metadata, TF-IDF vectors, and cosine similarity to recommend anime similar to a given title.