# Recommendation System

## **Data Description**

- `anime_id`: unique ID of each anime.
- `name`: anime title.
- `type`: anime broadcast type, such as TV, OVA, etc.
- `genre`: comma-separated list of genres for each anime.
- `episodes`: the number of episodes of each anime.
- `rating`: the average rating of each anime, taken over the users who rated it.
- `members`: the number of community members for each anime.


**Objective**
- The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset.
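
For reference, cosine similarity measures the angle between two feature vectors and ignores their magnitude. The symbols $a$ and $b$ below are notation introduced here for two feature vectors; they do not appear elsewhere in the notebook.

$$
\operatorname{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
$$

A value of 1 means the vectors point in the same direction, and 0 means they are orthogonal.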


# Import the Libraries and Load Dataset

In [18]:
import pandas as pd

# Load the dataset
anime_data = pd.read_csv('anime.csv')

# Fill missing ratings with the median rating
anime_data['rating'] = anime_data['rating'].fillna(anime_data['rating'].median())

## Data Preprocessing

In [19]:
anime_data.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [20]:
anime_data.shape

(12294, 7)

In [21]:
anime_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [22]:
anime_data.isnull().sum()

anime_id     0
name         0
genre       62
type        25
episodes     0
rating       0
members      0
dtype: int64

In [23]:
anime_data.describe()

| | anime_id | rating | members |
|---|---|---|---|
| count | 12294.0 | 12294.0 | 12294.0 |
| mean | 14058.221653 | 6.4757 | 18071.34 |
| std | 11455.294701 | 1.017179 | 54820.68 |
| min | 1.0 | 1.67 | 5.0 |
| 25% | 3484.25 | 5.9 | 225.0 |
| 50% | 10260.5 | 6.57 | 1550.0 |
| 75% | 24794.5 | 7.17 | 9437.0 |
| max | 34527.0 | 10.0 | 1013917.0 |


# Finding Cosine Similarity based on ratings

In [24]:
from sklearn.metrics.pairwise import cosine_similarity

# Build a feature matrix with one row per anime and a single column, the rating
# (only one attribute is used, so each anime is a one-dimensional feature vector)
ratings_matrix = anime_data[['rating']]

# Compute the cosine similarity matrix from the ratings matrix
similarity_matrix = cosine_similarity(ratings_matrix)

# Convert the similarity matrix to a DataFrame for better readability
similarity_df = pd.DataFrame(similarity_matrix, index=anime_data['name'], columns=anime_data['name'])

In [25]:
ratings_matrix

| | rating |
|---|---|
| 0 | 9.37 |
| 1 | 9.26 |
| 2 | 9.25 |
| 3 | 9.17 |
| 4 | 9.16 |
| ... | ... |
| 12289 | 4.15 |
| 12290 | 4.28 |
| 12291 | 4.88 |
| 12292 | 4.98 |


In [26]:
def get_similar_anime(anime_name, similarity_data, top_n=10):
  if anime_name not in similarity_data.index:
    return 'Anime not found in the dataset.'

  # Get similarity scores between the given anime and all other anime
  similarity_scores = similarity_data.loc[anime_name]

  # Sort the scores in descending order
  similarity_scores = similarity_scores.sort_values(ascending = False)

  # Return the top n most similar anime. One extra row is taken and the first
  # row is sliced off because it is assumed to be the anime itself (similarity 1);
  # when several titles tie at 1.0, that first row may be a different title.
  return similarity_scores.head(top_n + 1)[1:]

In [27]:
# Show recommendations based on rating similarity only

recommended_animes = get_similar_anime('Naruto', similarity_df)
recommended_animes

name
Taku Boda                                      1.0
Backkom Mission Impossible                     1.0
Backkom Specials                               1.0
Backstage Idol Story                           1.0
Bad Badtz-Maru no Ari to Kirigirisu            1.0
Bad Badtz-Maru no Ookami ga Kita!              1.0
Bad Badtz-Maru no Ore no Pochi wa Sekaiichi    1.0
Bad Badtz-Maru no Ore wa Yuutousei             1.0
Bad Badtz-Maru no Otoko Dokyou no Omoiyari     1.0
Baka Mukashibanashi Movie: Jijii Wars          1.0
Name: Naruto, dtype: float64

In [28]:
recommended_animes = get_similar_anime('Death Note', similarity_df)
recommended_animes

name
Taku Boda                                      1.0
Backkom Mission Impossible                     1.0
Backkom Specials                               1.0
Backstage Idol Story                           1.0
Bad Badtz-Maru no Ari to Kirigirisu            1.0
Bad Badtz-Maru no Ookami ga Kita!              1.0
Bad Badtz-Maru no Ore no Pochi wa Sekaiichi    1.0
Bad Badtz-Maru no Ore wa Yuutousei             1.0
Bad Badtz-Maru no Otoko Dokyou no Omoiyari     1.0
Baka Mukashibanashi Movie: Jijii Wars          1.0
Name: Death Note, dtype: float64

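- The rating-based recommendation system above does not give useful recommendations: every returned title scores 1.0, and 'Naruto' and 'Death Note' get the same list.
- The cause is that each anime is described by a single positive number, so all feature vectors point in the same direction and every pair has a cosine similarity of 1.
- Hence, let's compute a genre-based similarity.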
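
A quick way to check the behaviour of the rating-only approach is to compute cosine similarity by hand for a few one-feature vectors. The sketch below uses only the standard library; the `cosine` helper and the second feature value are illustrative assumptions, not part of the notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# A few ratings that appear in the tables above, each used as a 1-feature vector.
ratings = [9.37, 9.26, 6.57, 1.67]

one_feature_scores = [cosine([r], [s]) for r in ratings for s in ratings]
# Every pair scores 1 (up to floating-point rounding), however far apart the ratings are.
print(all(math.isclose(score, 1.0) for score in one_feature_scores))

# With a second, made-up feature the vectors can point in different directions,
# so the score for the highest and lowest rating drops below 1.
two_feature_score = cosine([9.37, 1.0], [1.67, 1.0])
print(two_feature_score < 1.0)
```

Cosine similarity only becomes informative once each item has more than one feature, which is what the genre vectors in the next section provide.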

# Finding Cosine Similarity based on Genre

In [29]:
# Use genre as well

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Assuming the 'genre' column is otherwise clean and ready to use
tfidf = TfidfVectorizer(stop_words='english')
anime_data['genre'] = anime_data['genre'].fillna('')  # Fill missing values with an empty string
tfidf_matrix = tfidf.fit_transform(anime_data['genre'])

# Compute the cosine similarity matrix from the TF-IDF vectors
# (TfidfVectorizer L2-normalizes each row by default, so the linear kernel equals cosine similarity)
genre_similarity_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)

# Convert to DataFrame for better handling
genre_similarity_df = pd.DataFrame(genre_similarity_matrix, index=anime_data['name'], columns=anime_data['name'])

# Average the genre-based similarity with the original rating-based similarity
combined_similarity = (similarity_df + genre_similarity_df)/2

# The recommendation function can now use this combined similarity
recommended_animes = get_similar_anime('Naruto', combined_similarity)
recommended_animes


name
Naruto                                                                       1.000000
Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi                      1.000000
Naruto: Shippuuden Movie 4 - The Lost Tower                                  1.000000
Boruto: Naruto the Movie                                                     1.000000
Naruto: Shippuuden                                                           1.000000
Naruto Soyokazeden Movie: Naruto to Mashin to Mitsu no Onegai Dattebayo!!    1.000000
Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono                        1.000000
Naruto Shippuuden: Sunny Side Battle                                         1.000000
Kyutai Panic Adventure!                                                      0.990346
Naruto: Shippuuden Movie 6 - Road to Ninja                                   0.973593
Name: Naruto, dtype: float64
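
One detail of the genre vectorization worth checking: by default `TfidfVectorizer` lowercases the text and tokenizes it with the pattern `(?u)\b\w\w+\b`, so hyphenated or multi-word genres are split into separate tokens. The sketch below applies that pattern with `re` to an illustrative genre string and compares it with a plain comma split; both helper names and the sample string are assumptions made for this example.

```python
import re

# Default token_pattern documented for scikit-learn's TfidfVectorizer.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def default_style_tokens(genre_text):
    """Mimic the default word-level tokenization: lowercase, then regex."""
    return re.findall(DEFAULT_TOKEN_PATTERN, genre_text.lower())

def comma_split_tokens(genre_text):
    """Treat each comma-separated genre as a single token."""
    return [g.strip().lower() for g in genre_text.split(",") if g.strip()]

sample = "Sci-Fi, Slice of Life, Thriller"  # illustrative genre string

print(default_style_tokens(sample))
# ['sci', 'fi', 'slice', 'of', 'life', 'thriller']
print(comma_split_tokens(sample))
# ['sci-fi', 'slice of life', 'thriller']
```

With `stop_words='english'`, a token such as `of` is then dropped as well. If whole-genre tokens are preferred, `TfidfVectorizer` accepts a `tokenizer` callable, so a function like `comma_split_tokens` could be passed in; whether that changes the recommendations here has not been tested.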

In [30]:
recommended_animes = get_similar_anime('Death Note', combined_similarity)
recommended_animes


name
Death Note                                  1.000000
Mousou Dairinin                             0.983852
Higurashi no Naku Koro ni Kai               0.939757
Higurashi no Naku Koro ni Rei               0.930528
Mirai Nikki (TV)                            0.907714
Mirai Nikki (TV): Ura Mirai Nikki           0.900327
Higurashi no Naku Koro ni                   0.897207
Monster                                     0.895435
AD Police                                   0.877512
Higurashi no Naku Koro ni Kaku: Outbreak    0.864441
Name: Death Note, dtype: float64

In [31]:
# Experimenting with different threshold values for similarity scores to adjust the recommendation list size.

def get_similar_anime(anime_name, similarity_data, threshold=0.75, top_n=10):
  if anime_name not in similarity_data.index:
    return 'Anime not found in the dataset.'

  # Get similarity scores between the given anime and all other anime
  similarity_scores = similarity_data.loc[anime_name]

  # Keep only scores strictly above the threshold
  filtered_scores = similarity_scores[similarity_scores > threshold]

  # Sort the scores in descending order
  filtered_scores = filtered_scores.sort_values(ascending = False)

  # Return the top n most similar anime. One extra row is taken and the first
  # row is sliced off because it is assumed to be the anime itself (similarity 1);
  # when several titles tie at 1.0, that first row may be a different title.
  return filtered_scores.head(top_n + 1)[1:]

recommended_animes = get_similar_anime('Naruto', combined_similarity, threshold=0.85)
print("Recommendations with threshold 0.85:")
print(recommended_animes)

recommended_animes = get_similar_anime('Naruto', combined_similarity, threshold=0.95)
print("\nRecommendations with threshold 0.95:")
print(recommended_animes)

recommended_animes = get_similar_anime('Death Note', combined_similarity, threshold=0.75)
print("\nRecommendations for 'Death Note' with threshold 0.75:")
print(recommended_animes)


Recommendations with threshold 0.85:
name
Naruto: Shippuuden                                         1.000000
Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono      1.000000
Naruto: Shippuuden Movie 4 - The Lost Tower                1.000000
Naruto x UT                                                1.000000
Naruto Shippuuden: Sunny Side Battle                       1.000000
Naruto                                                     1.000000
Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi    1.000000
Boruto: Naruto the Movie                                   1.000000
Kyutai Panic Adventure!                                    0.990346
Rekka no Honoo                                             0.973593
Name: Naruto, dtype: float64

Recommendations with threshold 0.95:
name
Boruto: Naruto the Movie - Naruto ga Hokage ni Natta Hi                      1.000000
Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono                        1.000000
Naruto Soyokazeden Movie: Naruto t

# Threshold

- `threshold` in the code above is the minimum similarity score that an anime must exceed, relative to the target anime, to be included in the recommended list.
- For example, if `threshold` is set to `0.85`, only anime with a similarity score greater than 0.85 are included in the recommendations for the target anime (the code uses a strict `>` comparison).
- Because the rating-based similarity is 1.0 for every pair, the combined score equals (1 + genre similarity) / 2 and never drops below 0.5, so a threshold of 0.85 corresponds to a genre similarity above 0.7.
- The threshold can be adjusted to control both the size of the recommended list and how similar the recommended anime are to the target anime.
- A higher threshold yields a shorter list of more similar anime, while a lower threshold yields a longer list (up to `top_n` entries) that includes less similar anime.
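
The thresholding described above can be sketched on a toy similarity matrix. This variant, given the hypothetical name `recommend`, removes the queried title by label rather than by position, so ties at 1.0 cannot leave the query in the result. It assumes titles are unique in the index, and the matrix values are made up.

```python
import pandas as pd

def recommend(anime_name, similarity_data, threshold=0.75, top_n=10):
    """Return up to top_n titles scoring above threshold, excluding the query."""
    if anime_name not in similarity_data.index:
        return pd.Series(dtype=float)

    scores = similarity_data.loc[anime_name]
    # Remove the queried title by label instead of relying on sort order.
    scores = scores.drop(labels=anime_name, errors="ignore")
    # Keep only scores strictly above the threshold, best first.
    scores = scores[scores > threshold]
    return scores.sort_values(ascending=False).head(top_n)

# Toy similarity matrix (made-up numbers) with a tie at 1.0 between A and B.
titles = ["A", "B", "C", "D"]
toy = pd.DataFrame(
    [[1.0, 1.0, 0.9, 0.6],
     [1.0, 1.0, 0.8, 0.7],
     [0.9, 0.8, 1.0, 0.5],
     [0.6, 0.7, 0.5, 1.0]],
    index=titles, columns=titles,
)

print(recommend("A", toy, threshold=0.85))  # B and C pass the threshold
print(recommend("A", toy, threshold=0.95))  # only B passes
```

Raising the threshold from 0.85 to 0.95 shrinks the list from two titles to one, which is the trade-off between list size and similarity noted above.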




# Interview Questions:
## 1. Can you explain the difference between user-based and item-based collaborative filtering?
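- `User-based collaborative filtering:`
  - Finds the K users most similar to the target user, based on the items they have both bought or rated, and recommends items those users liked.
- `Item-based collaborative filtering:`
  - Finds the K items most similar to a given item, based on the users who have bought or rated both, and recommends those items.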
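
The difference can be illustrated on a tiny, made-up ratings table: user-based filtering compares rows (one vector per user), while item-based filtering compares columns (one vector per item). This is a minimal sketch that treats an unrated item as 0, which is a simplification; all names and numbers are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up ratings: one row per user, one column per item; 0 means "not rated".
items = ["item1", "item2", "item3"]
ratings = {
    "user_a": [5, 4, 0],
    "user_b": [4, 5, 0],
    "user_c": [0, 1, 5],
}

# User-based: compare rows.
user_sim = {
    (u, v): cosine(ratings[u], ratings[v])
    for u in ratings for v in ratings if u < v
}

# Item-based: compare columns.
columns = {name: [row[i] for row in ratings.values()] for i, name in enumerate(items)}
item_sim = {
    (p, q): cosine(columns[p], columns[q])
    for p in items for q in items if p < q
}

print(max(user_sim, key=user_sim.get))  # ('user_a', 'user_b')
print(max(item_sim, key=item_sim.get))  # ('item1', 'item2')
```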

## 2. What is collaborative filtering, and how does it work?
Collaborative filtering:
- Collaborative filtering is based on the idea of similarity between users (or items), inferred from their past behaviour.
- For example, suppose two users, `A` and `B`, have purchased the same products and rated them similarly on a common rating scale.
- Then A and B can be considered similar in their buying behaviour.
- Hence, if A buys a new product and rates it highly, that product can be recommended to B.
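
The A and B example above can be sketched in a few lines. This is a simplified illustration: it uses only the single most similar user, and the user names, products, and ratings are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up ratings keyed by product; a missing key means "not rated yet".
ratings = {
    "A": {"p1": 5, "p2": 4, "p3": 5},
    "B": {"p1": 5, "p2": 4},
    "C": {"p1": 1, "p2": 5, "p4": 5},
}

def similarity(u, v):
    """Cosine similarity over the products both users have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if not common:
        return 0.0
    return cosine([ratings[u][p] for p in common], [ratings[v][p] for p in common])

def recommend_for(user, min_rating=4):
    """Suggest unseen products that the most similar other user rated highly."""
    neighbour = max((v for v in ratings if v != user), key=lambda v: similarity(user, v))
    return [p for p, r in ratings[neighbour].items()
            if p not in ratings[user] and r >= min_rating]

print(recommend_for("B"))  # ['p3']
```

B rated `p1` and `p2` exactly as A did, so A is B's nearest neighbour, and the product A rated highly that B has not seen (`p3`) is recommended.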