### Data Description:
- `anime_id`: Unique ID of each anime.
- `name`: Anime title.
- `type`: Broadcast type, such as TV, OVA, etc.
- `genre`: Comma-separated list of genres for each anime.
- `episodes`: Number of episodes of each anime.
- `rating`: Average rating given to each anime by the users who rated it.
- `members`: Number of community members for each anime.

### Objective:
- The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset.

### Dataset:
- Use the Anime Dataset, which contains information about various anime, including their titles, genres, number of episodes, and user ratings.

### Tasks:

### Data Preprocessing:
- Load the dataset into a suitable data structure (e.g., pandas DataFrame).
- Handle missing values, if any.
- Explore the dataset to understand its structure and attributes.
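
One detail that a plain `isnull()` count does not reveal: the `episodes` column stores the string `Unknown` for some titles (for example `Naruto: Shippuuden` in the recommendation output later in this notebook), so it reports zero missing values. The following is a minimal sketch of one way to surface and fill such entries. The three-row frame is made up for illustration, and the median fill is an assumed choice rather than something the assignment prescribes.

```python
import pandas as pd

# Toy frame standing in for anime.csv; the rows are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "episodes": ["12", "Unknown", "24"],
})

# Coerce non-numeric strings such as "Unknown" to NaN so they count as missing.
toy["episodes"] = pd.to_numeric(toy["episodes"], errors="coerce")
n_missing = int(toy["episodes"].isnull().sum())

# Fill with the median (an assumed choice; dropping the rows is another option).
toy["episodes"] = toy["episodes"].fillna(toy["episodes"].median())
print(n_missing, toy["episodes"].tolist())
```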

### Feature Extraction:
- Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
- Convert categorical features into numerical representations if necessary.
- Normalize numerical features if required.
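
The three steps above can be sketched on a toy frame. This version multi-hot encodes the comma-separated genre strings with `Series.str.get_dummies`, min-max scales the rating by hand, and writes cosine similarity out explicitly as the dot product divided by the product of the vector norms. The data and the helper name `cosine` are illustrative assumptions. Multi-hot encoding is shown as one possible representation, not necessarily the one a given solution should use.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; genre strings follow the "A, B, C" format of anime.csv.
toy = pd.DataFrame({
    "name": ["X", "Y", "Z"],
    "genre": ["Action, Comedy", "Action, Drama", "Romance"],
    "rating": [8.0, 6.0, 7.0],
})

# Categorical -> numerical: one 0/1 column per genre label.
genre_flags = toy["genre"].str.get_dummies(sep=", ")

# Min-max normalization to [0, 1], written out by hand.
r = toy["rating"]
rating_scaled = (r - r.min()) / (r.max() - r.min())

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

g = genre_flags.to_numpy(dtype=float)
print(list(genre_flags.columns))
print(cosine(g[0], g[1]), cosine(g[0], g[2]))
```

Titles that share one of their two genres score about 0.5 here, and titles with no genre in common score 0.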

### Recommendation System:
- Design a function to recommend anime based on cosine similarity.
- Given a target anime, recommend a list of similar anime based on cosine similarity scores.
- Experiment with different threshold values for similarity scores to adjust the recommendation list size.
- Analyze the performance of the recommendation system and identify areas of improvement.
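
As a sketch of the threshold experiment mentioned above, the function below returns every item whose similarity to the target meets a cutoff, so the list length varies with the threshold instead of being fixed in advance. The function name `recommend_by_threshold`, the four titles, and the similarity matrix are all hypothetical and chosen only to make the behaviour easy to check by hand.

```python
import numpy as np

def recommend_by_threshold(sim_matrix, names, target, threshold=0.5):
    """Return (name, score) pairs whose similarity to `target` is >= threshold."""
    idx = names.index(target)
    scores = sim_matrix[idx]
    # Exclude the target itself, then keep everything at or above the threshold.
    hits = [(names[i], float(s)) for i, s in enumerate(scores)
            if i != idx and s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hand-made symmetric similarity matrix for four hypothetical titles.
names = ["A", "B", "C", "D"]
sim = np.array([
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.4, 0.2],
    [0.6, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])

for t in (0.05, 0.5, 0.8):
    print(t, recommend_by_threshold(sim, names, "A", threshold=t))
```

Raising the threshold shrinks the list: for target `A` it holds three items at 0.05, two at 0.5, and one at 0.8.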

### Interview Questions:
1. Can you explain the difference between user-based and item-based collaborative filtering?
2. What is collaborative filtering, and how does it work?
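
To make the first question concrete, here is a small sketch on a made-up user-by-item rating matrix. User-based filtering compares rows (users with similar rating patterns), while item-based filtering compares columns (items rated similarly by the same users). The matrix values and the helper `cosine_matrix` are assumptions for illustration, and treating 0 as "not rated" inside the cosine is a simplification. By contrast, the notebook cells in this document compute similarity from item attributes (genre text, rating, members), which is content-based rather than collaborative.

```python
import numpy as np

def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T

# Rows are users, columns are items; 0 means "not rated". Values are made up.
ratings = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
])

user_user = cosine_matrix(ratings)      # user-based: compare rows
item_item = cosine_matrix(ratings.T)    # item-based: compare columns

print(np.round(user_user, 2))
print(np.round(item_item, 2))
```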

In [2]:
import pandas as pd
import numpy as np

anime = pd.read_csv('anime.csv')
anime

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


In [3]:
# Sort by number of members (popularity)
anime = anime.sort_values('members', ascending=False)
anime.reset_index(drop=True, inplace=True)

In [4]:
# Data preprocessing: count missing values per column
anime.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [5]:
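# Impute missing categorical values with the most frequent value (mode)
anime['genre'] = anime['genre'].fillna(anime['genre'].mode()[0])
anime['type'] = anime['type'].fillna(anime['type'].mode()[0])
# Impute missing ratings with the column mean
anime['rating'] = anime['rating'].fillna(anime['rating'].mean())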

In [6]:
anime.isnull().sum()

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64

In [7]:
from sklearn.preprocessing import MinMaxScaler
# Normalize rating and members for balance
scaler = MinMaxScaler()
anime[['rating', 'members']] = scaler.fit_transform(anime[['rating', 'members']])

In [8]:
anime['type'].unique()

array(['TV', 'Movie', 'OVA', 'Special', 'ONA', 'Music'], dtype=object)

In [9]:
len(anime['genre'].unique())

3264

In [10]:
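from sklearn.feature_extraction.text import TfidfVectorizer
# Convert textual genre information into numerical vectors.
# Note: the default tokenizer lowercases and splits on non-word characters,
# so multi-word genres such as "Sci-Fi" become separate tokens ("sci", "fi").
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(anime['genre'])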

In [11]:
tfidf_matrix.shape

(12294, 46)

In [12]:
from sklearn.metrics.pairwise import cosine_similarity
# Genre-only item-item similarity; this is the matrix the recommendations below are ranked by
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

In [13]:
cosine_sim.shape

(12294, 12294)
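
A dense float64 matrix of this shape holds about 151 million entries, roughly 1.2 GB. When that is too much memory, one option is to compute a single row of similarities on demand instead of the full matrix. The sketch below does this with plain NumPy on a small stand-in array. The helper `top_k_similar` and the toy values are assumptions for illustration, not part of the notebook's pipeline.

```python
import numpy as np

def top_k_similar(features, idx, k=3):
    """Indices and cosine scores of the k rows most similar to row idx."""
    norms = np.linalg.norm(features, axis=1)
    norms[norms == 0] = 1.0                      # avoid dividing by zero for empty rows
    scores = (features @ features[idx]) / (norms * norms[idx])
    scores[idx] = -np.inf                        # never recommend the query itself
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Small dense stand-in for the genre feature matrix; values are illustrative.
toy = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.9, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])
order, scores = top_k_similar(toy, idx=0, k=2)
print(order, np.round(scores, 3))
```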

In [14]:
# Combine genre features with normalized rating and members
features = np.hstack((tfidf_matrix.toarray(), anime[['rating', 'members']].values))
features

array([[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 8.45138055e-01, 1.00000000e+00],
       [3.02721737e-01, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 8.24729892e-01, 8.83926810e-01],
       [3.25663738e-01, 3.51037627e-01, 0.00000000e+00, ...,
        0.00000000e+00, 7.39495798e-01, 8.80840744e-01],
       ...,
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 1.59663866e-01, 6.90395222e-06],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 5.76698882e-01, 5.91767333e-06],
       [0.00000000e+00, 6.58696274e-01, 0.00000000e+00, ...,
        0.00000000e+00, 5.76698882e-01, 0.00000000e+00]])

In [15]:
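from sklearn.metrics import pairwise_distances
# Cosine similarity on the combined features (genre TF-IDF + rating + members).
# Despite the name, user_sim compares anime with anime, not users with users.
user_sim = 1 - pairwise_distances(features, metric='cosine')
user_sim.shape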

(12294, 12294)

In [16]:
movie = 'Kyutai Panic Adventure!'
top = 10

In [17]:
idx = anime[anime['name'] == movie].index[0]
user_sim[idx]

array([0.20101896, 0.57889742, 0.23909737, ..., 0.06166622, 0.19539304,
       0.19539304])

In [18]:
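# Rank by the genre-only cosine_sim matrix; the combined-feature user_sim is not used here,
# so rating and members do not influence the ranking
sim_scores = list(enumerate(cosine_sim[idx]))
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)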

In [19]:
# Skip the first entry, assumed to be the target itself (similarity 1.0), and keep the next `top`
sim_scores = sim_scores[1:top+1]
sim_scores

[(6, np.float64(0.980692123146288)),
 (26, np.float64(0.980692123146288)),
 (690, np.float64(0.980692123146288)),
 (702, np.float64(0.980692123146288)),
 (785, np.float64(0.980692123146288)),
 (1817, np.float64(0.980692123146288)),
 (1890, np.float64(0.980692123146288)),
 (2308, np.float64(0.980692123146288)),
 (2647, np.float64(0.980692123146288)),
 (665, np.float64(0.9658332903032648))]

In [20]:
anime_indices = [i[0] for i in sim_scores]
print(anime_indices)
recommended_movies = anime.iloc[anime_indices]
recommended_movies


[6, 26, 690, 702, 785, 1817, 1890, 2308, 2647, 665]


Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
6,20,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,0.737095,0.673916
26,1735,Naruto: Shippuuden,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,Unknown,0.752701,0.526252
690,8246,Naruto: Shippuuden Movie 4 - The Lost Tower,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.703481,0.083362
702,6325,Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.69988,0.082364
785,28755,Boruto: Naruto the Movie,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.763505,0.07366
1817,10659,Naruto Soyokazeden Movie: Naruto to Mashin to ...,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.653061,0.024824
1890,10075,Naruto x UT,"Action, Comedy, Martial Arts, Shounen, Super P...",OVA,1,0.709484,0.023138
2308,32365,Boruto: Naruto the Movie - Naruto ga Hokage ni...,"Action, Comedy, Martial Arts, Shounen, Super P...",Special,1,0.721489,0.016632
2647,19511,Naruto Shippuuden: Sunny Side Battle,"Action, Comedy, Martial Arts, Shounen, Super P...",Special,1,0.671068,0.012831
665,13667,Naruto: Shippuuden Movie 6 - Road to Ninja,"Action, Adventure, Martial Arts, Shounen, Supe...",Movie,1,0.740696,0.086165


In [26]:
def recommend_anime(movie, top=10):
    """Return the `top` anime most similar to `movie` by genre-based cosine similarity."""
    if movie not in anime['name'].values:
        print(f"Anime '{movie}' not found in dataset.")
        return None
    # Row position of the target (the index was reset after sorting, so label == position)
    idx = anime[anime['name'] == movie].index[0]
    # Drop the target explicitly; slicing off the first sorted entry can fail on ties at 1.0
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top]
    anime_indices = [i[0] for i in sim_scores]
    recommended_movies = anime.iloc[anime_indices].reset_index(drop=True)
    # errors='ignore' makes the drop a no-op if the column is absent
    return recommended_movies.drop(columns=['anime_id'], errors='ignore')

recommended_movies = recommend_anime('Kyutai Panic Adventure!')
recommended_movies


Unnamed: 0,name,genre,type,episodes,rating,members
0,Naruto,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,220,0.737095,0.673916
1,Naruto: Shippuuden,"Action, Comedy, Martial Arts, Shounen, Super P...",TV,Unknown,0.752701,0.526252
2,Naruto: Shippuuden Movie 4 - The Lost Tower,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.703481,0.083362
3,Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.69988,0.082364
4,Boruto: Naruto the Movie,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.763505,0.07366
5,Naruto Soyokazeden Movie: Naruto to Mashin to ...,"Action, Comedy, Martial Arts, Shounen, Super P...",Movie,1,0.653061,0.024824
6,Naruto x UT,"Action, Comedy, Martial Arts, Shounen, Super P...",OVA,1,0.709484,0.023138
7,Boruto: Naruto the Movie - Naruto ga Hokage ni...,"Action, Comedy, Martial Arts, Shounen, Super P...",Special,1,0.721489,0.016632
8,Naruto Shippuuden: Sunny Side Battle,"Action, Comedy, Martial Arts, Shounen, Super P...",Special,1,0.671068,0.012831
9,Naruto: Shippuuden Movie 6 - Road to Ninja,"Action, Adventure, Martial Arts, Shounen, Supe...",Movie,1,0.740696,0.086165
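
The list above consists entirely of Naruto and Boruto titles, which suggests checking both relevance and diversity when analyzing performance. One simple sanity check is the Jaccard overlap between the target's genre set and each recommendation's genre set, sketched below on hypothetical genre strings (not rows from the dataset). Because the recommender is itself built from genres, this only confirms that the pipeline behaves as intended. A fuller evaluation would likely need per-user rating data, which this dataset file does not include.

```python
def genre_set(genre_string):
    """Split an 'Action, Comedy' style string into a set of labels."""
    return {g.strip() for g in genre_string.split(",") if g.strip()}

def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical target and recommendation genres, for illustration only.
target = genre_set("Action, Comedy, Shounen")
recommended = [
    genre_set("Action, Comedy, Shounen"),
    genre_set("Action, Shounen"),
    genre_set("Romance"),
]
overlaps = [jaccard(target, r) for r in recommended]
mean_overlap = sum(overlaps) / len(overlaps)
print(overlaps, mean_overlap)
```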
