**Recommendation System**

Data Preprocessing:

In [None]:
import pandas as pd
df = pd.read_csv('/content/anime.csv')
df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama',"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [None]:
print(df.isnull().sum())

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64


In [None]:
df.dropna(subset=['genre','rating'], inplace=True)
df.reset_index(drop=True, inplace=True)
print(df.isnull().sum())

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64


In [None]:
print(df.info())
print("\n",df.describe())
print("\nUnique broadcast types:", df['type'].unique())
print("\nSample genres:", df['genre'].unique()[:10])

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12017 entries, 0 to 12016
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12017 non-null  int64  
 1   name      12017 non-null  object 
 2   genre     12017 non-null  object 
 3   type      12017 non-null  object 
 4   episodes  12017 non-null  object 
 5   rating    12017 non-null  float64
 6   members   12017 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 657.3+ KB
None

            anime_id        rating       members
count  12017.000000  12017.000000  1.201700e+04
mean   13638.001165      6.478264  1.834888e+04
std    11231.076675      1.023857  5.537250e+04
min        1.000000      1.670000  1.200000e+01
25%     3391.000000      5.890000  2.250000e+02
50%     9959.000000      6.570000  1.552000e+03
75%    23729.000000      7.180000  9.588000e+03
max    34519.000000     10.000000  1.013917e+06

Unique broadcast types: ['Movie' 'TV'

Feature Extraction:

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(tokenizer=lambda x: x.split(', '))
genre_matrix = vectorizer.fit_transform(df['genre'])



In [None]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Guard only: null ratings were dropped above and members has no nulls.
df['rating'] = df['rating'].fillna(df['rating'].mean())
df['members'] = df['members'].fillna(df['members'].mean())

scaler = MinMaxScaler()
numeric_features = scaler.fit_transform(df[['rating','members']])

In [None]:
from scipy.sparse import hstack
from sklearn.preprocessing import normalize

feature_matrix = hstack([genre_matrix, numeric_features])
feature_matrix = normalize(feature_matrix)
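
The stacking and normalization step can be checked on a small example. The sketch below uses made-up stand-ins (`toy_genre`, `toy_numeric`) for `genre_matrix` and the rating/members columns, and assumes the default L2 norm of `normalize`.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.preprocessing import MinMaxScaler, normalize

# Made-up stand-ins: 2 titles, 3 genre flags, then rating and members.
toy_genre = csr_matrix(np.array([[1, 1, 0], [0, 1, 1]]))
toy_numeric = np.array([[9.0, 200000.0], [6.0, 500.0]])

# Map each numeric column to [0, 1] so it is on the same scale as the genre flags.
toy_scaled = MinMaxScaler().fit_transform(toy_numeric)

# Stack genre and numeric columns side by side, then scale every row to unit length.
toy_features = normalize(hstack([toy_genre, toy_scaled]).tocsr())

row_norms = np.sqrt(toy_features.multiply(toy_features).sum(axis=1))
print(toy_features.shape)
print(np.asarray(row_norms).ravel())
```

Because every row ends up with unit length, the dot product of two rows equals their cosine similarity.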

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(feature_matrix)

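# Map each title to its row position, keeping the first row when a title repeats.
# Series.drop_duplicates() would deduplicate the values (row numbers), not the titles.
anime_indices = pd.Series(df.index, index=df['name'])
anime_indices = anime_indices[~anime_indices.index.duplicated(keep='first')]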
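
With 12017 titles, the full similarity matrix holds about 144 million float64 values, roughly 1.2 GB. One possible alternative, sketched here on a made-up matrix, is to compute similarities for a single query row on demand. `similar_rows` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Made-up 4 x 3 feature matrix standing in for feature_matrix.
toy_features = csr_matrix(np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]))

def similar_rows(features, idx, top_n=2):
    """Return (row, score) pairs for the top_n rows most similar to row idx."""
    # Compare one row against all rows instead of building the full matrix.
    scores = cosine_similarity(features[idx], features).ravel()
    scores[idx] = -np.inf  # exclude the query row itself
    top = np.argsort(scores)[::-1][:top_n]
    return [(int(i), float(scores[i])) for i in top]

print(similar_rows(toy_features, 0))
```

This trades a little time per query for memory that grows with the number of titles rather than its square.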

Recommendation System:

In [None]:

def recommend_anime(title, top_n=10, score_threshold=0.0):
    """Return the top_n titles most similar to `title` as a DataFrame."""
    if title not in anime_indices:
        return f"Anime '{title}' not found in dataset."

    idx = anime_indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the query title itself and anything below the similarity threshold.
    sim_scores = [s for s in sim_scores if s[0] != idx and s[1] >= score_threshold]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    anime_ids = [s[0] for s in sim_scores]
    results = df[['name', 'genre', 'rating']].iloc[anime_ids].copy()
    results['similarity'] = [s[1] for s in sim_scores]

    return results.reset_index(drop=True)

Evaluation:

In [None]:
from sklearn.model_selection import train_test_split

df['relevant'] = df['rating'] >= 7.0
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

In [None]:
def evaluate_recommendation_system(df, k=10, threshold=7.0, max_samples=100):
    """Average precision, recall and F1 at k over up to max_samples titles."""
    total_precision = total_recall = total_f1 = count = 0
    # Relevant items: every title in df rated at or above the threshold.
    relevant_set = df[df['rating'] >= threshold]

    for _, row in df.iterrows():
        title = row['name']
        if title not in anime_indices:
            continue

        recs = recommend_anime(title, top_n=k)
        if isinstance(recs, str) or recs.empty:
            continue

        # A recommendation counts as a hit when its rating meets the threshold.
        relevant_recs = recs[recs['rating'] >= threshold]
        precision = len(relevant_recs) / k
        recall = len(relevant_recs) / len(relevant_set) if len(relevant_set) else 0
        f1 = 2 * (precision * recall) / (precision + recall) if precision + recall else 0

        total_precision += precision
        total_recall += recall
        total_f1 += f1
        count += 1

        if count >= max_samples:
            break

    if count == 0:
        return 0, 0, 0

    return total_precision / count, total_recall / count, total_f1 / count

In [None]:
precision, recall, f1 = evaluate_recommendation_system(test_df, k=10)
print(f"Average Precision: {precision:.4f}")
print(f"Average Recall: {recall:.4f}")
print(f"Average F1 Score: {f1:.4f}")

Average Precision: 0.3350
Average Recall: 0.0042
Average F1 Score: 0.0083
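
The very low recall follows from how it is defined in the evaluation function: the denominator is the number of all titles in the evaluated frame rated at or above the threshold, while the numerator can never exceed k. The sketch below repeats the same arithmetic on hypothetical counts. `precision_recall_f1_at_k` and the figure of 800 relevant titles are illustrative assumptions, not values taken from the dataset.

```python
def precision_recall_f1_at_k(n_relevant_in_top_k, k, n_relevant_total):
    """Precision, recall and F1 using the same formulas as the evaluation cell."""
    precision = n_relevant_in_top_k / k
    recall = n_relevant_in_top_k / n_relevant_total if n_relevant_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: a perfect top-10 list against 800 relevant titles.
p, r, f = precision_recall_f1_at_k(n_relevant_in_top_k=10, k=10, n_relevant_total=800)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 1.0000 0.0125 0.0247
```

Even a perfect list scores a recall of only k divided by the relevant count, so under this definition precision is the more informative number.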


In [None]:
print(recommend_anime('Naruto'))

                                                   name  \
615                                  Naruto: Shippuuden   
1472        Naruto: Shippuuden Movie 4 - The Lost Tower   
1573  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...   
486                            Boruto: Naruto the Movie   
1343                                        Naruto x UT   
2996  Naruto Soyokazeden Movie: Naruto to Mashin to ...   
1103  Boruto: Naruto the Movie - Naruto ga Hokage ni...   
2458               Naruto Shippuuden: Sunny Side Battle   
175                              Katekyo Hitman Reborn!   
7617                            Kyutai Panic Adventure!   

                                                  genre  rating  
615   Action, Comedy, Martial Arts, Shounen, Super P...    7.94  
1472  Action, Comedy, Martial Arts, Shounen, Super P...    7.53  
1573  Action, Comedy, Martial Arts, Shounen, Super P...    7.50  
486   Action, Comedy, Martial Arts, Shounen, Super P...    8.03  
1343  Action, Comedy

**Analysis and Areas for improvement**

*   **Cold-start issue:** Content-based filtering needs no rating history, so it can score new titles and serve new users from a single seed title, but it fails when item metadata is limited.

*   **Feature quality:** Genre and rating are coarse; adding tags, studio info, and synopsis embeddings may help.

*   **Hybrid models:** Combine with collaborative filtering for better personalization.
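
One simple way to build a hybrid is a weighted blend of the two score sources. The sketch below is only an illustration on made-up scores. `blend_scores` and the weight `alpha` are hypothetical names, and the min-max scaling step is an assumption made so that the two sources are comparable.

```python
import numpy as np

def blend_scores(content_scores, collab_scores, alpha=0.5):
    """Weighted blend of two score vectors after min-max scaling each to [0, 1]."""
    def scale(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)
    return alpha * scale(content_scores) + (1 - alpha) * scale(collab_scores)

# Made-up scores for four candidate titles.
content = [0.9, 0.8, 0.2, 0.1]  # e.g. cosine similarity on genre features
collab = [0.1, 0.7, 0.9, 0.3]   # e.g. output of a collaborative model
hybrid = blend_scores(content, collab, alpha=0.5)

print(np.argsort(hybrid)[::-1])  # candidate indices, best first
```

Setting `alpha` closer to 1 leans on content similarity, which may suit titles with few ratings. Closer to 0 leans on user behaviour.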



**Interview Questions:**

1.  Can you explain the difference between user-based and item-based collaborative filtering?

**Answer:** User-based filtering recommends items to a user based on the preferences of similar users. It is easy to understand and implement.

Item-based filtering recommends items that are similar to ones the user already liked, where two items count as similar when the same users rate them alike. It tends to scale better when the number of users is very large.
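
The difference comes down to which axis of the ratings matrix is compared. A minimal sketch on a made-up ratings matrix follows. Treating unrated cells as 0 is a simplifying assumption, and `cosine_matrix` is an illustrative helper.

```python
import numpy as np

# Made-up ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T

user_sim = cosine_matrix(ratings)    # user-based: compare rows (users)
item_sim = cosine_matrix(ratings.T)  # item-based: compare columns (items)

print(user_sim.shape, item_sim.shape)
print(np.round(user_sim, 2))
```

Users 0 and 1 rate the same items highly, so they come out as the most similar pair. Items 0 and 1 are similar for the same reason.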

2.   What is collaborative filtering, and how does it work?

**Answer:** It is a popular technique used in recommendation systems to suggest items (such as movies, products, or music) to a user based on the preferences and behaviour of many users.

Process:

*  Data collection
*  Similarity Computation
*  Prediction
*  Recommendation
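
The four steps above can be sketched end to end with user-based filtering. This is a minimal illustration on a made-up ratings matrix, not a production method. `predict` and `recommend` are hypothetical helpers, and the similarity-weighted mean is one of several possible prediction rules.

```python
import numpy as np

# 1. Data collection: made-up user x item ratings, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

# 2. Similarity computation: cosine similarity between users.
unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
user_sim = unit @ unit.T

def predict(user, item):
    """3. Prediction: similarity-weighted mean of other users' ratings for item."""
    others = [u for u in range(len(ratings)) if u != user and ratings[u, item] > 0]
    if not others:
        return 0.0
    weights = user_sim[user, others]
    return float(weights @ ratings[others, item] / weights.sum())

def recommend(user, top_n=1):
    """4. Recommendation: rank the user's unrated items by predicted rating."""
    unrated = np.flatnonzero(ratings[user] == 0)
    ranked = sorted(unrated, key=lambda i: predict(user, i), reverse=True)
    return [int(i) for i in ranked[:top_n]]

print(predict(0, 2), recommend(0))
```

User 0 resembles user 1 far more than user 2, so the predicted rating for item 2 stays close to user 1's low rating.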


