# **Recommendation System**

## **Data Preprocessing:**

In [None]:
import pandas as pd
import numpy as np

In [None]:
data = pd.read_csv('anime.csv')
data

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12289 | 9316 | Toushindai My Lover: Minami tai Mecha-Minami | Hentai | OVA | 1 | 4.15 | 211 |
| 12290 | 5543 | Under World | Hentai | OVA | 1 | 4.28 | 183 |
| 12291 | 5621 | Violence Gekiga David no Hoshi | Hentai | OVA | 4 | 4.88 | 219 |
| 12292 | 6133 | Violence Gekiga Shin David no Hoshi: Inma Dens... | Hentai | OVA | 1 | 4.98 | 175 |


In [None]:
data.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [None]:
data.tail()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 12289 | 9316 | Toushindai My Lover: Minami tai Mecha-Minami | Hentai | OVA | 1 | 4.15 | 211 |
| 12290 | 5543 | Under World | Hentai | OVA | 1 | 4.28 | 183 |
| 12291 | 5621 | Violence Gekiga David no Hoshi | Hentai | OVA | 4 | 4.88 | 219 |
| 12292 | 6133 | Violence Gekiga Shin David no Hoshi: Inma Dens... | Hentai | OVA | 1 | 4.98 | 175 |
| 12293 | 26081 | Yasuji no Pornorama: Yacchimae!! | Hentai | Movie | 1 | 5.46 | 142 |


In [None]:
data.shape

(12294, 7)

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
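
`data.info()` reports `episodes` as `object` rather than a numeric dtype; the recommendation output further down shows the value `Unknown` in that column, which is the likely cause. A minimal sketch, using a hypothetical toy Series rather than the real data, of coercing such a column to numbers with `pd.to_numeric`:

```python
import pandas as pd

# Hypothetical stand-in for the 'episodes' column: the string 'Unknown' forces object dtype
episodes = pd.Series(['1', '64', 'Unknown', '24'])

# errors='coerce' turns non-numeric entries such as 'Unknown' into NaN instead of raising
episodes_numeric = pd.to_numeric(episodes, errors='coerce')

print(episodes_numeric.dtype)         # float64 (NaN requires a float dtype)
print(episodes_numeric.isna().sum())  # 1
```

The notebook drops `episodes` before computing similarity, so this conversion only matters if episode counts are later used as a feature.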


In [None]:
data.describe()

| | anime_id | rating | members |
|---|---|---|---|
| count | 12294.0 | 12064.0 | 12294.0 |
| mean | 14058.221653 | 6.473902 | 18071.34 |
| std | 11455.294701 | 1.026746 | 54820.68 |
| min | 1.0 | 1.67 | 5.0 |
| 25% | 3484.25 | 5.88 | 225.0 |
| 50% | 10260.5 | 6.57 | 1550.0 |
| 75% | 24794.5 | 7.18 | 9437.0 |
| max | 34527.0 | 10.0 | 1013917.0 |


**Check for null values and handle them:**

In [None]:
data.isnull().sum()

| Column | Null count |
|---|---|
| anime_id | 0 |
| name | 0 |
| genre | 62 |
| type | 25 |
| episodes | 0 |
| rating | 230 |
| members | 0 |


In [None]:
data = data.dropna(subset=['genre'])
data = data.dropna(subset=['type'])
data['rating'] = data['rating'].fillna(data['rating'].median())


In [None]:
data.shape

(12210, 7)

In [None]:
data.isnull().sum()

| Column | Null count |
|---|---|
| anime_id | 0 |
| name | 0 |
| genre | 0 |
| type | 0 |
| episodes | 0 |
| rating | 0 |
| members | 0 |


In [None]:
data.duplicated().sum()

0

In [None]:
print(data['genre'].unique())

['Drama, Romance, School, Supernatural'
 'Action, Adventure, Drama, Fantasy, Magic, Military, Shounen'
 'Action, Comedy, Historical, Parody, Samurai, Sci-Fi, Shounen' ...
 'Action, Comedy, Hentai, Romance, Supernatural' 'Hentai, Sports'
 'Hentai, Slice of Life']


## **Feature Extraction:**

**Convert Genres to Numerical Data:**

In [None]:
data['genre'] = data['genre'].apply(lambda x: x.split(', '))

In [None]:
from sklearn.preprocessing import MultiLabelBinarizer

# One-hot encode the genre lists: one 0/1 column per genre
mlb = MultiLabelBinarizer()
genre_matrix = mlb.fit_transform(data['genre'])

# Convert the matrix to a DataFrame with the genre names as columns.
# index=data.index keeps each row aligned with its anime, because dropna() left gaps in the index
genre_df = pd.DataFrame(genre_matrix, columns=mlb.classes_, index=data.index)
genre_df = genre_df.astype(int)
# Join the genre DataFrame with the original DataFrame (rows are matched by index label)
data = pd.concat([data, genre_df], axis=1)



In [None]:
# Drop the original genre column
data = data.drop('genre', axis=1)

# Display the DataFrame with separated genres as integers
print(data.head())

   anime_id                              name   type episodes  rating  \
0   32281.0                    Kimi no Na wa.  Movie        1    9.37   
1    5114.0  Fullmetal Alchemist: Brotherhood     TV       64    9.26   
2   28977.0                          Gintama°     TV       51    9.25   
3    9253.0                       Steins;Gate     TV       24    9.17   
4    9969.0                     Gintama&#039;     TV       51    9.16   

    members  Action  Adventure  Cars  Comedy  ...  Shounen Ai  Slice of Life  \
0  200630.0     0.0        0.0   0.0     0.0  ...         0.0            0.0   
1  793665.0     1.0        1.0   0.0     0.0  ...         0.0            0.0   
2  114262.0     1.0        0.0   0.0     1.0  ...         0.0            0.0   
3  673572.0     0.0        0.0   0.0     0.0  ...         0.0            0.0   
4  151266.0     1.0        0.0   0.0     1.0  ...         0.0            0.0   

   Space  Sports  Super Power  Supernatural  Thriller  Vampire  Yaoi  Yuri  
0  

In [None]:
data.isnull().sum()

| Column | Null count |
|---|---|
| anime_id | 82 |
| name | 82 |
| type | 82 |
| episodes | 82 |
| rating | 82 |
| members | 82 |
| Action | 82 |
| Adventure | 82 |
| Cars | 82 |
| Comedy | 82 |
| ... | ... |
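
The 82 missing values in every column come from index alignment: `dropna()` leaves gaps in the index labels, `pd.DataFrame(genre_matrix, ...)` without an `index=` argument gets a fresh 0..n-1 index, and `pd.concat(..., axis=1)` matches rows by label rather than by position. A minimal sketch on hypothetical toy data (the names `toy`, `misaligned` and `aligned` are illustrative only):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: the row labelled 1 was removed by dropna(), leaving labels 0 and 2
toy = pd.DataFrame({'name': ['A', 'C'], 'genre': [['Action', 'Comedy'], ['Drama']]}, index=[0, 2])

mlb = MultiLabelBinarizer()
matrix = mlb.fit_transform(toy['genre'])  # one 0/1 column per genre, classes in sorted order
print(list(mlb.classes_))  # ['Action', 'Comedy', 'Drama']

# Without index=..., the genre frame gets labels 0 and 1, so concat produces labels 0, 1, 2
misaligned = pd.concat([toy, pd.DataFrame(matrix, columns=mlb.classes_)], axis=1)
# Reusing the original index keeps each genre row next to the anime it came from
aligned = pd.concat([toy, pd.DataFrame(matrix, columns=mlb.classes_, index=toy.index)], axis=1)

print(len(misaligned), misaligned.isna().any(axis=1).sum())  # 3 2
print(len(aligned), aligned.isna().any(axis=1).sum())        # 2 0
```

Dropping the NaN rows afterwards removes the symptom but not the cause: once the first row has been dropped, every later genre vector sits next to a different anime. Passing `index=data.index` (or calling `reset_index(drop=True)` before encoding) avoids this.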


In [None]:
data = data.dropna()
data.isnull().sum()

| Column | Null count |
|---|---|
| anime_id | 0 |
| name | 0 |
| type | 0 |
| episodes | 0 |
| rating | 0 |
| members | 0 |
| Action | 0 |
| Adventure | 0 |
| Cars | 0 |
| Comedy | 0 |
| ... | ... |


**Normalization:**

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Normalize the rating and number-of-members columns to the [0, 1] range
data[['rating', 'members']] = scaler.fit_transform(data[['rating', 'members']])
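
Scaling matters here because `data.describe()` shows `members` ranging from 5 to 1,013,917 while `rating` runs from 1.67 to 10 and the genre columns are 0/1 flags; left unscaled, `members` would dominate the direction of every feature vector and therefore the cosine similarity. A minimal sketch, on hypothetical toy values, showing that `MinMaxScaler` applies $x' = (x - \min) / (\max - \min)$ per column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy values for a single 'rating' column (2-D: rows = samples, columns = features)
ratings = np.array([[1.67], [6.57], [10.0]])

scaled = MinMaxScaler().fit_transform(ratings)
# The same transformation computed by hand: (x - min) / (max - min)
manual = (ratings - ratings.min()) / (ratings.max() - ratings.min())

print(np.allclose(scaled, manual))  # True
```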


## **Recommendation System:**

**Compute Cosine Similarity:**
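
Cosine similarity compares two feature vectors by the angle between them: $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. Since every feature here (genre flags and min-max scaled `rating`/`members`) is non-negative, the score lies between 0 and 1. A small worked example with two hypothetical genre vectors, checked against scikit-learn's `cosine_similarity`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre flags for two anime, columns: [Action, Comedy, Drama]
a = np.array([1, 1, 0])
b = np.array([1, 0, 1])

# cos(a, b) = (a . b) / (||a|| * ||b||) = 1 / (sqrt(2) * sqrt(2)) = 0.5
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
# cosine_similarity expects 2-D inputs (one row per sample)
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(round(manual, 3), round(library, 3))  # 0.5 0.5
```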

In [None]:
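from sklearn.metrics.pairwise import cosine_similarity

# Function to recommend the anime whose feature vectors are most similar to the given title
def recommend_anime(anime_title, df, num_recommendations=5):
    # Find the row *position* of the anime. After dropna() the index labels no longer
    # match row positions, so a label from .index[0] cannot index the similarity array.
    matches = np.flatnonzero(df['name'].to_numpy() == anime_title)
    if len(matches) == 0:
        raise ValueError(f"'{anime_title}' was not found in the dataset")
    anime_pos = matches[0]

    # Drop non-feature columns for similarity calculation
    features = df.drop(['name', 'anime_id', 'type', 'episodes'], axis=1)

    # Compare the query row against every row (avoids building the full N x N matrix)
    similarities = cosine_similarity(features.iloc[[anime_pos]], features)[0]
    similarities[anime_pos] = -np.inf  # exclude the query anime itself

    # Positions of the most similar anime, highest similarity first
    similar_anime_indices = np.argsort(similarities)[::-1][:num_recommendations]

    # Return the most similar anime
    return df.iloc[similar_anime_indices]

# Recommend anime similar to "Naruto"; replace the title with any anime you are interested in
recommendations = recommend_anime('Naruto', data)
print(recommendations)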


      anime_id                                               name   type  \
615     1735.0                                 Naruto: Shippuuden     TV   
1472    8246.0        Naruto: Shippuuden Movie 4 - The Lost Tower  Movie   
1573    6325.0  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...  Movie   
486    28755.0                           Boruto: Naruto the Movie  Movie   
1343   10075.0                                        Naruto x UT    OVA   

     episodes    rating   members  Action  Adventure  Cars  Comedy  ...  \
615   Unknown  0.752701  0.526252     1.0        0.0   0.0     1.0  ...   
1472        1  0.703481  0.083362     1.0        0.0   0.0     1.0  ...   
1573        1  0.699880  0.082364     1.0        0.0   0.0     1.0  ...   
486         1  0.763505  0.073660     1.0        0.0   0.0     1.0  ...   
1343        1  0.709484  0.023138     1.0        0.0   0.0     1.0  ...   

      Shounen Ai  Slice of Life  Space  Sports  Super Power  Supernatural  \
615          0.
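
Looking up a row with `df[...].index[0]` returns an index *label*, while a NumPy similarity array is indexed by *position*; after the `dropna()` calls the two no longer coincide, so using a label as a position can silently read another anime's similarities. A minimal sketch with a hypothetical toy frame and similarity array:

```python
import numpy as np
import pandas as pd

# Hypothetical frame after dropna(): labels 0, 2, 3 but positions 0, 1, 2
df = pd.DataFrame({'name': ['A', 'B', 'C']}, index=[0, 2, 3])
sims = np.array([0.1, 0.9, 0.5])  # one similarity per row, stored by position

label = df[df['name'] == 'B'].index[0]                      # index label 2
position = np.flatnonzero(df['name'].to_numpy() == 'B')[0]  # row position 1

print(label, sims[label])        # 2 0.5  (the value that belongs to 'C')
print(position, sims[position])  # 1 0.9  (the value that belongs to 'B')
```

`df.index.get_loc(label)` is another way to turn a label into a position.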

## **Evaluation:**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Split the dataset (train_df and test_df are not used by the metrics below)
train_df, test_df = train_test_split(data, test_size=0.2, random_state=42)

# Placeholder labels for illustration only: they are hard-coded, not produced by recommend_anime
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0])

def evaluate_recommendation_system(y_true, y_pred):
    # average='weighted' averages the per-class scores, weighted by each class's number of true samples
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')
    f1 = f1_score(y_true, y_pred, average='weighted')
    return precision, recall, f1

# Calculate and print evaluation metrics
precision, recall, f1 = evaluate_recommendation_system(y_true, y_pred)
print(f'Precision: {precision}, Recall: {recall}, F1 Score: {f1}')


Precision: 0.85, Recall: 0.8, F1 Score: 0.7809523809523808
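
Because `y_true` and `y_pred` are hard-coded, the scores above do not measure `recommend_anime`. One possible offline proxy is precision@k: the fraction of the top-k recommendations that are relevant. The sketch below assumes a hypothetical relevance rule (a recommendation counts as relevant if it shares at least one genre with the query anime); `precision_at_k` is a hypothetical helper, and genre overlap is only a rough stand-in for real user feedback.

```python
def precision_at_k(query_genres, recommended_genres, k=5):
    """Fraction of the top-k recommendations sharing at least one genre with the query.

    The 'shares a genre' relevance rule is a hypothetical proxy, not ground truth.
    """
    top_k = recommended_genres[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for genres in top_k if set(genres) & set(query_genres))
    return hits / len(top_k)

# Hypothetical query genres and the genre lists of four recommendations
query = ['Action', 'Comedy', 'Shounen']
recs = [['Action', 'Shounen'], ['Drama'], ['Comedy'], ['Romance', 'School']]
print(precision_at_k(query, recs, k=4))  # 0.5
```

Since this recommender is itself built on genre features, the proxy will tend to look optimistic; held-out user ratings would be a stronger test.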


## **Interview Questions:**

**1. Can you explain the difference between user-based and item-based collaborative filtering?**

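**User-Based Collaborative Filtering:**  
It recommends items based on the preferences of similar users.  
How It Works: Finds users with similar tastes and suggests items they liked that the target user has not yet seen.  
Pros: Captures diverse user preferences and can surface unexpected items.  
Cons: Computationally intensive, since user-user similarities must be recomputed as users and ratings change; struggles with new users who have few ratings (the cold start problem).  

**Item-Based Collaborative Filtering:**  
It recommends items based on item similarity.  
How It Works: Finds items similar to ones the user has liked and suggests those.  
Pros: Scales better when there are many more users than items, because item-item similarities change slowly and can be precomputed; works for a new user as soon as they have rated a few items.  
Cons: Tends to recommend items close to what the user already liked, so recommendations can be less diverse; still cannot recommend new items that have no interactions yet.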
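
The difference comes down to which axis of the user x item rating matrix is compared. A minimal sketch, assuming a small hypothetical rating matrix where 0 means "not rated": user-based filtering compares rows, item-based filtering compares columns.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x item rating matrix (rows = users, columns = items, 0 = not rated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
])

# User-based: similarity between rows (users)
user_sim = cosine_similarity(ratings)    # shape (3, 3)
# Item-based: similarity between columns (items), obtained by transposing
item_sim = cosine_similarity(ratings.T)  # shape (4, 4)

print(user_sim.shape, item_sim.shape)  # (3, 3) (4, 4)
# The most similar user to user 0, other than user 0 itself
print(np.argsort(user_sim[0])[-2])     # 1
```

Treating unrated entries as 0 is a simplification; mean-centering each user's ratings first (adjusted cosine) or using Pearson correlation are common alternatives.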

**2. What is collaborative filtering, and how does it work?**

Collaborative Filtering is a recommendation system technique that provides personalized recommendations based on the preferences and behaviors of users. It relies on the idea that if users agree on certain items, they are likely to agree on others as well.

### **How It Works:**  
**Data Collection:** Gather user-item interaction data, such as ratings, clicks, or purchase history.

**Similarity Calculation:**

**User-Based:** Measures similarity between users based on their interactions or ratings. Users with similar tastes are identified.   
**Item-Based:** Measures similarity between items based on user interactions. Items frequently liked or rated together are identified.

**Recommendation Generation:**

**User-Based:** Recommends items that similar users liked but the target user hasn’t interacted with.  
**Item-Based:** Recommends items similar to those the target user has liked or interacted with.
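
The three steps above can be sketched end to end for the item-based variant. This is a minimal illustration, assuming a small hypothetical rating matrix where 0 means "not rated" and a similarity-weighted average of the user's own ratings as the scoring rule; `recommend_item_based` is a hypothetical helper, not part of this notebook or of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend_item_based(ratings, user, n=2):
    """Score each unrated item by a similarity-weighted average of the user's ratings."""
    item_sim = cosine_similarity(ratings.T)  # step 2: item x item similarity
    user_ratings = ratings[user]
    rated = user_ratings > 0                 # 0 is treated as 'not rated'
    # Step 3: weighted average of the user's ratings, weighted by similarity to each rated item
    weights = item_sim[:, rated]
    denom = weights.sum(axis=1)
    scores = weights @ user_ratings[rated] / np.where(denom == 0, 1, denom)
    scores[rated] = -np.inf                  # never re-recommend items already rated
    return np.argsort(scores)[::-1][:n]

# Step 1: hypothetical interaction data (rows = users, columns = items)
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 4],
    [0, 1, 5, 4, 0],
])
print(recommend_item_based(ratings, user=0, n=2))  # [4 2]
```

Item 4 ranks first for user 0 because it is most similar to items 0 and 1, which user 0 rated highly.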