# Recommendation System

## Data Description:

#### anime_id: Unique ID of each anime.
#### name: Anime title.
#### type: Anime broadcast type, such as TV, OVA, etc.
#### genre: Anime genre(s), as a comma-separated list.
#### episodes: The number of episodes of each anime.
#### rating: The average rating for each anime, taken over the users who rated it.
#### members: Number of community members for each anime.

#### Objective:
#### The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset. 
#### Dataset:
#### Use the Anime Dataset, which contains information about various anime, including their titles, genres, number of episodes, and user ratings.

#### Tasks:

#### Data Preprocessing:

#### Load the dataset into a suitable data structure (e.g., pandas DataFrame).
#### Handle missing values, if any.
#### Explore the dataset to understand its structure and attributes.

In [2]:
import pandas as pd

anime_df = pd.read_csv('anime.csv')

anime_df.info(), anime_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


(None,
    anime_id                              name  \
 0     32281                    Kimi no Na wa.   
 1      5114  Fullmetal Alchemist: Brotherhood   
 2     28977                          Gintama°   
 3      9253                       Steins;Gate   
 4      9969                     Gintama&#039;   
 
                                                genre   type episodes  rating  \
 0               Drama, Romance, School, Supernatural  Movie        1    9.37   
 1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
 2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
 3                                   Sci-Fi, Thriller     TV       24    9.17   
 4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   
 
    members  
 0   200630  
 1   793665  
 2   114262  
 3   673572  
 4   151266  )

### Feature Extraction:

#### Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
#### Convert categorical features into numerical representations if necessary.
#### Normalize numerical features if required.

In [16]:
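from sklearn.preprocessing import MultiLabelBinarizer, MinMaxScaler
import numpy as np

# Fill missing categorical values with a placeholder and missing ratings with the mean.
# Assigning the result back avoids chained in-place fillna, which newer pandas versions warn about.
anime_df["genre"] = anime_df["genre"].fillna("Unknown")
anime_df["type"] = anime_df["type"].fillna("Unknown")
anime_df["rating"] = anime_df["rating"].fillna(anime_df["rating"].mean())

# "episodes" is stored as text: non-numeric entries such as "Unknown" become NaN,
# which is then replaced by the median episode count.
anime_df["episodes"] = pd.to_numeric(anime_df["episodes"], errors="coerce")
anime_df["episodes"] = anime_df["episodes"].fillna(anime_df["episodes"].median())

# One-hot encode the comma-separated genre lists (one column per genre).
mlb = MultiLabelBinarizer()
genres_encoded = mlb.fit_transform(anime_df["genre"].str.split(", "))
genres_df = pd.DataFrame(genres_encoded, columns=mlb.classes_)

# One-hot encode the broadcast type.
type_encoded = pd.get_dummies(anime_df["type"], prefix="type")

# Scale rating to [0, 1]. Note that episodes is left as a raw count.
scaler = MinMaxScaler()
anime_df["rating"] = scaler.fit_transform(anime_df[["rating"]])

anime_features = pd.concat([genres_df, type_encoded, anime_df[["rating", "episodes"]]], axis=1)

anime_features.head()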

Unnamed: 0,Action,Adventure,Cars,Comedy,Dementia,Demons,Drama,Ecchi,Fantasy,Game,...,Yuri,type_Movie,type_Music,type_ONA,type_OVA,type_Special,type_TV,type_Unknown,rating,episodes
0,0,0,0,0,0,0,1,0,0,0,...,0,True,False,False,False,False,False,False,0.92437,1.0
1,1,1,0,0,0,0,1,0,1,0,...,0,False,False,False,False,False,True,False,0.911164,64.0
2,1,0,0,1,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.909964,51.0
3,0,0,0,0,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.90036,24.0
4,1,0,0,1,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.89916,51.0


In [6]:
anime_features.columns = anime_features.columns.str.replace(r"[\[\]']", "", regex=True)

anime_features.head()


Unnamed: 0,Action,Adventure,Cars,Comedy,Dementia,Demons,Drama,Ecchi,Fantasy,Game,...,Yuri,type_Movie,type_Music,type_ONA,type_OVA,type_Special,type_TV,type_Unknown,rating,episodes
0,0,0,0,0,0,0,1,0,0,0,...,0,True,False,False,False,False,False,False,0.92437,1.0
1,1,1,0,0,0,0,1,0,1,0,...,0,False,False,False,False,False,True,False,0.911164,64.0
2,1,0,0,1,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.909964,51.0
3,0,0,0,0,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.90036,24.0
4,1,0,0,1,0,0,0,0,0,0,...,0,False,False,False,False,False,True,False,0.89916,51.0


## Recommendation System:

#### Design a function to recommend anime based on cosine similarity.
#### Given a target anime, recommend a list of similar anime based on cosine similarity scores.
#### Experiment with different threshold values for similarity scores to adjust the recommendation list size.
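
As a quick reference for the tasks above, cosine similarity between two feature vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand and checks it against scikit-learn. The three genre vectors and the helper name `cosine_sim_manual` are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_sim_manual(a, b):
    """Cosine of the angle between two 1-D vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical one-hot genre vectors: [Action, Comedy, Sci-Fi, Thriller]
show_a = [0, 0, 1, 1]
show_b = [1, 0, 1, 1]
show_c = [0, 1, 0, 0]

sim_ab = cosine_sim_manual(show_a, show_b)   # two shared genres
sim_ac = cosine_sim_manual(show_a, show_c)   # no shared genres
sim_sklearn = cosine_similarity([show_a], [show_b])[0, 0]

print(sim_ab, sim_ac, sim_sklearn)
```

Titles that share more genres get a score closer to 1, and titles with no genres in common score 0, which is what a similarity threshold filters on.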


In [17]:
from sklearn.metrics.pairwise import cosine_similarity

# Pairwise cosine similarity between every pair of anime (n x n matrix).
cosine_sim = cosine_similarity(anime_features)

anime_titles = anime_df["name"]

def recommend_anime(title, top_n=10, similarity_threshold=0.5):
    """Return up to top_n titles whose similarity to `title` is at least the threshold."""
    if title not in anime_titles.values:
        return f"Anime '{title}' not found in dataset."

    # Row position of the first anime with this title.
    idx = anime_titles[anime_titles == title].index[0]

    # (index, score) pairs for every anime, most similar first.
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the query itself and anything below the threshold.
    sim_scores = [x for x in sim_scores if x[1] >= similarity_threshold and x[0] != idx]

    top_anime_indices = [x[0] for x in sim_scores[:top_n]]

    return anime_titles.iloc[top_anime_indices].tolist()

recommend_anime("Steins;Gate", top_n=5, similarity_threshold=0.5)

['RoboDz',
 'Yuusei Kamen',
 'Go-Q-Choji Ikkiman',
 'Patapata Hikousen no Bouken',
 'Groizer X']
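
One likely contributor to the unrelated-looking results above is feature scale: in the feature cell, rating is min-max scaled to [0, 1] and the genre and type columns are 0/1, but episodes is kept as a raw count, so it can dominate the dot product. The sketch below illustrates the effect on three made-up rows (not taken from the dataset) and shows how scaling the episode column changes the ranking. It is a toy illustration, not a verified diagnosis of the output above.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [Sci-Fi, Thriller, Kids, rating (0-1), episodes]
target      = [1, 1, 0, 0.90, 24]   # sci-fi thriller, 24 episodes
same_genre  = [1, 1, 0, 0.85, 1]    # same genres, but a single episode
same_length = [0, 0, 1, 0.40, 24]   # unrelated genre, same episode count
X = np.array([target, same_genre, same_length], dtype=float)

# Raw episode counts: the large episode value dominates the similarity.
raw = cosine_similarity(X)

# Same rows with the episode column min-max scaled to [0, 1].
X_scaled = X.copy()
X_scaled[:, [4]] = MinMaxScaler().fit_transform(X[:, [4]])
scaled = cosine_similarity(X_scaled)

print("raw:    same_genre=%.3f  same_length=%.3f" % (raw[0, 1], raw[0, 2]))
print("scaled: same_genre=%.3f  same_length=%.3f" % (scaled[0, 1], scaled[0, 2]))
```

With raw counts the title that only matches on episode count ranks above the one that matches on genre. After scaling, the genre match ranks first. Applying the same scaler to both rating and episodes in the feature cell would be one way to test whether this explains the recommendations.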

## Evaluation:

#### Split the dataset into training and testing sets.
#### Evaluate the recommendation system using appropriate metrics such as precision, recall, and F1-score.
#### Analyze the performance of the recommendation system and identify areas of improvement.


In [18]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the titles, stratified by popularity quartile (members).
train_df, test_df = train_test_split(anime_df, test_size=0.2, random_state=42, stratify=pd.qcut(anime_df["members"], q=4, labels=False))

train_titles = train_df["name"].tolist()

def precision_recall_at_k(recommended_list, relevant_list, k=10):
    """Precision, recall, and F1 of the top-k recommendations against a list of relevant titles."""
    recommended_list = recommended_list[:k]

    hits = len(set(recommended_list) & set(relevant_list))
    precision = hits / k

    recall = hits / len(relevant_list) if len(relevant_list) > 0 else 0

    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return precision, recall, f1

precision_scores, recall_scores, f1_scores = [], [], []

for anime in test_df["name"].head(50):
    # train_df and test_df hold different rows, so this check passes only when a name
    # occurs more than once in the dataset; otherwise the score lists stay empty.
    if anime in train_titles:
        # Every other test title is treated as relevant here, a placeholder for real relevance labels.
        relevant_anime = test_df[test_df["name"] != anime]["name"].tolist()
        recommended_anime = recommend_anime(anime, top_n=10)

        precision, recall, f1 = precision_recall_at_k(recommended_anime, relevant_anime, k=10)
        precision_scores.append(precision)
        recall_scores.append(recall)
        f1_scores.append(f1)

avg_precision = np.mean(precision_scores)
avg_recall = np.mean(recall_scores)
avg_f1 = np.mean(f1_scores)

avg_precision, avg_recall, avg_f1


  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)


(nan, nan, nan)
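
The `nan` averages most likely mean that no test title passed the `if anime in train_titles` check: the train and test sets contain different rows, so the score lists stay empty, and `np.mean` of an empty list returns `nan` with a runtime warning. A content-based recommender also has no natural ground truth, so some definition of "relevant" has to be chosen. The sketch below is one possible setup on a made-up five-title catalogue, under the assumption that a title counts as relevant when it shares at least one genre with the query. The data, the `recommend` helper, and this relevance rule are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical mini catalogue standing in for anime_df
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "genre": ["Sci-Fi, Thriller", "Sci-Fi, Drama", "Comedy, School",
              "Thriller, Mystery", "Comedy, Romance"],
})
genre_lists = toy["genre"].str.split(", ")
genre_sets = [set(g) for g in genre_lists]
sim = cosine_similarity(MultiLabelBinarizer().fit_transform(genre_lists))

def recommend(idx, k):
    """Positions of the k most similar rows, excluding the query itself."""
    order = np.argsort(-sim[idx], kind="stable")
    return [int(j) for j in order if j != idx][:k]

def precision_recall_at_k(idx, k):
    """Assumed relevance rule: a title is relevant if it shares a genre with the query."""
    relevant = {j for j in range(len(toy))
                if j != idx and genre_sets[idx] & genre_sets[j]}
    recs = recommend(idx, k)
    hits = len(relevant & set(recs))
    precision = hits / len(recs) if recs else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Evaluate every title as a query and average the metrics.
scores = [precision_recall_at_k(i, k=2) for i in range(len(toy))]
avg_precision = float(np.mean([p for p, _ in scores]))
avg_recall = float(np.mean([r for _, r in scores]))
print(avg_precision, avg_recall)
```

Because genre is both the feature and the relevance signal here, the metric is somewhat circular and mainly checks that the pipeline works. Held-out user ratings, if available, would give a more meaningful relevance label.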

## Interview Questions:
### 1. Can you explain the difference between user-based and item-based collaborative filtering?

#### 1. User-Based Collaborative Filtering (UBCF)
#### Similar users have similar tastes.
#### How it works:
#### Find users with similar rating patterns.
#### Recommend items that similar users liked but the target user hasn’t seen.
#### Example:
#### If Alice and Bob have rated many movies the same way, and Alice liked a movie that Bob hasn't watched, we recommend it to Bob.
#### Challenges:
#### Doesn't work well with sparse data (many users, few ratings per user).
#### Scalability issues with a large number of users.
#### 2. Item-Based Collaborative Filtering (IBCF)
#### Similar items are liked by similar users.
#### How it works:
#### Compute similarity between items based on how users rated them.
#### Recommend items similar to those a user has liked.
#### Example:
#### If many users who liked Attack on Titan also liked Death Note, then if a new user likes Attack on Titan, we recommend Death Note.
#### Advantages:
#### More stable over time (items don’t change as frequently as users).
#### Works better with sparse data compared to UBCF.
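
In code, the difference between the two approaches comes down to which axis of the user-item rating matrix is compared. The sketch below uses a made-up 3x3 rating matrix (the ratings are hypothetical, and treating 0 as "not rated" is a simplification) and computes user-user similarity from the rows and item-item similarity from the columns.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows = users, columns = items, 0 = not rated
users = ["Alice", "Bob", "Carol"]
items = ["Attack on Titan", "Death Note", "K-On!"]
ratings = np.array([
    [5, 4, 0],   # Alice
    [5, 5, 0],   # Bob
    [1, 0, 5],   # Carol
], dtype=float)

# User-based: compare users by their rows of ratings.
user_sim = cosine_similarity(ratings)

# Item-based: compare items by their columns of ratings.
item_sim = cosine_similarity(ratings.T)

print("Alice vs Bob:  ", round(user_sim[0, 1], 3))
print("Alice vs Carol:", round(user_sim[0, 2], 3))
print("Attack on Titan vs Death Note:", round(item_sim[0, 1], 3))
print("Attack on Titan vs K-On!:     ", round(item_sim[0, 2], 3))
```

Alice and Bob come out as similar users, and Attack on Titan and Death Note come out as similar items, from the same matrix viewed along different axes.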

### 2. What is collaborative filtering, and how does it work?

#### Collaborative Filtering is a technique used in recommendation systems that makes predictions about a user's interests based on past interactions of users with similar behavior.

#### Instead of relying on predefined characteristics (e.g., genre or price), CF leverages user-item interactions, such as ratings or purchase history, to suggest relevant items.

#### Collaborative Filtering operates based on the assumption:
#### "If two users agree on one item, they are likely to agree on others too."

#### There are two main types of CF:

#### 1. User-Based Collaborative Filtering
#### Finds similar users based on their past interactions.
#### Steps:
#### Identify users who have rated or interacted with items similarly.
#### Recommend items that similar users have liked but the target user hasn’t seen.
#### Example:
#### Alice and Bob both gave high ratings to Attack on Titan and Death Note.
#### Alice also liked Tokyo Ghoul, but Bob hasn’t watched it.
#### Recommend Tokyo Ghoul to Bob.
#### 2. Item-Based Collaborative Filtering
#### Finds similar items based on how users interact with them.
#### Steps:
#### Compute similarity between items based on user ratings.
#### Recommend items that are similar to those a user has already liked.
#### Example:
#### Many users who liked Naruto also liked Bleach.
#### If a new user likes Naruto, recommend Bleach.
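
The item-based steps above can be sketched as follows. The rating matrix is made up, and `recommend_items` is a hypothetical helper that scores each unrated item by a similarity-weighted sum of the user's existing ratings. This is one simple scoring rule among several used in practice.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

items = ["Naruto", "Bleach", "Clannad"]
# Hypothetical ratings: rows = users, columns = items, 0 = not rated
ratings = np.array([
    [5, 5, 0],
    [4, 5, 1],
    [5, 4, 0],
    [1, 0, 5],
], dtype=float)

# Item-item similarity computed from the rating columns.
item_sim = cosine_similarity(ratings.T)

def recommend_items(user_ratings, top_n=1):
    """Rank unrated items by the similarity-weighted sum of the user's ratings."""
    user_ratings = np.asarray(user_ratings, dtype=float)
    rated = user_ratings > 0
    scores = {}
    for j, name in enumerate(items):
        if rated[j]:
            continue  # skip items the user has already rated
        scores[name] = float(item_sim[j, rated] @ user_ratings[rated])
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

new_user = [5, 0, 0]   # has only rated Naruto, and liked it
top = recommend_items(new_user)
print(top)
```

Since most users who rated Naruto highly also rated Bleach highly in this toy matrix, Bleach ranks above Clannad for the new user.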
