# Recommendation System Using Cosine Similarity
**Objective**

The objective of this assignment is to design and implement an anime recommendation system using cosine similarity. The system recommends anime that are similar to a given anime based on features such as genre, number of episodes, ratings, and popularity.

# Data Preprocessing

In [2]:
import pandas as pd

# Loading data set
df = pd.read_csv("anime.csv")
df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [4]:
# Missing Values
df.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [7]:
# Convert 'episodes' to numeric; entries that cannot be parsed become NaN
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')

In [9]:
# Handling missing values
df['rating'] = df['rating'].fillna(df['rating'].median())
df['episodes'] = df['episodes'].fillna(df['episodes'].median())
df['genre'] = df['genre'].fillna("Unknown")

# Feature Extraction

Feature Selection

The following features are used to compute similarity:
- Genre (categorical, multi-valued)
- Rating (numerical)
- Episodes (numerical)
- Members (numerical popularity indicator)

In [10]:
# Encoding Categorical Features (Genres)
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')
genre_matrix = tfidf.fit_transform(df['genre'])
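
One detail worth checking in the genre encoding is how the genre strings are split into tokens. The sketch below uses only the standard library to mimic the default word-level token pattern of scikit-learn's text vectorizers and compares it with a comma-based split. The helper names `default_tokens` and `comma_tokens` are hypothetical and are not part of the notebook.

```python
import re

# Default token pattern of scikit-learn's text vectorizers:
# runs of two or more word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def default_tokens(genre_string):
    """Approximate the default word-level tokenization (lowercased, no stop-word removal)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, genre_string.lower())


def comma_tokens(genre_string):
    """Treat each comma-separated genre as a single token."""
    return [g.strip().lower() for g in genre_string.split(",") if g.strip()]


sample = "Sci-Fi, Slice of Life, Super Power"
print(default_tokens(sample))  # multi-word genres are split into separate words
print(comma_tokens(sample))    # each genre stays intact
```

With word-level tokens, "Sci-Fi" becomes two unrelated tokens, and `stop_words='english'` would additionally drop "of". If whole genres are preferred as features, a function like `comma_tokens` could be passed through the `tokenizer` parameter of `TfidfVectorizer`; this would change the similarity scores, so the outputs shown in this notebook would no longer apply as-is.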

In [11]:
# Normalizing Numerical Features
from sklearn.preprocessing import MinMaxScaler
import numpy as np

scaler = MinMaxScaler()
numerical_features = scaler.fit_transform(df[['rating', 'episodes', 'members']])

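# Combine the genre TF-IDF features (converted to a dense array) with the scaled numerical features
numerical_matrix = np.hstack((genre_matrix.toarray(), numerical_features))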
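
For reference, min-max scaling maps each value x to (x - min) / (max - min), so every numerical column ends up in the range [0, 1]. The following is a minimal pure-Python sketch of that formula, not the scikit-learn implementation; `min_max_scale` is a hypothetical helper, and the input is only the five `members` values visible in the `df.head()` output, so the results differ from scaling the full column.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# 'members' values copied from the first five rows shown earlier
members = [200630, 793665, 114262, 673572, 151266]
scaled_members = min_max_scale(members)
print([round(v, 3) for v in scaled_members])
```

The largest value maps to 1 and the smallest to 0. Because `members` is heavily skewed, most anime would sit close to 0 after scaling the full column, which limits how much this feature contributes to the similarity.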

# Recommendation System Using Cosine Similarity

In [12]:
# Similarity Computation
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(numerical_matrix)
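
Cosine similarity between two feature vectors is their dot product divided by the product of their lengths. With 12,294 anime, the full pairwise matrix has about 151 million entries, roughly 1.2 GB as 64-bit floats. If memory is a concern, one option is to compute only the row for the queried anime. The sketch below illustrates that idea with NumPy on a toy matrix; `cosine_row` is a hypothetical helper and is not used elsewhere in this notebook.

```python
import numpy as np


def cosine_row(features, idx):
    """Cosine similarity between row `idx` and every row of `features`."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = features / norms[:, np.newaxis]
    return unit @ unit[idx]


# Toy feature matrix: rows 0 and 1 point in the same direction, row 2 is orthogonal to row 0
toy = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
print(cosine_row(toy, 0))
```

The result for a given index should match the corresponding row of the full similarity matrix, up to floating-point error.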

In [13]:
def recommend_anime(anime_name, df, cosine_sim, threshold=0.3, top_n=10):
    """Return up to top_n anime whose similarity to anime_name is at least threshold."""
    if anime_name not in df['name'].values:
        return "Anime not found in dataset"

    # Row of the first exact title match (df has a default RangeIndex, so label == position)
    idx = df[df['name'] == anime_name].index[0]
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Keep candidates at or above the threshold, excluding the query anime itself
    sim_scores = [(i, score) for i, score in sim_scores if score >= threshold and i != idx]

    # Sort by similarity score, highest first, and keep the top_n
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    anime_indices = [i for i, _ in sim_scores]
    return df[['name', 'genre', 'rating']].iloc[anime_indices]

In [20]:
# Example Usage
recommend_anime("One Piece", df, cosine_sim, threshold=0.35)

Unnamed: 0,name,genre,rating
241,One Piece: Episode of Nami - Koukaishi no Nami...,"Action, Adventure, Comedy, Drama, Fantasy, Sho...",8.27
86,Shingeki no Kyojin,"Action, Drama, Fantasy, Shounen, Super Power",8.54
231,One Piece: Episode of Merry - Mou Hitori no Na...,"Action, Adventure, Comedy, Drama, Fantasy, Sho...",8.29
896,One Piece: Episode of Sabo - 3 Kyoudai no Kizu...,"Action, Adventure, Comedy, Drama, Fantasy, Sho...",7.78
6,Hunter x Hunter (2011),"Action, Adventure, Shounen, Super Power",9.13
717,Shingeki no Kyojin OVA,"Action, Drama, Fantasy, Shounen, Super Power",7.88
10899,Shingeki no Kyojin Season 2,"Action, Drama, Fantasy, Shounen, Super Power",6.57
352,One Piece Film: Strong World Episode 0,"Action, Adventure, Comedy, Fantasy, Shounen, S...",8.16
941,One Piece Movie 4: Dead End no Bouken,"Action, Adventure, Comedy, Fantasy, Shounen, S...",7.76
2492,One Piece Movie 1,"Action, Adventure, Comedy, Fantasy, Shounen, S...",7.25


**Observation:**
Raising the threshold keeps only closer matches, which improves relevance, but fewer anime qualify, so the function may return fewer than `top_n` recommendations.
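
The effect of the threshold can be illustrated by counting how many candidates survive at different cut-offs. This is a small sketch on made-up similarity scores, not on the notebook's data; `count_candidates` is a hypothetical helper.

```python
def count_candidates(sim_row, idx, thresholds):
    """Map each threshold to the number of items (excluding idx) scoring at or above it."""
    return {
        t: sum(1 for i, score in enumerate(sim_row) if i != idx and score >= t)
        for t in thresholds
    }


# Made-up similarity scores; index 0 is the query item itself
toy_row = [1.0, 0.82, 0.41, 0.36, 0.29, 0.10]
counts = count_candidates(toy_row, idx=0, thresholds=[0.2, 0.3, 0.35, 0.5])
print(counts)
```

Running the same count on a row of `cosine_sim` would show how quickly the candidate pool shrinks for a given anime.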

**Performance Analysis & Improvements**

**Strengths:**
- Simple and efficient
- No user history required
- Works well for content similarity

**Limitations:**
- Cannot personalize recommendations per user
- Cold-start problem for new anime
- Genre-based similarity may ignore user taste

**Possible Improvements:**
- Hybrid approach (content + collaborative filtering)
- User-rating interaction matrix
- Weighted features (genre > rating > popularity)
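
As one possible way to realize the weighted-features idea, each feature block can be multiplied by a weight before the blocks are stacked. The sketch below assumes NumPy and uses toy inputs; `combine_weighted` and the weights (2.0 for genre, 1.0 for rating, 0.5 for episodes and members) are hypothetical values chosen only to reflect the ordering genre > rating > popularity.

```python
import numpy as np


def combine_weighted(genre_block, numeric_block, genre_weight=2.0,
                     numeric_weights=(1.0, 0.5, 0.5)):
    """Scale each feature block by its weight, then stack the blocks column-wise."""
    genre_part = np.asarray(genre_block, dtype=float) * genre_weight
    numeric_part = (np.asarray(numeric_block, dtype=float)
                    * np.asarray(numeric_weights, dtype=float))
    return np.hstack((genre_part, numeric_part))


# Toy inputs: two anime, two genre features, three scaled numerical features
toy_genre = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_numeric = np.array([[0.9, 0.1, 0.5], [0.4, 0.2, 0.8]])  # rating, episodes, members
weighted = combine_weighted(toy_genre, toy_numeric)
print(weighted)
```

Since cosine similarity is built from products of feature values, scaling a block by a weight w multiplies its contribution to the dot product by w squared, so moderate weights already shift the balance noticeably.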

# Conclusion

The cosine similarity–based recommendation system effectively identifies anime with similar characteristics. It is suitable for content-based recommendation, especially when user interaction data is limited. However, personalization can be improved by integrating collaborative filtering techniques.

# Interview Questions
**1. Explain the difference between user-based and item-based collaborative filtering.**
->
**User-Based Collaborative Filtering:**
User-based collaborative filtering identifies users with similar behavior or preferences and recommends items that those similar users liked but the target user has not yet interacted with.

Advantages:
- Intuitive and easy to understand.
- Captures community-based trends.

Limitations:
- Poor scalability with large user bases.
- Sensitive to changes in user behavior.
- Suffers from data sparsity when users have few interactions.

**Item-Based Collaborative Filtering:**
Item-based collaborative filtering focuses on similarity between items, recommending items that are similar to those a user has already liked.

Advantages:
- More scalable and stable than user-based methods.
- Item similarities change less frequently.
- Performs well in large-scale systems (e.g., e-commerce platforms).

Limitations:
- Limited personalization for users with very unique tastes.
- Cold-start problem for new items.
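
The difference between the two approaches comes down to which axis of the rating matrix is compared. The sketch below assumes NumPy and a made-up rating matrix in which 0 stands for "not rated" (a simplification); `cosine_matrix` is a hypothetical helper.

```python
import numpy as np


def cosine_matrix(rows):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = rows / norms
    return unit @ unit.T


# Made-up ratings: rows are users, columns are items, 0 means "not rated"
ratings = np.array([
    [5, 4, 0],
    [4, 5, 0],
    [0, 1, 5],
    [1, 0, 4],
])

user_sim = cosine_matrix(ratings)    # user-based: compare rows (users)
item_sim = cosine_matrix(ratings.T)  # item-based: compare columns (items)
print(user_sim.shape, item_sim.shape)
```

Here `user_sim` grows with the number of users and `item_sim` with the number of items, which is one reason item-based methods tend to scale better when users far outnumber items.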

**2. What is collaborative filtering, and how does it work?**
->
Collaborative filtering is a recommendation technique that predicts a user's preferences from patterns in the behavior of many users. It rests on the idea that "people who agreed in the past will agree in the future": items (movies, products) that a user's "neighbors" (similar users) enjoyed are likely to suit that user as well. In practice, the system builds a user-item matrix of interactions (ratings, views), computes user or item similarities (e.g., via cosine similarity), and recommends items that similar users liked but the target user has not yet seen.

Working:
1. Data Collection
The system gathers user preferences via two types of feedback:
- Explicit Feedback: Direct input from users, such as star ratings (1–5) or likes.
- Implicit Feedback: Inferred preferences from user actions, such as browsing history, purchase records, or time spent viewing an item.

2. User-Item Matrix Construction
The collected interactions (ratings, views) are arranged into a user-item matrix with one row per user and one column per item. Entries for items a user has not interacted with are left missing.

3. Similarity Computation
To identify patterns, the system calculates the similarity (or distance) between users or between items using metrics such as:
- Cosine Similarity: Measures the angle between two vectors to determine how closely they align.
- Pearson Correlation: Measures the linear correlation between two users' ratings, which adjusts for individual rating biases (e.g., "tough raters").

The similarities are then used in one of two ways:
- User-Based: Recommends items that users in the target user's neighborhood liked but the target user has not interacted with.
- Item-Based: Recommends items similar to those the user has already liked.
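
The difference between the two metrics shows up when users rate on different scales. The sketch below is pure Python with made-up rating vectors and hypothetical helper names. It treats Pearson correlation as cosine similarity applied to mean-centered ratings, which is undefined when a user gives every item the same rating.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Same preferences, but the second user rates everything two stars lower
generous = [5, 4, 5, 3]
tough = [3, 2, 3, 1]
print(cosine(generous, tough), pearson(generous, tough))

# Opposite preferences on the same 1-5 scale
fan = [5, 1, 5, 1]
critic = [1, 5, 1, 5]
print(cosine(fan, critic), pearson(fan, critic))
```

In the first pair, Pearson treats the generous and tough raters as perfectly aligned, while cosine similarity falls slightly below 1. In the second pair, cosine similarity stays positive because all ratings are positive numbers, whereas Pearson is negative.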

4. Preference Prediction
- Identify the nearest neighbors (similar users or similar items).
- Aggregate their preferences (e.g., a similarity-weighted average of ratings).
- Use the aggregate as the predicted value for the missing rating.

5. Recommendation Generation
Items with the highest predicted scores are recommended to the user.
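
The steps above can be sketched end to end on a tiny made-up dataset. This is a minimal item-based illustration assuming NumPy, not a production implementation: all function names are hypothetical, unrated cells are treated as 0 when computing similarity, and the prediction is a plain similarity-weighted average of the user's own ratings.

```python
import numpy as np


def build_matrix(interactions, n_users, n_items):
    """Arrange (user, item, rating) triples into a user-item matrix; NaN = unrated."""
    matrix = np.full((n_users, n_items), np.nan)
    for user, item, rating in interactions:
        matrix[user, item] = rating
    return matrix


def item_similarity(matrix):
    """Cosine similarity between item columns, treating unrated cells as 0."""
    filled = np.where(np.isnan(matrix), 0.0, matrix)
    norms = np.linalg.norm(filled, axis=0)
    norms[norms == 0] = 1.0
    unit = filled / norms
    return unit.T @ unit


def predict(matrix, sim, user, item):
    """Similarity-weighted average of the user's ratings on the other items."""
    rated = ~np.isnan(matrix[user])
    rated[item] = False
    weights = sim[item, rated]
    if weights.sum() == 0:
        return float("nan")  # no similar rated items to base a prediction on
    return float(weights @ matrix[user, rated] / weights.sum())


def recommend(matrix, sim, user, top_n=2):
    """Rank the user's unrated items by predicted rating, highest first."""
    unrated = np.flatnonzero(np.isnan(matrix[user]))
    scored = [(int(i), predict(matrix, sim, user, i)) for i in unrated]
    scored = [(i, s) for i, s in scored if not np.isnan(s)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up explicit feedback: (user, item, rating)
interactions = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 1),
                (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
matrix = build_matrix(interactions, n_users=4, n_items=3)
sim = item_similarity(matrix)
print(recommend(matrix, sim, user=0))
```

A weighted average with non-negative weights always lands between the user's lowest and highest existing ratings, so this simple form cannot predict that a user will dislike an item. Mean-centering the ratings before aggregation is a common refinement.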