**Recommendation System**



**Data Preprocessing:**

 - Load the dataset into a suitable data structure (e.g., pandas DataFrame).
 - Handle missing values, if any.
 - Explore the dataset to understand its structure and attributes.

In [1]:
import pandas as pd
# Load the dataset
# In Google Colab: upload anime.csv, copy its path, and pass that path to read_csv
df = pd.read_csv('/content/sample_data/anime.csv')

In [3]:
# Handle missing values by dropping rows with missing data
df.dropna(inplace=True)
# Explore the dataset: Display basic information
# df.info() prints its summary directly and returns None, so no print() is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12017 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12017 non-null  int64  
 1   name      12017 non-null  object 
 2   genre     12017 non-null  object 
 3   type      12017 non-null  object 
 4   episodes  12017 non-null  object 
 5   rating    12017 non-null  float64
 6   members   12017 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 751.1+ KB


In [4]:
# Display the first few rows of the dataset
print(df.head())

   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  


**Feature Extraction:**

 - Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
 - Convert categorical features into numerical representations if necessary.
 - Normalize numerical features if required.
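Each `genre` value in this dataset is a comma-separated list of tags (for example `Drama, Romance, School, Supernatural`), so one-hot encoding the whole string treats every distinct combination as its own category. One alternative is to multi-hot encode the individual tags so that titles sharing only some genres still overlap. The sketch below uses a small made-up DataFrame (`toy` and its values are hypothetical, not rows from anime.csv) and assumes the tags are separated by `", "`:

```python
import pandas as pd

# Toy rows mimicking the 'genre' column format (hypothetical data)
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "genre": ["Action, Comedy", "Action, Drama", "Drama"],
})

# Split each comma-separated string into one indicator column per genre tag
genre_tags = toy["genre"].str.get_dummies(sep=", ")
print(genre_tags)
```

With this representation, rows "A" and "B" share the `Action` column even though their full genre strings differ, which a whole-string one-hot encoding would not capture.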


In [6]:
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
# Select features for similarity computation ('genre' and 'rating')
selected_features = df[['genre', 'rating']]
# Convert 'genre' (categorical) into numerical using OneHotEncoder
# (sparse_output replaces the older sparse argument in scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
genres_encoded = encoder.fit_transform(selected_features[['genre']])
# Normalize 'rating' (numerical) using MinMaxScaler
scaler = MinMaxScaler()
ratings_normalized = scaler.fit_transform(selected_features[['rating']])
# Combine encoded genres and normalized ratings into a single feature set
features = pd.DataFrame(genres_encoded, columns=encoder.get_feature_names_out(['genre']))
features['rating'] = ratings_normalized
# Display the first few rows of the processed features
display(features.head())



Unnamed: 0,genre_Action,"genre_Action, Adventure","genre_Action, Adventure, Cars, Comedy, Sci-Fi, Shounen","genre_Action, Adventure, Cars, Mecha, Sci-Fi, Shounen, Sports","genre_Action, Adventure, Cars, Sci-Fi","genre_Action, Adventure, Comedy","genre_Action, Adventure, Comedy, Demons, Drama, Ecchi, Horror, Mystery, Romance, Sci-Fi","genre_Action, Adventure, Comedy, Demons, Fantasy, Magic","genre_Action, Adventure, Comedy, Demons, Fantasy, Magic, Romance, Shounen, Supernatural","genre_Action, Adventure, Comedy, Demons, Fantasy, Martial Arts, Shounen, Super Power",...,"genre_Slice of Life, Space","genre_Slice of Life, Supernatural",genre_Space,genre_Sports,"genre_Super Power, Supernatural, Vampire",genre_Supernatural,genre_Thriller,genre_Vampire,genre_Yaoi,rating
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.92437
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.911164
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.909964
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.90036
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.89916


**Recommendation System:**

 - Design a function to recommend anime based on cosine similarity.
 - Given a target anime, recommend a list of similar anime based on cosine similarity scores.
 - Experiment with different threshold values for similarity scores to adjust the recommendation list size.
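To see how the threshold controls the list size, it can help to count how many items pass at several cutoffs. The following is a minimal sketch on a made-up feature matrix (`toy_features`, `toy_sim`, and `count_recommendations` are hypothetical names used only for illustration); the same loop could be run against the real similarity matrix:

```python
import numpy as np

# Toy feature matrix: 4 items x 3 features (hypothetical values)
toy_features = np.array([
    [1.0, 0.0, 0.9],
    [1.0, 0.0, 0.8],
    [0.0, 1.0, 0.7],
    [1.0, 1.0, 0.1],
])

# Cosine similarity: scale rows to unit length, then take dot products
unit = toy_features / np.linalg.norm(toy_features, axis=1, keepdims=True)
toy_sim = unit @ unit.T

def count_recommendations(sim, idx, threshold):
    """Count items (excluding idx itself) whose similarity exceeds threshold."""
    scores = np.delete(sim[idx], idx)
    return int((scores > threshold).sum())

# A higher threshold can only keep the same number of items or fewer
for threshold in (0.3, 0.5, 0.9):
    print(threshold, count_recommendations(toy_sim, 0, threshold))
```

Raising the threshold never lengthens the list, so a sweep like this gives a quick way to pick a cutoff that returns a usable number of recommendations.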

In [7]:
from sklearn.metrics.pairwise import cosine_similarity
# Ensure there are no NaN values in the features DataFrame
features.fillna(0, inplace=True)  # Fill NaN values with 0
# Compute cosine similarity matrix
cosine_sim = cosine_similarity(features)
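# Function to recommend anime based on cosine similarity
def recommend_anime(target_anime, df, cosine_sim, threshold=0.5):
    # Use the row position, not the index label: dropna() leaves gaps in
    # df.index, while cosine_sim is indexed by position
    matches = (df['name'] == target_anime).to_numpy().nonzero()[0]
    if len(matches) == 0:
        raise ValueError(f"Anime not found: {target_anime!r}")
    idx = matches[0]
    # Keep (position, score) pairs above the threshold, excluding the target itself
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx])
                  if score > threshold and i != idx]
    # Sort by similarity, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    anime_indices = [i for i, _ in sim_scores]
    return df['name'].iloc[anime_indices]
# Recommend anime similar to "Naruto"
recommended_anime = recommend_anime("Naruto", df, cosine_sim, threshold=0.6)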
print(recommended_anime)

615                                    Naruto: Shippuuden
1103    Boruto: Naruto the Movie - Naruto ga Hokage ni...
486                              Boruto: Naruto the Movie
1343                                          Naruto x UT
1472          Naruto: Shippuuden Movie 4 - The Lost Tower
1573    Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...
2458                 Naruto Shippuuden: Sunny Side Battle
2997    Naruto Soyokazeden Movie: Naruto to Mashin to ...
Name: name, dtype: object


**Evaluation:**

 - Split the dataset into training and testing sets.
 - Evaluate the recommendation system using appropriate metrics such as precision, recall, and F1-score.
 - Analyze the performance of the recommendation system and identify areas of improvement.
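Precision, recall, and F1 need a ground-truth set of relevant titles for each query. The columns loaded from anime.csv describe items only, so those labels would have to come from somewhere else, for example held-out user ratings (an assumption, since no such file is used in this notebook). The sketch below shows only the metric computation for one recommendation list; `precision_recall_f1` and the title names are hypothetical:

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall, and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of recommended items that are relevant
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of relevant items that were recommended
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of precision and recall (0 when there are no hits)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Made-up example: titles returned by the recommender vs. titles that a
# held-out user actually liked
recommended = ["A", "B", "C", "D"]
relevant = ["B", "D", "E"]
print(precision_recall_f1(recommended, relevant))
```

Averaging these three numbers over many target titles or users would give an overall score for a given threshold.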

In [11]:
# Encode 'genre' and normalize 'rating'
encoder = OneHotEncoder(sparse_output=False)
features = encoder.fit_transform(df[['genre']])
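# Name the columns after the encoded categories so every column label is a string
features = pd.DataFrame(features, columns=encoder.get_feature_names_out(['genre']))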
features['rating'] = MinMaxScaler().fit_transform(df[['rating']])
# Fill NaN values with 0
features.fillna(0, inplace=True)
# Compute cosine similarity on the entire dataset
cosine_sim = cosine_similarity(features)
print(cosine_sim)

[[1.         0.45717461 0.45684521 ... 0.24407989 0.25066034 0.28110986]
 [0.45717461 1.         0.45329096 ... 0.24218096 0.24871021 0.27892283]
 [0.45684521 0.45329096 1.         ... 0.24200646 0.24853101 0.27872187]
 ...
 [0.24407989 0.24218096 0.24200646 ... 1.         0.99994581 0.99824985]
 [0.25066034 0.24871021 0.24853101 ... 0.99994581 1.         0.99881138]
 [0.28110986 0.27892283 0.27872187 ... 0.99824985 0.99881138 1.        ]]


**1. Can you explain the difference between user-based and item-based collaborative filtering?**

**ANSWER:**

User-based filtering relies on similarities between users, while item-based filtering relies on similarities between items.

User-based collaborative filtering recommends items to a user based on the preferences of other users who have similar tastes. It looks for users who have rated items similarly and suggests items that these similar users liked but the current user hasn't yet experienced.

Item-based collaborative filtering, on the other hand, focuses on the similarity between items. It recommends items that are similar to those the user has already liked or interacted with. For example, if a user liked a particular movie, the system suggests other movies that are often rated similarly by others who liked the same movie.
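The two approaches differ mainly in which axis of the user-item rating matrix is compared. As a rough sketch, assuming a tiny made-up rating matrix where 0 means "not rated" (`ratings` and `cosine_matrix` are hypothetical names, and treating unrated entries as 0 is a simplification):

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items; 0 = not rated).
# Values are made up for illustration.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T

user_sim = cosine_matrix(ratings)    # user-based: compare rows (users)
item_sim = cosine_matrix(ratings.T)  # item-based: compare columns (items)

print(user_sim.shape)  # one row and column per user
print(item_sim.shape)  # one row and column per item
```

Here users 0 and 1 rate items alike, so they come out as close neighbours in `user_sim`, while items 0 and 1 are liked by the same users, so they come out as close neighbours in `item_sim`.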

**2. What is collaborative filtering, and how does it work?**

**ANSWER:**

Collaborative filtering is a technique used in recommendation systems to suggest items (like movies, products, or books) based on the preferences and behaviors of other users. The system identifies users or items with similar patterns and uses this information to make personalized recommendations. For example, if two users have similar tastes in movies, collaborative filtering will recommend movies that one user liked to the other user.
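One common way to turn this idea into a number is to predict a user's rating for an unseen item as a similarity-weighted average of the ratings that other users gave it. The following is a minimal sketch under that assumption, using a made-up rating matrix where 0 means "not rated"; `predict_rating` is a hypothetical helper, not part of any library:

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items; 0 = not rated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between the target user and every user
    sims = ratings @ ratings[user] / (norms * norms[user])
    # Only neighbours who actually rated the item contribute
    mask = ratings[:, item] > 0
    mask[user] = False
    total = sims[mask].sum()
    if total == 0:
        return 0.0  # no usable neighbours; a real system would need a fallback
    return float(sims[mask] @ ratings[mask, item] / total)

print(predict_rating(ratings, user=0, item=2))
```

User 0's prediction for item 2 is pulled towards user 1's low rating because user 1 is the more similar neighbour.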






