In [8]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
df = pd.read_csv('/content/anime.csv')

In [9]:
# Data Preprocessing
# Handle missing values
df = df.dropna(subset=['name', 'genre', 'rating']).copy()  # .copy() gives an independent frame for later column assignments


In [11]:
# Convert 'episodes' to numeric, forcing errors to NaN (e.g., 'Unknown' will be converted to NaN)
df['num_episodes'] = pd.to_numeric(df['episodes'], errors='coerce')
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,num_episodes
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,1.0
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665,64.0
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262,51.0
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572,24.0
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266,51.0
...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211,1.0
12290,5543,Under World,Hentai,OVA,1,4.28,183,1.0
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219,4.0
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175,1.0


In [12]:

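# Drop rows where 'num_episodes' is NaN; .copy() makes df an independent frame
# so later column assignments do not raise SettingWithCopyWarning
df = df.dropna(subset=['num_episodes']).copy()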
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,num_episodes
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,1.0
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665,64.0
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262,51.0
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572,24.0
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266,51.0
...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211,1.0
12290,5543,Under World,Hentai,OVA,1,4.28,183,1.0
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219,4.0
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175,1.0


In [13]:
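# Encode the genre column as integers. LabelEncoder assigns one code per distinct
# genre string (the whole comma-separated list), in sorted order, so numerically
# close codes do not reliably indicate similar genres.
le = LabelEncoder()
df['encoded_genre'] = le.fit_transform(df['genre'])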
df


Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,num_episodes,encoded_genre
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,1.0,2651
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665,64.0,159
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262,51.0,526
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572,24.0,3193
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266,51.0,526
...,...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211,1.0,2865
12290,5543,Under World,Hentai,OVA,1,4.28,183,1.0,2865
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219,4.0,2865
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175,1.0,2865


In [14]:
# Standardize numerical features (rating, number of episodes) to z-scores: zero mean, unit variance
df['rating_normalized'] = (df['rating'] - df['rating'].mean()) / df['rating'].std()
df['episodes_normalized'] = (df['num_episodes'] - df['num_episodes'].mean()) / df['num_episodes'].std()
df


Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,num_episodes,encoded_genre,rating_normalized,episodes_normalized
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,1.0,2651,2.831181,-0.243894
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665,64.0,159,2.723248,1.093767
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262,51.0,526,2.713436,0.817741
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572,24.0,3193,2.634939,0.244458
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266,51.0,526,2.625127,0.817741
...,...,...,...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211,1.0,2865,-2.290747,-0.243894
12290,5543,Under World,Hentai,OVA,1,4.28,183,1.0,2865,-2.163189,-0.243894
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219,4.0,2865,-1.574462,-0.180196
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175,1.0,2865,-1.476340,-0.243894


In [15]:
# Feature extraction: Select features for similarity
features = df[['encoded_genre', 'rating_normalized', 'episodes_normalized']]
features

Unnamed: 0,encoded_genre,rating_normalized,episodes_normalized
0,2651,2.831181,-0.243894
1,159,2.723248,1.093767
2,526,2.713436,0.817741
3,3193,2.634939,0.244458
4,526,2.625127,0.817741
...,...,...,...
12289,2865,-2.290747,-0.243894
12290,2865,-2.163189,-0.243894
12291,2865,-1.574462,-0.180196
12292,2865,-1.476340,-0.243894


In [16]:

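# Compute pairwise cosine similarity. cosine_sim is an n x n array indexed by
# row position in features, not by the DataFrame's index labels.
# Note: encoded_genre is on a much larger scale than the standardized columns,
# so it tends to dominate the score.
cosine_sim = cosine_similarity(features)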

In [17]:

# Recommendation Function
def recommend_anime(target_anime, top_n=5):
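    # cosine_sim is indexed by row position, but df keeps its original labels
    # after dropna, so convert the matching label to a position first
    matches = df.index[df['name'] == target_anime]
    if len(matches) == 0:
        raise ValueError(f"Anime not found: {target_anime}")
    idx = df.index.get_loc(matches[0])

    # Get similarity scores for the target anime
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the animes based on similarity score
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get top N most similar animes, excluding the target itself
    # (with tied scores the target is not guaranteed to sort first)
    sim_scores = [s for s in sim_scores if s[0] != idx][:top_n]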
    anime_indices = [i[0] for i in sim_scores]

    # Return the top N recommended anime titles
    return df['name'].iloc[anime_indices].tolist()

In [24]:

# Test the recommendation system
target_anime = 'Haikyuu!!'  # Example target anime
recommended_animes = recommend_anime(target_anime)
print(f"Recommended animes for {target_anime}: {recommended_animes}")

Recommended animes for Haikyuu!!: ['Hajime no Ippo: Rising', 'Bakuman. 3rd Season', 'Shirobako', 'Hajime no Ippo: New Challenger', 'Igano Kabamaru']
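
The `LabelEncoder` step in the notebook maps each full genre string to a single integer, so two shows that share most of their genre tags can still end up with unrelated codes. One possible alternative, sketched here as an assumption rather than as part of the original notebook, is to treat every genre tag as its own 0/1 feature and compare those multi-hot vectors. The helpers `genre_set` and `genre_cosine` are hypothetical names introduced for illustration; in pandas the same encoding could be built with `Series.str.get_dummies(sep=', ')`.

```python
import math

def genre_set(genre_string):
    """Split a comma-separated genre string into a set of individual tags."""
    return {g.strip() for g in genre_string.split(',') if g.strip()}

def genre_cosine(genres_a, genres_b):
    """Cosine similarity between two multi-hot genre vectors.

    For 0/1 vectors the dot product is the number of shared tags and each
    norm is the square root of the number of tags.
    """
    a, b = genre_set(genres_a), genre_set(genres_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Two shared tags out of four and two: similarity is about 0.707
print(genre_cosine("Drama, Romance, School, Supernatural", "Drama, Romance"))
# No shared tags: similarity is 0.0
print(genre_cosine("Sci-Fi, Thriller", "Drama, Romance"))
```

With this encoding, similarity grows with the number of shared tags instead of depending on how close two arbitrary integer codes happen to be.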




### 1. **Can you explain the difference between user-based and item-based collaborative filtering?**

- **User-based Collaborative Filtering**:
  - This method recommends items to a user based on the preferences of similar users.
  - It looks for users who have similar tastes (e.g., who liked the same items) and recommends items that those similar users liked but the target user hasn't seen yet.
  - **Example**: If User A and User B both liked Anime 1 and Anime 2, and User A also liked Anime 3, then Anime 3 might be recommended to User B.

- **Item-based Collaborative Filtering**:
  - This method recommends items based on how similar they are to other items the user has liked in the past.
  - It analyzes the relationships between items based on user behavior (e.g., if users who liked Anime 1 also liked Anime 2).
  - **Example**: If many users who liked Anime 1 also liked Anime 2, then Anime 2 will be recommended to a user who liked Anime 1.
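
The two approaches can be contrasted on the same toy data. The sketch below is a minimal illustration, assuming cosine similarity and a simple similarity-weighted score; the ratings, the user names, and the functions `cosine`, `user_based`, and `item_based` are all made up for this example.

```python
import math

# Toy ratings: user -> {item: rating}. All names and numbers are made up.
ratings = {
    "A": {"Anime 1": 5, "Anime 2": 4, "Anime 3": 5},
    "B": {"Anime 1": 5, "Anime 2": 4},
    "C": {"Anime 1": 1, "Anime 2": 2, "Anime 4": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def user_based(target, ratings):
    """Rank unseen items by the ratings of users similar to the target."""
    scores = {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        for item, r in their.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

def item_based(target, ratings):
    """Rank unseen items by their similarity to items the target rated."""
    # Transpose to item -> {user: rating} so that items can be compared.
    by_item = {}
    for user, their in ratings.items():
        for item, r in their.items():
            by_item.setdefault(item, {})[user] = r
    scores = {}
    for cand, cand_vec in by_item.items():
        if cand in ratings[target]:
            continue
        scores[cand] = sum(cosine(cand_vec, by_item[seen]) * r
                           for seen, r in ratings[target].items())
    return sorted(scores, key=scores.get, reverse=True)

print(user_based("B", ratings))
print(item_based("B", ratings))
```

Both functions rank Anime 3 first for user B: the user-based version because B's ratings resemble A's, the item-based version because Anime 3 was rated by the same users, in a similar way, as the items B already rated.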

### 2. **What is collaborative filtering, and how does it work?**

- **Collaborative Filtering** is a technique used in recommendation systems that makes predictions based on the preferences and behaviors of multiple users. It relies on past user interactions, such as ratings or purchases, to identify patterns and similarities between users or items.
  
  - **How it works**:
    1. **Data Collection**: Collect user-item interaction data (such as ratings, views, or clicks).
    2. **Similarity Calculation**: Calculate similarity between users (user-based) or items (item-based) based on past interactions.
    3. **Recommendation**: For a given user, recommend items that similar users liked (user-based) or items similar to those the user has already liked (item-based).
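
Steps 2 and 3 can be written compactly. As one common formulation (an assumption here, since the steps above do not fix a particular similarity measure), take the cosine similarity between the rating vectors of users $u$ and $v$, and predict an unseen rating as a similarity-weighted average over the neighbours $N(u)$ who rated item $i$. Here $I_u$ is the set of items rated by $u$, $I_{uv} = I_u \cap I_v$, and $r_{ui}$ is the rating user $u$ gave item $i$:

```latex
\mathrm{sim}(u, v) =
  \frac{\sum_{i \in I_{uv}} r_{ui}\, r_{vi}}
       {\sqrt{\sum_{i \in I_u} r_{ui}^{2}}\;\sqrt{\sum_{i \in I_v} r_{vi}^{2}}},
\qquad
\hat{r}_{ui} =
  \frac{\sum_{v \in N(u)} \mathrm{sim}(u, v)\, r_{vi}}
       {\sum_{v \in N(u)} \lvert \mathrm{sim}(u, v) \rvert}
```

The item-based variant has the same form with the roles of users and items swapped: similarities are computed between item columns, and the average runs over the items the user has already rated.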

Collaborative filtering can be **memory-based**, which computes recommendations directly from the stored user-item interactions (as in the user-based and item-based methods above), or **model-based**, which fits a model to those interactions, for example matrix factorization, and uses it to predict missing ratings.
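
As a rough illustration of the model-based side, the sketch below fits a small matrix factorization with stochastic gradient descent. This is only one of many model-based techniques and is an assumption of this example; the function names `factorize`, `predict`, and `rmse`, the hyperparameters, and the toy ratings are all hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=1000, seed=0):
    """Learn k-dimensional user and item factors from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
            / len(ratings)) ** 0.5

# Observed (user, item, rating) triples; user 1 has not rated item 2.
observed = [(0, 0, 5), (0, 1, 4), (0, 2, 5), (1, 0, 5), (1, 1, 4),
            (2, 0, 1), (2, 1, 2), (2, 3, 5)]
P, Q = factorize(observed, n_users=3, n_items=4)
print("training RMSE:", rmse(P, Q, observed))
print("predicted rating for user 1, item 2:", predict(P, Q, 1, 2))
```

Unlike the memory-based methods, the raw interactions are only needed during training; afterwards a prediction is just a dot product of two learned vectors.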