NAME: **SULEGAMA JHANSI**

COURSE: **DATA SCIENCE (3PM TO 5PM)**

INCHARGE: **PANAM SRAVANI**


# Recommendation System

In [69]:
import pandas as pd

# Load the dataset
anime_df = pd.read_csv("anime.csv")  # Replace with the correct file path

# Display basic info
print(anime_df.info())

# View sample data
print(anime_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
None
   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Advent

In [70]:
# Drop rows with missing titles or genres; .copy() makes an independent
# DataFrame so the assignments below do not raise SettingWithCopyWarning
anime_df = anime_df.dropna(subset=['name', 'genre']).copy()

# Convert episodes to integers, treating non-numeric entries such as 'Unknown' as 0
anime_df['episodes'] = pd.to_numeric(anime_df['episodes'], errors='coerce').fillna(0).astype(int)

# Fill missing ratings with the mean rating
anime_df['rating'] = anime_df['rating'].fillna(anime_df['rating'].mean())


In [71]:
# Split genre into multiple binary features
anime_genres = anime_df['genre'].str.get_dummies(sep=', ')

# Combine with main dataframe
anime_features = pd.concat([anime_df[['name', 'type', 'episodes', 'rating', 'members']], anime_genres], axis=1)

# Optional: normalize numerical features
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
anime_features[['episodes', 'rating', 'members']] = scaler.fit_transform(anime_features[['episodes', 'rating', 'members']])



In [73]:
from sklearn.metrics.pairwise import cosine_similarity # Import the cosine_similarity function

In [74]:
# Cosine similarity needs numeric input, so keep only the numeric columns
# (scaled episodes, rating, members, and the genre indicators)
numerical_features = anime_features.select_dtypes(include=['number'])

# Pairwise cosine similarity between all anime
cosine_sim = cosine_similarity(numerical_features)

# Map each anime name to its row position; keep the first row when a name repeats
anime_df = anime_df.reset_index(drop=True)
indices = pd.Series(anime_df.index, index=anime_df['name'])
indices = indices[~indices.index.duplicated(keep='first')]

In [76]:
# Recommendation function: return the top_n anime most similar to the given title
def recommend_anime(title, top_n=10, similarity_threshold=0.5):
    if title not in indices:
        return f"Anime '{title}' not found in the dataset."

    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Filter by threshold and remove itself
    sim_scores = [score for score in sim_scores if score[0] != idx and score[1] >= similarity_threshold]
    top_similar = sim_scores[:top_n]

    anime_indices = [i[0] for i in top_similar]
    return anime_df[['name', 'genre', 'rating']].iloc[anime_indices]

# Example usage
title_input = "Naruto"  # Change this to test other anime
recommendations = recommend_anime(title_input)
print(f"\nTop recommendations similar to '{title_input}':\n")
print(recommendations)


Top recommendations similar to 'Naruto':

                                                   name  \
615                                  Naruto: Shippuuden   
1472        Naruto: Shippuuden Movie 4 - The Lost Tower   
1573  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...   
486                            Boruto: Naruto the Movie   
1343                                        Naruto x UT   
2996  Naruto Soyokazeden Movie: Naruto to Mashin to ...   
1103  Boruto: Naruto the Movie - Naruto ga Hokage ni...   
2458               Naruto Shippuuden: Sunny Side Battle   
175                              Katekyo Hitman Reborn!   
7617                            Kyutai Panic Adventure!   

                                                  genre  rating  
615   Action, Comedy, Martial Arts, Shounen, Super P...    7.94  
1472  Action, Comedy, Martial Arts, Shounen, Super P...    7.53  
1573  Action, Comedy, Martial Arts, Shounen, Super P...    7.50  
486   Action, Comedy, Martial Arts, Shounen

**1. Can you explain the difference between user-based and item-based collaborative filtering?**

User-based collaborative filtering:

Recommends items that similar users liked.

For example, if User A and User B have liked the same movies, and User A also liked a movie that User B has not seen, that movie is recommended to User B.

Item-based collaborative filtering:

Recommends items that are similar to ones the user already liked, where two items count as similar when the same users tend to rate them alike.

For example, if the same users like both Movie X and Movie Y, a user who liked Movie X will probably like Movie Y too.
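
The difference can be illustrated with a small sketch. The rating matrix below is made up for illustration and is not taken from the anime dataset. Treating unrated entries as 0 is a simplifying assumption. User-based filtering compares the rows of the matrix (users), while item-based filtering compares its columns (items).

```python
import numpy as np

# Hypothetical ratings: rows are Users A, B, C; columns are Movies X, Y, Z.
# 0 stands for "not rated" (a simplification for this sketch).
ratings = np.array([
    [5.0, 4.0, 0.0],   # User A
    [5.0, 4.0, 1.0],   # User B
    [1.0, 0.0, 5.0],   # User C
])

def cosine_sim_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms
    return unit @ unit.T

# User-based: similarity between users (rows)
user_sim = cosine_sim_matrix(ratings)

# Item-based: similarity between items (columns), obtained by transposing
item_sim = cosine_sim_matrix(ratings.T)

print("User similarity:\n", np.round(user_sim, 2))
print("Item similarity:\n", np.round(item_sim, 2))
```

In this toy data, User A is far more similar to User B than to User C, and Movie X is far more similar to Movie Y than to Movie Z, which matches the two examples above.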




**2. What is collaborative filtering, and how does it work?**

Collaborative filtering is a recommendation technique that suggests items to a user based on the preferences of other users, rather than on the attributes of the items themselves.

How it works:

It collects user behavior (ratings, purchases, likes).

It finds users or items with similar behavior patterns.

It predicts what a user might like from what those similar users or items indicate.
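
The three steps can be sketched on a toy example. This is a minimal user-based version under stated assumptions: the rating matrix is invented, 0 marks an unrated item, and `predict_rating` is a hypothetical helper written for this illustration, not part of any library. The prediction is the similarity-weighted average of the ratings other users gave to the item.

```python
import numpy as np

# Step 1: observed behavior as a user-by-item rating matrix (made-up data).
# 0 stands for "not rated"; user 0 has not rated item 2.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # target user
    [5.0, 5.0, 4.0, 1.0],   # similar tastes to the target user
    [1.0, 2.0, 1.0, 5.0],   # different tastes
])

def predict_rating(ratings, user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    target = ratings[user]
    weighted_sum, weight_total = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and anyone who has not rated the item
        if other == user or ratings[other, item] == 0:
            continue
        # Step 2: cosine similarity between the two users' rating vectors
        sim = target @ ratings[other] / (
            np.linalg.norm(target) * np.linalg.norm(ratings[other])
        )
        weighted_sum += sim * ratings[other, item]
        weight_total += abs(sim)
    # Step 3: similar users contribute more to the prediction
    return weighted_sum / weight_total if weight_total > 0 else float("nan")

pred = predict_rating(ratings, user=0, item=2)
print(f"Predicted rating for user 0 on item 2: {pred:.2f}")
```

Because the target user resembles the second user more than the third, the predicted rating lands closer to the second user's rating of 4 than to the third user's rating of 1.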

