
In [None]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


In [None]:
# Load anime dataset
anime_df = pd.read_csv("anime.csv")

# View structure
print(anime_df.head())
print(anime_df.info())


   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  

In [None]:
# Drop rows with missing values in critical columns
anime_df.dropna(subset=["name", "genre", "rating"], inplace=True)

# 'episodes' is stored as text; treat the placeholder 'Unknown' as 0 before casting to int
anime_df['episodes'] = anime_df['episodes'].replace('Unknown', 0).astype(int)

# Reset index
anime_df.reset_index(drop=True, inplace=True)


In [None]:
# Combine text features for TF-IDF (fillna guards against rows with a missing type)
anime_df['combined_features'] = anime_df['genre'] + ' ' + anime_df['type'].fillna('')

# TF-IDF on combined features
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(anime_df['combined_features'])

# Scale rating and episodes to [0, 1] without overwriting the original columns
scaler = MinMaxScaler()
numeric_features = scaler.fit_transform(anime_df[['rating', 'episodes']])

# Concatenate TF-IDF + numeric features
from scipy.sparse import hstack
features_matrix = hstack([tfidf_matrix, numeric_features])


In [None]:
# Compute cosine similarity matrix (dense n x n array, so memory grows quadratically with the number of titles)
cos_sim = cosine_similarity(features_matrix)


In [None]:
def recommend_anime(title, top_n=5, threshold=0.3):
    """Return up to top_n titles whose cosine similarity to `title` is >= threshold.

    Returns a message string instead of a list if the title is unknown.
    """
    if title not in anime_df['name'].values:
        return "Anime not found in dataset."

    # Row position of the first anime with this name (index was reset, so label == position)
    idx = anime_df[anime_df['name'] == title].index[0]

    # Pair every row position with its similarity to the query, most similar first
    sim_scores = sorted(enumerate(cos_sim[idx]), key=lambda x: x[1], reverse=True)

    # Keep scores at or above the threshold, excluding the query itself
    sim_scores = [(i, score) for i, score in sim_scores if score >= threshold and i != idx]

    # Map the top n positions back to titles
    return [anime_df.iloc[i]['name'] for i, _ in sim_scores[:top_n]]


In [None]:
recommend_anime("Naruto", top_n=5, threshold=0.2)


['Naruto: Shippuuden',
 'Naruto x UT',
 'Rekka no Honoo',
 'Dragon Ball Z',
 'Boruto: Naruto the Movie']

In [None]:
# Random split to simulate unseen data. Note that cos_sim was built on the full
# dataset, so the "test" titles are not actually unseen by the recommender.
train, test = train_test_split(anime_df, test_size=0.2, random_state=42)

# Coverage check: count sampled test titles that get at least one recommendation
# at or above the threshold. This does not measure recommendation quality.
hits = 0
for title in test['name'].sample(50):  # 50 random titles; unseeded, so the sample varies per run
    recs = recommend_anime(title, top_n=5, threshold=0.2)
    if isinstance(recs, list) and len(recs) > 0:
        hits += 1

print(f"Precision-like hit rate: {hits}/50 = {hits/50:.2f}")


Precision-like hit rate: 50/50 = 1.00


Collaborative filtering is a recommendation method based on user interactions with items (e.g., ratings).

It finds patterns between users and items to suggest new items.
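
As a rough illustration of the data this method works on, the sketch below builds a user-item matrix from a long-format ratings table. The `ratings` frame, its column names, and its values are made up for this example; they are not part of the `anime.csv` file used above.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, anime, rating).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "name": ["Naruto", "Bleach", "Naruto", "Steins;Gate", "Bleach"],
    "rating": [9, 8, 10, 9, 7],
})

# Rows are users, columns are anime; NaN marks "not rated".
user_item = ratings.pivot_table(index="user_id", columns="name", values="rating")
print(user_item)
```

Both user-based and item-based variants start from a matrix like this one and differ only in whether rows (users) or columns (items) are compared.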



User-based collaborative filtering focuses on finding users who are similar to the target user. The idea is that if two users have similar preferences or behaviors (like similar ratings on shows), then what one user likes, the other might like too. So, the system recommends items that similar users have enjoyed but the target user hasn’t seen yet.

For example, if you and I both liked "Naruto" and "Attack on Titan", and you also liked "One Punch Man" (which I haven’t watched), the system might recommend "One Punch Man" to me.
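
The example above can be sketched in a few lines. This is a minimal illustration on invented ratings, not a production method: the users, the scores, and the helper `recommend_user_based` are all assumptions, and treating 0 as "not watched" is a simplification.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item ratings (0 = not watched). All values are made up.
ratings = pd.DataFrame(
    [[9, 8, 0, 0],    # me
     [9, 9, 10, 0],   # you
     [2, 0, 3, 9]],   # other
    index=["me", "you", "other"],
    columns=["Naruto", "Attack on Titan", "One Punch Man", "K-On!"],
)

def recommend_user_based(user, k=1):
    """Score unseen items by the similarity-weighted ratings of the k most similar users."""
    sim = pd.DataFrame(
        cosine_similarity(ratings), index=ratings.index, columns=ratings.index
    )
    # The k users most similar to the target, excluding the target itself
    neighbours = sim[user].drop(user).nlargest(k)
    # Similarity-weighted average of the neighbours' ratings for every item
    scores = ratings.loc[neighbours.index].T.dot(neighbours) / neighbours.sum()
    # Keep only items the target has not watched and that a neighbour actually rated
    unseen = ratings.loc[user] == 0
    return scores[unseen & (scores > 0)].sort_values(ascending=False)

print(recommend_user_based("me", k=1))
```

With `k=1` the closest user to "me" is "you", so the only suggestion is the show "you" rated and "me" has not watched.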

Item-based collaborative filtering, on the other hand, looks at the relationship between items (like anime shows) rather than users. It identifies items that are similar based on user behavior. So, if many users who watched "Naruto" also watched "Bleach", then the system will recommend "Bleach" to someone who watched "Naruto".

Item-based filtering often scales better on large platforms because the item catalog is usually smaller and changes more slowly than the user base, so item-to-item similarities can be precomputed and reused.
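
A minimal sketch of the item-based idea, assuming a small invented watch matrix (the users, titles, and the helper `similar_items` are hypothetical). Items are compared by who watched them, so the matrix is transposed before computing cosine similarity.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy watch matrix: rows are users, columns are anime, 1 = watched. Made-up data.
watched = pd.DataFrame(
    [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]],
    index=["u1", "u2", "u3", "u4"],
    columns=["Naruto", "Bleach", "Steins;Gate"],
)

# Transpose so each row is one anime's vector of viewers, then compare items.
item_sim = pd.DataFrame(
    cosine_similarity(watched.T), index=watched.columns, columns=watched.columns
)

def similar_items(title, top_n=2):
    """Return the top_n items most similar to `title`, excluding the title itself."""
    return item_sim[title].drop(title).nlargest(top_n)

print(similar_items("Naruto"))
```

Here "Bleach" ranks first for "Naruto" because two of the three "Naruto" viewers also watched it. The `item_sim` table can be computed once and looked up at recommendation time.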