
In [10]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

In [11]:
anime_df = pd.read_csv('/content/anime.csv')

In [12]:
anime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [13]:
anime_df.describe()

Unnamed: 0,anime_id,rating,members
count,12294.0,12064.0,12294.0
mean,14058.221653,6.473902,18071.34
std,11455.294701,1.026746,54820.68
min,1.0,1.67,5.0
25%,3484.25,5.88,225.0
50%,10260.5,6.57,1550.0
75%,24794.5,7.18,9437.0
max,34527.0,10.0,1013917.0


In [15]:
# Drop rows that lack a name, genre, or rating; these fields are needed for the features below
anime_df.dropna(subset=['name', 'genre', 'rating'], inplace=True)
anime_df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


In [16]:
# info() above reports no nulls in 'episodes', so this fillna is only a safeguard
anime_df['episodes'] = anime_df['episodes'].fillna('0')

**Feature Extraction (Genre + Rating)**

In [17]:
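# Combine genre and rating into a single text feature.
# Caveat: the default TF-IDF tokenizer splits a rating such as '9.37' at the '.' and drops
# one-character pieces, so only '37' survives and the rating is not treated as a number.
anime_df['combined_features'] = anime_df['genre'] + ' ' + anime_df['rating'].astype(str)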


In [19]:
# Vectorize using TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(anime_df['combined_features'])
tfidf_matrix

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 50010 stored elements and shape (12017, 137)>
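
To see what the vectorizer does with the rating part of `combined_features`, the sketch below applies scikit-learn's documented default token pattern with the standard `re` module. The helper `tokenize` and the sample strings are illustrative assumptions, not part of the notebook, and stop-word filtering is left out for brevity.

```python
import re

# Default token_pattern of TfidfVectorizer: tokens of two or more word characters
TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tokenize(text):
    """Lowercase the text and extract tokens the way the default pattern would."""
    return re.findall(TOKEN_PATTERN, text.lower())


# Hypothetical sample in the same "genre + rating" form as combined_features
sample = "Sci-Fi, Thriller 9.17"
tokens = tokenize(sample)
print(tokens)  # ['sci', 'fi', 'thriller', '17']
```

The rating `9.17` contributes only the token `17`, so two titles rated 9.17 and 6.17 would share that token. If the rating should influence similarity numerically, it would need to be handled as a separate numeric feature.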

In [21]:
# Compute cosine similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cosine_sim

array([[1.        , 0.0911411 , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.0911411 , 1.        , 0.13153768, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.13153768, 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.24854074,
        0.26344736],
       [0.        , 0.        , 0.        , ..., 0.24854074, 1.        ,
        0.24826597],
       [0.        , 0.        , 0.        , ..., 0.26344736, 0.24826597,
        1.        ]])
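
For reference, each entry of `cosine_sim` is the cosine of the angle between two TF-IDF row vectors. The symbols $\mathbf{x}_i$ and $\mathbf{x}_j$ (rows $i$ and $j$ of `tfidf_matrix`) are introduced here only for illustration:

$$
\mathrm{sim}(i, j) = \frac{\mathbf{x}_i \cdot \mathbf{x}_j}{\lVert \mathbf{x}_i \rVert \, \lVert \mathbf{x}_j \rVert}
$$

Every diagonal entry is 1 because each vector is compared with itself, and the matrix is symmetric, which matches the output above. With 12017 rows, the dense float64 result holds 12017 × 12017 values, roughly 1.16 GB of memory.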

**Recommendation Function**

In [28]:
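# Reset the index so row positions line up with the rows of cosine_sim
anime_df = anime_df.reset_index(drop=True)
# Map each title to its row position; keep the first row when a title repeats
indices = pd.Series(anime_df.index, index=anime_df['name'])
indices = indices[~indices.index.duplicated(keep='first')]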


In [51]:
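def recommend_anime(title, top_n=10, threshold=0.2):
    """Return up to top_n titles whose similarity to `title` is at least `threshold`.

    Returns an error message string if the title is not in the dataset.
    """
    if title not in indices:
        return "Anime not found in the dataset."
    idx = indices[title]

    # Pair every row position with its similarity to the query title,
    # skipping the query itself and anything below the threshold
    sim_scores = enumerate(cosine_sim[idx])
    filtered_scores = [
        (i, score) for i, score in sim_scores if i != idx and score >= threshold
    ]

    # Highest similarity first, then keep the top N
    sorted_scores = sorted(filtered_scores, key=lambda pair: pair[1], reverse=True)
    top_anime = sorted_scores[:top_n]

    return [anime_df['name'].iloc[i] for i, _ in top_anime]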




In [52]:
print(recommend_anime("Naruto", top_n=5, threshold=0.1))


['Iron Virgin Jun', 'Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono', 'Dragon Ball Super', 'Ikkitousen: Extravaganza Epoch', 'Tenjou Tenge']
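
`recommend_anime` only accepts an exact title, so a small spelling difference produces the "not found" message. One possible way to soften this, sketched here with the standard library's `difflib`, is to suggest close matches first. The helper `suggest_titles` and the short title list are hypothetical; in the notebook the candidates could come from `list(indices.index)`.

```python
import difflib


def suggest_titles(query, titles, n=3, cutoff=0.6):
    """Return up to n known titles that closely resemble the query string."""
    return difflib.get_close_matches(query, titles, n=n, cutoff=cutoff)


# Hypothetical title list used only for this illustration
known_titles = ["Naruto", "Naruto: Shippuuden", "Bleach", "Steins;Gate"]
suggestions = suggest_titles("Naruto Shippuden", known_titles)
print(suggestions)
```

The comparison is case-sensitive, and the `cutoff` of 0.6 is simply `difflib`'s default, not a tuned value.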


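**Difference between user-based and item-based collaborative filtering**

1) User-based filtering finds users whose preferences resemble the target user's and recommends items those users liked.

2) Item-based filtering finds items similar to the ones the target user has already liked, where two items count as similar if the same users tend to rate them alike.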
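
The two views can be illustrated on one small rating table: user-based filtering compares rows, item-based filtering compares columns. The ratings below are made up for illustration, and treating 0 as "not rated" inside a cosine similarity is a simplifying assumption.

```python
import math

# Hypothetical ratings: rows are users, columns are items; 0 means "not rated"
ratings = [
    [5, 4, 0],  # alice
    [5, 5, 1],  # bob
    [1, 0, 5],  # carol
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# User-based view: compare rows (one vector per user)
user_sim_alice_bob = cosine(ratings[0], ratings[1])
user_sim_alice_carol = cosine(ratings[0], ratings[2])

# Item-based view: compare columns (one vector per item)
columns = [list(col) for col in zip(*ratings)]
item_sim_0_1 = cosine(columns[0], columns[1])
item_sim_0_2 = cosine(columns[0], columns[2])

print(f"alice~bob {user_sim_alice_bob:.3f}, alice~carol {user_sim_alice_carol:.3f}")
print(f"item0~item1 {item_sim_0_1:.3f}, item0~item2 {item_sim_0_2:.3f}")
```

Here alice is closer to bob than to carol, and item 0 is closer to item 1 than to item 2, so both views would point alice toward what bob liked and toward items resembling item 0.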



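**What is collaborative filtering?**

It is a method of making recommendations from the recorded preferences of many users, without needing item attributes such as genre.

It assumes that users who agreed in the past will agree again: if user A likes items 1 and 2, and user B likes item 1, then B might also like item 2.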
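
The A and B example can be written as a minimal sketch. The function `recommend` and the scoring rule (count how many liked items two users share) are assumptions chosen for simplicity, not a standard library API.

```python
# Liked items per user, taken from the example in the text
likes = {
    "A": {"item1", "item2"},
    "B": {"item1"},
}


def recommend(user, likes):
    """Suggest items liked by users who share at least one liked item with `user`."""
    seen = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(seen & items)  # how many liked items the two users share
        if overlap == 0:
            continue
        for item in items - seen:  # only items the target user has not liked yet
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("B", likes))  # ['item2']
```

User A gets no suggestion here because B has liked nothing that A has not already liked.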

