## Recommendation System

A recommendation system, also known as a recommender system, is a type of algorithm that provides personalized suggestions to users based on their preferences, behavior, or other data. These systems are widely used in applications such as e-commerce, streaming services, and social media.

In [210]:
!pip install scikit-learn



In [211]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.metrics.pairwise import cosine_similarity

In [212]:
# Load the dataset
anime = pd.read_csv('/content/anime.csv')
anime


Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama',"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


In [213]:
# Drop the columns that are not used in this analysis
anime.drop(['episodes','members'], axis=1, inplace=True)

In [214]:
anime.isnull().sum()

Unnamed: 0,0
anime_id,0
name,0
genre,62
type,25
rating,230


In [215]:
anime.dropna(inplace=True)

In [216]:
anime.isnull().sum()

Unnamed: 0,0
anime_id,0
name,0
genre,0
type,0
rating,0


In [217]:
len(anime.anime_id.unique())

12017

In [218]:
anime.anime_id.unique()

array([32281,  5114, 28977, ...,  5621,  6133, 26081])

In [219]:
anime['anime_id'].value_counts()

Unnamed: 0_level_0,count
anime_id,Unnamed: 1_level_1
32281,1
7028,1
29995,1
7617,1
16602,1
...,...
1123,1
7229,1
3853,1
6855,1


In [220]:
len(anime.genre.unique())

3229

In [221]:
anime.genre.unique()

array(['Drama, Romance, School, Supernatural',
       'Action, Adventure, Drama, Fantasy, Magic, Military, Shounen',
       'Action, Comedy, Historical, Parody, Samurai, Sci-Fi, Shounen',
       ..., 'Action, Comedy, Hentai, Romance, Supernatural',
       'Hentai, Sports', 'Hentai, Slice of Life'], dtype=object)

In [222]:
anime['genre'].value_counts()

Unnamed: 0_level_0,count
genre,Unnamed: 1_level_1
Hentai,816
Comedy,521
Music,297
Kids,197
"Comedy, Slice of Life",174
...,...
"Adventure, Comedy, Horror, Shounen, Supernatural",1
"Comedy, Harem, Romance, School, Seinen, Slice of Life",1
"Comedy, Ecchi, Sci-Fi, Shounen",1
"Adventure, Shounen, Sports",1


In [223]:
# Number of anime titles at each rating value [Example: 141 titles have a rating of 6.00]
anime['rating'].value_counts()

Unnamed: 0_level_0,count
rating,Unnamed: 1_level_1
6.00,141
7.00,98
6.50,90
6.25,84
5.00,76
...,...
3.47,1
3.71,1
3.87,1
3.91,1


In [224]:
# Most common genre combinations among titles with a rating of exactly 7.00
anime[anime['rating']==7]['genre'].value_counts()

Unnamed: 0_level_0,count
genre,Unnamed: 1_level_1
"Drama, Kids",9
Comedy,5
"Adventure, Fantasy",4
Kids,3
Drama,3
...,...
"Romance, Shoujo, Slice of Life",1
"Action, Mecha, School, Sci-Fi, Seinen, Space",1
"Comedy, Parody, Sci-Fi",1
"Drama, Fantasy, Historical, Kids",1


In [225]:
# Average rating for each distinct genre combination
anime.groupby('genre')['rating'].mean()

Unnamed: 0_level_0,rating
genre,Unnamed: 1_level_1
Action,5.815472
"Action, Adventure",6.187333
"Action, Adventure, Cars, Comedy, Sci-Fi, Shounen",6.865000
"Action, Adventure, Cars, Mecha, Sci-Fi, Shounen, Sports",6.460000
"Action, Adventure, Cars, Sci-Fi",6.860000
...,...
"Super Power, Supernatural, Vampire",4.760000
Supernatural,5.563571
Thriller,6.510000
Vampire,4.240000


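Among the rows displayed above, the genre combination with the highest average rating is Action, Adventure, Cars, Comedy, Sci-Fi, Shounen (6.865). The output is truncated, so this is not necessarily the highest average overall.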

In [226]:
# Pivot: one row per anime_id, one column per full genre string, cell value = rating
anime_pivot = anime.pivot_table(index='anime_id', columns='genre', values='rating')
anime_pivot

genre,Action,"Action, Adventure","Action, Adventure, Cars, Comedy, Sci-Fi, Shounen","Action, Adventure, Cars, Mecha, Sci-Fi, Shounen, Sports","Action, Adventure, Cars, Sci-Fi","Action, Adventure, Comedy","Action, Adventure, Comedy, Demons, Drama, Ecchi, Horror, Mystery, Romance, Sci-Fi","Action, Adventure, Comedy, Demons, Fantasy, Magic","Action, Adventure, Comedy, Demons, Fantasy, Magic, Romance, Shounen, Supernatural","Action, Adventure, Comedy, Demons, Fantasy, Martial Arts, Shounen, Super Power",...,Slice of Life,"Slice of Life, Space","Slice of Life, Supernatural",Space,Sports,"Super Power, Supernatural, Vampire",Supernatural,Thriller,Vampire,Yaoi
anime_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
6,,,,,,,,,,,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34476,,,,,,,,,,,...,,,,,,,,,,
34490,,,,,,,,,,,...,,,,,,,,,,
34503,,,,,,,,,,,...,,,,,,,,,,
34514,,,,,,,,,,,...,,,,,,,,,,


In [227]:
# Replace NaN with 0 so that cosine similarity can be computed
anime_pivot.fillna(0, inplace=True)
anime_pivot

genre,Action,"Action, Adventure","Action, Adventure, Cars, Comedy, Sci-Fi, Shounen","Action, Adventure, Cars, Mecha, Sci-Fi, Shounen, Sports","Action, Adventure, Cars, Sci-Fi","Action, Adventure, Comedy","Action, Adventure, Comedy, Demons, Drama, Ecchi, Horror, Mystery, Romance, Sci-Fi","Action, Adventure, Comedy, Demons, Fantasy, Magic","Action, Adventure, Comedy, Demons, Fantasy, Magic, Romance, Shounen, Supernatural","Action, Adventure, Comedy, Demons, Fantasy, Martial Arts, Shounen, Super Power",...,Slice of Life,"Slice of Life, Space","Slice of Life, Supernatural",Space,Sports,"Super Power, Supernatural, Vampire",Supernatural,Thriller,Vampire,Yaoi
anime_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
34490,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
34503,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
34514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Cosine Similarity

In [228]:
from sklearn.metrics import pairwise_distances

# Cosine similarity = 1 - cosine distance, computed for every pair of anime rows
anime_similarity = 1 - pairwise_distances(anime_pivot.values, metric='cosine')
anime_similarity

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 1.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 1., 0., 1.]])
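
The matrix above contains almost only 0s and 1s. A minimal sketch with made-up values shows why (the three rows and two genre-string columns below are hypothetical, not taken from the dataset, and `cosine_sim_matrix` is an illustrative helper). Each anime has a rating in exactly one genre-string column, so two rows are either parallel (similarity 1) or orthogonal (similarity 0), whatever the rating values are.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms          # assumes no all-zero rows
    return unit @ unit.T

# Hypothetical pivot rows: each column stands for one full genre string.
toy_pivot = np.array([
    [8.82, 0.0],   # anime A, genre string 1
    [7.53, 0.0],   # anime B, same genre string, different rating
    [0.0, 8.06],   # anime C, a different genre string
])

toy_sim = cosine_sim_matrix(toy_pivot)
print(np.round(toy_sim, 2))
```

In this toy case A and B score 1 despite different ratings, and C scores 0 against both, which suggests the rating values themselves play no role in this similarity.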

In [229]:
#storing the results in dataframe
anime_similarity_df = pd.DataFrame(anime_similarity)
anime_similarity_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,12007,12008,12009,12010,12011,12012,12013,12014,12015,12016
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
12013,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0
12014,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0
12015,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [230]:
# Label rows and columns with anime_id; the sorted order matches the pivot table's index
anime_similarity_df.index = sorted(anime['anime_id'].unique())
anime_similarity_df.columns = sorted(anime['anime_id'].unique())
anime_similarity_df

Unnamed: 0,1,5,6,7,8,15,16,17,18,19,...,34412,34447,34453,34464,34475,34476,34490,34503,34514,34519
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
34490,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0
34503,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0
34514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [231]:
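# Zero the diagonal so that no anime is reported as most similar to itself.
# This edits the NumPy array in place; the DataFrame shows the change because it was built on the same array.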
np.fill_diagonal(anime_similarity, 0)
anime_similarity_df

Unnamed: 0,1,5,6,7,8,15,16,17,18,19,...,34412,34447,34453,34464,34475,34476,34490,34503,34514,34519
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
34490,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
34503,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
34514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [232]:
# For each anime_id, the anime_id with the highest similarity score (first 50 rows)
anime_similarity_df.idxmax(axis=1)[0:50]

Unnamed: 0,0
1,4037
5,1
6,2097
7,1
8,613
15,995
16,1
17,30652
18,185
19,1


In [233]:
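# Note: .iloc is positional, so this reads row position 1 and column position 4037, not anime_id 1 vs. 4037.
# Use anime_similarity_df.loc[1, 4037] to look up the pair by anime_id.
anime_similarity_df.iloc[1,4037]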

0.0

In [234]:
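# Note: as above, .iloc reads positions 66 and 253; use .loc[66, 253] to look up the pair by anime_id.
anime_similarity_df.iloc[66,253]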

0.0

In [235]:
# Both titles have exactly the same genre string; their ratings differ (8.82 vs. 7.53)
anime[(anime['anime_id']==1)| (anime['anime_id']==4037)]

Unnamed: 0,anime_id,name,genre,type,rating
22,1,Cowboy Bebop,"Action, Adventure, Comedy, Drama, Sci-Fi, Space",TV,8.82
1465,4037,Cowboy Bebop: Yose Atsume Blues,"Action, Adventure, Comedy, Drama, Sci-Fi, Space",Special,7.53


In [236]:
anime[(anime['anime_id']==66)| (anime['anime_id']==253)]

Unnamed: 0,anime_id,name,genre,type,rating
452,66,Azumanga Daioh,"Comedy, School, Slice of Life",TV,8.06
558,253,Jungle wa Itsumo Hare nochi Guu,"Comedy, School, Slice of Life",TV,7.97


In [237]:
# Function to recommend similar anime based on cosine similarity
def recommend_similar_anime(anime_id, similarity_df, threshold=0.5):
    # Get the similarity scores for the given anime_id
    sim_scores = similarity_df.loc[anime_id]

    # Filter out anime with similarity scores below the threshold
    similar_anime = sim_scores[sim_scores > threshold].sort_values(ascending=False)

    # Return the list of similar anime_ids
    return similar_anime.index.tolist()

# Example: recommend similar anime for anime_id 32739 with a threshold of 0.5
similar_anime_list = recommend_similar_anime(32739, anime_similarity_df, threshold=0.5)
print("Recommended Anime IDs:", similar_anime_list)

Recommended Anime IDs: [2265, 2734, 32385, 31467, 25543, 24053, 23325, 21107, 21077, 18227, 16620, 15879, 15847, 10763, 5051, 4208, 4073, 32927]


In [238]:
def recommend_anime(anime_id, anime_similarity_df, top_n=10):
    # Return an empty list if the anime_id is not in the similarity matrix
    if anime_id not in anime_similarity_df.index:
        return []

    # Similarity of the given anime to every other anime
    sim_scores = anime_similarity_df.loc[anime_id]

    # Exclude the query anime itself from the recommendations
    sim_scores = sim_scores.drop(anime_id, errors='ignore')

    # Take the top_n highest scores; tied scores (including zeros) keep their index order
    recommended_anime = sim_scores.nlargest(top_n)
    return recommended_anime.index.tolist()

# Example: get recommendations for a specific anime_id
anime_id = 1  # Replace with an anime_id from your dataset
recommendations = recommend_anime(anime_id, anime_similarity_df, top_n=10)
print("Recommended Anime IDs:", recommendations)

Recommended Anime IDs: [4037, 5, 6, 7, 8, 15, 16, 17, 18, 19]


# Evaluation function


In [239]:
from sklearn.model_selection import train_test_split

# Split the anime table into training and test sets
train_df, test_df = train_test_split(anime, test_size=0.2, random_state=42)

# Inspect the test split (this dataset has anime_id but no user_id column)
print(test_df.head())


      anime_id                              name  \
6211       556       Koutetsu Tenshi Kurumi Zero   
4674     21599  Fight Ippatsu! Juuden-chan!! OVA   
8530     26209            Examurai Sengoku Recap   
6499      1037    Saint Beast: Seijuu Kourin-hen   
429       1089    Macross: Do You Remember Love?   

                                                  genre     type  rating  
6211                             Drama, Romance, Sci-Fi      OVA    6.21  
4674                              Comedy, Ecchi, Sci-Fi      OVA    6.66  
8530                                    Action, Samurai  Special    5.00  
6499               Action, Fantasy, Magic, Supernatural       TV    6.09  
429   Action, Mecha, Military, Music, Romance, Sci-F...    Movie    8.09  


In [247]:
def recommend_top_n(genre, N=10):
    # Select anime whose genre string contains the given text (literal match, not a regex)
    genre_anime = anime[anime['genre'].str.contains(genre, regex=False)]

    # Sort by rating and return the top N anime_ids
    top_anime_ids = genre_anime.sort_values(by=['rating'], ascending=False)['anime_id'].head(N).tolist()

    return top_anime_ids

# Use the full genre strings that occur in the test set as the evaluation keys
all_genres = set(test_df["genre"].unique())

# Predicted top 10 per genre string, taken from the full anime table (which includes the test rows)
top_n_pred = {genre: recommend_top_n(genre, N=10) for genre in all_genres}

# "True" top 10 per genre string, taken from the test set only
top_n_true = {}
for genre in all_genres:
    genre_anime = test_df[test_df['genre'].str.contains(genre, regex=False, na=False)]
    top_n_true[genre] = genre_anime.sort_values(by=['rating'], ascending=False)['anime_id'].head(10).tolist()


y_true_binary = []
y_pred_binary = []

for genre in all_genres:  # Use all_genres for consistency
    true_items = set(top_n_true.get(genre, [])) # handle missing genres in top_n_true
    pred_items = set(top_n_pred.get(genre, []))  # handle missing genres in top_n_pred

    # Pad both lists with 0 so that each genre string contributes exactly 10 slots
    true_items = list(true_items) + [0] * (10 - len(true_items))
    pred_items = list(pred_items) + [0] * (10 - len(pred_items))

    # Convert to binary lists.
    # y_true_binary is all 1s by construction, because each true item is checked against its own list.
    # y_pred_binary is 1 where a predicted item (or a padding 0) also appears in the padded true list.
    y_true_binary.extend([1 if item in true_items else 0 for item in true_items])
    y_pred_binary.extend([1 if item in true_items else 0 for item in pred_items])

In [248]:
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_true_binary, y_pred_binary)
recall = recall_score(y_true_binary, y_pred_binary)
f1 = f1_score(y_true_binary, y_pred_binary)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")


Precision: 1.0000
Recall: 0.6328
F1-score: 0.7751


## Analysis of Performance
* Precision (1.0000):

Interpretation: A precision of 1.0000 means there are no false positives. In this evaluation that is guaranteed by construction: every entry of `y_true_binary` is 1, so a predicted positive can never count as a false positive. The value therefore does not show that every recommended anime is relevant.

Caveat: Because this score cannot fall below 1.0 under the current setup, it should not be read as evidence that users can trust the recommendations.

* Recall (0.6328):

Interpretation: With `y_true_binary` fixed at 1, the recall of 0.6328 equals the share of predicted slots (including padding) whose item also appears in the test-set top-10 list for the same genre string: 63.28%. The remaining slots hold items that are not in the test-set list.

Cons: The predictions are drawn from the full `anime` table, which also contains the test rows, so this figure is not a held-out estimate.

* F1-Score (0.7751):

Interpretation: The F1-score is the harmonic mean of precision and recall. With precision fixed at 1.0, the value of 0.7751 is determined entirely by the recall figure and adds no independent information.
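
For comparison, precision and recall for top-N lists are often computed from the overlap between the recommended items and the relevant items of each list. The following is a minimal sketch of that idea, not the evaluation used in this notebook; the function name `precision_recall_at_k` and the example IDs are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Set-overlap precision@k and recall@k for one recommendation list."""
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    # A hit is a recommended item that is also in the relevant set.
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 recommended IDs, 3 relevant IDs, 2 in common.
p, r = precision_recall_at_k([101, 102, 103, 104], [102, 104, 999], k=4)
print(f"precision@4 = {p:.2f}, recall@4 = {r:.2f}")
```

Here precision is 2 hits out of 4 recommendations and recall is 2 hits out of 3 relevant items, so the two metrics can move independently.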

## Areas of Improvement
### Improving Recall:

* Diversify Recommendations: To improve recall, consider incorporating a more diverse set of recommendations. This might include items with lower but still significant similarity scores.

* Hybrid Approaches: Combine collaborative filtering with content-based filtering to capture more relevant items that might be missed by one approach alone.

* Adjust Similarity Thresholds: Experiment with different thresholds for cosine similarity to include more potential recommendations.
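
One way to obtain graded similarity scores, so that a threshold has something to act on, is to encode each individual genre as its own binary column. The sketch below assumes a hypothetical four-title catalogue; the IDs, the genre strings, and the helper `similar_ids` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical mini catalogue; IDs and genre strings are illustrative only.
toy = pd.DataFrame({
    "anime_id": [1, 2, 3, 4],
    "genre": ["Action, Sci-Fi, Space", "Action, Sci-Fi", "Comedy, School", "Action, Comedy"],
}).set_index("anime_id")

# One binary column per individual genre instead of one per full genre string.
genre_flags = toy["genre"].str.get_dummies(sep=", ").to_numpy(dtype=float)

# Cosine similarity between the multi-hot rows.
unit = genre_flags / np.linalg.norm(genre_flags, axis=1, keepdims=True)
sim = pd.DataFrame(unit @ unit.T, index=toy.index, columns=toy.index)

def similar_ids(anime_id, threshold):
    """IDs whose similarity to anime_id exceeds the threshold, best first."""
    scores = sim.loc[anime_id].drop(anime_id)
    return scores[scores > threshold].sort_values(ascending=False).index.tolist()

for threshold in (0.8, 0.5, 0.3):
    print(threshold, similar_ids(1, threshold))
```

In this toy data, lowering the threshold from 0.5 to 0.3 adds a title that shares only one genre with the query, which is the kind of trade-off the threshold controls.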

### Balancing Precision and Recall:

* Tuning Parameters: Fine-tune parameters in your recommendation algorithm to find the optimal balance between precision and recall.

* Use Weighted Similarity Scores: Apply weights to different features (e.g., genre, type, rating) based on their importance to improve the overall relevance of recommendations.
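
A weighted combination of feature similarities could look like the sketch below. The weights (0.7 and 0.3), the input matrices, and the function name `weighted_similarity` are hypothetical, and rating closeness is scaled under the assumption of a 10-point rating scale.

```python
import numpy as np

def weighted_similarity(genre_sim, rating_sim, w_genre=0.7, w_rating=0.3):
    """Blend two similarity matrices with hypothetical weights that sum to 1."""
    return w_genre * genre_sim + w_rating * rating_sim

# Hypothetical genre similarities and ratings for three titles.
genre_sim = np.array([[1.0, 0.8, 0.0],
                      [0.8, 1.0, 0.4],
                      [0.0, 0.4, 1.0]])
ratings = np.array([8.8, 7.5, 8.1])

# Rating closeness in [0, 1]: 1 for equal ratings, 0 for a 10-point gap (assumed scale).
rating_sim = 1.0 - np.abs(ratings[:, None] - ratings[None, :]) / 10.0

blended = weighted_similarity(genre_sim, rating_sim)
print(np.round(blended, 3))
```

The weights would need tuning against a proper evaluation; the values here only illustrate the mechanics.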

### User Feedback:

* Incorporate User Feedback: Use explicit feedback (ratings, likes) and implicit feedback (clicks, views) to continuously improve the recommendation system.

* Active Learning: Implement an active learning approach where the system queries users about specific recommendations to refine the model.

### Explore Additional Features:

* Temporal Data: Consider the time dimension by incorporating recent user activity to reflect changing preferences.

* Popularity: Integrate popularity trends or social signals (e.g., recommendations from friends or other users with similar tastes).


# Interview Questions

## 1. Difference Between User-Based and Item-Based Collaborative Filtering
### User-Based Collaborative Filtering:

* Concept: Recommends items by finding users who have similar tastes to the target user and recommending items that those similar users have liked.

* Method: Compares users based on their ratings or interactions with items to find similarities (often using metrics like cosine similarity, Pearson correlation, etc.).

* Example: If User A and User B have rated many items similarly, items that User B has liked but User A has not yet seen can be recommended to User A.

### Item-Based Collaborative Filtering:

* Concept: Recommends items by finding items that are similar to those that the target user has already liked or interacted with.

* Method: Compares items based on user ratings or interactions to find similarities between items.

* Example: If Item X and Item Y are often liked by the same users, and the target user liked Item X, Item Y can be recommended to the target user.
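
The difference between the two approaches can be sketched on a small user-item matrix: user-based filtering compares rows, item-based filtering compares columns. The ratings below are hypothetical, and treating an unrated item as 0 is a simplifying assumption made only for this illustration.

```python
import numpy as np

# Hypothetical user-item ratings (rows: 4 users, columns: items X, Y, Z); 0 = not rated.
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
], dtype=float)

def cosine_rows(M):
    """Pairwise cosine similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T

user_sim = cosine_rows(ratings)      # user-based: compare users (rows)
item_sim = cosine_rows(ratings.T)    # item-based: compare items (columns)

print("user-user similarity:\n", np.round(user_sim, 2))
print("item-item similarity:\n", np.round(item_sim, 2))
```

In this toy matrix, items X and Y are rated alike by the same users, so their item-item similarity is much higher than that of X and Z.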

## 2. What is collaborative filtering, and how does it work?
* Definition: Collaborative filtering (CF) is a recommendation technique that suggests items based on past user interactions without requiring explicit content information (such as genre or item features). It works by identifying patterns in user-item interactions (like ratings, purchases, or clicks).

Collaborative filtering is like word-of-mouth recommendations but powered by data. It works by analyzing what people like and using that information to make suggestions.

### Step-by-Step Explanation
### Step 1: Collect User Preferences
* Imagine you and your friends are rating anime.
* You haven't watched One Piece yet, so your rating for it is missing (?).
* Alice has watched One Piece and rated it 3.



### Step 2: Find Similar Users or Items
* The system sees that you and Alice have similar taste (you both rated Attack on Titan and Death Note highly).

* Since Alice gave One Piece a 3, it might predict that you’ll rate One Piece around 3 too.

### Step 3: Make Predictions and Recommendations
* Because Alice has rated One Piece and her taste is similar to yours, the system predicts your rating from hers and recommends One Piece to you if that prediction is high enough.
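
The three steps can be sketched as a small user-based prediction. Everything below is hypothetical: the rating values (apart from Alice's 3 for One Piece), the extra user Bob, the assumed 1-5 rating scale with midpoint 3, and the helper names `similarity` and `predict`.

```python
import numpy as np

MIDPOINT = 3.0  # assumption: ratings are on a 1-5 scale
titles = ["Attack on Titan", "Death Note", "One Piece"]

# Step 1: collect preferences; np.nan marks a title the user has not rated.
ratings = {
    "You":   np.array([5.0, 5.0, np.nan]),
    "Alice": np.array([5.0, 4.0, 3.0]),
    "Bob":   np.array([1.0, 2.0, 5.0]),
}

def similarity(a, b):
    """Step 2: cosine similarity of midpoint-centered ratings on shared titles."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    x, y = a[shared] - MIDPOINT, b[shared] - MIDPOINT
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict(target, title):
    """Step 3: similarity-weighted average over neighbours with positive similarity."""
    idx = titles.index(title)
    num = den = 0.0
    for name, r in ratings.items():
        if name == target or np.isnan(r[idx]):
            continue
        w = similarity(ratings[target], r)
        if w > 0:
            num += w * r[idx]
            den += w
    return num / den if den else float("nan")

prediction = predict("You", "One Piece")
print(f"Predicted rating for One Piece: {prediction:.1f}")
```

In this toy setup only Alice has a positive similarity to you, so the prediction follows her rating; Bob's opposite taste is ignored.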

