In [19]:

from sklearn.preprocessing import MultiLabelBinarizer, LabelEncoder, MinMaxScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.feature_extraction.text import TfidfVectorizer

import pandas as pd
df = pd.read_csv('anime.csv')
df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama',"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [21]:
# Replace missing values: mean for numerical columns, mode for categorical columns
# (assign the result back instead of calling fillna(..., inplace=True) on a column,
# which pandas warns will stop working in 3.0)
df['rating'] = df['rating'].fillna(df['rating'].mean())
df['genre'] = df['genre'].fillna(df['genre'].mode()[0])  # Fill with most frequent genre


In [3]:
# Feature Engineering
def extract_features(df):
    # Combine relevant features for the TF-IDF vectorizer
    df['combined_features'] = df['genre'] + ' ' + df['type'].astype(str) + ' ' + df['episodes'].astype(str)
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(df['combined_features'])
    return tfidf_matrix


tfidf_matrix = extract_features(df)

# Cosine Similarity
def recommend_anime(target_anime, tfidf_matrix, df, top_n=10):
    # Find the row position of the target anime
    # (row i of tfidf_matrix corresponds to row i of df)
    matches = (df['name'] == target_anime).to_numpy().nonzero()[0]
    if len(matches) == 0:
        raise ValueError(f'Anime not found: {target_anime!r}')
    target_pos = matches[0]

    # Calculate cosine similarity between the target anime and every anime in the dataset
    similarity_scores = cosine_similarity(tfidf_matrix[target_pos], tfidf_matrix).ravel()

    # Rank all anime from most to least similar
    ranked_positions = similarity_scores.argsort()[::-1]

    # Recommend the top_n anime, skipping the target itself explicitly
    # (titles with identical features tie at 1.0, so the target may not sort first)
    recommendations = []
    for i in ranked_positions:
        if i == target_pos:
            continue
        recommendations.append({'name': df['name'].iloc[i], 'similarity': similarity_scores[i]})
        if len(recommendations) == top_n:
            break

    return recommendations

# Example Usage
anime_recommendations = recommend_anime('Death Note', tfidf_matrix, df)
print(anime_recommendations)


# Split data into training and testing sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
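
One detail worth checking in `extract_features`: with its default settings, `TfidfVectorizer` lowercases the text and keeps only tokens made of two or more word characters (`token_pattern=r"(?u)\b\w\w+\b"`). The sketch below applies that same pattern with `re` to two of the rows shown in the `df.head()` output, to illustrate how `combined_features` is probably being split. The helper name `default_tokens` is made up for this illustration, and the sketch only mimics the tokenization step, not the TF-IDF weighting.

```python
import re

# Default token pattern documented for TfidfVectorizer (used together with lowercase=True)
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def default_tokens(text):
    """Hypothetical helper: mimic TfidfVectorizer's default tokenization."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())

# combined_features for 'Kimi no Na wa.' (genre + type + episodes)
movie_tokens = default_tokens("Drama, Romance, School, Supernatural Movie 1")

# combined_features for 'Steins;Gate'
tv_tokens = default_tokens("Sci-Fi, Thriller TV 24")

print(movie_tokens)
print(tv_tokens)
```

Under this assumption, a single-digit episode count such as `1` is dropped and a hyphenated genre such as `Sci-Fi` is split into two tokens. If that matters for the recommendations, one option is to pass a custom `token_pattern` or `tokenizer` to `TfidfVectorizer`.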




In [23]:
# Evaluate the recommendation system
def evaluate_recommendations(recommendations, test_df, threshold):
    # recommendations is a list of dictionaries with 'name' and 'similarity' keys.
    # Keep the names of the recommended anime at or above the given threshold.
    recommended_names = {item['name'] for item in recommendations if item['similarity'] >= threshold}
    test_names = set(test_df['name'])

    # "True positives" are anime in both the recommendations and the test set.
    # This is only a proxy: landing in a random split says nothing about relevance.
    true_positives = recommended_names & test_names
    precision = len(true_positives) / len(recommended_names) if recommended_names else 0

    # Recall divides by the size of the whole test set, so it stays close to zero
    # for a short recommendation list. A meaningful recall requires a definition of
    # "relevant" items, which is beyond the scope of this basic code.
    recall = len(true_positives) / len(test_names) if test_names else 0

    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return precision, recall, f1



In [None]:
# Example Evaluation (threshold=0.2)
precision, recall, f1 = evaluate_recommendations(anime_recommendations, test_df, 0.2)
print(f'Precision: {precision}, Recall: {recall}, F1-score: {f1}')
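
The comments in `evaluate_recommendations` note that recall needs a definition of "relevant" items. As a minimal sketch of one possible definition (an assumption, not something this notebook specifies), the code below treats an anime as relevant to the target when the two share at least two genres, then computes precision and recall over the top k recommendations. The toy catalogue, the `min_shared` threshold, and all function names are hypothetical.

```python
def genre_set(genre_string):
    """Split a comma-separated genre string into a set of genre names."""
    return {g.strip() for g in genre_string.split(",") if g.strip()}

def relevant_titles(target, catalogue, min_shared=2):
    """Assumed relevance rule: share at least min_shared genres with the target."""
    target_genres = genre_set(catalogue[target])
    return {
        name
        for name, genres in catalogue.items()
        if name != target and len(target_genres & genre_set(genres)) >= min_shared
    }

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall computed over the first k recommended titles."""
    top_k = recommended[:k]
    hits = sum(1 for name in top_k if name in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy catalogue (made-up titles): title -> genre string, as in the 'genre' column
catalogue = {
    "Show A": "Action, Adventure, Fantasy",
    "Show B": "Action, Fantasy, Magic",
    "Show C": "Comedy, School",
    "Show D": "Adventure, Fantasy, Comedy",
    "Show E": "Drama, Romance",
}

relevant = relevant_titles("Show A", catalogue)
recommended = ["Show B", "Show C", "Show D"]  # ranked output of some recommender
precision, recall = precision_recall_at_k(recommended, relevant, k=2)
print(f"Relevant: {sorted(relevant)}, Precision@2: {precision}, Recall@2: {recall}")
```

Because the relevance rule here uses the same genre information as the TF-IDF features, it would favor this content-based recommender; held-out user ratings would be a more independent signal if they were available.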

1. Difference Between User-Based and Item-Based Collaborative Filtering

Collaborative filtering is a technique used in recommendation systems that relies on user interactions (ratings, views, purchases, etc.) to make predictions.

User-Based Collaborative Filtering (UBCF)

    Concept: Finds users with similar tastes/preferences and recommends items that similar users liked.
    How It Works:
        Identify users who have similar behavior (e.g., watch history, ratings).
        Recommend items that those similar users have interacted with but the target user hasn't seen yet.
    Example:
        User A and User B have watched and rated 5 similar anime highly.
        User A watches Attack on Titan, but User B hasn’t.
        Since their preferences match, Attack on Titan is recommended to User B.
    Pros: Works well when users have a consistent preference pattern.
    Cons: Struggles when a user has few ratings, because there is too little overlap to find similar users (cold start problem).
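
The user-based idea can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: each user is reduced to the set of titles they liked, user similarity is the Jaccard overlap of those sets, and an unseen title is scored by summing the similarity of every user who liked it. The users, the liked sets, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of liked titles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_for(user, likes, top_n=3):
    """Score titles the user has not seen by the similarity of the users who liked them."""
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], other_likes)
        for item in other_likes - likes[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]

# Toy data: user -> titles they rated highly
likes = {
    "User A": {"Naruto", "Bleach", "Death Note", "Attack on Titan"},
    "User B": {"Naruto", "Bleach", "Death Note"},
    "User C": {"Naruto", "Show X", "Show Y"},
}

recommendations_for_b = recommend_for("User B", likes)
print(recommendations_for_b)
```

User B overlaps far more with User A than with User C, so Attack on Titan ranks above the titles that only User C liked.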

Item-Based Collaborative Filtering (IBCF)

    Concept: Finds similar items and recommends them based on past user interactions.
    How It Works:
        Identify items that are often liked together.
        If a user likes one item, recommend similar items.
    Example:
        If many users who watched Naruto also watched Bleach, then Bleach is recommended to Naruto fans.
    Pros: Works better for large-scale datasets because item similarity is more stable than user similarity.
    Cons: Struggles with new items (cold start problem) since it relies on past user interactions.
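
A rough sketch of the item-based variant: each title is represented by the set of users who watched it, and two titles are compared with cosine similarity over those binary "watched" vectors, which reduces to the overlap divided by the square root of the product of the two audience sizes. The watch histories and function names below are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Toy watch histories: user -> titles watched
watched = {
    "u1": {"Naruto", "Bleach"},
    "u2": {"Naruto", "Bleach", "Death Note"},
    "u3": {"Naruto", "Bleach"},
    "u4": {"Death Note", "Show X"},
    "u5": {"Naruto", "Show X"},
}

# Invert the data: title -> set of users who watched it
viewers = defaultdict(set)
for user, items in watched.items():
    for item in items:
        viewers[item].add(user)

def item_similarity(a, b):
    """Cosine similarity between the binary 'watched by' vectors of two titles."""
    overlap = len(viewers[a] & viewers[b])
    return overlap / sqrt(len(viewers[a]) * len(viewers[b]))

def similar_items(item, top_n=3):
    """Titles most often watched by the same users as the given title."""
    scores = [(other, item_similarity(item, other)) for other in viewers if other != item]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]

naruto_neighbors = similar_items("Naruto")
print(naruto_neighbors)
```

Since item-to-item scores depend only on the interaction data, they can be computed once and reused for every user who liked Naruto, which is one reason this variant is often considered easier to scale.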

2. What is Collaborative Filtering and How Does It Work?

Collaborative Filtering (CF) is a recommendation technique that predicts a user's interest from their past interactions and from similarities with other users or items. It does not require explicit item attributes (e.g., genre, type); it learns from user behavior instead.

Steps in Collaborative Filtering:

    Collect User-Item Interaction Data (ratings, clicks, views, purchases).
    Calculate Similarities (between users or items) using metrics like cosine similarity, Pearson correlation, or Euclidean distance.
    Generate Recommendations:
        User-Based CF: Recommend items liked by similar users.
        Item-Based CF: Recommend items similar to those the user has interacted with.
    Present Recommendations (ranked by similarity scores).
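
To make the similarity step concrete, here is a small sketch of the three metrics named above, written in plain Python. The two rating vectors are invented: the second user rates the same four titles exactly 4 points higher than the first. Function names are chosen for this example only.

```python
from math import sqrt

def cosine_sim(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pearson_corr(x, y):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_sim(centered_x, centered_y)

def euclidean_dist(x, y):
    """Straight-line distance; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

strict_user = [2, 4, 3, 5]    # rates on the low end of the scale
generous_user = [6, 8, 7, 9]  # same preferences, shifted up by 4

cos = cosine_sim(strict_user, generous_user)
pearson = pearson_corr(strict_user, generous_user)
distance = euclidean_dist(strict_user, generous_user)
print(f"cosine={cos:.3f}, pearson={pearson:.3f}, euclidean={distance:.1f}")
```

The two users rank the titles identically, which Pearson correlation reports as a perfect match because it removes each user's average rating. Cosine similarity is slightly below 1, and Euclidean distance treats the offset as a large difference, so the choice of metric matters when users use the rating scale differently.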

Types of Collaborative Filtering:

    Memory-Based (traditional method using similarity metrics).
    Model-Based (uses ML algorithms such as matrix factorization, e.g., SVD, or deep learning).
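
As a hedged illustration of the model-based family, the sketch below learns a small matrix factorization with stochastic gradient descent: every user and item gets a short vector of latent factors, and a rating is predicted as their dot product. This is a simplified stand-in, not a true SVD and not a production recommender. The ratings, dimensions, and hyperparameters are arbitrary values picked for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical observed ratings: (user index, item index, rating on a 1-5 scale)
observed = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 2, 4), (3, 3, 5),
]
n_users, n_items, n_factors = 4, 4, 2

# Start from small random latent factors
user_factors = [[random.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
item_factors = [[random.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating: dot product of the user's and the item's factors."""
    return sum(p * q for p, q in zip(user_factors[u], item_factors[i]))

def rmse():
    """Root mean squared error over the observed ratings."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5

initial_rmse = rmse()

learning_rate, regularization = 0.01, 0.02
for epoch in range(500):
    for u, i, r in observed:
        error = r - predict(u, i)
        for f in range(n_factors):
            p, q = user_factors[u][f], item_factors[i][f]
            # Gradient step on squared error with L2 regularization
            user_factors[u][f] += learning_rate * (error * q - regularization * p)
            item_factors[i][f] += learning_rate * (error * p - regularization * q)

final_rmse = rmse()
print(f"RMSE before: {initial_rmse:.3f}, after: {final_rmse:.3f}")
print(f"Predicted rating for user 3 on unseen item 0: {predict(3, 0):.2f}")
```

Unlike the memory-based approach, the learned factors can score any user-item pair, including pairs with no overlapping neighbors, at the cost of a training step and hyperparameters to tune.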

Pros & Cons:

✅ Pros:

    Works well for personalized recommendations.
    Requires no domain knowledge about items.

❌ Cons:

    Struggles with the cold start problem (new users/items).
    Can be computationally expensive for large datasets.