Recommendation System

Data Description:

anime_id: Unique ID of each anime.
name: Anime title.
type: Broadcast type, such as TV, OVA, or Movie.
genre: Comma-separated list of the anime's genres.
episodes: Number of episodes of each anime.
rating: Average rating of each anime (out of 10) across the users who rated it.
members: Number of community members for each anime.

Objective:
The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset. 
Dataset:
Use the Anime Dataset, which contains information about various anime, including their titles, genres, number of episodes, and user ratings.

Tasks:

Data Preprocessing:

Load the dataset into a suitable data structure (e.g., pandas DataFrame).
Handle missing values, if any.
Explore the dataset to understand its structure and attributes.
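The notebook below handles missing values with a plain `dropna()`, which discards any row with a gap in any column. A minimal sketch of a more selective alternative, assuming the column names from the data description and assuming that unknown episode counts are stored as the text `'Unknown'` (an assumption about anime.csv): count the gaps first, fill missing categorical values with a placeholder, and drop only the rows whose rating is missing. The small DataFrame is made-up stand-in data.

```python
import numpy as np
import pandas as pd

# Stand-in for anime.csv with the same column names (values are made up).
anime_data = pd.DataFrame({
    "anime_id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "genre": ["Action, Comedy", np.nan, "Drama", "Comedy"],
    "type": ["TV", "Movie", np.nan, "OVA"],
    "episodes": ["12", "1", "Unknown", "6"],
    "rating": [7.1, np.nan, 6.4, 8.0],
    "members": [1500, 300, 45, 900],
})

# Count missing values per column before deciding how to handle them.
print(anime_data.isna().sum())

# 'episodes' is text here; coerce it to numbers so 'Unknown' becomes NaN.
anime_data["episodes"] = pd.to_numeric(anime_data["episodes"], errors="coerce")

# Keep rows with no genre or type by filling a placeholder category,
# but drop rows that have no rating at all.
anime_data["genre"] = anime_data["genre"].fillna("Unknown")
anime_data["type"] = anime_data["type"].fillna("Unknown")
anime_data = anime_data.dropna(subset=["rating"]).reset_index(drop=True)

print(anime_data)
```

Whether a placeholder genre is better than dropping the row depends on how the recommender should treat such titles; with genre-based similarity, all "Unknown" rows will look alike.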

Feature Extraction:

Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
Convert categorical features into numerical representations if necessary.
Normalize numerical features if required.
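Besides `genre`, the dataset has a categorical `type` column and a numeric `members` column that, judging by the summary statistics later in the notebook (mean about 18,000 against a median of about 1,550), is strongly right-skewed. A hedged sketch of encoding both: one-hot columns for `type`, and a log transform followed by min-max scaling for `members`. The toy values are made up, and whether to include these features at all is a design choice.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy rows with the dataset's 'type' and 'members' columns (values made up).
df = pd.DataFrame({
    "type": ["TV", "Movie", "TV", "OVA"],
    "members": [793665, 200630, 1552, 12],
})

# One-hot encode the categorical 'type' column: one 0/1 column per type.
type_onehot = pd.get_dummies(df["type"], prefix="type", dtype=int)

# Compress the skewed 'members' counts with log1p, then min-max scale to [0, 1].
members_scaled = MinMaxScaler().fit_transform(np.log1p(df[["members"]]))

# Final numeric feature block: one-hot type columns followed by scaled members.
features = np.hstack([type_onehot.to_numpy(), members_scaled])
print(type_onehot.columns.tolist())
print(features.round(3))
```

Without the log step, a handful of very popular titles would squeeze almost every other anime's scaled `members` value close to 0.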

Recommendation System:

Design a function to recommend anime based on cosine similarity.
Given a target anime, recommend a list of similar anime based on cosine similarity scores.
Experiment with different threshold values for similarity scores to adjust the recommendation list size.
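A minimal sketch of such a function, assuming a precomputed feature matrix with one row per anime in the same order as a list of titles. `recommend`, `threshold` and `top_n` are hypothetical names: the threshold filters candidates by cosine similarity score, and `top_n` caps the list length. The toy catalogue uses genre indicators only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MultiLabelBinarizer

def recommend(title, names, features, threshold=0.5, top_n=10):
    """Return up to top_n (name, score) pairs whose cosine similarity to
    `title` is at least `threshold`, most similar first."""
    names = list(names)
    idx = names.index(title)  # raises ValueError if the title is unknown
    # Similarity of the target row to every row, flattened to shape (n_items,).
    scores = cosine_similarity(features[idx:idx + 1], features).ravel()
    order = np.argsort(-scores)  # indices sorted by descending similarity
    results = [
        (names[j], float(scores[j]))
        for j in order
        if j != idx and scores[j] >= threshold  # skip the target itself
    ]
    return results[:top_n]

# Toy catalogue: genre multi-hot vectors only (titles and genres made up).
catalogue = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": [["Action", "Comedy"], ["Action", "Comedy", "Drama"],
              ["Romance"], ["Action"]],
})
X = MultiLabelBinarizer().fit_transform(catalogue["genre"])

for t in (0.0, 0.5, 0.8):
    print(t, recommend("A", catalogue["name"], X, threshold=t))
```

On this toy data, raising the threshold from 0.0 to 0.5 to 0.8 shrinks the list for "A" from three titles to two to one, which is the list-size trade-off the task asks to explore.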

Evaluation:

Split the dataset into training and testing sets.
Evaluate the recommendation system using appropriate metrics such as precision, recall, and F1-score.
Analyze the performance of the recommendation system and identify areas of improvement.
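For a single recommendation list, these metrics reduce to set arithmetic: precision is the share of recommended titles that are relevant, recall is the share of relevant titles that were recommended, and F1 is their harmonic mean. scikit-learn's `precision_score` and related functions expect aligned label vectors, so a small helper such as the hypothetical `precision_recall_f1` below is one way to score a list directly.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # relevant items that were recommended
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# 5 recommendations, 4 relevant titles, 2 in common:
# precision = 2/5 = 0.4, recall = 2/4 = 0.5, F1 = 2*0.4*0.5/0.9 ≈ 0.444
print(precision_recall_f1(["a", "b", "c", "d", "e"], ["b", "d", "x", "y"]))
```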



In [19]:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score



In [2]:


# Load the dataset
anime_data = pd.read_csv('anime.csv')

# Display the first few rows of the dataset and check its structure
print(anime_data.head())
print(anime_data.info())

   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          GintamaÂ°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
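The `name` column in the output above contains an HTML entity (`Gintama&#039;`) and what looks like UTF-8 text mis-decoded as Latin-1 (`GintamaÂ°`). A sketch of a cleanup helper, assuming those are the only two kinds of damage; `clean_name` is a hypothetical name, and the Latin-1 round trip is a heuristic that leaves a name unchanged whenever it fails.

```python
import html

def clean_name(name):
    """Decode HTML entities, then undo UTF-8 text mis-decoded as Latin-1."""
    name = html.unescape(name)  # e.g. '&#039;' -> "'"
    try:
        # 'Â°' is how the UTF-8 bytes of '°' look when read as Latin-1.
        return name.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not mojibake (or not representable in Latin-1): keep as-is.
        return name

print(clean_name("Gintama&#039;"))
print(clean_name("GintamaÂ°"))
```

It could then be applied with `anime_data['name'] = anime_data['name'].map(clean_name)`.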

In [3]:
# Handle missing values
anime_data = anime_data.dropna()

# Explore the dataset
print(anime_data.describe())
print(anime_data['genre'].value_counts())

           anime_id        rating       members
count  12017.000000  12017.000000  1.201700e+04
mean   13638.001165      6.478264  1.834888e+04
std    11231.076675      1.023857  5.537250e+04
min        1.000000      1.670000  1.200000e+01
25%     3391.000000      5.890000  2.250000e+02
50%     9959.000000      6.570000  1.552000e+03
75%    23729.000000      7.180000  9.588000e+03
max    34519.000000     10.000000  1.013917e+06
genre
Hentai                                                   816
Comedy                                                   521
Music                                                    297
Kids                                                     197
Comedy, Slice of Life                                    174
                                                        ... 
Adventure, Comedy, Horror, Shounen, Supernatural           1
Comedy, Harem, Romance, School, Seinen, Slice of Life      1
Comedy, Ecchi, Sci-Fi, Shounen                             1
Adventure, Sh
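`value_counts()` on the raw `genre` strings counts whole genre combinations, so `Comedy` and `Comedy, Slice of Life` are tallied as separate entries. A short sketch for counting individual genres instead, splitting on `', '` as the notebook does later and giving each genre its own row with `Series.explode`; the toy strings are illustrative.

```python
import pandas as pd

# Toy genre strings in the dataset's comma-separated format.
genres = pd.Series(["Comedy", "Comedy, Slice of Life", "Action, Comedy", "Music"])

# Split each string into a list, explode to one genre per row, then count.
genre_counts = genres.str.split(", ").explode().value_counts()
print(genre_counts)
```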

In [16]:


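# NOTE: this replaces the dataset loaded above with a three-row toy DataFrame
# that has only 'genre' and 'rating' columns, to illustrate the transformations.
# Delete these lines to run the same steps on the full anime_data.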
anime_data = pd.DataFrame({
    'genre': ['Action, Adventure', 'Romance, Drama', 'Action, Fantasy'],
    'rating': [7.5, 8.3, 6.9]
})

# Convert genres to a list of genres
anime_data['genre'] = anime_data['genre'].apply(lambda x: x.split(', '))

# Convert genres to numerical representation
mlb = MultiLabelBinarizer()
genre_matrix = mlb.fit_transform(anime_data['genre'])

# Min-max scale 'rating' to the range [0, 1]
anime_data['rating'] = (anime_data['rating'] - anime_data['rating'].min()) / (anime_data['rating'].max() - anime_data['rating'].min())

# Display the results
print("Genre Matrix:")
print(pd.DataFrame(genre_matrix, columns=mlb.classes_))

print("\nNormalized Ratings:")
print(anime_data['rating'])


Genre Matrix:
   Action  Adventure  Drama  Fantasy  Romance
0       1          1      0        0        0
1       0          0      1        0        1
2       1          0      0        1        0

Normalized Ratings:
0    0.428571
1    1.000000
2    0.000000
Name: rating, dtype: float64
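The cells shown import `cosine_similarity` but do not call it. A minimal sketch of that missing step on the same three toy rows: stack the genre indicators and the scaled rating into one feature matrix, then compute all pairwise similarities. Giving the scaled rating the same weight as a single genre flag is an assumption; multiplying that column by a weight is one way to tune its influence.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MultiLabelBinarizer

# Same three toy rows as the cell above, kept separate from anime_data.
toy = pd.DataFrame({
    "genre": [["Action", "Adventure"], ["Romance", "Drama"], ["Action", "Fantasy"]],
    "rating": [7.5, 8.3, 6.9],
})
genre_matrix = MultiLabelBinarizer().fit_transform(toy["genre"])
r = toy["rating"]
rating_scaled = ((r - r.min()) / (r.max() - r.min())).to_numpy()

# One feature row per anime: genre indicators followed by the scaled rating.
features = np.hstack([genre_matrix, rating_scaled.reshape(-1, 1)])

# Pairwise cosine similarity: shape (3, 3), with 1.0 on the diagonal.
sim = cosine_similarity(features)
print(np.round(sim, 3))
```

Here anime 0 (Action, Adventure) comes out closer to anime 2 (Action, Fantasy) than to anime 1, because the shared Action flag outweighs the rating term, while anime 1 and anime 2 share nothing and score 0.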


In [8]:
anime_data

                 genre    rating
0  [Action, Adventure]  0.428571
1     [Romance, Drama]  1.000000
2    [Action, Fantasy]  0.000000


In [18]:
from sklearn.preprocessing import MultiLabelBinarizer

# Check if genres are still strings before splitting
if isinstance(anime_data['genre'].iloc[0], str):
    # Convert genres to a list of genres
    anime_data['genre'] = anime_data['genre'].apply(lambda x: x.split(', '))

# Convert genres to numerical representation
mlb = MultiLabelBinarizer()
genre_matrix = mlb.fit_transform(anime_data['genre'])

# Min-max scale 'rating' to [0, 1] (re-running this on already-scaled values leaves them unchanged)
anime_data['rating'] = (anime_data['rating'] - anime_data['rating'].min()) / (anime_data['rating'].max() - anime_data['rating'].min())

In [20]:
# Split the dataset into training and testing sets
train_data, test_data = train_test_split(anime_data, test_size=0.2, random_state=42)

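# Evaluation logic is not implemented yet. y_true and y_pred below are
# placeholders that are both all ones, so every metric is trivially 1.0
# and says nothing about the quality of the recommendations.
y_true = [1] * len(test_data)  # placeholder for true relevance labels
y_pred = [1] * len(test_data)  # placeholder for predicted relevance labels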

print('Precision:', precision_score(y_true, y_pred, average='macro'))
print('Recall:', recall_score(y_true, y_pred, average='macro'))
print('F1 Score:', f1_score(y_true, y_pred, average='macro'))


Precision: 1.0
Recall: 1.0
F1 Score: 1.0
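The cell above only runs the metric functions on placeholder labels. A minimal sketch of a hold-out evaluation, under loudly stated assumptions: test rows act as queries, train rows are the pool recommendations are drawn from, and because the listed columns contain no per-user interactions, "relevant" is defined by a hypothetical proxy `is_relevant` (Jaccard genre overlap of at least 0.5). Since the similarity is also genre-based, the resulting scores mainly check that the ranking agrees with the proxy; an independent relevance signal would be needed for a meaningful measure.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# Toy catalogue (titles and genres made up for illustration).
catalogue = pd.DataFrame({
    "name": list("ABCDEFGHIJ"),
    "genre": [["Action", "Comedy"], ["Action"], ["Comedy", "Romance"],
              ["Romance", "Drama"], ["Action", "Sci-Fi"], ["Sci-Fi"],
              ["Drama"], ["Comedy"], ["Action", "Drama"], ["Romance"]],
})
genres = catalogue["genre"].tolist()
X = MultiLabelBinarizer().fit_transform(genres)

# Test rows act as queries; train rows are the candidate pool.
train_idx, test_idx = train_test_split(
    np.arange(len(catalogue)), test_size=0.3, random_state=42)

def is_relevant(query_genres, item_genres, min_jaccard=0.5):
    """Hypothetical ground truth: Jaccard genre overlap of at least min_jaccard."""
    q, g = set(query_genres), set(item_genres)
    return len(q & g) / len(q | g) >= min_jaccard

k = 3  # recommendations per query
scores = []
for q in test_idx:
    sims = cosine_similarity(X[q:q + 1], X[train_idx]).ravel()
    top_k = {int(i) for i in train_idx[np.argsort(-sims)[:k]]}
    relevant = {int(i) for i in train_idx if is_relevant(genres[q], genres[i])}
    hits = len(top_k & relevant)
    p = hits / k
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    scores.append((p, r, f))

# Macro-average precision@k, recall@k and F1 over all queries.
precision, recall, f1 = np.mean(scores, axis=0)
print(f"precision@{k}={precision:.3f}  recall@{k}={recall:.3f}  F1={f1:.3f}")
```

Varying `k` here plays the same role as the similarity threshold in the task description: a longer list tends to raise recall at the cost of precision.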
