# Anime Recommender - Content-Based Filtering

This project builds an anime recommender using content-based filtering, a method that recommends items based on the similarity of the items' own attributes rather than on the preferences of users.

The process of content-based filtering involves the following steps:

1. **Feature Extraction:** Extracting relevant features or attributes from the anime data, such as genre, theme, plot summary, and user ratings.

2. **Representing Features Mathematically:** Converting features such as text into numerical vectors so that the similarity between items can be computed.

3. **Similarity Calculation:** Calculating the similarity between the user's preferred anime and every other anime in the dataset based on their features. This is typically done with a measure such as cosine similarity or Euclidean distance.

4. **Recommendation Generation:** Selecting the anime that are most similar to the user's preferred anime. This rests on the assumption that a user who liked certain anime in the past is likely to enjoy anime with similar features.
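The two similarity measures named in step 3 can disagree, and the difference matters for text. The sketch below uses made-up term-count vectors over a hypothetical two-word vocabulary. Cosine similarity ignores vector length, so a long synopsis and a short one about the same topic still count as identical. Euclidean distance penalises the length difference. Once vectors are L2-normalised, the two measures rank items identically, because $\|u - v\|^2 = 2 - 2\cos(u, v)$. The helper names `cos_similarity` and `euclid_distance` are illustrative only.

```python
import numpy as np

def cos_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (ignores vector length)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclid_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between a and b (sensitive to vector length)."""
    return float(np.linalg.norm(a - b))

# Hypothetical term counts over the vocabulary ["space", "school"]
short_space = np.array([1.0, 0.0])   # short synopsis about space
long_space = np.array([10.0, 0.0])   # long synopsis, same topic, 10x the counts
mixed = np.array([0.9, 0.5])         # short synopsis mixing both topics

# Cosine ranks long_space as closest to short_space; Euclidean ranks mixed as closest
print(cos_similarity(short_space, long_space), cos_similarity(short_space, mixed))
print(euclid_distance(short_space, long_space), euclid_distance(short_space, mixed))

# After L2-normalisation the measures agree: ||u - v||^2 == 2 - 2 * cos(u, v)
u = short_space / np.linalg.norm(short_space)
v = mixed / np.linalg.norm(mixed)
print(np.isclose(euclid_distance(u, v) ** 2, 2 - 2 * cos_similarity(u, v)))  # True
```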

Unlike collaborative filtering, content-based filtering does not rely on user interactions or on similarities between users' preferences. Instead, it focuses solely on the attributes of the items themselves. This makes it suitable when user-item interaction data, such as ratings, is scarce.


## Import required libraries

In [26]:
import re  # used by text_cleaning below

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


## Import cleaned dataset

In [12]:

anime_data = pd.read_csv("datasets/anime_2020_clean.csv")
anime_uid_list = anime_data.uid.unique()


print("anime_data")
print(anime_data.columns)


anime_data
Index(['uid', 'title', 'synopsis', 'genre', 'aired', 'episodes', 'members',
       'popularity', 'score'],
      dtype='object')



## 1. Feature Extraction

For content-based filtering, we'll use **'synopsis'** and **'genre'** as features.  

Before proceeding, let's clean the **'synopsis'** column by removing leftover HTML entities such as `&quot;` and `&#039;`.


In [14]:

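# Replace missing values with empty strings so the string operations below don't fail on NaN
anime_data[['synopsis', 'genre']] = anime_data[['synopsis', 'genre']].fillna('')

def text_cleaning(text):
    # Remove HTML-entity debris left in the scraped text.
    # The specific '&#039;' patterns must run before the generic one, otherwise they never match.
    text = re.sub(r'&quot;', '', text)
    text = re.sub(r'\.hack//', '', text)  # escape '.' so it matches a literal dot
    text = re.sub(r'A&#039;s', '', text)
    text = re.sub(r'I&#039;', 'I\'', text)
    text = re.sub(r'&#039;', '', text)
    text = re.sub(r'&amp;', 'and', text)
    return text

anime_data['synopsis'] = anime_data['synopsis'].apply(text_cleaning)
print("synopsis column cleaned.")
print()

# Select the features only after cleaning, and copy so later changes don't affect anime_data
anime_features = anime_data[['uid', 'title', 'synopsis', 'genre']].copy()

print("Selected Features:")
print(anime_features.head())

# We got this cleaning function from: https://www.kaggle.com/indralin/try-content-based-and-collaborative-filtering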


Selected Features:
   uid                            title  \
0    1                     Cowboy Bebop   
1    5  Cowboy Bebop: Tengoku no Tobira   
2    6                           Trigun   
3    7               Witch Hunter Robin   
4    8                   Bouken Ou Beet   

                                            synopsis  \
0  In the year 2071, humanity has colonized sever...   
1  Another day, another bounty—such is the life o...   
2  Vash the Stampede is the man with a $$60,000,0...   
3  Witches are individuals with special powers li...   
4  It is the dark century and the people are suff...   

                                               genre  
0  ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc...  
1  ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space']  
2  ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...  
3  ['Action', 'Magic', 'Police', 'Supernatural', ...  
4  ['Adventure', 'Fantasy', 'Shounen', 'Supernatu...  
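As an alternative to listing entities one by one, Python's standard-library `html.unescape` decodes every HTML entity in a single pass. This is a swapped-in technique, not the cleaning function used above, and it behaves differently. It keeps quotes and `&` as characters instead of deleting them or spelling out "and", and it does not strip the `.hack//` or `A's` fragments. For TF-IDF those differences are small, because the default tokenizer discards punctuation and "and" is an English stop word anyway. The function name `clean_synopsis` is hypothetical.

```python
import html
import re

def clean_synopsis(text) -> str:
    """Decode HTML entities (&quot;, &#039;, &amp;, ...) and collapse whitespace."""
    if not isinstance(text, str):  # e.g. NaN for a missing synopsis
        return ""
    text = html.unescape(text)  # &quot; -> ", &#039; -> ', &amp; -> &
    return re.sub(r"\s+", " ", text).strip()

print(clean_synopsis("Kuroko&#039;s team wins &amp; the &quot;Generation of Miracles&quot; reacts"))
# Kuroko's team wins & the "Generation of Miracles" reacts
```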



## 2. Representing features mathematically

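We will be using **TF-IDF (Term Frequency-Inverse Document Frequency)**, a technique that quantifies how important a term is to a document. Each term is weighted by how often it appears in the document (TF) and discounted by how many documents contain it (IDF). Words that appear everywhere receive low weights, while distinctive words receive high weights.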

We'll use it to convert text-based features ('synopsis' + 'genre') into numerical vectors.
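To make the weighting concrete, the sketch below recomputes TF-IDF by hand on a tiny hypothetical corpus and checks it against `TfidfVectorizer`. It assumes scikit-learn's defaults (`smooth_idf=True`, `norm='l2'`, raw counts as TF), under which $\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1$. Each row is then scaled to unit length. The notebook's own vectorizer also sets `stop_words='english'`, which only changes the vocabulary, not the formula.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "space bounty hunter crew",
    "space pirate crew adventure",
    "school romance comedy",
]

# Raw term counts (TF); both vectorizers sort their vocabulary alphabetically
cv = CountVectorizer()
counts = cv.fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                  # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1      # smoothed IDF, as in scikit-learn's default
tfidf_manual = counts * idf
tfidf_manual /= np.linalg.norm(tfidf_manual, axis=1, keepdims=True)  # L2-normalise rows

tfidf_sklearn = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(tfidf_manual, tfidf_sklearn))  # True
```

Because "space" appears in two of the three documents, it receives a lower IDF than "school", which appears in only one.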

In [16]:

tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(anime_features['synopsis'] + ' ' + anime_features['genre'])
print("TF-IDF Matrix Shape:", tfidf_matrix.shape)
print()

TF-IDF Matrix Shape: (8094, 34325)
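One caveat about the genre feature: after `read_csv`, the `genre` column holds the *string* form of a Python list. The default tokenizer therefore splits multi-word or hyphenated genres such as `'Sci-Fi'` into `sci` and `fi`. The sketch below is a possible refinement, not part of the original pipeline. It parses the string with `ast.literal_eval` and joins each genre with underscores so that it stays a single token. The helper `genre_tokens` is hypothetical, and using it would mean building the TF-IDF input as `anime_features['synopsis'] + ' ' + anime_features['genre'].apply(genre_tokens)`.

```python
import ast
from sklearn.feature_extraction.text import TfidfVectorizer

def genre_tokens(genre_str) -> str:
    """Turn "['Action', 'Sci-Fi']" into 'Action Sci_Fi' so each genre stays a single token."""
    if not isinstance(genre_str, str) or not genre_str.strip():
        return ""
    genres = ast.literal_eval(genre_str)  # safely parse the list literal stored in the CSV
    return " ".join(g.replace("-", "_").replace(" ", "_") for g in genres)

print(genre_tokens("['Action', 'Sci-Fi', 'Slice of Life']"))  # Action Sci_Fi Slice_of_Life

# Default tokenisation splits 'Sci-Fi' in two; the underscore version survives as one term
raw = TfidfVectorizer().fit(["['Action', 'Sci-Fi']"])
joined = TfidfVectorizer().fit([genre_tokens("['Action', 'Sci-Fi']")])
print(sorted(raw.vocabulary_))     # ['action', 'fi', 'sci']
print(sorted(joined.vocabulary_))  # ['action', 'sci_fi']
```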




## 3. Calculating similarity (model fitting)

We'll calculate the cosine similarity between anime titles based on their TF-IDF vectors.

In [20]:

# Step 3: Similarity Calculation
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
print("Cosine Similarity Matrix Shape:", cosine_sim.shape)


Cosine Similarity Matrix Shape: (8094, 8094)
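Since `TfidfVectorizer` L2-normalises every row by default, the cosine similarity of two TF-IDF vectors equals their plain dot product. scikit-learn's `linear_kernel` therefore gives the same matrix while skipping the redundant normalisation. The sketch below checks this on a tiny hypothetical corpus. It also estimates the memory footprint of the dense 8094 × 8094 float64 matrix built above, which is roughly half a gigabyte.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

docs = [
    "bounty hunters travel through space",
    "pirates hunt treasure across space",
    "students form a school band",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Rows of X already have unit length, so the dot product is the cosine
sim_cos = cosine_similarity(X)
sim_dot = linear_kernel(X)
print(np.allclose(sim_cos, sim_dot))  # True

# Dense float64 similarity matrix for the 8,094 titles in this notebook
n_titles = 8094
print(f"{n_titles * n_titles * 8 / 1e6:.1f} MB")  # 524.1 MB
```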



## 4. Generating recommendations

With the cosine similarity matrix computed, we can look up the row for any title and return the titles with the highest scores in that row.

In [21]:
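def get_recommendations(title, cosine_sim=cosine_sim):
    
    # Get the index of the anime title (raises IndexError if the title is not found).
    # read_csv gives a default RangeIndex, so this label is also the row position in cosine_sim.
    idx = anime_features[anime_features['title'] == title].index[0]
    
    # Pair each anime's position with its similarity to the chosen title
    sim_scores = list(enumerate(cosine_sim[idx]))
    
    # Drop the chosen title itself, then sort by similarity (highest first).
    # Excluding idx explicitly is safer than slicing off position 0, which breaks on ties.
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # Keep the 10 most similar anime
    sim_scores = sim_scores[:10]
    
    # Get the anime positions
    anime_indices = [i[0] for i in sim_scores]
    
    # Return the top 10 most similar anime titles
    return anime_features['title'].iloc[anime_indices]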

In [22]:
recommended_anime = get_recommendations('Cowboy Bebop')
print(recommended_anime)


1                       Cowboy Bebop: Tengoku no Tobira
2850                               Ginga Senpuu Braiger
355                           Seihou Bukyou Outlaw Star
2774                    Cowboy Bebop: Yose Atsume Blues
1207                         Sol Bianca: Taiyou no Fune
1087                      Odin: Koushi Hansen Starlight
90                                    Uchuu no Stellvia
2121      Ginga Tetsudou Monogatari: Eien e no Bunkiten
3289                              Uchuu Kuubo Blue Noah
7387    Ginga Eiyuu Densetsu: Die Neue These - Seiran 2
Name: title, dtype: object
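Sorting a list of all 8,094 `(index, score)` pairs for every query works, but only the top 10 are needed. The sketch below is an optional NumPy variant. It uses `np.argpartition` to find the top-k positions without a full sort, then orders only those k. The function `top_k_similar` is hypothetical. It returns row positions, which could be passed to `anime_features['title'].iloc[...]` in the same way `get_recommendations` does.

```python
import numpy as np

def top_k_similar(sim_row: np.ndarray, idx: int, k: int = 10) -> np.ndarray:
    """Return positions of the k highest scores in sim_row, excluding idx, best first."""
    scores = sim_row.astype(float).copy()
    scores[idx] = -np.inf                      # never recommend the query title itself
    k = min(k, scores.size - 1)
    candidates = np.argpartition(-scores, k - 1)[:k]   # unordered top-k, linear time
    return candidates[np.argsort(-scores[candidates], kind="stable")]  # sort only k items

row = np.array([1.0, 0.2, 0.9, 0.5, 0.7])     # toy similarity row; position 0 is the query
print(top_k_similar(row, idx=0, k=3))          # [2 4 3]
```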



## 5. Interacting with the recommendation system


In [25]:
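def search_anime(keyword, anime_data, max_results=15):
    # Case-insensitive substring match; regex=False treats characters like '(' or '?' literally
    matches = anime_data[anime_data['title'].str.contains(keyword, case=False, regex=False, na=False)]
    
    # Only the titles that are listed can be chosen
    matches = matches.head(max_results)
    
    if len(matches) == 0:
        print("No matching anime found.")
        return None
    
    print("Matching Anime Titles:\n")
    for i, title in enumerate(matches['title'], start=1):
        print(f"{i}. {title}")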
    
    # Ask user to choose an anime
    while True:
        choice = input("Enter the number corresponding to the anime you want to choose (or 'exit' to quit): ")
        
        if choice.lower() == 'exit':
            return None
        
        try:
            choice_idx = int(choice) - 1
            if choice_idx < 0 or choice_idx >= len(matches):
                raise ValueError
            selected_anime = matches.iloc[choice_idx]['title']
            return selected_anime
        except ValueError:
            print("Invalid input. Please enter a valid number.")


keyword = input("Enter keywords or partial title of the anime you're looking for: ")
selected_anime = search_anime(keyword, anime_data)

if selected_anime:
    print(f"Selected Anime: {selected_anime}")
    recommended_anime = get_recommendations(selected_anime)
    print("\nRecommended Anime:")
    print(recommended_anime)


Enter keywords or partial title of the anime you're looking for: hunter
Matching Anime Titles:

1. Witch Hunter Robin
2. Hunter x Hunter
3. Hunter x Hunter: Yorkshin City Kanketsu-hen
4. Hunter x Hunter: Greed Island
5. Hunter x Hunter: Greed Island Final
6. Bakuretsu Hunters
7. Vampire Hunter D (2000)
8. Vampire Hunter D
9. Bio Hunter
10. Mamono Hunter Youko
11. City Hunter
12. City Hunter 2
13. City Hunter 3
14. City Hunter: Ai to Shukumei no Magnum
15. City Hunter: Goodbye My Sweetheart
Enter the number corresponding to the anime you want to choose (or 'exit' to quit): 11
Selected Anime: City Hunter

Recommended Anime:
1226                                        City Hunter 2
1227                                        City Hunter 3
1229                   City Hunter: Goodbye My Sweetheart
248                                           Angel Heart
1230    City Hunter: Kinkyuu Namachuukei!? Kyouakuhan ...
7646             City Hunter Movie: Shinjuku Private Eyes
322                   