<center>
    <h1 id='content-based-filtering' style='color:#7159c1; font-size:350%'>Content-Based Filtering</h1>
    <i style='font-size:125%'>Recommendations of Similar Items by Metadata</i>
</center>

> **Topics**

```
- ✨ Create Sequential Texts
- ✨ Word Cloud
- ✨ Lower Case
- ✨ Remove Line Breaks and Special Characters
- ✨ Calculate TF-IDF
- ✨ Calculate Cosine Similarity
- ✨ Create Search Function
```

```python
# ---- Imports ----
import string                               # standard library, no install needed

import matplotlib.pyplot as plt             # pip install matplotlib
import mplcyberpunk                         # pip install mplcyberpunk (also registers the 'cyberpunk' style)
import numpy as np                          # pip install numpy
import pandas as pd                         # pip install pandas
import seaborn as sns                       # pip install seaborn
from sklearn.feature_extraction.text import TfidfVectorizer  # pip install scikit-learn
from sklearn.metrics.pairwise import linear_kernel           # pip install scikit-learn
from wordcloud import WordCloud             # pip install wordcloud

# ---- Constants ----
DATASETS_PATH = './datasets'
SEED = 20231227

# ---- Settings ----
np.random.seed(SEED)
pd.set_option('display.max_columns', None)
sns.set_style('darkgrid')
plt.style.use('cyberpunk')

# ---- Functions ----
def get_recommendations(dataset, title, animes_indices, cosine_similarity, number_recommendations=10):
    """
    Description:
        - gets the row position of the anime that matches the title;
        - gets the pairwise similarity scores of all animes with the chosen anime;
        - sorts the animes by similarity score in descending order;
        - keeps the top 'number_recommendations' animes, excluding the chosen one;
        - gets their row positions;
        - returns their titles.

    Parameters:
        - dataset: Pandas DataFrame with a 'title' column;
        - title: string;
        - animes_indices: mapping (e.g. a Pandas Series) from each title to its row position;
        - cosine_similarity: NumPy array of floats (pairwise similarity matrix);
        - number_recommendations: integer.

    Returns:
        - Pandas Series with the titles of the recommended animes.
    """
    # Row position of the chosen anime in the similarity matrix
    index = animes_indices[title]

    # Pair every anime's position with its similarity to the chosen anime, most similar first
    similarity_scores = list(enumerate(cosine_similarity[index]))
    similarity_scores = sorted(similarity_scores, key=lambda score: score[1], reverse=True)

    # Skip the first entry (normally the chosen anime itself, similarity 1.0) and keep the next N
    similarity_scores = similarity_scores[1:number_recommendations + 1]

    recommended_animes_indices = [position for position, _ in similarity_scores]
    return dataset.title.iloc[recommended_animes_indices]
```
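`get_recommendations` expects two inputs that are not built in this section: a title-to-position mapping and a pairwise similarity matrix. Below is a minimal sketch of one way to produce them, assuming a hypothetical `sequential_text` column that already holds cleaned text. The five rows reuse the titles from the dataset preview, with texts derived from part of their metadata, so this is toy data rather than the real dataset. `TfidfVectorizer` L2-normalizes each row by default, so the dot products returned by `linear_kernel` are exactly the cosine similarities. The mapping stores row positions instead of `id` labels because the function indexes `cosine_similarity` by position and calls `dataset.title.iloc`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-in for animes_df; 'sequential_text' is a hypothetical, pre-cleaned column
toy_df = pd.DataFrame(
    {
        'title': [
            'cowboy bebop',
            'cowboy bebop tengoku no tobira',
            'trigun',
            'witch hunter robin',
            'bouken ou beet',
        ],
        'sequential_text': [
            'scifi award winning action tv sunrise original',
            'scifi action movie bones original',
            'adventure scifi action tv madhouse manga',
            'supernatural drama mystery action tv sunrise original',
            'adventure supernatural fantasy tv toei animation manga',
        ],
    },
    index=pd.Index([1, 5, 6, 7, 8], name='id'),
)

# TF-IDF rows are L2-normalized by default (norm='l2'),
# so the plain dot products from linear_kernel equal cosine similarities
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(toy_df['sequential_text'])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Map each title to its row position (0..n-1), not to its 'id' label,
# because the similarity matrix and .iloc are both positional
animes_indices = pd.Series(range(len(toy_df)), index=toy_df['title'])

print(cosine_sim.shape)
print(animes_indices['trigun'])
```

With these inputs, `get_recommendations(toy_df, 'cowboy bebop', animes_indices, cosine_sim, number_recommendations=3)` would return the three titles whose texts are most similar to *cowboy bebop*. If the real dataset contains repeated titles, `animes_indices[title]` returns several positions instead of one, so deduplicating titles first may be needed (an assumption about the data, not something shown here).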

<h1 id='0-creating-sequential-texts' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>✨ | Creating Sequential Texts</h1>

```python
# ---- Reading Dataset ----
animes_df = pd.read_csv(f'{DATASETS_PATH}/anime-transformed-dataset-2023.csv', index_col='id')[
    ['title', 'score', 'genres', 'type', 'producers', 'licensors', 'studios', 'source']
]
animes_df.head()
```

| id | title | score | genres | type | producers | licensors | studios | source |
|---|---|---|---|---|---|---|---|---|
| 1 | cowboy bebop | 8.75 | sci-fi, award winning, action | tv | bandai visual | bandai entertainment, funimation | sunrise | original |
| 5 | cowboy bebop tengoku no tobira | 8.38 | sci-fi, action | movie | sunrise, bandai visual | sony pictures entertainment | bones | original |
| 6 | trigun | 8.22 | adventure, sci-fi, action | tv | victor entertainment | geneon entertainment usa, funimation | madhouse | manga |
| 7 | witch hunter robin | 7.25 | supernatural, drama, mystery, action | tv | bandai visual, victor entertainment, tv tokyo ... | bandai entertainment, funimation | sunrise | original |
| 8 | bouken ou beet | 6.94 | adventure, supernatural, fantasy | tv | tv tokyo, dentsu | illumitoon entertainment | toei animation | manga |
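The preview stops at the raw metadata columns. Below is a minimal sketch of what creating a sequential text could look like, assuming the goal is one cleaned string per anime that joins its metadata columns. The helper `build_sequential_text` and the choice of columns are hypothetical, not the notebook's own code. It covers the lower-casing and the removal of line breaks and special characters listed in the topics, using only the standard `string` module.

```python
import string

# Hypothetical helper: a sketch, not the notebook's own implementation.
# Translation table that deletes every ASCII punctuation character
# (commas between genres, the hyphen in "sci-fi", and so on).
_PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)


def build_sequential_text(row, columns):
    """Join the given metadata columns of one row into a single cleaned string."""
    parts = []
    for column in columns:
        value = row.get(column)
        # Skip missing values; NaN is the only value that is not equal to itself
        if value is None or value != value:
            continue
        parts.append(str(value))

    text = ' '.join(parts).lower()                     # lower case
    text = text.replace('\r', ' ').replace('\n', ' ')  # remove line breaks
    text = text.translate(_PUNCTUATION_TABLE)          # remove special characters
    return ' '.join(text.split())                      # collapse repeated spaces


row = {
    'genres': 'sci-fi, award winning, action',
    'type': 'tv',
    'studios': 'sunrise',
    'source': 'original',
}
print(build_sequential_text(row, ['genres', 'type', 'studios', 'source']))
# scifi award winning action tv sunrise original
```

A Pandas row also supports `.get`, so the same helper could be applied to the whole DataFrame with something like `animes_df.apply(lambda row: build_sequential_text(row, ['genres', 'type', 'producers', 'licensors', 'studios', 'source']), axis=1)` (the column list is an assumption). Removing the hyphen turns "sci-fi" into the single token "scifi", which keeps it from being split in two by the TF-IDF tokenizer; note that `string.punctuation` only covers ASCII symbols, so non-ASCII special characters would need an extra rule.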


---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com)

> **Linkedin** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/)