<center>
    <h1 id='content-based-filtering' style='color:#7159c1; font-size:350%'>Content-Based Filtering</h1>
    <i style='font-size:125%'>Recommending Similar Items by Metadata</i>
</center>

> **Topics**

```
- 📦 Create Sequential Texts
- 📦 Hands-on
- 📦 Recommendations
- 📦 Benchmarking
```

In [1]:
# ---- Imports ----
import numpy as np                                           # pip install numpy
import pandas as pd                                          # pip install pandas
from sklearn.feature_extraction.text import TfidfVectorizer  # pip install scikit-learn
from sklearn.metrics.pairwise import linear_kernel           # pip install scikit-learn

# ---- Constants ----
DATASETS_PATH = ('./datasets')
SEED = (20240420) # April 20, 2024 (fourth Bitcoin Halving)

# ---- Settings ----
np.random.seed(SEED)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# ---- Functions ----
def generate_metadatas_sequential_text(dataset, features):
    """
    \ Description:
        - iters each dataset row and features parameter's elements;
        - if the value at row[feature] position is different than a single hyphen:
             - the value gets all spaces replaced by underscores;
             - the value gets all commas-spaces replaced by space;
             - sequential_text is incremented by the resultant value and by a space at the end;
             - sequential_text is stripped and appended into sequential_text_list;
             - at the end, sequential_text_list is returned.
    
    \ Paramters:
        - dataset: Pandas DataFrame;
        - features: list of strings.
    """
    sequential_text_list = []
    
    for index, row in dataset.iterrows():
        current_sequential_text = ''
        for feature in features:
            if row[feature] != '-':
                current_sequential_text += row[feature].replace(' ', '_').replace(',_', ' ')
                current_sequential_text += ' '
                     
        sequential_text_list.append(current_sequential_text.strip())
    
    return sequential_text_list

def get_recommendations(dataset, title, animes_indices, cosine_similarity, number_recommendations=10):
    """
    \ Description:
        - gets the index of the anime that matches the title;
        - gets the pairwise similarity scores of all animes with the chosen anime;
        - sort the animes based on the similarity socres on descending order;
        - gets the scores of the top 'number_recommendations' animes, excluding the chosen one;
        - gets the animes indices;
        - returns the recommended animes id, title, synopsis, score, genre and image url.
    
    \ Parameters:
        - dataset: Pandas DataFrame;
        - title: string;
        - animes_indices: list of integers;
        - cosine_similarity: NumPy array of floats;
        - number_recommendation: integer.
    """
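    index = animes_indices[title]
    
    # Pair each row position with its similarity to the chosen anime and drop the
    # chosen anime itself: animes with identical metadatas also score 1.0, so the
    # chosen one is not guaranteed to come first after sorting.
    similarity_scores = [
        (position, score) for position, score in enumerate(cosine_similarity[index])
        if position != index
    ]
    similarity_scores = sorted(similarity_scores, key=lambda score: score[1], reverse=True)
    similarity_scores = similarity_scores[:number_recommendations]
    
    recommended_animes_indices = [position for position, _ in similarity_scores]
    recommended_animes_scores = [score for _, score in similarity_scores]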
    
    recommendations_df = dataset.iloc[recommended_animes_indices][
        ['id', 'title', 'synopsis', 'score', 'genres', 'image_url']
    ].set_index('id')
    recommendations_df['cosine_similarity'] = recommended_animes_scores
    
    return recommendations_df

<h1 id='0-creating-sequential-texts' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📦 | Creating Sequential Texts</h1>

In [2]:
# ---- Reading Dataset ----
animes_df = pd.read_csv(f'{DATASETS_PATH}/anime-transformed-dataset-2023.csv', index_col='id')[
    ['title', 'synopsis', 'score', 'genres', 'type', 'source', 'image_url']
]

# ---- Generating Sequential Text for Metadatas ----
metadata_features = ['genres', 'type', 'source']
animes_df['metadatas'] = generate_metadatas_sequential_text(animes_df, metadata_features)
animes_df.head()

id,title,synopsis,score,genres,type,source,image_url,metadatas
1,cowboy bebop,"crime is timeless. by the year 2071, humanity has expanded across the galaxy, filling the surface of other planets with settlements like those on earth. these new societies are plagued by murder, drug use, and theft, and intergalactic outlaws are hunted by a growing number of tough bounty hunters.\n\nspike spiegel and jet black pursue criminals throughout space to make a humble living. beneath his goofy and aloof demeanor, spike is haunted by the weight of his violent past. meanwhile, jet manages his own troubled memories while taking care of spike and the bebop, their ship. the duo is joined by the beautiful con artist faye valentine, odd child edward wong hau pepelu tivrusky iv, and ein, a bioengineered welsh corgi.\n\nwhile developing bonds and working to catch a colorful cast of criminals, the bebop crew's lives are disrupted by a menace from spike's past. as a rival's maniacal plot continues to unravel, spike must choose between life with his newfound family or revenge for his old wounds.",8.75,"action, sci-fi, award winning",tv,original,https://cdn.myanimelist.net/images/anime/4/19644.jpg,action sci-fi award_winning tv original
5,cowboy bebop tengoku no tobira,"another day, another bounty—such is the life of the often unlucky crew of the bebop. however, this routine is interrupted when faye, who is chasing a fairly worthless target on mars, witnesses an oil tanker suddenly explode, causing mass hysteria. as casualties mount due to a strange disease spreading through the smoke from the blast, a whopping three hundred million woolong price is placed on the head of the supposed perpetrator.\n\nwith lives at stake and a solution to their money problems in sight, the bebop crew springs into action. spike, jet, faye, and edward, followed closely by ein, split up to pursue different leads across alba city. through their individual investigations, they discover a cover-up scheme involving a pharmaceutical company, revealing a plot that reaches much further than the ragtag team of bounty hunters could have realized.",8.38,"action, sci-fi",movie,original,https://cdn.myanimelist.net/images/anime/1439/93480.jpg,action sci-fi movie original
6,trigun,"vash the stampede is the man with a $$60,000,000,000 bounty on his head. the reason: he's a merciless villain who lays waste to all those that oppose him and flattens entire cities for fun, garnering him the title ""the humanoid typhoon."" he leaves a trail of death and destruction wherever he goes, and anyone can count themselves dead if they so much as make eye contact—or so the rumors say. in actuality, vash is a huge softie who claims to have never taken a life and avoids violence at all costs.\n\nwith his crazy doughnut obsession and buffoonish attitude in tow, vash traverses the wasteland of the planet gunsmoke, all the while followed by two insurance agents, meryl stryfe and milly thompson, who attempt to minimize his impact on the public. but soon, their misadventures evolve into life-or-death situations as a group of legendary assassins are summoned to bring about suffering to the trio. vash's agonizing past will be unraveled and his morality and principles pushed to the breaking point.",8.22,"adventure, action, sci-fi",tv,manga,https://cdn.myanimelist.net/images/anime/7/20310.jpg,adventure action sci-fi tv manga
7,witch hunter robin,"robin sena is a powerful craft user drafted into the stnj—a group of specialized hunters that fight deadly beings known as witches. though her fire power is great, she's got a lot to learn about her powers and working with her cool and aloof partner, amon. but the truth about the witches and herself will leave robin on an entirely new path that she never expected!\n\n(source: funimation)",7.25,"mystery, action, supernatural, drama",tv,original,https://cdn.myanimelist.net/images/anime/10/19969.jpg,mystery action supernatural drama tv original
8,bouken ou beet,"it is the dark century and the people are suffering under the rule of the devil, vandel, who is able to manipulate monsters. the vandel busters are a group of people who hunt these devils, and among them, the zenon squad is known to be the strongest busters on the continent. a young boy, beet, dreams of joining the zenon squad. however, one day, as a result of beet's fault, the zenon squad was defeated by the devil, beltose. the five dying busters sacrificed their life power into their five weapons, saiga. after giving their weapons to beet, they passed away. years have passed since then and the young vandel buster, beet, begins his adventure to carry out the zenon squad's will to put an end to the dark century.",6.94,"adventure, fantasy, supernatural",tv,manga,https://cdn.myanimelist.net/images/anime/7/21569.jpg,adventure fantasy supernatural tv manga
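
To make the transformation concrete, the sketch below applies the same replace chain used in `generate_metadatas_sequential_text` to a single row. The row is hypothetical (its values only mirror the format of the dataset), and `row_to_sequential_text` is an illustrative helper, not part of the notebook.

```python
# Hypothetical row, used only for illustration (values mirror the dataset's format)
row = {'genres': 'action, sci-fi, award winning', 'type': 'tv', 'source': '-'}
features = ['genres', 'type', 'source']

def row_to_sequential_text(row, features):
    """Join the row's metadata values into one space-separated string."""
    tokens = []
    for feature in features:
        value = row[feature]
        if value != '-':  # values equal to a single hyphen are skipped
            # 'award winning' -> 'award_winning'; ', ' (now ',_') -> ' '
            tokens.append(value.replace(' ', '_').replace(',_', ' '))
    return ' '.join(tokens)

sequential_text = row_to_sequential_text(row, features)
print(sequential_text)  # action sci-fi award_winning tv
```

Multi-word values such as `award winning` become a single token, while the hyphen-only `source` contributes nothing to the text.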


<h1 id='1-lower-casing-and-removing-all-break-lines-and-special-characters' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📦 | Hands-on</h1>

Steps:

1. lowercase the metadatas and remove all line breaks and special whitespace characters;
2. calculate Term Frequency - Inverse Document Frequency (TF-IDF);
3. calculate Cosine Similarity;
4. create the search function;
5. get recommendations.

---

**- Lowercasing and Removing All Line Breaks and Special Characters**

In [3]:
# ---- Lower Casing ----
animes_df.metadatas = animes_df.metadatas.apply(lambda metadata: metadata.lower())

# ---- Removing All Line Breaks (\n) and Other Whitespace Characters (\t \r \x0b \x0c) ----
animes_df.metadatas = animes_df.metadatas.apply(lambda metadata: ' '.join(metadata.split()))

---

**- Calculating Term Frequency - Inverse Document Frequency (TF-IDF)**

In [4]:
# ---- Calculating TF-IDF ----
tfidf_vectorizer = TfidfVectorizer(analyzer='word', norm='l2', stop_words='english')
tfidf_metadatas = tfidf_vectorizer.fit_transform(animes_df.metadatas)

print(f'- Number of Animes: {tfidf_metadatas.shape[0]}')
print(f'- Number of Words to Describe the Animes: {tfidf_metadatas.shape[1]}')

- Number of Animes: 23748
- Number of Words to Describe the Animes: 43
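
The vocabulary is small because the vectorizer is fitted on the short `metadatas` strings rather than on free text. As a minimal sketch of how those strings are tokenized, the block below fits a vectorizer with the same arguments on two sample strings taken from the `metadatas` column shown earlier. It relies on scikit-learn's default token pattern (runs of two or more word characters), under which underscores stay inside a token and hyphens split it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample strings copied from the 'metadatas' column shown earlier
samples = [
    'action sci-fi award_winning tv original',
    'adventure fantasy supernatural tv manga',
]

vectorizer = TfidfVectorizer(analyzer='word', norm='l2', stop_words='english')
vectorizer.fit(samples)

# Terms learned from the samples, in alphabetical order
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
```

With this tokenization, `award_winning` is kept as one term, whereas `sci-fi` contributes the two terms `sci` and `fi`.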


---

**- Calculating Cosine Similarity**

In [5]:
# ---- Calculating Cosine Similarity ----
cosine_similarity_metadatas = linear_kernel(tfidf_metadatas, tfidf_metadatas)
cosine_similarity_metadatas

array([[1.        , 0.60115168, 0.5593071 , ..., 0.32382172, 0.12808412,
        0.12808412],
       [0.60115168, 1.        , 0.6254103 , ..., 0.43171329, 0.17075944,
        0.17075944],
       [0.5593071 , 0.6254103 , 1.        , ..., 0.50989131, 0.        ,
        0.        ],
       ...,
       [0.32382172, 0.43171329, 0.50989131, ..., 1.        , 0.        ,
        0.        ],
       [0.12808412, 0.17075944, 0.        , ..., 0.        , 1.        ,
        1.        ],
       [0.12808412, 0.17075944, 0.        , ..., 0.        , 1.        ,
        1.        ]])
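
`linear_kernel` only computes dot products, yet the result is a cosine similarity because the TF-IDF rows were already L2-normalized (`norm='l2'`). The sketch below checks that equivalence with NumPy alone on a small, made-up weight matrix; the values are illustrative and not taken from the dataset.

```python
import numpy as np

# Toy term-weight matrix (3 documents x 4 terms); values are illustrative only
weights = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
])

# L2-normalize each row, as norm='l2' does for the TF-IDF vectors
unit_rows = weights / np.linalg.norm(weights, axis=1, keepdims=True)

# Plain dot products of unit vectors (the operation a linear kernel performs)
dot_products = unit_rows @ unit_rows.T

# Explicit cosine similarity on the unnormalized rows, for comparison
norms = np.linalg.norm(weights, axis=1)
cosine = (weights @ weights.T) / np.outer(norms, norms)

print(np.allclose(dot_products, cosine))  # True
```

Note that the full similarity matrix is dense, with one row and one column per anime. For 23,748 animes, and assuming 64-bit floats, that is roughly 23748² × 8 bytes ≈ 4.5 GB.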

---

**- Creating Search Function**

In [6]:
# ---- Recommending Animes: Resetting Animes DataFrame Index ----
#
# - so that the index follows a sequence from 0 to 'n - 1', 'n' being
# the total number of animes.
#
animes_df.reset_index(inplace=True)

In [7]:
# ---- Recommending Animes ----
#
# - searches for anime titles that contain a given string, so that the exact
# title can be used in the next cell to get recommendations.
#
animes_df.title.loc[animes_df.title.str.contains('brotherhood')]

3961                      fullmetal alchemist brotherhood
4578             fullmetal alchemist brotherhood specials
5174     fullmetal alchemist brotherhood - 4-koma theater
11624                        brotherhood final fantasy xv
Name: title, dtype: object
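
The lookup above can be wrapped in a small reusable function. The sketch below is one possible shape for it: `search_titles` is a hypothetical helper, and the choice of a case-insensitive, literal (non-regex) match is an assumption about what is convenient here, not something the notebook prescribes.

```python
import pandas as pd

def search_titles(titles, query):
    """Return the titles containing `query`, ignoring case (hypothetical helper)."""
    # regex=False treats the query literally, so characters like '(' or '+' are safe
    mask = titles.str.contains(query, case=False, regex=False, na=False)
    return titles[mask]

# Two titles copied from the output above, plus one non-matching entry
titles = pd.Series([
    'fullmetal alchemist brotherhood',
    'brotherhood final fantasy xv',
    'cowboy bebop',
])
matches = search_titles(titles, 'Brotherhood')
print(matches.tolist())
```

In the notebook it would be called as `search_titles(animes_df.title, 'brotherhood')`, and the returned index values are the row positions expected by the recommendation step.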

<h1 id='2-recommendations' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📦 | Recommendations</h1>

In [8]:
# ---- Recommending Animes ----
animes_indices = pd.Series(animes_df.index, index=animes_df.title)

get_recommendations(
    dataset=animes_df
    , title='fullmetal alchemist brotherhood'
    , animes_indices=animes_indices
    , cosine_similarity=cosine_similarity_metadatas
    , number_recommendations=10
)

id,title,synopsis,score,genres,image_url,cosine_similarity
28249,arslan senki tv,"the year is 320. under the rule of the belligerent king andragoras iii, the kingdom of pars is at war with the neighboring empire, lusitania. though different from his father in many aspects, arslan, the young prince, sets out to prove his valor on the battlefield for the very first time. however, when the king is betrayed by one of his most trusted officials, the parsian army is decimated and the capital city of ecbatana is sieged. with the army in shambles and the lusitanians out for his head, arslan is forced to go on the run. with a respected general by his side, daryun, arslan soon sets off on a journey in search of allies that will help him take back his home.\n\nhowever, the enemies that the prince faces are far from limited to just those occupying his kingdom. armies of other kingdoms stand ready to conquer ecbatana. moreover, the mastermind behind lusitania's victory, an enigmatic man hiding behind a silver mask, poses a dangerous threat to arslan and his company as he possesses a secret that could jeopardize arslan's right to succession.\n\nwith the odds stacked against him, arslan must find the strength and courage to overcome these obstacles, and allies who will help him fight in the journey that will help prepare him for the day he becomes king.",7.66,"adventure, action, fantasy, drama",https://cdn.myanimelist.net/images/anime/6/73588.jpg,1.0
31821,arslan senki tv fuujin ranbu,"continuing on his quest to retake ecbatana, prince arslan and his company march toward the city. but upon receiving news that the neighboring kingdom of turan is launching an assault on the parsian stronghold at peshawar citadel, the prince is forced to turn back in order to defend the fortress. amid holding off the invading forces, the parsian army is met by an unexpected visitor.\n\nas arslan returns to peshawar, prince hermes takes a slight detour from his clash against his cousin to search for the legendary sword rukhnabad, which would grant him the right to rule and take back what he believes is rightfully his. however, after unearthing the lost artifact, the blade is stolen by the temple knights of lusitania, prompting the masked warrior to give chase. meanwhile in ecbatana, the captive king andragoras iii finds an opportunity to strike and begins to make his move.\n\nas the separate sides of the parsian royal conflict clash, arslan's right to the throne falls under attack. but no matter the obstacles in their way, the young prince and his loyal band of warriors charge forward to restore pars to its former glory.",7.5,"adventure, action, fantasy, drama",https://cdn.myanimelist.net/images/anime/12/80681.jpg,1.0
589,ginga nagareboshi gin,"having been born a brindle (""tora-ge"" or tiger striped) akita, gin (""silver"", named after his fur color) is destined to become a successful bear-hunting dog. however, when he witnesses his father's death at the hands of the man-eating demon bear akakabuto, he is chosen above his siblings to become his father's successor and defeat the monstrous bear that terrorizes his home village. as akakabuto gathers his own allies, gin must travel across japan in search of dogs to join him in an all out war of dog vs. bear.",8.02,"adventure, action, drama",https://cdn.myanimelist.net/images/anime/3/47461.jpg,0.921737
2243,karasu tengu kabuto,"500 years ago in the tensho era of japan, a man was born who defied the will of a demon; a man who had gods of good on his side; a man destined to battle evil....his name was kabuto. somehow, kuroyasya douki, the vile black night demon, escaped his prison in hell and returned to the earthly plane to wreak vengeance on the family-line of kabuto. none can escape his deadly magic and masterful skills with the blade; however, the gods of the north, west, east, and south band together to help kabuto stand for justice. with the questionable help of a diabolical talking sword that his own father forged, kabuto may live another day to see his own sons born....",6.72,"adventure, action, drama",https://cdn.myanimelist.net/images/anime/1/2445.jpg,0.921737
37521,vinland saga,"young thorfinn grew up listening to the stories of old sailors that had traveled the ocean and reached the place of legend, vinland. it's said to be warm and fertile, a place where there would be no need for fighting—not at all like the frozen village in iceland where he was born, and certainly not like his current life as a mercenary. war is his home now. though his father once told him, ""you have no enemies, nobody does. there is nobody who it's okay to hurt,"" as he grew, thorfinn knew that nothing was further from the truth.\n\nthe war between england and the danes grows worse with each passing year. death has become commonplace, and the viking mercenaries are loving every moment of it. allying with either side will cause a massive swing in the balance of power, and the vikings are happy to make names for themselves and take any spoils they earn along the way. among the chaos, thorfinn must take his revenge and kill askeladd, the man who murdered his father. the only paradise for the vikings, it seems, is the era of war and death that rages on.",8.74,"adventure, action, drama",https://cdn.myanimelist.net/images/anime/1500/103005.jpg,0.921737
49387,vinland saga season 2,"after his father's death and the destruction of his village at the hands of english raiders, einar wishes for a peaceful life with his family on their newly rebuilt farms. however, fate has other plans: his village is invaded once again. einar watches helplessly as the marauding danes burn his lands and slaughter his family. the invaders capture einar and take him back to denmark as a slave. \n\neinar clings to his mother's final words to survive. he is purchased by ketil, a kind slave owner and landlord who promises that einar can regain his freedom in return for working in the fields. soon, einar encounters his new partner in farm cultivation—thorfinn, a dejected and melancholic slave. as einar and thorfinn work together toward their freedom, they are haunted by both sins of the past and the ploys of the present. yet they carry on, grasping for a glimmer of hope, redemption, and peace in a world that is nothing but unjust and unforgiving.",8.81,"adventure, action, drama",https://cdn.myanimelist.net/images/anime/1170/124312.jpg,0.921737
678,shadow skill eigi,"in the land of kuruda, warriors with magical powers and incredible fighting skills battle for the ultimate prize: the title of sevaar, the strongest warrior in the land. elle ragu, nicknamed shadow skill, is the newest sevaar to emerge, but that doesn't make her life any easier. teaching her ""little brother,"" gau, how to be a warrior, fending off assassins from other kingdoms and thwarting enemy invasions is hard enough, but her biggest challenge will be paying off her drinking debts. \n\n(source: ann)",7.07,"adventure, fantasy, drama",https://cdn.myanimelist.net/images/anime/4/61189.jpg,0.916023
1325,hamelin no violin hiki,"trouble arises in staccato one day. demons show up and start to wreak havoc on the townspeople. the entire country seems to be suffering. the source of the problem is that the ""barrier"" which until now has kept the demons from crossing over to the human world, is weakening. the person who sustains it is losing her strength after holding it up for so many years. that person is queen horn of sforzando. hell king bass is trying to break through the barrier and release their ""supreme leader,"" kestra (orchestra). one whose power is unparalleled yet is trapped somewhere in the world of the humans.\n\n(source: anidb)",6.52,"adventure, fantasy, drama",https://cdn.myanimelist.net/images/anime/7/20972.jpg,0.916023
34547,shoukoku no altair,"tuğrul mahmut is a young pasha serving on the divan of the türkiye stratocracy. the clouds of war are gathering over his country due to the threat of an aggressive empire. with the divan split between warmongers and the pacifists, mahmut begins his quest to keep the peace at any cost. as he finds himself deeper and deeper in the politics of the ancient world, new enemies and allies surface. who will prevail? what will mahmut do if war proves to be inevitable?",7.55,"adventure, fantasy, drama",https://cdn.myanimelist.net/images/anime/3/86751.jpg,0.916023
39199,katsute kami datta kemono-tachi e,"with the initiation of the patrian civil war came the creation of half-beast, half-human soldiers—a development of the outnumbered northerners in a desperate attempt to counter the overwhelming southern forces. able to quickly dominate battlefields and achieve victory with ease, the soldiers' godlike abilities earned them the name ""incarnates."" however, as the war raged on, the incarnates encountered a problem involving the beasts inside them that they were unable to rectify by ordinary means. \n\nonce the war was over, mysteries and accounts of the incarnates submitting to the misfortune of their war days surfaced. aware of the horrors they faced during the war, special sergeant major and former captain of the incarnates hank henriette becomes a beast hunter—those who take the lives of incarnates who have succumbed to the issues they experienced on the battlefields. \n\nafter witnessing her father, a former incarnate soldier, meet his end at the hands of one such beast hunter, nancy schaal bancroft resolves to hunt the man who took her father's life. however, nancy's eye-opening encounter with the beast hunter influences her to instead seek the reason behind her father's death and the incarnates' problematic existence in society.",6.4,"action, fantasy, drama",https://cdn.myanimelist.net/images/anime/1109/101230.jpg,0.901985
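
The table above contains ties: animes whose metadata is identical to the chosen one score exactly 1.0, and several others share 0.921737. When picking the top results it is therefore safer to exclude the chosen anime by position than to assume it is ranked first. The block below is a minimal NumPy sketch of that idea; `top_k_similar` is a hypothetical helper and the similarity row is made up.

```python
import numpy as np

def top_k_similar(similarity_row, chosen_position, k):
    """Return the positions of the k highest scores, excluding the chosen item."""
    scores = np.asarray(similarity_row, dtype=float).copy()
    scores[chosen_position] = -np.inf  # never recommend the item itself
    # kind='stable' keeps the original order among tied scores
    order = np.argsort(-scores, kind='stable')
    return order[:k]

# Toy similarity row: position 2 is the chosen item, position 0 ties with it at 1.0
row = np.array([1.0, 0.3, 1.0, 0.9, 0.9])
top = top_k_similar(row, chosen_position=2, k=3)
print(top.tolist())  # [0, 3, 4]
```

Position 0 is kept even though it ties with the chosen item, and the two 0.9 entries keep their original order.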


<h1 id='3-benchmarking' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📦 | Benchmarking</h1>

In [1]:
# ---- Imports ----
import numpy as np                                           # pip install numpy
import pandas as pd                                          # pip install pandas
import psutil                                                # pip install psutil
import os                                                    # standard library
from sklearn.feature_extraction.text import TfidfVectorizer  # pip install scikit-learn
from sklearn.metrics.pairwise import linear_kernel           # pip install scikit-learn
import threading                                             # standard library
import time                                                  # standard library

# ---- Constants ----
DATASETS_PATH = ('./datasets')
NUMBER_OF_RECOMMENDATIONS = (10)
NUMBER_OF_ITERATIONS = (10)
SEED = (20240420) # April 20, 2024 (fourth Bitcoin Halving)

# ---- Settings ----
np.random.seed(SEED)

# ---- Functions ----
def generate_metadatas_sequential_text(dataset, features):
    """
    \ Description:
        - iters each dataset row and features parameter's elements;
        - if the value at row[feature] position is different than a single hyphen:
             - the value gets all spaces replaced by underscores;
             - the value gets all commas-spaces replaced by space;
             - sequential_text is incremented by the resultant value and by a space at the end;
             - sequential_text is stripped and appended into sequential_text_list;
             - at the end, sequential_text_list is returned.
    
    \ Paramters:
        - dataset: Pandas DataFrame;
        - features: list of strings.
    """
    sequential_text_list = []
    
    for index, row in dataset.iterrows():
        current_sequential_text = ''
        for feature in features:
            if row[feature] != '-':
                current_sequential_text += row[feature].replace(' ', '_').replace(',_', ' ')
                current_sequential_text += ' '
                     
        sequential_text_list.append(current_sequential_text.strip())
    
    return sequential_text_list

def get_recommendations(dataset, title, animes_indices, cosine_similarity, number_recommendations=10):
    """
    \ Description:
        - gets the index of the anime that matches the title;
        - gets the pairwise similarity scores of all animes with the chosen anime;
        - sort the animes based on the similarity socres on descending order;
        - gets the scores of the top 'number_recommendations' animes, excluding the chosen one;
        - gets the animes indices;
        - returns the recommended animes id, title, synopsis, score, genre and image url.
    
    \ Parameters:
        - dataset: Pandas DataFrame;
        - title: string;
        - animes_indices: list of integers;
        - cosine_similarity: NumPy array of floats;
        - number_recommendation: integer.
    """
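    index = animes_indices[title]
    
    # Pair each row position with its similarity to the chosen anime and drop the
    # chosen anime itself: animes with identical metadatas also score 1.0, so the
    # chosen one is not guaranteed to come first after sorting.
    similarity_scores = [
        (position, score) for position, score in enumerate(cosine_similarity[index])
        if position != index
    ]
    similarity_scores = sorted(similarity_scores, key=lambda score: score[1], reverse=True)
    similarity_scores = similarity_scores[:number_recommendations]
    
    recommended_animes_indices = [position for position, _ in similarity_scores]
    recommended_animes_scores = [score for _, score in similarity_scores]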
    
    recommendations_df = dataset.iloc[recommended_animes_indices][
        ['id', 'title', 'synopsis', 'score', 'genres', 'image_url']
    ].set_index('id')
    recommendations_df['cosine_similarity'] = recommended_animes_scores
    
    return recommendations_df

def content_based_filtering(df, number_recommendations):
    temp_df = df.copy()
    
    # ---- Calculating TF-IDF ----
    tfidf_vectorizer = TfidfVectorizer(analyzer='word', norm='l2', stop_words='english')
    tfidf_metadatas = tfidf_vectorizer.fit_transform(temp_df.metadatas)
    
    # ---- Calculating Cosine Similarity ----
    cosine_similarity_metadatas = linear_kernel(tfidf_metadatas, tfidf_metadatas)
    
    # ---- Resetting Animes DataFrame Index ----
    #
    # - so that the index follows a sequence from 0 to 'n - 1', 'n' being
    # the total number of animes.
    #
    temp_df.reset_index(inplace=True)
    temp_indices = pd.Series(temp_df.index, index=temp_df.title)
    
    # ---- Recommending Animes ----
    get_recommendations(
        dataset=temp_df
        , title='fullmetal alchemist brotherhood'
        , animes_indices=temp_indices
        , cosine_similarity=cosine_similarity_metadatas
        , number_recommendations=number_recommendations
    )

In [2]:
# ---- Reading Dataset ----
animes_df = pd.read_csv(f'{DATASETS_PATH}/anime-transformed-dataset-2023.csv', index_col='id')[
    ['title', 'synopsis', 'score', 'genres', 'type', 'source', 'image_url']
]

# ---- Generating Sequential Text for Metadatas ----
metadata_features = ['genres', 'type', 'source']
animes_df['metadatas'] = generate_metadatas_sequential_text(animes_df, metadata_features)

# ---- Lower Casing ----
animes_df.metadatas = animes_df.metadatas.apply(lambda metadata: metadata.lower())

# ---- Removing All Line Breaks (\n) and Other Whitespace Characters (\t \r \x0b \x0c) ----
animes_df.metadatas = animes_df.metadatas.apply(lambda metadata: ' '.join(metadata.split()))

# ---- Benchmark Dataset ----
benchmark_df = pd.DataFrame(
    columns=[
        'iteration', 'algorithm', 'execution_time', 'avg_cpu_usage'
        , 'min_cpu_usage', 'max_cpu_usage', 'avg_ram_usage'
        , 'min_ram_usage', 'max_ram_usage'
    ]
)

In [3]:
# ---- Thread ----
# - 'python_process', 'running' and the usage lists are module-level variables
# created in the next cell; a 'global' statement is only needed inside the
# functions that rebind them.

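def benchmark():
    global iteration_cpu_usage
    global iteration_ram_usage
    global running
    
    while running:
        # CPU usage (%) of this process, normalized by the number of logical cores
        iteration_cpu_usage.append(python_process.cpu_percent(interval=0.1) / psutil.cpu_count())
        # RAM usage as a percentage of total physical memory, measured as USS
        iteration_ram_usage.append(python_process.memory_percent(memtype='uss'))
        #iteration_ram_usage.append(python_process.memory_full_info().uss / 1024 / 1024) # in MB

def start_thread():
    global thread
    global running
    
    # Set the flag before the thread starts, so that a stop_thread() call issued
    # right after start_thread() cannot be overwritten by the thread itself.
    running = True
    thread = threading.Thread(target=benchmark)
    thread.start()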

def stop_thread():
    global thread
    global running
    
    running = False
    thread.join() # wait for thread's end
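
The sampling thread above is coordinated through a shared `running` flag. A possible alternative, sketched below under stated assumptions, wraps the thread in a context manager and signals it with `threading.Event`. `Sampler` and its `probe` argument are hypothetical names introduced for illustration, and `time.perf_counter` only stands in for a real measurement such as a `psutil` call.

```python
import threading
import time

class Sampler:
    """Collect samples from `probe` in a background thread until stopped (hypothetical helper)."""

    def __init__(self, probe, interval=0.01):
        self.probe = probe
        self.interval = interval
        self.samples = []
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Sample at least once, then keep going until the stop event is set
        while True:
            self.samples.append(self.probe())
            if self._stop_event.wait(self.interval):
                break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stop_event.set()
        self._thread.join()
        return False  # do not swallow exceptions raised by the measured block


# Stand-in probe; a real benchmark would pass a CPU or RAM measurement instead
with Sampler(probe=time.perf_counter) as sampler:
    time.sleep(0.05)  # the workload being measured

print(len(sampler.samples) >= 1)  # True
```

Because the loop samples before it checks the event, at least one sample exists even when the workload finishes immediately, which keeps later averages from dividing by zero.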

In [4]:
# ---- Benchmark ----
python_process = psutil.Process(os.getpid())

iteration_cpu_usage = []
cpu_usage = []
min_cpu_usage = []
max_cpu_usage = []

iteration_ram_usage = []
ram_usage = []
min_ram_usage = []
max_ram_usage = []

execution_time = []

running = False


for iteration in range(NUMBER_OF_ITERATIONS):
    # ---- Thread ----
    iteration_cpu_usage = []
    iteration_ram_usage = []

    start_time = time.perf_counter()
    start_thread()

    try: content_based_filtering(animes_df, NUMBER_OF_RECOMMENDATIONS)
    except Exception as exception: print(f'- An exception occurred: {exception}')
    finally: stop_thread()
    
    # ---- Computing Benchmarks ----
    print(f'- Calculations of iteration {iteration}')
    
    final_time = time.perf_counter()
    execution_time.append(final_time - start_time)
    
    cpu_usage.append(sum(iteration_cpu_usage) / len(iteration_cpu_usage))
    min_cpu_usage.append(min(iteration_cpu_usage))
    max_cpu_usage.append(max(iteration_cpu_usage))
    
    ram_usage.append(sum(iteration_ram_usage) / len(iteration_ram_usage))
    min_ram_usage.append(min(iteration_ram_usage))
    max_ram_usage.append(max(iteration_ram_usage))

- Calculations of iteration 0
- Calculations of iteration 1
- Calculations of iteration 2
- Calculations of iteration 3
- Calculations of iteration 4
- Calculations of iteration 5
- Calculations of iteration 6
- Calculations of iteration 7
- Calculations of iteration 8
- Calculations of iteration 9


In [5]:
# ---- Storing Data ----
benchmark_df['iteration'] = list(range(NUMBER_OF_ITERATIONS))
benchmark_df['algorithm'] = 'Content-Based Filtering - Metadatas'
benchmark_df['execution_time'] = execution_time

benchmark_df['avg_cpu_usage'] = cpu_usage
benchmark_df['min_cpu_usage'] = min_cpu_usage
benchmark_df['max_cpu_usage'] = max_cpu_usage

benchmark_df['avg_ram_usage'] = ram_usage
benchmark_df['min_ram_usage'] = min_ram_usage
benchmark_df['max_ram_usage'] = max_ram_usage

benchmark_df

Unnamed: 0,iteration,algorithm,execution_time,avg_cpu_usage,min_cpu_usage,max_cpu_usage,avg_ram_usage,min_ram_usage,max_ram_usage
0,0,Content-Based Filtering - Metadatas,12.028853,11.56,0.0,18.7,25.366249,0.319158,59.512932
1,1,Content-Based Filtering - Metadatas,9.872834,10.889744,0.0,12.6,17.554028,0.345049,61.777857
2,2,Content-Based Filtering - Metadatas,6.760252,12.030882,0.0,12.6,18.144258,1.001399,62.408058
3,3,Content-Based Filtering - Metadatas,6.799316,12.135357,0.0,12.6,18.573451,0.9966,62.407672
4,4,Content-Based Filtering - Metadatas,7.008312,11.793403,0.0,12.6,18.906895,1.002815,62.409411
5,5,Content-Based Filtering - Metadatas,6.845567,12.011029,0.0,12.6,18.713641,1.002171,62.413436
6,6,Content-Based Filtering - Metadatas,6.589947,12.121786,0.0,14.2,17.755243,1.000465,61.150554
7,7,Content-Based Filtering - Metadatas,6.644442,12.017279,0.0,12.6,18.306001,1.008097,61.482818
8,8,Content-Based Filtering - Metadatas,6.501366,12.072794,0.0,12.6,18.784666,1.008193,62.414917
9,9,Content-Based Filtering - Metadatas,6.524777,12.084191,0.0,12.6,18.873934,1.008322,62.415078
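
The first two iterations are noticeably slower than the rest, so a single overall mean can overstate the typical run time. The sketch below summarizes the `execution_time` column (in seconds, as returned by `time.perf_counter`) with and without those early runs. Treating exactly two iterations as warm-up is an assumption made for illustration, not something the benchmark defines.

```python
from statistics import mean, stdev

# execution_time values (seconds) copied from the benchmark table above
execution_time = [
    12.028853, 9.872834, 6.760252, 6.799316, 7.008312,
    6.845567, 6.589947, 6.644442, 6.501366, 6.524777,
]

WARM_UP_ITERATIONS = 2  # assumption: the first two runs are treated as warm-up

steady_state = execution_time[WARM_UP_ITERATIONS:]

print(f'- Mean (all runs): {mean(execution_time):.3f} s')
print(f'- Mean (after warm-up): {mean(steady_state):.3f} s')
print(f'- Std (after warm-up): {stdev(steady_state):.3f} s')
```

Reporting both figures makes it clear how much of the average is due to the initial runs.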


In [6]:
# ---- Exporting Data ----
benchmark_df.to_csv(
    f'{DATASETS_PATH}/benchmarks/content-based-filtering-metadas.csv'
    , index=False
)

---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com?)

> **Linkedin** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/).