<center>
    <h1 id='benchmark-analysis' style='color:#7159c1; font-size:350%'>Benchmark Analysis</h1>
    <i style='font-size:125%'>Analysing Recommendation System Algorithms Performances</i>
</center>

> **Topics**

```
- 👨‍🔬 Merging Datasets
- 👨‍🔬 Exploring Performances
- 👨‍🔬 Conclusions
```

In [1]:
# ---- Imports ----
import gc            # standard library (no installation needed)
import numpy as np   # pip install numpy
import pandas as pd  # pip install pandas

# ---- Constants ----
DATASETS_PATH = ('./datasets/benchmarks')
SEED = (20240420) # April 20, 2024 (fourth Bitcoin Halving)

# ---- Settings ----
np.random.seed(SEED)
pd.set_option('display.max_columns', None)

<h1 id='0-merging-datasets' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>👨‍🔬 | Merging Datasets</h1>

Each Recommendation System Algorithm has its own benchmark dataset, but the performances are easier to analyse from a single table. So the first step in this notebook is to merge all the datasets into one.

In [2]:
# ---- Reading Datasets ----
bayesian_mean_df = pd.read_csv(f'{DATASETS_PATH}/demographic-filtering-bayesian-mean.csv')
popularity_df = pd.read_csv(f'{DATASETS_PATH}/demographic-filtering-popularity.csv')
plots_df = pd.read_csv(f'{DATASETS_PATH}/content-based-filtering-plots.csv')
metadatas_df = pd.read_csv(f'{DATASETS_PATH}/content-based-filtering-metadas.csv')
user_based_df = pd.read_csv(f'{DATASETS_PATH}/collaborative-filtering-user-based.csv')
item_based_df = pd.read_csv(f'{DATASETS_PATH}/collaborative-filtering-item-based.csv')
hybrid_df = pd.read_csv(f'{DATASETS_PATH}/hybrid-filtering.csv')

In [3]:
# ---- Merging Datasets ----
#
# - axis: 0 concatenates the datasets by rows (top to bottom) and 1 concatenates by columns (side by side)
#
full_benchmark_df = pd.concat(
    [
        bayesian_mean_df
        , popularity_df
        , plots_df
        , metadatas_df
        , user_based_df
        , item_based_df
        , hybrid_df
    ]
    , ignore_index=True
    , axis=0
)

full_benchmark_df['algorithm'] = pd.Categorical(full_benchmark_df['algorithm'])
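
The merge step can be illustrated on toy data. This is a minimal sketch, not the notebook's actual data: `first_df`, `second_df` and the `score` column are hypothetical stand-ins, and only the `algorithm` column name comes from the cell above. It shows that `ignore_index=True` rebuilds a continuous row index and that `pd.Categorical` stores the distinct algorithm names as sorted categories.

```python
import pandas as pd

# Hypothetical stand-ins for two per-algorithm benchmark files
first_df = pd.DataFrame({'algorithm': ['popularity', 'popularity'], 'score': [0.1, 0.2]})
second_df = pd.DataFrame({'algorithm': ['hybrid'], 'score': [0.3]})

# Stack the rows (axis=0) and renumber the index from 0
merged_df = pd.concat([first_df, second_df], ignore_index=True, axis=0)

# Store the algorithm names as a categorical column
merged_df['algorithm'] = pd.Categorical(merged_df['algorithm'])

row_count = len(merged_df)
index_labels = merged_df.index.tolist()
categories = merged_df['algorithm'].cat.categories.tolist()

print(row_count, index_labels, categories)
```

Without `ignore_index=True`, each frame would keep its own row labels, so the merged index would contain repeated values.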

In [4]:
# ---- Exporting Dataset ----
full_benchmark_df.to_csv(f'{DATASETS_PATH}/full-benchmark.csv', index=False)
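
One caveat when reloading the exported file: CSV stores plain text, so the categorical dtype of the `algorithm` column is not preserved on disk. The sketch below uses an in-memory buffer instead of the real `full-benchmark.csv` and a hypothetical one-column frame. It shows one way to restore the dtype when reading, assuming the column is still named `algorithm`.

```python
import io

import pandas as pd

# Hypothetical frame with a categorical 'algorithm' column
df = pd.DataFrame({'algorithm': pd.Categorical(['hybrid', 'popularity', 'hybrid'])})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading back without hints yields a plain text column
buffer.seek(0)
plain_df = pd.read_csv(buffer)

# Passing dtype restores the categorical column
buffer.seek(0)
typed_df = pd.read_csv(buffer, dtype={'algorithm': 'category'})

plain_is_categorical = isinstance(plain_df['algorithm'].dtype, pd.CategoricalDtype)
typed_is_categorical = isinstance(typed_df['algorithm'].dtype, pd.CategoricalDtype)

print(plain_is_categorical, typed_is_categorical)
```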

In [5]:
# ---- Deleting Objects ----
#
#  - the 'del' keyword removes the name; the object itself is freed once no references to it remain;
#  - the 'gc.collect()' function runs a full garbage collection, reclaiming unreachable objects
#    (such as reference cycles) that reference counting alone cannot free;
# Moreover, the number of unreachable objects found is returned after running the function.
#
del bayesian_mean_df
del popularity_df
del plots_df
del metadatas_df
del user_based_df
del item_based_df
del hybrid_df
del full_benchmark_df

gc.collect()

0
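
The return value of `gc.collect()` is easiest to see with a reference cycle. The `Node` class in this sketch is hypothetical and exists only for illustration. In CPython, objects that are not part of a cycle are normally freed as soon as their last reference disappears, which is likely why the cell above returns `0` for the DataFrames.

```python
import gc


class Node:
    """Tiny hypothetical object that can point at another object."""

    def __init__(self):
        self.other = None


gc.collect()   # clear any garbage left over from earlier code
gc.disable()   # keep the automatic collector from running mid-example

a, b = Node(), Node()
a.other, b.other = b, a    # build a reference cycle: a -> b -> a
del a, b                   # the names are gone, but the cycle keeps both objects alive

collected = gc.collect()   # the cyclic collector finds the unreachable objects
gc.enable()

print(collected)
```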

<h1 id='1-exploring-performances' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>👨‍🔬 | Exploring Performances</h1>

<h1 id='2-conclusions' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>👨‍🔬 | Conclusions</h1>

---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com?)

> **Linkedin** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/)