# Evaluation notebook

We have divided this notebook into the following parts:

1. Load **matrix**: We load a CSV file with the data matrix for the model under evaluation (e.g., validation, calibration, or test set data).
2. Load **preds**: We load a CSV file with the model's predictions on that data.
3. **Tokenize** the matrix for evaluation purposes: We apply a tokenization (e.g., _spacy_ or _whitespace_) before evaluating the results.
4. **Compute metrics** (dubbed evaluations): We compute the specified evaluation metrics.
5. **Dump metrics**: After computing the evaluations, we write their results to disk.

**Note**: We assume that all of these files will have a set of index columns through which we can jointly align them.


In [22]:
ROOT_DIR = "../outputs/results/mocha/narrativeqa/dev4"
!mkdir -p {ROOT_DIR}

# TODO - Come up with some uuid (model_name + dataset + split)
MATRIX_FILEPATH = f"{ROOT_DIR}/matrix/dev4-uqa-t5-small_preds.csv.gz"
PREDS_FILEPATH = f"{ROOT_DIR}/preds/dev4-uqa-t5-small_preds.csv.gz"

# ----------------------------------------------------------------------------------
# Outputs
# ----------------------------------------------------------------------------------
# Tokenizer
TOKENIZER = "default"
# TOKENIZER = "spacy"
TOKENIZER_FILEPATH = f"{ROOT_DIR}/evals/{TOKENIZER}_dev4-uqa-t5-small_evals.yml"

# Instance-wise metrics for each prediction
EVALS_FILEPATH = f"{ROOT_DIR}/evals/{TOKENIZER}_dev4-uqa-t5-small_evals.csv.gz"

# Dataset-wise metrics avg over all predictions (it will include calibration and correlation metrics)
EVALS_GLOBAL_FILEPATH = f"{ROOT_DIR}/evals/{TOKENIZER}_dev4-uqa-t5-small_evals"
CORR_METRICS_SUFFIX = "correlation_metrics.csv"
CALIB_METRICS_SUFFIX = "calib_metrics.csv"
PERF_METRICS_SUFFIX = "perf_metrics.csv"

# Arguments used to read the files from disk
csv_kwargs = {
    "compression": "gzip"
}

# ----------------------------------------
# Column names
# ----------------------------------------
ID_COLS = ["example_id", "answer_id"]

UNIQUE_ID_COL = ID_COLS[0]
NON_UNIQUE_ID_COL = ID_COLS[1]
print("Using", UNIQUE_ID_COL, "as the unique column to de-duplicate the data")

Using example_id as the unique column to de-duplicate the data


## Load Data and Preds

We expect the data matrix to contain one row per instance, identified by the `ID_COLS` specified above, and to include the following columns (along with others that are not used in this notebook, such as the model inputs, i.e., the Xs):

- `TARGET_LABEL`: the gold text of the example. It should not contain any model-specific preprocessing.
- `TARGET_MULTI_LABELS`: all the annotations available for that example (e.g., in a QA setting, there can be multiple valid answers for the same context-question pair).


We expect the corresponding **predictions** to be identified by the same `ID_COLS` and to include the following column:
- `TARGET_PRED_LABEL`: the predicted text.
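A minimal sketch of the expected schemas and of how predictions are aligned with the data matrix, using made-up toy data (all IDs and texts below are hypothetical). It assumes, as the loading cell does, that the multi-way labels are stored as stringified Python lists, and parses them with `ast.literal_eval`, which only accepts literals. Because there is one prediction per example, `answer_id` is dropped from the predictions so that each prediction is broadcast to every gold annotation sharing its `example_id`.

```python
import ast
import io

import pandas as pd

ID_COLS = ["example_id", "answer_id"]

# Hypothetical toy matrix: example "e1" has two gold annotations, "e2" has one.
matrix_csv = io.StringIO(
    "example_id,answer_id,label,multi_way_labels\n"
    "e1,a1,le docteur pascal,\"['le docteur pascal', 'le doctor pascal']\"\n"
    "e1,a2,le doctor pascal,\"['le docteur pascal', 'le doctor pascal']\"\n"
    "e2,a3,he converts to christianity.,\"['he converts to christianity.']\"\n"
)
# Hypothetical toy predictions: a single row per example.
preds_csv = io.StringIO(
    "example_id,answer_id,preds\n"
    "e1,a1,doctor pascal rougon\n"
    "e2,a3,converts\n"
)

matrix = pd.read_csv(
    matrix_csv, converters={"multi_way_labels": ast.literal_eval}
).set_index(ID_COLS)
preds = pd.read_csv(preds_csv).set_index(ID_COLS)

# Join on example_id only, so every annotation row receives its example's prediction.
data = matrix.join(preds.droplevel("answer_id"), how="left")
print(data)
```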

In [23]:
import pandas as pd
import numpy as np
import yaml

In [24]:
TARGET_LABEL = "label"
TARGET_MULTI_LABELS = "multi_way_labels"

TARGET_PRED_LABEL = "preds"

In [25]:
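import ast

preds = pd.read_csv(PREDS_FILEPATH, **csv_kwargs).set_index(ID_COLS)
print("Loaded", len(preds), "predictions from", PREDS_FILEPATH)

# Parse the stringified lists of annotations with ast.literal_eval, which (unlike eval) only evaluates literals
matrix = pd.read_csv(MATRIX_FILEPATH, **csv_kwargs, converters={TARGET_MULTI_LABELS: ast.literal_eval}).set_index(ID_COLS)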
print("Loaded", len(matrix), "datapoints from", MATRIX_FILEPATH)

assert len(preds) <= len(matrix), "More preds than datapoints: len(preds) > len(matrix)"

DATA = matrix.join(preds.droplevel(NON_UNIQUE_ID_COL), how="left")
DATA.head(3)

Loaded 277 predictions from ../outputs/results/mocha/narrativeqa/dev4/preds/dev4-uqa-t5-small_preds.csv.gz
Loaded 445 datapoints from ../outputs/results/mocha/narrativeqa/dev4/matrix/dev4-uqa-t5-small_preds.csv.gz


| example_id | answer_id | title | context | question | label | multi_way_labels | preds | score_proba | score_proba_geom | score_proba_arithm | score_proba_std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2e6b688ad84546d681f1339df539a0b2 | 618baa6d3449a6d9171cd39bf204e8c9 | narrativeqa | the plot centres on the neurotic young priest ... | who put serge mouret in a home care? | le docteur pascal | [le docteur pascal, le doctor pascal] | doctor pascal rougon | 0.310916 | 0.846292 | 0.878556 | 0.454899 |
| 2e6b688ad84546d681f1339df539a0b2 | 1b5134ff922b6f3eb0b5557a6afb4036 | narrativeqa | the plot centres on the neurotic young priest ... | who put serge mouret in a home care? | le doctor pascal | [le docteur pascal, le doctor pascal] | doctor pascal rougon | 0.310916 | 0.846292 | 0.878556 | 0.454899 |
| 2aea58b6733f531e127a96752d9db1a4 | 52e62d271a8dbbc887e25f68681794b4 | narrativeqa | the plot revolves around hypatia the pagan phi... | what does raphael do to win victoria's love? | he converts to christianity. | [he converts to christianity.] | converts | 0.390377 | 0.730849 | 0.772606 | 0.330944 |


## Tokenize 

At the moment, we provide two different tokenizations:

- `default`: lowercases the text, removes punctuation and articles (determiners), and normalizes whitespace and single quotes. This method closely follows the QA evaluation strategy in the HuggingFace repository.
- `spacy`: uses the `spacy` library for tokenization.

We apply this tokenization to the `TARGET_LABEL`, `TARGET_MULTI_LABELS`, and `TARGET_PRED_LABEL` columns, storing each tokenized version in a column with the same name plus a `_token` suffix.
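The `tokenizer` module is not shown in this notebook, so the following is only a sketch of what a SQuAD-style `default` tokenizer might look like. It assumes the `normalize_answer` logic of the HuggingFace SQuAD evaluation code (lowercase, strip punctuation, drop the articles `a`/`an`/`the`, collapse whitespace); the single-quote normalization mentioned above is not reproduced. The name `default_tokenizer` and its `tokens` flag mirror the call in the cell below, but the body is an assumption.

```python
import re
import string
from typing import List, Union

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def default_tokenizer(text: Union[str, List[str]], tokens: bool = True):
    """Hypothetical default tokenizer; returns a token list, or the normalized string if tokens=False."""
    # Multi-way label cells hold a list of strings: tokenize each one separately.
    if isinstance(text, list):
        return [default_tokenizer(item, tokens=tokens) for item in text]
    normalized = normalize_answer(text)
    return normalized.split() if tokens else normalized


print(default_tokenizer("The Doctor, Pascal!"))  # ['doctor', 'pascal']
```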


### Apply tokenization

In [26]:
import tokenizer as t

if TOKENIZER == "spacy":
    tokenizer_classpath = t.spacy_tokenizer
elif TOKENIZER == "default":
    tokenizer_classpath = t.default_tokenizer
else:
    raise ValueError(f"Unrecognized tokenizer value: {TOKENIZER}")

tokenizer_params = {
    "tokens": True
} 

for _col in (TARGET_LABEL, TARGET_PRED_LABEL, TARGET_MULTI_LABELS):
    print("Applying tokenization to col", _col)
    DATA[f"{_col}{t.TOKENIZATION_SUFFIX}"] = DATA[_col].apply(tokenizer_classpath, **tokenizer_params)

with open(TOKENIZER_FILEPATH, "w") as f:
    yaml.safe_dump({
        "tokenizer_classpath": tokenizer_classpath.__name__,
        "tokenizer_params": tokenizer_params,
    }, f)

Applying tokenization to col label
Applying tokenization to col preds
Applying tokenization to col multi_way_labels


## Compute metrics

We'll resort to HuggingFace's `datasets` builtin [metrics](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Metric) library. This gives us more flexibility and leaves us with less code to maintain.

The downside is efficiency: we iterate over the whole dataset `M` times, where `M` is the number of metrics to compute. Since the metrics are independent, they could be computed in parallel; our _pipeline_ implementation should be easy to adapt to multithreading.
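Since each metric is independent, one way to parallelize is to submit one task per metric to a pool. The sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor`; `compute_metrics_parallel` and the row-wise metric signature are hypothetical and not the pipeline's actual API. Threads only give a speedup when the metric code releases the GIL; for pure-Python metrics, `ProcessPoolExecutor` (with picklable, module-level functions) is usually the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

import pandas as pd


def compute_metrics_parallel(
    data: pd.DataFrame,
    metrics: Dict[str, Callable[[pd.Series], float]],
    max_workers: int = 4,
) -> pd.DataFrame:
    """Apply each row-wise metric in its own task; one output column per metric."""

    def run(item):
        name, fn = item
        # Each task still iterates over the whole dataset once.
        return name, data.apply(fn, axis=1)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(run, metrics.items()))
    return pd.DataFrame(results, index=data.index)


toy = pd.DataFrame({"label": ["a b", "c"], "preds": ["a b", "d"]})
out = compute_metrics_parallel(toy, {
    "exact_match": lambda row: int(row["label"] == row["preds"]),
    "pred_len": lambda row: len(row["preds"].split()),
})
print(out)
```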

To use standard metrics based on word overlap (e.g., `precision`, `recall`, and `f1-score`), we need to create our own methods. We'll use the implementation available in [`transformers/data/metrics/squad_metrics.py`](https://github.com/huggingface/transformers/blob/master/src/transformers/data/metrics/squad_metrics.py).
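A minimal sketch of token-overlap precision, recall, and F1, modeled on `compute_f1` in the SQuAD metrics script linked above. It takes already-tokenized inputs; the function names are hypothetical. On the examples shown in this notebook it matches the reported values: gold `he converts to christianity` vs. prediction `converts` yields precision 1.0, recall 0.25, and F1 0.4.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def overlap_prf(gold_tokens: List[str], pred_tokens: List[str]) -> Tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 (SQuAD-style, bag-of-tokens)."""
    # SQuAD convention: if either side is empty, score 1.0 only when both are empty.
    if not gold_tokens or not pred_tokens:
        score = float(gold_tokens == pred_tokens)
        return score, score, score
    # Multiset intersection counts repeated tokens at most min(count_gold, count_pred) times.
    common = Counter(gold_tokens) & Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_over_references(references: Sequence[List[str]], pred_tokens: List[str]) -> Tuple[float, float, float]:
    """Multi-way annotations: keep the (P, R, F1) triple of the reference with the highest F1."""
    return max((overlap_prf(ref, pred_tokens) for ref in references), key=lambda prf: prf[2])


print(overlap_prf(["he", "converts", "to", "christianity"], ["converts"]))
```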

---

Our evaluation pipeline supports the following metrics:

- **performance metrics**: instance-level metrics such as `precision`, `recall`, `bleu`, and `rougeL`, among others.
- **correlation metrics**: the correlation between specified pairs of columns, namely `pearsonr`, `spearmanr`, and `kendall_tau`, along with their p-values.
- **calibration metrics**: metrics such as `equal_width_ece`, `log_loss`, and `brier_score`, among others.
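For reference, the three correlation coefficients can be computed with NumPy and pandas alone. In this sketch (the `correlations` helper is hypothetical), Spearman's rho is the Pearson correlation of average ranks, and Kendall's tau is the tie-corrected tau-b over all pairs. It omits p-values; `scipy.stats.pearsonr`, `spearmanr`, and `kendalltau` return both the coefficient and its p-value.

```python
import numpy as np
import pandas as pd


def correlations(x, y) -> dict:
    """Pearson, Spearman, and Kendall tau-b between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    pearson = float(np.corrcoef(x, y)[0, 1])

    # Spearman's rho: Pearson correlation of the ranks (ties get their average rank).
    rx = pd.Series(x).rank(method="average").to_numpy()
    ry = pd.Series(y).rank(method="average").to_numpy()
    spearman = float(np.corrcoef(rx, ry)[0, 1])

    # Kendall's tau-b: (concordant - discordant) over all pairs i < j, with tie correction.
    iu = np.triu_indices(len(x), k=1)
    dx = np.sign(x[:, None] - x[None, :])[iu]
    dy = np.sign(y[:, None] - y[None, :])[iu]
    n_pairs = len(dx)
    denom = np.sqrt((n_pairs - np.sum(dx == 0)) * (n_pairs - np.sum(dy == 0)))
    kendall = float(np.sum(dx * dy) / denom)

    return {"pearsonr": pearson, "spearmanr": spearman, "kendall_tau": kendall}


print(correlations([1, 2, 3, 4], [1, 3, 2, 4]))
```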

In [27]:
import metrics as m


### Performance metrics

Based on the metric names specified by the user, we dispatch to the appropriate methods. Since different metrics require different types of inputs, the user can also specify which columns each metric should use.

These metrics are computed at the instance level (for each example in the dataset).
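A minimal sketch of name-based dispatch with per-metric column overrides. The registry, `compute_instance_metrics`, and the default column names (which assume the `_token` suffix described above) are hypothetical; the actual `m.PerformanceMetrics` API may differ. Exact match is used only to keep the example short.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

import pandas as pd


def exact_match(gold: List[str], pred: List[str]) -> int:
    return int(gold == pred)


def best_exact_match(golds: List[List[str]], pred: List[str]) -> int:
    # Multi-way annotations: correct if the prediction matches any gold annotation.
    return max((int(gold == pred) for gold in golds), default=0)


# Hypothetical registry: metric name -> (function, default input columns).
METRIC_REGISTRY: Dict[str, Tuple[Callable, Tuple[str, ...]]] = {
    "exact_match": (exact_match, ("label_token", "preds_token")),
    "multi_exact_match": (best_exact_match, ("multi_way_labels_token", "preds_token")),
}


def compute_instance_metrics(
    data: pd.DataFrame,
    names: Iterable[str],
    columns: Optional[Dict[str, Sequence[str]]] = None,
) -> pd.DataFrame:
    """Compute each named metric row by row, using user-supplied columns when given."""
    columns = columns or {}
    results = {}
    for name in names:
        if name not in METRIC_REGISTRY:
            raise ValueError(f"Unknown metric: {name}")
        fn, default_cols = METRIC_REGISTRY[name]
        cols = columns.get(name, default_cols)
        # Zip the selected columns so fn receives one value per column for each row.
        results[name] = [fn(*args) for args in zip(*(data[c] for c in cols))]
    return pd.DataFrame(results, index=data.index)


toy = pd.DataFrame({
    "label_token": [["a"], ["b"]],
    "preds_token": [["a"], ["c"]],
    "multi_way_labels_token": [[["a"]], [["x"], ["c"]]],
})
print(compute_instance_metrics(toy, ["exact_match", "multi_exact_match"]))
```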

In [28]:
perf_metric = m.PerformanceMetrics(TARGET_LABEL, TARGET_PRED_LABEL, TARGET_MULTI_LABELS)
perf_results = perf_metric.compute(DATA)
perf_results.head()

[nltk_data] Downloading package wordnet to /home/kat/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


| example_id | answer_id | exact_match | first_error_position | precision | recall | f1_score | csi | rouge1 | rouge2 | rougeL | rougeLsum | meteor | metric_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2e6b688ad84546d681f1339df539a0b2 | 618baa6d3449a6d9171cd39bf204e8c9 | 0 | 0.0 | 0.333333 | 0.333333 | 0.333333 | 0.2 | 0.333333 | 0.0 | 0.333333 | 0.333333 | 0.166667 | performance |
| 2e6b688ad84546d681f1339df539a0b2 | 1b5134ff922b6f3eb0b5557a6afb4036 | 0 | 0.0 | 0.666667 | 0.666667 | 0.666667 | 0.5 | 0.666667 | 0.5 | 0.666667 | 0.666667 | 0.625 | performance |
| 2aea58b6733f531e127a96752d9db1a4 | 52e62d271a8dbbc887e25f68681794b4 | 0 | 0.0 | 1.0 | 0.25 | 0.4 | 0.25 | 0.4 | 0.0 | 0.4 | 0.4 | 0.135135 | performance |
| a9bf0ee9eac598663c3f2c9a9908b1e6 | 2093f3df3379e16364141294a8c9a062 | 0 | 0.0 | 0.666667 | 1.0 | 0.8 | 0.666667 | 0.8 | 0.666667 | 0.8 | 0.8 | 0.892857 | performance |
| a9bf0ee9eac598663c3f2c9a9908b1e6 | 560c5cb1b644704520449f966d12394a | 0 | 2.0 | 0.666667 | 0.666667 | 0.666667 | 0.5 | 0.666667 | 0.5 | 0.666667 | 0.666667 | 0.625 | performance |


#### Dump instance-wise metrics to filepath

In [29]:
perf_results.to_csv(EVALS_FILEPATH, **csv_kwargs)

## Global metrics (or Dataset-wise metrics)


These metrics include **correlation** and **calibration** metrics, as well as the mean values for the **performance metrics** we computed before.

In [32]:
SCORE_COLS = [
    "score_proba",
    # Add other normalized scores. We assume these columns are
    # normalized to [0, 1]; consider renormalizing them before
    # running this notebook.
    "score_proba_arithm",
    "score_proba_geom",
    "score_proba_std",
    # ...
]

# Validate that every score lies in [0, 1] (Series.min/max skip NaNs)
for col in SCORE_COLS:
    assert DATA[col].min() >= 0, f"{col} has values below 0"
    assert DATA[col].max() <= 1, f"{col} has values above 1"


GLOBAL_METRICS = DATA[SCORE_COLS].copy()
GLOBAL_METRICS = GLOBAL_METRICS.join(perf_results, how="left")

### Filter duplicates

With multi-way annotations, the same example (context-question pair) can have several gold annotations. As is standard practice, we drop the duplicates, keeping only the annotation that achieves the highest metric values.


So far, we have sorted the rows in descending order of `exact_match` and then `f1_score`, and kept only the first row for each example. This guarantees that we keep the annotation with the highest `exact_match`, with ties broken by `f1_score`. Change `REFERENCE_METRICS` below to adopt a different sorting criterion.

**Note**: Be mindful when using multiple metrics: this code does not support metrics with opposite senses and assumes that **higher values of `REFERENCE_METRICS` are better**.
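On toy data, the de-duplication works as follows (IDs and scores are made up, loosely modeled on the first example in the tables above): rows are sorted best-first, every later occurrence of an `example_id` is flagged as a duplicate, and the resulting mask is mapped back onto the original row order before filtering, so the surviving rows keep their original order.

```python
import pandas as pd

REFERENCE_METRICS = ["exact_match", "f1_score"]

# Hypothetical toy data: example "e1" has two gold annotations.
df = pd.DataFrame({
    "example_id": ["e1", "e1", "e2"],
    "answer_id": ["a1", "a2", "a3"],
    "exact_match": [0, 0, 0],
    "f1_score": [0.333333, 0.666667, 0.4],
})

# Best-first order: the first occurrence of each example_id is its best annotation.
ranked = df.sort_values(REFERENCE_METRICS, ascending=False)
keep = ~ranked.duplicated("example_id")

# Align the mask with df's index explicitly instead of relying on implicit reindexing.
unique = df[keep.reindex(df.index)].set_index(["example_id", "answer_id"])
print(unique)
```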

In [33]:
REFERENCE_METRICS = ["exact_match", "f1_score"]

# -----------------------------------------------------------------
GLOBAL_METRICS = GLOBAL_METRICS.reset_index()
print("Before de-duplication of data:", len(GLOBAL_METRICS))

# Sort best-first so the first occurrence of each example is the best one, then
# align the "keep" mask with the original index (avoids pandas' implicit
# boolean-key reindexing warning)
_temp = GLOBAL_METRICS.sort_values(REFERENCE_METRICS, ascending=False)
_keep = ~_temp.duplicated(UNIQUE_ID_COL)
GLOBAL_METRICS_UNIQUE = GLOBAL_METRICS[_keep.reindex(GLOBAL_METRICS.index)].set_index(ID_COLS)

print("After de-duplication of data:", len(GLOBAL_METRICS_UNIQUE))
# -----------------------------------------------------------------
GLOBAL_METRICS_UNIQUE

Before de-duplication of data: 445
After de-duplication of data: 277


| example_id | answer_id | score_proba | score_proba_arithm | score_proba_geom | score_proba_std | exact_match | first_error_position | precision | recall | f1_score | csi | rouge1 | rouge2 | rougeL | rougeLsum | meteor | metric_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2e6b688ad84546d681f1339df539a0b2 | 1b5134ff922b6f3eb0b5557a6afb4036 | 0.310916 | 0.878556 | 0.846292 | 0.454899 | 0 | 0.0 | 0.666667 | 0.666667 | 0.666667 | 0.500000 | 0.666667 | 0.500000 | 0.666667 | 0.666667 | 0.625000 | performance |
| 2aea58b6733f531e127a96752d9db1a4 | 52e62d271a8dbbc887e25f68681794b4 | 0.390377 | 0.772606 | 0.730849 | 0.330944 | 0 | 0.0 | 1.000000 | 0.250000 | 0.400000 | 0.250000 | 0.400000 | 0.000000 | 0.400000 | 0.400000 | 0.135135 | performance |
| a9bf0ee9eac598663c3f2c9a9908b1e6 | 2093f3df3379e16364141294a8c9a062 | 0.228876 | 0.909880 | 0.884369 | 0.381095 | 0 | 0.0 | 0.666667 | 1.000000 | 0.800000 | 0.666667 | 0.800000 | 0.666667 | 0.800000 | 0.800000 | 0.892857 | performance |
| f485d3509a0606a7b570cc5f2edbd083 | e2fd235719de9140057fdb4e61e930e9 | 0.310726 | 0.725339 | 0.677318 | 0.448770 | 1 |  | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.937500 | performance |
| 921d4e8c8c14e3b385bb80546b792154 | b1a98f8723f1df6c2a71c2ce80f8af52 | 0.787864 | 0.928995 | 0.923600 | 0.364561 | 1 |  | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 | 1.000000 | 0.500000 | performance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| cb25a066e09c116f9519a5a77e9a0b5f | b0b95faf20f3d7991d8bb09a1e08a4ae | 0.701743 | 0.839303 | 0.837701 | 0.051829 | 1 |  | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 | 1.000000 | 0.500000 | performance |
| 92cf35db62a8b8b885ae6bd88b74796e | 83eeb53928af196b2521ec80c13d3594 | 0.299900 | 0.886838 | 0.841942 | 0.423348 | 0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | performance |
| 3215cbad5038466d27701023ed9b1425 | b07f2a24dd9d08626ce39fbc31afa97c | 0.670925 | 0.941953 | 0.935648 | 0.101772 | 1 |  | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.981481 | performance |
| 69fb15c137edf9c225875ffa7112906a | e23c2e8728e615c361a843ec13823480 | 0.344851 | 0.866389 | 0.837411 | 0.452246 | 0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | performance |


### Correlation metrics

In [34]:
corr_metrics = []
for score_col in SCORE_COLS:
    score_col_results = m.CorrelationMetric(score_col).compute(GLOBAL_METRICS_UNIQUE, GLOBAL_METRICS_UNIQUE.columns)
    score_col_results = score_col_results.dropna()
    corr_metrics.append(score_col_results)
    
corr_metrics = pd.concat(corr_metrics, axis=0)
corr_metrics



|  | x | y | pearsonr | pearsonr_pvalue | spearmanr | spearmanr_pvalue | kendall_tau | kendall_tau_pvalue | metric_type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | score_proba_arithm | score_proba | 0.733841 | 4.322087e-48 | 0.807135 | 6.803368e-65 | 0.616544 | 8.215224e-53 | correlation |
| 1 | score_proba_geom | score_proba | 0.764906 | 1.8497e-54 | 0.835253 | 2.137268e-73 | 0.646 | 8.401534e-58 | correlation |
| 2 | score_proba_std | score_proba | -0.43479 | 3.344793e-14 | -0.401213 | 3.89021e-12 | -0.28907 | 7.428613e-13 | correlation |
| 3 | exact_match | score_proba | 0.535293 | 6.135915e-22 | 0.532466 | 1.102286e-21 | 0.43554 | 9.072484e-19 | correlation |
| 4 | precision | score_proba | 0.467091 | 2.040065e-16 | 0.540614 | 2.007397e-22 | 0.409588 | 1.47297e-19 | correlation |
| 5 | recall | score_proba | 0.36604 | 3.299666e-10 | 0.399931 | 4.617043e-12 | 0.290838 | 4.693688e-11 | correlation |
| 6 | f1_score | score_proba | 0.435757 | 2.893437e-14 | 0.480735 | 2.002479e-17 | 0.351592 | 7.32254e-16 | correlation |
| 7 | csi | score_proba | 0.476991 | 3.825503e-17 | 0.480656 | 2.03019e-17 | 0.351695 | 7.747786e-16 | correlation |
| 8 | rouge1 | score_proba | 0.435757 | 2.893437e-14 | 0.480735 | 2.002479e-17 | 0.351592 | 7.32254e-16 | correlation |
| 9 | rouge2 | score_proba | 0.021278 | 0.7244126 | -0.046629 | 0.4395413 | -0.025429 | 0.581251 | correlation |


In [35]:
corr_metrics.to_csv(f"{EVALS_GLOBAL_FILEPATH}_{CORR_METRICS_SUFFIX}", index=False)

In [36]:
corr_metrics[corr_metrics.x == "f1_score"]

|  | x | y | pearsonr | pearsonr_pvalue | spearmanr | spearmanr_pvalue | kendall_tau | kendall_tau_pvalue | metric_type |
|---|---|---|---|---|---|---|---|---|---|
| 6 | f1_score | score_proba | 0.435757 | 2.893437e-14 | 0.480735 | 2.002479e-17 | 0.351592 | 7.32254e-16 | correlation |
| 6 | f1_score | score_proba_arithm | 0.474665 | 5.695816e-17 | 0.493075 | 2.239575e-18 | 0.366364 | 4.319503e-17 | correlation |
| 6 | f1_score | score_proba_geom | 0.475979 | 4.5507e-17 | 0.490831 | 3.357864e-18 | 0.364001 | 6.845677e-17 | correlation |
| 6 | f1_score | score_proba_std | -0.609337 | 1.51761e-29 | -0.668029 | 3.625402e-37 | -0.508183 | 2.113587e-31 | correlation |


### Calibration metrics


Among the calibration metrics, we have the expected calibration error (ECE), the Brier score, and AUC, which together quantify absolute and relative calibration.
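The `metrics` module is not shown here, so the sketch below uses textbook definitions rather than the module's own. ECE bins the scores (equal-width or equal-frequency bins) and averages, weighted by bin size, the gap between the mean score and the mean target in each bin; for binary targets, the Brier score is the mean squared error between score and outcome. The function names and the `strategy` parameter are hypothetical; `n_bins=20` matches the `hyperparams` reported in the tables below, while `frac` is not modeled.

```python
import numpy as np


def expected_calibration_error(scores, targets, n_bins: int = 20, strategy: str = "width") -> float:
    """ECE = sum over bins of (bin size / N) * |mean(score) - mean(target)|."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if strategy == "width":
        edges = np.linspace(0.0, 1.0, n_bins + 1)  # equal-width bins over [0, 1]
    elif strategy == "freq":
        edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))  # equal-frequency bins
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    # Interior edges only, so every score falls into one of bins 0..n_bins-1.
    bin_ids = np.digitize(scores, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(scores[mask].mean() - targets[mask].mean())
    return float(ece)


def brier_score(scores, targets) -> float:
    """Mean squared error between predicted probability and binary outcome."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((scores - targets) ** 2))


print(expected_calibration_error([0.9] * 4, [0, 0, 0, 0]))  # overconfident: 0.9
```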


In [37]:
CALIB_METRICS = ["exact_match", "f1_score", "precision", "recall"]

In [38]:
calib_metrics = []
for calib_metric in CALIB_METRICS:
    calib_results = m.CalibrationMetrics(calib_metric).compute(GLOBAL_METRICS_UNIQUE, SCORE_COLS)
    calib_results = calib_results.dropna()
    
    calib_metrics.append(calib_results)
    
calib_metrics = pd.concat(calib_metrics, axis=0)
calib_metrics

|  | x | y | mse | mae | ce_avg | ce_std | ECE_eq_width | ECE_eq_width_max | ECE_eq_freq | ECE_eq_freq_max | hyperparams | metric_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | exact_match | score_proba | 0.175117 | 0.334808 | -0.006059 | 0.418426 | 0.093206 | 0.013949 | 0.046076 | 0.010041 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 1 | exact_match | score_proba_arithm | 0.383502 | 0.50237 | 0.419268 | 0.455759 | 0.419268 | 0.097651 | 0.419268 | 0.064955 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 2 | exact_match | score_proba_geom | 0.361276 | 0.489918 | 0.397622 | 0.450746 | 0.397622 | 0.082359 | 0.397622 | 0.063269 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 3 | exact_match | score_proba_std | 0.362034 | 0.552373 | -0.137693 | 0.585726 | 0.381762 | 0.093478 | 0.388238 | 0.091278 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 0 | f1_score | score_proba | 0.181294 | 0.359875 | -0.219097 | 0.36509 | 0.23523 | 0.04227 | 0.219281 | 0.041215 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 1 | f1_score | score_proba_arithm | 0.160646 | 0.291859 | 0.206229 | 0.34368 | 0.206229 | 0.041526 | 0.206229 | 0.033233 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 2 | f1_score | score_proba_geom | 0.149603 | 0.282785 | 0.184583 | 0.3399 | 0.184583 | 0.033007 | 0.184583 | 0.031289 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 3 | f1_score | score_proba_std | 0.340321 | 0.500195 | -0.350731 | 0.466164 | 0.404875 | 0.093478 | 0.406999 | 0.091935 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 0 | precision | score_proba | 0.213671 | 0.396753 | -0.284156 | 0.36459 | 0.297761 | 0.040721 | 0.284156 | 0.050071 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 1 | precision | score_proba_arithm | 0.150331 | 0.271482 | 0.141171 | 0.361112 | 0.144934 | 0.026296 | 0.143023 | 0.026895 | {'n_bins': 20, 'frac': 0.1} | calibration |


In [39]:
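# Save the per-(metric, score) calibration results; .describe() would keep only summary statistics and drop the x/y labels
calib_metrics.to_csv(f"{EVALS_GLOBAL_FILEPATH}_{CALIB_METRICS_SUFFIX}", index=False)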

In [44]:
calib_metrics[calib_metrics.x == "f1_score"]

|  | x | y | mse | mae | ce_avg | ce_std | ECE_eq_width | ECE_eq_width_max | ECE_eq_freq | ECE_eq_freq_max | hyperparams | metric_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f1_score | score_proba | 0.181294 | 0.359875 | -0.219097 | 0.36509 | 0.23523 | 0.04227 | 0.219281 | 0.041215 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 1 | f1_score | score_proba_arithm | 0.160646 | 0.291859 | 0.206229 | 0.34368 | 0.206229 | 0.041526 | 0.206229 | 0.033233 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 2 | f1_score | score_proba_geom | 0.149603 | 0.282785 | 0.184583 | 0.3399 | 0.184583 | 0.033007 | 0.184583 | 0.031289 | {'n_bins': 20, 'frac': 0.1} | calibration |
| 3 | f1_score | score_proba_std | 0.340321 | 0.500195 | -0.350731 | 0.466164 | 0.404875 | 0.093478 | 0.406999 | 0.091935 | {'n_bins': 20, 'frac': 0.1} | calibration |


### Performance metrics (dataset-wise)

In [40]:
# numeric_only=True skips the non-numeric metric_type column (required in pandas >= 2.0)
global_perf = (
    pd.DataFrame(GLOBAL_METRICS_UNIQUE.mean(numeric_only=True), columns=["metric_avg"]),
    pd.DataFrame(GLOBAL_METRICS_UNIQUE.std(numeric_only=True), columns=["metric_std"]),
)
# GLOBAL_METRICS_UNIQUE[~GLOBAL_METRICS_UNIQUE.first_error_position.isna()].mean(numeric_only=True)
global_perf = pd.concat(global_perf, axis=1)
global_perf


|  | metric_avg | metric_std |
|---|---|---|
| score_proba | 0.419934 | 0.290176 |
| score_proba_arithm | 0.845261 | 0.10941 |
| score_proba_geom | 0.823615 | 0.129196 |
| score_proba_std | 0.2883 | 0.122059 |
| exact_match | 0.425993 | 0.495388 |
| first_error_position | 0.553459 | 1.01662 |
| precision | 0.70409 | 0.395493 |
| recall | 0.632634 | 0.394394 |
| f1_score | 0.639032 | 0.382494 |
| csi | 0.580936 | 0.404067 |


In [41]:
# Keep the index: it holds the metric names that label each avg/std row
global_perf.to_csv(f"{EVALS_GLOBAL_FILEPATH}_{PERF_METRICS_SUFFIX}", index_label="metric")