# Evaluating the user classifier

We evaluate the classifier with $F_{latency}$, as defined in Parapar, Losada 2022 (Overview of eRisk at CLEF 2021: Early Risk Prediction on the Internet (Extended Overview)).

## Loading the classification results

In [1]:
import json
import os

evaluation_dir = "evaluation"

In [2]:
with open(
    os.path.join(evaluation_dir, "classification_results.json"),
    "r"
) as f:
    classification_results = json.load(f)

In [3]:
classification_results[0]

{'user': '00kate00',
 'true_label': 0,
 'total_documents': 5,
 'predicted_label': 1,
 'necessary_documents': 5}

## Defining $F_{latency}$

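$F_{latency}$ combines $F_1$ with a speed term, which is in turn derived from a latency penalty, so we have to define both the penalty and the speed.
Calculating the penalty requires a hyperparameter $p$, which the authors set to $0.0078$.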
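For reference, the definitions implemented in the cells below can be written as follows. The symbols $k$ (the 1-indexed number of documents seen before the decision), $k_u$ (that number for user $u$) and $TP$ (the set of true positive users) are notation introduced here for illustration, not necessarily the notation of the paper.

$$penalty(k) = -1 + \frac{2}{1 + e^{-p(k-1)}}$$

$$speed = 1 - \mathrm{median}\{\, penalty(k_u) : u \in TP \,\}$$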

In [4]:
from myutils.utils import P_PARAMETER
P_PARAMETER

0.0078

In [5]:
import numpy as np

In [6]:
# The penalty depends only on the number of documents seen before taking a decision.
# The count must be 1-indexed, so that a decision on the first document gets penalty 0.
def calculate_penalty(n_documents_seen):
    exponent = -P_PARAMETER*(n_documents_seen-1)
    return -1 + 2/(1 + np.exp(exponent))

In [7]:
calculate_penalty(1)

0.0
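To get a feel for how the penalty grows with the delay, the following sketch re-implements the same formula using only the standard library and evaluates it at a few delays. The function name `penalty` and the chosen delays are illustrative; the value of `P` is copied from the cell above.

```python
import math

P = 0.0078  # value of p used above


def penalty(k, p=P):
    """Latency penalty for a decision taken after k documents (1-indexed)."""
    return -1 + 2 / (1 + math.exp(-p * (k - 1)))


# Print the penalty for increasingly late decisions
for k in (1, 10, 100, 1000):
    print(k, round(penalty(k), 4))
```

The penalty is 0 for a decision on the first document and approaches 1 as the delay grows.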

In [15]:
# speed is calculated considering only the true positives
def calculate_speed(
    classification_results  # list of dicts with true and predicted labels, and documents required for the decision
):
    true_positives = []
    for entry in classification_results:
        if entry["true_label"] == 1 and entry["predicted_label"] == 1:
            true_positives.append(entry)

    # calculate penalties for the true positives
    tp_penalties = []
    for entry in true_positives:
        penalty = calculate_penalty(
            # penalties require 1-indexing,
            # so the evaluation results were saved 1-indexed
            entry["necessary_documents"]
        )
        tp_penalties.append(penalty)
    
    # calculate speed
    median = np.median(tp_penalties)
    print(f"Median penalties for TPs: {median}")
    speed = 1 - median

    return speed

In [16]:
calculate_speed(classification_results)

Median penalties for TPs: 0.019497528750876292


0.9805024712491237

The speed is about $0.98$, close to the maximum of $1$: the median true positive is flagged after only a handful of documents.
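The median penalty can be mapped back to a number of documents by inverting the penalty formula. The sketch below does this; `documents_for_penalty` is a helper name introduced here. Note that with an even number of true positives `np.median` averages the two middle penalties, so in general the result need not be an integer.

```python
import math

P = 0.0078  # value of p used above


def documents_for_penalty(penalty, p=P):
    """Invert penalty(k) = -1 + 2 / (1 + exp(-p * (k - 1))) to recover k."""
    return 1 + math.log((1 + penalty) / (1 - penalty)) / p


median_penalty = 0.019497528750876292  # value printed by calculate_speed above
print(round(documents_for_penalty(median_penalty), 3))
```

For the median penalty printed above, this evaluates to 6, so the median true positive was flagged after 6 documents.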

In [17]:
from sklearn.metrics import f1_score

$F_{latency} = F_1 \cdot speed$

In [18]:
# calculate F_1
def calculate_f_1(
        classification_results
):
    true_labels = [entry["true_label"] for entry in classification_results]
    predicted_labels = [entry["predicted_label"] for entry in classification_results]

    f_1 = f1_score(y_true=true_labels, y_pred=predicted_labels)
    return f_1

In [19]:
calculate_f_1(classification_results)

0.3603805260212647

The $F_1$ is low. Remember that for this test we trained on a downsampled training dataset because of computational limitations.
With the complete training dataset, the $F_1$ should improve significantly.
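To see whether a low $F_1$ comes from precision or from recall, the confusion counts can be computed directly from the result records. The sketch below assumes the record layout shown earlier; `precision_recall_f1` is a helper name introduced here, and `toy_results` is made-up data for illustration, not our actual results.

```python
def precision_recall_f1(results):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for r in results if r["true_label"] == 1 and r["predicted_label"] == 1)
    fp = sum(1 for r in results if r["true_label"] == 0 and r["predicted_label"] == 1)
    fn = sum(1 for r in results if r["true_label"] == 1 and r["predicted_label"] == 0)
    # Guard against empty denominators
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up records, for illustration only
toy_results = [
    {"true_label": 1, "predicted_label": 1},
    {"true_label": 0, "predicted_label": 1},
    {"true_label": 0, "predicted_label": 1},
    {"true_label": 0, "predicted_label": 0},
]
print(precision_recall_f1(toy_results))
```

Calling `precision_recall_f1(classification_results)` on the real records would show which of the two components pulls $F_1$ down.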

In [20]:
# define F-latency
def calculate_f_latency(
        classification_results
):
    f_1 = calculate_f_1(classification_results)
    speed = calculate_speed(classification_results)

    f_latency = f_1 * speed
    return f_latency

Calculating $F_{latency}$ for our classifier:

In [21]:
calculate_f_latency(classification_results)

Median penalties for TPs: 0.019497528750876292


0.3533539963539092
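One edge case is worth noting: if a run produces no true positives, `np.median` receives an empty list and returns `nan`, so the product with $F_1$ is `nan` as well. The following dependency-free sketch shows one way to guard against this. Returning `0.0` in that case is an assumption made here, and both `f_latency` and `toy_results` are illustrative names and made-up data.

```python
import math
from statistics import median

P = 0.0078  # value of p used above


def f_latency(results, p=P):
    """F1 on the positive class, scaled by 1 - median penalty of the true positives."""
    tp = [r for r in results if r["true_label"] == 1 and r["predicted_label"] == 1]
    fp = sum(1 for r in results if r["true_label"] == 0 and r["predicted_label"] == 1)
    fn = sum(1 for r in results if r["true_label"] == 1 and r["predicted_label"] == 0)
    if not tp:
        # Assumption: without true positives F1 is 0, so we report F_latency = 0
        return 0.0
    f1 = 2 * len(tp) / (2 * len(tp) + fp + fn)
    penalties = [
        -1 + 2 / (1 + math.exp(-p * (r["necessary_documents"] - 1))) for r in tp
    ]
    return f1 * (1 - median(penalties))


# Made-up records, for illustration only
toy_results = [
    {"true_label": 1, "predicted_label": 1, "necessary_documents": 1},
    {"true_label": 1, "predicted_label": 0, "necessary_documents": 4},
    {"true_label": 0, "predicted_label": 1, "necessary_documents": 5},
    {"true_label": 0, "predicted_label": 0, "necessary_documents": 3},
]
print(f_latency(toy_results))
```

In the toy data the only true positive is decided on the first document, so the speed is 1 and the result equals the $F_1$ of 0.5.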

# How to interpret this?

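This classifier is quick to take decisions for true positives (speed of about $0.98$), but in general its decisions are not reliable ($F_1$ of about $0.36$). $F_{latency}$ is therefore limited by $F_1$, not by speed.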

## How to improve this?

- Training with the complete training set
- Tuning the hyperparameter
- Trying different dimensionality reduction methods (PCA, UMAP, LDA, etc.) and classification methods (Logistic Regression, GB classifiers, etc.)