Compute "chance agreement" baseline  #49

@FrancescoCasalegno

Background

The following idea was inspired by the term p_e ("probability of chance agreement") in Cohen's Kappa definition.
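
For reference, Cohen's kappa is usually written as below, where p_o is the observed agreement between two raters and p_e is the agreement expected by chance. The counts n_k^{(1)} and n_k^{(2)} (the number of samples given label k by the first and second rater) are notation introduced here for illustration and do not appear elsewhere in this issue.

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \sum_{k=1}^{K} \frac{n_k^{(1)}}{N} \cdot \frac{n_k^{(2)}}{N}
$$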

Formula for "chance agreement"

For a (multi-class) classification problem, consider the vector of ground-truth labels y_true. If we assume that the dataset accurately represents the proportions of each label, then the probability that any given sample has label k (for k in 1...K) is:

$$
p_k = \frac{n_k}{N}
$$

where n_k is the number of samples in y_true with label equal to k, and N is the total number of samples in y_true.

Based on this observation, consider a model that produces y_pred by assigning to each sample, independently of the other samples and of the true label, a random label drawn according to the observed label frequencies. The predicted label of the i-th sample, y_pred[i], is then distributed as

$$
P(\texttt{y\_pred}[i] = k) = \frac{n_k}{N} = p_k, \qquad k = 1, \dots, K
$$

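Then, the probability of the event y_true[i] == y_pred[i] is computed using the Law of Total Probability, together with the independence of y_pred[i] from y_true[i], as

$$
P(\texttt{y\_true}[i] = \texttt{y\_pred}[i]) = \sum_{k=1}^{K} P(\texttt{y\_pred}[i] = k \mid \texttt{y\_true}[i] = k) \, P(\texttt{y\_true}[i] = k) = \sum_{k=1}^{K} \left( \frac{n_k}{N} \right)^2
$$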
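
As a minimal sketch of how this quantity could be computed from y_true alone, the snippet below sums the squared label frequencies. The function name `chance_agreement` is hypothetical and is not part of the existing codebase; how it would be wired into make_performance_table() is left open.

```python
from collections import Counter


def chance_agreement(y_true):
    """Return the sum over labels k of (n_k / N) ** 2.

    Hypothetical helper: this is the expected accuracy of a model that
    draws each prediction at random from the label frequencies in y_true.
    """
    n_total = len(y_true)
    if n_total == 0:
        raise ValueError("y_true must contain at least one label")
    # n_k for every distinct label k found in y_true
    counts = Counter(y_true)
    return sum((n_k / n_total) ** 2 for n_k in counts.values())


# Example: 3 samples of class "a" and 1 of class "b"
# (3/4)^2 + (1/4)^2 = 0.625
print(chance_agreement(["a", "a", "a", "b"]))
```

For a perfectly balanced dataset with K classes the value reduces to 1/K, and it grows toward 1 as the label distribution becomes more imbalanced, which is what makes it useful as a baseline next to raw accuracy.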

Actions

  • Implement "chance accuracy" as a metric in make_performance_table()
