General Notions

Inter-rater agreement (also known as inter-rater reliability) is a measure of the consensus among n raters in the classification of N objects into k different categories.

In the general case, the rater evaluations can be represented by the reliability data matrix: an n \times N-matrix R such that R[i,j] stores the category selected by the i-th rater for the j-th object.

A more succinct representation is provided by an N \times k-matrix C whose element C[i,j] accounts for how many raters evaluated the i-th object as belonging to the j-th category. This matrix is the classification matrix.

Whenever the number of raters is 2, i.e., n=2, the rater evaluations can be represented by the agreement matrix: a k \times k-matrix A such that A[i,j] stores the number of objects classified as belonging to the i-th category by the first rater and to the j-th category by the second rater.
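
As a concrete example, the three representations can be derived from one another. The following sketch (plain Python with NumPy, independent of any specific library; the array names simply mirror the notation above) builds the classification matrix and, for two raters, the agreement matrix from a reliability data matrix whose entries are category indices 0, ..., k-1.

.. code-block:: python

    import numpy as np

    # Reliability data matrix R: n raters (rows) x N objects (columns);
    # each entry is a category index in {0, ..., k-1}.
    R = np.array([[0, 1, 2, 0],
                  [0, 1, 1, 0]])
    n, N = R.shape
    k = R.max() + 1

    # Classification matrix C: C[i, j] counts how many raters assigned
    # the i-th object to the j-th category.
    C = np.zeros((N, k), dtype=int)
    for rater in range(n):
        for obj in range(N):
            C[obj, R[rater, obj]] += 1

    # Agreement matrix A (meaningful only when n == 2): A[i, j] counts the
    # objects put in the i-th category by the first rater and in the j-th
    # category by the second one.
    A = np.zeros((k, k), dtype=int)
    for obj in range(N):
        A[R[0, obj], R[1, obj]] += 1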

Bennett, Alpert and Goldstein's S

Bennett, Alpert and Goldstein's S is an inter-rater agreement measure on a nominal scale (see :cite:`bennettS` and :cite:`bennettS2`). It is defined as:

S \stackrel{\tiny\text{def}}{=} \frac { k * P_0 - 1 } { k - 1 }

where P_0 is the probability of agreement among the raters and k is the number of different categories in the classification.
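
For two raters, P_0 can be read off the agreement matrix as the fraction of objects on which the raters coincide, i.e., the normalized trace of A. A minimal sketch of the computation (the function name is illustrative and not part of any library API):

.. code-block:: python

    import numpy as np

    def bennett_s(A):
        """Bennett, Alpert and Goldstein's S from a k x k agreement matrix."""
        A = np.asarray(A, dtype=float)
        k = A.shape[0]
        p_0 = np.trace(A) / A.sum()      # observed agreement probability
        return (k * p_0 - 1) / (k - 1)

    print(bennett_s([[3, 1], [1, 5]]))   # 2 categories, 10 objects -> 0.6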

Bangdiwala's B

Bangdiwala's B is an inter-rater agreement measure on a nominal scale (see :cite:`BangdiwalaB`). It is defined as:

B \stackrel{\tiny\text{def}}{=} \frac{\sum_{i} A[i,i]^2}{\sum_{i} A_{i\cdot}*A_{\cdot{}i}}

where A_{i\cdot} and A_{\cdot{}i} are the sums of the elements in the i-th row and i-th column of the matrix A, respectively.
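
A direct transcription of this definition in plain NumPy (an illustrative sketch, not a library function):

.. code-block:: python

    import numpy as np

    def bangdiwala_b(A):
        """Bangdiwala's B from a k x k agreement matrix."""
        A = np.asarray(A, dtype=float)
        row_sums = A.sum(axis=1)         # A_{i.}
        col_sums = A.sum(axis=0)         # A_{.i}
        return (np.diag(A) ** 2).sum() / (row_sums * col_sums).sum()

    print(bangdiwala_b([[3, 1], [1, 5]]))   # -> 34/52, about 0.654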

Cohen's Kappa

Cohen's \kappa is an inter-rater agreement measure on a nominal scale (see :cite:`CohenK`). It is defined as:

\kappa \stackrel{\tiny\text{def}}{=} \frac{P_0-P_e}{1-P_e}

where P_0 is the probability of agreement among the raters and P_e is the probability of agreement expected by chance.
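
For two raters, P_0 is the normalized trace of the agreement matrix, while P_e is the sum over the categories of the products of the two raters' marginal proportions. A minimal sketch (illustrative code):

.. code-block:: python

    import numpy as np

    def cohen_kappa(A):
        """Cohen's kappa from a k x k agreement matrix."""
        A = np.asarray(A, dtype=float)
        total = A.sum()
        p_0 = np.trace(A) / total
        # chance agreement: products of the raters' marginal proportions
        p_e = ((A.sum(axis=1) / total) * (A.sum(axis=0) / total)).sum()
        return (p_0 - p_e) / (1 - p_e)

    print(cohen_kappa([[3, 1], [1, 5]]))   # -> about 0.583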

Scott's Pi

Scott's \pi is an inter-rater agreement measure on a nominal scale (see :cite:`ScottPi`). Like Cohen's \kappa, it is defined as:

\pi \stackrel{\tiny\text{def}}{=} \frac{P_0-P_e}{1-P_e}

where P_0 is the probability of agreement among the raters (as in Cohen's \kappa) and P_e is the sum of the squared joint proportions, the joint proportions being the arithmetic means of the marginal proportions (in Cohen's \kappa, P_e is instead the sum of the squared geometric means of the marginal proportions).
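
The following sketch computes Scott's \pi from an agreement matrix by building the joint proportions as the arithmetic means of the two marginal proportions (illustrative code, not a library API):

.. code-block:: python

    import numpy as np

    def scott_pi(A):
        """Scott's pi from a k x k agreement matrix."""
        A = np.asarray(A, dtype=float)
        total = A.sum()
        p_0 = np.trace(A) / total
        # joint proportions: arithmetic means of the marginal proportions
        joint = (A.sum(axis=1) + A.sum(axis=0)) / (2 * total)
        p_e = (joint ** 2).sum()
        return (p_0 - p_e) / (1 - p_e)

    print(scott_pi([[3, 1], [1, 5]]))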

Yule's Y

Yule's Y (see :cite:`YuleY`), sometimes called the coefficient of colligation, measures the relation between two binary random variables (i.e., it can be computed exclusively on 2 \times 2 agreement matrices). It is defined as:

Y \stackrel{\tiny\text{def}}{=} \frac{\sqrt{\text{OR}}-1}{\sqrt{\text{OR}}+1}

where \text{OR} is the odds ratio:

\text{OR} \stackrel{\tiny\text{def}}{=} \frac{A[0,0]*A[1,1]}{A[1,0]*A[0,1]}.
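
A minimal sketch for 2 \times 2 agreement matrices; it assumes that all the entries of A are positive, so that the odds ratio is finite and non-zero:

.. code-block:: python

    import numpy as np

    def yule_y(A):
        """Yule's Y (coefficient of colligation) from a 2 x 2 agreement matrix."""
        A = np.asarray(A, dtype=float)
        odds_ratio = (A[0, 0] * A[1, 1]) / (A[1, 0] * A[0, 1])
        return (np.sqrt(odds_ratio) - 1) / (np.sqrt(odds_ratio) + 1)

    print(yule_y([[3, 1], [1, 5]]))   # -> about 0.59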

Fleiss's Kappa

Fleiss's \kappa (see :cite:`fleissK`) is a multi-rater generalization of :ref:`ScottPi_theory`.

If the classifications are represented in a classification matrix (see :ref:`basic_notions`), the proportion of classifications assigned to the j-th category is:

p_j \stackrel{\tiny\text{def}}{=} \frac{1}{N*n}\sum_{i=1}^{N} C[i,j]

and the sum of their squares is:

\bar{P_e} \stackrel{\tiny\text{def}}{=}  \sum_{j=1}^k p_j^2.

Instead, the ratio between the number of rater pairs that agree on the i-th object and the total number of rater pairs is:

P_i \stackrel{\tiny\text{def}}{=} \frac{1}{n*(n-1)}\left(\left(\sum_{j=1}^k C[i,j]^2\right) - n\right)

and its mean is:

\bar{P} \stackrel{\tiny\text{def}}{=} \frac{1}{N}\sum_{i=1}^{N}P_i.

Fleiss's \kappa is defined as:

\kappa \stackrel{\tiny\text{def}}{=} \frac{\bar{P}-\bar{P_e}}{1-\bar{P_e}}.
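
The definition can be transcribed directly from a classification matrix whose rows all sum to the number of raters n (illustrative code, not a library function):

.. code-block:: python

    import numpy as np

    def fleiss_kappa(C):
        """Fleiss's kappa from an N x k classification matrix."""
        C = np.asarray(C, dtype=float)
        N, k = C.shape
        n = C[0].sum()                      # raters per object
        p_j = C.sum(axis=0) / (N * n)       # proportion of classifications per category
        P_e_bar = (p_j ** 2).sum()
        P_i = ((C ** 2).sum(axis=1) - n) / (n * (n - 1))
        P_bar = P_i.mean()
        return (P_bar - P_e_bar) / (1 - P_e_bar)

    # 3 raters, 4 objects, 2 categories
    print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [3, 0]]))   # -> 0.625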

Information Agreement

The Information Agreement (\text{IA}) is an inter-rater agreement measure on a nominal scale (see :cite:`IA2020`) which gauges the dependence between the classifications of two raters.

The probability distributions for the evaluations of the rater \mathfrak{X}, those of the rater \mathfrak{Y}, and the joint evaluations \mathfrak{X}\mathfrak{Y} on the agreement matrix A are:

p_{X_{A}}(j_0) \stackrel{\tiny\text{def}}{=}
\frac{\sum_{i} A[i,j_0]}{\sum_{i}\sum_{j} A[i,j]},
\quad\quad\quad
p_{Y_{A}}(i_0) \stackrel{\tiny\text{def}}{=}
\frac{\sum_{j} A[i_0,j]}{\sum_{i}\sum_{j} A[i,j]},

and

p_{X_{A}Y_{A}}(i_0,j_0) \stackrel{\tiny\text{def}}{=}
\frac{A[i_0,j_0]}{\sum_{i}\sum_{j} A[i,j]},

respectively. The entropy functions for the random variables X_{A}, Y_{A}, and X_{A}Y_{A} are:

H(X_{A}) \stackrel{\tiny\text{def}}{=}
- \sum_{i} p_{X_{A}}(i) \log_2 p_{X_{A}}(i),
\quad\quad\quad
H(Y_{A}) \stackrel{\tiny\text{def}}{=}
- \sum_{j} p_{Y_{A}}(j) \log_2 p_{Y_{A}}(j),

and

H(X_{A}Y_{A}) \stackrel{\tiny\text{def}}{=}
- \sum_{i}\sum_{j} p_{X_{A}Y_{A}}(i,j)
\log_2 p_{X_{A}Y_{A}}(i,j).

The mutual information between the classification of \mathfrak{X} and \mathfrak{Y} is:

I(X_{A},Y_{A}) \stackrel{\tiny\text{def}}{=}
H(X_{A})+H(Y_{A})-H(X_{A}Y_{A}).

The Information Agreement of A is the ratio between I(X_{A},Y_{A}) and the minimum of H(X_{A}) and H(Y_{A}), i.e.,

\text{IA} \stackrel{\tiny\text{def}}{=} \frac{I(X_{A},Y_{A})}
{ \min(H(X_{A}), H(Y_{A})) }.
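
These quantities can be computed directly from the agreement matrix. The sketch below assumes that A contains no zero entries (the case covered by the definition above; matrices with zeros are handled by the extension-by-continuity in the next section) and is illustrative code rather than a library API:

.. code-block:: python

    import numpy as np

    def entropy(p):
        """Shannon entropy (base 2) of a probability vector."""
        return -(p * np.log2(p)).sum()

    def information_agreement(A):
        """IA of a k x k agreement matrix without zero entries."""
        A = np.asarray(A, dtype=float)
        joint = A / A.sum()
        h_x = entropy(joint.sum(axis=0))    # rater X: column marginal
        h_y = entropy(joint.sum(axis=1))    # rater Y: row marginal
        h_xy = entropy(joint.ravel())
        mutual_info = h_x + h_y - h_xy
        return mutual_info / min(h_x, h_y)

    print(information_agreement([[3, 1], [1, 5]]))   # -> about 0.26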

Extension-by-Continuity of IA

\text{IA} was proven to be effective in gauging agreement and it avoids some of the pitfalls of Cohen's \kappa. However, it is not defined for all agreement matrices and, in particular, it cannot be directly computed on agreement matrices containing some zeros (see :cite:`IAc2020`).

The extension-by-continuity of Information Agreement (\text{IA}_{C}) extends \text{IA}'s domain so that it can also deal with matrices containing some zeros (see :cite:`IAc2020`). To achieve this goal, the considered agreement matrix A is replaced by the symbolic matrix A_{\epsilon} defined as:

A_{\epsilon}[i,j] \stackrel{\tiny\text{def}}{=} \begin{cases}
A[i,j] & \textrm{if $A[i,j]\neq 0$}\\
\epsilon &   \textrm{if $A[i,j]=0$}
\end{cases}

where \epsilon is a real variable ranging over the open interval (0, +\infty). The mutual information of the variables X_{A_{\epsilon}} and Y_{A_{\epsilon}} and their entropy functions are defined on this matrix. The extension-by-continuity of Information Agreement of A is the limit, as \epsilon tends to 0 from the right, of the ratio between I(X_{A_{\epsilon}},Y_{A_{\epsilon}}) and the minimum of H(X_{A_{\epsilon}}) and H(Y_{A_{\epsilon}}), i.e.,

\text{IA}_{C}(A) \stackrel{\tiny\text{def}}{=}
\lim_{\epsilon \rightarrow 0^+}
\frac{I(X_{A_{\epsilon}},Y_{A_{\epsilon}})}
{ \min(H(X_{A_{\epsilon}}), H(Y_{A_{\epsilon}})) }.

\text{IA}_{C}(A) was proven to be defined over any non-null agreement matrix having more than one row/column. Moreover, if l and m are the numbers of non-null columns and non-null rows in A, respectively, then:

\text{IA}_{C}(A) = \begin{cases}
1-\frac{m}{k} & \text{if $H(\overline{X_{A}})=0$}\\
1-\frac{l}{k} & \text{if $H(\overline{Y_{A}})=0$}\\
\frac{I(\overline{X_{A}},\overline{Y_{A}})}
{ \min\left(H\left(\overline{X_{A}}\right),
H\left(\overline{Y_{A}}\right)\right) }&\text{otherwise}
\end{cases}

where \overline{X_{A}}, \overline{Y_{A}}, and \overline{X_{A}Y_{A}} are three random variables having the same probability distributions as {X_{A}}, {Y_{A}}, and {X_{A}Y_{A}}, except that the 0-probability events are removed from their domains (see :cite:`IAc2020`).
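
A direct transcription of this closed form (illustrative code; the 0-probability events are removed by restricting A to its non-null rows and columns):

.. code-block:: python

    import numpy as np

    def entropy(p):
        """Shannon entropy (base 2) of a probability vector, skipping zeros."""
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def information_agreement_c(A):
        """IA_C of a non-null k x k agreement matrix, via the closed form above."""
        A = np.asarray(A, dtype=float)
        k = A.shape[0]
        rows = A.sum(axis=1) > 0            # non-null rows
        cols = A.sum(axis=0) > 0            # non-null columns
        m, l = rows.sum(), cols.sum()
        B = A[np.ix_(rows, cols)]           # restriction of A to non-null rows/columns
        joint = B / B.sum()
        h_x = entropy(joint.sum(axis=0))    # entropy of the overlined X variable
        h_y = entropy(joint.sum(axis=1))    # entropy of the overlined Y variable
        if h_x == 0:
            return 1 - m / k
        if h_y == 0:
            return 1 - l / k
        h_xy = entropy(joint.ravel())
        return (h_x + h_y - h_xy) / min(h_x, h_y)

    print(information_agreement_c([[3, 0], [1, 5]]))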

References

.. bibliography:: refs.bib