In [1]:
import numpy as np
import pandas as pd

Forecasts for a well-defined observed event can be evaluated by analyzing the elements of a 2x2 contingency table. A generic form of the 2x2 contingency table is shown below. Following the notation of Murphy (1988), the observed value for the $i^{th}$ case is denoted $x_i = 1$ (or $obs=yes$) when the event was observed to occur and $x_i = 0$ (or $obs=no$) when it did not. Forecasts can be provided in one of several forms: continuous (e.g., probabilistic), ordinal (e.g., slight/moderate/high), or binary (yes/no). Continuous and ordinal forecast values can be converted to binary forecasts via thresholding: $f_i = 1$ (or $fcst=yes$) when the forecast value is greater than or equal to a threshold, and $f_i = 0$ (or $fcst=no$) otherwise. The elements of the contingency table are simply the relative frequencies, or proportions, of each contingency (joint probabilities).

In [2]:
# generic 2x2 contingency table
pd.DataFrame(data=np.array([['a','b','a+b'],['c','d','c+d'],['a+c','b+d','1']]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'])

          obs_yes  obs_no  row_sum
fcst_yes        a       b      a+b
fcst_no         c       d      c+d
col_sum       a+c     b+d        1


The “a” element of the contingency table provides the proportion of correct “yes” forecasts, often referred to as “true positives” or “hits”. The “b” element provides the proportion of incorrect “yes” forecasts, also known as “false positives” or “false alarms”. The “c” element provides the proportion of incorrect “no” forecasts, also denoted as “false negatives” or “missed events”. The “d” element provides the proportion of correct “no” forecasts, also called “true negatives” or “correct nulls”.
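As a minimal sketch of the thresholding step and the four joint proportions described above, the example below converts a handful of probabilistic forecasts to binary forecasts and tallies each contingency. The probabilities, observations, and the 0.5 threshold are invented for illustration and do not come from any dataset used in this notebook.

```python
import numpy as np

# Hypothetical probabilistic forecasts and matching binary observations
prob_fcst = np.array([0.05, 0.40, 0.75, 0.10, 0.90, 0.30])
obs = np.array([0, 0, 1, 0, 1, 1])

# Hypothetical threshold: forecast "yes" when the probability is >= 0.5
threshold = 0.5
fcst = (prob_fcst >= threshold).astype(int)

# Joint relative frequencies of the four contingencies
n = len(obs)
hits = np.sum((fcst == 1) & (obs == 1)) / n           # element "a"
false_alarms = np.sum((fcst == 1) & (obs == 0)) / n   # element "b"
misses = np.sum((fcst == 0) & (obs == 1)) / n         # element "c"
correct_nulls = np.sum((fcst == 0) & (obs == 0)) / n  # element "d"

print(fcst)
print(hits, false_alarms, misses, correct_nulls)
```

Raising the threshold generally trades false alarms for missed events, so the same set of probabilistic forecasts can yield a different 2x2 table at each threshold.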

In [3]:
pd.DataFrame(data=np.array([['a','b',r'$\bar f$'],['c','d',r'1 - $\bar f$'],[r'$\bar x$',r'1 - $\bar x$','1']]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'])

           obs_yes        obs_no       row_sum
fcst_yes         a             b      $\bar f$
fcst_no          c             d  1 - $\bar f$
col_sum   $\bar x$  1 - $\bar x$             1


The marginal probabilities (column and row sums) provide information about the overall distributions of observations and forecasts. The overall proportion of observed events is often called the “base rate”, “prevalence”, or “event frequency” ($\bar x = a + c$). The mean proportion of “yes” forecasts is often called the “mean forecast” or “positive sign rate” ($\bar f = a + b$).



In [4]:
# 2x2 contingency table counts for Finley's 1884 tornado forecasts
df_finley = pd.DataFrame(data=np.array([[28,72,100],[23,2680,2703],[51,2752,2803]]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'])
df_finley

          obs_yes  obs_no  row_sum
fcst_yes       28      72      100
fcst_no        23    2680     2703
col_sum        51    2752     2803


An example of a 2x2 contingency table that has been used in many textbooks and journal articles is from J.P. Finley's 1884 experimental tornado forecasts (Finley, J.P., 1884: Tornado predictions. Amer. Meteor. J., 1, 85-88). The table above gives the counts; in the example below, the same 2x2 table is expressed in terms of proportions.

In [5]:
# 2x2 contingency table for Finley's 1884 tornado forecasts in terms of joint probabilities
df_fin=df_finley/2803.
df_fin

           obs_yes    obs_no   row_sum
fcst_yes  0.009989  0.025687  0.035676
fcst_no   0.008205  0.956118  0.964324
col_sum   0.018195  0.981805  1.000000
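The marginal probabilities defined earlier can be recovered directly from Finley's counts. The short sketch below is self-contained and uses its own variable names (`hits`, `xbar`, `fbar`, and so on), which are illustrative choices rather than names used elsewhere in this notebook.

```python
# Finley's counts: hits, false alarms, missed events, correct nulls
hits, false_alarms, misses, correct_nulls = 28, 72, 23, 2680
n = hits + false_alarms + misses + correct_nulls  # 2803 forecasts in total

# Joint probabilities (the a, b, c, d elements of the table)
a, b, c, d = hits / n, false_alarms / n, misses / n, correct_nulls / n

xbar = a + c  # base rate: proportion of cases with an observed tornado
fbar = a + b  # mean forecast: proportion of "yes" forecasts

print(round(xbar, 6), round(fbar, 6))  # 0.018195 0.035676
```

The two values match the `col_sum` and `row_sum` entries of the proportions table: tornadoes were forecast roughly twice as often as they were observed.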


In [6]:
# define a couple of python functions to compute 2x2 contingency table elements from fcst and obs binary arrays

def cont_table(fcst,obs):
    # assuming fcst and obs are both binary and are of the same length
    # joint probability elements should sum to 1
    # return elements of 2x2 contingency table as well as a pandas dataframe for easy viewing
    nn=float(len(obs))
    aa=np.sum(obs*fcst)/nn
    bb=np.sum((1.-obs)*fcst)/nn
    cc=np.sum(obs*(1.-fcst))/nn
    dd=np.sum((1.-obs)*(1.-fcst))/nn
    df=pd.DataFrame(data=np.array([[aa,bb,aa+bb],[cc,dd,cc+dd],[aa+cc,bb+dd,aa+bb+cc+dd]]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'])
    return aa, bb, cc, dd, df

def cont_table_counts(fcst,obs):
    # assuming fcst and obs are both binary and are of the same length
    # count elements should sum to nn
    # return elements of 2x2 contingency table as well as a pandas dataframe for easy viewing
    nn=float(len(obs))
    aa=np.sum(obs*fcst)
    bb=np.sum((1.-obs)*fcst)
    cc=np.sum(obs*(1.-fcst))
    dd=np.sum((1.-obs)*(1.-fcst))
    df=pd.DataFrame(data=np.array([[aa,bb,aa+bb],[cc,dd,cc+dd],[aa+cc,bb+dd,aa+bb+cc+dd]]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'],dtype='int')
    return aa, bb, cc, dd, df

# set up arrays of binary forecasts and observations that match Finley's results
obs_fin=np.zeros(2803)
fcst_fin=np.zeros(2803)
obs_fin[:51]=1.  # a+c
fcst_fin[:28]=1.  # a
fcst_fin[-72:]=1.  # b

aa,bb,cc,dd,df_finley=cont_table_counts(fcst_fin,obs_fin)

a,b,c,d,df_fin=cont_table(fcst_fin,obs_fin)

# assign pandas dataframe elements to variables for easy calculation
# a=df_fin.loc['fcst_yes','obs_yes']
# b=df_fin.loc['fcst_yes','obs_no']
# c=df_fin.loc['fcst_no','obs_yes']
# d=df_fin.loc['fcst_no','obs_no']

print(df_finley)
print(df_fin)

          obs_yes  obs_no  row_sum
fcst_yes       28      72      100
fcst_no        23    2680     2703
col_sum        51    2752     2803
           obs_yes    obs_no   row_sum
fcst_yes  0.009989  0.025687  0.035676
fcst_no   0.008205  0.956118  0.964324
col_sum   0.018195  0.981805  1.000000


In [7]:
# pandas crosstab will provide the same information, reordered to put the "no" row/column first
pd.crosstab(fcst_fin,obs_fin,rownames=['fcst'],colnames=['obs'],margins=True,normalize=False)

obs    0.0  1.0   All
fcst
0.0   2680   23  2703
1.0     72   28   100
All   2752   51  2803


Given a set of binary forecasts and corresponding binary observations, there are numerous measures of forecast quality that can be obtained from the elements of the 2x2 contingency table. Several widely used measures are conditional probabilities derived from the table. Brooks et al. (2024) discussed the history of several of these measures. Finley's 2x2 table will be used as an example in the following calculations; its elements have been assigned to the variables 'a', 'b', 'c', and 'd'.

## frequency bias (B)
The “frequency bias” (“bias” for short) is the ratio of the mean frequency of “yes” forecasts to the mean observed event frequency. It provides information about the overall degree of over- or under-prediction of the forecast system: $B > 1$ indicates that the event is forecast more often than it is observed, and $B < 1$ indicates that it is forecast less often.

\begin{equation}
B = \frac{a+b}{a+c} = \frac{\bar f}{\bar x}
\end{equation}

In [8]:
bias1=(a+b)/(a+c)
print('bias = ',bias1)

bias =  1.9607843137254901


## probability of detection (POD)

Probability of detection (POD) is the conditional probability of a correct forecast given an observed event, also known as “prefigurance”, “sensitivity”, “hit rate”, “recall”, or “true positive rate/ratio”:

\begin{equation}
POD = \frac{a}{a+c}
\end{equation}

In [9]:
pod1=a/(a+c)
print('POD = ',pod1)

POD =  0.5490196078431373


Python functions have been created for several of these scores to allow for easy calculation; for example, pod(a,b,c,d).
Brusco et al. (2021) analyzed the behavior of 71 different measures computed from the 2x2 table; several of these are included in a larger set of functions found at the end of this notebook.

In [10]:
# set up functions for score calculations from 2x2 contingency table
def pod(a,b,c,d):
    return a/(a+c)
    
def sr(a,b,c,d):
    return a/(a+b)    

def pofd(a,b,c,d):
    return b/(b+d)
    
def mr(a,b,c,d):
    return c/(c+d)

In [11]:
pod1=pod(a,b,c,d)
print('POD = ',pod1)

POD =  0.5490196078431373


## probability of false detection (POFD)
Probability of false detection (POFD) is the conditional probability of a false alarm given a “no” observation, also known as “false positive rate/ratio” or “false alarm rate” (1-POFD is also known as “specificity”):

\begin{equation}
POFD = \frac{b}{b+d}
\end{equation}

In [12]:
pofd1=pofd(a,b,c,d)
print('POFD = ',pofd1)

POFD =  0.02616279069767442


## success ratio (SR)
Success ratio (SR) is the conditional probability of a correct forecast given a “yes” forecast, also known as “relevancy”, “correct alarm ratio”, “positive predictive value”, or “precision”:

\begin{equation}
SR = \frac{a}{a+b}
\end{equation}

In [13]:
sr1=sr(a,b,c,d)
print('SR = ',sr1)

SR =  0.28


## miss ratio (MR)
Miss ratio (MR) is the conditional probability of a missed event given a “no” forecast, also known as “false reassurance rate”, “false omission rate”, or “detection failure ratio”:

\begin{equation}
MR = \frac{c}{c+d}
\end{equation}

In [14]:
mr1=mr(a,b,c,d)
print('MR = ',mr1)

MR =  0.008509064002959674


## false alarm ratio (FAR)
False alarm ratio (FAR) is the conditional probability of a false alarm given a “yes” forecast. It has also been referred to as the “false alarm rate” (causing endless confusion with POFD) and the “false discovery rate”. Note that FAR = 1 - SR.

\begin{equation}
FAR = \frac{b}{a+b}
\end{equation}

In [15]:
def far(a,b,c,d):
    return b/(a+b)
    
print('FAR = ',far(a,b,c,d))

FAR =  0.7200000000000001
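Because FAR and POFD share a numerator but condition on different margins, it can help to compute them side by side. The sketch below does so from Finley's counts; the counts can be used in place of the proportions because the sample size cancels in each ratio. The variable names are illustrative.

```python
import math

# Finley's counts: hits, false alarms, missed events, correct nulls
hits, false_alarms, misses, correct_nulls = 28, 72, 23, 2680

# FAR conditions on the "yes" forecasts (first row of the table)
far_val = false_alarms / (hits + false_alarms)
# POFD conditions on the "no" observations (second column of the table)
pofd_val = false_alarms / (false_alarms + correct_nulls)
# SR shares FAR's denominator, so the two sum to one
sr_val = hits / (hits + false_alarms)

print(round(far_val, 4), round(pofd_val, 4))  # 0.72 0.0262
print(math.isclose(far_val, 1.0 - sr_val))    # True
```

For a rare event such as Finley's tornadoes, the two measures differ by more than an order of magnitude, which is why the shared “false alarm rate” label is so easily misread.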


## critical success index (CSI)
Critical success index (CSI) is the intersection of forecast and observed events over the union of forecast and observed events (“IoU”), also known as “threat score”, “Jaccard index”, “Tanimoto index”, or “ratio of verification”:

\begin{equation}
CSI = \frac{a}{a+b+c}
\end{equation}

In [16]:
def csi(a,b,c,d):
    return a/(a+b+c)
    
csi1=csi(a,b,c,d)
print('CSI = ',csi1)

CSI =  0.22764227642276424
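The intersection-over-union reading of CSI can be checked directly on binary arrays. The sketch below rebuilds arrays with the same layout as `obs_fin` and `fcst_fin` but as booleans, under the illustrative names `obs` and `fcst`, so that it runs on its own.

```python
import numpy as np

# Boolean arrays reproducing Finley's table (28 hits, 72 false alarms, 23 misses)
obs = np.zeros(2803, dtype=bool)
fcst = np.zeros(2803, dtype=bool)
obs[:51] = True     # a + c observed events
fcst[:28] = True    # a hits
fcst[-72:] = True   # b false alarms

intersection = np.logical_and(fcst, obs).sum()  # cases both forecast and observed
union = np.logical_or(fcst, obs).sum()          # cases forecast or observed

csi_iou = intersection / union
print(intersection, union, round(csi_iou, 6))  # 28 123 0.227642
```

The correct nulls never enter the calculation, which is one reason CSI is often used for rare events.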


## random contingency table
Several measures have been derived with the purpose of correcting a score to account for random chance (often called “skill scores”). Assuming random forecasts and observations are statistically independent, expected values of contingency table elements for a random forecast are simply the products of the marginal probabilities obtained from a 2x2 contingency table:

In [17]:
# generic 2x2 contingency table for a random forecast
pd.DataFrame(data=np.array([['(a+b)(a+c)','(a+b)(b+d)','a+b'],['(c+d)(a+c)','(c+d)(b+d)','c+d'],['a+c','b+d','1']]),index=['fcst_yes','fcst_no','col_sum'],columns=['obs_yes','obs_no','row_sum'])

             obs_yes      obs_no  row_sum
fcst_yes  (a+b)(a+c)  (a+b)(b+d)      a+b
fcst_no   (c+d)(a+c)  (c+d)(b+d)      c+d
col_sum          a+c         b+d        1
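Since each element of the random table is a product of one row margin and one column margin, the whole table is the outer product of the two marginal distributions. The following sketch builds it for Finley's margins; the names `fbar`, `xbar`, and `random_table` are illustrative.

```python
import numpy as np

# Finley's joint proportions
n = 2803
a, b, c, d = 28 / n, 72 / n, 23 / n, 2680 / n

fbar = a + b  # proportion of "yes" forecasts
xbar = a + c  # base rate

# Rows: fcst_yes, fcst_no; columns: obs_yes, obs_no
random_table = np.outer([fbar, 1.0 - fbar], [xbar, 1.0 - xbar])

print(random_table)
# Expected counts under independence; about 1.82 random hits versus 28 actual hits
print(np.round(random_table * n, 2))
```

The random table keeps the same margins as the original table by construction, so only the joint elements change.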


## expected scores for random forecasts
Assuming random forecasts and observations are statistically independent, expected values of various scores can be derived using the random contingency table:

## POD[random]
\begin{equation}
POD_{random} = \frac{a_{random}}{a+c} = \frac{(a+b)(a+c)}{a+c} = a + b = \bar f
\end{equation}

## SR[random]
\begin{equation}
SR_{random} = \frac{a_{random}}{a+b} = \frac{(a+b)(a+c)}{a+b} = a + c = \bar x
\end{equation}

## POFD[random]
\begin{equation}
POFD_{random} = \frac{b_{random}}{b+d} = \frac{(a+b)(b+d)}{b+d} = a + b = \bar f
\end{equation}

## MR[random]
\begin{equation}
MR_{random} = \frac{c_{random}}{c+d} = \frac{(c+d)(a+c)}{c+d} = a + c = \bar x
\end{equation}

## CSI[random]
\begin{equation}
CSI_{random} = \frac{a_{random}}{a_{random}+b_{random}+c_{random}} = \frac{(a+b)(a+c)}{(a+b)+(a+c)-(a+b)(a+c)} = \frac{\bar x}{1+\frac{\bar x}{\bar f}-\bar x}
\end{equation}
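The expressions above can be checked numerically. The sketch below builds the random-table elements from Finley's margins and confirms that each score reduces to the stated closed form; the `_r` suffix for random-table quantities is an illustrative naming choice.

```python
import math

# Finley's joint proportions and margins
n = 2803
a, b, c, d = 28 / n, 72 / n, 23 / n, 2680 / n
fbar, xbar = a + b, a + c

# Elements of the random contingency table (products of the margins)
a_r = fbar * xbar
b_r = fbar * (1.0 - xbar)
c_r = (1.0 - fbar) * xbar
d_r = (1.0 - fbar) * (1.0 - xbar)

pod_r = a_r / (a_r + c_r)
sr_r = a_r / (a_r + b_r)
pofd_r = b_r / (b_r + d_r)
mr_r = c_r / (c_r + d_r)
csi_r = a_r / (a_r + b_r + c_r)

print(math.isclose(pod_r, fbar), math.isclose(pofd_r, fbar))
print(math.isclose(sr_r, xbar), math.isclose(mr_r, xbar))
print(math.isclose(csi_r, xbar / (1.0 + xbar / fbar - xbar)))
```

Note that POD and POFD are equal for a random forecast, as are SR and MR, so their differences are zero in the absence of skill.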

## perfect forecasts
For a perfect forecast, the “a” element of the 2x2 contingency table will be equal to the base rate, the “d” element will be one minus the base rate, and the “b” and “c” elements will be zero. Measures of forecast quality for a perfect forecast are provided below:

$POD_{perfect} = 1$

$SR_{perfect} = 1$

$POFD_{perfect} = 0$

$MR_{perfect} = 0$

$CSI_{perfect} = 1$
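A quick check of these limiting values, using Finley's base rate as an example, might look like the following. The `_p` suffix marks the elements of the hypothetical perfect-forecast table.

```python
# Perfect forecast with Finley's base rate: every event forecast, no false alarms
xbar = 51 / 2803
a_p, b_p, c_p, d_p = xbar, 0.0, 0.0, 1.0 - xbar

pod_p = a_p / (a_p + c_p)
sr_p = a_p / (a_p + b_p)
pofd_p = b_p / (b_p + d_p)
mr_p = c_p / (c_p + d_p)
csi_p = a_p / (a_p + b_p + c_p)

print(pod_p, sr_p, pofd_p, mr_p, csi_p)  # 1.0 1.0 0.0 0.0 1.0
```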


In [18]:
# set up functions for score calculations from 2x2 contingency table
from scipy.stats import norm  # required by dprime() below
def ang2bias(x):
    return np.tan(x*np.pi/180.)

def bias2ang(x):
    return np.arctan(x)*180./np.pi

def pod(a,b,c,d):
    return a/(a+c)
    
def sr(a,b,c,d):
    return a/(a+b)    

def pofd(a,b,c,d):
    return b/(b+d)
    
def mr(a,b,c,d):
    return c/(c+d)

def csi(a,b,c,d):
    return a/(a+b+c)

def tversky(a,b,c,d,gamma):
    return a/(a+gamma*b+(1.-gamma)*c)
    
def pss(a,b,c,d):
    return (a*d-b*c)/(a+c)/(b+d)

def css(a,b,c,d):
    return (a*d-b*c)/(a+b)/(c+d)

def qyule(a,b,c,d):
    return (a*d-b*c)/(a*d+b*c)

def mse(a,b,c,d):
    return b+c
    
def srskill(a,b,c,d):
    return (a*d-b*c)/(a+b)/(b+d)
    
def podskill(a,b,c,d):
    return (a*d-b*c)/(a+c)/(c+d)
    
def deelia(a,b,c,d):
    return np.log(b/(a+b))/np.log(a/(a+c))

def edi(a,b,c,d):
    return (np.log(b/(b+d))-np.log(a/(a+c)))/(np.log(b/(b+d))+np.log(a/(a+c)))

def sedi(a,b,c,d):
    return (np.log(b/(b+d))-np.log(a/(a+c))-np.log(d/(b+d))+np.log(c/(a+c)))/(np.log(b/(b+d))+np.log(a/(a+c))+np.log(d/(b+d))+np.log(c/(a+c)))

def hss(a,b,c,d):
    return 2.*(a*d-b*c)/((a+c)*(c+d)+(a+b)*(b+d))

def kappa(a,b,c,d,w):
    return (a*d-b*c)/((1.-w)*(a+c)*(c+d)+w*(a+b)*(b+d))

def dprime(a,b,c,d):
    return norm.ppf(a/(a+c))-norm.ppf(b/(b+d))

def biasodds(a,b,c,d):
    return (a+b)*(b+d)/(a+c)/(c+d)

def oddsr(a,b,c,d):
    return a*d/b/c

def bias(a,b,c,d):
    return (a+b)/(a+c)
    
def phi(a,b,c,d):
    return (a*d-b*c)/np.sqrt((a+c)*(c+d)*(a+b)*(b+d))

def relvalue(a,b,c,d,alpha,xbar):
    if alpha>xbar:
        relval=((1.-alpha)*a-alpha*b)/(1.-alpha)/xbar
    else:
        relval=(alpha*d-(1.-alpha)*c)/alpha/(1.-xbar)
    return relval

def betafa(a,b,c,d):
    return (a/(a+c))/(b/(b+d))
    
def betame(a,b,c,d):
    return (d/(b+d))/(c/(a+c))

def alphafa(a,b,c,d):
    return (a/(a+c))/(a/(a+c)+b/(b+d))
    
def alphame(a,b,c,d):
    return (c/(a+c))/(d/(b+d)+c/(a+c))
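As a usage sketch for the larger function set, the example below applies `pss` and `css` to Finley's table and checks them against differences of the conditional probabilities introduced earlier. The two functions are copied here so the block runs on its own. Reading `pss` and `css` as the Peirce and Clayton skill scores is an assumption, since the notebook does not expand the names.

```python
import math

# Finley's joint proportions
n = 2803
a, b, c, d = 28 / n, 72 / n, 23 / n, 2680 / n

# Copies of pss() and css() from the cell above
def pss(a, b, c, d):
    return (a * d - b * c) / (a + c) / (b + d)

def css(a, b, c, d):
    return (a * d - b * c) / (a + b) / (c + d)

pod_val = a / (a + c)
pofd_val = b / (b + d)
sr_val = a / (a + b)
mr_val = c / (c + d)

# pss equals POD - POFD; css equals SR - MR
print(math.isclose(pss(a, b, c, d), pod_val - pofd_val))  # True
print(math.isclose(css(a, b, c, d), sr_val - mr_val))     # True
```

Both identities follow from putting the two fractions over a common denominator, which leaves $ad - bc$ in the numerator. Several of the other functions (for example `edi`, `sedi`, `deelia`, and `oddsr`) take logarithms or divide by individual table elements, so they are undefined when one of those elements is zero.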
