# 006.001 Metrics - Rank Probability Score

In [1]:
import pandas as pd
import numpy as np

## RPS

[Met Office? Online Course](https://www.met-learning.eu/pluginfile.php/5277/mod_resource/content/6/www/english/courses/msgcrs/index.htm)

Rank Probability Score (RPS) measures the accuracy of probability predictions over more than 2 ranked categories. For 2 categories, use the Brier Score.

$$RPS = \frac{1}{K - 1}\sum_{k=1}^{K} (CDF_{pred,k} - CDF_{obs,k})^2$$

where:
- $K$ is the number of categories
- $CDF_{pred,k}$ and $CDF_{obs,k}$ are the cumulative distributions of the predicted probabilities and of the observed outcome, summed up to category $k$
- RPS has a range from 0 to 1
- Lower is better
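
To make the formula concrete, here is a minimal sketch that works through a single forecast by hand. It assumes three ranked categories and uses the first row of the test data (0.7, 0.3, 0.0 with category 1 observed); the variable names are illustrative only.

```python
import numpy as np

# One forecast over K = 3 ranked categories and the outcome that occurred.
pred = np.array([0.7, 0.3, 0.0])   # predicted probabilities, summing to 1
obs = np.array([1, 0, 0])          # one-hot: category 1 was observed

# Cumulative distributions across the ranked categories.
pred_cdf = np.cumsum(pred)         # roughly [0.7, 1.0, 1.0]
obs_cdf = np.cumsum(obs)           # [1, 1, 1]

# Sum of squared CDF differences, scaled by 1 / (K - 1).
K = len(pred)
rps_single = np.sum((pred_cdf - obs_cdf) ** 2) / (K - 1)
print(round(rps_single, 3))        # 0.045
```

Only the first category contributes here: the squared difference is $(0.7 - 1)^2 = 0.09$, and dividing by $K - 1 = 2$ gives 0.045.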


### Test Data

In [2]:
# Test Data from Met Office? Online Course Page - Computation of the Rank Probability Score (RPS) – Accuracy

d = {'p1': [0.7, 0.9, 0.9, 0.8, 0.8, 0.9, 0.6, 0.3, 0.3, 0.8, 0.8, 0.0, 0.3],
     'p2': [0.3, 0.1, 0.1, 0.2, 0.2, 0.1, 0.4, 0.4, 0.4, 0.2, 0.2, 0.4, 0.7],
     'p3': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0, 0.0, 0.6, 0.0],
     'obs_cat': [1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3, 2],
     'obs_cat1': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
     'obs_cat2': [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
     'obs_cat3': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
     'rps': [0.045, 0.005, 0.005, 0.020, 0.020, 0.005, 0.180, 0.090, 0.290, 0.020, 0.320, 0.080, 0.045]}
df = pd.DataFrame(d)
df.head(20)


index,p1,p2,p3,obs_cat,obs_cat1,obs_cat2,obs_cat3,rps
0,0.7,0.3,0.0,1,1,0,0,0.045
1,0.9,0.1,0.0,1,1,0,0,0.005
2,0.9,0.1,0.0,1,1,0,0,0.005
3,0.8,0.2,0.0,1,1,0,0,0.02
4,0.8,0.2,0.0,1,1,0,0,0.02
5,0.9,0.1,0.0,1,1,0,0,0.005
6,0.6,0.4,0.0,2,0,1,0,0.18
7,0.3,0.4,0.3,2,0,1,0,0.09
8,0.3,0.4,0.3,1,1,0,0,0.29
9,0.8,0.2,0.0,1,1,0,0,0.02
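
The `obs_cat1` to `obs_cat3` columns are a one-hot encoding of `obs_cat`. As a rough sketch of how they could be derived rather than typed by hand, the snippet below rebuilds them for three rows taken from the test data (rows 0, 6 and 11). The names `sample`, `categories` and `one_hot` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Three rows copied from the test data, one per observed category.
sample = pd.DataFrame({
    'p1': [0.7, 0.6, 0.0],
    'p2': [0.3, 0.4, 0.4],
    'p3': [0.0, 0.0, 0.6],
    'obs_cat': [1, 2, 3],
})

# Each forecast row should be a valid probability distribution.
probs_ok = np.allclose(sample[['p1', 'p2', 'p3']].sum(axis='columns'), 1.0)

# Build one 0/1 column per category from the observed category label.
categories = [1, 2, 3]
one_hot = pd.DataFrame(
    {f'obs_cat{c}': (sample['obs_cat'] == c).astype(int) for c in categories}
)
print(probs_ok)
print(one_hot)
```

Checking that each row of probabilities sums to 1 before scoring is cheap and catches data-entry slips early.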


### Calculation

In [3]:
def calc_rps(pred_df: pd.DataFrame, obs_df: pd.DataFrame) -> pd.Series:
    """
    Return the Rank Probability Score for each row as a Series.

    pred_df holds the predicted probability of each outcome, with the
    columns in ranked order.
    obs_df holds 0 or 1 for each outcome (1 marks the observed outcome),
    with the columns in the same ranked order.
    Both DataFrames must be the same size.
    """
    n_categories = pred_df.shape[1]
    # Cumulative distributions across the ranked categories
    pred_cdf = pred_df.cumsum(axis='columns').values
    obs_cdf = obs_df.cumsum(axis='columns').values
    # Sum of squared CDF differences per row, scaled by 1 / (K - 1)
    sq_diff = (pred_cdf - obs_cdf) ** 2
    return pd.Series(sq_diff.sum(axis=1) / (n_categories - 1), name='RPS')
    
RPS = calc_rps(df[['p1','p2', 'p3']], df[['obs_cat1','obs_cat2', 'obs_cat3']])
print(RPS)
    
    

0     0.045
1     0.005
2     0.005
3     0.020
4     0.020
5     0.005
6     0.180
7     0.090
8     0.290
9     0.020
10    0.320
11    0.080
12    0.045
Name: RPS, dtype: float64
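
The printed values can be checked against the reference `rps` column from the course page. The sketch below is a standalone NumPy version of the same arithmetic, assuming 1-based category labels, so it runs without the DataFrame above. `expected` holds the reference values copied from the test data.

```python
import numpy as np

# Forecast probabilities and observed categories copied from the test data.
pred = np.array([
    [0.7, 0.3, 0.0], [0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
    [0.8, 0.2, 0.0], [0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.3, 0.4, 0.3],
    [0.3, 0.4, 0.3], [0.8, 0.2, 0.0], [0.8, 0.2, 0.0], [0.0, 0.4, 0.6],
    [0.3, 0.7, 0.0],
])
obs_cat = np.array([1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3, 2])
expected = np.array([0.045, 0.005, 0.005, 0.020, 0.020, 0.005, 0.180,
                     0.090, 0.290, 0.020, 0.320, 0.080, 0.045])

# One-hot encode the observations (categories are 1-based).
n_categories = pred.shape[1]
obs = np.eye(n_categories)[obs_cat - 1]

# Row-wise RPS: squared CDF differences summed and scaled by 1 / (K - 1).
sq_diff = (pred.cumsum(axis=1) - obs.cumsum(axis=1)) ** 2
rps = sq_diff.sum(axis=1) / (n_categories - 1)

matches = np.allclose(rps, expected)
mean_rps = rps.mean()
print(matches)              # True
print(round(mean_rps, 4))   # 0.0865
```

Comparing with `np.allclose` rather than `==` avoids false mismatches from floating-point rounding in the cumulative sums.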


## RPS Skill Score

Skill scores range from $-\infty$ to 1

A score of 1 is a perfect prediction (RPS = 0), and a score of 0 means no improvement over the benchmark

Negative values indicate the predictions are less accurate than the benchmark predictions


$$RPSS = 1 - \frac{RPS}{RPS_b}$$

where $RPS_b$ is the Benchmark RPS Score
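
As an illustration, the sketch below computes a skill score for the test data using $RPSS = 1 - RPS / RPS_b$. Two choices here are assumptions, not something the source specifies: the benchmark always predicts the observed category frequencies from the same sample, and the score is formed from the mean RPS of each set of predictions. `rps_rows` is a hypothetical helper.

```python
import numpy as np

def rps_rows(pred: np.ndarray, obs_cat: np.ndarray) -> np.ndarray:
    """Row-wise RPS for 1-based observed categories (hypothetical helper)."""
    n_categories = pred.shape[1]
    obs = np.eye(n_categories)[obs_cat - 1]
    sq_diff = (pred.cumsum(axis=1) - obs.cumsum(axis=1)) ** 2
    return sq_diff.sum(axis=1) / (n_categories - 1)

# Forecast probabilities and observed categories copied from the test data.
pred = np.array([
    [0.7, 0.3, 0.0], [0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
    [0.8, 0.2, 0.0], [0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.3, 0.4, 0.3],
    [0.3, 0.4, 0.3], [0.8, 0.2, 0.0], [0.8, 0.2, 0.0], [0.0, 0.4, 0.6],
    [0.3, 0.7, 0.0],
])
obs_cat = np.array([1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 3, 2])

# Assumed benchmark: predict the observed category frequencies every time.
freq = np.bincount(obs_cat, minlength=pred.shape[1] + 1)[1:] / len(obs_cat)
benchmark = np.tile(freq, (len(obs_cat), 1))

rps_mean = rps_rows(pred, obs_cat).mean()
rps_b_mean = rps_rows(benchmark, obs_cat).mean()

rpss = 1 - rps_mean / rps_b_mean
print(round(rpss, 4))   # 0.4375
```

A positive value means the predictions beat this benchmark. Because the frequencies come from the same sample being scored, the benchmark is somewhat flattered; a benchmark built from independent historical data would be a fairer comparison.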