In [None]:
import pyroc
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

## Data

The cell below loads a dataset comparing the performance of a risk score (Krebs-Goplerud) against common biomarkers for ovarian cancer. This is the dataset originally used in the article by DeLong, DeLong, and Clarke-Pearson, which described the statistical test implemented in pyroc.

In [None]:
# Load CSV
data = pd.read_csv("tests/data/ovarian_cancer.csv", index_col="idx")

# Separate the binary outcome from the predictors
outcome = data.pop("outcome")

# Impute missing values with the column median (pandas' median skips NaN)
data["albumin"] = data["albumin"].fillna(data["albumin"].median())
data["total_protein"] = data["total_protein"].fillna(data["total_protein"].median())

data.rename(columns={
    'albumin': 'Albumin',
    'total_protein': 'Total Protein',
    'total_score': 'Krebs-Goplerud'
}, inplace=True)

print(f'Dataset size: {data.shape}')
# Show examples
data.head(5)

## Instantiate ROC object

The ROC object is used to calculate the area under the receiver operating characteristic curve (AUROC).
The first argument should be the binary target (0s and 1s). The second argument should be a dataframe with the predictions to compare.

In [None]:
roc = pyroc.ROC(outcome, data)

Predictions are stored in a dictionary attribute named `preds`:

In [None]:
print(roc.preds.keys())
roc.preds['Krebs-Goplerud']

## Value of the ROC

Upon initialization, the ROC object has calculated the AUROC for each prediction.

In [None]:
print(roc.auc)

As the predictions are stored in an OrderedDict, we can easily associate the AUC with each prediction.

In [None]:
auc = {pred_name: roc.auc[0, i] for i, pred_name in enumerate(roc.preds.keys())}
print(auc)
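
For intuition about what these values measure: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. `auroc_mann_whitney` is a hypothetical helper written for illustration, not part of pyroc, and it assumes that higher scores indicate the positive class.

```python
def auroc_mann_whitney(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative helper, not pyroc's implementation. Assumes y_true holds
    0s and 1s and that a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0   # pair ranked correctly
            elif p == n:
                total += 0.5   # ties count as half
    return total / (len(pos) * len(neg))


# Toy data (made up for this example)
y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = auroc_mann_whitney(y_toy, s_toy)
print(auc_toy)
```

The double loop is quadratic in the number of cases, which is fine for illustration but slower than rank-based implementations on large datasets.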

We can also obtain confidence intervals.

In [None]:
print(roc.preds.keys())
# Compute the intervals once and reuse the result
auc_ci = roc.ci()
print(auc_ci[0], 'lower')
print(auc_ci[1], 'upper')

## Plot ROC curves

The ROC object provides a convenient method for plotting ROC curves:

In [None]:
fig, ax = roc.plot()
plt.show()

## Statistically comparing different predictions

pyroc provides a method for testing whether one prediction has a statistically significantly different AUROC from another. The approach is described in an article by DeLong, DeLong, and Clarke-Pearson ([available here](https://doi.org/10.2307/2531595)). The article describes a general approach that allows either comparing two predictions directly, e.g. "is prediction A better than prediction B?", or comparing groups of predictions, e.g. "is prediction A significantly different from predictions B or C?". This is accomplished through a *contrast matrix*. We will demonstrate the `compare` method through three common questions. As a reminder, we have three predictors in our example dataset: Albumin, Total Protein, and Krebs-Goplerud. We may thus ask:

* Is Krebs-Goplerud different from Total Protein?
* Is Krebs-Goplerud different from Albumin?
* Is Krebs-Goplerud different from at least one of Albumin or Total Protein?

Recall our predictions are:

In [None]:
print(roc.preds.keys())
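
In the examples in this notebook, each row of a contrast matrix has one entry per prediction, in the order Albumin, Total Protein, Krebs-Goplerud: a 1 and a -1 mark the two predictions being compared, and a 0 leaves a prediction out. As a sketch, the row can be built from names rather than positions, which avoids miscounting columns. `contrast_row` is a hypothetical helper for illustration and is not part of pyroc.

```python
def contrast_row(names, plus, minus):
    """Build one contrast row: +1 for `plus`, -1 for `minus`, 0 elsewhere.

    Illustrative helper, not part of pyroc. `names` must list the
    predictions in the same order the ROC object stores them.
    """
    names = list(names)
    if plus == minus:
        raise ValueError("plus and minus must name different predictions")
    row = [0] * len(names)
    row[names.index(plus)] = 1     # raises ValueError if the name is unknown
    row[names.index(minus)] = -1
    return row


# Prediction order as used in this notebook
names = ['Albumin', 'Total Protein', 'Krebs-Goplerud']
row = contrast_row(names, plus='Krebs-Goplerud', minus='Total Protein')
print(row)
```

In the notebook, `names` could be taken from `roc.preds.keys()`, and the row wrapped as `np.array([row])` before being passed to `roc.compare`.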

### Is Krebs-Goplerud different than Total Protein?

The first question is a direct comparison of two predictions. We use a single-row contrast, [0, -1, 1]: the 0 excludes Albumin, the -1 marks Total Protein, and the 1 marks Krebs-Goplerud.


In [None]:
p, ci = roc.compare(np.array([
    [0, -1,  1],
]))
print(p, ci)

The p-value is 0.72, so we fail to reject the null hypothesis: the AUROC of the Krebs-Goplerud score is not significantly different from the AUROC of the Total Protein biomarker.

### Is Krebs-Goplerud different than Albumin?

Similar to above, we compare only two of the three predictors, but we change which predictor Krebs-Goplerud is compared against.

We use a single-row contrast: [-1, 0, 1].

In [None]:
p, ci = roc.compare(np.array([
    [-1, 0,  1],
]))
print(p, ci)

Again, no significant difference.

### Is Krebs-Goplerud different from at least one of Albumin or Total Protein?

This comparison is identical to the one made on page 844 of the original article by DeLong, DeLong, and Clarke-Pearson.

In [None]:
p, ci = roc.compare(np.array([
    [1, -1,  0],
    [1,  0, -1]
]))
print(p, ci)
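
To make the role of the contrast matrix concrete, the statistic described in the DeLong article for a contrast matrix L, AUROC vector θ, and covariance estimate S is (Lθ)ᵀ(LSLᵀ)⁻¹(Lθ), which is compared against a chi-square distribution with degrees of freedom equal to the rank of LSLᵀ. The sketch below evaluates that formula with NumPy. The AUROC values and covariance matrix are made up for illustration, `contrast_statistic` is a hypothetical helper, and nothing here is taken from pyroc's internals.

```python
import numpy as np


def contrast_statistic(theta, cov, contrast):
    """Chi-square statistic (L theta)' (L S L')^-1 (L theta) and its df.

    Illustrative only; assumes L S L' is invertible.
    """
    theta = np.asarray(theta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    L = np.atleast_2d(np.asarray(contrast, dtype=float))

    diff = L @ theta          # contrasted AUROC differences
    spread = L @ cov @ L.T    # covariance of those differences
    stat = float(diff @ np.linalg.solve(spread, diff))
    df = int(np.linalg.matrix_rank(spread))
    return stat, df


# Hypothetical AUROCs and covariance (order: Albumin, Total Protein, Krebs-Goplerud)
theta = [0.70, 0.65, 0.75]
cov = [[0.004, 0.001, 0.001],
       [0.001, 0.005, 0.001],
       [0.001, 0.001, 0.003]]

# Contrast used in the cell above (Albumin as the reference column)
stat_a, df_a = contrast_statistic(theta, cov, [[1, -1, 0], [1, 0, -1]])
# Same hypothesis written with Krebs-Goplerud as the reference column
stat_b, df_b = contrast_statistic(theta, cov, [[-1, 0, 1], [0, -1, 1]])
print(stat_a, df_a)
print(stat_b, df_b)
```

Both two-row contrasts span the same set of differences, so under this formula they give the same statistic: the joint test asks whether all three AUROCs are equal, regardless of which predictor is written as the reference.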