In [15]:
pip install plotly pandas numpy

Note: you may need to restart the kernel to use updated packages.


# Balanced Risk Set Matching (Li et al., 2001)

This notebook follows a study of a treatment, cystoscopy and hydrodistention, given in response to the symptoms of the chronic, nonlethal disease interstitial cystitis. The idea of the paper is to match each patient who receives the treatment at time $t$ with a patient who has a similar history of symptoms up to time $t$ but has not yet received the treatment at that time.

## Data

The paper uses the Interstitial Cystitis Data Base (ICDB), but we will use synthetic data with a similar structure. The synthetic data are not calibrated to reproduce the ICDB results.

In [16]:
from defs import patients_evaluations

patients_evaluations[0][["gender", "pain", "urgency", "nocturnal frequency"]].head(10)

Unnamed: 0,gender,pain,urgency,nocturnal frequency
0,M,3,3,4
1,F,8,7,2
2,F,0,4,3
3,F,4,2,3
4,M,8,5,2
5,F,6,8,2
6,M,2,8,4
7,F,7,4,3
8,M,3,5,1
9,F,5,9,3


Patients are evaluated on entry into the study and then at intervals of approximately 3 months for up to 4 years. Three quantities are measured repeatedly over time:

- Pain
- Urgency
- Nocturnal Frequency

Pain and urgency are subjective appraisals on a scale from 0 to 9.

In [39]:
from defs import patients_evaluations
import pandas as pd

# The first element of the list is the evaluation on entry
pd.DataFrame({"evaluation per interval": patients_evaluations})

Unnamed: 0,evaluation per interval
0,id gender pain urgency nocturnal freq...
1,id gender pain urgency nocturnal freq...
2,id gender pain urgency nocturnal freq...
3,id gender pain urgency nocturnal freq...
4,id gender pain urgency nocturnal freq...
5,id gender pain urgency nocturnal freq...
6,id gender pain urgency nocturnal freq...
7,id gender pain urgency nocturnal freq...
8,id gender pain urgency nocturnal freq...
9,id gender pain urgency nocturnal freq...


## Matching by Minimum Cost Flow in a Network

Let $\mathcal{A} = \{ \alpha_1, \dots, \alpha_M \}$ be a set whose elements are called units, let $\mathcal{T} \subseteq \mathcal{A}$ be the subset of treated units, and let $\mathcal{E} \subseteq \mathcal{T} \times \mathcal{A}$ be a set of edges. If the pair $e = ( \alpha_p, \alpha_q )$ is an edge, $e \in \mathcal{E}$, then it is permitted to match $\alpha_p$ to $\alpha_q$; if $e \not\in \mathcal{E}$, then this match is forbidden.
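As a minimal sketch of how the edge set might be built, assume that an edge $(\alpha_p, \alpha_q)$ is permitted when $\alpha_p$ is treated strictly before $\alpha_q$, so that $\alpha_q$ is still untreated at $T_p$. This rule, the treatment times, and the `build_edges` helper are illustrative assumptions, not code from the paper or from `defs`.

```python
import math

# Hypothetical treatment times in months since entry.
# math.inf marks a patient who is never treated during follow-up.
treatment_time = {"a1": 3.0, "a2": 6.0, "a3": 6.0, "a4": math.inf}


def build_edges(treatment_time):
    """Return the permitted pairs (p, q), where p is treated strictly before q."""
    edges = set()
    for p, t_p in treatment_time.items():
        if math.isinf(t_p):
            continue  # p is never treated, so p is not a treated unit
        for q, t_q in treatment_time.items():
            if q != p and t_q > t_p:
                edges.add((p, q))  # q is still untreated at time t_p
    return edges


edges = build_edges(treatment_time)
print(sorted(edges))
```

Patients treated at the same time (`a2` and `a3` here) are not joined by an edge under this rule, because neither is still untreated when the other receives treatment.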

In the paper, $\mathcal{A}$ consists of 400 patients randomly sampled from the ICDB.

For each $e \in \mathcal{E}$, there is a distance $\delta_e > 0$. The distance $\delta_e$ is the Mahalanobis distance between treated patient $\alpha_p$ and not-yet-treated control $\alpha_q$ on a six-dimensional covariate describing the three symptoms at baseline and at time $T_p$, when $\alpha_p$ received treatment.
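The distance computation can be sketched with NumPy as follows. The covariate matrix `X` is randomly generated for illustration (it is not the `patients_evaluations` data), its six columns stand in for the three symptoms at baseline and at $T_p$, and the `mahalanobis` helper is a hypothetical name. The pseudo-inverse is used so the sketch still runs if the sample covariance happens to be singular.

```python
import numpy as np


def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between two covariate vectors given an inverse covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    squared = float(diff @ cov_inv @ diff)
    return float(np.sqrt(max(squared, 0.0)))  # guard against tiny negative round-off


rng = np.random.default_rng(0)
# Hypothetical covariates: 50 patients, 6 columns
# (pain, urgency, nocturnal frequency at baseline and at time T_p).
X = rng.integers(0, 10, size=(50, 6)).astype(float)

# Inverse of the sample covariance matrix of the six covariates.
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))

delta = mahalanobis(X[0], X[1], cov_inv)
print(delta)
```

With an identity covariance the same function reduces to the Euclidean distance, which gives a quick sanity check.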

A pair matching $M$ of size $S$ is a subset $M \subseteq \mathcal{E}$ of $|M| = S$ edges such that each unit $\alpha_q \in \mathcal{A}$ appears in at most one matched pair, possibly as $(\alpha_p, \alpha_q) \in M$ or as $(\alpha_q, \alpha_p) \in M$ but not as both. An optimal pair matching of size $S$ minimizes the total distance $\sum_{e \in M} \delta_e$ over all pair matchings $M$ of size $S$ obtainable with the given structure $\mathcal{A}, \mathcal{T}, \mathcal{E}$.
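To make the definition concrete, here is a small sketch that finds an optimal pair matching by exhaustive search over all edge subsets of size $S$. This is not the minimum cost flow algorithm and only scales to toy inputs; the edge distances and the `optimal_pair_matching` helper are made up for illustration.

```python
from itertools import combinations


def optimal_pair_matching(distances, size):
    """Brute-force search for a minimum-total-distance pair matching.

    distances maps each permitted edge (p, q) to its distance delta_e.
    Returns (total_distance, matching), or None if no matching of that size exists.
    """
    best = None
    for candidate in combinations(sorted(distances), size):
        units = [unit for edge in candidate for unit in edge]
        if len(units) != len(set(units)):
            continue  # some unit appears in more than one pair
        total = sum(distances[edge] for edge in candidate)
        if best is None or total < best[0]:
            best = (total, candidate)
    return best


# Hypothetical distances on a toy edge set.
distances = {
    ("a1", "a2"): 1.0,
    ("a1", "a3"): 4.0,
    ("a1", "a4"): 2.5,
    ("a2", "a4"): 0.5,
    ("a3", "a4"): 3.0,
}

total, matching = optimal_pair_matching(distances, size=2)
print(total, matching)
```

In this toy example, greedily taking the closest pair `("a2", "a4")` first forces the second pair to be `("a1", "a3")`, for a total of 4.5, whereas the optimal matching has total 4.0. This is why the total distance is minimized over whole matchings rather than pair by pair.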

In the paper, $|M| = S = 100$ matched pairs are formed. Matching uses three variables: pain score, urgency score, and nocturnal frequency. Each treated patient is paired with a patient who had not yet been treated at the time of that treatment.

## Balanced Pair Matching



## References

Li, Y. P., Propert, K. J., & Rosenbaum, P. R. (2001). Balanced risk set matching. Journal of the American Statistical Association, 96(455), 870–882. https://doi.org/10.1198/016214501753208573