### Competing risks and cluster data models

+ The two topics in the title (competing risks and cluster data) are distinct.

+ Competing risks data
+ Mixture models
+ Cluster survival data
+ The FGM models

+ Competing risks are said to be present when a patient is at risk of more than one mutually exclusive event. For example,
  + Death from different causes
  + The occurrence of one of these will prevent any other event from ever happening
+ If several types of events occur, a model describing progression to each of these competing risks is needed.
+ If failures are different causes of death, only the first of these to occur is observed.
+ The R package `cmprsk` implements estimation methods from the existing competing risks models, such as `cuminc()` for cumulative incidence functions and `crr()` for subdistribution hazard regression.

##### Competing risks data

![Competing risks data](https://upload.cc/i1/2021/11/02/5CknVf.png)

+ Each cause has its own failure time, but the causes are not independent of one another
+ The data structure is that of a multivariate failure time

---

##### Introduction

+ SEER public use dataset on survival of breast cancer patients from 1992 − 2007 (n = 61050). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch (released April 2011).
+ Follow-up restricted to 10 years
+ Cause of death was categorised into the following:
  + Breast cancer (n = 8329)
  + Heart disease (n = 2818)
  + Other causes (n = 4454)
+ Age categorised into 18 − 59, 60 − 84 and 85+
+ It is not reasonable to assume that the competing event time distributions are independent of the distribution of time to the event of interest.
+ Usually there is some biological mechanism that influences the occurrence of both events, so changing the mechanism behind the competing event will also change the risk of the event of interest.
+ Hence the time to the event of interest and the time to the competing event are not independent.
+ Early approaches viewed competing risks models as a multivariate failure time model, where each individual is assumed to have a potential failure time for each type of failure.
+ Let $\tilde{T}_k$ denote the time to failure from cause $k$ and $C$ the censoring time. We only observe $T = \min\{C, \tilde{T}_k : k = 1, \cdots, K\}$ and $D$. Here $D$ is an index variable, which specifies which event happened first.
  + Each individual has $K$ latent times $\tilde{T}_k$
  + $D$ is an indicator
    + e.g. ($T = \tilde{T}_1, D = 1$), ($T = C, D = 0$)
+ If some individuals are censored for all events by end of study or loss to follow-up, they have D = 0
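
The observation scheme above can be sketched in R by simulating the latent times and then keeping only $(T, D)$. This is a minimal illustration: the shared gamma frailty (used here only to make the latent times dependent), the exponential rates, and the uniform censoring are all assumptions, not part of the notes.

```r
set.seed(2021)
n     <- 1000            # number of individuals
rates <- c(0.10, 0.05)   # assumed baseline rates for causes 1 and 2 (K = 2)

# Shared gamma frailty (assumption): makes the K latent times dependent
frailty <- rgamma(n, shape = 2, rate = 2)

# n x K matrix of latent failure times, one column per cause
latent <- sapply(rates, function(r) rexp(n, rate = frailty * r))

# Assumed censoring times C
cens <- runif(n, min = 0, max = 20)

first_time  <- apply(latent, 1, min)        # earliest latent failure time
first_cause <- apply(latent, 1, which.min)  # cause that the earliest time belongs to

T_obs <- pmin(cens, first_time)                       # T = min{C, T_1, ..., T_K}
D     <- ifelse(first_time <= cens, first_cause, 0L)  # D = 0 when censored

head(data.frame(T_obs, D))
table(D)
```

Only `T_obs` and `D` would be available in a real study; the `latent` matrix is never observed in full.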

##### Mixture model
+ The mixture model is the naive approach to handle this type of data.
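+ It is assumed that each subject has a nonnegative probability $p_k$ of dying from cause $k$, with $\sum_k p_k = 1$.
+ Let $\zeta$ be the cause indicator. With two causes ($K = 2$), model it by the logistic regression
$$
P(\zeta=1|Z) = \frac{\exp(\alpha Z)}{1 + \exp(\alpha Z)}, \qquad P(\zeta=2|Z) = 1 - P(\zeta=1|Z) \tag{1}
$$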
+ In proportional hazards regression on the failure time $\tilde{T}_k$, we model the hazard function of cause $k$ with covariate vector $Z$ as
$$
\lambda_k(t|Z, \zeta=k) = \lambda_{k0}(t)\exp(\beta_k Z) \tag{2}
$$
where $\lambda_{k0}(t)$ is the baseline cause-specific hazard of cause $k$, and the vector $\beta_k$ represents the covariate effects on cause $k$.
+ The observed data for each individual are $\{Z, T, D\}$.
  + $D = 0$ means censored.
+ Note that the cause indicator $\zeta$ is not fully observable. For example, if $D = 1$ we know that the subject died from cause 1, so $\zeta = 1$. However, if the subject is censored, i.e. $D = 0$, then we have no idea what $\zeta$ is.
+ To account for this uncertainty, the failure time $T$ is treated as coming from a mixture of the populations $\{\tilde{T}_1, \cdots, \tilde{T}_K\}$.
+ Given the models (1)–(2) and observed data $\{Z_i, T_i, D_i; i = 1, \cdots, n\}$, we can obtain the likelihood function of $\{\alpha, \beta_k, d\Lambda_{k0}\}$ as follows:
  + Consider the simple case with only two types, $K = 2$
  + $P_i = \frac{\exp(\alpha Z_i)}{1 + \exp(\alpha Z_i)} = P(\zeta_i = 1 | Z_i)$
  + $P(\zeta_i = 2 | Z_i) = 1 - P_i = \bar{P}_i$
  + Let $f_k$ and $S_k$ denote the density and survival function of $\tilde{T}_k$ given $Z_i$ and $\zeta_i = k$
  + If $(T_i, D_i = 1)$, the likelihood contribution is:
    + $P_i \times f_1(t_i)$
  + If $(T_i, D_i = 2)$, the likelihood contribution is:
    + $(1 - P_i) \times f_2(t_i)$
  + If $(T_i, D_i = 0)$, the likelihood contribution is:
    + $P_i\times S_1(t_i) + (1 - P_i)\times S_2(t_i)$
  + Full likelihood contribution of subject $i$:
    + $(P_i f_1(t_i))^{I(D_i = 1)}((1 - P_i)f_2(t_i))^{I(D_i = 2)} (P_i S_1(t_i) + (1 - P_i)S_2(t_i))^{I(D_i = 0)}$
  + Parameters to estimate: $\alpha, \beta_1, \beta_2, \lambda_{10}, \lambda_{20}$
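
The two-cause likelihood above can be coded directly once a form is chosen for the baseline hazards. The sketch below assumes constant (exponential) baselines $\lambda_{10}$ and $\lambda_{20}$, a scalar covariate $Z$, and simulated data; the function name `mixture_loglik` and all numeric values are hypothetical and chosen only for illustration.

```r
# Log-likelihood of the K = 2 mixture model.
# Assumption: constant baseline hazards, passed on the log scale to keep them positive.
# par = (alpha, beta1, beta2, log(lambda10), log(lambda20))
mixture_loglik <- function(par, time, D, Z) {
  p  <- plogis(par[1] * Z)               # P_i = P(zeta_i = 1 | Z_i)
  h1 <- exp(par[4]) * exp(par[2] * Z)    # hazard of cause 1 given Z
  h2 <- exp(par[5]) * exp(par[3] * Z)    # hazard of cause 2 given Z
  S1 <- exp(-h1 * time)                  # survival functions under constant hazards
  S2 <- exp(-h2 * time)
  contrib <- ifelse(D == 1, p * h1 * S1,              # P_i * f_1(t_i)
             ifelse(D == 2, (1 - p) * h2 * S2,        # (1 - P_i) * f_2(t_i)
                    p * S1 + (1 - p) * S2))           # censored: mixture of survivals
  sum(log(contrib))
}

# Simulated data under the same model (all true values are assumptions)
set.seed(2021)
n      <- 500
Z      <- rbinom(n, 1, 0.5)
zeta   <- ifelse(runif(n) < plogis(0.5 * Z), 1L, 2L)
haz    <- ifelse(zeta == 1, 0.3 * exp(0.4 * Z), 0.1 * exp(-0.2 * Z))
latent <- rexp(n, rate = haz)
cens   <- runif(n, 0, 15)
time   <- pmin(latent, cens)
D      <- ifelse(latent <= cens, zeta, 0L)

# Maximize the log-likelihood by minimizing its negative (Nelder-Mead, the optim default)
fit <- optim(rep(0, 5), function(p) -mixture_loglik(p, time, D, Z),
             control = list(maxit = 2000))
fit$par
```

A general-purpose optimizer is used here only to keep the sketch short; with unspecified baselines $\lambda_{k0}(t)$ the estimation would need a different approach.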

##### Cluster survival data

+ Let $\tilde{T}_{i} = (\tilde{T}_{i1}, \tilde{T}_{i2}, \cdots, \tilde{T}_{im})$ be the event times for cluster $i$ and $C_i = (C_{i1}, C_{i2}, \cdots, C_{im})$ the censoring times
  + Subjects sharing the same attribute are grouped into one cluster
  + The number of subjects may differ across clusters $\rightarrow$ for the purposes of study it is often set to be equal
+ Let $T_i = (T_{i1}, T_{i2}, \cdots , T_{im})$ be the observed times, with $T_{ij} = \min(\tilde{T}_{ij}, C_{ij})$, and $\delta_i = (\delta_{i1}, \delta_{i2}, \cdots, \delta_{im})$ the event indicators, with $\delta_{ij} = I(\tilde{T}_{ij} \leq C_{ij})$
+ In cluster survival data, every subject is followed for the same type of event
+ The size of each cluster may vary

##### Cluster survival data in R

+ `data(retinopathy, package="survival")` is a trial of laser coagulation as a treatment to delay diabetic retinopathy
+ 394 observations from 197 patients, with 2 survival outcomes for each patient (one per eye)
+ The outcome is the time to loss of vision
+ In particular, the `laser` variable is of interest
+ This is a typical example of bivariate cluster data

```r
data(retinopathy, package = "survival")

print(head(retinopathy, 15))
cat("-----------------------------------------------------------\n")
print(tail(retinopathy, 15))
```

```
   id laser   eye age     type trt futime status risk
1   5 argon  left  28    adult   1  46.23      0    9
2   5 argon  left  28    adult   0  46.23      0    9
3  14 argon right  12 juvenile   1  42.50      0    8
4  14 argon right  12 juvenile   0  31.30      1    6
5  16 xenon right   9 juvenile   1  42.27      0   11
6  16 xenon right   9 juvenile   0  42.27      0   11
7  25 argon  left   9 juvenile   1  20.60      0   11
8  25 argon  left   9 juvenile   0  20.60      0   11
9  29 xenon  left  13 juvenile   1  38.77      0    9
10 29 xenon  left  13 juvenile   0   0.30      1   10
11 46 xenon right  12 juvenile   1  65.23      0    9
12 46 xenon right  12 juvenile   0  54.27      1    9
13 49 argon right   8 juvenile   1  63.50      0    8
14 49 argon right   8 juvenile   0  10.80      1    6
15 56 xenon right  12 juvenile   1  23.17      0    8
-----------------------------------------------------------
      id laser   eye age     type trt futime status risk
380 1672 argon  lef
```
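
One common way to analyze bivariate cluster data of this kind is a marginal Cox model with a cluster-robust variance. The sketch below uses `coxph()` with a `cluster(id)` term from the `survival` package; taking `trt` as the only covariate is an assumption made for illustration, not the analysis from these notes.

```r
library(survival)

# Each patient id contributes two rows (one per eye), so id defines the cluster
cluster_sizes <- table(retinopathy$id)
table(cluster_sizes)

# Marginal Cox model: the two eyes of a patient are treated as correlated,
# and cluster(id) requests a robust (sandwich) variance estimate
fit <- coxph(Surv(futime, status) ~ trt + cluster(id), data = retinopathy)
summary(fit)
```

The point estimate is the same as in an ordinary Cox fit; only the standard error changes to account for the within-patient correlation.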

##### Farlie-Gumbel-Morgenstern (FGM) model

+ In addition, there are some methods using the joint distribution to model the cluster survival data
+ Assume $m = 2$ (two outcomes in each cluster); the joint survival function of $(\tilde{T}_1, \tilde{T}_2)$ is given by
$$
F_{12}(t_1, t_2) = F_1(t_1)F_2(t_2)[1 + \theta\{1 - F_1(t_1)\}\{1 - F_2 (t_2)\}]
$$
where
  + $F_{12}(t_1, t_2) = P(\tilde{T}_1 \geq t_1, \tilde{T}_2 \geq t_2)$
  + $F_{k}(\cdot)$: the (marginal) survival function of $\tilde{T}_k$
  + $\theta$: the association parameter, $-1 \leq \theta \leq 1$
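
The FGM formula above can be checked numerically with a short R sketch. The function names `fgm_joint_surv` and `rfgm_exp`, the exponential margins, and the rates are assumptions introduced here for illustration; sampling is done by conditional inversion of the FGM form, which is one possible approach among several.

```r
# Joint survival probability under the FGM model.
# s1, s2: marginal survival probabilities F_1(t_1), F_2(t_2); theta in [-1, 1]
fgm_joint_surv <- function(s1, s2, theta) {
  stopifnot(abs(theta) <= 1)
  s1 * s2 * (1 + theta * (1 - s1) * (1 - s2))
}

# Draw n pairs (t1, t2) with exponential margins (assumption) and FGM dependence.
# u and v play the role of F_1(T_1) and F_2(T_2); v is drawn from its
# conditional distribution given u by solving a*v^2 - (1 + a)*v + w = 0.
rfgm_exp <- function(n, theta, rate1 = 1, rate2 = 1) {
  stopifnot(abs(theta) <= 1)
  u <- runif(n)
  w <- runif(n)
  a <- theta * (1 - 2 * u)
  v <- 2 * w / ((1 + a) + sqrt((1 + a)^2 - 4 * a * w))  # stable form of the root in [0, 1]
  cbind(t1 = -log(u) / rate1, t2 = -log(v) / rate2)
}

set.seed(2021)
draws <- rfgm_exp(20000, theta = 1)

# Compare the empirical joint survival at (0.5, 0.5) with the model formula
emp   <- mean(draws[, "t1"] > 0.5 & draws[, "t2"] > 0.5)
model <- fgm_joint_surv(exp(-0.5), exp(-0.5), theta = 1)
c(empirical = emp, model = model)
```

Setting `theta = 0` reduces `fgm_joint_surv` to the product of the margins, which corresponds to independent outcomes within a cluster.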