### Relationship between the log-rank test and the Cox model

+ In the log-rank test:

  + $\frac{\left(\sum_j (d_{1j} - E_{1j})\right)^2}{\sum_j V_{1j}}\sim \chi^2_1$ under $H_0$, where $V_{1j}$ is the variance of $d_{1j}$ at the $j$-th event time

  + $H_0: S_1(t) = S_2(t)$ vs. $H_1: S_1(t) \neq S_2(t)$


+ In the Cox model:

  + $\lambda(t) = \lambda_0(t)\exp\{\beta Z\}$

  + $H_0:\beta = 0$ vs. $H_1:\beta \neq 0$

+ Score function of $\beta$

  + $U(\beta) = \sum^n_{i=1}\delta_i\left(Z_i - \frac{S_1(t_i, \beta)}{S_0(t_i, \beta)}\right)$
    + $S_1(t_i, \beta) = \sum_{j\in R_i}\exp(\beta Z_j)Z_j$
    + $S_0(t_i, \beta) = \sum_{j\in R_i}\exp(\beta Z_j)$

  + Only subjects with $\delta_i = 1$ (observed events) contribute to the score function.
  + $Z_i$ is the group indicator for the two-sample test (1 for group 1, 0 for group 2).
  + Under $H_0: \beta = 0$, $U(0) = \sum_j (d_{1j} - E_{1j})$, which is the numerator of the log-rank statistic, so the score test of $\beta = 0$ is the log-rank test.
  + R usage: `coxph(Surv(time, delta) ~ z)` or `survdiff(Surv(time, delta) ~ z)`
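The identity between the Cox score at $\beta = 0$ and the log-rank numerator can be checked numerically. The following is a minimal base-R sketch, not the notes' own code: the simulated data, the seed, the censoring scheme, and the names `score0` and `O_minus_E` are all assumptions made for illustration.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n     <- 40
z     <- rep(c(0, 1), each = n / 2)           # group indicator Z_i
event <- rexp(n, rate = 0.1 * exp(0.5 * z))   # hypothetical event times
cens  <- runif(n, 0, 30)                      # hypothetical censoring times
delta <- as.numeric(event <= cens)            # 1 = event observed
time  <- pmin(event, cens)                    # observed follow-up time

# Cox score at beta = 0: for each event, Z_i minus the mean of Z in the risk set
score0 <- sum(sapply(which(delta == 1), function(i) {
  at_risk <- time >= time[i]
  z[i] - mean(z[at_risk])
}))

# Log-rank numerator: observed minus expected events in group 1 (z == 1)
event_times <- sort(unique(time[delta == 1]))
O_minus_E <- sum(sapply(event_times, function(tj) {
  at_risk <- time >= tj
  d_j  <- sum(time == tj & delta == 1)            # events at tj
  d_1j <- sum(time == tj & delta == 1 & z == 1)   # events at tj in group 1
  d_1j - d_j * sum(z[at_risk]) / sum(at_risk)     # d_1j - E_1j
}))

all.equal(score0, O_minus_E)
```

If the `survival` package is available, the "Score (logrank) test" line of `summary(coxph(Surv(time, delta) ~ z))` should agree with the chi-square from `survdiff(Surv(time, delta) ~ z)` when there are no tied event times.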

#### Log-rank test with 3 groups

+ $H_0:S_1(t) = S_2(t) = S_3(t)$ vs. $H_1:$ at least one survival function differs from the others
+ Chi-square test with df = 2
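+ The test statistic is the quadratic form
$$
\left(T_1, T_2\right)\left(
\begin{matrix}
\sigma_1^2 & \sigma_{12} \\
\sigma_{12} & \sigma_2^2
\end{matrix}\right)^{-1}
\left( \begin{matrix}
T_1 \\
T_2 
\end{matrix} \right)
$$
where
  + $T_k = \sum_j (d_{kj} - E_{kj})$ for $k = 1, 2$ (the third group is redundant because $T_1 + T_2 + T_3 = 0$)
  + $\sigma_1^2 = \sum_j d_j \times \frac{n_{1j}(n_j - n_{1j})(n_j - d_j)}{n_j^2 (n_j - 1)}$, with $\sigma_2^2$ defined analogously using $n_{2j}$
  + $\sigma_{12} = -\sum_j d_j \times \frac{n_{1j}n_{2j}(n_j - d_j)}{n_j^2(n_j - 1)}$ (the covariance of $T_1$ and $T_2$)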

+ R usage: `survdiff(Surv(time, delta) ~ group)` 
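The quadratic form above can be sketched directly in base R. This is an illustrative implementation under stated assumptions, not the internals of `survdiff`: the simulated data, the seed, the random censoring indicator, and the names `T_vec`, `Sigma`, and `stat` are hypothetical.

```r
set.seed(2)                                       # arbitrary seed
n     <- 90
group <- rep(1:3, each = n / 3)
time  <- rexp(n, rate = c(0.05, 0.1, 0.2)[group]) # hypothetical follow-up times
delta <- rbinom(n, 1, 0.8)                        # hypothetical event indicator (1 = event)

T_vec <- c(0, 0)             # (T_1, T_2): observed minus expected, groups 1 and 2
Sigma <- matrix(0, 2, 2)     # covariance matrix of (T_1, T_2)

for (tj in sort(unique(time[delta == 1]))) {
  at_risk <- time >= tj
  is_event <- time == tj & delta == 1
  n_j  <- sum(at_risk)                                   # number at risk at tj
  d_j  <- sum(is_event)                                  # events at tj
  n_kj <- c(sum(at_risk & group == 1), sum(at_risk & group == 2))
  d_kj <- c(sum(is_event & group == 1), sum(is_event & group == 2))
  T_vec <- T_vec + d_kj - d_j * n_kj / n_j               # d_kj - E_kj
  if (n_j > 1) {
    c_j <- d_j * (n_j - d_j) / (n_j^2 * (n_j - 1))
    # diagonal: n_kj (n_j - n_kj); off-diagonal: -n_1j n_2j
    Sigma <- Sigma + c_j * (diag(n_kj * n_j) - outer(n_kj, n_kj))
  }
}

stat    <- drop(t(T_vec) %*% solve(Sigma) %*% T_vec)     # chi-square statistic
p_value <- pchisq(stat, df = 2, lower.tail = FALSE)
c(statistic = stat, p = p_value)
```

Only two of the three groups enter the quadratic form because the three $T_k$ sum to zero, which is also why the test has 2 degrees of freedom.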

#### Cox models with time-dependent covariates

+ Cox model:
$$
\lambda(t) = \lambda_0(t)\exp(\beta Z(t))
$$
+ Score function of $\beta$
  + $U(\beta) = \sum^n_{i=1}\delta_i\left(Z_i(t_i) - \frac{S_1(t_i, \beta)}{S_0(t_i, \beta)}\right)$
    + $S_1(t_i, \beta) = \sum_{j\in R_i}\exp(\beta Z_j(t_i))Z_j(t_i)$
    + $S_0(t_i, \beta) = \sum_{j\in R_i}\exp(\beta Z_j(t_i))$

+ R usage: `coxph(Surv(start, stop, delta) ~ z)`

##### How to generate a time-dependent hazard model

+ Suppose $Z(t)$ is a step function that jumps at times 5, 10, 15, 20, 25
+ Let $T$ be the event time, which follows the Cox model:
$$
\lambda(t) = \lambda_0(t)\exp(\beta Z(t))
$$
with 
  + $\lambda_0(t) = c$
  + $\beta = 0.5$

Expected data format:

|id|start|stop|delta|z|
|:-:|:-:|:-:|:-:|:-:|
|1|0|10|0|5|
|1|10|13|1|3|
|2|0|7|1|3|
|3|0|10|0|2|
|3|10|20|0|1|
|3|20|33|1|7|
|4|$\vdots$|$\vdots$|$\vdots$|$\vdots$|

##### How to generate data that follow the hazard below

$$
\lambda(t) = \lambda_0(t)\exp\{\beta Z(t)\}
$$

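By the memoryless property (with $\lambda_0(x) = 0.1$ and $z$ constant on $[t, t+s]$):
\begin{align}
&P(T > t + s \mid T > t)\\
=& \frac{P(T > t + s)}{P(T > t)}\\
=& \frac{\exp(-\int^{t+s}_{0} \lambda_0(x)\exp(\beta z(x))dx)}{\exp(-\int^{t}_{0} \lambda_0(x)\exp(\beta z(x))dx)}\\
=& \exp(-\int^{t+s}_t \lambda_0(x)\exp(\beta z)dx)\\
=& \exp(-e^{\beta z}\times 0.1 \times s) \sim Unif(0, 1)\equiv U 
\end{align}
The last step holds because a survival function evaluated at its own random residual time $s$ is uniformly distributed on $(0, 1)$.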
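Setting the conditional survival probability above equal to a uniform draw $U$ and solving for $s$ gives the residual time used in step 3. A short sketch, assuming the baseline hazard is a constant $\lambda_0$ and $z$ is constant on $[t, t+s]$:

$$
U = \exp(-\lambda_0 e^{\beta z} s)
\quad\Longrightarrow\quad
s = \frac{-\log(U)}{\lambda_0 e^{\beta z}}
$$

In other words, within one interval the residual time is exponential with rate $\lambda_0 e^{\beta z}$.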

+ step 1: given $\lambda_0(t) = 0.1$ and $\beta = 0.5$



+ step 2: $Z(t)$ = `sample(c(-200:200), 5, replace = T)` / 100
  + This gives five values of $Z$, one per time interval, so $Z(t)$ is a step function, which is easy to simulate.
  + Consider the time intervals [0, 5), [5, 10), ..., [20, 25).

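+ step 3: generate the survival time $T$, with censoring at time 25
  + Example: given $Z = 2$ on the interval [0, 5)
  + First compute the cumulative hazard over the interval: $\int_0^5\lambda_0 \exp(\beta \times 2)dt = 0.1\times 5 \times \exp(\beta \times 2)$
  + Draw $u \sim Unif(0, 1)$ and set $T = \frac{-\log(u)}{0.1 \times \exp(\beta Z(t))}$, a positive number: the tentative time to death, measured from the start of the interval
  + If $T<5$, the survival time is $T$.
  + If $T>5$, move on to the next interval and repeat; an event in a later interval occurs at that interval's start time plus the newly drawn $T$.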
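The worked example in step 3 can be traced numerically. A minimal sketch, where the uniform draw `u = 0.5` is a hypothetical value fixed for illustration and the variable names are not from the notes:

```r
lambda0 <- 0.1   # baseline hazard from step 1
beta    <- 0.5   # regression coefficient from step 1
Z       <- 2     # covariate value on [0, 5), as in the example
len     <- 5     # interval length

rate    <- lambda0 * exp(beta * Z)   # hazard on the interval, about 0.272
cum_haz <- rate * len                # cumulative hazard over [0, 5), about 1.359
p_surv  <- exp(-cum_haz)             # P(no event in [0, 5)), about 0.257

u      <- 0.5                        # hypothetical uniform draw
t_tent <- -log(u) / rate             # tentative death time, about 2.55
event_here <- t_tent < len           # TRUE: the event falls inside [0, 5)
```

With this draw the subject dies at about 2.55, inside the first interval. A draw below about 0.257 would have pushed the tentative time past 5, and the subject would move on to [5, 10) with a new $Z$ and a new draw.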

In [33]:
lambda0 = 0.1
beta = 0.5
rep = 5

data = NULL

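# Loop over subjects
for (i in 1:100){
    # The covariate changes over time, so sample five values, one per time interval
    Z = sample(-200:200, rep, replace = TRUE) / 100
    # For simplicity the jump times are equally spaced, 5 time units apart
    jump.time = seq(0, 25, 5)
    # delta for each interval of this subject
    delta = NULL
    # stop time for each interval of this subject
    stop.time = NULL
    
    # Loop over the time intervals of this subject
    for (j in 1:rep){
        # Draw a tentative residual event time for interval j
        t = -log(runif(1)) / (lambda0 * exp(beta * Z[j]))
        # If t is no longer than the interval length, the event occurs in this interval:
        # set the stop time to the interval start plus t, set delta to 1, and stop.
        # Otherwise set delta to 0, stop at the interval end, and draw a new time in the next interval.
        if (t <= (jump.time[j+1] - jump.time[j])){
            stop.time[j] = jump.time[j] + t
            delta[j] = 1
            break
        }
        stop.time[j] = jump.time[j+1]
        delta[j] = 0
    
    }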
    
    #stop.time = ceiling(stop.time)
    comp = cbind(i, jump.time[1:length(delta)], stop.time, delta, Z[1:length(delta)])
    data = rbind(data, comp)
}

data = data.frame(data)
colnames(data) = c("ID", "start", "stop", "delta", "Z")
print(head(data, 20))

   ID start       stop delta     Z
1   1     0  5.0000000     0 -1.04
2   1     5 10.0000000     0  1.83
3   1    10 13.0487056     1  1.93
4   2     0  3.5957042     1 -0.42
5   3     0  5.0000000     0  1.08
6   3     5 10.0000000     0 -1.31
7   3    10 15.0000000     0 -0.66
8   3    15 20.0000000     0 -1.57
9   3    20 25.0000000     0 -0.75
10  4     0  5.0000000     0  0.13
11  4     5  5.8384676     1  0.38
12  5     0  5.0000000     0 -0.35
13  5     5  9.1876023     1  0.27
14  6     0  0.8881112     1  1.65
15  7     0  5.0000000     0 -1.70
16  7     5  7.5163619     1 -1.38
17  8     0  0.6922448     1  0.02
18  9     0  1.9420633     1  1.36
19 10     0  3.7433806     1  0.67
20 11     0  3.6442618     1 -0.74


In [31]:
library(survival)

In [32]:
fit = coxph(Surv(start, stop, delta) ~ Z, data = data)
summary(fit)

Call:
coxph(formula = Surv(start, stop, delta) ~ Z, data = data)

  n= 217, number of events= 97 

    coef exp(coef) se(coef)     z Pr(>|z|)    
Z 0.4659    1.5934   0.1062 4.386 1.15e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  exp(coef) exp(-coef) lower .95 upper .95
Z     1.593     0.6276     1.294     1.962

Concordance= 0.649  (se = 0.033 )
Likelihood ratio test= 19.93  on 1 df,   p=8e-06
Wald test            = 19.24  on 1 df,   p=1e-05
Score (logrank) test = 19.89  on 1 df,   p=8e-06
