## Survival Analysis InClass1

---

### Simulation

#### Exponential case

+ Set the true values: $\alpha = 0.5$, $\beta = -0.5$
+ The survival function is $S(t) = \exp\{-\exp(\alpha + \beta Z)\,t\}$
+ The CDF of $T$ is $F(t) = 1 - S(t)$, and by the probability integral transform $F(T)$ is uniform:
$$
F(T) = 1 - e^{-e^{\alpha+\beta Z}\,T} \sim U(0, 1)\equiv U
$$
+ Solving $F(t) = U$ for $t$ gives the inverse CDF, which turns a uniform draw $U$ into an event time:
$$
t = \frac{-\log(1 - U)}{e^{\alpha + \beta Z}}
$$
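Given $Z$, $T$ is exponential with rate $e^{\alpha+\beta Z}$, so the inverse-CDF formula should agree with R's built-in exponential quantile function `qexp`. A minimal sanity check in base R (the seed and the sample size of 10 are arbitrary choices):

```r
set.seed(2024)                      # arbitrary seed, for reproducibility only
alpha = 0.5; beta = -0.5
z = sample(c(0, 1), 10, replace = TRUE)
u = runif(10)
rate = exp(alpha + beta * z)        # hazard exp(alpha + beta * Z) for each draw

# inverse-CDF formula: t = -log(1 - U) / exp(alpha + beta * Z)
t.formula = -log(1 - u) / rate
# R's exponential quantile function evaluated at the same uniforms
t.qexp = qexp(u, rate = rate)

print(all.equal(t.formula, t.qexp))
```

This prints `TRUE`, because `qexp(p, rate)` is exactly $-\log(1-p)/\text{rate}$.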

In [None]:
library(survival)

In [None]:
generate.data.exponential = function(N, par){
    # true values
    alpha = par[1]
    beta = par[2]
    # generate a one-dimensional binary covariate
    z = sample(c(0, 1), N, prob = c(.5, .5), replace = TRUE)
    # generate uniform rv to generate event time
    u = runif(N)
    # inverse CDF of the event time given the covariate
    cdf.inverse = function(unif, a, b, z){
        return (-log(1 - unif) / exp(a + b * z))
    }
    # event time
    e = cdf.inverse(unif = u, a = alpha, b = beta, z = z)
    # censoring time, exponential with rate 0.25
    cens = rexp(N, 0.25)
    # observed time is the earlier of event and censoring; Delta = 1 if the event is observed
    o = pmin(e, cens)
    delta = as.numeric(cens > e)
    dt = data.frame(Observation = o, Delta = delta, Covariate = z)
    return (dt)
}

In [None]:
data = generate.data.exponential(N = 300, par = c(0.5, -0.5))
print(head(data, 30))

   Observation Delta Covariate
1   0.06277140     0         1
2   0.38844264     1         0
3   0.20643743     1         0
4   1.46623820     1         0
5   0.40632546     1         0
6   0.37306015     1         1
7   0.38319604     1         0
8   0.86995355     1         0
9   0.27686810     1         0
10  0.95403173     1         1
11  0.35165025     1         0
12  0.68774996     1         0
13  3.14512043     0         1
14  3.74865780     0         1
15  0.50216782     1         1
16  0.65168802     1         0
17  2.46799480     1         1
18  0.06579212     0         0
19  0.13957599     1         0
20  3.49747100     1         1
21  0.08568904     1         1
22  0.19403109     1         1
23  1.06194597     1         0
24  0.27263092     1         0
25  0.30551152     1         1
26  0.08161793     1         0
27  0.92240320     1         0
28  0.81127087     1         0
29  0.05911711     1         1
30  0.68551143     0         1

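In the printed rows, `Delta = 0` marks a censored observation. Because both the event time and the censoring time are exponential given $Z$ (rates $\lambda_Z = e^{\alpha+\beta Z}$ and $0.25$), the censoring probability has the closed form $P(C < T \mid Z) = 0.25 / (0.25 + \lambda_Z)$. The sketch below re-simulates the same design in base R (rather than calling `generate.data.exponential`, so it stands alone; the seed and `n` are arbitrary) and compares that formula with a large simulated sample:

```r
set.seed(1)                                  # arbitrary seed
alpha = 0.5; beta = -0.5; rate.c = 0.25

# closed-form censoring probability, averaged over Z ~ Bernoulli(0.5)
lambda = exp(alpha + beta * c(0, 1))         # event hazards for Z = 0 and Z = 1
p.theory = mean(rate.c / (rate.c + lambda))

# Monte Carlo estimate from a large simulated sample
n = 1e5
z = sample(c(0, 1), n, replace = TRUE)
event = rexp(n, rate = exp(alpha + beta * z))
cens = rexp(n, rate = rate.c)
p.sim = mean(cens < event)

print(c(theory = p.theory, simulated = p.sim))
```

The closed form gives about 16.6% censoring for this design (13.2% when $Z = 0$ and 20% when $Z = 1$).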

In [None]:
fit.exp = survreg(Surv(Observation, Delta) ~ Covariate, data = data, dist = "exponential")
summary(fit.exp)


Call:
survreg(formula = Surv(Observation, Delta) ~ Covariate, data = data, 
    dist = "exponential")
              Value Std. Error     z       p
(Intercept) -0.4370     0.0842 -5.19 2.1e-07
Covariate    0.5671     0.1319  4.30 1.7e-05

Scale fixed at 1 

Exponential distribution
Loglik(model)= -189   Loglik(intercept only)= -198.4
	Chisq= 18.89 on 1 degrees of freedom, p= 1.4e-05 
Number of Newton-Raphson Iterations: 4 
n= 300 


In [None]:
# survreg reports accelerated-failure-time coefficients; negate them to get (alpha, beta)
alpha.beta = -coef(fit.exp)
print(alpha.beta)

(Intercept)   Covariate 
  0.4370318  -0.5671236 
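The sign flip is needed because `survreg` fits the exponential model on the log-time (accelerated failure time) scale, $\log T = b_0 + b_1 Z + W$, so the hazard is $e^{-(b_0 + b_1 Z)}$ and $(\alpha, \beta) = -(b_0, b_1)$. One way to confirm this, sketched below under the assumption that a direct `optim` fit is an acceptable reference, is to maximize the censored log-likelihood $\ell(\alpha, \beta) = \sum_i \delta_i(\alpha + \beta z_i) - \sum_i t_i e^{\alpha + \beta z_i}$ and compare it with the negated `survreg` coefficients (the helper `exp.loglik` and the seed are illustrative choices, not part of the original notebook):

```r
library(survival)

# log-likelihood of S(t) = exp{-exp(alpha + beta * z) * t} under right censoring:
# sum(delta * (alpha + beta * z)) - sum(t * exp(alpha + beta * z))
exp.loglik = function(theta, obs, delta, z) {
  eta = theta[1] + theta[2] * z
  sum(delta * eta) - sum(obs * exp(eta))
}

# simulate one data set from the same design as generate.data.exponential
set.seed(1)                                     # arbitrary seed
n = 300
z = sample(c(0, 1), n, replace = TRUE)
event = rexp(n, rate = exp(0.5 - 0.5 * z))      # true alpha = 0.5, beta = -0.5
cens = rexp(n, rate = 0.25)
obs = pmin(event, cens)
delta = as.numeric(event < cens)

# direct maximum likelihood; fnscale = -1 makes optim maximize
mle = optim(c(0, 0), exp.loglik, obs = obs, delta = delta, z = z,
            method = "BFGS", control = list(fnscale = -1, reltol = 1e-12))$par

# survreg's coefficients are on the log-time scale, i.e. -(alpha, beta)
fit = survreg(Surv(obs, delta) ~ z, dist = "exponential")
print(rbind(optim = mle, negated.survreg = -unname(coef(fit))))
```

Both rows should agree to several decimal places.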


In [None]:
# repeat data generation and fitting 1000 times; each row stores (alpha.hat, beta.hat)
est = matrix(NA, nrow = 1000, ncol = 2)
for (i in 1:1000){
  d = generate.data.exponential(N = 300, par = c(0.5, -0.5))
  mod = survreg(Surv(Observation, Delta) ~ Covariate, data = d, dist = "exponential")
  est[i, ] = -coef(mod)
}

print(apply(est, 2, mean))
print(apply(est, 2, sd))

[1]  0.5030854 -0.4955158
[1] 0.08993253 0.13166414


#### Weibull case

+ Set the true values: $\alpha = -0.5$, $\beta = 0.5$, $\gamma = 2$

+ The survival function is $S(t) = \exp\{-t^{\gamma}\exp(\alpha + \beta Z)\}$, where $\gamma > 0$.

+ The CDF of $T$ is $F(t) = 1 - S(t)$, and $F(T)$ is uniform:

$$
F(T) = 1 - S(T) = 1 - e^{-T^{\gamma} e^{\alpha + \beta Z}} \sim U(0, 1) \equiv U
$$

+ Solving $F(t) = U$ for $t$ gives the inverse CDF:

$$
t = \left(\frac{-\log(1 - U)}{e^{\alpha + \beta Z}}\right)^{1/\gamma}
$$
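R's built-in `qweibull` uses the parameterization $S(t) = \exp\{-(t/\text{scale})^{\text{shape}}\}$. Writing $\eta = \alpha + \beta Z$ and $t^{\gamma}e^{\eta} = (t\,e^{\eta/\gamma})^{\gamma}$ shows that this model corresponds to shape $= \gamma$ and scale $= e^{-\eta/\gamma}$. A minimal base-R check that the inverse-CDF formula agrees with `qweibull` (seed and sample size are arbitrary):

```r
set.seed(2024)                          # arbitrary seed
alpha = -0.5; beta = 0.5; gamma = 2
z = sample(c(0, 1), 10, replace = TRUE)
u = runif(10)
eta = alpha + beta * z                  # linear predictor alpha + beta * Z

# inverse-CDF formula: t = (-log(1 - U) / exp(eta))^(1 / gamma)
t.formula = (-log(1 - u) / exp(eta))^(1 / gamma)
# same distribution in R's (shape, scale) parameterization
t.qweibull = qweibull(u, shape = gamma, scale = exp(-eta / gamma))

print(all.equal(t.formula, t.qweibull))
```

This prints `TRUE`.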

In [None]:
library(survival)

In [None]:
generate.data.weibull = function(N, par){
  # true values
  alpha = par[1]
  beta = par[2]
  gamma = par[3]
  # generate a one-dimensional binary covariate
  z = sample(c(0, 1), N, prob = c(0.5, 0.5), replace = TRUE)
  # prepare uniform rv to generate event time
  u = runif(N)
  # inverse CDF of the event time given the covariate
  cdf.inverse = function(unif, alpha, beta, gamma, z){
    return ((-log(1 - unif) / exp(alpha + beta * z))^(1/gamma))
  }
  # generate the event time
  e = cdf.inverse(u, alpha, beta, gamma, z)
  # generate the censoring time, exponential with rate 0.25
  cens = rexp(N, 0.25)
  
  # observed time is the earlier of event and censoring; delta = 1 if the event is observed
  o = pmin(e, cens)
  delta = as.numeric(e < cens)
  dt = data.frame(observation = o, delta = delta, covariate = z)
  return (dt)
}

In [None]:
data = generate.data.weibull(N = 300, par = c(-0.5, 0.5, 2))
print(head(data, 30))

   observation delta covariate
1   0.31261294     1         0
2   0.70640202     1         0
3   0.88801820     0         0
4   0.74536725     1         1
5   0.71911880     1         1
6   0.28171101     1         1
7   0.60420386     0         0
8   0.99755617     1         1
9   1.62929282     1         0
10  1.72309019     1         1
11  0.54388135     1         0
12  0.75197767     1         1
13  1.16480197     1         1
14  0.45618449     1         1
15  0.07461271     1         0
16  1.62544602     1         0
17  0.02989403     0         0
18  0.96863573     1         1
19  0.26465898     1         1
20  1.42669312     0         0
21  0.08278293     1         1
22  1.11343961     1         1
23  0.33581709     1         1
24  1.24305226     1         0
25  0.87121238     1         1
26  1.29524480     1         1
27  0.52969685     1         1
28  0.49049380     1         1
29  1.53244844     1         1
30  0.81922966     1         0


In [None]:
dt = generate.data.weibull(N = 300, par = c(-0.5, 0.5, 2))
mod = survreg(Surv(observation, delta) ~ covariate, data = dt, dist = "weibull")
# survreg reports AFT coefficients and a scale sigma; convert to (alpha, beta, gamma)
alpha.beta = unname(-coef(mod) / mod$scale)
gamma = 1 / mod$scale
cat("(alpha, beta) = ", "(", alpha.beta, ")", "\n")
cat("gamma = ", gamma)

(alpha, beta) =  ( -0.4644041 0.4233394 ) 
gamma =  1.886634

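+ When using `survreg` from the `survival` package, the estimates must be transformed: `survreg` fits the Weibull model on the log-time (accelerated failure time) scale and reports coefficients `coef` together with a scale parameter $\hat{\sigma}$, not $(\alpha, \beta, \gamma)$ directly. The conversions are:

+ $\hat{\gamma} = \frac{1}{\hat{\sigma}}$

+ $(\hat{\alpha}, \hat{\beta}) = -\frac{1}{\hat{\sigma}} \times$ coef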
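These conversion rules follow from matching survival functions. Assuming `survreg`'s Weibull parameterization $\log T = \mu + \sigma W$ with $\mu = b_0 + b_1 Z$ and $W$ a standard (minimum) extreme value variable, $P(W > w) = \exp(-e^{w})$:

$$
\begin{aligned}
S(t) &= P(\log T > \log t) = \exp\left\{-\exp\left(\frac{\log t - \mu}{\sigma}\right)\right\} \\
     &= \exp\left\{-t^{1/\sigma}\, e^{-\mu/\sigma}\right\}.
\end{aligned}
$$

Matching this term by term with $S(t) = \exp\{-t^{\gamma} e^{\alpha+\beta Z}\}$:

$$
\gamma = \frac{1}{\sigma}, \qquad \alpha + \beta Z = -\frac{b_0 + b_1 Z}{\sigma}
\;\Longrightarrow\; \alpha = -\frac{b_0}{\sigma}, \quad \beta = -\frac{b_1}{\sigma}.
$$

With $\sigma$ fixed at 1 (the exponential model), this reduces to $(\alpha, \beta) = -(b_0, b_1)$, the sign flip used in the exponential case.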

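##### Repeat the data generation and estimation 1000 times

The means of the 1000 estimates are very close to the true values $(\alpha, \beta, \gamma) = (-0.5, 0.5, 2)$, and the standard deviations measure their sampling variability.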

In [None]:
# repeat data generation and fitting 1000 times; each row stores (alpha.hat, beta.hat, gamma.hat)
est = matrix(NA, nrow = 1000, ncol = 3)
for (i in 1:1000){
  d = generate.data.weibull(N = 300, par = c(-0.5, 0.5, 2))
  mod = survreg(Surv(observation, delta) ~ covariate, data = d, dist = "weibull")
  # convert survreg's estimates: (alpha, beta) = -coef / sigma, gamma = 1 / sigma
  est[i, ] = c(-coef(mod) / mod$scale, 1 / mod$scale)
}

print(apply(est, 2, mean))
print(apply(est, 2, sd))

[1] -0.5029491  0.5021560  2.0212933
[1] 0.1072445 0.1350932 0.1023593
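To make "very close" more precise, each bias can be compared with its Monte Carlo standard error, $\text{sd}/\sqrt{1000}$. The sketch below simply plugs in the means and standard deviations printed above (no new simulation is run):

```r
# means and standard deviations of the 1000 Weibull estimates, copied from the output above
est.mean = c(alpha = -0.5029491, beta = 0.5021560, gamma = 2.0212933)
est.sd   = c(alpha = 0.1072445,  beta = 0.1350932, gamma = 0.1023593)
truth    = c(alpha = -0.5,       beta = 0.5,       gamma = 2)
R = 1000                                  # number of Monte Carlo replications

bias = est.mean - truth
mcse = est.sd / sqrt(R)                   # Monte Carlo standard error of each mean
z.bias = bias / mcse                      # bias measured in MCSE units

print(round(rbind(bias = bias, mcse = mcse, z = z.bias), 4))
```

With these numbers the biases of $\hat{\alpha}$ and $\hat{\beta}$ are within one Monte Carlo standard error of zero, while the mean of $\hat{\gamma}$ lies about 6.6 standard errors above 2: a small (about 1%) but detectable upward bias at $N = 300$.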
