
# Survival Analysis with Applications in Medicine: Take-home examination

## A: Weibull regression models (3 marks)

### 1. For a proportional hazards model with $S_0(t)$ that is a Weibull distribution, show that survival $S(t)$ is also from a Weibull distribution. Marks: 1 for a correct answer; 0.5 for a minor error; otherwise 0.

For a proportional hazards model, we have:
$$S(t|x) = S_0(t)^{\exp(\beta x)}$$

where $S_0(t) = \exp(-\lambda t^k)$ is the Weibull baseline survival function.

Let's substitute:
$$S(t|x) = [\exp(-\lambda t^k)]^{\exp(\beta x)} = \exp(-\lambda t^k \cdot \exp(\beta x))$$

This can be written as:
$$S(t|x) = \exp(-\lambda \cdot \exp(\beta x) \cdot t^k)$$

This is again a Weibull survival function, with the same shape $k$ and the new scale parameter:
$$\lambda' = \lambda \cdot \exp(\beta x)$$

So the survival $S(t|x)$ is indeed from a Weibull distribution.
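
The result can be checked numerically. The sketch below uses base R only; the values of `lambda`, `k`, `beta` and `x` are illustrative assumptions, not part of the exam. It compares $S_0(t)^{\exp(\beta x)}$ with a Weibull survival curve that has shape $k$ and parameter $\lambda' = \lambda \exp(\beta x)$.

```r
# Illustrative values (assumptions chosen for this check)
lambda <- 0.5; k <- 1.5; beta <- 0.7; x <- 2
t <- seq(0.1, 5, by = 0.1)

S0 <- function(t) exp(-lambda * t^k)   # Weibull baseline survival
S_ph <- S0(t)^exp(beta * x)            # proportional hazards survival

# Weibull survival with the same shape k and lambda' = lambda * exp(beta * x).
# pweibull() uses S(t) = exp(-(t / scale)^shape), so scale = lambda'^(-1 / k).
lambda_new <- lambda * exp(beta * x)
S_weib <- pweibull(t, shape = k, scale = lambda_new^(-1 / k), lower.tail = FALSE)

# Largest absolute difference between the two curves (floating-point noise only)
max_diff_ph <- max(abs(S_ph - S_weib))
print(max_diff_ph)
```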

### 2. For an accelerated failure time model with $S_0(t)$ that is a Weibull distribution, show that survival $S(t)$ is also from a Weibull distribution. Marks: 1 for a correct answer; 0.5 for a minor error; otherwise 0.

For an accelerated failure time (AFT) model, we have:
$$S(t|x) = S_0(t \exp(-\tilde{\beta}x))$$

where $S_0(t) = \exp(-\lambda t^k)$ is the Weibull baseline survival function.

Let's substitute:
$$S(t|x) = \exp(-\lambda[t \exp(-\tilde{\beta}x)]^k)$$
$$S(t|x) = \exp(-\lambda t^k \exp(-k\tilde{\beta}x))$$

This can be written as:
$$S(t|x) = \exp(-\lambda \exp(-k\tilde{\beta}x) \cdot t^k)$$

This is again a Weibull survival function, with the same shape $k$ and the new scale parameter:
$$\lambda' = \lambda\exp(-k\tilde{\beta}x)$$

So the survival $S(t|x)$ is also from a Weibull distribution.
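
The same kind of numerical check works for the AFT form. In this sketch the values of `lambda`, `k`, `beta_tilde` and `x` are again illustrative assumptions; it compares $S_0(t \exp(-\tilde{\beta}x))$ with a Weibull survival curve that has shape $k$ and parameter $\lambda' = \lambda \exp(-k\tilde{\beta}x)$.

```r
# Illustrative values (assumptions chosen for this check)
lambda <- 0.5; k <- 1.5; beta_tilde <- -0.4; x <- 2
t <- seq(0.1, 5, by = 0.1)

S0 <- function(t) exp(-lambda * t^k)       # Weibull baseline survival
S_aft <- S0(t * exp(-beta_tilde * x))      # accelerated failure time survival

# Weibull survival with the same shape k and lambda' = lambda * exp(-k * beta_tilde * x).
# pweibull() uses S(t) = exp(-(t / scale)^shape), so scale = lambda'^(-1 / k).
lambda_new <- lambda * exp(-k * beta_tilde * x)
S_weib <- pweibull(t, shape = k, scale = lambda_new^(-1 / k), lower.tail = FALSE)

# Largest absolute difference between the two curves (floating-point noise only)
max_diff_aft <- max(abs(S_aft - S_weib))
print(max_diff_aft)
```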

### 3. What is the relationship between $\beta$ and $\tilde{\beta}$ if both models have a Weibull baseline survival function? Marks: 1 for a correct answer; 0.5 for a minor error; otherwise 0.

From our derivations above:

In the proportional hazards model, the scale parameter is:
$\lambda' = \lambda \exp(\beta x)$

In the accelerated failure time model, the scale parameter becomes:
$\lambda' = \lambda\exp(-k\tilde{\beta}x)$

Both models give Weibull survival functions with the same shape $k$, so they describe the same distribution exactly when their scale parameters agree for every $x$:
$\lambda \exp(\beta x) = \lambda \exp(-k\tilde{\beta}x)$

Then:
$\exp(\beta x) = \exp(-k\tilde{\beta}x)$ $\Rightarrow$
$\beta x = -k\tilde{\beta}x$

Since this must hold for all $x$: $\beta = -k\tilde{\beta}$
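
As a quick sanity check of this relationship, the sketch below sets $\beta = -k\tilde{\beta}$ and confirms that the PH and AFT survival functions coincide over a grid of times and covariate values. All numeric values are illustrative assumptions.

```r
# Illustrative values (assumptions chosen for this check)
lambda <- 0.5; k <- 1.5; beta_tilde <- -0.4
beta <- -k * beta_tilde                    # relationship derived above

S0 <- function(t) exp(-lambda * t^k)       # shared Weibull baseline survival

# Evaluate both models on every combination of time and covariate value
grid <- expand.grid(t = seq(0.1, 5, by = 0.1), x = c(-1, 0, 1, 2))
S_ph  <- S0(grid$t)^exp(beta * grid$x)             # PH form
S_aft <- S0(grid$t * exp(-beta_tilde * grid$x))    # AFT form

max_diff <- max(abs(S_ph - S_aft))
print(max_diff)
```

In `survreg(..., dist = "weibull")` output this corresponds to dividing the negated AFT coefficients by the reported scale, assuming the usual log-linear parameterization in which $k = 1/\text{scale}$.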

## B: Interval-censored likelihood (3 marks)

### 1. Assume that we have a data tuple $(t_i, u_i, v_i)$ where $t_i$ is the delayed entry (left truncation) time and an event is observed in the interval $(u_i, v_i]$ for individual $i$. Express the log-likelihood for this data tuple in terms of (a) survival $S(t)$ and (b) the hazard $h(t)$ at time $t$. Marks for (a) and for (b): 1 for a correct answer; 0.5 for a minor error; otherwise 0.

(a)    
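For left-truncated, interval-censored data, the likelihood contribution is the probability that the event falls in $(u_i, v_i]$ given survival beyond the entry time $t_i$:

$$P(u_i < T_i \leq v_i|T_i > t_i) = \frac{P(u_i < T_i \leq v_i,\ T_i > t_i)}{P(T_i > t_i)} = \frac{P(u_i < T_i \leq v_i)}{P(T_i > t_i)}$$

The second equality holds because $u_i \geq t_i$ (the observation window starts no earlier than the truncation time), so $T_i > u_i$ already implies $T_i > t_i$. In terms of the survival function: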

$$P(u_i < T_i \leq v_i|T_i > t_i) = \frac{S(u_i) - S(v_i)}{S(t_i)}$$

Taking the logarithm for the log-likelihood:

$$\log L_i = \log[S(u_i) - S(v_i)] - \log(S(t_i))$$
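
Form (a) translates directly into a small function. The sketch below assumes a Weibull survival function with illustrative parameters; the name `loglik_a` and the example tuple are hypothetical and serve only to show the calculation.

```r
# Survival function S(t) = exp(-lambda * t^k); lambda and k are illustrative assumptions
S <- function(t, lambda = 0.5, k = 1.5) exp(-lambda * t^k)

# Log-likelihood contribution of one tuple (t_i, u_i, v_i), form (a)
loglik_a <- function(t_i, u_i, v_i, S) {
  stopifnot(t_i <= u_i, u_i < v_i)       # entry before the interval, non-empty interval
  log(S(u_i) - S(v_i)) - log(S(t_i))
}

# Hypothetical tuple: entry at 0.5, event observed in (1, 2]
ll_a <- loglik_a(t_i = 0.5, u_i = 1, v_i = 2, S = S)
print(ll_a)
```

With $t_i = 0$ the truncation term vanishes, since $S(0) = 1$, and the expression reduces to the usual interval-censored contribution.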

(b)    
The survival function can be written in terms of the hazard function:

$$S(t) = \exp \left(- \int_0^t h(s)ds \right)$$

Thus:

$$S(u_i) = \exp \left(- \int_0^{u_i} h(s)ds \right)$$

$$S(v_i) = \exp \left(- \int_0^{v_i} h(s)ds \right)$$

$$S(t_i) = \exp \left(- \int_0^{t_i} h(s)ds \right)$$

The integral in $S(v_i)$ can be split at $u_i$:
$$S(v_i) = \exp \left(- \int_0^{u_i} h(s)ds - \int_{u_i}^{v_i} h(s)ds \right) = S(u_i) \cdot \exp \left(- \int_{u_i}^{v_i} h(s)ds \right)$$

Substituting this into the result of (a) and factoring out $S(u_i)$:
$$\log L_i = \log \left( S(u_i) - S(u_i) \cdot \exp \left(- \int_{u_i}^{v_i} h(s)ds \right) \right) - \log(S(t_i))$$

$$\log L_i = \log \left( S(u_i) \cdot \left( 1 - \exp \left(- \int_{u_i}^{v_i} h(s)ds \right) \right) \right) - \log(S(t_i))$$

$$\log L_i = \log(S(u_i)) + \log \left( 1 - \exp \left(- \int_{u_i}^{v_i} h(s)ds \right) \right) - \log(S(t_i))$$

Substituting the hazard expressions for $S(u_i)$ and $S(t_i)$:
$$\log L_i = - \int_0^{u_i} h(s)ds + \log \left( 1 - \exp \left(- \int_{u_i}^{v_i} h(s)ds \right) \right) + \int_0^{t_i} h(s)ds$$

Since $t_i \leq u_i$, the two integrals from $0$ combine into a single integral over $(t_i, u_i]$:
$$\log L_i = \log \left( 1 - \exp \left(- \int_{u_i}^{v_i} h(s)ds \right) \right) - \int_{t_i}^{u_i} h(s)ds$$
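
A numerical check that forms (a) and (b) agree, assuming a Weibull hazard $h(s) = \lambda k s^{k-1}$ with illustrative parameters. The helper names (`cum_haz`, `loglik_a`, `loglik_b`) and the example tuple are hypothetical; the integrals are evaluated with base R's `integrate()`.

```r
# Illustrative Weibull parameters (assumptions chosen for this check)
lambda <- 0.5; k <- 1.5
h <- function(s) lambda * k * s^(k - 1)    # hazard
S <- function(t) exp(-lambda * t^k)        # matching survival function

# Integral of the hazard over (a, b], computed numerically
cum_haz <- function(a, b) integrate(h, lower = a, upper = b)$value

# Form (b): hazard only; log1p(-x) computes log(1 - x) stably
loglik_b <- function(t_i, u_i, v_i) {
  log1p(-exp(-cum_haz(u_i, v_i))) - cum_haz(t_i, u_i)
}

# Form (a): survival only
loglik_a <- function(t_i, u_i, v_i) log(S(u_i) - S(v_i)) - log(S(t_i))

# Hypothetical tuple: entry at 0.5, event observed in (1, 2]
ll_a <- loglik_a(0.5, 1, 2)
ll_b <- loglik_b(0.5, 1, 2)
print(c(a = ll_a, b = ll_b))
```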

### 2. Can you express these data using the Surv function from the survival package? If so, show an example; if not, explain why.

Only in part. The interval $(u_i, v_i]$ can be expressed with `Surv(..., type = "interval")`, but that form has no argument for the delayed-entry time $t_i$, so the full tuple cannot be stored in a single `Surv` object:

```r
library(survival)

# Example data frame
data <- data.frame(
  id = 1:5,
  t = c(5, 5, 5, 5, 5),       # Left truncation (delayed entry) times
  u = c(13, 5, 13, 13, 13),   # Lower interval bounds
  v = c(18, 5, 538, 13, Inf)  # Upper interval bounds
)

# type = "interval" needs a status code for each row.
# From https://www.rdocumentation.org/packages/survival/versions/2.11-4/topics/Surv:
# 0 = right censored ------> here: v = Inf (event not observed)
# 1 = event at time -------> here: u = v (the exact time of the event is known)
# 2 = left censored -------> event occurred at or before `time`
# 3 = interval censored ---> here: u < v, event in (u, v] (the typical case)
status <- rep(3, nrow(data))
status[data$v == Inf] <- 0
status[data$u == data$v] <- 1
# Convention used in this example: a lower bound equal to the entry time
# is coded as left censored at u
status[data$u == data$t] <- 2

# Create the interval-censored Surv object
surv_obj <- with(data, Surv(time = u, time2 = v, event = status, type = "interval"))

# Display the Surv object
print(surv_obj)

# Left truncation is NOT captured above: t never enters surv_obj.
# The counting-process form Surv(start, stop, event) does encode delayed entry,
# but only together with right censoring, and coxph() does not accept
# interval-censored Surv objects. A `subset = (t < u)` argument would only drop
# rows; it does not condition on T > t. The -log S(t_i) term from B.1 therefore
# has to be handled separately, for example in a custom likelihood.
```

Output:

```
[1] [13,  18]  5-       [13, 538] 13        13+      
```



## C: Truncated distributions (7 marks)