### Survival Models

Survival models address time-to-event questions: how long until an event occurs, such as the time from treatment to recurrence, or the time from diagnosis to death. Doctors often want to answer such questions for their patients, for example: how likely is a patient to survive the next five years, or the next ten?

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Survival analysis deals with time-to-event data, such as time until death or disease occurrence. Survival models are statistical models used to analyze such data. Previously, we asked questions about the probability of death within a certain time frame. Now, we want to extend that to the probability of survival past any time point. The survival function is the key quantity in survival models, representing the probability that time to event is greater than a certain value. Survival models allow us to use a single model to answer questions about probabilities of survival for different time points, rather than building separate models for each time point.

### Valid Survival Functions

In survival analysis, the survival function is a fundamental concept that represents the probability of surviving beyond a certain time point. The survival function, denoted by S(t), is defined as the probability that an individual survives beyond time t.

For a valid survival function, the following properties must hold:

- S(t) is a non-negative function: The survival function cannot be negative for any time t. This means that the probability of surviving beyond any time point must be greater than or equal to zero.

- S(t) is a non-increasing function: The survival function can never increase with time t. This means that the probability of surviving beyond a longer time period cannot be larger than the probability of surviving beyond a shorter one: S(u) ≤ S(v) whenever u ≥ v.

- S(0) = 1: The survival function must equal 1 at time t=0. This means that the probability of surviving beyond time zero (i.e., being alive at the start of the study) is 100%.

- lim S(t) = 0 as t approaches infinity: The survival function must approach zero as time t goes to infinity. This means that the probability of surviving indefinitely is zero.

- The survival function can only take values between 0 and 1: The survival function must take values between 0 and 1 for any time t. This means that the probability of surviving beyond a certain time t cannot be greater than one, nor can it be less than zero.

If a function satisfies these properties, then it is a valid survival function.
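The properties above can be checked numerically on a grid of time points. The following is a minimal sketch, not a definitive test: the function name `looks_like_valid_survival_function`, the tolerance, the grid, and the three candidate curves are all assumptions made for illustration. A finite grid cannot verify the limit at infinity, so that property is left out.

```python
import math
from typing import Callable, Sequence


def looks_like_valid_survival_function(
    s: Callable[[float], float],
    times: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check S(0) = 1, 0 <= S(t) <= 1, and non-increasing S on a time grid.

    `times` must be sorted in increasing order and start at 0. The limit as
    t goes to infinity cannot be verified on a finite grid, so it is skipped.
    """
    values = [s(t) for t in times]
    starts_at_one = times[0] == 0 and abs(values[0] - 1.0) <= tol
    within_bounds = all(-tol <= v <= 1.0 + tol for v in values)
    never_increases = all(
        later <= earlier + tol for earlier, later in zip(values, values[1:])
    )
    return starts_at_one and within_bounds and never_increases


# Hypothetical candidate curves (assumptions for illustration only).
def decaying(t: float) -> float:
    return math.exp(-0.1 * t)          # starts at 1 and only goes down


def rising(t: float) -> float:
    return 1.0 - math.exp(-0.1 * t)    # starts at 0 and goes up


def bumpy(t: float) -> float:
    return 1.0 if t in (0, 5) else 0.5  # drops, then jumps back up at t = 5


grid = [0, 1, 2, 5, 10, 20, 50]
for name, curve in [("decaying", decaying), ("rising", rising), ("bumpy", bumpy)]:
    print(name, looks_like_valid_survival_function(curve, grid))
```

Only the first curve passes: the second fails S(0) = 1 and increases over time, and the third increases between t = 2 and t = 5.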

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

A survival function represents the probability that an event has not occurred by a certain time. It should never increase as time increases, and it should start at 1 and end at 0. We looked at examples of survival functions and determined which were valid based on these properties.

### Heart Attack Data

![image.png](attachment:image.png)

In this example, we look at patients who undergo surgery and track whether they have a heart attack afterwards. The study started in January 2015 and ended in July 2019. Patients had their surgery at different times, and we tracked the time from surgery to a heart attack for three of them:

- Patient 1 had surgery in March 2016 and a heart attack in March 2017, so we record a time of 12 months.

- Patient 2 had surgery in July 2015, and by the end of the study in July 2019 we had not observed a heart attack. That is four years, or 48 months, without an event, so we record 48 with a plus: 48+.

- Patient 3 had surgery in November 2015 and dropped out in November 2017. We observed them for two years, or 24 months, without an event, so we record 24+.

To summarize, survival data moves us from representing the outcome as yes or no, as we did in the binary setup, to asking when. We record the time from an origin to an event, and some of the observations are censored, which we'll look into shortly.
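The three patient records can be turned into observed times with a few lines of Python. This is a sketch under stated assumptions: the source gives only months and years, so every date is placed on the first of the month, and the helper names `months_between`, `observe`, and `label` are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: the study ends on the first day of July 2019.
STUDY_END = date(2019, 7, 1)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def observe(
    surgery: date,
    event: Optional[date] = None,
    dropout: Optional[date] = None,
) -> Tuple[int, bool]:
    """Return (months observed, event_observed) for one patient."""
    if event is not None:
        return months_between(surgery, event), True
    # No event seen: follow-up stops at dropout or at the end of the study.
    last_seen = dropout if dropout is not None else STUDY_END
    return months_between(surgery, last_seen), False


def label(months: int, event_observed: bool) -> str:
    """Write censored times with a trailing plus, as in the text."""
    return str(months) if event_observed else f"{months}+"


patients = {
    1: observe(date(2016, 3, 1), event=date(2017, 3, 1)),
    2: observe(date(2015, 7, 1)),                       # followed to study end
    3: observe(date(2015, 11, 1), dropout=date(2017, 11, 1)),
}
labels = {pid: label(*obs) for pid, obs in patients.items()}
print(labels)  # {1: '12', 2: '48+', 3: '24+'}
```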

![image.png](attachment:image.png)

### Right Censoring

Right censoring is a type of censoring in survival analysis where we stop observing a subject before the event occurs. For these subjects, the time to the event is known to be greater than the censoring time, but the exact time of the event is unknown. In the data, a censored time is written with a trailing plus, such as 48+. Right censoring is common in medical studies where patients are followed for a fixed duration and some are still alive at the end of the study. In such cases, we only know that the event had not occurred by the end of the study period, so the observation is right-censored. It is important to account for right censoring when analyzing survival data, as failure to do so can lead to biased results.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Right censoring is a common occurrence in survival data where the time to event is only known to exceed a certain value. There are two types of right censoring: end-of-study censoring and loss-to-follow-up censoring. It's important to understand censoring to build accurate survival models.
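One way to hold right-censored data in code is as a pair: the recorded time and a flag saying whether the event was actually observed. The sketch below parses the plus notation into that form. The `Observation` type and `parse_observation` helper are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    time: int             # months from the origin to the event or to censoring
    event_observed: bool  # False means right-censored: true event time > time


def parse_observation(text: str) -> Observation:
    """Parse '12' as an observed event and '48+' as a right-censored time."""
    text = text.strip()
    censored = text.endswith("+")
    return Observation(time=int(text.rstrip("+")), event_observed=not censored)


observations: List[Observation] = [
    parse_observation(s) for s in ["12", "48+", "24+"]
]
for obs in observations:
    relation = "T =" if obs.event_observed else "T >"
    print(relation, obs.time)
```

For a censored observation, the stored time is only a lower bound on the true time to event, which is why it cannot be treated as an ordinary event time.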

### Estimating the Survival Function

![image.png](attachment:image.png)

We have previously looked at survival models that tell us the survival function, denoted as S(t), for a group of patients. The survival function represents the probability of surviving past a certain time, given the patient's characteristics. In survival data, we observe the time to an event for some patients; for others, we only know that their time to an event is greater than a certain value. These latter observations are right-censored.

To estimate the survival function, we use the observed data and assume that the same survival curve applies to the entire population. For each time point, we calculate the proportion of patients who survive beyond that time, taking into account both observed events and right-censored data. The resulting survival function shows the probability of surviving past each time point for the population. We will later learn how to build individualized survival functions for patients based on their characteristics.

![image.png](attachment:image.png)

The survival function, denoted as S(t), gives the probability that the time to an event is after some time t. We have seen survival data, where we have a set of patients for whom we have outcome information collected in the form of time to an event. Some observations are right-censored, where we don't observe the event, but we know that the event time is greater than the time of censoring. We want to estimate the survival probability at a fixed time t, such as 25 months, using the available data.

To estimate the survival probability, we count the number of patients who survive to 25 months, and divide it by the total number of patients. However, this estimation is complicated by the presence of right-censored observations. We make assumptions about the fate of the censored observations to obtain the estimate. One assumption is that all censored observations die immediately after being observed. Another assumption is that all censored observations never die. The reality is likely somewhere between these two assumptions.

We calculate the estimate under both assumptions and find that the estimated survival probability is much smaller when assuming that censored observations die immediately compared to the assumption that they never die. The estimate under the first assumption is around 0.29, while the estimate under the second assumption is 0.71.
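The two bounds can be reproduced in a few lines. The seven-patient dataset below is hypothetical: it is chosen only so that the counts match the figures quoted in the text, and the individual times are assumptions. Each record is a `(time, event_observed)` pair, with `False` marking a censored observation.

```python
from typing import List, Tuple

# Hypothetical data: (time in months, event_observed). Times are assumptions.
data: List[Tuple[int, bool]] = [
    (10, True), (8, False), (60, True), (20, True),
    (12, False), (30, False), (15, False),
]


def survival_if_censored_die_immediately(
    data: List[Tuple[int, bool]], t: int
) -> float:
    # Every recorded time is treated as an event time, censored or not.
    survivors = sum(1 for time, _ in data if time > t)
    return survivors / len(data)


def survival_if_censored_never_die(
    data: List[Tuple[int, bool]], t: int
) -> float:
    # Censored patients always count as survivors; events must fall after t.
    survivors = sum(1 for time, event in data if not event or time > t)
    return survivors / len(data)


low = survival_if_censored_die_immediately(data, 25)
high = survival_if_censored_never_die(data, 25)
print(round(low, 2), round(high, 2))  # 0.29 0.71
```

Any sensible estimate of S(25) for this data should fall between these two values.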

![image.png](attachment:image.png)

We learned about estimating the survival function using survival data, where we have information about the event times of some patients and right-censored observations for others. We fixed a time t and tried to estimate the probability of survival to that time. We looked at two assumptions we could make for the censored observations: either they die immediately or they never die. We saw that these assumptions led to different estimates of the probability of survival to t, and that the reality is likely somewhere in between these values. We also discussed the importance of addressing this issue when we do not have access to the real event times for the censored observations.

### Using Censored Data

![image.png](attachment:image.png)

We now estimate the probability of survival past 25 months using the censored observations. Picture a number line with time points 0, 1, 2, and so on, up to 25 and beyond. We treat time as discrete: events happen at 1, or 2, or 3 months, never in between.

By definition, S(25) = P(T > 25), the probability that the time to event T is after 25. Because time is discrete, an event after 25 is an event at or after 26, so S(25) = P(T ≥ 26). We'll see why this modification is useful in a moment.

Next, we expand this out. The time to event being at or after 26 means the same thing as it being at or after 26, and at or after 25, and at or after 24, all the way down to at or after 0. We can do this because T ≥ 26 implies every one of the other conditions: if the event happens at or after 26, all of them must hold.

![image.png](attachment:image.png)

So we have S(25) = P(T ≥ 26, T ≥ 25, ..., T ≥ 0). Having reached this form, we can use the chain rule of conditional probability to break it down further. The chain rule says that the probability of two events A and B both occurring is the probability of A given B times the probability of B: P(A, B) = P(A | B) P(B). It extends to more events, for example P(A, B, C) = P(A | B, C) P(B | C) P(C).

Applying it here, and noting that conditioning on T ≥ 25 already implies T ≥ 24, ..., T ≥ 0, we get

S(25) = P(T ≥ 26 | T ≥ 25) × P(T ≥ 25 | T ≥ 24) × ... × P(T ≥ 1 | T ≥ 0) × P(T ≥ 0).

All events happen at or after time 0, so the final term P(T ≥ 0) equals 1, and we don't need to include it in the product.
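The chain-rule factorization can be sanity-checked on a small example. The sketch below uses a hypothetical discrete distribution over months 0 to 3 (an assumption for illustration, not data from the study) and exact fractions, so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical distribution of the event time T (assumption for illustration).
pmf = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
}


def prob_at_or_after(k: int) -> Fraction:
    """P(T >= k)."""
    return sum((p for t, p in pmf.items() if t >= k), Fraction(0))


def conditional_step(k: int) -> Fraction:
    """P(T >= k + 1 | T >= k), one small step along the timeline."""
    return prob_at_or_after(k + 1) / prob_at_or_after(k)


t = 1
direct = prob_at_or_after(t + 1)   # S(1) = P(T >= 2)

chained = prob_at_or_after(0)      # P(T >= 0), which equals 1
for k in range(0, t + 1):
    chained *= conditional_step(k)

print(direct, chained)  # 7/10 7/10
```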

![image.png](attachment:image.png)

So we have expressed the survival probability S(25) as a product of conditional probabilities: if I get to 25, what is the probability that I get to 26? Multiply that by the probability that I get to 25 given that I get to 24, and so on, back to the start of time, where we ask: given that we reach time 0, what is the probability that we reach time 1? We are taking small steps along the timeline up to the survival probability we care about, S(25).

We can break each factor down further. Take the first one, P(T ≥ 26 | T ≥ 25). Since T ≥ 26 is the same as T > 25, this is the probability that the event happens after 25, given that it happens at or after 25. That is simply 1 minus the probability that the event happens exactly at 25 months, given that it happens at or after 25 months:

P(T ≥ 26 | T ≥ 25) = 1 - P(T = 25 | T ≥ 25).

Applying this to every factor gives

S(25) = (1 - P(T = 25 | T ≥ 25)) × (1 - P(T = 24 | T ≥ 24)) × ... × (1 - P(T = 0 | T ≥ 0)).

Next, we'll see why it is useful to represent the survival probability in this way.
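To see that the "1 minus" form gives the same answer as the direct definition, the sketch below compares both on a hypothetical discrete distribution (an assumption for illustration). The helper names are likewise made up for this example.

```python
from fractions import Fraction

# Hypothetical distribution of the event time T (assumption for illustration).
pmf = {0: Fraction(0), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def conditional_event_prob(k: int) -> Fraction:
    """P(T = k | T >= k)."""
    at_or_after = sum((p for t, p in pmf.items() if t >= k), Fraction(0))
    return pmf[k] / at_or_after


def survival_from_product(t: int) -> Fraction:
    """Product over i = 0..t of 1 - P(T = i | T >= i)."""
    result = Fraction(1)
    for i in range(0, t + 1):
        result *= 1 - conditional_event_prob(i)
    return result


def survival_direct(t: int) -> Fraction:
    """P(T > t), straight from the definition."""
    return sum((p for u, p in pmf.items() if u > t), Fraction(0))


for t in range(0, 4):
    print(t, survival_from_product(t), survival_direct(t))
```

Both columns agree at every time point, which is what the derivation claims.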

![image.png](attachment:image.png)

![image.png](attachment:image.png)

The benefit of this expression is that we can estimate each factor directly from data. The probability P(T = 25 | T ≥ 25) is estimated as a ratio. The numerator is the number of patients who died at 25. The denominator is the number of patients whose time to event is at or after 25, that is, the number known to survive to 25.

No patient died exactly at time 25, so the numerator is 0. For the denominator, scanning through the patients, patient 3 and patient 6 are known to survive to 25, because their event times or censoring times are after 25, so the denominator is 2. The factor is therefore 1 - 0/2 = 1.

The same holds for most times in the product: no patient died, so the factor is 1. The only two times up to 25 at which patients died are 10 (patient 1) and 20 (patient 4). The expression therefore simplifies to

S(25) = (1 - P(T = 20 | T ≥ 20)) × (1 - P(T = 10 | T ≥ 10)).

For the first factor, the numerator is the number of patients who died at 20, which is 1 (patient 4). The denominator is the number of patients known to have survived to 20: patients 3, 4, and 6, so 3. For the second factor, the numerator is the number of events at time 10, which is 1, and the denominator is the number of patients known to have survived to 10, which is 6. So the first factor evaluates to 1 - 1/3 = 2/3 and the second to 1 - 1/6 = 5/6. Their product is 10/18, which simplifies to 5/9, or about 0.56.
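The arithmetic above can be reproduced with exact fractions. The dataset is hypothetical: the times for patients 1 and 4 and the at-risk counts follow the text, while the remaining individual times are assumptions chosen to match those counts.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical data, patients 1..7 in order: (time in months, event_observed).
data: List[Tuple[int, bool]] = [
    (10, True), (8, False), (60, True), (20, True),
    (12, False), (30, False), (15, False),
]


def died_at(t: int) -> int:
    """Number of patients with an observed event exactly at time t."""
    return sum(1 for time, event in data if event and time == t)


def known_to_survive_to(t: int) -> int:
    """Number of patients whose event or censoring time is at or after t."""
    return sum(1 for time, _ in data if time >= t)


factor_25 = 1 - Fraction(died_at(25), known_to_survive_to(25))  # 1 - 0/2
factor_20 = 1 - Fraction(died_at(20), known_to_survive_to(20))  # 1 - 1/3
factor_10 = 1 - Fraction(died_at(10), known_to_survive_to(10))  # 1 - 1/6

s_25 = factor_25 * factor_20 * factor_10
print(s_25, round(float(s_25), 2))  # 5/9 0.56
```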

### Comparing Estimates

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We now have a new estimate of survival at time 25, which comes out to 0.56. Let's compare it to the estimates we made previously. Under the assumption that all censored patients died immediately at their censoring time, we got a low survival probability of 0.29. At the other extreme, assuming that all censored patients continued to live to the end of time, we got a survival estimate of 0.71. We also considered a hypothetical scenario in which we had access to the real event times, which is not something we can usually observe, for instance by calling up the patients, and computed a real estimate from those. Our new estimate, which takes the censored observations into account, is much closer to the real estimate than either of our two extreme assumptions.

Now that we can estimate the probability of survival past 25 months with censored observations, let's generalize to any time t. The survival at time t is a product over the times i from 0 to t, where each factor is 1 minus the probability that the time to event is exactly i, given that it is at or after i:

S(t) = ∏ (1 - P(T = i | T ≥ i)), with the product taken over i = 0, 1, ..., t.

As before, each conditional probability is estimated from data as the number of patients who died at time i divided by the number known to survive to i. Writing d_i for the numerator and n_i for the denominator gives the estimate of the survival function:

S(t) = ∏ (1 - d_i / n_i), with the product taken over i = 0, 1, ..., t.
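The general formula translates almost directly into code. This is a minimal sketch: the function name `survival_estimate` and the seven-patient dataset are assumptions for illustration. Since the factor equals 1 at any time with no deaths, the loop only visits times at which an event was observed.

```python
from typing import List, Tuple

# Hypothetical data: (time in months, event_observed). Times are assumptions.
data: List[Tuple[int, bool]] = [
    (10, True), (8, False), (60, True), (20, True),
    (12, False), (30, False), (15, False),
]


def survival_estimate(data: List[Tuple[int, bool]], t: int) -> float:
    """Estimate S(t) as the product over i <= t of (1 - d_i / n_i)."""
    estimate = 1.0
    # Times without an event contribute a factor of 1, so skip them.
    event_times = sorted({time for time, event in data if event and time <= t})
    for i in event_times:
        d_i = sum(1 for time, event in data if event and time == i)  # died at i
        n_i = sum(1 for time, _ in data if time >= i)  # known to survive to i
        estimate *= 1 - d_i / n_i
    return estimate


for t in (5, 10, 25, 60):
    print(t, round(survival_estimate(data, t), 2))
```

At t = 25 this gives the same 5/9 as the hand computation, and a single function now answers the question for any time point.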

### Kaplan-Meier Estimate

The Kaplan-Meier estimate is a popular non-parametric method used to estimate the survival function in the presence of right-censored data. It provides an estimate of the probability of survival beyond a given time point, based on the observed data.

The Kaplan-Meier estimate works by stepping through the observed event times and, at each one, calculating the proportion of at-risk patients who survive past it. The estimate takes into account both the observed event times and the right-censored data: a censored patient counts as at risk up to their censoring time and is then removed from the at-risk set, without being counted as an event.

The estimate is calculated by multiplying these proportions together, where the factor at time i is 1 - d_i / n_i: the number who died at i, d_i, over the number known to survive to i, n_i. The product over all times up to t gives an estimate of the probability of survival beyond t.

The Kaplan-Meier estimate produces a stepwise survival curve, which makes jumps at each event time. It is a popular method due to its simplicity and ease of implementation. It is widely used in medical research, where survival analysis is a common tool for analyzing clinical trial data.
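The stepwise shape can be seen by recording the estimate at each event time. The sketch below is an illustration under assumptions, not a reference implementation: the dataset is hypothetical, and `kaplan_meier_curve` and `evaluate_step_curve` are made-up helper names.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical data: (time in months, event_observed). Times are assumptions.
data: List[Tuple[int, bool]] = [
    (10, True), (8, False), (60, True), (20, True),
    (12, False), (30, False), (15, False),
]


def kaplan_meier_curve(data: List[Tuple[int, bool]]) -> List[Tuple[int, float]]:
    """Return the step points (time, S(time)), starting at (0, 1.0)."""
    curve = [(0, 1.0)]
    estimate = 1.0
    for i in sorted({time for time, event in data if event}):
        d_i = sum(1 for time, event in data if event and time == i)
        n_i = sum(1 for time, _ in data if time >= i)
        estimate *= 1 - d_i / n_i
        curve.append((i, estimate))  # the curve drops at each event time
    return curve


def evaluate_step_curve(curve: List[Tuple[int, float]], t: float) -> float:
    """Read S(t) off the curve: the value at the last step at or before t."""
    step_times = [time for time, _ in curve]
    return curve[bisect_right(step_times, t) - 1][1]


curve = kaplan_meier_curve(data)
print([(time, round(s, 2)) for time, s in curve])
print(round(evaluate_step_curve(curve, 25), 2))
```

Censoring times (8, 12, 15, and 30 in this data) do not produce steps of their own. They only shrink the at-risk count n_i used at later event times.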

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

This week we learned about survival analysis and how to model survival up to any time t using survival modeling. We also discussed the issue of right censoring in survival data, where some patients do not experience the event of interest before the end of the study. We then explored the Kaplan-Meier estimator, which estimates a survival function for the whole population, taking censored observations into account. We can use this estimate to compare survival probabilities for different populations. In the upcoming week, we will look at strategies for building and evaluating individualized survival prediction models.