### Hazard Functions

Hazard functions are a central concept in survival analysis. They describe the instantaneous rate at which events occur, given that an individual has survived up to a certain time. In other words, the hazard at time t is the event rate per unit time among individuals who have not yet experienced the event. It is a rate rather than a probability, so it can exceed 1. Hazard functions can be estimated from survival data, and they show how the risk of an event changes over time. They can also be used to compare the risk of an event between groups, such as patients with different characteristics or treatments. Next week, we will explore how hazard functions can be used to build and evaluate survival prediction models.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In this lesson, we learned about hazard functions and how they represent a patient's immediate risk of death at any given time. The hazard is denoted by the lowercase Greek letter lambda (λ) and can be graphed against time to show the shape of the hazard curve. The shape of the curve indicates whether a patient's risk of death is highest immediately or later on, which can inform treatment decisions. The bathtub curve is a common shape for hazard functions: the risk of death is high at time zero, decreases, and then increases again over time.

### Survival to Hazard

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In this lesson, we learned about hazard functions, which describe a patient's immediate risk of death at any given time t, and how they can be graphically represented. We also learned that there is a formula that relates the hazard function to the survival function, which tells us the probability of survival past any time t. By using this formula, we can derive the corresponding survival or hazard curve from the other. Additionally, we saw that the hazard is the rate of death at time t, given survival to that time. Overall, understanding the relationship between survival and hazard functions is important in predicting a patient's risk and informing treatment decisions.
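To make the relationship concrete, here is a minimal numerical sketch. It assumes the standard continuous-time identity hazard(t) = -S'(t) / S(t), which may be written differently on the slides. The exponential curve `survival` with rate 0.1 is a hypothetical example chosen only for illustration, and the helper name `hazard_from_survival` is likewise an assumption.

```python
import math


def survival(t, rate=0.1):
    """Hypothetical survival curve S(t) = exp(-rate * t), for illustration only."""
    return math.exp(-rate * t)


def hazard_from_survival(S, t, dt=1e-5):
    """Approximate the hazard -S'(t) / S(t) with a central finite difference."""
    slope = (S(t + dt) - S(t - dt)) / (2 * dt)
    return -slope / S(t)


for t in (1.0, 5.0, 20.0):
    print(t, round(hazard_from_survival(survival, t), 4))
```

For an exponential survival curve the recovered hazard is the same constant rate at every time, which corresponds to a flat hazard curve.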

### Cumulative Hazard

Cumulative hazard is another function that is related to the hazard and survival functions. It measures the total hazard a patient has accumulated up to time t. It is not a probability and can exceed 1. The cumulative hazard function is calculated by integrating the hazard function from time 0 to t.

The relationship between the survival, hazard, and cumulative hazard functions can be expressed as follows:

- The survival function is the probability of surviving beyond time t. It equals the exponential of the negative cumulative hazard: S(t) = exp(−H(t)), where H(t) denotes the cumulative hazard.
- The hazard function is the instantaneous rate of experiencing an event at time t, which can be calculated by taking the derivative of the cumulative hazard function with respect to time.
- The cumulative hazard function is the hazard accumulated up to time t, which can be calculated by integrating the hazard function from time 0 to t, or equivalently as H(t) = −ln S(t).

Like the survival and hazard functions, the cumulative hazard function can be graphed, with time on the x-axis and cumulative hazard on the y-axis. Because the hazard is the slope of the cumulative hazard, the shape of the curve shows how risk evolves over time. If the curve gets steeper over time, the hazard is increasing, so the risk of experiencing an event is higher later in the disease or event progression. If the curve flattens over time, the hazard is decreasing, so the risk is concentrated early on. If the curve rises along a straight line, the hazard is constant over time.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

The cumulative hazard is a measure of the patient's accumulated risk up to a certain time t, and it is related to the hazard function. The cumulative hazard at a specific time t can be calculated by summing up the hazards from time 0 to t. For continuous time, the cumulative hazard can be represented by an integral. The cumulative hazard curve shows how the patient's accumulated risk changes over time. Survival models can output not only a survival function but also a hazard function and a cumulative hazard function, and these functions are related to each other and can be used to answer different questions.
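The summation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the course's implementation: the `hazards` values are hypothetical, each is taken to apply to one unit-length time interval, and the survival values use the standard continuous-time relationship S(t) = exp(-H(t)), which is only an approximation when applied to discrete intervals.

```python
import math
from itertools import accumulate

# Hypothetical hazards for four consecutive unit-length time intervals.
hazards = [0.1, 0.1, 0.2, 0.3]

# Cumulative hazard: running sum of the hazards up to each time point.
cumulative_hazard = list(accumulate(hazards))

# Survival derived from the cumulative hazard via S(t) = exp(-H(t)).
survival = [math.exp(-h) for h in cumulative_hazard]

for t, (h, s) in enumerate(zip(cumulative_hazard, survival), start=1):
    print(f"t={t}  cumulative hazard={h:.2f}  survival={s:.3f}")
```

The cumulative hazard can only stay level or grow, so the derived survival values can only stay level or shrink.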

### Customizing Risk Models to Individual Patients

One of the ultimate goals of using survival models is to customize the risk model to individual patients. This is because each patient may have unique risk factors that are not accounted for in a general risk model. For example, in a general model for cardiovascular disease risk, factors such as age, gender, blood pressure, and cholesterol levels may be included. However, a patient's specific family history of heart disease or genetic predisposition may not be accounted for.

To customize a risk model to an individual patient, we need to incorporate the patient's specific risk factors into the model. This can be done by adjusting the baseline hazard function or by adding covariates to the model. For example, if we have a model that predicts the risk of heart attack based on age, gender, blood pressure, and cholesterol levels, we can add a new covariate for family history of heart disease to account for this specific risk factor for an individual patient.

Another approach to customizing risk models is to use machine learning techniques such as deep learning to learn a patient's risk factors and predict their individual risk of an event. This approach can be particularly useful when dealing with complex data such as medical images or genetic data.

Customizing risk models to individual patients has the potential to improve the accuracy of predictions and provide more personalized treatment recommendations. However, there are also challenges in implementing these models, including the need for large amounts of data and the complexity of the models. It is important to balance the potential benefits of customized risk models with the practical limitations of implementation.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In this lesson, we learned about the Cox Proportional Hazards Model and how it can take into account patient variables to compare the risk of different patients using their patient profile. We also discussed the issue with using one hazard function for a patient population and how it does not take into account individual differences in patient risk. The Cox Model proposes a way to model the hazard for an individual patient by multiplying the baseline hazard by some factor that is determined by the patient's variables, such as age and smoking status. This approach allows for a more personalized estimate of patient risk.

### Relative Risk

Relative risk is a measure used to compare the risk of an event (such as developing a disease) between two groups. It is calculated by dividing the risk of the event in one group by the risk of the event in another group. The resulting ratio tells us how much more or less likely one group is to experience the event compared to the other group. A relative risk of 1 indicates that the two groups have the same risk, while a relative risk greater than 1 indicates that one group has a higher risk than the other, and a relative risk less than 1 indicates that one group has a lower risk than the other. Relative risk is often used in epidemiological studies to assess the association between risk factors and health outcomes.

![image.png](attachment:image.png)

In the Cox proportional hazards model, the hazard for an individual patient is modeled as the baseline hazard at time t multiplied by a factor determined by the patient's variables. This is similar to a linear model, where variables are multiplied by weights and summed together, except that the sum is passed through the exponential function so that the factor is always positive. The model can be used to determine the relative risks of patients, as shown by the example comparing the hazards of two patients with different characteristics. Each patient's hazard is the baseline hazard multiplied by that patient's factor, so when two patients are compared the baseline hazard cancels and only the ratio of their factors remains.
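A short derivation shows why the Cox model yields relative risks directly. The notation is an assumption, since the slide is not reproduced in the text: λ₀(t) is the baseline hazard, βᵢ are the weights, and x⁽ᵃ⁾, x⁽ᵇ⁾ are the variables of two patients a and b.

```latex
\lambda_a(t) = \lambda_0(t)\,\exp\!\Big(\sum_i \beta_i x^{(a)}_i\Big),
\qquad
\lambda_b(t) = \lambda_0(t)\,\exp\!\Big(\sum_i \beta_i x^{(b)}_i\Big)

\frac{\lambda_a(t)}{\lambda_b(t)}
  = \frac{\exp\!\big(\sum_i \beta_i x^{(a)}_i\big)}{\exp\!\big(\sum_i \beta_i x^{(b)}_i\big)}
  = \exp\!\Big(\sum_i \beta_i \big(x^{(a)}_i - x^{(b)}_i\big)\Big)
```

The baseline hazard cancels in the ratio, so the relative risk of two patients does not depend on time or on the shape of the baseline.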

### Ranking Patients by Risk

![image.png](attachment:image.png)

![image.png](attachment:image.png)

So now that we've computed the hazard for all three patients, we can actually compare the risk between the patients. Here we see that the factor associated with the 50-year-old smoker is the highest, followed by the 50-year-old non-smoker, followed by the 30-year-old non-smoker. So we can create a ranking that says patient one has a higher risk than patient two, who has a higher risk than patient three.
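The ranking can be reproduced with a small sketch. It assumes the three patients are scored with the weights quoted elsewhere in these notes (0.08 for smoker, 0.01 per year of age); the slide with the actual numbers is not reproduced here, so treat the printed factors as illustrative. The function name `risk_factor` is hypothetical.

```python
import math

# Weights quoted in these notes: 0.08 for smoker, 0.01 per year of age.
BETA_SMOKER = 0.08
BETA_AGE = 0.01


def risk_factor(smoker, age):
    """Factor that multiplies the baseline hazard for one patient."""
    return math.exp(BETA_SMOKER * smoker + BETA_AGE * age)


# (smoker flag, age) for the three patients in the example.
patients = {
    "patient 1 (50-year-old smoker)": (1, 50),
    "patient 2 (50-year-old non-smoker)": (0, 50),
    "patient 3 (30-year-old non-smoker)": (0, 30),
}

factors = {name: risk_factor(*x) for name, x in patients.items()}

# Sort from highest to lowest factor; the baseline hazard is shared, so it does not affect the order.
ranking = sorted(factors, key=factors.get, reverse=True)

for name in ranking:
    print(f"{name}: factor {factors[name]:.2f}")
```

Because every patient shares the same baseline hazard, sorting by the factor alone gives the same order as sorting by the full hazard at any time t.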

### Individual vs Baseline Hazard

The individual hazard refers to the hazard function for a specific patient, which takes into account their individual characteristics such as age, smoking status, etc. The individual hazard is obtained by multiplying the baseline hazard (which is the hazard function for the population) by a multiplicative factor that depends on the patient's individual characteristics.

On the other hand, the baseline hazard is shared by every patient in the population and does not depend on any individual characteristics. It represents the hazard of an event occurring at a specific time for a reference patient whose variables are all equal to zero, for whom the multiplicative factor is 1.

So the individual hazard is specific to a particular patient and takes into account their individual characteristics, while the baseline hazard is the hazard for the population as a whole and does not take into account any individual characteristics.

![image.png](attachment:image.png)

So we saw the proportional hazards model, where the hazard is the baseline hazard times some factor determined by the patient covariates. One thing to note is what happens when all our covariates, all our variables, are equal to zero. The expression evaluates to exp of 0.08 times 0, plus 0.01 times 0, which is exp of 0, which is 1. And so the hazard for a patient is the same as the baseline hazard if all of the variables are equal to zero. Now of course, we won't have any patient whose age is zero; usually age will be larger than zero. But this tells us what the baseline hazard looks like: it is the hazard when the factor coming from the patient covariates is one.

And the cool thing about this model is that the baseline hazard doesn't have to be specified, so it can take on any shape. For example, here we have two graphs showing two different baseline hazards, and a patient whose hazard is the baseline hazard times a risk factor of 1.35. On the left we have a constant baseline hazard, and the patient, in red, is at 1.35 times that at every time point. On the right, where we have a bathtub-curve hazard, the baseline hazard sits below the patient hazard, which is 1.35 times the baseline at every time point.
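A minimal sketch of both points follows. The two baseline functions are hypothetical shapes invented for illustration (the real curves appear only in the figures), and `individual_hazard` is an assumed helper name; only the weights 0.08 and 0.01 and the factor 1.35 come from the text.

```python
import math


def constant_baseline(t):
    """Hypothetical flat baseline hazard."""
    return 0.02


def bathtub_baseline(t):
    """Hypothetical bathtub-shaped baseline: high early, low in the middle, rising late."""
    return 0.05 * math.exp(-0.5 * t) + 0.01 + 0.0005 * t ** 2


def individual_hazard(baseline, t, factor):
    """Cox-style hazard: baseline hazard at time t times the patient's factor."""
    return baseline(t) * factor


# With all covariates at zero the factor is exp(0) = 1, so the patient sits on the baseline.
zero_factor = math.exp(0.08 * 0 + 0.01 * 0)

# For either baseline shape, the patient's hazard is 1.35 times the baseline at every time.
for baseline in (constant_baseline, bathtub_baseline):
    for t in (0, 5, 10):
        ratio = individual_hazard(baseline, t, 1.35) / baseline(t)
        print(baseline.__name__, t, round(ratio, 2))
```

The model never needs to know the baseline's formula: whatever shape it has, the patient's curve is the same shape scaled by the factor.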

![image.png](attachment:image.png)

### Smoker vs Non-smoker

![image.png](attachment:image.png)

### Effect of Age on Hazard

![image.png](attachment:image.png)

### Risk Factor Increase Per Unit Increase in a Variable

![image.png](attachment:image.png)

### Risk Factor Increase or Decrease

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We've seen an example of the Cox Proportional Hazards Model, where we looked at patient variables, like whether they're a smoker and what their age is, and we multiplied each of these variables by a weight. We can come up with a more general form of this expression where we call the weights betas, so this is beta 1 (β1) and this is beta 2 (β2), and we call the variables Xs, so there's X1 and X2. Now of course, a patient might have more than two variables. And so the more general form of this expression is: lambda of t is lambda naught of t, times exp of beta 1 X1, plus beta 2 X2, and so on. We can simplify that by writing the exponent as the sum over i of βi Xi, so λ(t) = λ0(t) · exp(Σi βi Xi).

We've also seen factor increases in the risk from a unit increase in a variable. If we have a variable Xi, look at the weight associated with it, βi, and take the exponential of that weight, then exp(βi) is the factor by which the risk changes when Xi becomes Xi plus 1. We saw two examples of this with age and with smoker, where increasing smoker from 0 to 1 increased the risk by a factor of 1.08, and increasing the age from any age n to n plus 1 increased the risk by a factor of 1.01. One thing worth noticing is that when exp(βi) is greater than 1, the risk increases, but when exp(βi) is smaller than 1, the risk decreases. In that case, our variable Xi is actually reducing risk.

Let's look at that with an example. Here we have four patient variables: age, HDL cholesterol (don't worry too much about what this means), whether they had treatment, and whether they are a smoker. We have our Cox Proportional Hazards Model with a weight attached to each of these variables, and we can take the exponential of those weights to get the factor risk change associated with each variable. Increasing age by one unit, let's say from 50 to 51, though it could just as well be from 2 to 3, increases the risk by a factor of 1.14. To put that down: the hazard of someone who is 51 at time t is the hazard of an otherwise identical 50-year-old at time t, times 1.14. For HDL we have a negative weight, and when we take the exponential of that, the result is smaller than 1, so this is actually decreasing risk. For treatment we see the same thing: the weight is negative, so the exp of that weight is smaller than 1. Therefore, having treatment or having a high HDL reduces the risk for that patient. Finally, we have smoker, which, like we've seen before, has a positive weight, so it increases the risk by its factor.
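The sign rule can be checked with a short sketch. The weights below are hypothetical: the transcript does not give the four weights, only that the age factor is about 1.14 and that the HDL and treatment weights are negative, so the age weight is chosen to reproduce 1.14 and the others are placeholders with the stated signs.

```python
import math

# Hypothetical weights; only the signs (and the age factor of about 1.14) come from the text.
weights = {"age": 0.131, "hdl": -0.3, "treatment": -0.5, "smoker": 0.4}

# exp(weight) is the factor by which the risk changes for a one-unit increase in that variable.
factors = {name: math.exp(beta) for name, beta in weights.items()}

for name, factor in factors.items():
    direction = "increases" if factor > 1 else "decreases"
    print(f"{name}: exp({weights[name]}) = {factor:.2f}, a unit increase {direction} risk")


def risk_factor(patient):
    """exp of the weighted sum of a patient's variables."""
    return math.exp(sum(weights[name] * value for name, value in patient.items()))


# Raising age from 50 to 51 with everything else fixed scales the factor by exp(weight for age).
at_50 = {"age": 50, "hdl": 1.0, "treatment": 0, "smoker": 1}
at_51 = dict(at_50, age=51)
print(round(risk_factor(at_51) / risk_factor(at_50), 2))
```

The last line illustrates that the per-unit factor does not depend on the starting value of the variable or on the patient's other variables.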

### Intro to Survival Trees

Survival trees are a type of decision tree used for survival analysis, which is a statistical method used to analyze time-to-event data. The goal of survival trees is to divide the population into subgroups based on their survival times and other variables, such as demographic or clinical characteristics, that may affect survival. These subgroups can then be used to identify different risk factors or treatment options for each subgroup.

Like other decision trees, survival trees are constructed by recursively splitting the data into smaller and smaller groups, based on the variables that are most informative for predicting survival. At each split, the tree identifies the variable that best separates the groups in terms of survival outcomes.

One advantage of survival trees is that they can handle complex interactions between variables, which may not be captured by simpler regression models. Another advantage is that they can produce easily interpretable results, in the form of a tree diagram that shows the variables and splits that are most important for predicting survival.

However, survival trees can be prone to overfitting, which means they may not generalize well to new data. To address this issue, techniques such as pruning and cross-validation can be used to optimize the tree structure and improve its accuracy.

In this lesson, we learned about survival trees, which allow the risks of different patients to be compared by taking their individual variables into account. We first discussed survival models with individualized hazard functions and noted that the Cox Proportional Hazards Model assumes that the hazard functions of all patients are proportional to one another, so they share the same shape, which may not be the case. Nonlinear relationships between variables and risk, and risk curves whose shape varies between patients, are difficult to capture with a linear function of the variables. To address this, survival trees can be used to create hazard functions that are specific to different types of patients within a population. The goal is to model the cumulative hazard function and use it to calculate the survival function.

### Survival Tree

A survival tree is a type of decision tree used in survival analysis to model the relationship between patient characteristics and their survival time. It is similar to a binary decision tree, but instead of predicting a binary outcome, it predicts a survival outcome. Each split in the tree represents a different subset of patients based on their characteristics, and the tree is built in a way that maximizes the differences in survival between these subsets. Survival trees can capture nonlinear relationships between patient variables and survival outcomes, and can be used to create individualized hazard functions for each patient, allowing for more accurate predictions of survival.

![image.png](attachment:image.png)

In summary, survival trees are decision trees that are used in time-to-event analysis, where the goal is to estimate the risk of an event occurring at any given point in time. Survival trees use variables such as age, blood pressure, and other relevant factors to group similar patients together and estimate their risk of the event of interest. These trees are different from traditional decision trees in that they deal with survival data and censored observations. A patient's risk can be estimated by determining which group they fall into and using the cumulative hazard estimate for that group at every time t.

### Nelson-Aalen Estimator

The Nelson-Aalen estimator is a non-parametric method used to estimate the cumulative hazard function for time-to-event data, such as survival data. It is a popular tool in survival analysis and is commonly used to estimate the cumulative hazard function for censored data.

The estimator is based on the idea that the cumulative hazard function is the integral of the hazard function, which is the instantaneous rate of experiencing an event at a given time, given that the individual has survived up to that time. The Nelson-Aalen estimator is a step function that estimates the cumulative hazard by summing the estimated hazard at each event time.

The estimator is calculated as follows:

- Sort the distinct event times in increasing order.
- At each event time, count the number of events and the number of patients still at risk, meaning those who have neither had the event nor been censored before that time.
- Compute the hazard increment at each event time as the number of events at that time divided by the number at risk at that time.
- The Nelson-Aalen estimate at time t is the cumulative sum of these increments over all event times up to and including t.

The resulting function gives an estimate of the cumulative hazard function at each event time. The estimator is useful because it does not make any assumptions about the distribution of the underlying survival data, making it a non-parametric method. Additionally, it can handle censored data, where the event time is not observed for some individuals.
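The steps can be sketched in plain Python. This is a minimal illustration rather than a library implementation; the function name `nelson_aalen` and the five-patient dataset are hypothetical (four events at 20, 25, 30, and 33, plus one patient censored at an assumed time of 27).

```python
def nelson_aalen(times, events):
    """Return (event_time, cumulative_hazard) pairs for right-censored data.

    times  -- observed time for each patient (event or censoring time)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    # Distinct times at which at least one event was observed, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)  # events at t
        n = sum(1 for ti in times if ti >= t)  # patients still at risk at t
        cumulative += d / n
        curve.append((t, cumulative))
    return curve


# Hypothetical data: four events and one censored observation (at 27).
times = [20, 25, 27, 30, 33]
events = [1, 1, 0, 1, 1]

for t, h in nelson_aalen(times, events):
    print(f"t={t}  cumulative hazard={h:.2f}")
```

The censored patient never contributes an event, but does count in the number at risk until time 27, which is how the estimator uses censored observations.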

The Nelson-Aalen estimator is commonly used in survival analysis to estimate the cumulative hazard function and to compare the survival curves of different groups or treatments. It is often used in conjunction with the log-rank test or other statistical tests to determine whether there are significant differences in survival between groups.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Comparing Risks of Patients

![image.png](attachment:image.png)

Let's chat about how we can compare the risk of two patients. So let's say, we have two patients: one with age of 50 and BP of 162, and the other with an age of 61 and a BP of 140. We can use the survival tree to first find out the group that they belong to. So we see for the first patient, their age is smaller than 60 and BP is greater than 160, so they belong to group A. For the second patient, their age is greater than 60, so we see that they belong to group C. So for the first patient, we would use the cumulative hazard for group A to estimate this patient's cumulative hazard. And here we would use the cumulative hazard for group C to estimate this patient's cumulative hazard. And so we can graph these cumulative hazards for this patient and this patient with the blue and the orange curves here. So, we can see at all time points the cumulative hazard for the blue patient is higher than the cumulative hazard for the orange patient. So this is great because we can say that for all time points the blue patient has the higher cumulative hazard. But what do we do when we have cumulative hazards that are crossing, such that for time points below some point, the blue patient has the higher cumulative hazard, while the orange patient has a higher cumulative hazard beyond that time point? And to be able to tell which one of them is more at risk, we have to know what time we care about comparing the cumulative hazard of the two.
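Finding a patient's group amounts to walking the tree from the root. The sketch below uses the thresholds mentioned in the transcript (age 60, BP 160). The tree itself appears only in the figure, so the label "B" for the remaining leaf and the handling of values exactly at a threshold are assumptions.

```python
def assign_group(age, bp):
    """Walk a two-level tree and return the leaf (group) label for a patient."""
    if age < 60:
        # Younger patients are split again on blood pressure.
        # "B" is an assumed label for the leaf not described in the text.
        return "A" if bp > 160 else "B"
    # Assumption: patients aged 60 or above all fall into group C.
    return "C"


print(assign_group(age=50, bp=162))
print(assign_group(age=61, bp=140))
```

Each patient is then assigned the cumulative hazard curve estimated for their group.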

### Mortality Score

The key idea we'll use is that we care about comparing the cumulative hazard at the times where we observe deaths in the population. So, let's say we have a population consisting of many patients, which these two patients belong to, and in this population we get to observe the event times or the censoring times. Here, we see that a lot of events take place between 20 and 33, so we might expect that this is the region where we care about comparing the risks of the two patients, which would say that the patient in orange has the higher risk.

Let's try to formalize this. We have here the four event times in the dataset. Let's assume we have a population of five right now, just for simplicity, and of these five, we have four events. Those four event times are given by these vertical lines here. At these four event times, we see that the orange curve is always higher than the blue curve, so the orange patient always has the greater risk.

In general, we can compare the risk of two patients, or the cumulative hazard of two patients, by looking at what they evaluate to at the different event times in the dataset. For example, for the first patient, who belongs to group A, we can evaluate the cumulative hazard at 20, 25, 30, and 33, using this orange curve here. We can do the same for the second patient, who belongs to group C, and get their cumulative hazard at 20, 25, 30, and 33. Notice that we're not using the remaining patient here, because we don't observe their event time; we only observe their censoring time.

Now that we have these eight numbers, we can sum up each of these columns to arrive at what's called a mortality score. The mortality score is a single value that allows us to compare the risks of two patients, or rather their cumulative hazard functions, where it matters, which is at the event times. This allows us to say that the cumulative hazard for the patient in group A is higher than for the patient in group C, and thus we're able to compare the risks of the two patients.
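As a rough sketch, the mortality score is a sum of cumulative hazard values at the observed event times. The event times 20, 25, 30, and 33 come from the example; the two cumulative hazard curves are hypothetical stand-ins, chosen so that they cross before the first event time, and the function names are assumptions.

```python
# Event times observed in the population (the censored patient is excluded).
event_times = [20, 25, 30, 33]


def cumulative_hazard_group_a(t):
    """Hypothetical cumulative hazard for group A."""
    return 0.001 * t ** 2


def cumulative_hazard_group_c(t):
    """Hypothetical cumulative hazard for group C."""
    return 0.015 * t


def mortality_score(cumulative_hazard, times):
    """Sum the cumulative hazard evaluated at each observed event time."""
    return sum(cumulative_hazard(t) for t in times)


score_a = mortality_score(cumulative_hazard_group_a, event_times)
score_c = mortality_score(cumulative_hazard_group_c, event_times)
print(f"group A: {score_a:.3f}, group C: {score_c:.3f}")
```

With these stand-in curves, group C has the higher cumulative hazard early on, but group A has the higher mortality score because it is higher at every time where events were actually observed.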

![image-2.png](attachment:image-2.png)

A mortality score, also known as a prognostic score or risk score, is a tool used in medicine to predict the likelihood of death or other adverse outcomes in patients with certain medical conditions. These scores are based on statistical models that take into account various patient characteristics such as age, gender, medical history, and current symptoms.

Mortality scores can be used in a variety of clinical settings, such as in intensive care units, emergency departments, and for patients with chronic conditions. They are especially useful for identifying high-risk patients who may require more aggressive or specialized treatment, as well as for helping healthcare providers make informed decisions about patient care and resource allocation.

Some examples of mortality scores include the Acute Physiology and Chronic Health Evaluation (APACHE) score, the Sequential Organ Failure Assessment (SOFA) score, and the Charlson Comorbidity Index (CCI). These scores are widely used in clinical practice and have been shown to be effective in predicting mortality and other adverse outcomes in a variety of patient populations.

Evaluation of a survival model involves assessing how well the model performs in predicting the time to event outcome for new individuals or patients. Here are some common evaluation methods for survival models:

- Concordance Index (C-index): The C-index is a measure of how well a model can predict the order of survival times. It ranges from 0 to 1, with 0.5 indicating random prediction and 1 indicating perfect prediction. The higher the C-index, the better the model's predictive accuracy.

- Brier Score: The Brier score measures the mean squared difference between the predicted survival probability at a chosen time and the observed survival status at that time (1 if the patient was still event-free, 0 otherwise), usually with a weighting that accounts for censoring. A score of 0 indicates perfect prediction, and a lower Brier score indicates better predictive accuracy.

- Calibration plot: A calibration plot is a visual tool that compares the predicted survival probabilities to the actual survival probabilities. The closer the predicted probabilities are to the actual probabilities, the better the model's calibration.

- Cross-validation: Cross-validation is a technique used to assess the generalizability of a model. It involves splitting the data into training and testing sets, fitting the model on the training set, and evaluating its performance on the testing set. This process is repeated multiple times with different splits, and the average performance across all iterations is used as the final evaluation metric.

- Information criteria: Information criteria, such as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), are used to compare different survival models. Lower values of these criteria indicate better model fit and parsimony.

- Decision curve analysis (DCA): DCA is a tool used to evaluate the clinical usefulness of a model. It involves plotting the net benefit of a model against a range of threshold probabilities and comparing it to other models or strategies. A model with higher net benefit across a wide range of threshold probabilities is considered more clinically useful.

![image.png](attachment:image.png)

In this lesson, we learned about Harrell's concordance index for evaluating the performance of survival models. The key differences between survival models and other prognostic models are that the ground truth is the time to an event and that some observations are censored. We looked at the concept of concordance, which states that the patient with the worse outcome should have the higher risk score. In the context of time-to-event problems, a worse outcome means that a patient has the event earlier. We also looked at the different cases of concordant pairs, non-concordant pairs, and risk ties, and how to handle them in our evaluations.

### Example of Harrell's C-Index

![image.png](attachment:image.png)

So, notice now that for the first pair, A, B, we have the censoring occur before the event time. This is a case where we're not able to compare the outcomes, so that's not a permissible pair. We'll see A, C now. For A, C, we can see the event time here is happening before the censoring, so this is going to be a permissible pair. Then we can look at A, D and see that both of these times are censored, so we won't be able to make this comparison. We can look at A, E and determine, "Okay, we have an event occur before an observation is censored." So, this is also a valid permissible pair.

Let's look at B now. Our first comparison is going to be B and C. Both of these are events, so we can definitely compare them. Let's look at B and D. We can see that the censoring is happening before the event, so this is not going to be a permissible pair. Let's look at B and E. For B and E, both of those are events, so that's certainly a permissible pair.

Now, let's look at C. C and D is the first one. Notice here that C and D is a case where we have one event, and the other patient was censored at that time. When we know a patient was censored at that time, we know that they didn't have an event up to or at that time, so we know that the worse outcome was for C. So, this is a permissible pair. Notice that so far I've only looked at the T column. I'm not looking at the risk column at all, because we don't need the risk column when we're determining whether a pair is permissible. Let's look at C and E and notice that that's also a permissible pair, because both are events. And finally, let's look at D and E and realize that for D, the censoring is happening before the event time, so we won't be able to compare the two. And so, we have six permissible pairs.

![image.png](attachment:image.png)

Now, let's look at our concordant pairs. Notice that when we're looking at concordant pairs, we only have to look at the pairs that were permissible, because only the permissible pairs are comparable. And remember, concordance asks, "Does the patient with the worse outcome have the higher risk score?" Let's try to determine whether that's the case. For A,C, we have the risks as 0.65 and 0.7, and the worse outcome was for patient C, who has the higher risk, so A,C is concordant. Then we have pair A,E. For A,E, we can see that the worse outcome was for E, and E had the higher risk score, so A,E is also going to be a concordant pair.

Now, let's look at B,C. For B,C, both of them are events and the higher risk is assigned to B, but B has the longer survival time, so this is not a concordant pair. Next, let's look at B,E. In B,E, we have the higher risk assigned to B, but B has the longer survival time, so that is also not a concordant pair. Let's look at C,D now. For C,D, we have the worse outcome for C, and C has the higher risk, so this is going to be a concordant pair. Finally, we have C,E, where the worse outcome is for C and the higher risk is for E, so this is not a concordant pair.

Let's look at whether we have any risk ties. We have been through all of the permissible pairs at this point and none of them had equal risk scores, so here we're just going to write None. Remember, the numerator of our formula for the C-index is the number of concordant pairs plus 0.5 times the number of risk ties. Here, we have 3 concordant pairs and no risk ties, so that's 3 plus 0.5 times 0. Our denominator is the number of permissible pairs, which in this case was 6. And so, we have our C-index of 3 over 6, which is equal to 0.5.
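The whole calculation can be sketched in a few lines of Python. The table from the slide is not reproduced in the text, so the times, censoring flags, and most risk scores below are hypothetical values chosen to be consistent with the walkthrough (only the risks 0.65 for A and 0.7 for C are quoted). The function name `harrell_c_index` is an assumption, and this sketch skips pairs with tied times rather than handling them.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Return (c_index, concordant, risk_ties, permissible) for right-censored data."""
    concordant = risk_ties = permissible = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i is the patient with the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Permissible only if the shorter time is an observed event (tied times skipped here).
        if times[i] == times[j] or not events[i]:
            continue
        permissible += 1
        if risks[i] > risks[j]:
            concordant += 1  # the patient with the worse outcome has the higher risk
        elif risks[i] == risks[j]:
            risk_ties += 1
    c_index = (concordant + 0.5 * risk_ties) / permissible
    return c_index, concordant, risk_ties, permissible


# Hypothetical data for patients A, B, C, D, E (A and D are censored).
times = [20, 25, 10, 12, 15]
events = [0, 1, 1, 0, 1]
risks = [0.65, 0.9, 0.7, 0.6, 0.8]

c_index, concordant, risk_ties, permissible = harrell_c_index(times, events, risks)
print(c_index, concordant, risk_ties, permissible)
```

With this data the sketch finds the same six permissible pairs and three concordant pairs as the walkthrough, giving a C-index of 0.5, which is what random risk scores would be expected to achieve.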