<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Heart Failure prediction using teradataml
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p> 
    
<table style = 'width:100%;table-layout:fixed;'>
    <tr>
        <td style = 'vertical-align:middle' width = '50%'>
            <ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
                <li>According to the CDC, the number of emergency room visits in 2017 for issues related to the heart and blood vessels was nearly 5 million. In 2016, 72 million people made heart disease-related visits to their doctors.</li><br>
                <li>The cost of caring for cardiovascular disease is more than \$351 billion per year. Nearly \$214 billion pays for the care of people with heart disease, while more than \$137 billion goes to lost productivity.</li><br>
                <li>Heart attack is one of the most expensive conditions treated in U.S. hospitals. Its care costs an estimated \$11.5 billion a year.</li><br>
                <li>By 2035, more than 45 percent of Americans are projected to have some form of cardiovascular disease. Total costs of cardiovascular disease are expected to reach \$1.1 trillion in 2035, with direct medical costs expected to reach \$748.7 billion and indirect costs estimated to reach \$368 billion.</li>
            </ul> 
        </td>
        <td>
            <img src="images/heart.webp" width="350"/>
        </td>
    </tr>
</table>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Source: <a href = 'https://www.healthline.com/health/heart-disease/statistics#How-much-does-it-cost?'>Healthline</a></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Machine learning can be useful in heart failure prediction as it can analyze large amounts of data from multiple sources and identify complex patterns that may be difficult for humans to recognize. This can potentially improve the accuracy of prediction models and help healthcare professionals identify patients who are at high risk for heart failure, allowing for earlier intervention and better outcomes.</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Data:</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    This is a simulated dataset based on real hospital administrative data for England called Hospital Episodes Statistics. Every public (National Health Service, NHS) hospital in the country must submit records for every admission; private hospitals also submit records for any NHS patients that they treat.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# system packages
import sys
import warnings
warnings.filterwarnings("ignore")

from teradataml import *
from teradataml import valib

# Dataset packages 
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score, recall_score, ConfusionMatrixDisplay

# plotting packages
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style = "whitegrid")

%matplotlib inline
configure.val_install_location = "val"
display.max_rows = 5

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

In [None]:
execute_sql('''SET query_band='DEMO=Heart_Failure_Prediction_Python.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but requires available storage space. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_HeartFailure_cloud');"        # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_HeartFailure_local');"        # Takes 2 minutes

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 5 seconds

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Data Exploration</b>

In [None]:
heart_failure = DataFrame(in_schema('DEMO_HeartFailure', 'heart_failure'))

In [None]:
print(heart_failure.shape)
heart_failure

<i><b>*Please scroll down to the end of the notebook for detailed column descriptions of the dataset.</b></i>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The dataset above has 31 columns in total. The 'death' column is the target to predict: 1 means the patient died and 0 means the patient did not.
<br>
Let's check the data for people who died and who did not.
</p>

In [None]:
# Sample data for people who did not die
print("Number of people who did not die:", heart_failure[heart_failure.death == 0].shape[0])
heart_failure[heart_failure.death == 0]

In [None]:
# Sample data for people who died
print("Number of people who died:", heart_failure[heart_failure.death == 1].shape[0])
heart_failure[heart_failure.death == 1]

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.1 Mortality Rate by Gender</b></p>

In [None]:
grp_gen = heart_failure.select(['gender','death']).groupby(['gender']).agg(['mean', 'count']).to_pandas()
sns.barplot(x = 'gender', y = 'mean_death', data = grp_gen)
plt.xticks(ticks = [0, 1], labels = ['male', 'female'])
plt.title('Mortality rate by gender')
plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The graph depicted above indicates that gender does not appear to be a determining factor in mortality rates related to heart failure.</p>
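<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For a 0/1 outcome such as <b>death</b>, the mean within each group is that group's mortality rate, which is what the bar heights represent. The following is a minimal local sketch of that aggregation. It uses made-up rows rather than the demo data, and the helper name <code>mortality_rate_by_group</code> is hypothetical, not part of teradataml.</p>

```python
from collections import defaultdict

def mortality_rate_by_group(rows, group_key, outcome_key="death"):
    """Return {group: mean of the 0/1 outcome} for a list of dict rows."""
    deaths = defaultdict(int)   # number of deaths per group
    counts = defaultdict(int)   # number of patients per group
    for row in rows:
        deaths[row[group_key]] += row[outcome_key]
        counts[row[group_key]] += 1
    return {group: deaths[group] / counts[group] for group in counts}

# Illustrative rows only (gender: 1 = male, 2 = female)
toy_rows = [
    {"gender": 1, "death": 1},
    {"gender": 1, "death": 0},
    {"gender": 2, "death": 1},
    {"gender": 2, "death": 0},
    {"gender": 2, "death": 0},
    {"gender": 2, "death": 1},
]
rates = mortality_rate_by_group(toy_rows, "gender")
print(rates)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Equal rates across groups, as in this toy example, are what a flat bar chart would show.</p>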

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.2 Mortality Rate by Age</b></p>

In [None]:
grp_gen = heart_failure.select(['age','death']).groupby(['age']).agg(['mean', 'count']).to_pandas()
plt.figure(figsize = (15, 6))
sns.barplot(x = 'age', y = 'mean_death', data = grp_gen)
plt.xticks(rotation = 90)
plt.title('Mortality rate by age')
plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A noticeable pattern can be observed from the graph, revealing a positive correlation between age and mortality rates associated with heart failure.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.3 Correlation Matrix</b></p>

In [None]:
corr_matrix = heart_failure.to_pandas().corr()

In [None]:
# Set figure size to 20 inches by 8 inches
sns.set(rc = {"figure.figsize": (20, 8)})
# Create a heatmap to visualize the correlation matrix
ax = sns.heatmap(corr_matrix, annot = False, cmap = 'coolwarm', vmin = -1, vmax = 1)

# Set title and show plot
plt.title('Multivariate Correlation Matrix')
plt.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A few observations from the correlation matrix above:</p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Age and mortality are positively correlated.</li>
    <li>Cancer and metastatic cancer exhibit a positive correlation.</li>
    <li>The number of prior appointments attended and the number of appointments missed in the previous year are positively correlated.</li>
    <li>The correlation between age and follow-up time is negative, indicating that follow-up time tends to decrease as age increases, which may indicate more frequent hospital visits.</li>
    <li>The negative correlation between age and the number of prior appointments missed suggests that older patients tend to miss fewer appointments, which may reflect greater attention to their health.</li>
</ul>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>While these correlations exist, they may not be strong enough to justify removing any columns from the dataset.</p>
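<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each cell of the matrix is a Pearson correlation coefficient, which is the default for pandas <code>corr()</code>. As a rough illustration of what one cell measures, the sketch below computes the coefficient by hand for two short made-up columns. The values and the helper name <code>pearson_r</code> are illustrative assumptions, not taken from the dataset.</p>

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: older toy patients are the ones who died
toy_age = [60, 70, 80, 90]
toy_death = [0, 0, 1, 1]
r = pearson_r(toy_age, toy_death)
print(round(r, 3))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A value near +1 or -1 marks a strong linear relationship, while a value near 0 marks a weak one.</p>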

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.4 Pairplot for multivariate correlations</b></p>

In [None]:
# Create a pairplot to visualize multivariate correlations

sns.pairplot(heart_failure.to_pandas()[["gender", "los","age","prior_appts_attended","prior_dnas","fu_time"]],
             diag_kind = 'auto', hue = 'gender')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The plot shown above does not yield conclusive results.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.5 Distribution plots for numeric variables</b></p>

In [None]:
# Create a 2x3 grid of subplots
fig, axs = plt.subplots(nrows = 2, ncols = 3, figsize = (12, 8))

# Convert to pandas once, then loop through each numeric column and create a distribution plot in the corresponding subplot
hf_pdf = heart_failure.to_pandas()
for i, col in enumerate(["los","age","prior_appts_attended","prior_dnas","fu_time"]):
    sns.histplot(data = hf_pdf, x = col, kde = True, ax = axs[i//3, i%3])
    axs[i//3, i%3].set_title(f'Distribution of {col}')
    axs[i//3, i%3].set_xlabel(col)
    axs[i//3, i%3].set_ylabel('Frequency')

# Add space between subplots and show the plot
plt.tight_layout()
plt.show()

<ul style='font-size:16px;font-family:Arial;color:#00233C'>
    <li>The distribution of length of stay (LOS) indicates that the majority of hospitalizations are less than 20 days.</li>
    <li>The distribution of age suggests that individuals aged 60 and above are more likely to be hospitalized.</li>
    <li>The distribution of prior appointments attended suggests that most people attend fewer than 10 appointments.</li>
    <li>The distribution of prior missed appointments (prior DNAs) indicates that the majority of people do not miss appointments.</li>
    <li>The distribution of follow-up time suggests that a large number of individuals have a follow-up time of less than 100 days.</li>
</ul>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Data Preparation</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Classify the columns into an ID column, numerical, categorical and binary ones.</p>

In [None]:
num_x = ["los","age","prior_appts_attended","prior_dnas","fu_time"]
cat_x = ["gender", "quintile","ethnicgroup"]
bin_x = ["death", "cancer", "cabg","crt", "defib","dementia", "diabetes","hypertension", "ihd",
         "mental_health", "arrhythmias", "copd", "obesity","pvd", "renal_disease", "valvular_disease",
         "metastatic_cancer", "pacemaker", "pneumonia", "pci", "stroke", "senile"]
idcol = ["id"]

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>One hot encoding is useful when a categorical data element must be re-expressed as one or more numeric data elements, creating a binary numeric field for each categorical data value.</p>
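<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a local illustration of the idea, the sketch below turns one categorical value into a set of 0/1 fields, driven by a value-to-name dictionary like the ones used in this notebook. It mirrors the concept only. The helper <code>one_hot</code> is hypothetical and is not how the in-database OneHotEncoder is implemented.</p>

```python
def one_hot(value, values):
    """Map one categorical value to {new_column: 0 or 1} using a value -> name dict."""
    return {name: int(value == key) for key, name in values.items()}

# Same style of mapping as the quintile encoder (1 = most affluent, 5 = poorest)
quintile_names = {1: "q_richest", 2: "q_rich", 3: "q_average", 4: "q_poor", 5: "q_poorest"}

encoded = one_hot(4, quintile_names)
print(encoded)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A value that is not listed in the dictionary produces 0 in every new field.</p>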

In [None]:
# 1 - male, 2 - female
values1 = {1: "Gender"}
dummy1 = OneHotEncoder(values = values1, columns = "gender")

# quintile (socio-economic status for patient's neighbourhood, from 1 (most affluent) to 5 (poorest))
values2 = {1: "q_richest", 2: "q_rich", 3: "q_average", 4: "q_poor", 5: "q_poorest"}
dummy2 = OneHotEncoder(values = values2, columns = "quintile")

# 1 - White, 2 - Black, 3 - Indian Subcontinent, 8 - Not Known, 9 - Other
values3 = {1: "White", 2: "Black", 3: "Indian_Subcontinent", 8: "Not_Known", 9:"Other"}
dummy3 = OneHotEncoder(values = values3, columns = "ethnicgroup")

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>FillNa replaces missing (null) values; here it uses each column's mode. ZScore transforms each column value into the number of standard deviations from the mean value of the column.</p>
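<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The two steps can be sketched locally on a small made-up column. This is an illustration under stated assumptions: the helper names are hypothetical, and the sketch uses the population standard deviation, which may differ from the in-database implementation.</p>

```python
import statistics

def fill_with_mode(column):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = statistics.mode(observed)
    return [mode if v is None else v for v in column]

def z_scores(column):
    """Express each value as a number of standard deviations from the mean."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)   # assumption: population standard deviation
    return [(v - mean) / std for v in column]

toy_los = [2, None, 2, 4, 8]          # made-up lengths of stay with one missing value
filled = fill_with_mode(toy_los)
scaled = z_scores(filled)
print(filled)
print([round(z, 3) for z in scaled])
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>After scaling, the column has mean 0 and standard deviation 1, which puts all numeric features on a comparable scale.</p>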

In [None]:
fn = FillNa(style = "mode", columns = num_x)
zs = ZScore(columns = num_x,
            out_columns = num_x, 
            fillna = fn)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Keep the other variables that do not need transformation.</p>

In [None]:
retain = Retain(columns = bin_x)

In [None]:
# Process the transformation
df_transformed = valib.Transform(
                            data = heart_failure, 
                            zscore = zs, 
                            one_hot_encode = [dummy1, dummy2, dummy3],
                            retain = retain,
                            index_columns = idcol,
                            key_columns = idcol
                         )

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Save the transformed dataframe into a table <b>heart_failure_clean</b>.</p>

In [None]:
df_transformed.result.to_sql(
                "heart_failure_clean",
                primary_index = "id",
                if_exists = "replace"
            )

df_transformed.result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Split the data into training and testing datasets in a 75:25 ratio.</p>
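<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Conceptually, a seeded split shuffles the row identifiers and cuts the shuffled list at the chosen ratio. The sketch below shows this locally with Python's <code>random</code> module. The helper <code>split_ids</code> is hypothetical, and the in-database split uses its own sampling, so the same seed will not select the same rows.</p>

```python
import random

def split_ids(ids, train_size=0.75, seed=42):
    """Shuffle ids reproducibly and cut them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)     # seeded, so the split is repeatable
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]

train_ids, test_ids = split_ids(range(1, 101))
print(len(train_ids), len(test_ids))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Every id lands in exactly one of the two sets, so the test rows are never seen during training.</p>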

In [None]:
TrainTestSplit_out = TrainTestSplit(
                                    data = df_transformed.result,
                                    id_column = "id",
                                    train_size = 0.75,
                                    test_size = 0.25,
                                    seed = 42
)

df_train = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 1].drop(['TD_IsTrainRow'], axis = 1)
df_test = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 0].drop(['TD_IsTrainRow'], axis = 1)

copy_to_sql(df_train, table_name = 'heart_failure_train', if_exists = 'replace')
copy_to_sql(df_test, table_name = 'heart_failure_test', if_exists = 'replace')

In [None]:
print("Training Set = " + str(df_train.shape[0]) + ". Testing Set = " + str(df_test.shape[0]))

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Model Training</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The TD_GLM function is a generalized linear model (GLM) that performs regression and classification analysis on data sets, where the response follows an exponential-family distribution such as Gaussian (regression) or binomial (classification).
<br>
<br>
Because it uses gradient-based learning, the function is highly sensitive to feature scaling. Input features should be standardized before they are passed to the function, for example with ScaleFit and ScaleTransform (this notebook applies a Z-score transformation in the Data Preparation step). The function accepts only numeric features, so categorical features must be converted to numeric values before training. Rows with missing (null) values are skipped during training.</p>
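<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the gradient-based idea concrete, here is a minimal sketch of a binomial GLM (logistic regression) fitted by plain stochastic gradient descent on a made-up, already standardized feature. It is an illustration only. It omits mini-batching, regularization, and the learning-rate schedule, and it is not the TD_GLM implementation. All names and values are assumptions.</p>

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_sgd(X, y, learning_rate=0.1, epochs=200, seed=42):
    """Fit intercept and weights by per-row gradient steps on the log-loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    intercept = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                       # visit rows in random order
        for i in order:
            z = intercept + sum(w * x for w, x in zip(weights, X[i]))
            error = sigmoid(z) - y[i]            # gradient of log-loss w.r.t. z
            intercept -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, X[i])]
    return intercept, weights

# Made-up standardized feature: higher values go with the positive class
toy_X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
toy_y = [0, 0, 0, 1, 1, 1]
b, w = fit_logistic_sgd(toy_X, toy_y)
preds = [int(sigmoid(b + w[0] * x[0]) >= 0.5) for x in toy_X]
print(preds)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each step size depends on the feature value itself, so a feature on a much larger scale would dominate the updates. This is why standardization matters for this kind of training.</p>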

In [None]:
from teradataml import GLM, TDGLMPredict

glm_model = GLM(
                data = df_train,
                input_columns = '2:38', 
                response_column = 'death',
                family = 'Binomial',
                batch_size = 50,
                iter_max = 100,
                lambda1 = 0.02,
                alpha = 0.05,
                iter_num_no_change = 5,
                intercept = True,
                learning_rate = 'optimal',
                initial_eta = 0.05,
                local_sgd_iterations = 20
)

In [None]:
glm_model.result

In [None]:
glm_model_out = glm_model.result.to_pandas().reset_index()
feat_imp = glm_model_out[glm_model_out['attribute'] > 0].sort_values(by = 'estimate', ascending = False)

# Specify figure size
fig, ax = plt.subplots(figsize = (10, 8))

# Use ax.barh() for horizontal bar chart
ax.barh(feat_imp['predictor'], feat_imp['estimate'], edgecolor = 'red')

# Add text labels on right of the bars
for x, y in zip(feat_imp['estimate'], feat_imp['predictor']):
    ax.text(x, y, str(round(x, 2)), ha = 'left', va = 'center')

# Set x-axis label
ax.set_xlabel('Estimate')

plt.title('Feature importance')

plt.show()

from IPython.display import display, HTML

feat1, feat2, feat3, feat4 = (
    list(feat_imp.iloc[:2].predictor)[0],
    list(feat_imp.iloc[:2].predictor)[1],
    list(feat_imp.iloc[-2:].predictor)[0],
    list(feat_imp.iloc[-2:].predictor)[1],
)

html_content = f"""
<p style='font-size:16px;font-family:Arial'>
The feature importances displayed above indicate that <b>{feat1}, {feat2}, {feat3}, and {feat4}</b> are significant factors in predicting heart failure deaths.
<br>
Note that the bars at the top of the chart (negative coefficients) negatively affect the target variable, and that the bars at the bottom (positive coefficients) positively affect the target variable.
</p>
"""

display(HTML(html_content))

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Model Validation</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The TD_GLMPredict function predicts target values (regression) and class labels (classification) for test data using a GLM model trained by the GLM function.
<br>
<br>
As with GLM, input features should be standardized before they are passed to the function, for example with ScaleFit and ScaleTransform. The function accepts only numeric features, so categorical features must be converted to numeric values before prediction.</p>
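<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For a binomial model, scoring one row amounts to a weighted sum of the features passed through the logistic function. The sketch below shows that calculation with made-up coefficients. The helper <code>predict_row</code> and the 0.5 decision threshold are assumptions for illustration, not a description of the in-database function.</p>

```python
import math

def predict_row(features, intercept, coefficients, threshold=0.5):
    """Return class probabilities and a 0/1 label for one row of numeric features."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    prob_1 = 1.0 / (1.0 + math.exp(-z))
    return {"prob_0": 1.0 - prob_1, "prob_1": prob_1, "prediction": int(prob_1 >= threshold)}

# Made-up coefficient values, not the fitted model's estimates
toy_coefficients = {"age": 0.8, "hypertension": -0.3}
scored = predict_row({"age": 1.2, "hypertension": 1}, intercept=-0.5, coefficients=toy_coefficients)
print(scored)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The two probabilities always sum to 1, and the label follows from comparing the class-1 probability with the threshold.</p>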

In [None]:
glm_prediction = TDGLMPredict(
                            newdata = df_test,
                            id_column = 'id',
                            object = glm_model.result,
                            accumulate = 'death',
                            output_prob = True,
                            output_responses = ['0', '1']
)

out = glm_prediction.result.assign(prediction = glm_prediction.result.prediction.cast(type_ = BYTEINT))
out = out.assign(prediction = out.prediction.cast(type_ = VARCHAR(2)))
out = out.assign(death = out.death.cast(type_ = VARCHAR(2)))

In [None]:
ClassificationEvaluator_obj = ClassificationEvaluator(
                                                        data = out,
                                                        observation_column = 'death',
                                                        prediction_column = 'prediction',
                                                        labels = ['0', '1']
)

In [None]:
glm_result = out.to_pandas()
glm_result

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the above result, the column <b>death</b> is ground truth, <b>prediction</b> is the predicted output and <b>(prob_0, prob_1)</b> are probabilities of the output class.</p>

In [None]:
metrics_glm = ClassificationEvaluator_obj.output_data.to_pandas()
metrics_glm['Metric'] = metrics_glm['Metric'].str.strip('\x00')
metrics_glm

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Recall matters most when <b>Overlooked Cases (False Negatives)</b> are more costly than <b>False Alarms (False Positives)</b>. Because failing to identify even one high-risk heart failure patient could result in death, the model should emphasize the recall score. It is far preferable not to miss any patient at risk, even if that means flagging some patients who are not actually at high risk.
</p>
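<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a worked example with hypothetical counts (not results from this model), recall is the share of actual positive cases that the model catches, while precision is the share of flagged cases that are truly positive:</p>

```python
def recall_and_precision(tp, fp, fn):
    """Compute recall and precision from confusion-matrix counts."""
    recall = tp / (tp + fn)        # caught positives / all actual positives
    precision = tp / (tp + fp)     # caught positives / all flagged cases
    return recall, precision

# Hypothetical counts: 80 true positives, 40 false alarms, 20 overlooked cases
recall, precision = recall_and_precision(tp=80, fp=40, fn=20)
print(round(recall, 3), round(precision, 3))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this example the model catches 80% of the actual positive cases at the price of one false alarm for every two correct flags, a trade-off that is usually acceptable when an overlooked case is the costlier error.</p>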

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let us consider one example. Here we check for patient number 856.</p>

In [None]:
heart_failure[heart_failure['id'] == 856]

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Based on the data provided, the individual in question has hypertension, arrhythmia, and an age of 98. These features have been identified as significant predictors of mortality by our model. Therefore, there was a high probability of death for this patient, which is further supported by the outcome indicating death.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>6. Visualize the results</b>

In [None]:
# Compute confusion matrix
cm = confusion_matrix(glm_result['death'], glm_result['prediction'])

# Plot confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix = cm, display_labels = ['No Death', 'Death'])
fig, ax = plt.subplots(figsize = (8, 8))
disp.plot(ax = ax, cmap = 'Blues', colorbar = True)

# Add labels and annotations
plt.title('GLM Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.xticks(ticks = [0, 1], labels = ['No Death', 'Death'])
plt.yticks(ticks = [0, 1], labels = ['No Death', 'Death'])

# Add text to the plot to show the actual values of the confusion matrix
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, f'{cm[i, j]}', ha = 'center', va = 'center', color = 'white' if cm[i, j] > cm.max()/1.4 else 'black')

# Remove grid lines
ax.grid(False)

# Show the plot
plt.show()

print(f'''
This means that out of all the actual non-death cases ({cm[0][0] + cm[0][1]}),
{round(cm[0][0]/(cm[0][0] + cm[0][1])*100, 2)}% were correctly classified as non-death, while
{round(cm[0][1]/(cm[0][0] + cm[0][1])*100, 2)}% were incorrectly classified as death.
Similarly, out of all the actual death cases ({cm[1][0] + cm[1][1]}),
{round(cm[1][1]/(cm[1][0] + cm[1][1])*100, 2)}% were correctly classified as death, while
{round(cm[1][0]/(cm[1][0] + cm[1][1])*100, 2)}% were incorrectly classified as non-death.
''')

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>7. Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up work tables to prevent errors next time.</p>

In [None]:
tables = ['heart_failure_clean', 'heart_failure_train', 'heart_failure_test']

# Loop through the list of tables and execute the drop table command for each table
for table in tables:
    try:
        db_drop_table(table_name = table)
    except Exception:
        # Ignore tables that do not exist or were already dropped
        pass

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_HeartFailure');"        # Takes 5 seconds

In [None]:
remove_context()

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Dataset:</b>

- `id`: patient id
- `death`: If the patient is deceased (boolean)
- `los`: hospital length of stay (in nights)
- `age`: age of the patient (in years)
- `gender`: gender of the patient (1-male, 2-female)
- `cancer`: If the patient has cancer (boolean)
- `cabg`: If the patient has had a previous heart bypass, i.e. a Coronary Artery Bypass Graft procedure (boolean)
- `crt`: If the patient has had Cardiac Resynchronization Therapy, i.e. a device-based treatment for heart failure (boolean)
- `defib`: If the patient has a defibrillator implanted (boolean)
- `dementia`: If the patient has dementia (boolean)
- `diabetes`: If the patient has diabetes (boolean)
- `hypertension`: If the patient has hypertension (boolean)
- `ihd`: If the patient has Ischemic Heart Disease (boolean)
- `mental_health`: If the patient has been diagnosed with mental health issues (boolean)
- `arrhythmias`: If the patient has arrhythmia (boolean)
- `copd`: If the patient has Chronic Obstructive Pulmonary Disease (boolean)
- `obesity`: If the patient has obesity (boolean)
- `pvd`: If the patient has Peripheral Vascular Disease (boolean)
- `renal_disease`: If the patient has Renal Disease (boolean)
- `valvular_disease`: If the patient has Valvular Disease (boolean)
- `metastatic_cancer`: If the patient has Metastatic Cancer (boolean)
- `pacemaker`: If the patient has a pacemaker (boolean)
- `pneumonia`: If the patient has pneumonia (boolean)
- `prior_appts_attended`: Number of outpatient appointments attended in the previous year
- `prior_dnas`: Number of outpatient appointments missed in the previous year
- `pci`: If the patient has gone through a Percutaneous Coronary Intervention procedure (boolean)
- `stroke`: If the patient has a history of stroke (boolean)
- `senile`: If the patient has Senile amyloidosis (SSA) (boolean)
- `quintile`: Socio-economic status for patient's neighborhood, from 1 (most affluent) to 5 (poorest)
- `ethnicgroup`: 1 - White, 2 - Black, 3 - Indian Subcontinent, 8 - Not Known, 9 - Other 
- `fu_time`: Follow-up time, i.e. time in days since admission to hospital

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023. All Rights Reserved
        </div>
    </div>
</footer>