<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Hospital Readmission using Teradata AutoML Functionality
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style ='font-size:18px;font-family:Arial'><b>Introduction</b></p>

<p style ='font-size:16px;font-family:Arial'>As we strive to enhance patient outcomes and optimize hospital operations, we must address a critical area: reducing hospital readmissions. Our data analysis identifies patients with a history of high hospital utilization and those with prescribed or adjusted diabetes medications as having significantly higher readmission rates. The intersection of these groups is particularly high-risk. Healthcare teams can use this information to prioritize follow-up and risk mitigation for these cohorts.</p>

<p style ='font-size:16px;font-family:Arial'><b>Key Insights and Actionable Recommendations</b></p>

<p style ='font-size:16px;font-family:Arial'><b>1. Common Primary Diagnoses by Age Group</b>
    <li style ='font-size:14px;font-family:Arial'><strong>&nbsp;&nbsp;&nbsp;&nbsp;Circulatory diagnoses</strong>: Most common across all age groups, except in the 40-50 cohort where diabetes predominates.</li>
    <li style ='font-size:14px;font-family:Arial'><strong>&nbsp;&nbsp;&nbsp;&nbsp;Respiratory diagnoses</strong>: Less prevalent in younger patients (40-50) but increasingly significant in patients aged 50+.</li>
    <li style ='font-size:14px;font-family:Arial'><strong>&nbsp;&nbsp;&nbsp;&nbsp;Diabetes</strong>: Primary diagnosis in the youngest cohort (40-50).</li>
</p>
<p style ='font-size:14px;font-family:Arial'><strong>Action</strong>: Tailor discharge planning and follow-up protocols to address circulatory and diabetes-related conditions, with attention to respiratory issues in older patients.</p>

<p style ='font-size:16px;font-family:Arial'><b>2. Impact of Diabetes on Readmission Rates</b>
    <li style ='font-size:14px;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;A diabetes diagnosis alone does not significantly increase readmission risk.</li>
    <li style ='font-size:14px;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;Patients prescribed diabetes medications or with medication changes during their stay show higher readmission rates across all age groups.</li>
    <li style ='font-size:14px;font-family:Arial'>&nbsp;&nbsp;&nbsp;&nbsp;This indicates that diabetes medication management, rather than the diagnosis itself, drives readmission risk.</li>
</p>
<p style ='font-size:14px;font-family:Arial'><strong>Action</strong>: Implement targeted follow-up for patients prescribed diabetes medications or with recent medication changes, including enhanced education and monitoring.</p>


<p style ='font-size:16px;font-family:Arial'><strong> Strategic Recommendations </strong>
    <li style ='font-size:14px;font-family:Arial'><strong>Prioritize High-Risk Cohorts</strong>: Focus follow-up efforts on patients with high hospital utilization and those prescribed or adjusting diabetes medications.</li>
    <li style ='font-size:14px;font-family:Arial'><strong>Enhance Discharge Planning</strong>: Develop customized discharge plans for patients with circulatory, diabetes, and respiratory diagnoses, emphasizing medication management.</li>
    <li style ='font-size:14px;font-family:Arial'><strong>Strengthen Post-Discharge Follow-Up</strong>: Conduct follow-up calls within 48-72 hours for high-risk patients to ensure medication adherence and address issues.</li>
    <li style ='font-size:14px;font-family:Arial'><strong>Educate and Train Staff</strong>: Train clinical staff on identifying and managing high-risk patients, focusing on diabetes medication management.</li>
</p>

<p style ='font-size:16px;font-family:Arial'><strong>Why Vantage?</strong></p>
<p style='font-size:16px;font-family:Arial'>Automated Machine Learning (AutoML) automates the end-to-end machine learning pipeline. It covers the distinct phases of that pipeline: feature exploration, feature engineering, data preparation, model selection, model training with hyperparameter tuning, and model evaluation. By automating these tasks, AutoML reduces the manual work required from trained data scientists and lowers the prerequisite knowledge required for beginners, so that users of varying expertise levels can build machine learning models.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>1. Connect to Vantage.</b></p>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
import os
import getpass
from teradataml import *
# Modify the following to match the specific client environment settings
display.max_rows = 5
configure.val_install_location = 'val'
configure.byom_install_location = 'mldb'

<p style = 'font-size:16px;font-family:Arial'>We will be prompted to provide the password. Enter the password, press the Enter key, and then use the down arrow to go to the next cell. Run each step with Shift + Enter.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=Hospital_Readmission_AutoML.ipynb;' UPDATE FOR SESSION; ''')

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>2. Getting Data for This Demo </b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>


In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_HospitalReadmission_cloud');"
 # Takes about 20 seconds
%run -i ../run_procedure.py "call get_data('DEMO_HospitalReadmission_local');"
 # Takes about 40 seconds

<p style = 'font-size:16px;font-family:Arial'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>3. Analyze raw data.</b></p>


<p style = 'font-size:16px;font-family:Arial'>Let us start by creating a "Virtual DataFrame" that points directly to the patient dataset in Vantage. We begin our analysis by previewing the records and their columns, which will be used in our subsequent analysis.</p>

In [None]:
# Fetching in teradata dataframe.
patient_df = DataFrame(in_schema("DEMO_HospitalReadmission","Patients_Data")) 
patient_df

<br>
<p style = 'font-size:16px;font-family:Arial'>This method, <code>describe()</code>, will generate statistics for numeric columns. It computes the count, mean, std, min, percentiles, and max for numeric columns.<br>
Default statistics include: "count", "mean", "std", "min", "percentile", "max"</p>

In [None]:
patient_df.describe()

<p style = 'font-size:16px;font-family:Arial;'> We have our patient dataset which needs to be split into <code>train</code> and <code>test</code> datasets.</p>

<p style = 'font-size:16px;font-family:Arial;'>We'll split these with an 80:20 ratio.</p>

In [None]:
# Performing sampling to get 80% for train and 20% for test
tdf_sample = patient_df.sample(frac = [0.8, 0.2])

# Fetching train and test data
tdf_train= tdf_sample[tdf_sample['sampleid'] == 1].drop('sampleid', axis=1)
tdf_test = tdf_sample[tdf_sample['sampleid'] == 2].drop('sampleid', axis=1)
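
<p style = 'font-size:16px;font-family:Arial;'>To illustrate the idea behind the split above, here is a minimal plain-Python sketch that labels shuffled rows with a 1-based <code>sampleid</code> according to a list of fractions and then separates them. The helper <code>split_with_sample_id</code> and the toy rows are hypothetical; this is a conceptual illustration only and does not reflect how Vantage performs sampling in-database.</p>

```python
import random

def split_with_sample_id(rows, fractions, seed=0):
    """Shuffle rows and tag each with a 1-based sample id per fraction (illustrative helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    labeled = []
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        size = round(len(rows) * frac)
        for row in shuffled[start:start + size]:
            labeled.append({**row, "sampleid": sample_id})
        start += size
    return labeled

# Toy data standing in for the patient table (values are made up).
rows = [{"id": i, "readmitted": i % 2} for i in range(100)]
labeled = split_with_sample_id(rows, [0.8, 0.2], seed=42)

# Keep sample 1 as train and sample 2 as test, dropping the helper column.
train = [{k: v for k, v in r.items() if k != "sampleid"} for r in labeled if r["sampleid"] == 1]
test = [{k: v for k, v in r.items() if k != "sampleid"} for r in labeled if r["sampleid"] == 2]
print(len(train), len(test))
```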



<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>3. AutoML Training</b>

<p style = 'font-size:16px;font-family:Arial;'>AutoML (Automated Machine Learning) is an approach that automates the process of building, training, and validating machine learning models. It automates several stages of the machine learning workflow, such as data preparation, feature engineering, model selection, hyperparameter tuning, and model deployment. The aim is to simplify model building by automating the more time-consuming and labor-intensive tasks.</p>

<p style = 'font-size:16px;font-family:Arial;'>The image below shows the steps that are automated as part of the AutoML approach using Vantage.</p>

<center><img src="images/Approach.png" alt="efs" width="500" height="500" style="border: 4px solid #404040; border-radius: 10px;"></center>

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial;'>3.1. AutoML Training</b>
<p style = 'font-size:16px;font-family:Arial;'>Now we can create an <code>AutoClassifier</code> instance, which is a special-purpose AutoML feature for classification tasks. We use the <code>exclude</code> parameter to specify model algorithms to be excluded from the model training phase. Here we exclude the 'knn' model. The <code>max_runtime_secs</code> parameter specifies the time limit in seconds for model training.
<br><br>
<code>verbose</code>: specifies the detailed execution steps based on verbose level as follows:
</p>

<ul style = 'font-size:16px;font-family:Arial;'>
    <li><b>0</b>: prints the progress bar and leaderboard</li>
    <li><b>1</b>: prints the execution steps of AutoML.</li>
    <li><b>2</b>: prints the intermediate data between the execution of each step of AutoML.</li>
</ul>

In [None]:
'''
Creating AutoClassifier Instance
Selecting 'Auto' mode for AutoML training
Excluding the knn model from the default model list for training
Limiting model training time to 300 seconds via max_runtime_secs
'''

aml = AutoClassifier(
    exclude = 'knn',
    verbose = 2,
    max_runtime_secs = 300,
)

<p style = 'font-size:16px;font-family:Arial;'>Below are the different phases of AutoML:</p>
<center><img src="images/AutoML_phases.png" alt="efs" width=800 height=1200 style="border: 4px solid #404040; border-radius: 10px; padding-right:15px;"></center>

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b>Since AutoML runs processes such as <code>Feature Exploration</code> and <code>Data Preparation</code> along with <code>Model Training</code> and <code>Best Model Evaluation</code>, the next step may take 12-15 minutes to execute.</p>
</div>

In [None]:
# Fitting train data 

aml.fit(tdf_train, 'readmitted')

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>3.2. Model Leaderboard Generation</b>

<p style = 'font-size:16px;font-family:Arial'>Next we can generate the model leaderboard and leader for a given dataset. <code>Leaderboard</code> is a ranked table with a list of models with all their evaluation metrics.</p>

In [None]:
# Fetching leaderboard

leaderboard=aml.leaderboard()
leaderboard

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>3.3 Best Performing Model</b>

<p style = 'font-size:16px;font-family:Arial'>The following function displays the best performing model.</p>

In [None]:
# Fetching best performing model
aml.leader()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>4. Prediction</b>

<p style = 'font-size:16px;font-family:Arial'>The predict function generates predictions using either the default test data or any specified dataset, based on the model's rank in the leaderboard, and displays the performance metrics of the chosen model. If the test data contains a target column, both predictions and performance metrics are displayed; otherwise, only the predictions are shown.
<br><br>
You can also use the <code>rank</code> parameter in the predict function. The <code>rank</code> parameter specifies the model's rank in the leaderboard to be used for prediction. By default, the rank is set to 1, meaning the best-performing model is used.</p>
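
<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of how a rank maps to a leaderboard row, the plain-Python sketch below sorts some made-up model scores into a ranked table and looks up the model at a given rank. The model names, scores, and the helpers <code>build_leaderboard</code> and <code>model_at_rank</code> are hypothetical; the real leaderboard comes from <code>aml.leaderboard()</code> and may be ordered by a different metric.</p>

```python
# Hypothetical validation scores; real values come from aml.leaderboard().
scores = {"xgboost": 0.71, "decisionforest": 0.74, "glm": 0.68, "svm": 0.66}

def build_leaderboard(scores):
    """Return rows sorted by descending score, with rank 1 as the best model."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank, "model": name, "score": score}
        for rank, (name, score) in enumerate(ordered, start=1)
    ]

def model_at_rank(leaderboard, rank=1):
    """Look up the model at a given rank; defaults to the leader."""
    for row in leaderboard:
        if row["rank"] == rank:
            return row["model"]
    raise ValueError(f"No model with rank {rank}")

toy_leaderboard = build_leaderboard(scores)
print(model_at_rank(toy_leaderboard))          # leader (rank 1)
print(model_at_rank(toy_leaderboard, rank=2))  # runner-up
```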

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.1 Generating prediction on test data using Best Model</b>

<p style = 'font-size:16px;font-family:Arial'>Here, we specify the <code>tdf_test</code> dataset for prediction. When using external data instead of the default test data, the predict function applies all the data transformation steps performed during the training phase on the external data before passing the data to the model for prediction.</p>

In [None]:
# Fetching prediction and metrics on test data
prediction = aml.predict(tdf_test)

In [None]:
# Printing prediction
prediction

<b style = 'font-size:18px;font-family:Arial'>Generating predictions using 2nd Best Model</b>

In [None]:
#Prediction using the second best performing model
prediction_second = aml.predict(tdf_test, rank=2)

#Printing prediction
prediction_second

<b style = 'font-size:18px;font-family:Arial'>Generating predictions using 3rd Best Model</b>

In [None]:
prediction_third = aml.predict(tdf_test, rank=3)

#Printing prediction
prediction_third

<hr style="height:1px;border:none">
<b style = 'font-size:18px;font-family:Arial'>4.2 Generating and Comparing ROC for the Top 3 Models</b>

<p style = 'font-size:16px;font-family:Arial'>The ROC curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) across classification thresholds. The area under the ROC curve (AUC) measures how well the model can distinguish between the positive and negative classes: the higher the AUC, the better the separation. An AUC of 0.5 corresponds to random guessing, and an AUC above 0.75 is generally considered decent.</p>
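
<p style = 'font-size:16px;font-family:Arial'>To make the definition concrete, here is a minimal plain-Python sketch that computes (FPR, TPR) points over evenly spaced thresholds and integrates them with the trapezoidal rule. The helpers <code>roc_points</code> and <code>auc</code> and the toy labels and scores are hypothetical; this is not how the <code>ROC</code> function is implemented in Vantage.</p>

```python
def roc_points(labels, scores, num_thresholds=100):
    """Return sorted (fpr, tpr) points for evenly spaced thresholds in [0, 1]."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for i in range(num_thresholds + 1):
        threshold = i / num_thresholds
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return sorted(points)

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy example: 1 = readmitted, scores are predicted probabilities (made up).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(roc_points(labels, scores))
print(toy_auc)
```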

In [None]:
#Calculating True-Positive Rate (TPR), False-Positive Rate (FPR), Threshold_values for all three models
roc_first = ROC(
    probability_column = prediction.columns[1], #"prediction",
    observation_column = "readmitted",
    positive_class = '1',
    num_thresholds = 100,
    data = prediction
)

roc_second = ROC(
    probability_column = prediction_second.columns[1], #"prediction",
    observation_column = "readmitted",
    positive_class = '1',
    num_thresholds = 100,
    data = prediction_second
)

roc_third = ROC(
    probability_column = prediction_third.columns[1], #"Prediction",
    observation_column = "readmitted",
    positive_class = '1',
    num_thresholds = 100,
    data = prediction_third
)

#Getting auc_score for all three models
auc_first = roc_first.result.get_values()[0][0]
auc_second = roc_second.result.get_values()[0][0]
auc_third = roc_third.result.get_values()[0][0]

In [None]:
#first model
first_model = leaderboard.iloc[0][1]

#second model
second_model = leaderboard.iloc[1][1]

#third model
third_model = leaderboard.iloc[2][1]

#Plotting the ROC curves of the top three models against the diagonal baseline (TPR = FPR)
roc_second.output_data.plot(
    x = roc_first.output_data.fpr,
    y = [roc_first.output_data.tpr, roc_second.output_data.tpr, roc_third.output_data.tpr,roc_first.output_data.fpr],
    legend = [
                '{}: AUC = {}'.format(first_model,str(auc_first)),
                '{}: AUC = {}'.format(second_model,str(auc_second)),
                '{}: AUC = {}'.format(third_model,str(auc_third)),
                'Baseline: AUC = {}'.format(str(round(0.5, 4)))
             ],
    legend_style = 'lower right',
    title = 'Receiver Operating Characteristic (ROC) Curve',
    xlabel = 'False Positive Rate',
    ylabel = 'True Positive Rate',
    color = ['green', 'orange', 'blue', 'red'],
    linestyle = ['-', '-', '-', '--']
)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>Automated Machine Learning (AutoML) Overview</b>
<p style = 'font-size:16px;font-family:Arial'>Automated Machine Learning (AutoML) streamlines the machine learning pipeline, automating feature exploration, engineering, data preparation, model selection, training, hyperparameter tuning, and evaluation. It reduces manual intervention, enabling users of all expertise levels to create effective machine learning models effortlessly.</p>
<ul style = 'font-size:16px;font-family:Arial'>
        <li style = 'font-size:16px;font-family:Arial'><strong>Feature Exploration</strong>: Analyzes features, providing insights like column summaries, categorical feature counts, outlier percentages, futile column details, and target distribution.</li>
        <li style = 'font-size:16px;font-family:Arial'><strong>Feature Engineering</strong>: Manages data anomalies (duplicates, missing values, futile columns) and applies transformations based on feature data types.</li>
        <li style = 'font-size:16px;font-family:Arial'><strong>Data Preparation</strong>: Prepares data through feature selection, scaling, and splitting into training and validation sets for model training.</li>
        <li style = 'font-size:16px;font-family:Arial'><strong>Model Training</strong>: Conducts hyperparameter tuning across multiple models to optimize performance.</li>
        <li style = 'font-size:16px;font-family:Arial'><strong>Model Evaluation</strong>: Assesses models, generating a leaderboard with performance metrics, feature selection methods, and rankings, where rank 1 is the best model.</li>
    </ul>
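
<p style = 'font-size:16px;font-family:Arial'>The hyperparameter tuning step in the list above can be pictured with a toy grid search. The sketch below tries every combination in a small grid for a simple threshold rule and keeps the best-scoring one. The rows, the rule, and the grid are made up for illustration (only the column names come from the dataset), and scoring on the same rows used for the search is a simplification; it is not the tuning procedure AutoML runs internally.</p>

```python
from itertools import product

# Toy rows using column names from the dataset; the values are made up.
rows = [
    {"n_inpatient": 0, "n_outpatient": 0, "readmitted": 0},
    {"n_inpatient": 1, "n_outpatient": 2, "readmitted": 0},
    {"n_inpatient": 2, "n_outpatient": 1, "readmitted": 1},
    {"n_inpatient": 3, "n_outpatient": 0, "readmitted": 1},
    {"n_inpatient": 0, "n_outpatient": 3, "readmitted": 0},
    {"n_inpatient": 4, "n_outpatient": 1, "readmitted": 1},
]

def accuracy(rows, feature, threshold):
    """Score the rule 'predict readmission when feature >= threshold'."""
    correct = sum(1 for r in rows if int(r[feature] >= threshold) == r["readmitted"])
    return correct / len(rows)

# Hypothetical hyperparameter grid: which feature to use and where to cut it.
grid = {"feature": ["n_inpatient", "n_outpatient"], "threshold": [1, 2, 3]}

# Evaluate every combination in the grid and keep the best one.
results = [
    {"feature": f, "threshold": t, "accuracy": accuracy(rows, f, t)}
    for f, t in product(grid["feature"], grid["threshold"])
]
best = max(results, key=lambda r: r["accuracy"])
print(best)
```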

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>Teradata AutoML Queries found in dbc.Tables - Teradata SQL Query Documentation</b>
<p></p>
<p style = 'font-size:16px;font-family:Arial'>Query to find these objects: <code>select * from dbc.Tables t2 where DatabaseName = 'demo_user' and TableName like '%persist_out%' order by CreateTimeStamp ASC;</code><br>The objects are described below.</p>

<ol style = 'font-size:16px;font-family:Arial'>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748531851162813" AS (SELECT * FROM TD_OrdinalEncodingTransform( ON "DEMO_USER"."ml__td_sqlmr_out__1748532332745430" AS InputTable PARTITION BY ANY ON "DEMO_USER"."ml__td_sqlmr_out__1748532110089230" AS FitTable DIMENSION USING Accumulate('time_in_hospital','n_procedures','n_lab_procedures','A1Ctest'
,'age','n_outpatient','id','n_inpatient','n_medications','change','diag_1',
'n_emergency','diabetes_med','diag_2','glucose_test','medical_specialty','diag_3') ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Creates table with ordinal encoding of categorical features for ML.<br>
            <strong>Ref</strong>: <code>TD_OrdinalEncodingTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Converts categorical variables to numeric ordinals, preserving order for ML models.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748531682762671" AS (SELECT * FROM TD_OneHotEncodingTransform( ON (select * from "DEMO_USER"."ml__td_sqlmr_persist_out__1748531851162813" where decode("readmitted", null, 0, 1) = 1) AS InputTable ON "DEMO_USER"."ml__td_sqlmr_out__1748530862022377" AS FitTable DIMENSION USING IsInputDense('True') ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Applies one-hot encoding to filtered data for binary representation.<br>
            <strong>Ref</strong>: <code>TD_OneHotEncodingTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Transforms categorical variables into binary vectors for ML compatibility.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748531304257061" AS (SELECT * FROM TD_OutlierFilterTransform( ON "DEMO_USER"."ml__td_sqlmr_out__1748530828623591" AS InputTable ON "DEMO_USER"."ml__td_sqlmr_out__1748532012742512" AS FitTable DIMENSION ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Filters outliers from input table to enhance model accuracy.<br>
            <strong>Ref</strong>: <code>TD_OutlierFilterTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Removes extreme values from datasets to improve model performance.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748536280502093" AS (SELECT * FROM TD_OutlierFilterTransform( ON "DEMO_USER"."ml__td_sqlmr_persist_out__1748531304257061" AS InputTable ON "DEMO_USER"."ml__td_sqlmr_out__1748532354477023" AS FitTable DIMENSION ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Further filters outliers from processed table for data quality.<br>
            <strong>Ref</strong>: <code>TD_OutlierFilterTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Eliminates outliers in processed data to ensure robust ML results.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748532843403074" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__lasso_train_1748532042357136" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Splits lasso training data into 80/20 train/test sets.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Divides data into training and testing sets for model evaluation.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748534455423988" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__rfe_train_1748537973661062" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Splits RFE training data into 80/20 train/test sets.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Partitions RFE data for training and testing ML models.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748533402988160" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__pca_train_1748532528449658" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Splits PCA training data into 80/20 train/test sets.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Splits PCA-transformed data for model training and validation.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748532939572977" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__lasso_train_1748532042357136" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats lasso train/test split for model evaluation.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats data split for consistent lasso model testing.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748537778384075" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__rfe_train_1748537973661062" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats RFE train/test split for model validation.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats RFE data split for feature selection evaluation.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748531526411974" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__pca_train_1748532528449658" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats PCA train/test split for consistent evaluation.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats PCA data split for reliable model testing.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748532056006776" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__lasso_train_1748532042357136" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats lasso train/test split for data partitioning.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats lasso data split for robust model training.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748535329443481" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__rfe_train_1748537973661062" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats RFE train/test split for feature selection.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats RFE split for consistent feature evaluation.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748533582595585" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__pca_train_1748532528449658" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats PCA train/test split for dimensionality reduction.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats PCA split for effective model validation.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748533159230246" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__lasso_train_1748532042357136" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats lasso train/test split for model training.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats lasso split for reliable training data.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748532432304924" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__rfe_train_1748537973661062" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats RFE train/test split for robust evaluation.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats RFE split for stable feature selection.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748533635440860" AS (SELECT * FROM TD_TrainTestSplit( ON "ml__pca_train_1748532528449658" AS InputTable USING IDColumn('id') StratifyColumn('readmitted') seed(42) trainSize(0.8) testSize(0.2) ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Repeats PCA train/test split for model testing.<br>
            <strong>Ref</strong>: <code>TD_TrainTestSplit</code> (ClearScape Analytics).<br>
            <span class="ref">Repeats PCA split for consistent testing results.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748533358854821" AS (SELECT * FROM TD_OrdinalEncodingTransform( ON "DEMO_USER"."ml__td_sqlmr_out__1748532927343218" AS InputTable PARTITION BY ANY ON "DEMO_USER"."ml__td_sqlmr_out__1748532110089230" AS FitTable DIMENSION USING Accumulate('time_in_hospital','n_procedures','n_lab_procedures','A1Ctest','age','n_outpatient',
'id','n_inpatient','n_medications','change','diag_1','n_emergency','diabetes_med','diag_2'
,'glucose_test','medical_specialty','diag_3') ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Applies ordinal encoding to new input for ML preprocessing.<br>
            <strong>Ref</strong>: <code>TD_OrdinalEncodingTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Encodes new categorical data into ordinals for ML.</span></li>
        <li style = 'font-size:14px;font-family:Arial'><strong>Query</strong>: <code class="query">CREATE MULTISET TABLE "DEMO_USER"."ml__td_sqlmr_persist_out__1748531829647190" AS (SELECT * FROM TD_OneHotEncodingTransform( ON "DEMO_USER"."ml__td_sqlmr_out__1748532598241457" AS InputTable ON "DEMO_USER"."ml__td_sqlmr_out__1748530862022377" AS FitTable DIMENSION USING IsInputDense('True') ) as sqlmr) WITH DATA</code><br>
            <strong>Description</strong>: Applies one-hot encoding to processed table for ML.<br>
            <strong>Ref</strong>: <code>TD_OneHotEncodingTransform</code> (ClearScape Analytics).<br>
            <span class="ref">Converts processed categorical data to binary for ML.</span></li>
    </ol>
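
<p style = 'font-size:16px;font-family:Arial'>The encoding queries above all follow a fit-then-transform pattern: a fit table learned from one dataset is applied to an input table. The plain-Python sketch below mirrors that pattern for ordinal and one-hot encoding. The helper names, the toy categories, and the choice to map unseen categories to <code>-1</code> (ordinal) or an all-zero vector (one-hot) are assumptions made for illustration; they do not describe the behavior of <code>TD_OrdinalEncodingTransform</code> or <code>TD_OneHotEncodingTransform</code>.</p>

```python
def fit_ordinal(values):
    """Learn a category -> integer mapping, assigning integers in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}

def transform_ordinal(values, mapping, unknown=-1):
    """Apply a learned mapping; unseen categories get the 'unknown' code (assumption)."""
    return [mapping.get(v, unknown) for v in values]

def fit_one_hot(values):
    """Learn the ordered list of categories that become indicator columns."""
    return sorted(set(values))

def transform_one_hot(values, categories):
    """Emit one 0/1 indicator per learned category; unseen values become all zeros."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# "Fit" on training values, then "transform" new input with the learned tables.
train_diag = ["Circulatory", "Diabetes", "Respiratory", "Circulatory"]
new_diag = ["Diabetes", "Other"]

ordinal_fit = fit_ordinal(train_diag)
one_hot_fit = fit_one_hot(train_diag)

ordinal_encoded = transform_ordinal(new_diag, ordinal_fit)
one_hot_encoded = transform_one_hot(new_diag, one_hot_fit)
print(ordinal_encoded)
print(one_hot_encoded)
```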
<p></p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>Conclusion</b>

<p style = 'font-size:16px;font-family:Arial'>Teradata's AutoML functionality automates the complex process of building and deploying machine learning models. AutoML handles data preparation and model training, delivering high-quality machine learning models in minutes. Through hyperparameter tuning (HPT), Teradata's AutoML can automatically select the best parameters for machine learning algorithms using grid search and random search techniques, which can significantly enhance model performance.
<br><br>
By leveraging Teradata's AutoML, hospitals can save time and reduce the costs associated with manual model building and tuning. The technology can improve the accuracy of predictive models and also makes machine learning accessible to teams without extensive coding or data science expertise. This capability enables hospitals to analyze patient data swiftly and effectively, develop predictive models, and implement proactive strategies to reduce patient readmission.
<br><br>
In conclusion, Teradata's AutoML functionality is a valuable tool for hospitals that need to predict readmission. By automating and optimizing the machine learning process, Teradata helps healthcare organizations make data-driven decisions that improve patient outcomes and optimize hospital operations.</p>
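<p style = 'font-size:16px;font-family:Arial'>To make the difference between grid search and random search concrete, here is a minimal plain-Python sketch. It is not Teradata's AutoML implementation: the <code>score</code> function is a hypothetical stand-in for a validation metric, and the parameter names and values in <code>search_space</code> are assumptions chosen for illustration.</p>

```python
import itertools
import random

def score(params):
    """Hypothetical stand-in for a validation metric; higher is better."""
    return -((params["max_depth"] - 6) ** 2) - 100 * (params["learning_rate"] - 0.1) ** 2

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.1, 0.3],
}

def grid_search(space):
    """Evaluate every combination in the search space and return the best one."""
    names = list(space)
    candidates = [dict(zip(names, combo)) for combo in itertools.product(*space.values())]
    return max(candidates, key=score)

def random_search(space, n_trials, seed=0):
    """Evaluate a fixed number of randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(options) for name, options in space.items()}
        for _ in range(n_trials)
    ]
    return max(candidates, key=score)

best_grid = grid_search(search_space)                  # tries all 12 combinations
best_random = random_search(search_space, n_trials=5)  # tries only 5 sampled combinations

print("grid search best:", best_grid)
print("random search best:", best_random)
```

<p style = 'font-size:16px;font-family:Arial'>Grid search is exhaustive, so its cost grows with the product of the option counts, while random search caps the number of evaluations at the price of possibly missing the best combination.</p>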

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>5. Cleanup</b>


<p style = 'font-size:18px;font-family:Arial'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_HospitalReadmission');"
# Takes about 45 seconds

In [None]:
remove_context()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>Required Materials</b>
<p style = 'font-size:16px;font-family:Arial'>Let’s look at the elements we have available for reference for this notebook:</p>
<p style = 'font-size:18px;font-family:Arial'><b>Filters:</b></p>
    <ul style = 'font-size:16px;font-family:Arial'>
    <li><b>Industry:</b> Healthcare</li>
    <li><b>Functionality:</b> AutoML</li>
    <li><b>Use Case:</b> Patient Readmission</li>
    </ul>
    <p style = 'font-size:18px;font-family:Arial'><b>Related Resources:</b></p>
    <ul style = 'font-size:16px;font-family:Arial'>
    <li><a href = 'https://www.teradata.com/Blogs/NPS-is-a-metric-not-the-goal'>In the fight to improve customer experience, NPS is a metric, not the goal</a></li>
    <li><a href = 'https://www.teradata.com/Blogs/Hyper-scale-time-series-forecasting-done-right'>Hyper-scale time series forecasting done right</a></li>
    <li><a href = 'https://www.teradata.com/Resources/Datasheets/Digital-Identity-Management-and-Great-CX?utm_campaign=i_coremedia-AMS&utm_source=google&utm_medium=paidsearch&utm_content=GS_CoreMedia_NA-US_BKW&utm_creative=Brand-Vantage&utm_term=teradata%20analytic%20platform&gclid=Cj0KCQjwnMWkBhDLARIsAHBOftrWZxDktHkKMsaWjMmNRnQ6Ys-bZBAUhXjWTo1Xa02fsci-IHWBV_waAppkEALw_wcB'>Close the Gap Between Digital Identity Management and Great Customer Experiences</a></li>
        </ul>

<p style = 'font-size:18px;font-family:Arial'><b>Reference Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'> 
  <li>Teradata Vantage™ - Analytics Database Analytic Functions - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Analytics-Database-Analytic-Functions-17.20/Introduction-to-Analytics-Database-Analytic-Functions'>https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Analytics-Database-Analytic-Functions-17.20/Introduction-to-Analytics-Database-Analytic-Functions</a></li>
  <li>Teradata® Package for Python User Guide - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/Introduction-to-Teradata-Package-for-Python'>https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/Introduction-to-Teradata-Package-for-Python</a></li>
  <li>Teradata® Package for Python Function Reference - 17.20: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/Teradata-Package-for-Python-Function-Reference'>https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/Teradata-Package-for-Python-Function-Reference</a></li>      
</ul>

<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>
<ul style = 'font-size:14px;font-family:Arial'>
    <li><strong>age</strong>: Age bracket of the patient (e.g., [70-80)).</li>
    <li><strong>time_in_hospital</strong>: Days spent in the hospital (from 1 to 14).</li>
    <li><strong>n_procedures</strong>: Number of procedures performed during the hospital stay.</li>
    <li><strong>n_lab_procedures</strong>: Number of laboratory procedures performed during the hospital stay.</li>
    <li><strong>n_medications</strong>: Number of medications administered during the hospital stay.</li>
    <li><strong>n_outpatient</strong>: Number of outpatient visits in the year before the hospital stay.</li>
    <li><strong>n_inpatient</strong>: Number of inpatient visits in the year before the hospital stay.</li>
    <li><strong>n_emergency</strong>: Number of visits to the emergency room in the year before the hospital stay.</li>
    <li><strong>medical_specialty</strong>: The specialty of the admitting physician (e.g., Missing, Cardiology).</li>
    <li><strong>diag_1</strong>: Primary diagnosis (e.g., Circulatory, Respiratory, Digestive).</li>
    <li><strong>diag_2</strong>: Secondary diagnosis.</li>
    <li><strong>diag_3</strong>: Additional secondary diagnosis.</li>
    <li><strong>glucose_test</strong>: Whether the serum glucose result was high (&gt; 200), normal, or not performed.</li>
    <li><strong>A1Ctest</strong>: Whether the A1C level was high (&gt; 7%), normal, or not performed.</li>
    <li><strong>change</strong>: Whether there was a change in the diabetes medication ('yes' or 'no').</li>
    <li><strong>diabetes_med</strong>: Whether a diabetes medication was prescribed ('yes' or 'no').</li>
    <li><strong>readmitted</strong>: Whether the patient was readmitted to the hospital ('yes' or 'no').</li></ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>