# <b><span style='color:#F1A424'>AutoML - Multiclass Classification - BMI Value Prediction </span> </b>

### Disclaimer
Please note, the Vantage Functions via SQLAlchemy feature is a preview/beta code release with limited functionality (the “Code”). As such, you acknowledge that the Code is experimental in nature and that the Code is provided “AS IS” and may not be functional on any machine or in any environment. TERADATA DISCLAIMS ALL WARRANTIES RELATING TO THE CODE, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES AGAINST INFRINGEMENT OF THIRD-PARTY RIGHTS, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

TERADATA SHALL NOT BE RESPONSIBLE OR LIABLE WITH RESPECT TO ANY SUBJECT MATTER OF THE CODE UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER THEORY 
    (A) FOR LOSS OR INACCURACY OF DATA OR COST OF PROCUREMENT OF SUBSTITUTE GOODS, SERVICES OR TECHNOLOGY, OR 
    (B) FOR ANY INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO LOSS OF REVENUES AND LOSS OF PROFITS. TERADATA SHALL NOT BE RESPONSIBLE FOR ANY MATTER BEYOND ITS REASONABLE CONTROL.

Notwithstanding anything to the contrary: 
    (a) Teradata will have no obligation of any kind with respect to any Code-related comments, suggestions, design changes or improvements that you elect to provide to Teradata in either verbal or written form (collectively, “Feedback”), and 
    (b) Teradata and its affiliates are hereby free to use any ideas, concepts, know-how or techniques, in whole or in part, contained in Feedback: 
        (i) for any purpose whatsoever, including developing, manufacturing, and/or marketing products and/or services incorporating Feedback in whole or in part, and 
        (ii) without any restrictions or limitations, including requiring the payment of any license fees, royalties, or other consideration. 

## <b> Problem overview:</b>
    

**Dataset used: BMI Dataset**

**Features**:

- `gender`: Gender of the person.
- `height`: Height of the person.
- `weight`: Weight of the person.

**Target Variable**:

- `bmi`: BMI class of the person (the label the models predict).

        
**Objective**:

The objective is to build a model that accurately predicts a person's BMI class from gender, height, and weight.

**Use case**:

Here, we use the AutoML (Automated Machine Learning) functionality to automate the entire process of developing a predictive model. In a single automated run it performs feature exploration, feature engineering, data preparation, model training, and evaluation, and it ends with a leaderboard containing the trained models along with their performance metrics. Each model also has a rank, where rank 1 indicates the best-performing model for the given data, followed by the other models. This notebook loads previously deployed AutoML models and uses them to generate predictions and performance metrics.

In [1]:
# Importing AutoML from teradataml
from teradataml import AutoML, AutoClassifier

In [2]:
# Importing other important libraries
import getpass
from teradataml import create_context, remove_context
from teradataml import DataFrame
from teradataml import load_example_data
from teradataml import TrainTestSplit

In [3]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")

con = create_context(host=host, username=username, password=password)

Host:  ········
Username:  ········
Password:  ········


## <b><span style='color:#F1A424'>| 1.</span> Loading Deployed Models - 'BMI_top_3_models' </b>

### <b><span style='color:#F1A424'>| 1.1.</span> Loading Model </b>

In [4]:
# Creating AutoML object

aml=AutoML()

In [6]:
# Loading models

models_1 = aml.load('BMI_top_3_models')

In [7]:
# Display loaded models


models_1

index,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,XGBOOST_1,lasso,0.888889,0.888889,0.888889,0.888889,0.893743,0.890104,0.887555,0.894391,0.888889,0.887258,ml__bmi_lasso_1723380027544739
1,2,XGBOOST_5,pca,0.851351,0.851351,0.851351,0.851351,0.854743,0.805479,0.823015,0.858455,0.851351,0.850751,ml__bmi_pca_1723379618418365
2,3,XGBOOST_4,pca,0.432432,0.432432,0.432432,0.432432,0.214548,0.278646,0.234151,0.309902,0.432432,0.350252,ml__bmi_pca_1723379618418365
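
The leaderboard reports accuracy next to macro-averaged scores, and the two can diverge when some classes are predicted much worse than others. The following is a minimal sketch, independent of teradataml, that copies three columns from the table above into a CSV string and prints the gap between accuracy and macro-F1 for each model. The names `LEADERBOARD_CSV`, `rows`, and `gaps` are illustrative only.

```python
import csv
import io

# Values copied from the leaderboard shown above (subset of columns).
LEADERBOARD_CSV = """RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MACRO-F1
1,XGBOOST_1,lasso,0.888889,0.887555
2,XGBOOST_5,pca,0.851351,0.823015
3,XGBOOST_4,pca,0.432432,0.234151
"""

rows = list(csv.DictReader(io.StringIO(LEADERBOARD_CSV)))

# Gap between accuracy and macro-F1 per model. A large gap hints that
# performance is uneven across the classes.
gaps = {
    row["MODEL_ID"]: round(float(row["ACCURACY"]) - float(row["MACRO-F1"]), 6)
    for row in rows
}

for model_id, gap in gaps.items():
    print(f"{model_id}: accuracy - macro-F1 = {gap:.6f}")
```

For the rank-3 model the gap is close to 0.2, while for the rank-1 model it is almost zero.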


### <b><span style='color:#F1A424'>| 1.2.</span> Get Loaded Model Hyperparameters</b>

In [8]:
aml.model_hyperparameters(rank=1, use_loaded_models=True)

{'response_column': 'bmi',
 'name': 'xgboost',
 'model_type': 'Classification',
 'column_sampling': 1,
 'min_impurity': 0.0,
 'lambda1': 1,
 'shrinkage_factor': 0.5,
 'max_depth': 7,
 'min_node_size': 1,
 'iter_num': 10,
 'seed': 42,
 'persist': False,
 'output_prob': True,
 'output_responses': ['2', '4', '3', '5'],
 'max_models': 2}

In [9]:
aml.model_hyperparameters(rank=3, use_loaded_models=True)

{'response_column': 'bmi',
 'name': 'xgboost',
 'model_type': 'Classification',
 'column_sampling': 0.6,
 'min_impurity': 0.0,
 'lambda1': 10,
 'shrinkage_factor': 0.1,
 'max_depth': 8,
 'min_node_size': 2,
 'iter_num': 10,
 'seed': 42,
 'persist': False,
 'output_prob': True,
 'output_responses': ['2', '4', '3', '5'],
 'max_models': 2}
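
To see at a glance where the rank-1 and rank-3 models differ, the two dictionaries printed above can be compared key by key. This is a plain-Python sketch; the dictionaries are copied from the outputs above, and the variable names are illustrative.

```python
# Hyperparameters copied from the rank=1 and rank=3 outputs above.
rank_1 = {
    "response_column": "bmi", "name": "xgboost", "model_type": "Classification",
    "column_sampling": 1, "min_impurity": 0.0, "lambda1": 1,
    "shrinkage_factor": 0.5, "max_depth": 7, "min_node_size": 1,
    "iter_num": 10, "seed": 42, "persist": False, "output_prob": True,
    "output_responses": ["2", "4", "3", "5"], "max_models": 2,
}
rank_3 = {
    "response_column": "bmi", "name": "xgboost", "model_type": "Classification",
    "column_sampling": 0.6, "min_impurity": 0.0, "lambda1": 10,
    "shrinkage_factor": 0.1, "max_depth": 8, "min_node_size": 2,
    "iter_num": 10, "seed": 42, "persist": False, "output_prob": True,
    "output_responses": ["2", "4", "3", "5"], "max_models": 2,
}

# Keep only the keys whose values differ between the two models.
differing = {
    key: (rank_1.get(key), rank_3.get(key))
    for key in rank_1
    if rank_1.get(key) != rank_3.get(key)
}

for key, (value_1, value_3) in differing.items():
    print(f"{key}: rank 1 = {value_1}, rank 3 = {value_3}")
```

Only five settings differ: `column_sampling`, `lambda1`, `shrinkage_factor`, `max_depth`, and `min_node_size`.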

### <b><span style='color:#F1A424'>| 1.3.</span> Loading Dataset</b>

In [10]:
# Loading dataset for prediction

load_example_data('teradataml','bmi')
df = DataFrame('bmi')



In [11]:
# Display data

df

gender,height,weight,bmi
Male,149,61,3
Male,147,92,5
Male,154,111,5
Male,174,90,3
Male,155,51,2
Male,191,79,2
Female,185,110,4
Female,195,104,3
Female,169,103,4
Female,159,80,4
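
The relationship between the features and the `bmi` label can be illustrated with the standard BMI formula, weight divided by height squared. The sketch below rests on an assumption the notebook does not state: that `height` is in centimetres and `weight` is in kilograms. It uses only the ten rows previewed above; `raw_bmi` and `bmi_by_class` are illustrative names.

```python
from collections import defaultdict

# Rows copied from the data preview above: (gender, height, weight, bmi class).
# ASSUMPTION: height is in centimetres and weight in kilograms; the notebook
# does not state the units.
rows = [
    ("Male", 149, 61, 3), ("Male", 147, 92, 5), ("Male", 154, 111, 5),
    ("Male", 174, 90, 3), ("Male", 155, 51, 2), ("Male", 191, 79, 2),
    ("Female", 185, 110, 4), ("Female", 195, 104, 3),
    ("Female", 169, 103, 4), ("Female", 159, 80, 4),
]


def raw_bmi(height_cm, weight_kg):
    """Standard BMI formula: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Group the computed raw BMI values by the class label in the dataset.
bmi_by_class = defaultdict(list)
for gender, height, weight, bmi_class in rows:
    bmi_by_class[bmi_class].append(round(raw_bmi(height, weight), 1))

for bmi_class in sorted(bmi_by_class):
    values = sorted(bmi_by_class[bmi_class])
    print(f"class {bmi_class}: raw BMI from {values[0]} to {values[-1]}")
```

Under that assumption, the raw BMI ranges of the previewed rows do not overlap between classes, which is consistent with `bmi` being a category derived from height and weight.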


### <b><span style='color:#F1A424'>| 1.4.</span> Generating Prediction & Performance Metrics</b>

In [12]:
# Generate prediction using data and model rank

prediction = aml.predict(df, rank=1)

Generating prediction using:
Model Name: XGBOOST
Feature Selection: lasso
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 10/10           

In [13]:
prediction

id,Prediction,Prob_2,Prob_4,Prob_3,Prob_5,bmi
703,2,0.9852497520459084,0.0037820612468568,0.0074803169524354,0.003487869754799,2
844,5,0.0084488345002769,0.0345317953129166,0.0153101882104333,0.941709181976373,5
644,2,0.7704632887868583,0.0847136721920492,0.1064517688224204,0.038371270198672,2
191,4,0.0041326590070273,0.971525996089602,0.0134168989376877,0.0109244459656826,4
1703,2,0.974802107569527,0.0110766078508702,0.0074839631443105,0.0066373214352921,2
380,2,0.9747610818546728,0.0110761416785551,0.0075257343711065,0.0066370420956656,2
660,2,0.9692765871483442,0.0073178239326359,0.0180372850432594,0.0053683038757603,2
951,4,0.0094136106539428,0.898795545213552,0.0832077576052015,0.0085830865273035,2
1532,5,0.0059976557523382,0.0295652204275693,0.0213537567618595,0.9430833670582328,5
167,4,0.0045237682497516,0.95976583237339,0.0160706934410366,0.0196397059358215,4
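
A quick sanity check on the ten rows displayed above is to compare `Prediction` with `bmi` directly. The sketch below uses values copied from the table; it covers only this preview, not the full prediction result.

```python
# (id, Prediction, bmi) triples copied from the ten rows displayed above.
sample = [
    (703, 2, 2), (844, 5, 5), (644, 2, 2), (191, 4, 4), (1703, 2, 2),
    (380, 2, 2), (660, 2, 2), (951, 4, 2), (1532, 5, 5), (167, 4, 4),
]

# Ids of rows where the predicted class differs from the true class.
misclassified = [row_id for row_id, predicted, actual in sample if predicted != actual]

# Accuracy on the preview rows only.
accuracy = (len(sample) - len(misclassified)) / len(sample)

print(f"misclassified ids: {misclassified}")
print(f"accuracy on the preview rows: {accuracy:.2f}")
```

Nine of the ten previewed rows are classified correctly; the row with id 951 is predicted as class 4 while its true class is 2.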


In [14]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=1)

Generating performance metrics using:
Model Name: XGBOOST
Feature Selection: lasso


In [15]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.892000
1       5     Macro-Precision     0.579769
2       6        Macro-Recall     0.630184
3       7            Macro-F1     0.599494
4       9     Weighted-Recall     0.892000
5      10         Weighted-F1     0.865583
6       8  Weighted-Precision     0.848340
7       4            Micro-F1     0.892000
8       2     Micro-Precision     0.892000
9       1            Accuracy     0.892000


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  CLASS_4  CLASS_5  CLASS_6  Precision    Recall        F1  Support
SeqNum                                                                                                                  
2               2  CLASS_3       13       22       65        4        0        0   0.625000  0.942029  0.751445       69
4               4  CLASS_5        0        0        1        4      127   
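
The first row of the result output can be checked by hand. The sketch below relies on an interpretation of the layout that the output itself does not state: each row holds the counts for one predicted class, the `Mapping` column names the `CLASS_n` column that holds the correct predictions, and `Support` is the number of rows whose true class is that label.

```python
# Counts copied from the row with Prediction = 2 in the result output above.
counts = {
    "CLASS_1": 13, "CLASS_2": 22, "CLASS_3": 65,
    "CLASS_4": 4, "CLASS_5": 0, "CLASS_6": 0,
}
mapping = "CLASS_3"  # column assumed to hold the correct predictions
support = 69         # assumed to be the number of rows with true class 2

true_positives = counts[mapping]
predicted_total = sum(counts.values())  # all rows predicted as class 2

precision = true_positives / predicted_total
recall = true_positives / support
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.6f} recall={recall:.6f} f1={f1:.6f}")
```

Under this reading, the computed values agree with the `Precision`, `Recall`, and `F1` columns of that row (0.625000, 0.942029, 0.751445).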

In [16]:
# Generate prediction using data and model rank

prediction = aml.predict(df, rank=3)

Generating prediction using:
Model Name: XGBOOST
Feature Selection: pca


In [17]:
prediction

id,Prediction,Prob_2,Prob_4,Prob_3,Prob_5,bmi
1532,5,0.1522663361341638,0.2308576297118222,0.1840615474933504,0.4328144866606634,5
703,4,0.1852526765404861,0.2994398948179398,0.251078016195688,0.264229412445886,2
844,5,0.1519524177091444,0.2324433246699238,0.1836820787771344,0.4319221788437972,5
1703,4,0.2371389707431206,0.2854789636025671,0.2049469865968072,0.2724350790575049,2
196,2,0.2784804984812272,0.2641293733050792,0.2165784924587093,0.2408116357549841,2
1831,5,0.2061567819480153,0.2481811583550831,0.184705738631225,0.3609563210656765,5
660,5,0.1927340815983844,0.2746016283712899,0.2054427534876824,0.3272215365426432,2
191,4,0.1902379390914333,0.3219439943029892,0.2483225732754686,0.2394954933301087,4
47,4,0.1950080929850465,0.3067702389893672,0.2545491803100029,0.2436724877155832,3
1175,5,0.2061567819480153,0.2481811583550831,0.184705738631225,0.3609563210656765,4
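
In the rows shown, `Prediction` coincides with the class whose `Prob_` column is largest, so the gap between the two largest probabilities gives a rough sense of how decisive a prediction is. The sketch below compares the row with id 703 under the rank-1 and rank-3 models, using values copied from the two prediction tables above. The helper `top_class_and_margin` is illustrative and not part of teradataml.

```python
def top_class_and_margin(probabilities):
    """Return the most probable class and its lead over the runner-up."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best_class, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_class, best_p - second_p


# Class probabilities for id 703, copied from the prediction tables above.
rank_1_probs = {"2": 0.9852497520459084, "4": 0.0037820612468568,
                "3": 0.0074803169524354, "5": 0.003487869754799}
rank_3_probs = {"2": 0.1852526765404861, "4": 0.2994398948179398,
                "3": 0.251078016195688, "5": 0.264229412445886}

for name, probs in (("rank 1", rank_1_probs), ("rank 3", rank_3_probs)):
    predicted, margin = top_class_and_margin(probs)
    print(f"{name}: predicted class {predicted}, margin {margin:.4f}")
```

The rank-1 model picks class 2 (the true class) by a wide margin, while the rank-3 model picks class 4 with a lead of only a few percentage points over the next class.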


In [18]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=3)

Generating performance metrics using:
Model Name: XGBOOST
Feature Selection: pca


In [19]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.492000
1       5     Macro-Precision     0.299825
2       6        Macro-Recall     0.242562
3       7            Macro-F1     0.210418
4       9     Weighted-Recall     0.492000
5      10         Weighted-F1     0.398458
6       8  Weighted-Precision     0.433836
7       4            Micro-F1     0.492000
8       2     Micro-Precision     0.492000
9       1            Accuracy     0.492000


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  CLASS_4  CLASS_5  CLASS_6  Precision    Recall        F1  Support
SeqNum                                                                                                                  
2               2  CLASS_3        0        2        4        1        0        2   0.444444  0.057971  0.102564       69
3               3  CLASS_4        1        1        0        2        1   
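
In the metrics table above, accuracy, micro-precision, micro-recall, and micro-F1 are all equal. That is expected for single-label multiclass data, because every misclassified row counts once as a false positive and once as a false negative. Macro averages weight each class equally and can therefore differ. The sketch below recomputes both kinds of average on the ten rank-3 prediction rows displayed earlier. It is a plain-Python illustration, not the routine teradataml uses, and it assumes that precision is taken as 0 for a class that is never predicted.

```python
# (Prediction, bmi) pairs copied from the ten rank-3 prediction rows shown earlier.
pairs = [(5, 5), (4, 2), (5, 5), (4, 2), (2, 2),
         (5, 5), (5, 2), (4, 4), (4, 3), (5, 4)]
labels = sorted({label for pair in pairs for label in pair})


def safe_div(numerator, denominator):
    """ASSUMPTION: a ratio with a zero denominator is reported as 0.0."""
    return numerator / denominator if denominator else 0.0


# Per-class true positives, false positives, and false negatives.
tp = {c: sum(p == c and a == c for p, a in pairs) for c in labels}
fp = {c: sum(p == c and a != c for p, a in pairs) for c in labels}
fn = {c: sum(p != c and a == c for p, a in pairs) for c in labels}

accuracy = sum(p == a for p, a in pairs) / len(pairs)

# Micro average: pool the counts over all classes, then take the ratio.
micro_precision = safe_div(sum(tp.values()), sum(tp.values()) + sum(fp.values()))
micro_recall = safe_div(sum(tp.values()), sum(tp.values()) + sum(fn.values()))

# Macro average: take the ratio per class, then average the ratios.
macro_precision = sum(safe_div(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
macro_recall = sum(safe_div(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)

print(f"accuracy={accuracy} micro_precision={micro_precision} micro_recall={micro_recall}")
print(f"macro_precision={macro_precision:.4f} macro_recall={macro_recall:.4f}")
```

On these ten rows the micro averages equal the accuracy, while the macro averages are lower because class 3 is never predicted.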

## <b><span style='color:#F1A424'>| 2.</span> Loading Deployed Models - 'BMI_mixed_models' </b>

### <b><span style='color:#F1A424'>| 2.1.</span> Loading Model </b>

In [20]:
# Loading models

models_2 = aml.load('BMI_mixed_models')

In [21]:
models_2

index,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,XGBOOST_5,pca,0.851351,0.851351,0.851351,0.851351,0.854743,0.805479,0.823015,0.858455,0.851351,0.850751,ml__bmi_pca_1723379823021962
1,2,XGBOOST_2,rfe,0.396825,0.396825,0.396825,0.396825,0.398428,0.397396,0.394433,0.399594,0.396825,0.394798,ml__bmi_rfe_1723385483489996


### <b><span style='color:#F1A424'>| 2.2.</span> Generating Prediction & Performance Metrics</b>

In [22]:
# Generate prediction using data and model rank

prediction = aml.predict(df.iloc[:80], rank=2)

Generating prediction using:
Model Name: XGBOOST
Feature Selection: rfe
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 10/10           

In [23]:
prediction

id,Prediction,Prob_2,Prob_4,Prob_3,Prob_5,bmi
416,4,0.2243578675660911,0.3768719219215852,0.1839555955727904,0.2148146149395332,5
456,2,0.3343996727955216,0.1799735512741052,0.1989564392910261,0.286670336639347,5
64,5,0.1184568018840484,0.173541255164818,0.1548358236098994,0.5531661193412342,5
568,5,0.1505389820729077,0.2089069772058458,0.3077971825165772,0.3327568582046692,3
296,4,0.2762875877236205,0.3114566950116919,0.2265338332033145,0.1857218840613731,4
184,5,0.1630719653126512,0.2262993337100123,0.2501684793689671,0.3604602216083692,2
480,3,0.2120762988967763,0.2559602375587901,0.3669597284830767,0.1650037350613569,4
512,4,0.2762875877236205,0.3114566950116919,0.2265338332033145,0.1857218840613731,4
240,5,0.1630719653126512,0.2262993337100123,0.2501684793689671,0.3604602216083692,5
32,5,0.2615270649069253,0.2420929581139058,0.2144313792302615,0.2819485977489072,5


In [24]:
# Generate performance metrics

performance_metric = aml.evaluate(df.iloc[:80], rank=2)

Generating performance metrics using:
Model Name: XGBOOST
Feature Selection: rfe
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 10/10           

In [25]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.437500
1       5     Macro-Precision     0.301410
2       6        Macro-Recall     0.312886
3       7            Macro-F1     0.297066
4       9     Weighted-Recall     0.437500
5      10         Weighted-F1     0.427887
6       8  Weighted-Precision     0.445418
7       4            Micro-F1     0.437500
8       2     Micro-Precision     0.437500
9       1            Accuracy     0.437500


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  CLASS_4  CLASS_5  CLASS_6  Precision    Recall        F1  Support
SeqNum                                                                                                                  
5               5  CLASS_6        0        0        2        3        7       13   0.520000  0.481481  0.500000       27
2               2  CLASS_3        2        0        9        3        3   

## <b><span style='color:#F1A424'>| 3.</span> Loading Deployed Models - 'BMI_range_models' </b>

### <b><span style='color:#F1A424'>| 3.1.</span> Loading Model</b>

In [26]:
# Creating another AutoML object

obj=AutoML()

In [27]:
# Loading models

models_3 = obj.load('BMI_range_models')

In [28]:
models_3

index,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,XGBOOST_4,pca,0.432432,0.432432,0.432432,0.432432,0.214548,0.278646,0.234151,0.309902,0.432432,0.350252,ml__bmi_pca_1723380955260316
1,2,XGBOOST_2,rfe,0.396825,0.396825,0.396825,0.396825,0.398428,0.397396,0.394433,0.399594,0.396825,0.394798,ml__bmi_rfe_1723380015568744
2,3,XGBOOST_3,rfe,0.396825,0.396825,0.396825,0.396825,0.398428,0.397396,0.394433,0.399594,0.396825,0.394798,ml__bmi_rfe_1723380015568744
3,4,XGBOOST_0,lasso,0.285714,0.285714,0.285714,0.285714,0.145889,0.29375,0.187071,0.143826,0.285714,0.183529,ml__bmi_lasso_1723387770647383


### <b><span style='color:#F1A424'>| 3.2.</span> Generating Prediction & Performance Metrics</b>

In [29]:
# Generate prediction using data and model rank

prediction = obj.predict(df.iloc[:80], rank=1)

Generating prediction using:
Model Name: XGBOOST
Feature Selection: pca
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 10/10           

In [30]:
prediction

id,Prediction,Prob_2,Prob_4,Prob_3,Prob_5,bmi
396,4,0.2371389707431206,0.2854789636025671,0.2049469865968072,0.2724350790575049,4
628,4,0.239084859831899,0.2878215157297404,0.2066287182149677,0.2664649062233927,4
156,4,0.2390213224786273,0.2877450264976872,0.2223373367247836,0.2508963142989018,4
588,4,0.2371389707431206,0.2854789636025671,0.2049469865968072,0.2724350790575049,2
644,5,0.2061567819480153,0.2481811583550831,0.184705738631225,0.3609563210656765,2
204,5,0.2061567819480153,0.2481811583550831,0.184705738631225,0.3609563210656765,5
444,5,0.2061567819480153,0.2481811583550831,0.184705738631225,0.3609563210656765,5
268,4,0.2390213224786273,0.2877450264976872,0.2223373367247836,0.2508963142989018,4
20,4,0.1950080929850465,0.3067702389893672,0.2545491803100029,0.2436724877155832,3
276,4,0.239084859831899,0.2878215157297404,0.2066287182149677,0.2664649062233927,5


In [31]:
# Generate performance metrics

performance_metric = obj.evaluate(df.iloc[:80], rank=1)

Generating performance metrics using:
Model Name: XGBOOST
Feature Selection: pca
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 10/10           

In [32]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.512500
1       5     Macro-Precision     0.175366
2       6        Macro-Recall     0.247575
3       7            Macro-F1     0.201136
4       9     Weighted-Recall     0.512500
5      10         Weighted-F1     0.415114
6       8  Weighted-Precision     0.361126
7       4            Micro-F1     0.512500
8       2     Micro-Precision     0.512500
9       1            Accuracy     0.512500


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  CLASS_3  CLASS_4  CLASS_5  CLASS_6  Precision    Recall        F1  Support
SeqNum                                                                                                                  
2               2  CLASS_3        0        0        0        0        0        0   0.000000  0.000000  0.000000       10
4               4  CLASS_5        2        1        4        2       16   

In [33]:
remove_context()

True