# <b><span style='color:#F1A424'>AutoML - Regression - Advertisement Sales Prediction </span> </b>

### Disclaimer
The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.

## <b> Problem overview:</b>
    

**Dataset used: Advertising Sales Dataset**

**Features**:

- `TV`: Advertising on TV.
- `radio`: Advertising on radio.
- `newspaper`: Advertising in newspapers.

**Target Variable**:

- `sales`: The sales achieved after advertising.
    
**Objective**:

The primary objective is to build a model that accurately predicts sales from the advertising done on TV, radio, and newspaper.

**Use case**:

Here, we use the AutoML (Automated Machine Learning) functionality to automate the entire process of developing a predictive model. In a single automated run, it performs feature exploration, feature engineering, data preparation, model training, and evaluation on the dataset, and at the end it produces a leaderboard containing the different models along with their performance. Each model also has a rank: rank 1 is the best-performing model for the given data, followed by the others. This notebook loads models that were previously deployed from such a run and uses them to generate predictions and performance metrics.

In [1]:
# Importing AutoML from teradataml
from teradataml import AutoML, AutoRegressor

In [2]:
# Importing other important libraries
import getpass
from teradataml import create_context, remove_context
from teradataml import DataFrame
from teradataml import load_example_data
from teradataml import TrainTestSplit

In [3]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")

con = create_context(host=host, username=username, password=password)

Host:  ········
Username:  ········
Password:  ········


## <b><span style='color:#F1A424'>| 1.</span> Loading Deployed Models - 'Advertising_top_5_models' </b>

### <b><span style='color:#F1A424'>| 1.1.</span> Loading Model </b>

In [4]:
# Creating AutoML object

aml = AutoML()

In [5]:
# Loading models

models_1 = aml.load('Advertising_top_5_models')

In [6]:
# Display loaded models

models_1

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,MAE,MSE,MSLE,MAPE,MPE,RMSE,RMSLE,ME,R2,EV,MPD,MGD,DATA_TABLE,ADJUSTED_R2
0,1,KNN_1,rfe,1.05579,2.023639,0.009446,7.151976,2.392202,1.422547,0.097189,4.203217,0.922555,0.9273,0.14903,0.011623,ml__sales_rfe_1723391384424642,0.921555
1,2,GLM_1,rfe,1.229208,2.311243,0.011298,8.779032,-0.090825,1.520277,0.106292,3.139551,0.911548,0.912892,0.164516,0.01324,ml__sales_rfe_1723391384424642,0.910406
2,3,DECISIONFOREST_1,rfe,1.529687,3.415041,0.017554,11.22477,-2.792731,1.847983,0.132491,3.88,0.869305,0.86933,0.250371,0.020938,ml__sales_rfe_1723391384424642,0.867619
3,4,GLM_2,pca,1.647833,4.197219,0.01933,12.127324,-3.956709,2.048712,0.139032,4.823611,0.839371,0.839389,0.281034,0.02187,ml__sales_pca_1723396755457502,0.836242
4,5,XGBOOST_2,pca,1.594226,4.208329,0.019006,10.414319,2.439263,2.051421,0.137864,5.575339,0.838945,0.846603,0.293122,0.024563,ml__sales_pca_1723396755457502,0.835808
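The leaderboard columns can be sanity-checked by hand. The sketch below copies the `MSE`, `RMSE`, and `R2` values from the table above, confirms that each `RMSE` is the square root of the corresponding `MSE`, and checks that sorting by `R2` in descending order reproduces the `RANK` column. That the ranking is actually driven by `R2` is an assumption; the displayed values are merely consistent with it.

```python
import math

# Values copied from the leaderboard above: (RANK, MODEL_ID, MSE, RMSE, R2)
leaderboard = [
    (1, "KNN_1", 2.023639, 1.422547, 0.922555),
    (2, "GLM_1", 2.311243, 1.520277, 0.911548),
    (3, "DECISIONFOREST_1", 3.415041, 1.847983, 0.869305),
    (4, "GLM_2", 4.197219, 2.048712, 0.839371),
    (5, "XGBOOST_2", 4.208329, 2.051421, 0.838945),
]

# RMSE should equal sqrt(MSE), up to the rounding used in the display
rmse_matches = all(
    abs(math.sqrt(mse) - rmse) < 1e-5 for _, _, mse, rmse, _ in leaderboard
)

# Assumption being tested: a higher R2 corresponds to a better (lower) rank
order_by_rank = [model for _, model, _, _, _ in sorted(leaderboard)]
order_by_r2 = [
    model
    for _, model, _, _, _ in sorted(leaderboard, key=lambda row: row[4], reverse=True)
]

print("RMSE == sqrt(MSE):", rmse_matches)
print("Rank order matches R2 order:", order_by_rank == order_by_r2)
```

Note that `MAE` alone would not give the same order: `XGBOOST_2` has a lower `MAE` than `GLM_2` but is ranked below it.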


### <b><span style='color:#F1A424'>| 1.2.</span> Get Loaded Model Hyperparameters</b>

In [7]:
aml.model_hyperparameters(rank=1, use_loaded_models=True)

{'response_column': 'sales',
 'name': 'knn',
 'model_type': 'Regression',
 'k': 5,
 'id_column': 'id',
 'voting_weight': 1.0,
 'persist': False,
 'max_models': 1}
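To give a feel for what `k` and `voting_weight` could mean, here is a minimal pure-Python sketch of distance-weighted k-nearest-neighbour regression. It assumes that `voting_weight` is the exponent applied to the inverse distance when neighbours are averaged; that interpretation, the helper name `knn_regress`, and the toy data are illustrative assumptions, not the in-database implementation used by the loaded model.

```python
import math

def knn_regress(train, query, k=5, voting_weight=1.0, eps=1e-12):
    """Distance-weighted k-nearest-neighbour regression (illustrative only)."""
    # Distance from the query to every training row, nearest first
    neighbours = sorted(
        ((math.dist(features, query), target) for features, target in train),
        key=lambda pair: pair[0],
    )[:k]
    # Assumed weighting scheme: weight = 1 / distance ** voting_weight
    # (eps avoids division by zero when the query matches a training row)
    weights = [1.0 / (dist + eps) ** voting_weight for dist, _ in neighbours]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbours))
    return weighted_sum / sum(weights)

# Hypothetical one-feature toy data: (features, target)
toy_train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 14.0),
             ((4.0,), 16.0), ((5.0,), 18.0), ((6.0,), 20.0)]

seen = knn_regress(toy_train, (3.0,))    # query identical to a training row
unseen = knn_regress(toy_train, (3.5,))  # query between training rows

print("prediction for a seen row:", seen)
print("prediction for an unseen row:", unseen)
```

Under this weighting, a query that coincides with a training row is dominated by that row, so its prediction is almost exactly that row's own target value.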

In [8]:
aml.model_hyperparameters(rank=5, use_loaded_models=True)

{'response_column': 'sales',
 'name': 'xgboost',
 'model_type': 'Regression',
 'column_sampling': 1,
 'min_impurity': 0.1,
 'lambda1': 0.1,
 'shrinkage_factor': 0.5,
 'max_depth': 7,
 'min_node_size': 2,
 'iter_num': 10,
 'seed': 42,
 'persist': False,
 'max_models': 1}

### <b><span style='color:#F1A424'>| 1.3.</span> Loading Dataset</b>

In [9]:
# Load the dataset for prediction

load_example_data('teradataml','advertising')
df = DataFrame('advertising')



In [10]:
# Display data

df

TV,radio,newspaper,sales
228.0,37.7,32.0,21.5
239.3,15.5,27.3,20.7
241.7,38.0,23.2,21.8
7.3,28.1,41.4,5.5
296.4,36.3,100.9,23.8
230.1,37.8,69.2,22.1
199.1,30.6,38.7,18.3
163.3,31.6,52.9,16.9
94.2,4.9,8.1,14.0
218.5,5.4,27.4,17.2


### <b><span style='color:#F1A424'>| 1.4.</span> Generating Prediction & Performance Metrics</b>

In [11]:
# Generate predictions on the data using the rank 1 model

prediction = aml.predict(df, rank=1)

Generating prediction using:
Model Name: KNN
Feature Selection: rfe
Completed: 100% - 10/10

In [12]:
prediction

id,prediction,sales
22,7.599999001202244,7.6
14,19.799998979026974,19.8
19,16.800001403614925,16.8
12,17.299999182527102,17.3
41,20.89999715407641,20.9
48,5.600001720321714,5.6
28,17.100002210555594,17.1
31,21.50000000268524,21.5
10,16.996971959116028,15.0
15,17.199999802602743,17.2


In [13]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=1)

Generating performance metrics using:
Model Name: KNN
Feature Selection: rfe


In [14]:
performance_metric


############ result Output ############

        MAE       MSE     MSLE      MAPE       MPE      RMSE     RMSLE        ME        R2        EV       MPD       MGD
0  0.154975  0.255818  0.00118  1.179964 -0.389643  0.505785  0.034353  4.345287  0.990791  0.990814  0.017475  0.001366
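To make the metric columns concrete, the sketch below recomputes a few of them from the ten prediction rows displayed earlier for the rank 1 model. It uses only those ten rows, so the numbers will not match the full-table output above. `ME` is taken here to mean the maximum absolute error, which is an assumption about the column's meaning, and the function name `regression_metrics` is hypothetical.

```python
import math

# (prediction, sales) pairs copied from the ten displayed rows for rank 1
rows = [
    (7.599999001202244, 7.6),
    (19.799998979026974, 19.8),
    (16.800001403614925, 16.8),
    (17.299999182527102, 17.3),
    (20.89999715407641, 20.9),
    (5.600001720321714, 5.6),
    (17.100002210555594, 17.1),
    (21.50000000268524, 21.5),
    (16.996971959116028, 15.0),
    (17.199999802602743, 17.2),
]

def regression_metrics(pairs):
    """Compute basic regression metrics from (prediction, actual) pairs."""
    n = len(pairs)
    errors = [pred - actual for pred, actual in pairs]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual for _, actual in pairs) / n
    ss_tot = sum((actual - mean_actual) ** 2 for _, actual in pairs)
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "ME": max(abs(e) for e in errors),  # assumed: maximum absolute error
        "R2": 1.0 - (mse * n) / ss_tot,
    }

metrics = regression_metrics(rows)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

In this sample, nine of the ten predictions are nearly identical to the actual `sales` values, and the row with `id` 10 accounts for almost all of the error.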


In [15]:
# Generate prediction using data and model rank

prediction = aml.predict(df, rank=4)

Generating prediction using:
Model Name: GLM
Feature Selection: pca


In [16]:
prediction

id,prediction,sales
40,12.93550309373279,10.4
80,22.717112682919623,26.2
99,18.68915996247545,17.1
61,17.030076973964476,19.6
221,12.687918793150503,12.6
17,9.234658386551873,5.3
162,12.319389225147916,11.9
122,9.23974220603845,6.9
101,21.667869588148697,25.4
183,16.698419146041598,18.0


In [17]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=4)

Generating performance metrics using:
Model Name: GLM
Feature Selection: pca


In [18]:
performance_metric


############ result Output ############

       MAE       MSE      MSLE       MAPE       MPE      RMSE     RMSLE        ME        R2        EV       MPD       MGD
0  1.69903  4.566835  0.035048  16.376698 -9.841271  2.137015  0.187211  7.770491  0.835607  0.840585  0.366177  0.037724


## <b><span style='color:#F1A424'>| 2.</span> Loading Deployed Models - 'Advertising_mixed_models' </b>

### <b><span style='color:#F1A424'>| 2.1.</span> Loading Model </b>

In [19]:
# Loading models

models_2 = aml.load('Advertising_mixed_models')

In [20]:
models_2

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,MAE,MSE,MSLE,MAPE,MPE,RMSE,RMSLE,ME,R2,EV,MPD,MGD,DATA_TABLE,ADJUSTED_R2
0,1,GLM_1,rfe,1.229208,2.311243,0.011298,8.779032,-0.090825,1.520277,0.106292,3.139551,0.911548,0.912892,0.164516,0.01324,ml__sales_rfe_1723397854656713,0.910406
1,2,GLM_2,pca,1.647833,4.197219,0.01933,12.127324,-3.956709,2.048712,0.139032,4.823611,0.839371,0.839389,0.281034,0.02187,ml__sales_pca_1723389958236912,0.836242
2,3,XGBOOST_2,pca,1.594226,4.208329,0.019006,10.414319,2.439263,2.051421,0.137864,5.575339,0.838945,0.846603,0.293122,0.024563,ml__sales_pca_1723389958236912,0.835808


### <b><span style='color:#F1A424'>| 2.2.</span> Generating Prediction & Performance Metrics</b>

In [21]:
# Generate prediction using data and model rank

prediction = aml.predict(df, rank=2)

Generating prediction using:
Model Name: GLM
Feature Selection: pca


In [22]:
prediction

id,prediction,sales
40,12.93550309373279,10.4
80,22.717112682919623,26.2
99,18.68915996247545,17.1
61,17.030076973964476,19.6
221,12.687918793150503,12.6
17,9.234658386551873,5.3
162,12.319389225147916,11.9
122,9.23974220603845,6.9
101,21.667869588148697,25.4
183,16.698419146041598,18.0


In [23]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=2)

Generating performance metrics using:
Model Name: GLM
Feature Selection: pca


In [24]:
performance_metric


############ result Output ############

       MAE       MSE      MSLE       MAPE       MPE      RMSE     RMSLE        ME        R2        EV       MPD       MGD
0  1.69903  4.566835  0.035048  16.376698 -9.841271  2.137015  0.187211  7.770491  0.835607  0.840585  0.366177  0.037724


## <b><span style='color:#F1A424'>| 3.</span> Loading Deployed Models - 'Advertising_range_models' </b>

### <b><span style='color:#F1A424'>| 3.1.</span> Loading Model</b>

In [25]:
# Creating another AutoML object

obj = AutoML()

In [26]:
# Loading models

models_3 = obj.load('Advertising_range_models')

In [27]:
models_3

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,MAE,MSE,MSLE,MAPE,MPE,RMSE,RMSLE,ME,R2,EV,MPD,MGD,DATA_TABLE,ADJUSTED_R2
0,1,GLM_1,rfe,1.229208,2.311243,0.011298,8.779032,-0.090825,1.520277,0.106292,3.139551,0.911548,0.912892,0.164516,0.01324,ml__sales_rfe_1723390582128520,0.910406
1,2,DECISIONFOREST_1,rfe,1.529687,3.415041,0.017554,11.22477,-2.792731,1.847983,0.132491,3.88,0.869305,0.86933,0.250371,0.020938,ml__sales_rfe_1723390582128520,0.867619
2,3,GLM_2,pca,1.647833,4.197219,0.01933,12.127324,-3.956709,2.048712,0.139032,4.823611,0.839371,0.839389,0.281034,0.02187,ml__sales_pca_1723392367146603,0.836242
3,4,XGBOOST_2,pca,1.594226,4.208329,0.019006,10.414319,2.439263,2.051421,0.137864,5.575339,0.838945,0.846603,0.293122,0.024563,ml__sales_pca_1723392367146603,0.835808
4,5,SVM_2,pca,1.719796,4.371814,0.015838,11.118059,-1.933912,2.090888,0.12585,5.37116,0.832689,0.842874,0.266629,0.018849,ml__sales_pca_1723392367146603,0.82943
5,6,DECISIONFOREST_2,pca,1.728646,5.222329,0.020647,11.6876,0.109286,2.285242,0.143692,5.575,0.800139,0.812442,0.334104,0.02452,ml__sales_pca_1723392367146603,0.796246


### <b><span style='color:#F1A424'>| 3.2.</span> Generating Prediction & Performance Metrics</b>

In [28]:
# Generate prediction using data and model rank

prediction = obj.predict(df.iloc[:80], rank=1)

Generating prediction using:
Model Name: GLM
Feature Selection: rfe
Completed: 100% - 10/10

In [29]:
prediction

id,prediction,sales
80,4.916385332034668,5.3
40,9.409142273866005,6.6
120,9.800748666616622,8.0
17,8.930491003648847,11.0
160,9.813268146069868,9.2
154,9.658644914481744,13.2
162,10.432900383398488,13.7
202,16.100690786447874,16.7
122,14.695747119612433,15.3
19,12.49374848890492,11.9


In [30]:
# Generate performance metrics

performance_metric = obj.evaluate(df.iloc[:80], rank=1)

Generating performance metrics using:
Model Name: GLM
Feature Selection: rfe
Completed: 100% - 10/10

In [31]:
performance_metric


############ result Output ############

        MAE       MSE      MSLE       MAPE       MPE     RMSE    RMSLE        ME        R2       EV       MPD       MGD
0  1.137393  2.916206  0.045912  17.791905 -6.704899  1.70769  0.21427  7.507286  0.666776  0.66831  0.361321  0.051383


In [32]:
remove_context()

True