<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>deploy() and load() of LightGBM OpensourceML module</b>
</header>

<b style = 'font-size:16px;font-family:Arial;color:#E37C4D'>Disclaimer</b>

The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Introduction</b><br>
OpensourceML lets users run open-source libraries inside Teradata Vantage without pulling the data to the client.<br>
Two interface objects are exposed: `td_sklearn`, for running scikit-learn classes and functions, and `td_lightgbm`, for the lightgbm package.

This notebook shows different ways to use the `deploy()` and `load()` methods through the lightgbm OpensourceML interface object, `td_lightgbm`.

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Import the required libraries</b>

In [1]:
# Importing required libraries.
import getpass
from teradataml import *
from teradataml.opensource import td_lightgbm

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Connect to Vantage</b>

In [2]:
td_context = create_context(host=getpass.getpass("Hostname: "),
                            username=getpass.getpass("Username: "),
                            password=getpass.getpass("Password: "))

Hostname:  ········
Username:  ········
Password:  ········


In [3]:
load_example_data("openml", ["multi_model_classification"])



In [4]:
df_train = DataFrame("multi_model_classification")
df_train



col1,col2,col3,col4,label,group_column,partition_column_1,partition_column_2
1.441799942052639,-0.0078630882547522,0.1686760703419149,0.6171640072092826,0,9,1,10
0.6694011171349543,0.9652764024551586,-0.0793567059123118,0.6836973223055627,1,8,1,10
-1.876448117768668,-1.0417196347697408,-0.0483451139882223,-1.2344071410531203,0,8,0,11
0.579671001277728,-0.573365219679928,0.1606028221430908,0.014404427544737,0,9,1,10
-0.6152257788527016,-0.5464720050729892,0.0174961180361759,-0.4887201062539174,0,12,0,10
-1.0238897504244115,1.3437756557705778,-0.3375437364491432,0.1102424913748459,1,10,1,10
1.9626893537873635,1.2250596232346729,0.0285237018026106,1.3466659316222755,0,12,0,10
0.2419971248834491,1.9200179733329916,-0.2843409038906144,0.8911360218393471,1,9,1,11
-0.8105262923234027,-1.134015223960283,0.0904299635362961,-0.813586969448157,0,8,0,10
-2.869764010059474,-1.4565007703551098,-0.0961757576193859,-1.8318347434856384,0,11,1,11


In [5]:
df_x = df_train.select(["col1", "col2", "col3", "col4"])
df_y = df_train.select("label")

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Loading Dataset</b>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Dataset Creation for train()</b>

<b style = 'font-size:20px;font-family:Arial;color:#E37C4D'>Single model case</b>

In [6]:
# Training dataset.
obj_s = td_lightgbm.Dataset(df_x, df_y, silent=True, free_raw_data=False)
obj_s

<lightgbm.basic.Dataset object at 0x7f373a78e760>

<b style = 'font-size:20px;font-family:Arial;color:#E37C4D'>Multi model case</b>

In [7]:
# Training dataset.
obj_m = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=["partition_column_1", "partition_column_2"])
obj_m

   partition_column_1  partition_column_2                                              model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f373a7b09d0>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f3739e0d550>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f3739e50820>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f3739e0b3d0>
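
As a rough illustration of what `partition_columns` implies, the sketch below groups rows by their (`partition_column_1`, `partition_column_2`) pair in plain Python. The ten keys are copied from the `df_train` rows printed earlier. `group_row_ids` is a hypothetical helper, not part of teradataml, and the real partitioning happens inside Vantage.

```python
from collections import defaultdict

# (partition_column_1, partition_column_2) for the ten df_train rows printed earlier.
partition_keys = [
    (1, 10), (1, 10), (0, 11), (1, 10), (0, 10),
    (1, 10), (0, 10), (1, 11), (0, 10), (1, 11),
]


def group_row_ids(keys):
    """Map each distinct partition key to the row positions that carry it."""
    groups = defaultdict(list)
    for row_id, key in enumerate(keys):
        groups[key].append(row_id)
    return dict(groups)


groups = group_row_ids(partition_keys)
for key in sorted(groups):
    print(key, "->", len(groups[key]), "rows")
```

The four distinct pairs line up with the four rows of the table above: one `Dataset` object per partition.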

In [8]:
# Validation dataset.
obj_m_v = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=["partition_column_1", "partition_column_2"])
obj_m_v

   partition_column_1  partition_column_2                                              model
0                   1                  11  <lightgbm.basic.Dataset object at 0x7f3739d3b9a0>
1                   0                  11  <lightgbm.basic.Dataset object at 0x7f3739d3ba00>
2                   1                  10  <lightgbm.basic.Dataset object at 0x7f3739d3b4c0>
3                   0                  10  <lightgbm.basic.Dataset object at 0x7f3739d379d0>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Data for training locally</b>

In [9]:
# Get pandas DataFrame of data.
pdf_x = df_x.to_pandas().reset_index()
pdf_y = df_y.to_pandas()

In [10]:
pdf_x

Unnamed: 0,col1,col2,col3,col4
0,0.579671,-0.573365,0.160603,0.014404
1,0.152618,-3.176931,0.534689,-1.236542
2,-0.810526,-1.134015,0.090430,-0.813587
3,1.619580,0.479071,0.110080,0.893253
4,-0.615226,-0.546472,0.017496,-0.488720
...,...,...,...,...
395,-0.909989,1.105659,-0.285572,0.061649
396,1.152479,-0.585358,0.229059,0.255960
397,-0.704176,-0.737887,0.038317,-0.605455
398,1.221528,0.486295,0.062689,0.724938


In [11]:
pdf_y

Unnamed: 0,label
0,0
1,0
2,0
3,1
4,0
...,...
395,1
396,0
397,0
398,1


<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Deploy and load models trained in Vantage</b>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Single model</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [12]:
# Training with valid_sets argument.
opt_s = td_lightgbm.train(params={}, train_set=obj_s, num_boost_round=30, valid_sets=[obj_s])
opt_s

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 532
[LightGBM] [Info] Number of data points in the train set: 400, number of used features: 4
[1]	valid_0's l2: 0.215811
[2]	valid_0's l2: 0.188138
[3]	valid_0's l2: 0.166146
[4]	valid_0's l2: 0.14767
[5]	valid_0's l2: 0.133023
[6]	valid_0's l2: 0.121225
[7]	valid_0's l2: 0.111342
[8]	valid_0's l2: 0.102441
[9]	valid_0's l2: 0.0954689
[10]	valid_0's l2: 0.0885382
[11]	valid_0's l2: 0.0829116
[12]	valid_0's l2: 0.0781437
[13]	valid_0's l2: 0.0743136
[14]	valid_0's l2: 0.0708161
[15]	valid_0's l2: 0.0673105
[16]	valid_0's l2: 0.0649678
[17]	valid_0's l2: 0.0610153
[18]	valid_0's l2: 0.0578387
[19]	valid_0's l2: 0.055239
[20]	valid_0's l2: 0.0530351
[21]	valid_0's l2: 0.0508195
[22]	valid_0's l2: 0.0492392
[23]	valid_0's l2: 0.0478884
[24]	valid_0's l2: 0.0468161
[25]	valid_0's l2: 0.0457537
[26]	valid_0's l2: 0.0449375
[27]	valid_0's l2: 0.0439461
[28]	valid_0's l2: 0.0432557
[29]	valid_0's l2: 0.04225

<lightgbm.basic.Booster object at 0x7f373a7a07f0>

In [13]:
type(opt_s)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [14]:
opt_s.deploy(model_name="lightgbm_deploy_train_single_model")

Model is saved.


<lightgbm.basic.Booster object at 0x7f373a7a07f0>

In [15]:
# Deploying again with "replace_if_exists" set to True.
opt_s.deploy(model_name="lightgbm_deploy_train_single_model", replace_if_exists=True)

Model is deleted.
Model is saved.


<lightgbm.basic.Booster object at 0x7f373a7a07f0>
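
The two `deploy()` calls above suggest name-keyed storage in which an existing name is overwritten only on request. The sketch below mimics that with an in-memory dictionary. `ToyModelStore` is hypothetical, and raising an error when the name exists and `replace_if_exists` is left at `False` is an assumption about the behaviour, not something this notebook demonstrates.

```python
class ToyModelStore:
    """Hypothetical in-memory stand-in for a table of saved models."""

    def __init__(self):
        self._models = {}

    def deploy(self, model_name, model, replace_if_exists=False):
        if model_name in self._models:
            if not replace_if_exists:
                # Assumed behaviour: refuse to overwrite silently.
                raise ValueError(f"Model '{model_name}' already exists.")
            del self._models[model_name]
            print("Model is deleted.")
        self._models[model_name] = model
        print("Model is saved.")
        return model


store = ToyModelStore()
store.deploy("demo_model", {"version": 1})
store.deploy("demo_model", {"version": 2}, replace_if_exists=True)
```

The second call prints both messages, in the same order as the cell output above.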

In [16]:
opt_s.record_evaluation_result  # Empty because no record_evaluation callback was used.

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [17]:
opt_load = td_lightgbm.load(model_name="lightgbm_deploy_train_single_model")
opt_load

<lightgbm.basic.Booster object at 0x7f373a7b0910>

In [18]:
opt_load.predict(data=df_x, label=df_y)



col1,col2,col3,col4,label,booster_predict_1
1.08721910721962,-1.00616238561834,0.289957882286553,0.0553936469585556,0,0.0145538523418324
-1.0134542207762,0.855764911464957,-0.256919976110177,-0.0853009535407497,1,0.9765026758817096
1.45883918214779,0.627871026643599,0.067203725213036,0.885080711754702,1,0.6711517783816373
1.15227849437784,-0.385591649876342,0.196528275174699,0.337757362367158,0,0.0317751021094841
-0.697767009551012,2.3918078347398,-0.470222449950579,0.680153033261556,1,0.9798849376620596
0.908225366310926,-1.16923260998804,0.295712076085351,-0.0884667969477652,0,0.0159457535568849
1.46768781760664,-0.373959133431616,0.231255161662991,0.478241860805545,0,0.131719683633308
0.599510804787902,1.29890654949678,-0.141761534378328,0.790378168209106,1,0.9620758466678744
1.4139639946904,0.32934226567058,0.110572077573417,0.743405742245793,0,0.5943842107636799
-1.87644811776867,-1.04171963476974,-0.0483451139882224,-1.23440714105312,0,-0.0109609389371834
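
With `params={}` LightGBM uses its default regression objective (the training log reports the `l2` metric), so `booster_predict_1` is a continuous score and can fall slightly outside [0, 1]. A minimal sketch of turning the ten scores above into class labels follows. The 0.5 cut-off and the `to_class` helper are assumptions made for illustration.

```python
# (label, booster_predict_1) pairs copied from the ten output rows above.
scored = [
    (0, 0.0145538523418324), (1, 0.9765026758817096), (1, 0.6711517783816373),
    (0, 0.0317751021094841), (1, 0.9798849376620596), (0, 0.0159457535568849),
    (0, 0.131719683633308), (1, 0.9620758466678744), (0, 0.5943842107636799),
    (0, -0.0109609389371834),
]
THRESHOLD = 0.5  # assumed cut-off, not something the notebook prescribes


def to_class(score, threshold=THRESHOLD):
    """Turn a continuous score into a 0/1 label."""
    return 1 if score >= threshold else 0


hits = sum(to_class(score) == label for label, score in scored)
accuracy = hits / len(scored)
print(f"{hits} of {len(scored)} rows match their label")
```

Setting `objective="binary"` in `params` would be the more direct route to probabilities; the threshold here only helps to read the regression scores.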


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Single model with record_evaluation callback</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [19]:
rec = {}

In [20]:
# Training with valid_sets and callbacks arguments.
opt1 = td_lightgbm.train(params={}, train_set=obj_s, num_boost_round=30,
                         callbacks=[td_lightgbm.record_evaluation(rec), td_lightgbm.early_stopping(3)],
                         valid_sets=[obj_s])
opt1

You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 532
[LightGBM] [Info] Number of data points in the train set: 400, number of used features: 4
Training until validation scores don't improve for 3 rounds
Did not meet early stopping. Best iteration is:
[30]	valid_0's l2: 0.0416953



<lightgbm.basic.Booster object at 0x7f37384b2b20>
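
The log line "Did not meet early stopping" means the validation metric kept improving within every window of 3 rounds. The sketch below restates that rule in plain Python for a lower-is-better metric. `early_stopping_round` is a hypothetical helper written for this note, not LightGBM's implementation, and the `plateau` history is invented to show the stopping case.

```python
def early_stopping_round(scores, stopping_rounds):
    """Return (best_round, stopped_early) for a lower-is-better metric.

    Rounds are 1-based, matching the training log.
    """
    best_round, best_score = 0, float("inf")
    for round_no, score in enumerate(scores, start=1):
        if score < best_score:
            best_round, best_score = round_no, score
        elif round_no - best_round >= stopping_rounds:
            return best_round, True
    return best_round, False


# First ten l2 values printed by the earlier training run: always improving.
improving = [0.215811, 0.188138, 0.166146, 0.14767, 0.133023,
             0.121225, 0.111342, 0.102441, 0.0954689, 0.0885382]
# Hypothetical history that stops improving after round 3.
plateau = [0.30, 0.25, 0.20, 0.21, 0.22, 0.23, 0.19]

print(early_stopping_round(improving, 3))
print(early_stopping_round(plateau, 3))
```

For the improving history the best round is simply the last one, which mirrors the "Best iteration is: [30]" message above.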

In [21]:
type(opt1)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

In [22]:
opt1.record_evaluation_result

{'valid_0': OrderedDict([('l2',
               [0.21581071275252506,
                0.18813848372931538,
                0.16614597654803753,
                0.14767044160631687,
                0.13302313884161535,
                0.12122466019783522,
                0.11134224788641987,
                0.10244106031111,
                0.09546885423861737,
                0.08853818576366684,
                0.08291158235734035,
                0.07814370620464588,
                0.07431355341477688,
                0.0708161265895864,
                0.06731046317041568,
                0.06496776380958444,
                0.06101534989219161,
                0.05783874072873213,
                0.05523896129206842,
                0.05303513329305773,
                0.050819538604252125,
                0.0492392067741575,
                0.047888389710407535,
                0.046816146687603685,
                0.045753706406899186,
                0.04493752813972262,
       

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [23]:
opt1.deploy(model_name="lightgbm_deploy_train_single_model_with_callback")

Model is saved.


<lightgbm.basic.Booster object at 0x7f37384b2b20>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [25]:
opt1_load = td_lightgbm.load(model_name="lightgbm_deploy_train_single_model_with_callback")
opt1_load

<lightgbm.basic.Booster object at 0x7f37383c7b50>

In [26]:
opt1_load.record_evaluation_result  # Empty: deploy() saves only the underlying model object, not wrapper attributes.
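
One way to picture why the evaluation history is missing after `load()`: if only the wrapped model is serialized, anything stored on the wrapper is lost in the round trip. The sketch below shows this with `pickle` and a hypothetical `ToyWrapper`. It is an assumption-laden illustration and does not describe how teradataml stores models.

```python
import pickle


class ToyWrapper:
    """Hypothetical wrapper: a model plus client-side extras."""

    def __init__(self, model, record_evaluation_result=None):
        self.model = model
        self.record_evaluation_result = record_evaluation_result


def toy_deploy(wrapper):
    # Only the wrapped model is serialized; wrapper attributes are left out.
    return pickle.dumps(wrapper.model)


def toy_load(blob):
    # A fresh wrapper is built around the restored model.
    return ToyWrapper(pickle.loads(blob))


trained = ToyWrapper({"trees": 30},
                     record_evaluation_result={"valid_0": {"l2": [0.2158, 0.1881]}})
restored = toy_load(toy_deploy(trained))
print(restored.model)
print(restored.record_evaluation_result)
```

If the history matters later, it has to be kept separately before deploying, for example by holding on to the `rec` dictionary passed to the callback.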

In [27]:
opt1_load.predict(df_x)



col1,col2,col3,col4,booster_predict_1
1.08721910721962,-1.00616238561834,0.289957882286553,0.0553936469585556,0.0145538523418324
-1.0134542207762,0.855764911464957,-0.256919976110177,-0.0853009535407497,0.9765026758817096
1.45883918214779,0.627871026643599,0.067203725213036,0.885080711754702,0.6711517783816373
1.15227849437784,-0.385591649876342,0.196528275174699,0.337757362367158,0.0317751021094841
-0.697767009551012,2.3918078347398,-0.470222449950579,0.680153033261556,0.9798849376620596
0.908225366310926,-1.16923260998804,0.295712076085351,-0.0884667969477652,0.0159457535568849
1.46768781760664,-0.373959133431616,0.231255161662991,0.478241860805545,0.131719683633308
0.599510804787902,1.29890654949678,-0.141761534378328,0.790378168209106,0.9620758466678744
1.4139639946904,0.32934226567058,0.110572077573417,0.743405742245793,0.5943842107636799
-1.87644811776867,-1.04171963476974,-0.0483451139882224,-1.23440714105312,-0.0109609389371834


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Create Booster object from arguments</b>

In [28]:
opt_s.save_model("model_file_one_partition")

<lightgbm.basic.Booster object at 0x7f373a7a07f0>

In [29]:
direct_obj = td_lightgbm.Booster(model_file="model_file_one_partition")
direct_obj

<lightgbm.basic.Booster object at 0x7f373839a6a0>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [30]:
direct_obj.deploy(model_name="lightgbm_deploy_booster_from_outside", replace_if_exists=True)

Model is saved.


<lightgbm.basic.Booster object at 0x7f373839a6a0>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [31]:
obj_d = td_lightgbm.load("lightgbm_deploy_booster_from_outside")
obj_d

<lightgbm.basic.Booster object at 0x7f3738135340>

In [32]:
type(obj_d)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

In [33]:
obj_d.current_iteration()

30

In [34]:
obj_d.predict(df_x, label=df_y)



col1,col2,col3,col4,label,booster_predict_1
1.08721910721962,-1.00616238561834,0.289957882286553,0.0553936469585556,0,0.0145538523418324
-1.0134542207762,0.855764911464957,-0.256919976110177,-0.0853009535407497,1,0.9765026758817096
1.45883918214779,0.627871026643599,0.067203725213036,0.885080711754702,1,0.6711517783816373
1.15227849437784,-0.385591649876342,0.196528275174699,0.337757362367158,0,0.0317751021094841
-0.697767009551012,2.3918078347398,-0.470222449950579,0.680153033261556,1,0.9798849376620596
0.908225366310926,-1.16923260998804,0.295712076085351,-0.0884667969477652,0,0.0159457535568849
1.46768781760664,-0.373959133431616,0.231255161662991,0.478241860805545,0,0.131719683633308
0.599510804787902,1.29890654949678,-0.141761534378328,0.790378168209106,1,0.9620758466678744
1.4139639946904,0.32934226567058,0.110572077573417,0.743405742245793,0,0.5943842107636799
-1.87644811776867,-1.04171963476974,-0.0483451139882224,-1.23440714105312,0,-0.0109609389371834


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Multi model</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [35]:
# Training with valid_sets argument.
opt_m = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30, early_stopping_rounds=50,
                          valid_sets=[obj_m_v, obj_m_v])
opt_m

   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f37382766a0>   
1  <lightgbm.basic.Booster object at 0x7f37384b2f10>   
2  <lightgbm.basic.Booster object at 0x7f3733f909a0>   
3  <lightgbm.basic.Booster object at 0x7f379f29fbb0>   

                                      console_output  

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [36]:
opt_m_deploy = opt_m.deploy(model_name="lightgbm_deploy_train_multi_model")

Model is saved.


In [37]:
opt_m_deploy

   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f37382766a0>   
1  <lightgbm.basic.Booster object at 0x7f37384b2f10>   
2  <lightgbm.basic.Booster object at 0x7f3733f909a0>   
3  <lightgbm.basic.Booster object at 0x7f379f29fbb0>   

                                      console_output  

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [38]:
opt_m_load = td_lightgbm.load("lightgbm_deploy_train_multi_model")
opt_m_load

   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f373b1a9400>   
1  <lightgbm.basic.Booster object at 0x7f373816d280>   
2  <lightgbm.basic.Booster object at 0x7f37380a4f70>   
3  <lightgbm.basic.Booster object at 0x7f3738170cd0>   

                                      console_output  

In [39]:
opt_m_load.predict(df_x)



partition_column_1,partition_column_2,col1,col2,col3,col4,booster_predict_1
0,10,1.18175014373863,-0.378409839573248,0.198781337335136,0.353382411544652,0.1011684726884329
0,10,1.61958006284022,0.479070865675633,0.11007983461946,0.893252734157523,0.6407137464225
0,10,-1.70817782377046,-1.42574876314059,0.0336831120597743,-1.31941475567603,0.0019317029209896
0,10,2.08263134345229,0.884105923622972,0.0979313882125954,1.25851975116246,0.6228217884661379
0,10,1.09867539212952,-0.74898711895134,0.249438812683104,0.165738276697588,0.1422149693184607
0,10,-0.612469017951351,-0.547252280586996,0.0179431561213576,-0.487853741151895,0.0936486934603819
0,10,-1.02578986110877,-0.247690669360584,-0.0787909605284584,-0.542910978510773,0.3801174435875071
0,10,-1.0839778730759,1.87924596826784,-0.431655204285638,0.30387458867114,1.0031885249492265
0,10,0.423752960687558,-2.22341274927069,0.411006436528286,-0.729033087599375,0.0178897040461391
0,10,1.06980588152946,0.658572465011404,0.0170400661725024,0.730269333455773,0.776822066928576


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Multi model with record_evaluation callback</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [40]:
rec = {}

In [41]:
# Training with valid_sets and callbacks argument.
opt_m_r = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30, callbacks=[td_lightgbm.record_evaluation(rec)],
                            valid_sets=[obj_m_v, obj_m_v])
opt_m_r

   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f373816da90>   
1  <lightgbm.basic.Booster object at 0x7f373814cbb0>   
2  <lightgbm.basic.Booster object at 0x7f3733f23be0>   
3  <lightgbm.basic.Booster object at 0x7f37382e32b0>   

                                      console_output  \

                            record_evaluation_result  
0  {'valid_0': {'l2': [0.2196373768349858, 0.1965...  
1  {'valid_0': {'l2': [0.2229904865477987, 0.2008...  
2  {'valid_0': {'l2': [0.21514138095238078, 0.191...  
3  {'valid_0': {'l2': [0.2195184911242605, 0.1948...  

In [42]:
type(opt_m_r)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [43]:
opt_m_r_deploy = opt_m_r.deploy("lightgbm_deploy_train_multi_model_with_record_eval")

Model is saved.


In [44]:
opt_m_r_deploy

   partition_column_1  partition_column_2  \
0                   1                  11   
1                   0                  11   
2                   1                  10   
3                   0                  10   

                                               model  \
0  <lightgbm.basic.Booster object at 0x7f373816da90>   
1  <lightgbm.basic.Booster object at 0x7f373814cbb0>   
2  <lightgbm.basic.Booster object at 0x7f3733f23be0>   
3  <lightgbm.basic.Booster object at 0x7f37382e32b0>   

                                      console_output  \

                            record_evaluation_result  
0  {'valid_0': {'l2': [0.2196373768349858, 0.1965...  
1  {'valid_0': {'l2': [0.2229904865477987, 0.2008...  
2  {'valid_0': {'l2': [0.21514138095238078, 0.191...  
3  {'valid_0': {'l2': [0.2195184911242605, 0.1948...  

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [45]:
opt_m_r_load = td_lightgbm.load("lightgbm_deploy_train_multi_model_with_record_eval")

In [46]:
opt_m_r_load.predict(df_x, label=df_y)



partition_column_1,partition_column_2,col1,col2,col3,col4,label,booster_predict_1
0,10,0.773845635832102,-1.58675295942853,0.348051787321054,-0.317428554253059,0,-0.0162995111053354
0,10,-0.652532094184103,1.32943072895083,-0.292093897599532,0.264152749130696,1,1.0361952652995408
0,10,-1.23692438918288,-1.18018296015538,0.0484367935117144,-1.0159842919928,0,0.0019317029209896
0,10,0.920031063974712,-1.20648254637488,0.303144290095821,-0.0986555691105928,0,-0.0162995111053354
0,10,1.25253683301775,0.831584626905822,0.0101019573268889,0.879813010292186,1,0.7062334538356531
0,10,-0.246722969060997,1.63296730945332,-0.294371830900959,0.56318540668509,1,0.977749108989418
0,10,-0.733263307298235,1.97737051024738,-0.406903816763842,0.495003202510811,1,1.0361952652995408
0,10,-1.29470959152506,-0.0474693099496212,-0.142594481916762,-0.576553550048223,0,0.412931851324878
0,10,-1.29000931602396,2.07409376142361,-0.487282819547692,0.2950893945149,1,0.9312513457625944
0,10,1.81742458095479,0.547362905756978,0.121937163355842,1.00637520272149,0,0.6228217884661379


<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Deploy and load sklearn models trained in Vantage</b>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Single model</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [47]:
obj_skl_s = td_lightgbm.LGBMModel(num_leaves=15, objective="binary", n_estimators=10)
obj_skl_s

In [48]:
obj_skl_s.fit(df_x, df_y, callbacks=[td_lightgbm.log_evaluation()])

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [49]:
obj_skl_deploy_s = obj_skl_s.deploy(model_name="lightgbm_sklearn_single_model")
obj_skl_deploy_s

Model is saved.


<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [50]:
obj_skl_load_model_s = td_lightgbm.load("lightgbm_sklearn_single_model")
obj_skl_load_model_s

In [51]:
obj_skl_load_model_s.predict(df_x, pred_leaf=True)



col1,col2,col3,col4,lgbmmodel_predict_1,lgbmmodel_predict_2,lgbmmodel_predict_3,lgbmmodel_predict_4,lgbmmodel_predict_5,lgbmmodel_predict_6,lgbmmodel_predict_7,lgbmmodel_predict_8,lgbmmodel_predict_9,lgbmmodel_predict_10
1.08721910721962,-1.00616238561834,0.289957882286553,0.0553936469585556,9,10,10,9,12,9,14,10,14,9
-1.0134542207762,0.855764911464957,-0.256919976110177,-0.0853009535407497,8,8,8,10,11,7,11,7,12,12
1.45883918214779,0.627871026643599,0.067203725213036,0.885080711754702,6,6,6,3,6,3,6,3,7,6
1.15227849437784,-0.385591649876342,0.196528275174699,0.337757362367158,9,5,5,6,9,6,9,6,8,8
-0.697767009551012,2.3918078347398,-0.470222449950579,0.680153033261556,8,13,12,14,13,14,13,14,12,14
0.908225366310926,-1.16923260998804,0.295712076085351,-0.0884667969477652,9,11,13,9,12,9,14,10,14,9
1.46768781760664,-0.373959133431616,0.231255161662991,0.478241860805545,5,5,5,6,5,6,5,6,3,3
0.599510804787902,1.29890654949678,-0.141761534378328,0.790378168209106,10,13,14,14,14,14,13,14,12,14
1.4139639946904,0.32934226567058,0.110572077573417,0.743405742245793,2,2,2,3,2,3,2,3,7,6
-1.87644811776867,-1.04171963476974,-0.0483451139882224,-1.23440714105312,0,0,0,0,0,0,0,0,0,0
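
With `pred_leaf=True`, LightGBM returns one leaf index per tree instead of a score, which is why the output has ten `lgbmmodel_predict_*` columns for `n_estimators=10`. The small check below reads three of the printed rows that way. `check_leaf_row` is a hypothetical helper, and the values are copied from the output above.

```python
# Leaf-index columns copied from three rows of the pred_leaf output above.
leaf_rows = [
    [9, 10, 10, 9, 12, 9, 14, 10, 14, 9],
    [8, 13, 12, 14, 13, 14, 13, 14, 12, 14],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
N_ESTIMATORS = 10  # n_estimators passed to LGBMModel above
NUM_LEAVES = 15    # num_leaves passed to LGBMModel above


def check_leaf_row(row, n_trees, num_leaves):
    """True when the row has one in-range leaf index per tree."""
    return len(row) == n_trees and all(0 <= leaf < num_leaves for leaf in row)


print([check_leaf_row(row, N_ESTIMATORS, NUM_LEAVES) for row in leaf_rows])
```

Leaf indices are zero-based, so with `num_leaves=15` the largest possible value is 14.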


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Multi model</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [52]:
obj_skl_m = td_lightgbm.LGBMModel(num_leaves=15, objective="binary", n_estimators=5)
obj_skl_m

In [53]:
obj_skl_m.fit(df_x, df_y, sample_weight=df_train.select("group_column"),
              partition_columns=["partition_column_1", "partition_column_2"],
              callbacks=[td_lightgbm.log_evaluation()])

   partition_column_1  partition_column_2                                                         model
0                   1                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
1                   0                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
2                   1                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
3                   0                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [54]:
obj_skl_m.deploy(model_name="lightgbm_sklearn_multi_model")

Model is saved.


   partition_column_1  partition_column_2                                                         model
0                   1                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
1                   0                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
2                   1                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
3                   0                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [55]:
obj_skl_load_model_m = td_lightgbm.load("lightgbm_sklearn_multi_model")

In [56]:
obj_skl_load_model_m

   partition_column_1  partition_column_2                                                         model
0                   1                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
1                   0                  11  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
2                   1                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')
3                   0                  10  LGBMModel(n_estimators=5, num_leaves=15, objective='binary')

In [57]:
obj_skl_load_model_m.predict(df_x, raw_score=True, pred_leaf=True)



partition_column_1,partition_column_2,col1,col2,col3,col4,lgbmmodel_predict_1,lgbmmodel_predict_2,lgbmmodel_predict_3,lgbmmodel_predict_4,lgbmmodel_predict_5
1,10,-0.81561368659203,-1.23797225905657,0.106755845337771,-0.858387592275172,0,0,1,0,2
1,10,1.22293214360004,-0.0499157426583425,0.150108049327627,0.505750927326388,2,2,2,2,2
1,10,-1.41682282355336,-1.10436212447853,0.0152120983410702,-1.0623134937516,0,0,1,0,0
1,10,1.68842312661873,1.25645955102489,-0.0084288787356851,1.24152358424494,2,1,2,2,1
1,10,1.05259185573158,1.19821169185417,-0.0727720048714777,0.944058397832405,2,1,0,2,1
1,10,0.978831185266201,-1.26545290822398,0.319567156801863,-0.0975263932260341,0,0,3,0,2
1,10,-1.72182210533619,-0.849435337879736,-0.0616823584684557,-1.08905706675667,3,2,0,3,0
1,10,-1.47190825008191,-0.0291948925884957,-0.166141412269092,-0.64530912851927,1,2,0,1,0
1,10,0.191240730419773,1.97100133015377,-0.298530178457708,0.890194110648821,1,1,0,1,1
1,10,1.13813152232889,-0.257627247407291,0.174062606457366,0.384122223783585,2,2,2,2,2


In [58]:
obj_skl_load_model_m.feature_importances_

Unnamed: 0,partition_column_1,partition_column_2,feature_importances_
0,1,11,"[3, 11, 1, 0]"
1,0,11,"[0, 5, 7, 0]"
2,1,10,"[2, 7, 2, 2]"
3,0,10,"[4, 10, 0, 1]"
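
Each partition carries its own importance vector. With LightGBM's default `importance_type="split"`, the numbers count how often a feature is used to split. The sketch below picks the most-used feature per partition from the values printed above. It assumes the vector follows the column order selected in `df_x`, and `top_feature` is a hypothetical helper.

```python
FEATURES = ["col1", "col2", "col3", "col4"]  # assumed to follow the order selected in df_x

# (partition_column_1, partition_column_2) -> feature_importances_ as printed above.
importances = {
    (1, 11): [3, 11, 1, 0],
    (0, 11): [0, 5, 7, 0],
    (1, 10): [2, 7, 2, 2],
    (0, 10): [4, 10, 0, 1],
}


def top_feature(values, names):
    """Name of the feature with the largest importance (first one wins ties)."""
    best_index = max(range(len(values)), key=values.__getitem__)
    return names[best_index]


top_by_partition = {key: top_feature(vals, FEATURES) for key, vals in importances.items()}
for key, name in top_by_partition.items():
    print(key, name)
```

Because every partition is fitted independently, the ranking can differ from one partition to the next, as it does for partition (0, 11) here.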


<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Deploy and load models trained outside Vantage</b>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Train using train() method</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train</b>

In [59]:
from lightgbm import Dataset, train

In [60]:
pdf_dataset = Dataset(params={}, data=pdf_x, label=pdf_y)
pdf_dataset

<lightgbm.basic.Dataset at 0x7f3739d3b730>

In [61]:
local_train = train(train_set=pdf_dataset, params={})
local_train

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 532
[LightGBM] [Info] Number of data points in the train set: 400, number of used features: 4
[LightGBM] [Info] Start training from score 0.507500


<lightgbm.basic.Booster at 0x7f3733e818e0>

In [62]:
type(local_train)

lightgbm.basic.Booster

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [63]:
opt_outside = td_lightgbm.deploy(model_name="model_trained_outside_vantage_using_train", model=local_train)

Model is saved.


In [64]:
opt_outside

<lightgbm.basic.Booster object at 0x7f3733e818e0>

In [65]:
type(opt_outside)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [66]:
opt_load_outside = td_lightgbm.load("model_trained_outside_vantage_using_train")

In [67]:
opt_load_outside

<lightgbm.basic.Booster object at 0x7f373847e4f0>

In [68]:
type(opt_load_outside)

teradataml.opensource._lightgbm._LightgbmBoosterWrapper

In [69]:
# Predict on data residing in Vantage.
opt_load_outside.predict(data=df_x, label=df_y, pred_contrib=True)



col1,col2,col3,col4,label,booster_predict_1,booster_predict_2,booster_predict_3,booster_predict_4,booster_predict_5
1.08721910721962,-1.00616238561834,0.289957882286553,0.0553936469585556,0,-0.0083357981072754,-0.2211251701283529,-0.2011344032817137,-0.076055015103008,0.5075000001219884
-1.0134542207762,0.855764911464957,-0.256919976110177,-0.0853009535407497,1,0.0611091609636113,0.2183360575866612,0.2117914894944817,-0.0023573820783368,0.5075000001219884
1.45883918214779,0.627871026643599,0.067203725213036,0.885080711754702,1,0.004680991076712,0.195964350951066,-0.2891789963857054,0.3281383071163784,0.5075000001219884
1.15227849437784,-0.385591649876342,0.196528275174699,0.337757362367158,0,-0.0204217069661268,-0.174783838251349,-0.2195245446526588,-0.0701539299104305,0.5075000001219884
-0.697767009551012,2.3918078347398,-0.470222449950579,0.680153033261556,1,0.0222485833753585,0.2102123731329647,0.1876546274090518,0.1099110524488352,0.5075000001219884
0.908225366310926,-1.16923260998804,0.295712076085351,-0.0884667969477652,0,-0.0248906989080486,-0.2212331486407557,-0.2009822688984563,-0.0746360248228239,0.5075000001219884
1.46768781760664,-0.373959133431616,0.231255161662991,0.478241860805545,0,0.056130720716364,-0.1172683053688842,-0.225904690187823,-0.1407659689570494,0.5075000001219884
0.599510804787902,1.29890654949678,-0.141761534378328,0.790378168209106,1,0.0396398851455689,0.2112305882550562,0.1106550192611941,0.1014999576849528,0.5075000001219884
1.4139639946904,0.32934226567058,0.110572077573417,0.743405742245793,0,0.000844129925888,0.1868767028476348,-0.3211854595966354,0.1626545513743378,0.5075000001219884
-1.87644811776867,-1.04171963476974,-0.0483451139882224,-1.23440714105312,0,-0.1644513511900406,-0.3497730283830795,0.1037710946293369,-0.1679712097347922,0.5075000001219884
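
With `pred_contrib=True`, LightGBM returns one contribution per feature plus a final base-value column, and the row sum gives the raw prediction. Here the last column, about 0.5075, matches the "Start training from score 0.507500" line in the local training log. The sketch below adds up the first two printed rows. `split_contributions` is a hypothetical helper.

```python
# Columns booster_predict_1..5 for the first two output rows above.
rows = [
    [-0.0083357981072754, -0.2211251701283529, -0.2011344032817137,
     -0.076055015103008, 0.5075000001219884],
    [0.0611091609636113, 0.2183360575866612, 0.2117914894944817,
     -0.0023573820783368, 0.5075000001219884],
]


def split_contributions(row):
    """Split a pred_contrib row into feature contributions and the base value."""
    *feature_contribs, base_value = row
    return feature_contribs, base_value


predictions = []
for row in rows:
    feature_contribs, base_value = split_contributions(row)
    predictions.append(base_value + sum(feature_contribs))

print([round(p, 4) for p in predictions])
```

The first row (label 0) sums to a value near 0 and the second (label 1) to a value near 1, so the contributions show which features pushed each prediction away from the base value.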


<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Train using sklearn classes</b>

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Train locally</b>

In [70]:
from lightgbm import LGBMClassifier
local_obj = LGBMClassifier(num_leaves=5, objective="binary", n_estimators=10, learning_rate=0.01)
local_obj

In [71]:
local_obj.fit(pdf_x, pdf_y)

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


In [72]:
type(local_obj)

lightgbm.sklearn.LGBMClassifier

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Deploy</b>

In [73]:
skl_deploy = td_lightgbm.deploy(model_name="skl_model_trained_outside_vantage", model=local_obj)

Model is saved.


In [74]:
skl_deploy

In [75]:
type(skl_deploy)

teradataml.opensource._lightgbm._LightgbmSklearnWrapper

<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>Load</b>

In [76]:
skl_load = td_lightgbm.load("skl_model_trained_outside_vantage")
skl_load

In [77]:
# Predict on data residing in Vantage.
skl_load.predict(df_x)



col1,col2,col3,col4,lgbmclassifier_predict_1
0.140351620495199,-0.667007351250991,0.124834790512454,-0.213012319733515,0
-1.21731323571577,-0.750997387220421,-0.0191260717151705,-0.83162437457057,1
1.13813152232889,-0.257627247407291,0.174062606457366,0.384122223783585,0
-0.440338894029492,2.29067598200844,-0.42387759957544,0.749467322701779,1
0.957701697830466,-1.4105675310624,0.340727932762679,-0.166100047921912,0
0.191240730419773,1.97100133015377,-0.298530178457708,0.890194110648821,1
-1.09939360645383,0.871762881447565,-0.269501041371075,-0.115722005478974,1
-1.32949774623385,-1.509881753279,0.0913394535664962,-1.19095961550441,0
1.01425780621963,-1.06308839279554,0.290750251760953,0.0006656341323431,0
0.241997124883449,1.92001797333299,-0.284340903890614,0.891136021839347,1


<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Remove connection</b>

In [78]:
remove_context()

True