## Import libraries

In [1]:
! pip install -U lightautoml

Collecting lightautoml
  Downloading LightAutoML-0.2.14-py3-none-any.whl (250 kB)
Collecting log-calls
  Downloading log_calls-0.3.2.tar.gz (232 kB)
Collecting importlib-metadata<2.0,>=1.0
  Downloading importlib_metadata-1.7.0-py2.py3-none-any.whl (31 kB)
Collecting autowoe>=1.2
  Downloading AutoWoE-1.2.5-py3-none-any.whl (204 kB)
Collecting lightgbm<3.0,>=2.3
  Downloading lightgbm-2.3.1-py2.py3-none-manylinux1_x86_64.whl (1.2 MB)
Collecting json2html
  Downloading json2html-1.3.0.tar.gz (7.0 kB)
Collecting efficientnet-pytorch
  Downloading efficientnet_pytorch-0.7.1.tar.gz (21 kB)
Collecting poetry-core<2.0.0,>=1.0.0
  Downloading poetry_core-1.0.3-py2.py3-none-any.whl (424 kB)

In [2]:
import gc
import pickle
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

from lightautoml.tasks import Task
from lightautoml.automl.presets.tabular_presets import TabularUtilizedAutoML

## Prepare data for model training

In [3]:
# Load the preprocessed train/test frames from the pickled dataset
with open("../input/tps-may-data-preprocess/TPS_May_Dataset_w_Quantile.txt", 'rb') as handle:
    processed_data = pickle.load(handle)

train_df = processed_data['train_df']
test_df = processed_data['test_df']

Ytrain_oh = pd.get_dummies(train_df['target']).values

del processed_data
gc.collect()

0
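
The one-hot step above relies on `pd.get_dummies` ordering its columns by sorted label, which is what lets the resulting matrix line up with per-class probability columns later. A minimal sketch on a toy series follows; the label strings `Class_1` to `Class_4` are an assumption taken from the submission column names, not read from the actual dataset.

```python
import pandas as pd

# Toy target column (hypothetical values, deliberately unsorted)
target = pd.Series(["Class_2", "Class_1", "Class_4", "Class_2", "Class_3"])

# get_dummies creates one indicator column per distinct label,
# with columns ordered by sorted label
dummies = pd.get_dummies(target)
columns = list(dummies.columns)

# Cast to int so the result is 0/1 regardless of the pandas default dtype
onehot = dummies.to_numpy().astype(int)
```

Each row of `onehot` contains exactly one 1, in the column of that row's label.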

## Build and validate the model

In [4]:
FOLD = 5
N_THREADS = 4
TIMEOUT = 60 * 60 * 3

model = TabularUtilizedAutoML(
    task = Task('multiclass'),
    timeout = TIMEOUT,
    cpu_limit = N_THREADS,
    reader_params = {'n_jobs': N_THREADS, 'cv': FOLD},
)

y_pred_meta_lama = model.fit_predict(train_df, roles={'target':'target'})
print("\n\ny_pred_meta_lama: {}".format(y_pred_meta_lama.shape))

Current random state: {'reader_params': {'random_state': 42}, 'general_params': {'return_all_predictions': False}}
Found reader_params in kwargs, need to combine
Merged variant for reader_params = {'n_jobs': 4, 'cv': 5, 'random_state': 42}
Start automl preset with listed constraints:
- time: 10799.99677824974 seconds
- cpus: 4 cores
- memory: 16 gb

Train data shape: (99918, 952)
Feats was rejected during automatic roles guess: []


Layer 1 ...
Train process start. Time left 10610.084599256516 secs
Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====

Linear model: C = 1e-05 score = -1.106124028817022
Linear model: C = 5e-05 score = -1.1008103072070616
Linear model: C = 0.0001 score = -1.0997530292895576
Linear model: C = 0.0005 score = -1.0993006878299079
Linear model: C = 0.001 score = -1.0992947520913363
Linear model: C = 0.005 score = -1.0993006078286276
Linear model: C = 0.01 score = -1.099296344387486

===== Start wo

Time limit exceeded after calculating fold 2


Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed
Optuna may run 1 secs


Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer


Start fitting Lvl_0_Pipe_1_Mod_1_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 200 rounds
[100]	valid's multi_logloss: 1.0976
[200]	valid's multi_logloss: 1.09582
[300]	valid's multi_logloss: 1.09756
Early stopping, best iteration is:
[176]	valid's multi_logloss: 1.09568
Lvl_0_Pipe_1_Mod_1_LightGBM fitting and predicting completed
Start fitting Lvl_0_Pipe_1_Mod_1_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.09663
[200]	valid's multi_logloss: 1.09971
Early stopping, best iteration is:
[105]	valid's multi_logloss: 1.09652

===== Start working with fold 1 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.09674
Early stopping, best iteration is:
[87]	valid's multi_logloss: 1.0967

===== Sta

Time limit exceeded after calculating fold 3


Stopped by overfitting detector  (100 iterations wait)

bestTest = 1.093529791
bestIteration = 1178

Shrink model to first 1179 iterations.
Lvl_0_Pipe_1_Mod_2_CatBoost fitting and predicting completed
Optuna may run 1 secs
Start fitting Lvl_0_Pipe_1_Mod_3_CatBoost ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_3_CatBoost =====

0:	learn: 1.3684079	test: 1.3684118	best: 1.3684118 (0)	total: 246ms	remaining: 12m 19s
100:	learn: 1.1115094	test: 1.1119702	best: 1.1119702 (100)	total: 26.1s	remaining: 12m 29s
200:	learn: 1.1043802	test: 1.1062400	best: 1.1062400 (200)	total: 55.4s	remaining: 12m 51s
300:	learn: 1.1003718	test: 1.1032463	best: 1.1032463 (300)	total: 1m 23s	remaining: 12m 26s
400:	learn: 1.0964332	test: 1.1004847	best: 1.1004847 (400)	total: 1m 52s	remaining: 12m 7s
500:	learn: 1.0921653	test: 1.0977054	best: 1.0977054 (500)	total: 2m 20s	remaining: 11m 42s
600:	learn: 1.0889103	test: 1.0958915	best: 1.0958915 (600)	total: 2m 48s	remaining: 11m 12s
700:	learn: 1.08

Time limit exceeded in one of the tasks. AutoML will blend level 1 models.                                         
Try to set higher time limits or use Profiler to find bottleneck and optimize Pipelines settings


Blending: Optimization starts with equal weights and score -1.0926703056180176
Blending, iter 0: score = -1.0916697267856714, weights = [0.         0.17624637 0.         0.07115648 0.75259715]
Blending, iter 1: score = -1.0916531301441152, weights = [0.         0.18385096 0.05221703 0.         0.76393205]
Blending, iter 2: score = -1.0916517884611174, weights = [0.         0.15780756 0.05108717 0.         0.7911053 ]
Blending, iter 3: score = -1.0916517842224336, weights = [0.         0.1578078  0.0510859  0.         0.79110634]
Blending, iter 4: score = -1.0916517842224336, weights = [0.         0.1578078  0.0510859  0.         0.79110634]
No score update. Terminated

Automl preset training completed in 7501.04 seconds.


y_pred_meta_lama: (99918, 4)
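
The blending log above ends with one weight per level 1 model, two of which are zero. As a rough illustration of what such weights mean, the sketch below forms a weighted average of per-model probability matrices. The per-model predictions are random stand-ins, and this is not LightAutoML's internal blender code; only the weight values are copied from the last log line.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_classes = 6, 4

# Hypothetical per-model class probabilities (each row sums to 1)
model_preds = [rng.dirichlet(np.ones(n_classes), size=n_rows) for _ in range(5)]

# Final weights from the blending log, renormalised to guard against rounding
weights = np.array([0.0, 0.1578078, 0.0510859, 0.0, 0.79110634])
weights = weights / weights.sum()

# Weighted average of the model outputs; zero-weight models drop out
blend = sum(w * p for w, p in zip(weights, model_preds))
```

Because the weights are non-negative and sum to one, every row of `blend` is still a valid probability distribution.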


In [5]:
oof_score = log_loss(Ytrain_oh, y_pred_meta_lama.data)
print("Aggregate OOF Score: {}".format(oof_score))

Aggregate OOF Score: 1.0916515056343399
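
For context on the score above: multiclass log loss is the mean negative log of the probability assigned to the true class. The sketch below computes it by hand with NumPy on made-up data; the function name and the toy arrays are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def multiclass_log_loss(y_onehot, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    proba = np.clip(proba, eps, 1 - eps)              # avoid log(0)
    proba = proba / proba.sum(axis=1, keepdims=True)  # rows sum to 1
    return float(-(y_onehot * np.log(proba)).sum(axis=1).mean())

# Toy data (hypothetical): 3 rows, 4 classes, uniform predictions
y_toy = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 1]])
p_toy = np.full((3, 4), 0.25)

uniform_loss = multiclass_log_loss(y_toy, p_toy)
```

A uniform prediction over four classes scores ln 4 ≈ 1.386, which gives a baseline for reading the OOF score of about 1.0917.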


In [6]:
y_pred_final_lama = model.predict(test_df)

In [7]:
np.savez_compressed('./LAMA_Meta_Features.npz',
                    y_pred_meta_lama=y_pred_meta_lama.data, 
                    oof_score=oof_score,
                    y_pred_final_lama=y_pred_final_lama.data)
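
If the saved meta-features are consumed in a later notebook, they can be read back with `np.load`. The sketch below round-trips stand-in arrays through a temporary file; the shapes and the score value are hypothetical, and only the key names match the cell above.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays (hypothetical shapes and values)
oof = np.full((5, 4), 0.25)
test_pred = np.full((3, 4), 0.25)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "LAMA_Meta_Features.npz")
    np.savez_compressed(path,
                        y_pred_meta_lama=oof,
                        oof_score=1.09,
                        y_pred_final_lama=test_pred)

    # np.load returns a lazy archive; read the arrays before it is closed
    with np.load(path) as archive:
        keys = sorted(archive.files)
        oof_loaded = archive["y_pred_meta_lama"]
        score_loaded = float(archive["oof_score"])  # stored as a 0-d array
```

Note that the scalar `oof_score` comes back as a zero-dimensional array, so it is converted with `float` here.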

## Create submission file

In [8]:
# Reload the raw test file to get the `id` column for the submission
test_df = pd.read_csv("../input/tabular-playground-series-may-2021/test.csv")
submit_df = pd.DataFrame()
submit_df['id'] = test_df['id']
submit_df['Class_1'] = y_pred_final_lama.data[:,0]
submit_df['Class_2'] = y_pred_final_lama.data[:,1]
submit_df['Class_3'] = y_pred_final_lama.data[:,2]
submit_df['Class_4'] = y_pred_final_lama.data[:,3]
submit_df.head()

       id   Class_1   Class_2   Class_3   Class_4
0  100000  0.089958  0.615511  0.173822  0.120709
1  100001  0.080117  0.679074  0.153226  0.087583
2  100002  0.084277  0.634632  0.175985  0.105107
3  100003  0.084878  0.536835  0.272012  0.106275
4  100004  0.074894  0.615071  0.191014  0.119021
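
Before writing the file, it can be worth checking that the frame looks like a valid probability submission. The sketch below is one possible set of checks; the helper name `check_submission` and the tolerance are assumptions, and the demo frame reuses the first two rows of the preview above.

```python
import numpy as np
import pandas as pd

def check_submission(df, class_cols, tol=1e-4):
    """Return a dict of basic sanity checks for a probability submission."""
    probs = df[class_cols].to_numpy()
    return {
        "has_id": "id" in df.columns,
        "no_missing": not df.isna().any().any(),
        "in_unit_interval": bool(((probs >= 0) & (probs <= 1)).all()),
        "rows_sum_to_one": bool(np.allclose(probs.sum(axis=1), 1.0, atol=tol)),
    }

class_cols = ["Class_1", "Class_2", "Class_3", "Class_4"]

# First two rows of the preview above
demo = pd.DataFrame({
    "id": [100000, 100001],
    "Class_1": [0.089958, 0.080117],
    "Class_2": [0.615511, 0.679074],
    "Class_3": [0.173822, 0.153226],
    "Class_4": [0.120709, 0.087583],
})

report = check_submission(demo, class_cols)
```

In the notebook itself the same call could be made on `submit_df`; the tolerance allows for the rounding visible in printed values.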


In [9]:
submit_df.to_csv("./LAMA_submission.csv", index=False)