## Import libraries

In [1]:
! pip install -U lightautoml

Collecting lightautoml
  Downloading LightAutoML-0.2.14-py3-none-any.whl (250 kB)
Collecting autowoe>=1.2
  Downloading AutoWoE-1.2.5-py3-none-any.whl (204 kB)
Collecting poetry-core<2.0.0,>=1.0.0
  Downloading poetry_core-1.0.3-py2.py3-none-any.whl (424 kB)
Collecting importlib-metadata<2.0,>=1.0
  Downloading importlib_metadata-1.7.0-py2.py3-none-any.whl (31 kB)
Collecting efficientnet-pytorch
  Downloading efficientnet_pytorch-0.7.1.tar.gz (21 kB)
Collecting json2html
  Downloading json2html-1.3.0.tar.gz (7.0 kB)
Collecting lightgbm<3.0,>=2.3
  Downloading lightgbm-2.3.1-py2.py3-none-manylinux1_x86_64.whl (1.2 MB)
Collecting log-calls
  Downloading log_calls-0.3.2.tar.gz (232 kB)

In [2]:
import gc
import pickle
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

from lightautoml.tasks import Task
from lightautoml.automl.presets.tabular_presets import TabularUtilizedAutoML

## Prepare data for model training

In [3]:
with open("../input/tps-may-data-preprocess-v4/TPS_May_Dataset.txt", 'rb') as handle:
    processed_data = pickle.load(handle)  # dict holding the preprocessed frames

train_df = processed_data['train_df']
test_df = processed_data['test_df']

# One-hot encode the target; used later to score the OOF predictions with log_loss
Ytrain_oh = pd.get_dummies(train_df['target']).values

del processed_data
gc.collect()

0
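The one-hot matrix `Ytrain_oh` only scores correctly if its columns line up with the probability columns returned by the model. A minimal sketch with toy labels (hypothetical values, not from the competition data) shows that `pd.get_dummies` orders its columns by sorted class label, and that `log_loss` gives the same value for one-hot targets, raw labels, and the hand-written formula (mean negative log probability of the true class). `dtype=int` is passed here because recent pandas versions return boolean dummies by default.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss

# Toy labels standing in for train_df['target'] (hypothetical values)
target = pd.Series(["Class_2", "Class_1", "Class_4", "Class_2", "Class_3"])

# get_dummies orders the indicator columns by sorted class label
onehot = pd.get_dummies(target, dtype=int)
y_onehot = onehot.values

# Hypothetical predicted probabilities; column k must refer to onehot.columns[k]
probs = np.array([
    [0.125, 0.5, 0.25, 0.125],
    [0.5, 0.25, 0.125, 0.125],
    [0.125, 0.125, 0.25, 0.5],
    [0.25, 0.5, 0.125, 0.125],
    [0.125, 0.25, 0.5, 0.125],
])

# Multiclass log loss: mean over rows of -log(probability of the true class)
manual = -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

print(list(onehot.columns))
print(manual, log_loss(y_onehot, probs), log_loss(target, probs))
```

Every toy row gives its true class probability 0.5, so all three numbers equal ln 2. If the dummy columns and prediction columns were ordered differently, the one-hot score would silently be wrong.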

## Build and validate the model

In [4]:
FOLD = 5            # number of CV folds used by the reader
N_THREADS = 4       # CPU cores available to AutoML
TIMEOUT = 60 * 60   # overall time budget in seconds (1 hour)

model = TabularUtilizedAutoML(
    task=Task('multiclass'),
    timeout=TIMEOUT,
    cpu_limit=N_THREADS,
    reader_params={'n_jobs': N_THREADS, 'cv': FOLD},
)

# Returns out-of-fold predictions for every training row
y_pred_meta_lama = model.fit_predict(train_df, roles={'target': 'target'})
print("\n\ny_pred_meta_lama: {}".format(y_pred_meta_lama.shape))

Current random state: {'reader_params': {'random_state': 42}, 'general_params': {'return_all_predictions': False}}
Found reader_params in kwargs, need to combine
Merged variant for reader_params = {'n_jobs': 4, 'cv': 5, 'random_state': 42}
Start automl preset with listed constraints:
- time: 3599.9968614578247 seconds
- cpus: 4 cores
- memory: 16 gb

Train data shape: (99918, 373)
Feats was rejected during automatic roles guess: []


Layer 1 ...
Train process start. Time left 3501.3849437236786 secs
Start fitting Lvl_0_Pipe_0_Mod_0_LinearL2 ...

===== Start working with fold 0 for Lvl_0_Pipe_0_Mod_0_LinearL2 =====

Linear model: C = 1e-05 score = -1.0976145638345336
Linear model: C = 5e-05 score = -1.0951931741024503
Linear model: C = 0.0001 score = -1.0948487627241699
Linear model: C = 0.0005 score = -1.0947983056326227
Linear model: C = 0.001 score = -1.0948879017522208
Linear model: C = 0.005 score = -1.0948181004950208

===== Start working with fold 1 for Lvl_0_Pipe_0_Mod_0_LinearL

Time limit exceeded after calculating fold 1


Lvl_0_Pipe_1_Mod_0_LightGBM fitting and predicting completed
Optuna may run 1 secs


Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer


Start fitting Lvl_0_Pipe_1_Mod_1_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 200 rounds
[100]	valid's multi_logloss: 1.09884
[200]	valid's multi_logloss: 1.098
[300]	valid's multi_logloss: 1.10038
Early stopping, best iteration is:
[163]	valid's multi_logloss: 1.09746
Lvl_0_Pipe_1_Mod_1_LightGBM fitting and predicting completed
Start fitting Lvl_0_Pipe_1_Mod_1_LightGBM ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.09985
Early stopping, best iteration is:
[99]	valid's multi_logloss: 1.0998

===== Start working with fold 1 for Lvl_0_Pipe_1_Mod_1_LightGBM =====

Training until validation scores don't improve for 100 rounds
[100]	valid's multi_logloss: 1.10015
Early stopping, best iteration is:
[72]	valid's multi_logloss: 1.10008

===== Start working with fold 2 for Lvl_0_Pipe_1

Time limit exceeded after calculating fold 2


Stopped by overfitting detector  (100 iterations wait)

bestTest = 1.091985969
bestIteration = 1478

Shrink model to first 1479 iterations.
Lvl_0_Pipe_1_Mod_2_CatBoost fitting and predicting completed
Optuna may run 1 secs
Start fitting Lvl_0_Pipe_1_Mod_3_CatBoost ...

===== Start working with fold 0 for Lvl_0_Pipe_1_Mod_3_CatBoost =====

0:	learn: 1.3685528	test: 1.3685659	best: 1.3685659 (0)	total: 118ms	remaining: 5m 53s
100:	learn: 1.1091935	test: 1.1103299	best: 1.1103299 (100)	total: 12s	remaining: 5m 45s
200:	learn: 1.1004544	test: 1.1034512	best: 1.1034512 (200)	total: 25s	remaining: 5m 48s
300:	learn: 1.0957982	test: 1.1000620	best: 1.1000620 (300)	total: 37.2s	remaining: 5m 33s
400:	learn: 1.0922140	test: 1.0977777	best: 1.0977777 (400)	total: 49.4s	remaining: 5m 20s
500:	learn: 1.0887825	test: 1.0960105	best: 1.0960105 (500)	total: 1m 2s	remaining: 5m 10s
600:	learn: 1.0859849	test: 1.0951448	best: 1.0951448 (600)	total: 1m 14s	remaining: 4m 56s
700:	learn: 1.0834714	test: 1

Time limit exceeded in one of the tasks. AutoML will blend level 1 models.                                         
Try to set higher time limits or use Profiler to find bottleneck and optimize Pipelines settings


Blending: Optimization starts with equal weights and score -1.0932412460178216
Blending, iter 0: score = -1.0925905967090284, weights = [0.17591688 0.         0.         0.27881792 0.5452652 ]
Blending, iter 1: score = -1.092589245830434, weights = [0.15938377 0.         0.         0.32468563 0.5159306 ]
Blending, iter 2: score = -1.0925892426872879, weights = [0.1600137  0.         0.         0.32201657 0.5179697 ]
Blending, iter 3: score = -1.0925892381658462, weights = [0.16028711 0.         0.         0.32258546 0.51712745]
Blending, iter 4: score = -1.0925892381658462, weights = [0.16028711 0.         0.         0.32258546 0.51712745]
No score update. Terminated

Automl preset training completed in 2934.28 seconds.


y_pred_meta_lama: (99918, 4)
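The blending log above ends with five weights, two of them zero, so the level-1 prediction appears to be a weighted average of the probabilities from three of the five level-0 models; its final score, -1.0925892381658462, matches (up to sign) the aggregate OOF score printed in the next cell. The sketch below only illustrates that combination step with NumPy on random stand-in predictions (hypothetical data); it does not reproduce how LightAutoML searches for the weights. Because the weights are non-negative and sum to one, each blended row remains a valid probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical OOF probabilities from 5 level-0 models: shape (models, rows, classes)
n_models, n_rows, n_classes = 5, 8, 4
raw = rng.random((n_models, n_rows, n_classes))
model_preds = raw / raw.sum(axis=2, keepdims=True)  # each row sums to 1

# Weights as printed in the final blending iteration above
weights = np.array([0.16028711, 0.0, 0.0, 0.32258546, 0.51712745])
weights = weights / weights.sum()  # absorb rounding in the printed values

# Weighted average over the model axis -> shape (rows, classes)
blend = np.tensordot(weights, model_preds, axes=1)
print(blend.shape, blend.sum(axis=1))
```

Models with zero weight have no influence at all on `blend`, which is why the blender can effectively drop weak or incomplete models.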


In [5]:
oof_score = log_loss(Ytrain_oh, y_pred_meta_lama.data)
print("Aggregate OOF Score: {}".format(oof_score))

Aggregate OOF Score: 1.0925892381658462
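`fit_predict` returns out-of-fold (OOF) predictions: with `cv=FOLD`, each training row is in principle scored by a model fitted on the other folds, so the log loss above is a validation estimate rather than a training-set score. A minimal sketch of the same idea using the `StratifiedKFold` imported earlier, with scikit-learn's `LogisticRegression` swapped in for the AutoML pipeline and `make_classification` data standing in for `train_df` (both assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

# Synthetic 4-class data standing in for train_df (assumption)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, n_informative=5,
                                     n_classes=4, random_state=42)

n_folds = 5
skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
demo_oof = np.zeros((len(y_demo), 4))

for train_idx, valid_idx in skf.split(X_demo, y_demo):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_demo[train_idx], y_demo[train_idx])
    # Rows in valid_idx are predicted by a model that never saw them
    demo_oof[valid_idx] = clf.predict_proba(X_demo[valid_idx])

# Every row receives exactly one prediction, so the score covers the full set
demo_score = log_loss(y_demo, demo_oof)
print(f"OOF log loss: {demo_score:.4f}")
```

Stratification keeps the class proportions similar in every fold, so each fitted model sees all four classes and `predict_proba` always returns four columns.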


In [6]:
y_pred_final_lama = model.predict(test_df)

In [7]:
np.savez_compressed('./LAMA_Meta_Features.npz',
                    y_pred_meta_lama=y_pred_meta_lama.data, 
                    oof_score=oof_score,
                    y_pred_final_lama=y_pred_final_lama.data)
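The `.npz` file keeps the OOF and test predictions together, presumably so they can be reused as meta-features in a later stacking step. A sketch of reading such a file back, using small stand-in arrays and a temporary directory (hypothetical; only the key names and the OOF score are copied from above). `np.load` returns each saved keyword as an array, so the scalar `oof_score` comes back as a 0-d array that needs `float()`.

```python
import os
import tempfile

import numpy as np

# Small stand-ins for the saved arrays (hypothetical shapes and values)
meta_demo = np.full((6, 4), 0.25)
final_demo = np.full((3, 4), 0.25)
demo_score = 1.0925892381658462  # OOF score copied from the output above

path = os.path.join(tempfile.mkdtemp(), "LAMA_Meta_Features.npz")
np.savez_compressed(path,
                    y_pred_meta_lama=meta_demo,
                    oof_score=demo_score,
                    y_pred_final_lama=final_demo)

# Each keyword argument used when saving becomes a key in the archive
with np.load(path) as npz:
    keys = sorted(npz.files)
    meta = npz["y_pred_meta_lama"]
    final = npz["y_pred_final_lama"]
    score = float(npz["oof_score"])  # scalars come back as 0-d arrays

print(keys, meta.shape, final.shape, score)
```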

## Create submission file

In [8]:
# Read the raw test file only to recover the id column
test_ids = pd.read_csv("../input/tabular-playground-series-may-2021/test.csv")['id']
submit_df = pd.DataFrame({'id': test_ids})
# Column k of the prediction matrix becomes Class_{k+1}
for k in range(y_pred_final_lama.data.shape[1]):
    submit_df[f'Class_{k + 1}'] = y_pred_final_lama.data[:, k]
submit_df.head()

|   | id | Class_1 | Class_2 | Class_3 | Class_4 |
|---|---|---|---|---|---|
| 0 | 100000 | 0.095108 | 0.61896 | 0.168692 | 0.117239 |
| 1 | 100001 | 0.080853 | 0.662055 | 0.154097 | 0.102995 |
| 2 | 100002 | 0.083764 | 0.622032 | 0.184461 | 0.109743 |
| 3 | 100003 | 0.08976 | 0.531642 | 0.279045 | 0.099553 |
| 4 | 100004 | 0.074102 | 0.604012 | 0.19992 | 0.121966 |


In [9]:
submit_df.to_csv("./LAMA_submission.csv", index=False)
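Before uploading, it can help to sanity-check the file: one row per unique `id`, four `Class_*` columns, and probabilities that are finite, within [0, 1], and sum to 1 per row. A minimal sketch on a stand-in frame (hypothetical ids and probabilities); `check_submission` is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd


def check_submission(df, class_cols):
    """Hypothetical helper: basic checks on a multiclass probability submission."""
    if not df["id"].is_unique:
        raise ValueError("duplicate ids")
    values = df[class_cols].to_numpy(dtype=float)
    if not np.isfinite(values).all():
        raise ValueError("non-finite probabilities")
    if ((values < 0) | (values > 1)).any():
        raise ValueError("probabilities outside [0, 1]")
    if not np.allclose(values.sum(axis=1), 1.0):
        raise ValueError("rows do not sum to 1")
    return True


# Stand-in predictions (hypothetical values)
ids = np.array([100000, 100001, 100002])
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.0, 0.5, 0.5, 0.0]])

class_cols = [f"Class_{k + 1}" for k in range(probs.shape[1])]
demo_df = pd.DataFrame(probs, columns=class_cols)
demo_df.insert(0, "id", ids)

ok = check_submission(demo_df, class_cols)
```

Raising `ValueError` rather than using `assert` keeps the checks active even when Python runs with optimizations (`-O`) enabled.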