# Housing prices regression

In [3]:
# load libraries
from e2eml.regression import regression_blueprints as rb
from e2eml.full_processing.postprocessing import save_to_production, load_for_production
from e2eml.test.regression_blueprints_test import load_housingprices_data
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Feature engineering
Load & preprocess housing prices dataset.



In [4]:
# load Housing price data
test_df, test_target, val_df, val_df_target, test_categorical_cols = load_housingprices_data()

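The loader also performs the dataframe splits: test_df (with test_target) is used for training, while val_df (with val_df_target) is kept aside as a holdout set.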
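How load_housingprices_data splits the data is not shown in this notebook. As a rough sketch of the idea only, a holdout split could look like the following with pandas. The helper name holdout_split, the toy columns and the 20% holdout share are assumptions made for illustration, not e2eml's implementation.

```python
import pandas as pd

# Toy stand-in for the housing data; columns and values are illustrative only.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382, 6120, 7420],
    "MSZoning": ["RL", "RL", "RL", "RL", "RL", "RL", "RL", "RL", "RM", "RL"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000,
                  143000, 307000, 200000, 129900, 118000],
})


def holdout_split(data, target_col, holdout_frac=0.2, seed=42):
    """Hypothetical helper: keep a random share of rows aside as holdout data."""
    holdout = data.sample(frac=holdout_frac, random_state=seed)
    # Everything that was not sampled stays available for training.
    model_df = data.drop(holdout.index)
    # Separate the target so the holdout features look like unseen data.
    holdout_target = holdout[target_col]
    holdout_features = holdout.drop(columns=target_col)
    return model_df, holdout_features, holdout_target


model_df, holdout_df, holdout_target = holdout_split(df, "SalePrice")
print(len(model_df), len(holdout_df))
```

The holdout rows are never seen during training, so an error measured on them approximates the error on new data.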


# Using e2eml - Run and save a pipeline
We only need a few steps to get our full pipeline:
- Instantiate class
- Run chosen blueprint
- Save blueprint for later usage

In [5]:
# Instantiate class
housing_ml = rb.RegressionBluePrint(datasource=test_df,
                                         target_variable=test_target,
                                         categorical_columns=test_categorical_cols, # optional: explicitly specify the categorical columns
                                         preferred_training_mode='auto',
                                         tune_mode='accurate')

Preferred training mode auto has been chosen. e2eml will automatically detect, if LGBM and Xgboost canuse GPU acceleration and optimize the workflow accordingly.
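The next cell trains Ngboost, which boosts on natural gradients rather than ordinary gradients. As a minimal sketch of what that means, assume a Normal distribution parametrized by (mu, log sigma) and scored by negative log-likelihood. The function below is hypothetical and only illustrates the rescaling step. It is not the code used by Ngboost or e2eml.

```python
import math


def normal_nll_gradients(y, mu, log_sigma):
    """Ordinary and natural gradient of the Normal NLL w.r.t. (mu, log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    resid = y - mu
    # Ordinary gradient of log(sigma) + (y - mu)^2 / (2 * sigma^2).
    grad = (-resid / sigma2, 1.0 - resid ** 2 / sigma2)
    # For this parametrization the Fisher information is diag(1 / sigma^2, 2),
    # so the natural gradient is the ordinary gradient divided by these entries.
    natural = (grad[0] * sigma2, grad[1] / 2.0)
    return grad, natural


grad, natural = normal_nll_gradients(y=5.0, mu=1.0, log_sigma=math.log(2.0))
print(grad, natural)
```

The natural gradient for mu no longer depends on the current sigma, which makes the update less sensitive to how the distribution is parametrized.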


In [6]:
"""
In this case we choose Ngboost, which uses natural gradients. It is really strong for regression problems, but
unfortunately it does not have GPU acceleration. However, we always recommend trying Ngboost if possible.
"""
housing_ml.ml_bp14_regressions_full_processing_ngboost(preprocessing_type='nlp')

Started Execute test train split at 06:48:20.
Started Apply datetime transformation at 06:48:21.
Started Start Spacy, POS tagging + PCA at 06:48:21.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 800 entries, 928 to 799
Data columns (total 80 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             800 non-null    int64  
 1   MSSubClass     800 non-null    int64  
 2   MSZoning       800 non-null    object 
 3   LotFrontage    662 non-null    float64
 4   LotArea        800 non-null    int64  
 5   Street         800 non-null    object 
 6   Alley          52 non-null     object 
 7   LotShape       800 non-null    object 
 8   LandContour    800 non-null    object 
 9   Utilities      800 non-null    object 
 10  LotConfig      800 non-null    object 
 11  LandSlope      800 non-null    object 
 12  Neighborhood   800 non-null    object 
 13  Condition1     800 non-null    object 
 14  Condition2     800 non-null    obje

is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead


Started Execute categorical encoding at 06:48:22.


is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead


Started  Delete columns with high share of NULLs at 06:48:22.
Started Fill nulls at 06:48:22.
Started Execute numerical binning at 06:48:22.
Started Handle outliers at 06:48:23.
Started Remove collinearity at 06:48:23.
Started Execute clustering as a feature at 06:48:23.
Started Scale data at 06:48:23.
Started Execute clustering as a feature at 06:48:24.
Started Execute clustering as a feature at 06:48:24.
Started Execute clustering as a feature at 06:48:25.
Started Execute clustering as a feature at 06:48:25.
Started Execute clustering as a feature at 06:48:26.
Started Execute clustering as a feature at 06:48:26.
Started Execute clustering as a feature at 06:48:27.
Started Execute clustering as a feature at 06:48:27.
Started Select best features at 06:48:28.
Id
MSSubClass
MSZoning
LotFrontage
LotArea
Street
Alley
LotShape
LandContour
Utilities
LotConfig
LandSlope
Neighborhood
Condition1
Condition2
BldgType
HouseStyle
OverallQual
OverallCond
YearBuilt
YearRemodAdd
RoofStyle
RoofMatl
Ex

[I 2021-07-18 06:48:28,792] A new study created in memory with name: no-name-004d05c4-c47c-4cb2-aae4-62b4ccced2e9


Started Sort columns alphabetically at 06:48:28.
Started Train Ngboost at 06:48:28.
[iter 0] loss=12.5729 val_loss=12.4403 scale=2.0000 norm=1.3549
[iter 100] loss=11.2319 val_loss=11.3382 scale=2.0000 norm=0.8572
== Early stopping achieved.
== Best iteration / VAL149 (val_loss=11.2208)
[iter 0] loss=12.5898 val_loss=12.4402 scale=2.0000 norm=1.3885
[iter 100] loss=11.2077 val_loss=11.3281 scale=2.0000 norm=0.8549
== Early stopping achieved.
== Best iteration / VAL151 (val_loss=11.1937)
[iter 0] loss=12.6617 val_loss=12.4476 scale=2.0000 norm=1.3854
[iter 100] loss=11.2260 val_loss=11.3371 scale=2.0000 norm=0.8317
== Early stopping achieved.
== Best iteration / VAL139 (val_loss=11.2293)
[iter 0] loss=12.6319 val_loss=12.4420 scale=2.0000 norm=1.4309
[iter 100] loss=11.1825 val_loss=11.3644 scale=2.0000 norm=0.8396
== Early stopping achieved.
== Best iteration / VAL138 (val_loss=11.2662)
[iter 0] loss=12.6411 val_loss=12.4367 scale=2.0000 norm=1.4103
[iter 100] loss=11.1732 val_loss=11.

[I 2021-07-18 06:49:00,390] Trial 0 finished with value: -868366768.6671002 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 34809, 'minibatch_frac': 0.4255325705603149, 'learning_rate': 0.01503813231997811}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL145 (val_loss=11.2565)
[iter 0] loss=13.0962 val_loss=13.1064 scale=2.0000 norm=0.6594
[iter 100] loss=13.0739 val_loss=13.0754 scale=2.0000 norm=0.4246
[iter 200] loss=13.0451 val_loss=13.0601 scale=2.0000 norm=0.2778
[iter 300] loss=13.0391 val_loss=13.0533 scale=2.0000 norm=0.1832
[iter 400] loss=13.0225 val_loss=13.0500 scale=2.0000 norm=0.1175
[iter 500] loss=13.0364 val_loss=13.0483 scale=2.0000 norm=0.0858
[iter 600] loss=13.0144 val_loss=13.0474 scale=2.0000 norm=0.0598
[iter 700] loss=13.0409 val_loss=13.0469 scale=2.0000 norm=0.0427
[iter 800] loss=13.0297 val_loss=13.0467 scale=2.0000 norm=0.0341
[iter 900] loss=13.0652 val_loss=13.0465 scale=2.0000 norm=0.0233
[iter 1000] loss=13.0304 val_loss=13.0464 scale=2.0000 norm=0.0178
[iter 1100] loss=13.0230 val_loss=13.0463 scale=2.0000 norm=0.0132
[iter 1200] loss=13.0583 val_loss=13.0462 scale=2.0000 norm=0.0112
[iter 1300] loss=13.0277 val_loss=13.0462 scale=2.0000 norm=0.0100


[I 2021-07-18 06:50:08,370] Trial 1 finished with value: -979262513.1644522 and parameters: {'base_learner': 'DecTree_depthNone', 'Dist': 'Exponential', 'n_estimators': 45501, 'minibatch_frac': 0.5274111022795059, 'learning_rate': 0.0024473620880182062}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL1562 (val_loss=13.0471)
[iter 0] loss=12.5444 val_loss=12.4611 scale=1.0000 norm=0.6429
[iter 100] loss=12.1997 val_loss=12.2303 scale=1.0000 norm=0.4838
[iter 200] loss=11.9278 val_loss=12.0064 scale=1.0000 norm=0.4547
[iter 300] loss=11.5863 val_loss=11.7332 scale=1.0000 norm=0.4260
[iter 400] loss=11.3745 val_loss=11.5609 scale=2.0000 norm=0.8600
[iter 500] loss=11.2197 val_loss=11.4931 scale=2.0000 norm=0.8468
== Early stopping achieved.
== Best iteration / VAL553 (val_loss=11.4853)
[iter 0] loss=12.5640 val_loss=12.4652 scale=1.0000 norm=0.6622
[iter 100] loss=12.1965 val_loss=12.2100 scale=1.0000 norm=0.4838
[iter 200] loss=11.9433 val_loss=11.9886 scale=1.0000 norm=0.4577
[iter 300] loss=11.6453 val_loss=11.7440 scale=2.0000 norm=0.8716
[iter 400] loss=11.4019 val_loss=11.5469 scale=1.0000 norm=0.4302
[iter 500] loss=11.2341 val_loss=11.4584 scale=1.0000 norm=0.4268
[iter 600] loss=11.1137 val_loss=11.4330 scale=1.0000 norm=0.41

[I 2021-07-18 06:50:29,177] Trial 2 finished with value: -983834344.9944637 and parameters: {'base_learner': 'DecTree_depth2', 'Dist': 'LogNormal', 'n_estimators': 49249, 'minibatch_frac': 0.739419870470148, 'learning_rate': 0.0065552126767597155}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL577 (val_loss=11.4883)
[iter 0] loss=12.5497 val_loss=12.4647 scale=1.0000 norm=0.6605
[iter 100] loss=12.4189 val_loss=12.3683 scale=1.0000 norm=0.5379
[iter 200] loss=12.2675 val_loss=12.2608 scale=2.0000 norm=0.9671
[iter 300] loss=12.1518 val_loss=12.1651 scale=2.0000 norm=0.9089
[iter 400] loss=12.0013 val_loss=12.0743 scale=2.0000 norm=0.8950
[iter 500] loss=11.9272 val_loss=11.9845 scale=2.0000 norm=0.8831
[iter 600] loss=11.8061 val_loss=11.8987 scale=2.0000 norm=0.8814
[iter 700] loss=11.7190 val_loss=11.8170 scale=2.0000 norm=0.8777
[iter 800] loss=11.6326 val_loss=11.7363 scale=2.0000 norm=0.8803
[iter 900] loss=11.5451 val_loss=11.6600 scale=2.0000 norm=0.8773
[iter 1000] loss=11.4301 val_loss=11.5882 scale=2.0000 norm=0.8681
[iter 1100] loss=11.3316 val_loss=11.5220 scale=2.0000 norm=0.8707
[iter 1200] loss=11.2752 val_loss=11.4623 scale=2.0000 norm=0.8635
[iter 1300] loss=11.1814 val_loss=11.4115 scale=2.0000 norm=0.8457


[I 2021-07-18 06:51:47,278] Trial 3 finished with value: -937349989.6176481 and parameters: {'base_learner': 'DecTree_depth5', 'Dist': 'LogNormal', 'n_estimators': 2754, 'minibatch_frac': 0.6032950433318143, 'learning_rate': 0.0010909736641288672}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL1586 (val_loss=11.4406)
[iter 0] loss=12.5422 val_loss=12.3177 scale=2.0000 norm=1.2897
== Early stopping achieved.
== Best iteration / VAL17 (val_loss=11.5755)
[iter 0] loss=12.5471 val_loss=12.3472 scale=2.0000 norm=1.3015
== Early stopping achieved.
== Best iteration / VAL20 (val_loss=11.5335)
[iter 0] loss=12.5942 val_loss=12.3472 scale=2.0000 norm=1.2943
== Early stopping achieved.
== Best iteration / VAL18 (val_loss=11.4998)
[iter 0] loss=12.5470 val_loss=12.3353 scale=2.0000 norm=1.3166
== Early stopping achieved.
== Best iteration / VAL14 (val_loss=11.8198)
[iter 0] loss=12.5448 val_loss=12.3628 scale=2.0000 norm=1.3060


[I 2021-07-18 06:51:49,893] Trial 4 finished with value: -1176152039.8901126 and parameters: {'base_learner': 'DecTree_depthNone', 'Dist': 'LogNormal', 'n_estimators': 40941, 'minibatch_frac': 0.6798412514071572, 'learning_rate': 0.07213063142382377}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL17 (val_loss=11.6373)
[iter 0] loss=13.1019 val_loss=13.1017 scale=1.0000 norm=0.3263
[iter 100] loss=13.0437 val_loss=13.0461 scale=2.0000 norm=0.1369
== Early stopping achieved.
== Best iteration / VAL144 (val_loss=13.0457)
[iter 0] loss=13.1001 val_loss=13.1023 scale=1.0000 norm=0.3252
[iter 100] loss=13.0381 val_loss=13.0470 scale=2.0000 norm=0.1388
== Early stopping achieved.
== Best iteration / VAL127 (val_loss=13.0468)
[iter 0] loss=13.1293 val_loss=13.1024 scale=1.0000 norm=0.3452
[iter 100] loss=13.0562 val_loss=13.0464 scale=2.0000 norm=0.1363
== Early stopping achieved.
== Best iteration / VAL128 (val_loss=13.0462)
[iter 0] loss=13.0893 val_loss=13.1019 scale=1.0000 norm=0.3277
[iter 100] loss=13.0263 val_loss=13.0490 scale=2.0000 norm=0.1275
== Early stopping achieved.
== Best iteration / VAL110 (val_loss=13.0488)
[iter 0] loss=13.0980 val_loss=13.1033 scale=1.0000 norm=0.3309


[I 2021-07-18 06:51:53,493] Trial 5 finished with value: -993829200.3419892 and parameters: {'base_learner': 'DecTree_depth2', 'Dist': 'Exponential', 'n_estimators': 26104, 'minibatch_frac': 0.656091511419505, 'learning_rate': 0.05539786127593042}. Best is trial 0 with value: -868366768.6671002.


[iter 100] loss=13.0247 val_loss=13.0483 scale=2.0000 norm=0.1369
== Early stopping achieved.
== Best iteration / VAL103 (val_loss=13.0482)
[iter 0] loss=12.5395 val_loss=12.4520 scale=1.0000 norm=0.6401
[iter 100] loss=11.6787 val_loss=11.7415 scale=1.0000 norm=0.4371
== Early stopping achieved.
== Best iteration / VAL189 (val_loss=11.4523)
[iter 0] loss=12.5590 val_loss=12.4576 scale=1.0000 norm=0.6591
[iter 100] loss=11.6626 val_loss=11.7742 scale=1.0000 norm=0.4493
== Early stopping achieved.
== Best iteration / VAL187 (val_loss=11.5106)
[iter 0] loss=12.5993 val_loss=12.4613 scale=1.0000 norm=0.6533
[iter 100] loss=11.6465 val_loss=11.7110 scale=2.0000 norm=0.8529
== Early stopping achieved.
== Best iteration / VAL187 (val_loss=11.4577)
[iter 0] loss=12.5525 val_loss=12.4579 scale=1.0000 norm=0.6653
[iter 100] loss=11.6424 val_loss=11.8175 scale=2.0000 norm=0.8911
== Early stopping achieved.
== Best iteration / VAL172 (val_loss=11.6082)
[iter 0] loss=12.5271 val_loss=12.4606 scale

[I 2021-07-18 06:52:00,780] Trial 6 finished with value: -1011124625.7909935 and parameters: {'base_learner': 'DecTree_depth2', 'Dist': 'LogNormal', 'n_estimators': 461, 'minibatch_frac': 0.7500226410655508, 'learning_rate': 0.018793167920555572}. Best is trial 0 with value: -868366768.6671002.


[iter 200] loss=11.1096 val_loss=11.4895 scale=2.0000 norm=0.8324
== Early stopping achieved.
== Best iteration / VAL199 (val_loss=11.4888)
[iter 0] loss=12.7216 val_loss=12.5879 scale=1.0000 norm=59225.2607
== Early stopping achieved.
== Best iteration / VAL86 (val_loss=11.4986)
[iter 0] loss=12.7250 val_loss=12.5833 scale=1.0000 norm=58981.9516
== Early stopping achieved.
== Best iteration / VAL88 (val_loss=11.4267)
[iter 0] loss=12.7688 val_loss=12.6007 scale=1.0000 norm=62890.3923
== Early stopping achieved.
== Best iteration / VAL79 (val_loss=11.5669)
[iter 0] loss=12.7339 val_loss=12.5789 scale=1.0000 norm=58495.7370
== Early stopping achieved.
== Best iteration / VAL82 (val_loss=11.5426)
[iter 0] loss=12.6821 val_loss=12.5654 scale=1.0000 norm=57579.2490


[I 2021-07-18 06:52:05,644] Trial 7 finished with value: -1005447683.968869 and parameters: {'base_learner': 'DecTree_depth5', 'Dist': 'Normal', 'n_estimators': 42533, 'minibatch_frac': 0.8917603080121715, 'learning_rate': 0.02042941296412333}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL87 (val_loss=11.3737)
[iter 0] loss=13.0990 val_loss=13.1065 scale=1.0000 norm=0.3212
[iter 100] loss=13.0861 val_loss=13.0776 scale=1.0000 norm=0.2201
[iter 200] loss=13.0591 val_loss=13.0625 scale=2.0000 norm=0.2769
[iter 300] loss=13.0485 val_loss=13.0544 scale=2.0000 norm=0.1992
[iter 400] loss=13.0398 val_loss=13.0513 scale=2.0000 norm=0.1737
[iter 500] loss=13.0439 val_loss=13.0500 scale=2.0000 norm=0.1666
[iter 600] loss=13.0228 val_loss=13.0494 scale=2.0000 norm=0.1557
[iter 700] loss=13.0410 val_loss=13.0490 scale=2.0000 norm=0.1504
[iter 800] loss=13.0412 val_loss=13.0487 scale=2.0000 norm=0.1452
[iter 900] loss=13.0447 val_loss=13.0485 scale=2.0000 norm=0.1365
[iter 1000] loss=13.0362 val_loss=13.0484 scale=1.0000 norm=0.0665
== Early stopping achieved.
== Best iteration / VAL1022 (val_loss=13.0484)
[iter 0] loss=13.0973 val_loss=13.1065 scale=1.0000 norm=0.3200
[iter 100] loss=13.0794 val_loss=13.0749 scale=2.0000 norm=0.42

[I 2021-07-18 06:52:30,880] Trial 8 finished with value: -993420832.9245838 and parameters: {'base_learner': 'DecTree_depth2', 'Dist': 'Exponential', 'n_estimators': 41943, 'minibatch_frac': 0.6967741763386246, 'learning_rate': 0.004985913883887544}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL984 (val_loss=13.0482)
[iter 0] loss=12.6789 val_loss=12.5477 scale=2.0000 norm=117449.9838
== Early stopping achieved.
== Best iteration / VAL47 (val_loss=11.4285)
[iter 0] loss=12.6866 val_loss=12.5360 scale=2.0000 norm=117817.6941
== Early stopping achieved.
== Best iteration / VAL47 (val_loss=11.3558)
[iter 0] loss=12.7516 val_loss=12.5612 scale=2.0000 norm=127281.1614
== Early stopping achieved.
== Best iteration / VAL44 (val_loss=11.4813)
[iter 0] loss=12.6980 val_loss=12.5369 scale=2.0000 norm=117565.2260
== Early stopping achieved.
== Best iteration / VAL46 (val_loss=11.4897)
[iter 0] loss=12.6740 val_loss=12.5239 scale=2.0000 norm=116204.7415


[I 2021-07-18 06:53:18,334] Trial 9 finished with value: -911117485.5259203 and parameters: {'base_learner': 'GradientBoost_depth5', 'Dist': 'Normal', 'n_estimators': 44067, 'minibatch_frac': 0.7457541934249898, 'learning_rate': 0.03379871392671105}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL49 (val_loss=11.3478)
[iter 0] loss=12.5746 val_loss=12.4443 scale=2.0000 norm=1.3562
[iter 100] loss=11.4145 val_loss=11.4820 scale=2.0000 norm=0.8739
== Early stopping achieved.
== Best iteration / VAL189 (val_loss=11.2110)
[iter 0] loss=12.5930 val_loss=12.4456 scale=2.0000 norm=1.3958
[iter 100] loss=11.4182 val_loss=11.4903 scale=2.0000 norm=0.8835
[iter 200] loss=10.9178 val_loss=11.2022 scale=2.0000 norm=0.8812
== Early stopping achieved.
== Best iteration / VAL194 (val_loss=11.1998)
[iter 0] loss=12.6687 val_loss=12.4537 scale=2.0000 norm=1.3949
[iter 100] loss=11.4325 val_loss=11.4971 scale=2.0000 norm=0.8549
== Early stopping achieved.
== Best iteration / VAL180 (val_loss=11.2384)
[iter 0] loss=12.6384 val_loss=12.4513 scale=2.0000 norm=1.4428
[iter 100] loss=11.3917 val_loss=11.5162 scale=2.0000 norm=0.8580
== Early stopping achieved.
== Best iteration / VAL175 (val_loss=11.2713)
[iter 0] loss=12.6601 val_loss=12.4413 scale=

[I 2021-07-18 06:53:58,944] Trial 10 finished with value: -872849504.6309563 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 27940, 'minibatch_frac': 0.40496666318149277, 'learning_rate': 0.011729289815382935}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL181 (val_loss=11.2565)
[iter 0] loss=12.5634 val_loss=12.4436 scale=2.0000 norm=1.3483
[iter 100] loss=11.3464 val_loss=11.4315 scale=2.0000 norm=0.8641
== Early stopping achieved.
== Best iteration / VAL168 (val_loss=11.2251)
[iter 0] loss=12.5797 val_loss=12.4465 scale=2.0000 norm=1.3829
[iter 100] loss=11.3428 val_loss=11.4333 scale=2.0000 norm=0.8748
== Early stopping achieved.
== Best iteration / VAL179 (val_loss=11.2110)
[iter 0] loss=12.6533 val_loss=12.4525 scale=2.0000 norm=1.3820
[iter 100] loss=11.3573 val_loss=11.4368 scale=2.0000 norm=0.8516
== Early stopping achieved.
== Best iteration / VAL160 (val_loss=11.2473)
[iter 0] loss=12.6220 val_loss=12.4516 scale=2.0000 norm=1.4261
[iter 100] loss=11.3135 val_loss=11.4657 scale=2.0000 norm=0.8546
== Early stopping achieved.
== Best iteration / VAL160 (val_loss=11.2801)
[iter 0] loss=12.6445 val_loss=12.4400 scale=2.0000 norm=1.4153
[iter 100] loss=11.2983 val_loss=11.4513 scale

[I 2021-07-18 06:54:37,660] Trial 11 finished with value: -887215769.2003676 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 26987, 'minibatch_frac': 0.41827308439470645, 'learning_rate': 0.012773172253446092}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL168 (val_loss=11.2584)
[iter 0] loss=12.5764 val_loss=12.4532 scale=2.0000 norm=1.3576
[iter 100] loss=11.7232 val_loss=11.7576 scale=2.0000 norm=0.8936
[iter 200] loss=11.2391 val_loss=11.3350 scale=2.0000 norm=0.8873
== Early stopping achieved.
== Best iteration / VAL289 (val_loss=11.2164)
[iter 0] loss=12.5949 val_loss=12.4549 scale=2.0000 norm=1.3974
[iter 100] loss=11.7320 val_loss=11.7656 scale=2.0000 norm=0.8997
[iter 200] loss=11.2394 val_loss=11.3394 scale=2.0000 norm=0.8662
[iter 300] loss=10.9227 val_loss=11.2062 scale=2.0000 norm=0.7968
== Early stopping achieved.
== Best iteration / VAL293 (val_loss=11.2059)
[iter 0] loss=12.6707 val_loss=12.4606 scale=2.0000 norm=1.3964
[iter 100] loss=11.7582 val_loss=11.7765 scale=2.0000 norm=0.8773
[iter 200] loss=11.2109 val_loss=11.3447 scale=2.0000 norm=0.8454
== Early stopping achieved.
== Best iteration / VAL283 (val_loss=11.2314)
[iter 0] loss=12.6404 val_loss=12.4573 scale=2.000

[I 2021-07-18 06:55:38,365] Trial 12 finished with value: -891300848.9282259 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 18570, 'minibatch_frac': 0.40385848761655185, 'learning_rate': 0.007503150947677386}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL289 (val_loss=11.2623)
[iter 0] loss=12.5661 val_loss=12.4602 scale=2.0000 norm=1.3362
[iter 100] loss=12.0954 val_loss=12.1106 scale=2.0000 norm=0.9116
[iter 200] loss=11.8122 val_loss=11.8553 scale=2.0000 norm=0.8939
[iter 300] loss=11.5496 val_loss=11.6307 scale=2.0000 norm=0.8734
[iter 400] loss=11.3301 val_loss=11.4494 scale=2.0000 norm=0.8623
[iter 500] loss=11.1899 val_loss=11.3207 scale=2.0000 norm=0.8458
[iter 600] loss=10.9799 val_loss=11.2515 scale=2.0000 norm=0.8494
== Early stopping achieved.
== Best iteration / VAL670 (val_loss=11.2351)
[iter 0] loss=12.5797 val_loss=12.4635 scale=2.0000 norm=1.3629
[iter 100] loss=12.1065 val_loss=12.1145 scale=2.0000 norm=0.9227
[iter 200] loss=11.8197 val_loss=11.8585 scale=2.0000 norm=0.9036
[iter 300] loss=11.5713 val_loss=11.6327 scale=2.0000 norm=0.8721
[iter 400] loss=11.3377 val_loss=11.4481 scale=2.0000 norm=0.8664
[iter 500] loss=11.1835 val_loss=11.3136 scale=2.0000 norm=0.827

[I 2021-07-18 06:58:23,751] Trial 13 finished with value: -888160635.7218754 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 32993, 'minibatch_frac': 0.4824556089595179, 'learning_rate': 0.0031380954406529694}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL674 (val_loss=11.2734)
[iter 0] loss=12.5496 val_loss=12.4057 scale=2.0000 norm=1.3247
== Early stopping achieved.
== Best iteration / VAL65 (val_loss=11.2164)
[iter 0] loss=12.5625 val_loss=12.4070 scale=2.0000 norm=1.3498
== Early stopping achieved.
== Best iteration / VAL64 (val_loss=11.2036)
[iter 0] loss=12.6297 val_loss=12.4144 scale=2.0000 norm=1.3386
== Early stopping achieved.
== Best iteration / VAL61 (val_loss=11.2295)
[iter 0] loss=12.5899 val_loss=12.4083 scale=2.0000 norm=1.3704
== Early stopping achieved.
== Best iteration / VAL58 (val_loss=11.2826)
[iter 0] loss=12.5842 val_loss=12.4052 scale=2.0000 norm=1.3435


[I 2021-07-18 06:58:41,033] Trial 14 finished with value: -893760424.0994594 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 12880, 'minibatch_frac': 0.5010206591472192, 'learning_rate': 0.03355684893970818}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL63 (val_loss=11.2551)
[iter 0] loss=12.7201 val_loss=12.5750 scale=2.0000 norm=118528.1665
[iter 100] loss=11.2810 val_loss=11.6328 scale=2.0000 norm=22579.3879
== Early stopping achieved.
== Best iteration / VAL108 (val_loss=11.6226)
[iter 0] loss=12.7221 val_loss=12.5651 scale=2.0000 norm=118044.3562
[iter 100] loss=11.2616 val_loss=11.4914 scale=2.0000 norm=21825.9255
== Early stopping achieved.
== Best iteration / VAL124 (val_loss=11.4380)
[iter 0] loss=12.7715 val_loss=12.5881 scale=2.0000 norm=126495.4393
[iter 100] loss=11.3200 val_loss=11.5707 scale=2.0000 norm=24012.4486
== Early stopping achieved.
== Best iteration / VAL119 (val_loss=11.5311)
[iter 0] loss=12.7389 val_loss=12.5643 scale=2.0000 norm=118308.1225
[iter 100] loss=11.2238 val_loss=11.6768 scale=2.0000 norm=20332.1220
== Early stopping achieved.
== Best iteration / VAL99 (val_loss=11.6764)
[iter 0] loss=12.6875 val_loss=12.5541 scale=2.0000 norm=116132.5832
[iter 1

[I 2021-07-18 06:59:44,825] Trial 15 finished with value: -1002927120.8404438 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'Normal', 'n_estimators': 35713, 'minibatch_frac': 0.9414549020007512, 'learning_rate': 0.014726569441982914}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL127 (val_loss=11.4352)
[iter 0] loss=12.5764 val_loss=12.4114 scale=2.0000 norm=1.3515
== Early stopping achieved.
== Best iteration / VAL62 (val_loss=11.2500)
[iter 0] loss=12.5932 val_loss=12.4085 scale=2.0000 norm=1.3850
== Early stopping achieved.
== Best iteration / VAL66 (val_loss=11.2199)
[iter 0] loss=12.6680 val_loss=12.4164 scale=2.0000 norm=1.3855
== Early stopping achieved.
== Best iteration / VAL64 (val_loss=11.2325)
[iter 0] loss=12.6393 val_loss=12.4181 scale=2.0000 norm=1.4320
== Early stopping achieved.
== Best iteration / VAL62 (val_loss=11.3033)
[iter 0] loss=12.6438 val_loss=12.4015 scale=2.0000 norm=1.4113


[I 2021-07-18 07:00:00,747] Trial 16 finished with value: -887966273.9122204 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 18617, 'minibatch_frac': 0.4281498025125121, 'learning_rate': 0.03248804347663259}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL62 (val_loss=11.2824)
[iter 0] loss=12.5522 val_loss=12.4453 scale=2.0000 norm=1.3297
[iter 100] loss=11.4256 val_loss=11.5553 scale=2.0000 norm=0.9082
== Early stopping achieved.
== Best iteration / VAL171 (val_loss=11.2773)
[iter 0] loss=12.5486 val_loss=12.4476 scale=2.0000 norm=1.3279
[iter 100] loss=11.4353 val_loss=11.5542 scale=2.0000 norm=0.9093
== Early stopping achieved.
== Best iteration / VAL182 (val_loss=11.2239)
[iter 0] loss=12.6120 val_loss=12.4538 scale=2.0000 norm=1.3224
[iter 100] loss=11.4697 val_loss=11.5785 scale=2.0000 norm=0.9117
== Early stopping achieved.
== Best iteration / VAL171 (val_loss=11.2978)
[iter 0] loss=12.5706 val_loss=12.4496 scale=2.0000 norm=1.3555
[iter 100] loss=11.4119 val_loss=11.6214 scale=2.0000 norm=0.9100
== Early stopping achieved.
== Best iteration / VAL161 (val_loss=11.3937)
[iter 0] loss=12.5752 val_loss=12.4508 scale=2.0000 norm=1.3426
[iter 100] loss=11.4024 val_loss=11.5746 scale=

[I 2021-07-18 07:01:34,456] Trial 17 finished with value: -878151935.1915381 and parameters: {'base_learner': 'GradientBoost_depth5', 'Dist': 'LogNormal', 'n_estimators': 33040, 'minibatch_frac': 0.5777341650804667, 'learning_rate': 0.00983752593537482}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL168 (val_loss=11.3173)
[iter 0] loss=12.5699 val_loss=12.4585 scale=2.0000 norm=1.3393
[iter 100] loss=11.9867 val_loss=12.0071 scale=2.0000 norm=0.9009
[iter 200] loss=11.6271 val_loss=11.6782 scale=2.0000 norm=0.8890
[iter 300] loss=11.3045 val_loss=11.4249 scale=2.0000 norm=0.8480
[iter 400] loss=11.0664 val_loss=11.2725 scale=2.0000 norm=0.8359
[iter 500] loss=10.9302 val_loss=11.2278 scale=2.0000 norm=0.8390
== Early stopping achieved.
== Best iteration / VAL500 (val_loss=11.2278)
[iter 0] loss=12.5844 val_loss=12.4605 scale=2.0000 norm=1.3659
[iter 100] loss=11.9989 val_loss=12.0168 scale=2.0000 norm=0.9117
[iter 200] loss=11.6395 val_loss=11.6894 scale=2.0000 norm=0.8954
[iter 300] loss=11.3230 val_loss=11.4304 scale=2.0000 norm=0.8429
[iter 400] loss=11.0748 val_loss=11.2696 scale=2.0000 norm=0.8245
[iter 500] loss=10.9334 val_loss=11.2101 scale=2.0000 norm=0.7991
== Early stopping achieved.
== Best iteration / VAL515 (val_loss

[I 2021-07-18 07:03:32,753] Trial 18 finished with value: -887414800.4907106 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'LogNormal', 'n_estimators': 20406, 'minibatch_frac': 0.467545359993562, 'learning_rate': 0.004314063438935396}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL487 (val_loss=11.2670)
[iter 0] loss=13.0899 val_loss=13.1066 scale=2.0000 norm=0.6528
[iter 100] loss=13.0944 val_loss=13.0820 scale=2.0000 norm=0.4850
[iter 200] loss=13.0675 val_loss=13.0680 scale=2.0000 norm=0.3567
[iter 300] loss=13.0561 val_loss=13.0598 scale=2.0000 norm=0.2778
[iter 400] loss=13.0381 val_loss=13.0547 scale=2.0000 norm=0.2105
[iter 500] loss=13.0440 val_loss=13.0516 scale=2.0000 norm=0.1873
[iter 600] loss=13.0238 val_loss=13.0497 scale=2.0000 norm=0.1658
[iter 700] loss=13.0488 val_loss=13.0484 scale=2.0000 norm=0.1455
[iter 800] loss=13.0305 val_loss=13.0476 scale=2.0000 norm=0.1466
[iter 900] loss=13.0567 val_loss=13.0469 scale=2.0000 norm=0.1257
[iter 1000] loss=13.0272 val_loss=13.0465 scale=4.0000 norm=0.2445
[iter 1100] loss=13.0275 val_loss=13.0462 scale=4.0000 norm=0.2273
[iter 1200] loss=13.0519 val_loss=13.0460 scale=4.0000 norm=0.2240
[iter 1300] loss=13.0400 val_loss=13.0459 scale=4.0000 norm=0.2321


[I 2021-07-18 07:07:26,741] Trial 19 finished with value: -926847209.8508819 and parameters: {'base_learner': 'GradientBoost_depth2', 'Dist': 'Exponential', 'n_estimators': 10905, 'minibatch_frac': 0.5650347385428828, 'learning_rate': 0.0017143428470539858}. Best is trial 0 with value: -868366768.6671002.


== Early stopping achieved.
== Best iteration / VAL1317 (val_loss=13.0463)
[iter 0] loss=12.5986 val_loss=12.4423 scale=2.0000 norm=1.3468
[iter 100] loss=11.2160 val_loss=11.3539 scale=2.0000 norm=0.8273
== Early stopping achieved.
== Best iteration / VAL146 (val_loss=11.2400)
Started Predict with Ngboost at 07:07:35.
The R2 score is 0.8894877021795203
The MAE score is 15826.206975129666
The Median absolute error score is 10269.211140466694
The MSE score is 23323.225503848906
The RMSE score is 543972847.903388
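A note on the last two printed lines: RMSE is the square root of MSE, so for errors of this size the RMSE has to be the smaller number. The quick check below shows that the value printed as "RMSE" is the square of the value printed as "MSE", which suggests the two labels are swapped in this output. The variable names are illustrative only.

```python
import math

# Values copied from the printed scores above.
printed_as_mse = 23323.225503848906
printed_as_rmse = 543972847.903388

# If the labels are swapped, the square root of the larger value
# should reproduce the smaller one.
recovered_rmse = math.sqrt(printed_as_rmse)
print(recovered_rmse)
print(math.isclose(recovered_rmse, printed_as_mse, rel_tol=1e-8))
```

Under that reading the model's RMSE is roughly 23,323, in the same units as the MAE of roughly 15,826.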


In [7]:
# Save pipeline
save_to_production(housing_ml, file_name='housing_automl_instance')
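How save_to_production and load_for_production serialize the instance is not shown in this notebook. The general idea of persisting a fitted object and restoring it later can be sketched with the standard library's pickle module. The helpers save_state and load_state and the contents of fitted_state are hypothetical stand-ins, not e2eml's implementation.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(obj, path):
    """Hypothetical helper: serialize a fitted object to disk."""
    Path(path).write_bytes(pickle.dumps(obj))


def load_state(path):
    """Hypothetical helper: restore an object written by save_state."""
    return pickle.loads(Path(path).read_bytes())


# Stand-in for the state of a fitted pipeline; keys and values are illustrative.
fitted_state = {
    "feature_order": ["LotArea", "OverallQual"],
    "fill_values": {"LotFrontage": 69.0},
    "coefficients": [12.5, 31000.0],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "pipeline_state.pkl"
    save_state(fitted_state, file_path)
    restored_state = load_state(file_path)

print(restored_state == fitted_state)
```

Pickle files can execute code when loaded, so only files from a trusted source should be unpickled.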

# Predict on new data
In the beginning we kept a holdout dataset. We use this to simulate prediction on completely new data.

In [8]:
# load stored pipeline
housing_ml_loaded = load_for_production(file_name='housing_automl_instance')

In [9]:
# predict on new data
housing_ml_loaded.ml_bp14_regressions_full_processing_ngboost(val_df, preprocessing_type='full')

# access predicted values
val_y_hat = housing_ml_loaded.predicted_values['ngboost']

Started Execute test train split at 07:07:35.
Started Apply datetime transformation at 07:07:35.
Started Handle rare features at 07:07:35.
Started Remove cardinality at 07:07:35.
Started Onehot + PCA categorical features at 07:07:35.
Started Execute categorical encoding at 07:07:35.
Started  Delete columns with high share of NULLs at 07:07:35.
Started Fill nulls at 07:07:35.
Started Execute numerical binning at 07:07:35.
Started Handle outliers at 07:07:36.
Started Remove collinearity at 07:07:36.
Started Execute clustering as a feature at 07:07:36.
Started Execute clustering as a feature at 07:07:36.
Started Execute clustering as a feature at 07:07:36.
Started Execute clustering as a feature at 07:07:36.
Started Execute clustering as a feature at 07:07:36.
Started Execute clustering as a feature at 07:07:37.
Started Execute clustering as a feature at 07:07:37.
Started Execute clustering as a feature at 07:07:37.
Started Execute clustering as a feature at 07:07:37.
Started Select best 

In [10]:
# Assess prediction quality on holdout data
mae = mean_absolute_error(val_df_target, val_y_hat)
print(mae)

16054.641582630808
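To put the holdout result in context, the sketch below spells out the mean absolute error in plain Python on made-up prices and then compares the holdout MAE with the MAE printed after training. The function name mean_absolute_error_plain and the toy values are assumptions for illustration. In the notebook itself the metric comes from sklearn.

```python
def mean_absolute_error_plain(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices, only to show the calculation.
toy_true = [200000.0, 150000.0, 320000.0]
toy_pred = [210000.0, 140000.0, 300000.0]
toy_mae = mean_absolute_error_plain(toy_true, toy_pred)
print(toy_mae)

# MAE values copied from the notebook output.
validation_mae = 15826.206975129666  # printed after training
holdout_mae = 16054.641582630808     # computed on the holdout data
relative_gap = (holdout_mae - validation_mae) / validation_mae
print(f"{relative_gap:.2%}")
```

The holdout error is only about 1.4% higher than the error reported during training, so the saved pipeline behaves similarly on the unseen rows.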
