# ML-box Avocado

#### Goal :

- Create an ML model using MLBox for the Avocado dataset
- Compute the RMSE of this model's predictions

#### Imports

In [1]:
import numpy as np
import pandas as pd
import sklearn.metrics
from math import sqrt
from mlbox.preprocessing import Reader
from mlbox.preprocessing import Drift_thresholder
from mlbox.optimisation import make_scorer
from mlbox.optimisation import Optimiser
from mlbox.prediction import Predictor

Paths to the train set and the test set.

In [2]:
paths = ["../../Data/avocado_price/processed/train.csv","../../Data/avocado_price/processed/x_test.csv"]

Name of the feature to predict. This column should only be present in the train set.

In [3]:
target_name = "AveragePrice"
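
The requirement that the target column appears only in the train set can be checked before the files are handed to the reader. The sketch below uses only the standard library and two tiny in-memory CSVs as stand-ins for the real files; the row values and the helper name `header_of` are hypothetical.

```python
import csv
import io

target_name = "AveragePrice"

# Hypothetical stand-ins for train.csv and x_test.csv (made-up rows).
train_csv = io.StringIO("AveragePrice,Total Volume,year\n133,64236.62,2015\n")
test_csv = io.StringIO("Total Volume,year\n54876.98,2015\n")


def header_of(handle):
    """Return the column names from the first row of a CSV handle."""
    return next(csv.reader(handle))


train_cols = header_of(train_csv)
test_cols = header_of(test_csv)

# The target must be in the train header and absent from the test header.
target_only_in_train = target_name in train_cols and target_name not in test_cols
print(target_only_in_train)
```

For the real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` on each entry of `paths`.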

Read and clean all files. First, declare a reader for CSV files.

In [4]:
rd = Reader(sep=',')

`train_test_split` returns a dictionary containing three entries:
- dict["train"] contains the training samples without the target column
- dict["test"] contains the test samples without the target column
- dict["target"] contains the target column for the training samples.

In [5]:
data = rd.train_test_split(paths, target_name)


reading csv : train.csv ...
cleaning data ...
CPU time: 0.3984375 seconds

reading csv : x_test.csv ...
cleaning data ...
CPU time: 0.1798863410949707 seconds

> Number of common features : 14

gathering and crunching for train and test datasets ...
reindexing for train and test datasets ...
dropping training duplicates ...
dropping constant variables on training set ...

> Number of categorical features: 0
> Number of numerical features: 14
> Number of training samples : 12226
> Number of test samples : 6023

> You have no missing values on train set...

> Task : regression
count    12226.000000
mean       140.106658
std         40.259655
min         44.000000
25%        110.000000
50%        137.000000
75%        166.000000
max        312.000000
Name: AveragePrice, dtype: float64


Removing the drifting variables

In [6]:
dft = Drift_thresholder()
data = dft.fit_transform(data)


computing drifts ...
CPU time: 1.0780882835388184 seconds

> Top 10 drifts

('year', 0.01652095766931816)
('Month', 0.014052133375274112)
('Total Volume', 0.011363754825062689)
('Day', 0.009383030176192797)
('region', 0.007258513221157337)
('type_organic', 0.006209422086088434)
('type_conventional', 0.005249107658505725)
('4225', 0.0039422234659385325)
('Large Bags', 0.003932261120233038)
('Small Bags', 0.002992500014094013)

> Deleted variables : []
> Drift coefficients dumped into directory : save
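
The empty "Deleted variables" list follows from all reported drift coefficients being very small. Here is a minimal sketch of the thresholding step, using the values printed above. The cutoff of 0.6 is an assumption about the default `threshold` of `Drift_thresholder` and should be checked against the installed MLBox version.

```python
# Drift coefficients copied from the "Top 10 drifts" output above.
drifts = {
    "year": 0.01652095766931816,
    "Month": 0.014052133375274112,
    "Total Volume": 0.011363754825062689,
    "Day": 0.009383030176192797,
    "region": 0.007258513221157337,
    "type_organic": 0.006209422086088434,
    "type_conventional": 0.005249107658505725,
    "4225": 0.0039422234659385325,
    "Large Bags": 0.003932261120233038,
    "Small Bags": 0.002992500014094013,
}

threshold = 0.6  # assumed default cutoff; not confirmed by the notebook

# A variable is dropped only if its drift exceeds the cutoff.
to_delete = [name for name, drift in drifts.items() if drift > threshold]
print(to_delete)  # → []
```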


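Tuning: define a custom MAPE (mean absolute percentage error) scorer. Note that the optimiser declared below is configured with mean squared error, so MAPE is not what the search below optimises.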

In [7]:
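# Mean absolute percentage error (MAPE), in percent; lower is better.
def mape_score(y_true, y_pred):
    return 100 * np.sum(np.abs(y_true - y_pred) / y_true) / len(y_true)

mape = make_scorer(mape_score,
                   greater_is_better=False,
                   needs_proba=False)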
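
As a worked example of the formula wrapped by the scorer above, here is the same computation in plain Python on three made-up price pairs (the numbers are illustrative, not taken from the dataset).

```python
def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

# Per-sample relative errors are 0.10, 0.10 and 0.00, so the mean is 20/3 percent.
print(round(mape_percent(y_true, y_pred), 2))  # → 6.67
```

Because the scorer is built with `greater_is_better=False`, scikit-learn flips the sign of this value so that a larger score always means a better model.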

Declare an optimiser

In [8]:
opt = Optimiser(scoring="mean_squared_error", n_folds=3)
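
The optimiser scores candidates by mean squared error, whereas the stated goal is an RMSE. The two are related by a square root, as this small standard-library sketch on made-up values shows.

```python
from math import sqrt

# Illustrative values only, not taken from the dataset.
y_true = [140.0, 110.0, 166.0]
y_pred = [137.0, 114.0, 166.0]

# Mean squared error: average of the squared residuals.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# RMSE is its square root, expressed in the same units as the target.
rmse = sqrt(mse)
print(mse, rmse)
```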



Space of hyperparameters

In [9]:
space = {
        'ne__numerical_strategy': {"search": "choice",
                                   "space": [0]},
        'ce__strategy': {"search": "choice",
                         "space": ["label_encoding",
                                   "random_projection",
                                   "entity_embedding"]},
        'fs__threshold': {"search": "uniform",
                          "space": [0.01, 0.3]},
        'est__max_depth': {"search": "choice",
                           "space": [3, 4, 5, 6, 7]}

        }
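
To make the two search types concrete: a "choice" entry picks one value from a list, and a "uniform" entry draws a float between two bounds. The sketch below draws one random configuration from a dictionary of the same shape, using only the standard library. It illustrates what the space describes, not MLBox's internal search procedure, and `sample_config` is a hypothetical helper.

```python
import random

space = {
    "ce__strategy": {"search": "choice",
                     "space": ["label_encoding",
                               "random_projection",
                               "entity_embedding"]},
    "fs__threshold": {"search": "uniform",
                      "space": [0.01, 0.3]},
    "est__max_depth": {"search": "choice",
                       "space": [3, 4, 5, 6, 7]},
}


def sample_config(space, rng):
    """Draw one value per hyper-parameter according to its search type."""
    config = {}
    for name, spec in space.items():
        if spec["search"] == "choice":
            config[name] = rng.choice(spec["space"])
        elif spec["search"] == "uniform":
            low, high = spec["space"]
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unsupported search type: {spec['search']}")
    return config


config = sample_config(space, random.Random(0))
print(config)
```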

Optimise the hyper-parameters of the whole pipeline

In [10]:
best = opt.optimise(space, data, 40)  # run 40 search trials
print("Final results :", opt.evaluate(best, data))

##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}
>>> CA ENCODER :{'strategy': 'entity_embedding'}      
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.22360205192383217}
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': True, 'subsample': 0.9, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'nthread': -1, 'seed': 0}

MEAN SCORE : neg_mean_squared_error = -311.7605332271602
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 3.6697702407836914 seconds                  
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}    
>>> CA ENCODER :{'strategy': 'entity_embedding'}                               
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.25070684648745273}     
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent'


MEAN SCORE : neg_mean_squared_error = -311.7605332271602                       
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 3.342552423477173 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}    
>>> CA ENCODER :{'strategy': 'label_encoding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.24002696950514701}     
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': Non


MEAN SCORE : neg_mean_squared_error = -311.7605332271602                       
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 2.624854803085327 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}    
>>> CA ENCODER :{'strategy': 'random_projection'}                              
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.18645325184282588}     
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': Non


MEAN SCORE : neg_mean_squared_error = -233.3058660049643                       
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 3.194257974624634 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}    
>>> CA ENCODER :{'strategy': 'entity_embedding'}                               
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.10323443193697357}     
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 5, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': None


MEAN SCORE : neg_mean_squared_error = -191.28734114218005                      
VARIANCE : 8.034441749677725 (fold 1 = -180.50057454464076, fold 2 = -199.7730748611163, fold 3 = -193.5883740207831)
CPU time: 5.117115020751953 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}     
>>> CA ENCODER :{'strategy': 'label_encoding'}                                  
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.238535705301914}        
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': N


MEAN SCORE : neg_mean_squared_error = -311.7605332271602                        
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 2.4201340675354004 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}     
>>> CA ENCODER :{'strategy': 'label_encoding'}                                  
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.1770878869849372}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state'


MEAN SCORE : neg_mean_squared_error = -233.3058660049643                        
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 3.330127239227295 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}     
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.21682003732525884}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state':


MEAN SCORE : neg_mean_squared_error = -233.3058660049643                        
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 3.7069294452667236 seconds                                            
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}     
>>> CA ENCODER :{'strategy': 'random_projection'}                               
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.10404794312844765}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 5, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state':


MEAN SCORE : neg_mean_squared_error = -191.28734114218005                       
VARIANCE : 8.034441749677725 (fold 1 = -180.50057454464076, fold 2 = -199.7730748611163, fold 3 = -193.5883740207831)
CPU time: 4.601433992385864 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}     
>>> CA ENCODER :{'strategy': 'random_projection'}                               
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.29725397023761374}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 6, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state':


MEAN SCORE : neg_mean_squared_error = -174.67717672441282                       
VARIANCE : 5.501864652483698 (fold 1 = -166.9035751198656, fold 2 = -178.27395933465385, fold 3 = -178.85399571871903)
CPU time: 5.422521114349365 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.1379623653208513}        
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -233.3058660049643                         
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 4.032430648803711 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.2110423765948858}        
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st


MEAN SCORE : neg_mean_squared_error = -311.7605332271602                         
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 3.180536985397339 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.10867804895846078}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 6, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s


MEAN SCORE : neg_mean_squared_error = -174.67717672441282                        
VARIANCE : 5.501864652483698 (fold 1 = -166.9035751198656, fold 2 = -178.27395933465385, fold 3 = -178.85399571871903)
CPU time: 5.419253587722778 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.03358825027132237}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 5, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s


MEAN SCORE : neg_mean_squared_error = -191.28734114218005                        
VARIANCE : 8.034441749677725 (fold 1 = -180.50057454464076, fold 2 = -199.7730748611163, fold 3 = -193.5883740207831)
CPU time: 5.258180379867554 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'random_projection'}                                
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.08619406694857305}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 5, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st


MEAN SCORE : neg_mean_squared_error = -191.28734114218005                        
VARIANCE : 8.034441749677725 (fold 1 = -180.50057454464076, fold 2 = -199.7730748611163, fold 3 = -193.5883740207831)
CPU time: 4.548216104507446 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.05173766265956951}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.267253637313843 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.20893771770988076}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -233.3058660049643                         
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 4.290418863296509 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.09554515317572372}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.884581804275513 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'random_projection'}                                
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.22894927729272535}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 5, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -191.28734114218005                        
VARIANCE : 8.034441749677725 (fold 1 = -180.50057454464076, fold 2 = -199.7730748611163, fold 3 = -193.5883740207831)
CPU time: 5.61267614364624 seconds                                               
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'random_projection'}                                
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.11772616371702926}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st


MEAN SCORE : neg_mean_squared_error = -311.7605332271602                         
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 3.5180845260620117 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.01026588700843939}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.7330663204193115 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.010968121247092276}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.60931396484375 seconds                                               
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.06077840334239019}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.907495737075806 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.05909078207274554}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.521652698516846 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.012734909559310954}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.993362188339233 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.06734643317590283}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 7.557229042053223 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.07594078564443178}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta


MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.15540623664856 seconds                                               
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.027113755265447306}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.011642694473267 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.037021905304778835}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.944048643112183 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.15389306112970028}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 6, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -174.67717672441282                        
VARIANCE : 5.501864652483698 (fold 1 = -166.9035751198656, fold 2 = -178.27395933465385, fold 3 = -178.85399571871903)
CPU time: 5.697422742843628 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.07923030965359963}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.94570779800415 seconds                                               
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.042793686359856475}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.735738515853882 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.27133504086538784}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.607211351394653 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.2730848663935386}        
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 6, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -174.67717672441282                        
VARIANCE : 5.501864652483698 (fold 1 = -166.9035751198656, fold 2 = -178.27395933465385, fold 3 = -178.85399571871903)
CPU time: 6.137502908706665 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.012450585443850743}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.912200450897217 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.2968817217728386}        
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -311.7605332271602                         
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 3.0989112854003906 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.12809548223907002}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 7, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s

MEAN SCORE : neg_mean_squared_error = -170.54364618714683                        
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 5.995228052139282 seconds                                              
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.17964931401199252}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 4, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_sta

MEAN SCORE : neg_mean_squared_error = -233.3058660049643                         
VARIANCE : 9.602981628831392 (fold 1 = -219.7384050134768, fold 2 = -240.6081386194014, fold 3 = -239.57105438201467)
CPU time: 3.5079338550567627 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'label_encoding'}                                   
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.019964469454224712}      
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 3, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_st

MEAN SCORE : neg_mean_squared_error = -311.7605332271602                         
VARIANCE : 10.094937585700304 (fold 1 = -298.0946937094737, fold 2 = -315.0164053211818, fold 3 = -322.17050065082503)
CPU time: 2.5643367767333984 seconds                                             
##################################################### testing hyper-parameters... #####################################################
>>> NA ENCODER :{'numerical_strategy': 0, 'categorical_strategy': '<NULL>'}      
>>> CA ENCODER :{'strategy': 'entity_embedding'}                                 
>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.25306146990786754}       
>>> ESTIMATOR :{'strategy': 'LightGBM', 'max_depth': 6, 'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'split', 'learning_rate': 0.05, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 500, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_s

MEAN SCORE : neg_mean_squared_error = -174.67717672441282                        
VARIANCE : 5.501864652483698 (fold 1 = -166.9035751198656, fold 2 = -178.27395933465385, fold 3 = -178.85399571871903)
CPU time: 6.229955196380615 seconds                                              
100%|██████████| 40/40 [03:20<00:00,  5.02s/trial, best loss: 170.54364618714683]


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BEST HYPER-PARAMETERS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

{'ce__strategy': 'entity_embedding', 'est__max_depth': 7, 'fs__threshold': 0.05173766265956951, 'ne__numerical_strategy': 0}

##################################################### testing hyper-parameters... ##########

MEAN SCORE : neg_mean_squared_error = -170.54364618714683
VARIANCE : 4.3465765548258 (fold 1 = -164.51930673783417, fold 2 = -174.61392378439146, fold 3 = -172.4977080392149)
CPU time: 6.651137828826904 seconds

Final results : -170.54364618714683
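
The scores above are negated mean squared errors (the `neg_mean_squared_error` scorer flips the sign so that higher is better), so they are not directly comparable to the RMSE computed on the test set. A minimal sketch of the conversion, using values copied from the log; the helper name `neg_mse_to_rmse` is ours and not part of MLBox:

```python
from math import sqrt

# Values copied from the optimiser log above.
mean_neg_mse = -170.54364618714683
fold_neg_mse = [-164.51930673783417, -174.61392378439146, -172.4977080392149]


def neg_mse_to_rmse(score):
    """Convert a neg_mean_squared_error score into an RMSE."""
    return sqrt(-score)


# RMSE derived from the mean MSE over the three folds.
cv_rmse = neg_mse_to_rmse(mean_neg_mse)
# RMSE of each fold taken separately (their average differs slightly from cv_rmse).
fold_rmse = [neg_mse_to_rmse(s) for s in fold_neg_mse]

print("CV RMSE (from mean MSE):", round(cv_rmse, 3))
print("per-fold RMSE:", [round(r, 3) for r in fold_rmse])
```

The cross-validated RMSE comes out at about 13.06, which is the figure to set against the test-set RMSE reported further down.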


Make a prediction

In [11]:
# Predictor is already imported in the first cell, so the wildcard import is not needed.
pred = Predictor()
pred.fit_predict(best, data)


fitting the pipeline ...


CPU time: 2.3373866081237793 seconds

> Feature importances dumped into directory : save

predicting...
CPU time: 0.3124966621398926 seconds

> Overview on predictions : 

   AveragePrice_predicted
0              117.372834
1              210.799902
2              140.863435
3              209.266109
4              119.459608
5              131.543681
6              139.468283
7              175.802513
8              121.024673
9               96.659715

dumping predictions into directory : save ...


<mlbox.prediction.predictor.Predictor at 0x7fd19ea50bd0>

Getting the predictions and targets

In [12]:
y_pred = pd.read_csv("save/AveragePrice_predictions.csv")
predictions = y_pred.AveragePrice_predicted
y_test = pd.read_csv("../../Data/avocado_price/processed/y_test.csv")

Calculating RMSE

In [13]:
from math import sqrt
from sklearn.metrics import mean_squared_error

# Select the target column explicitly so both inputs are 1-D.
rmse = sqrt(mean_squared_error(y_test.AveragePrice, predictions))
print("rmse score:", rmse)

rmse score: 12.225484120626746
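
As a cross-check on the cell above, RMSE is the square root of the mean of the squared differences between targets and predictions. The sketch below spells that out in plain Python on made-up toy values; the function `rmse` and the toy lists are illustrative only and do not come from the avocado data:

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy values, only to exercise the function.
toy_true = [100.0, 150.0, 200.0]
toy_pred = [110.0, 140.0, 200.0]
toy_rmse = rmse(toy_true, toy_pred)
print("toy rmse:", toy_rmse)
```

The explicit length check matters here because the predictions and `y_test` are read from two separate CSV files, and a mismatch in row count would otherwise go unnoticed.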


## Residual Plot

In [14]:
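import matplotlib.pyplot as plt

# Residuals: predicted minus actual price, plotted against the predicted price.
residuals = predictions - y_test.AveragePrice
plt.scatter(predictions, residuals, c="grey", label="Testing Data")
# The x-axis shows predictions, so the zero line must span their range.
plt.hlines(y=0, xmin=predictions.min(), xmax=predictions.max())
plt.xlabel("Predicted AveragePrice")
plt.ylabel("Residual (predicted - actual)")
plt.legend()
plt.title("Residual Plot")
plt.show()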
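
A residual plot is easier to read alongside a few summary numbers. The following is a minimal sketch, assuming the same sign convention as the plot (predicted minus actual); the helper `residual_summary` and the toy values are hypothetical and not taken from the notebook's data:

```python
from statistics import mean, pstdev


def residual_summary(y_pred, y_true):
    """Summarise residuals (predicted minus actual) with a few statistics."""
    residuals = [p - t for p, t in zip(y_pred, y_true)]
    return {
        "mean": mean(residuals),      # systematic over- or under-prediction
        "std": pstdev(residuals),     # spread of the errors around their mean
        "max_abs": max(abs(r) for r in residuals),  # worst single miss
    }


# Hypothetical toy values, only to exercise the function.
toy_pred = [110.0, 140.0, 200.0]
toy_true = [100.0, 150.0, 200.0]
summary = residual_summary(toy_pred, toy_true)
print(summary)
```

A mean residual far from zero would point to a systematic bias, which shows up in the plot as a cloud shifted above or below the horizontal line.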