Unable to optimize XGBoost #384

Closed
vlavorini opened this issue Jan 29, 2018 · 4 comments

@vlavorini

Hello,

I'm trying to optimize an XGBoost model. I followed the SVM example and created this:

cs = ConfigurationSpace()
lambda_par=UniformFloatHyperparameter("lambda", 2,7, default_value=4.5)
cs.add_hyperparameters([lambda_par])

scenario = Scenario({"run_obj": "runtime",   # we optimize runtime (alternatively quality)
                 "runcount-limit": 2,  # maximum function evaluations
                 "cs": cs,               # configuration space
                 "deterministic": "true",
                 "cutoff_time": 10
                 })

def xgb_2nd(cfg):
    cfg = {k : cfg[k] for k in cfg if cfg[k]}
    print (cfg)
    model2=XGBClassifier(learning_rate=0.02, max_depth=10, subsample=0.9, n_estimators=50,
                 colsample_bytree=0.9, objective='binary:logistic',
                 seed=99,scale_pos_weight=1.4, min_child_weight=7, 
                 reg_lambda=cfg["lambda"], silent=True)
    asd=model2.fit(df_X, df_y, eval_set=watchlist2, eval_metric='auc', early_stopping_rounds=20 )
    print ("this score=" , asd.best_score)

    return 1-asd.best_score

smac = SMAC(scenario=scenario, rng=np.random.RandomState(42),
    tae_runner=xgb_2nd)

incumbent = smac.optimize()

So at each run I expect to see the score printed ("this score=xxxxx"), but I don't; after the expected timeout it just moves on to another value of the parameter.

But if I run the "xgb_2nd" method manually, e.g. xgb_2nd({"lambda": 4.5}), it works as expected.

Any idea of what I'm doing wrong?

@mlindauer
Contributor

Thanks for reporting this issue.
Could you please provide the entire example with all imports, so that we can try to reproduce it on our end?

@aaronki, could you please look into it?

Best,
Marius

@vlavorini
Author

The entire example is big, but it's mostly just data pre-processing with Pandas.
These are my imports:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler, LabelEncoder, Imputer, MinMaxScaler, QuantileTransformer
from sklearn_pandas import CategoricalImputer
import umap
import seaborn as sns
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE

from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
    UniformFloatHyperparameter, UniformIntegerHyperparameter
from ConfigSpace.conditions import InCondition

# Import SMAC-utilities
from smac.tae.execute_func import ExecuteTAFuncDict
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC

I work within a Jupyter notebook running Python 3.6.

@vlavorini
Author

Hello,

An update: below are two pieces of code that reproduce the issue.
If I run only the first, with your optimization routines, it works fine.

But if I run the second piece first and then run the optimization routine, the issue appears.

First code:

import numpy as np

import pandas as pd
from xgboost import XGBClassifier
from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC



#randomly generated data
df_x=pd.DataFrame({"a":np.random.randn(1000),  "c":np.random.rand(1000)})
y=np.random.randint(0, 2, size=1000)

watchlist = [(df_x[:800],y[:800]), (df_x[800:],  y[800:])]



def xgb_opt(cfg):

    cfg = {k : cfg[k] for k in cfg if cfg[k]}
    print (cfg)
    modela=XGBClassifier(learning_rate=0.02, max_depth=10, subsample=0.9, n_estimators=50,
                 colsample_bytree=0.9, objective='binary:logistic',
                 seed=99,scale_pos_weight=1.4, min_child_weight=7, 
                 reg_lambda=cfg["lambda"], silent=True)
    asd=modela.fit(df_x[:800], y[:800], eval_set=watchlist, eval_metric='auc', early_stopping_rounds=20, verbose=False )
    print ("this score=" , asd.best_score)

    return 1-asd.best_score

cs = ConfigurationSpace()
lambda_par=UniformFloatHyperparameter("lambda", 2,7, default_value=4.5)
cs.add_hyperparameters([lambda_par])

scenario = Scenario({"run_obj": "runtime",   # we optimize runtime (alternatively quality)
                 "runcount-limit": 2,  # maximum function evaluations
                 "cs": cs,               # configuration space
                 "deterministic": "true",
                 "cutoff_time": 10
                 })

smac = SMAC(scenario=scenario, rng=np.random.RandomState(42),
    tae_runner=xgb_opt)

incumbent = smac.optimize()

Second code: if I run this first and then run the piece above, the issue appears:

model=XGBClassifier(learning_rate=0.02, max_depth=10, subsample=0.9, n_estimators=50,
             colsample_bytree=0.9, objective='binary:logistic',
             seed=99,scale_pos_weight=1.4, min_child_weight=7, 
             reg_lambda=4, silent=True)
asd=model.fit(df_x[:800], y[:800], eval_set=watchlist)

cheers

@aaronkimmig
Contributor

Please always provide complete code examples together with your output, so that the issue can be reproduced.
If I understand you correctly, the following code does not work, as opposed to your first code:

import numpy as np

import pandas as pd
from xgboost import XGBClassifier
from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC

#randomly generated data
df_x=pd.DataFrame({"a":np.random.randn(1000),  "c":np.random.rand(1000)})
y=np.random.randint(0, 2, size=1000)

watchlist = [(df_x[:800],y[:800]), (df_x[800:],  y[800:])]



def xgb_opt(cfg):

    cfg = {k : cfg[k] for k in cfg if cfg[k]}
    print (cfg)


    model=XGBClassifier(learning_rate=0.02, max_depth=10, subsample=0.9, n_estimators=50,
                 colsample_bytree=0.9, objective='binary:logistic',
                 seed=99,scale_pos_weight=1.4, min_child_weight=7, 
                 reg_lambda=4,
                 silent=True)
    asd=model.fit(df_x[:800], y[:800], eval_set=watchlist)
    #asd=model.fit(df_x[:800], y[:800], eval_set=watchlist, verbose=False,
    #              eval_metric='auc',
    #              early_stopping_rounds=20)
    print ("this score=" , asd.best_score)

    return 1-asd.best_score

cs = ConfigurationSpace()
lambda_par=UniformFloatHyperparameter("lambda", 2,7, default_value=4.5)
cs.add_hyperparameters([lambda_par])

scenario = Scenario({"run_obj": "runtime",   # we optimize runtime (alternatively quality)
                 "runcount-limit": 2,  # maximum function evaluations
                 "cs": cs,               # configuration space
                 "deterministic": "true",
                 "cutoff_time": 10
                 })

smac = SMAC(scenario=scenario, rng=np.random.RandomState(42),
    tae_runner=xgb_opt)

incumbent = smac.optimize()

This produces the following output for me:

stdout:

{'lambda': 4.5}
[0]	validation_0-error:0.41125	validation_1-error:0.54
[1]	validation_0-error:0.40625	validation_1-error:0.51
[2]	validation_0-error:0.3975	validation_1-error:0.53
[3]	validation_0-error:0.38875	validation_1-error:0.545
[4]	validation_0-error:0.3975	validation_1-error:0.525
[5]	validation_0-error:0.4	validation_1-error:0.535
[6]	validation_0-error:0.40625	validation_1-error:0.51
[7]	validation_0-error:0.405	validation_1-error:0.54
[8]	validation_0-error:0.41125	validation_1-error:0.53
[9]	validation_0-error:0.41375	validation_1-error:0.525
[10]	validation_0-error:0.4125	validation_1-error:0.525
[11]	validation_0-error:0.4025	validation_1-error:0.54
[12]	validation_0-error:0.40625	validation_1-error:0.53
[13]	validation_0-error:0.4075	validation_1-error:0.525
[14]	validation_0-error:0.4	validation_1-error:0.525
[15]	validation_0-error:0.4025	validation_1-error:0.525
[16]	validation_0-error:0.3975	validation_1-error:0.54
[17]	validation_0-error:0.39125	validation_1-error:0.545
[18]	validation_0-error:0.38875	validation_1-error:0.555
[19]	validation_0-error:0.38625	validation_1-error:0.56
[20]	validation_0-error:0.39	validation_1-error:0.55
[21]	validation_0-error:0.38375	validation_1-error:0.55
[22]	validation_0-error:0.38625	validation_1-error:0.56
[23]	validation_0-error:0.3875	validation_1-error:0.545
[24]	validation_0-error:0.38625	validation_1-error:0.555
[25]	validation_0-error:0.38375	validation_1-error:0.545
[26]	validation_0-error:0.3775	validation_1-error:0.54
[27]	validation_0-error:0.37625	validation_1-error:0.55
[28]	validation_0-error:0.37625	validation_1-error:0.55
[29]	validation_0-error:0.375	validation_1-error:0.555
[30]	validation_0-error:0.3725	validation_1-error:0.555
[31]	validation_0-error:0.37625	validation_1-error:0.56
[32]	validation_0-error:0.37625	validation_1-error:0.565
[33]	validation_0-error:0.37375	validation_1-error:0.565
[34]	validation_0-error:0.375	validation_1-error:0.565
[35]	validation_0-error:0.37625	validation_1-error:0.555
[36]	validation_0-error:0.37125	validation_1-error:0.55
[37]	validation_0-error:0.36875	validation_1-error:0.56
[38]	validation_0-error:0.3625	validation_1-error:0.56
[39]	validation_0-error:0.36125	validation_1-error:0.555
[40]	validation_0-error:0.36375	validation_1-error:0.56
[41]	validation_0-error:0.36	validation_1-error:0.555
[42]	validation_0-error:0.36	validation_1-error:0.54
[43]	validation_0-error:0.36625	validation_1-error:0.545
[44]	validation_0-error:0.365	validation_1-error:0.545
[45]	validation_0-error:0.36	validation_1-error:0.55
[46]	validation_0-error:0.365	validation_1-error:0.555
[47]	validation_0-error:0.3625	validation_1-error:0.545
[48]	validation_0-error:0.36375	validation_1-error:0.535
[49]	validation_0-error:0.365	validation_1-error:0.55

stderr:

Process pynisher function call:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/pynisher-0.4.2-py3.6.egg/pynisher/limit_function_call.py", line 83, in subprocess_func
    return_value = ((func(*args, **kwargs), 0))
  File "opt_xgboost.py", line 33, in xgb_opt
    print ("this score=" , asd.best_score)
AttributeError: 'XGBClassifier' object has no attribute 'best_score'
Traceback (most recent call last):
  File "opt_xgboost.py", line 51, in <module>
    incumbent = smac.optimize()
  File "/home/aaron/Dokumente/Studium/17-18-WiSe/UniJob/SMAC/smac/facade/smac_facade.py", line 448, in optimize
    incumbent = self.solver.run()
  File "/home/aaron/Dokumente/Studium/17-18-WiSe/UniJob/SMAC/smac/optimizer/smbo.py", line 172, in run
    self.start()
  File "/home/aaron/Dokumente/Studium/17-18-WiSe/UniJob/SMAC/smac/optimizer/smbo.py", line 145, in start
    self.incumbent = self.initial_design.run()
  File "/home/aaron/Dokumente/Studium/17-18-WiSe/UniJob/SMAC/smac/initial_design/single_config_initial_design.py", line 86, in run
    "0"))
  File "/home/aaron/Dokumente/Studium/17-18-WiSe/UniJob/SMAC/smac/tae/execute_ta_run.py", line 220, in start
    raise FirstRunCrashedException("First run crashed, abort. "
smac.tae.execute_ta_run.FirstRunCrashedException: First run crashed, abort. Please check your setup -- we assume that your default configuration does not crashes. (To deactivate this exception, use the SMAC scenario option 'abort_on_first_run_crash')

This means that there is an issue with your target algorithm call. Indeed, when I call xgb_opt({ 'lambda': 4.5 }) directly, the same exception is thrown: the fitted XGBClassifier has no best_score attribute. xgboost's sklearn wrapper only sets best_score when fit is called with early_stopping_rounds (as in the commented-out fit call above), which this code omits.
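For reference, a minimal sketch of a working target function (not from this thread; it reuses df_x, y, watchlist, and the imports from the examples above, and assumes xgboost's sklearn wrapper, where best_score is only populated when early stopping is enabled):

def xgb_opt(cfg):
    # Drop unset hyperparameters, as in the SMAC examples
    cfg = {k: cfg[k] for k in cfg if cfg[k]}
    model = XGBClassifier(learning_rate=0.02, max_depth=10, subsample=0.9,
                          n_estimators=50, colsample_bytree=0.9,
                          objective='binary:logistic', seed=99,
                          scale_pos_weight=1.4, min_child_weight=7,
                          reg_lambda=cfg["lambda"], silent=True)
    # Passing early_stopping_rounds is what makes xgboost track best_score;
    # without it, accessing model.best_score raises AttributeError.
    model.fit(df_x[:800], y[:800], eval_set=watchlist,
              eval_metric='auc', early_stopping_rounds=20, verbose=False)
    return 1 - model.best_score

A more defensive variant could read getattr(model, 'best_score', None) and return a large penalty value when it is missing, so that a failing configuration is recorded as a poor run instead of crashing SMAC's first evaluation.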

Closing this issue.
