Floating point issue when choosing holdout set train size #538

@kaiweiang

This line of code introduces a floating-point issue. For example, train_size=0.7 yields a test size of 0.30000000000000004 instead of 0.3, which causes the following error:

[ERROR] [2018-08-27 16:34:59,972:AutoML(1):cb3d3adc8ee73def12eda5903e98a85d] Error creating dummy predictions: {'traceback': 'Traceback (most recent call last):\n  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/evaluation/__init__.py", line 30, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 644, in eval_holdout\n    init_params=init_params\n  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 93, in __init__\n    self.cv = self.get_splitter(self.datamanager)\n  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 568, in get_splitter\n    raise e\n  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py", line 562, in get_splitter\n    next(test_cv.split(y, y))\n  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1204, in split\n    for train, test in self._iter_indices(X, y, groups):\n  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1534, in _iter_indices\n    self.train_size)\n  File "/root/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1710, in _validate_shuffle_split\n    \'train_size.\' % (n_train + n_test, n_samples))\nValueError: The sum of train_size and test_size = 17221, should be smaller than the number of samples 17220. Reduce test_size and/or train_size.\n', 'error': "ValueError('The sum of train_size and test_size = 17221, should be smaller than the number of samples 17220. Reduce test_size and/or train_size.',)", 'configuration_origin': 'DUMMY'} 
[ERROR] [2018-08-27 16:35:01,245:__main__] Unexpected Exception
Traceback (most recent call last):
  File "/var/www/html/MC_PY/mc_modeling/modeling.py", line 26, in main
    alg.fit()
  File "/var/www/html/MC_PY/mc_modules/mc_automl/base.py", line 456, in fit
    self.train()
  File "/var/www/html/MC_PY/mc_modules/mc_automl/classification.py", line 60, in train
    self.automl.fit(self.X_train.copy(), self.y_train.copy(), metric=self.metric, feat_type=self.feature_type)
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/estimators.py", line 466, in fit
    dataset_name=dataset_name,
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/estimators.py", line 248, in fit
    self._automl.fit(*args, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/automl.py", line 954, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/automl.py", line 199, in fit
    only_return_configuration_space,
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/automl.py", line 461, in _fit
    _proc_smac.run_smbo()
  File "/root/anaconda3/lib/python3.6/site-packages/autosklearn/smbo.py", line 501, in run_smbo
    smac.optimize()
  File "/root/anaconda3/lib/python3.6/site-packages/smac/facade/smac_facade.py", line 400, in optimize
    incumbent = self.solver.run()
  File "/root/anaconda3/lib/python3.6/site-packages/smac/optimizer/smbo.py", line 180, in run
    challengers = self.choose_next(X, Y)
  File "/root/anaconda3/lib/python3.6/site-packages/smac/optimizer/smbo.py", line 247, in choose_next
    incumbent_value = self.runhistory.get_cost(self.incumbent)
  File "/root/anaconda3/lib/python3.6/site-packages/smac/runhistory/runhistory.py", line 271, in get_cost
    config_id = self.config_ids[config]
KeyError: None

My current workaround, which prevents this issue, is:

accuracy = 2  # number of decimal places; only works when train_size has at most this many decimals, increase it if needed
n = 10**accuracy
tr_sz = float(str(train_size)[:2+accuracy])*n
test_size = (n-tr_sz)/n
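
For reference, here is a minimal sketch of why the counts overshoot and of one possible alternative. It rests on two assumptions that are not confirmed in this report: that the test size is derived as 1 - train_size, and that scikit-learn rounds the train count down and the test count up when both sizes are floats. The helper names naive_counts and integer_counts are hypothetical and are not part of auto-sklearn or scikit-learn. The alternative computes integer sample counts once, so the two sides always add up to n_samples; scikit-learn's ShuffleSplit-style splitters accept integer train_size and test_size values.

```python
import math


def naive_counts(n_samples, train_size):
    """Mimic the assumed failing path: float test_size, each side rounded separately."""
    test_size = 1 - train_size                     # 1 - 0.7 == 0.30000000000000004
    n_train = math.floor(train_size * n_samples)   # train side rounded down
    n_test = math.ceil(test_size * n_samples)      # test side rounded up
    return n_train, n_test


def integer_counts(n_samples, train_size):
    """Round the train count once and give the remainder to the test set."""
    n_train = int(round(train_size * n_samples))
    n_test = n_samples - n_train                   # sum is n_samples by construction
    return n_train, n_test


if __name__ == "__main__":
    n_samples = 17220  # sample count taken from the error message above
    print("naive:  ", naive_counts(n_samples, 0.7), "sum =", sum(naive_counts(n_samples, 0.7)))
    print("integer:", integer_counts(n_samples, 0.7), "sum =", sum(integer_counts(n_samples, 0.7)))
```

Under those assumptions the naive path should reproduce the sum of 17221 from the error message, while the integer path stays at 17220 for any train_size between 0 and 1.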

Can this be fixed?