Error during training #18

Closed
atercygnus opened this issue Nov 5, 2020 · 3 comments

atercygnus commented Nov 5, 2020

I'm trying to use automl_alex to build a baseline for this task, but training fails with an error.

My code is:

import pandas as pd
from automl_alex import AutoMLRegressor

X_train = df[df.columns.difference(['y'], sort=False)]
y_train = df.y
X_test = pd.read_csv('./data/test.csv', index_col='ID')

model = AutoMLRegressor(X_train, y_train, X_test, cat_features=X_train.columns, verbose=1)

%%time
predict_test, predict_train = model.fit_predict(verbose=2)

Step 1: Model 0


100%|██████████| 1/1 [00:14<00:00, 14.74s/it]

Model 1
One iteration takes ~ 4.5 sec

Start Auto calibration parameters
[I 2020-11-05 11:35:57,966] A new study created in memory with name: no-name-ecd42a72-44ac-442e-9b71-d621e2a383ea
Start optimization with the parameters:
CV_Folds = 5
Score_CV_Folds = 2
Feature_Selection = True
Opt_lvl = 2
Cold_start = 44.0
Early_stoping = 100
Metric = mean_squared_error
Direction = minimize
##################################################
Default model OptScore = 109.2303
Optimize: : 55it [20:47, 22.68s/it, | Model: ExtraTrees | OptScore: 105.3311 | Best mean_squared_error: 88.854 +- 16.477105]

Predict from Models_1
100%|██████████| 3/3 [01:27<00:00, 29.22s/it]
0%| | 0/1 [00:00<?, ?it/s]

Calc predict policy on Models_1:
| posible_repeats: 0 | stack_top: 1 | n_repeats: 1
100%|██████████| 1/1 [02:28<00:00, 148.44s/it]

Mean Score mean_squared_error on 5 Folds: 76.0092 std: 15.236319

Models_1 Mean mean_squared_error Score Train: 76.0097

Model 2

One iteration takes ~ 10.6 sec

Start Auto calibration parameters
Start optimization with the parameters:
CV_Folds = 5
Score_CV_Folds = 1
Feature_Selection = True
Opt_lvl = 1
Cold_start = 10
Early_stoping = 50
Metric = mean_squared_error
Direction = minimize
##################################################
Default model OptScore = 75.9577
Optimize: : 16it [02:44, 24.17s/it, | Model: MLP | OptScore: 109.1348 | Best mean_squared_error: 109.1348 ]

The stack trace is:
/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: overflow encountered in matmul
ret = a @ b
/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: invalid value encountered in matmul
ret = a @ b
Trial 16 failed because of the following error: ValueError("Input contains NaN, infinity or a value too large for dtype('float64').")
Traceback (most recent call last):
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/optuna/study.py", line 799, in _run_trial
result = func(trial)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 420, in objective
**data_kwargs,
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 762, in cross_val_score
res = self.cross_val(predict=False,**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 700, in cross_val
y_test=val_y.reset_index(drop=True),
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/sklearn_models.py", line 88, in _fit
model.model.fit(X_train, y_train,)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 641, in fit
return self._fit(X, y, incremental=False)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 371, in _fit
intercept_grads, layer_units, incremental)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 554, in _fit_stochastic
self._update_no_improvement_count(early_stopping, X_val, y_val)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 597, in _update_no_improvement_count
self.validation_scores_.append(self.score(X_val, y_val))
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/base.py", line 552, in score
return r2_score(y, y_pred, sample_weight=sample_weight)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 589, in r2_score
y_true, y_pred, multioutput)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 86, in _check_reg_targets
y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
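
One plausible reading of the trace above: the two `RuntimeWarning` lines show a matrix product overflowing and then producing invalid values, after which the finiteness check on `y_pred` raises the `ValueError`. The following is a minimal standard-library sketch of that mechanism only. The names `dot` and `assert_all_finite` and the numbers are illustrative assumptions, not scikit-learn's or automl_alex's actual code.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def assert_all_finite(values):
    """Raise ValueError if any value is NaN or infinite.

    Illustrative stand-in for the finiteness check seen in the trace,
    not scikit-learn's actual implementation.
    """
    if not all(math.isfinite(v) for v in values):
        raise ValueError("Input contains NaN or infinity.")


# Hypothetical diverged weights and activations, chosen only to force overflow.
weights = [1e200, -1e200]
activations = [1e200, 1e200]

# Each product exceeds the float64 range, so it overflows to inf or -inf.
# Adding inf and -inf then yields NaN, which is the "invalid value" case.
prediction = dot(weights, activations)

try:
    assert_all_finite([prediction])
except ValueError as exc:
    print(f"rejected: {exc}")
```

If this reading is right, the failure comes from the MLP trial diverging during fitting rather than from NaN values in the input data, though the thread does not confirm the root cause.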

Python: 3.7.4
OS: Ubuntu 18.04.5 LTS
Installed packages:
alembic==1.4.3 argon2-cffi==20.1.0 async-generator==1.10 attrs==20.2.0 automl-alex==0.10.7 backcall==0.2.0 bleach==3.2.1 catboost==0.24.2 category-encoders==2.2.2 certifi==2020.6.20 cffi==1.14.3 chardet==3.0.4 cliff==3.4.0 cmaes==0.7.0 cmd2==1.3.11 colorama==0.4.4 colorlog==4.4.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 entrypoints==0.3 graphviz==0.14.2 idna==2.10 importlib-metadata==2.0.0 ipykernel==5.3.4 ipython==7.19.0 ipython-genutils==0.2.0 jedi==0.17.2 Jinja2==2.11.2 joblib==0.17.0 json5==0.9.5 jsonschema==3.2.0 jupyter-client==6.1.7 jupyter-core==4.6.3 jupyterlab==2.2.9 jupyterlab-pygments==0.1.2 jupyterlab-server==1.2.0 kiwisolver==1.3.1 lightgbm==3.0.0 Mako==1.1.3 MarkupSafe==1.1.1 matplotlib==3.3.2 mistune==0.8.4 nbclient==0.5.1 nbconvert==6.0.7 nbformat==5.0.8 nest-asyncio==1.4.2 notebook==6.1.4 numpy==1.19.4 optuna==2.2.0 packaging==20.4 pandas==1.1.4 pandocfilters==1.4.3 parso==0.7.1 patsy==0.5.1 pbr==5.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==8.0.1 plotly==4.12.0 prettytable==0.7.2 prometheus-client==0.8.0 prompt-toolkit==3.0.8 ptyprocess==0.6.0 pycparser==2.20 Pygments==2.7.2 pyparsing==2.4.7 pyperclip==1.8.1 pyrsistent==0.17.3 python-dateutil==2.8.1 python-editor==1.0.4 pytz==2020.4 PyYAML==5.3.1 pyzmq==19.0.2 requests==2.24.0 retrying==1.3.3 scikit-learn==0.23.2 scipy==1.5.3 seaborn==0.11.0 Send2Trash==1.5.0 six==1.15.0 SQLAlchemy==1.3.20 statsmodels==0.12.1 stevedore==3.2.2 terminado==0.9.1 testpath==0.4.4 threadpoolctl==2.1.0 tornado==6.1 tqdm==4.51.0 traitlets==5.0.5 urllib3==1.25.11 wcwidth==0.2.5 webencodings==0.5.1 xgboost==1.2.1 zipp==3.4.0
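
The environment details above can also be collected with a short script, which makes a report easier to reproduce. This is a rough sketch using only the standard library. It assumes Python 3.8 or newer for `importlib.metadata` (the reporter's 3.7.4 would need the `importlib-metadata` backport instead), and the function name `describe_environment` and the package selection are illustrative assumptions.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return interpreter, OS, and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Packages that appear in the trace above; extend the list as needed.
    info = describe_environment(["automl-alex", "scikit-learn", "optuna", "numpy"])
    print(f"python: {info['python']}")
    print(f"platform: {info['platform']}")
    for name, version in info["packages"].items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```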

@Alex-Lekov
Owner

Can you send a link to the dataset or the Kaggle notebook itself?

@Alex-Lekov
Owner

I can't reproduce the error without the data itself and the environment.

@Alex-Lekov
Owner

Fixed in v0.11.24 #19
