scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered #240

ThomasWolf0701 · 2020-11-23T09:41:35Z

Describe the bug
When running on GPU, TabNet crashes at the line scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) with:
RuntimeError: CUDA error: device-side assert triggered

What is the current behavior?
It works when the matrix I use contains only integers, but fails with floats.
I also made sure that NaN values are imputed and that there are no Inf values. The largest value fits into float32.
I also set the batch size to a very low value.
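
The input checks described above (no NaN, no Inf, every value within float32 range) can be sketched as follows. This is only a minimal sketch, assuming the features are available as a dense NumPy array; check_float32_safe is a hypothetical helper name and is not part of TabNet.

```python
import numpy as np


def check_float32_safe(X):
    """Return basic sanity checks for a dense feature matrix."""
    X = np.asarray(X, dtype=np.float64)
    finite = np.isfinite(X)  # False for NaN, +Inf and -Inf
    f32_max = np.finfo(np.float32).max
    return {
        "has_nan": bool(np.isnan(X).any()),
        "has_inf": bool(np.isinf(X).any()),
        # Only finite entries are compared against the float32 limit
        "fits_float32": bool((np.abs(X[finite]) <= f32_max).all()),
    }


# Small illustrative matrix (made-up values, not the data from this issue)
report = check_float32_safe([[1.5, 2.0], [3.25, 1e10]])
print(report)
```

Running the same helper on the output of the imputation step, rather than on the raw matrix, would confirm what the model actually receives.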

If the current behavior is a bug, please provide the steps to reproduce.
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

Expected behavior

Screenshots

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)
Traceback (most recent call last):

File "", line 1, in
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),batch_size = 10)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit
fit_params_steps = self._check_fit_params(**fit_params)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params
"=sample_weight)`.".format(pname))

ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight).

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)
No early stopping will be performed, last training weights will be used.
Traceback (most recent call last):

File "", line 1, in
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit
self._final_estimator.fit(Xt, y, **fit_params_last_step)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit
self._train_epoch(train_dataloader)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch
batch_logs = self._train_batch(X, y)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch
output, M_loss = self.network(X)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward
return self.tabnet(x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward
out = self.feat_transformers[step](masked_x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward
x = self.shared(x)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward
scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))

RuntimeError: CUDA error: device-side assert triggered
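
Because CUDA kernels are launched asynchronously, the Python line reported with a device-side assert is not necessarily the operation that actually failed. One common way to get a more precise traceback is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized, sketched below. Running the same fit on CPU (for example with TabNetRegressor(device_name="cpu"), assuming the data is small enough) is another option that often yields a clearer Python error.

```python
import os

# Set this before the first CUDA call (safest: before importing torch).
# Kernel launches then run synchronously, so an error is reported much
# closer to the Python line that triggered it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

blocking = os.environ.get("CUDA_LAUNCH_BLOCKING")
print(blocking)
```

In an interactive session such as the one used here, the variable would need to be set in a fresh interpreter, before any tensor has been moved to the GPU.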

Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:

Additional context


Optimox · 2020-11-23T09:50:29Z

Hello,

This line: "ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)." makes me think that you are using TabNet inside a sklearn pipeline.

TabNet is not compatible with every sklearn pipeline, and I guess that's the problem. Could you share the code you are running with TabNet?
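
The stepname__parameter convention quoted in that error message can be illustrated with a simplified sketch. This is not scikit-learn's actual implementation: route_fit_params is a hypothetical helper, and the step names "i" and "m" are assumed here only to match the m__batch_size call in the report.

```python
def route_fit_params(step_names, **fit_params):
    """Group 'step__param' keyword arguments by pipeline step name."""
    routed = {name: {} for name in step_names}
    for pname, pval in fit_params.items():
        if "__" not in pname:
            # A bare name such as batch_size cannot be assigned to a step
            raise ValueError(
                f"fit does not accept the {pname} parameter; "
                "use the stepname__parameter format"
            )
        # Split on the first double underscore only
        step, param = pname.split("__", 1)
        if step not in routed:
            raise ValueError(f"unknown pipeline step: {step}")
        routed[step][param] = pval
    return routed


routed = route_fit_params(["i", "m"], m__batch_size=10)
print(routed)
```

Under this rule, batch_size=10 is rejected because it names no step, while m__batch_size=10 is forwarded as batch_size=10 to the step called "m".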

ThomasWolf0701 · 2020-11-23T10:02:47Z

Here it is:

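# Imports inferred from the names used below; PredefinedHoldoutSplit is assumed to come from mlxtend.
# featureMatrix (a sparse pandas DataFrame) and values are the user's data and are not shown.
import numpy as np
from scipy.stats import loguniform, randint, uniform
from sklearn.impute import SimpleImputer
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from mlxtend.evaluate import PredefinedHoldoutSplit
from pytorch_tabnet.tab_model import TabNetRegressor

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
scorer = make_scorer(mean_squared_error, greater_is_better=False)

inner_cv = TimeSeriesSplit(n_splits=5)  # .split(featureMatrix)
outer_cv = PredefinedHoldoutSplit(valid_indices=[range(0,100,1)])

# set the search distributions for TabNet (the m__ prefix routes each one to the 'm' pipeline step)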
paramsTab = {
'm__n_steps': randint(1,3),
'm__n_a': randint(8,64),
'm__n_d': randint(8,64),
'm__gamma': uniform(1, 1),
'm__n_shared': randint(1, 5),
'm__n_independent': randint(1, 5),
'm__momentum': loguniform(0.01, 0.4),
"m__mask_type":["sparsemax", "entmax"]
}

tab_model = TabNetRegressor(device_name = "cuda")

tab_model = Pipeline(steps=[('i', imputer),('m', tab_model)])

tab_search = RandomizedSearchCV(tab_model,scoring = scorer ,param_distributions=paramsTab, random_state=42, cv=inner_cv, verbose=5, n_jobs=1, return_train_score=True)

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values),m__batch_size = 10)
tab_search.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))

ThomasWolf0701 · 2020-11-23T10:06:33Z

I also tried without batch_size, and now I get:
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))
No early stopping will be performed, last training weights will be used.
Traceback (most recent call last):

File "", line 1, in
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values),np.array(values))