Using GPU in script?: Yes, multiple GeForce RTX 2080 Ti GPUs
NVIDIA-SMI 440.33.01, Driver Version: 440.33.01, CUDA Version: 10.2
transformers version: 4.3.3
I use os.environ["CUDA_VISIBLE_DEVICES"] = "6,7" to choose GPUs, and everything else in the code is straightforward, with BertClassifier() as the model. I am able to run it on CPU with no such issue.
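For reference, a minimal sketch of the setup (the toy X_train/y_train here are placeholders; the real script loads 1320 training and 146 validation examples):

```python
import os

# Must be set before torch initializes CUDA, otherwise the mask has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

from bert_sklearn import BertClassifier

# Placeholder data standing in for the real dataset.
X_train = ["an example sentence", "another example sentence"] * 10
y_train = [0, 1] * 10

model = BertClassifier()     # defaults to bert-base-uncased
model.fit(X_train, y_train)  # StopIteration on multi-GPU, fine on CPU
```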
I had a similar issue with Transformers, which I resolved by removing the bits of code that set up DataParallel (huggingface/transformers#10634). I am still not sure why this happens.
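That said, from what I can tell from the discussion around huggingface/transformers#10634, PyTorch >= 1.5 changed nn.DataParallel so that replicas no longer expose their parameters through .parameters(), which would make the next(self.parameters()) call in bert_sklearn's vendored modeling.py (the last frame of the traceback below) raise StopIteration. Here is a minimal sketch of that failure pattern together with a possible workaround of reading the dtype from a concrete parameter attribute; the Toy module is purely illustrative, not bert_sklearn code:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        # The pattern from modeling.py line 959 raises StopIteration inside a
        # DataParallel replica on PyTorch >= 1.5, because replicas yield no
        # parameters from .parameters():
        #     dtype = next(self.parameters()).dtype
        # Reading the dtype from a concrete attribute still works, since
        # replicas keep parameters as plain module attributes:
        dtype = self.linear.weight.dtype
        return self.linear(x.to(dtype))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(Toy().cuda())
    out = model(torch.randn(8, 4).cuda())  # runs without StopIteration
```

Dropping the DataParallel wrapping, as I did, avoids this code path entirely. The full console output and traceback from the failing run follow.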
```
0it [00:00, ?it/s]Building sklearn text classifier...
Loading bert-base-uncased model...
Defaulting to linear classifier/regressor
Loading Pytorch checkpoint
train data size: 1320, validation data size: 146
Training :   0%|          | 0/42 [00:09<?, ?it/s]
0it [00:27, ?it/s]        | 0/42 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "seg_pred_skl.py", line 46, in <module>
    model.fit(X_train, y_train)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/bert_sklearn/sklearn.py", line 374, in fit
    self.model = finetune(self.model, texts_a, texts_b, labels, config)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/bert_sklearn/finetune.py", line 121, in finetune
    loss, _ = model(*batch)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/bert_sklearn/model/model.py", line 95, in forward
    output_all_encoded_layers=False)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/sdb/env1/lib/python3.6/site-packages/bert_sklearn/model/pytorch_pretrained/modeling.py", line 959, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration
```