Hi, I'm trying to run the following command:
source setup.sh && runexp anli-part infobert roberta-base 2e-5 32 128 -1 1000 42 1e-5 5e-3 6 0.1 0 4e-2 8e-2 0 3 5e-3 0.5 0.9
but I got the following error. Traceback:
04/08/2022 19:30:17 - INFO - datasets.anli - Saving features into cached file anli_data/cached_dev_RobertaTokenizer_128_anli-part [took 0.690 s]
04/08/2022 19:30:17 - INFO - filelock - Lock 139893720074960 released on anli_data/cached_dev_RobertaTokenizer_128_anli-part.lock
04/08/2022 19:30:17 - INFO - local_robust_trainer - You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
04/08/2022 19:30:17 - INFO - local_robust_trainer - ***** Running training *****
04/08/2022 19:30:17 - INFO - local_robust_trainer - Num examples = 942069
04/08/2022 19:30:17 - INFO - local_robust_trainer - Num Epochs = 3
04/08/2022 19:30:17 - INFO - local_robust_trainer - Instantaneous batch size per device = 32
04/08/2022 19:30:17 - INFO - local_robust_trainer - Total train batch size (w. parallel, distributed & accumulation) = 32
04/08/2022 19:30:17 - INFO - local_robust_trainer - Gradient Accumulation steps = 1
04/08/2022 19:30:17 - INFO - local_robust_trainer - Total optimization steps = 88320
Iteration: 0%| | 0/29440 [00:00<?, ?it/s]
Epoch: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "./run_anli.py", line 395, in <module>
main()
File "./run_anli.py", line 239, in main
model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
File "/root/InfoBERT/ANLI/local_robust_trainer.py", line 731, in train
full_loss, loss_dict = self._adv_training_step(model, inputs, optimizer)
File "/root/InfoBERT/ANLI/local_robust_trainer.py", line 1031, in _adv_training_step
outputs = model(**inputs)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/roberta.py", line 345, in forward
inputs_embeds=inputs_embeds,
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/bert.py", line 822, in forward
output_hidden_states=output_hidden_states,
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/bert.py", line 494, in forward
output_attentions,
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/bert.py", line 416, in forward
hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/bert.py", line 347, in forward
hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/InfoBERT/ANLI/models/bert.py", line 239, in forward
mixed_query_layer = self.query(hidden_states)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/functional.py", line 1372, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
File "/root/miniconda3/envs/infobert/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/envs/infobert/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/infobert/bin/python', '-u', './run_anli.py', '--local_rank=0', '--model_name_or_path', 'roberta-base', '--task_name', 'anli-part', '--do_train', '--do_eval', '--data_dir', 'anli_data', '--max_seq_length', '128', '--per_device_train_batch_size', '32', '--learning_rate', '2e-5', '--max_steps', '-1', '--warmup_steps', '1000', '--weight_decay', '1e-5', '--seed', '42', '--beta', '5e-3', '--logging_dir', 'infobert-roberta-base-anli-part-sl128-lr2e-5-bs32-ts-1-ws1000-wd1e-5-seed42-beta5e-3-alpha5e-3--cl0.5-ch0.9-alr4e-2-amag8e-2-anm0-as3-hdp0.1-adp0-version6', '--output_dir', 'infobert-roberta-base-anli-part-sl128-lr2e-5-bs32-ts-1-ws1000-wd1e-5-seed42-beta5e-3-alpha5e-3--cl0.5-ch0.9-alr4e-2-amag8e-2-anm0-as3-hdp0.1-adp0-version6', '--version', '6', '--evaluate_during_training', '--logging_steps', '500', '--save_steps', '500', '--hidden_dropout_prob', '0.1', '--attention_probs_dropout_prob', '0', '--overwrite_output_dir', '--adv_lr', '4e-2', '--adv_init_mag', '8e-2', '--adv_max_norm', '0', '--adv_steps', '3', '--alpha', '5e-3', '--cl', '0.5', '--ch', '0.9']' returned non-zero exit status 1.
Do you know how to fix this?
Thank you so much.
Other Information:
OS: Ubuntu 20.04.3 LTS
GPU: NVIDIA A100
Python 3.7.13
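In case it helps with triage: a `CUBLAS_STATUS_EXECUTION_FAILED` on an A100 is often a GPU/toolkit mismatch, since Ampere cards (compute capability 8.0) need a PyTorch build compiled against CUDA 11 or newer, and a Python 3.7 environment may carry an older CUDA 10.x build. A minimal diagnostic sketch (the `needs_cuda11` and `report` helper names are my own, not part of any library; `torch.cuda.get_device_capability` and `torch.version.cuda` are standard PyTorch APIs):

```python
def needs_cuda11(capability):
    """Ampere (sm_80) and newer GPUs require a PyTorch build compiled for CUDA 11+."""
    major, _minor = capability
    return major >= 8


def report():
    # Print the version facts relevant to this error. Guarded import so the
    # helper above is usable even where PyTorch is not installed.
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", cap)
    print("torch built with CUDA:", torch.version.cuda)
    if needs_cuda11(cap) and (torch.version.cuda or "").startswith("10"):
        print("mismatch: this GPU needs a PyTorch build compiled for CUDA 11+")


if __name__ == "__main__":
    report()
```

If the report shows compute capability `(8, 0)` alongside a CUDA 10.x build, upgrading PyTorch to a CUDA 11 wheel would be the first thing to try.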