Training error on klue-dp task #6

Open
pion0926 opened this issue Dec 3, 2021 · 0 comments

Abstract 🔥

When run-all.sh is run on multiple GPUs, some tasks (dependency parsing) do not work correctly.

error-message:

RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2

How to Reproduce 🤔

[python==3.7.11]

git clone --recursive https://github.com/KLUE-benchmark/KLUE-Baseline.git
pip install -r requirements.txt
pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html (pick the torch build that matches your CUDA version)

Changes to run-all.sh:
KLUE-DP
task="klue-dp"

python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 5e-5 --num_train_epochs 15 --gradient_accumulation_steps 1 --warmup_ratio 0.2 --train_batch_size 32 --patience 10000 --max_seq_length 256 --metric_key uas_macro_f1 --gpus 0 --num_workers 4

->

python run_klue.py train --task ${task} --output_dir ${OUTPUT_DIR} --data_dir ${DATA_DIR}/${task}-${VERSION} --model_name_or_path klue/roberta-large --learning_rate 3e-5 --num_train_epochs 10 --train_batch_size 16 --eval_batch_size 16 --max_seq_length 510 --gradient_accumulation_steps 2 --warmup_ratio 0.2 --weight_decay 0.01 --max_grad_norm 1.0 --patience 100000 --metric_key slot_micro_f1 --gpus 1 2 3 --num_workers 8

bash run-all.sh

RuntimeError: The size of tensor a (23) must match the size of tensor b (25) at non-singleton dimension 2
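
One possible explanation, which I have not confirmed against the KLUE-Baseline code: if the batch is split across GPUs and part of the model pads to the longest sequence in its own shard while other tensors are still padded to the longest sequence in the whole batch, the two lengths can disagree exactly like this. The sketch below is plain Python with made-up word counts and hypothetical helper names (`split_into_shards`, `find_padding_mismatches`); it only illustrates the arithmetic and is not the project's actual code.

```python
from typing import List, Tuple


def split_into_shards(lengths: List[int], num_shards: int) -> List[List[int]]:
    """Split a batch into contiguous chunks, roughly as a data-parallel scatter would."""
    chunk = -(-len(lengths) // num_shards)  # ceiling division
    return [lengths[i:i + chunk] for i in range(0, len(lengths), chunk)]


def find_padding_mismatches(lengths: List[int], num_shards: int) -> List[Tuple[int, int, int]]:
    """Return (shard_id, shard_max, batch_max) for shards whose own max differs from the batch max."""
    batch_max = max(lengths)
    mismatches = []
    for shard_id, shard in enumerate(split_into_shards(lengths, num_shards)):
        shard_max = max(shard)
        if shard_max != batch_max:
            mismatches.append((shard_id, shard_max, batch_max))
    return mismatches


# Hypothetical word counts for one batch of six sentences (assumed values).
word_counts = [12, 23, 9, 25, 17, 20]

for shard_id, shard_max, batch_max in find_padding_mismatches(word_counts, num_shards=3):
    print(f"shard {shard_id}: padded to {shard_max} locally, {batch_max} for the whole batch")
```

With a single shard the two values always agree, which would be consistent with the error only showing up with `--gpus 1 2 3`. If this is the cause, padding every tensor to one fixed length (for example `--max_seq_length`) instead of a per-batch maximum might avoid it, but that is a guess.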

How to solve 🙋‍♀

On a single GPU I cannot train the RoBERTa-large model because it runs out of memory,
so I am asking here in the hope that someone can help!

Thank you.
