Bus error (core dumped) #174
Comments
Which DGL version are you using?
It's 0.4.3, installed by following the official guide: https://dglke.dgl.ai/doc/install.html
Can you run
Can you do some simple math to see whether you have enough memory to hold the embeddings?
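A rough version of that math, as a minimal sketch: it assumes float32 embeddings with one row of hidden_dim values per entity and per relation. The entity and relation counts below are hypothetical placeholders (substitute the counts for your own dataset), and optimizer state or other per-process overhead is not included.

```python
def embedding_bytes(num_rows: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense embedding table (float32 by default)."""
    return num_rows * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


# Hypothetical dataset sizes, for illustration only.
num_entities = 5_000_000
num_relations = 1_000
hidden_dim = 400  # matches --hidden_dim 400 in the command from this thread

total = (embedding_bytes(num_entities, hidden_dim)
         + embedding_bytes(num_relations, hidden_dim))
print(f"Estimated embedding storage: {to_gib(total):.2f} GiB")
```

Compare the estimate against both free host RAM and, if training uses multiple processes, the shared memory available to them.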
I think memory is enough. DGLBACKEND=pytorch dglke_train --model_name TransE_l2 --dataset patient --batch_size 1000 --neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 24000 --log_interval 100 --batch_size_eval 16 -adv --regularization_coef 1.00E-09 --test --gpu 0 1 2 3 --mix_cpu_gpu --data_path ./data/ --format raw_udd_hrt --data_files train.txt valid.txt test.txt --neg_sample_size_eval 10000
Then it may be related to the multi-GPU implementation. Did you try PyTorch 1.6?
Thank you.
I tried PyTorch 1.6 with CUDA 10.2 and Python 3.8. It's still the same error. What else could cause this issue?
Can you try running it under GDB and use backtrace to see where the crash occurs?
A bus error usually means your shared memory size is not big enough. If you are using Docker, please pass
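One way to check the shared memory point above is to compare the estimated embedding size with the space available on the shared memory mount. This is a minimal sketch that assumes a Linux host or container where shared memory is mounted at /dev/shm; the helper names and the 8 GB figure are hypothetical and not part of DGL-KE.

```python
import os
import shutil


def shm_usage(path: str = "/dev/shm"):
    """Return (total_bytes, free_bytes) for the mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


def fits_in_shm(required_bytes: int, path: str = "/dev/shm"):
    """True/False if the mount exists, None if it cannot be inspected."""
    info = shm_usage(path)
    if info is None:
        return None
    _, free = info
    return required_bytes <= free


# Hypothetical requirement: roughly 8 GB of embeddings.
required = 8_000_000_000
info = shm_usage()
if info is None:
    print("/dev/shm not found; shared memory is configured differently here")
else:
    total, free = info
    print(f"/dev/shm total={total / 2**30:.2f} GiB, free={free / 2**30:.2f} GiB")
    print("Embeddings fit in shared memory:", fits_in_shm(required))
```

Inside a container, run this in the same container that launches training, since the mount size there can differ from the host's.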
DGLBACKEND=pytorch dglke_train --model_name TransE_l2 --dataset patient --batch_size 1000 --neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 24000 --log_interval 100 --batch_size_eval 16 -adv --regularization_coef 1.00E-09 --test --gpu 0 1 2 3 --mix_cpu_gpu --data_path ./data/ --format raw_udd_hrt --data_files train.txt valid.txt test.txt --neg_sample_size_eval 10000
Does this error mean the process ran out of memory?