out of memory! Could you please tell me your GPU card type? #33

Open
sijun-zhou opened this issue Jul 13, 2018 · 19 comments

Comments

@sijun-zhou

sijun-zhou commented Jul 13, 2018

Hi Huijuan (@huijuan88),
I am using a 1080 Ti with 11 GB of memory, but 2.5 GB is in use by other students, so only 8.5 GB of GPU memory is left for me. When I run the provided ActivityNet test script, it loads the frames of just one video (768 images) and then runs out of memory at this step:
blobs_out = net.forward(**forward_kwargs)
"""
F0713 15:08:15.452706 22317 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
"""

Could you please tell me which GPU type you have, and how many GPUs you used when testing and training this code?
Thanks in advance!

@sijun-zhou
Author

I reduced the input from 768 images to 160 images, and it works fine with the 8.5 GB of memory I have left. But 768 images is nearly 5 times larger, so I guess I would need 40 GB to 50 GB of GPU memory. It is also difficult to run pycaffe on multiple GPUs. Could you please help me? I am new to action detection. Much appreciated!
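The 40 GB to 50 GB guess in this comment is a linear extrapolation, and a short sketch makes the arithmetic explicit. It assumes, as the comment does, that memory grows in proportion to the number of frames and that the 160-frame run used close to the full 8.5 GB; both are assumptions, and the function name is hypothetical. A later comment in this thread reports roughly 5-6 GB once cuDNN is enabled, so the figure is best read as a pessimistic upper bound.

```python
def estimate_memory_gb(known_frames, known_gb, target_frames):
    """Linearly extrapolate GPU memory use from one known data point.

    Assumption: memory scales in proportion to the number of input frames.
    Real usage also includes fixed costs (weights, workspace), so this
    tends to overestimate.
    """
    if known_frames <= 0:
        raise ValueError("known_frames must be positive")
    return known_gb * target_frames / known_frames


scale = 768 / 160                              # ratio of frame counts
estimate = estimate_memory_gb(160, 8.5, 768)   # extrapolated memory in GB
print("scale factor: %.1f, estimate: %.1f GB" % (scale, estimate))
```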

@YanYan0716

@sijun-zhou Hello, I have run into the same problem. Have you solved it? I am also new to action detection. Thanks a lot.

@sijun-zhou
Author

@yanqian123 I used a single 1080 Ti. My problem was solved when I enabled cuDNN for the project.

@YanYan0716

You mean setting CUDNN==1 in Makefile.config, right? Thanks again.

@YanYan0716

@sijun-zhou You mean setting CUDNN==1 in Makefile.config, right? Thanks again.

@sijun-zhou
Author

@yanqian123 yes
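
For readers checking their own build: in stock BVLC Caffe, the cuDNN switch in Makefile.config is the line `USE_CUDNN := 1`, which ships commented out. Whether the Caffe fork used by this project spells it the same way is an assumption. The helper below is a hypothetical sketch (not part of the project) that reports whether an uncommented switch is present in the text of a Makefile.config. Caffe has to be rebuilt after the flag changes for it to take effect.

```python
import re

# Matches an uncommented "USE_CUDNN := 1" (or "USE_CUDNN = 1"),
# optionally followed by a trailing comment.
_CUDNN_LINE = re.compile(r"^\s*USE_CUDNN\s*:?=\s*1\s*(#.*)?$")


def cudnn_enabled(makefile_config_text):
    """Return True if the text contains an active USE_CUDNN := 1 line."""
    return any(_CUDNN_LINE.match(line)
               for line in makefile_config_text.splitlines())


# Illustrative inputs, not copied from the project.
default_config = "# cuDNN acceleration switch (uncomment to build with cuDNN).\n# USE_CUDNN := 1\n"
edited_config = "USE_CUDNN := 1\n"

print(cudnn_enabled(default_config))
print(cudnn_enabled(edited_config))
```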

@YanYan0716

I am sorry to say it did not work. My GPU is a 1050, and setting CUDNN==1 did not solve my problem. Could you give me some advice?

@YanYan0716

@sijun-zhou thank you again

@sijun-zhou
Author

@yanqian123 As far as I remember, if you do not change the batch size (700+? I don't remember it clearly), it will consume approximately 5-6 GB of GPU memory. A 1050 clearly cannot support that.

@viswalal

I am getting the same out of memory error while testing.

F1115 21:40:12.954958 25933 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

Is there any way to handle this, such as reducing the batch size or the number of frames?
GPU: GeForce 940MX (4 GB only)

@Xchangjiang

@viswalal Hello, I have run into the same problem. I think it is due to a mismatch in the GPU index: I only have one GPU, but the setting is 'GPU_ID: 1' when it should be 'GPU_ID: 0'. I can't find the config file, though. Have you solved it?
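
The point about the GPU index can be illustrated with a small sketch. CUDA devices are numbered from zero, so a machine with one GPU only has device 0. The helper below is hypothetical and not taken from the project: it validates the index and sets the standard `CUDA_VISIBLE_DEVICES` environment variable, which has to happen before the process initializes CUDA (for example, before importing the framework). Where this project reads its own GPU_ID setting is not stated in the thread.

```python
import os


def select_gpu(gpu_id, num_gpus):
    """Validate a zero-based GPU index and expose only that device.

    Assumption: this is called before any CUDA-using library is loaded,
    because CUDA_VISIBLE_DEVICES is read when CUDA is initialized.
    """
    if not 0 <= gpu_id < num_gpus:
        raise ValueError(
            "GPU_ID %d is invalid: valid indices are 0..%d"
            % (gpu_id, num_gpus - 1))
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return gpu_id


# On a single-GPU machine only index 0 is valid.
select_gpu(0, num_gpus=1)
```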

@viswalal

viswalal commented Nov 26, 2018

@Xchangjiang Hi, I also have only one GPU. For me, the log shows GPU ID 0 while running script_test.sh, and I still cannot resolve the error. I checked GPU usage during the test: it keeps increasing, and the process crashes when memory is full. I have not been able to reduce the batch size because I cannot identify where to change it.
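
One way to make the GPU usage check described here repeatable is to read the numbers from `nvidia-smi` and compare the free memory against the requirement. The sketch below is an illustration under assumptions: the function names are hypothetical, the sample rows are made up, and the 5.5 GiB threshold is simply the midpoint of the "5-6G" figure quoted earlier in this thread.

```python
import subprocess


def query_gpu_memory():
    """Return nvidia-smi output with one 'total, used' row (MiB) per GPU."""
    return subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        text=True)


def parse_gpu_memory(csv_text):
    """Parse 'total, used' rows in MiB into one dict per GPU."""
    gpus = []
    for index, line in enumerate(csv_text.strip().splitlines()):
        total_mib, used_mib = (int(field) for field in line.split(","))
        gpus.append({"index": index,
                     "total_mib": total_mib,
                     "used_mib": used_mib,
                     "free_mib": total_mib - used_mib})
    return gpus


def has_room(gpu, required_gib):
    """Check whether the free memory covers the requested amount in GiB."""
    return gpu["free_mib"] >= required_gib * 1024


# Made-up sample rows: an 11 GB card that is partly in use, and a 4 GB card.
sample = "11178, 2560\n4046, 120\n"
gpus = parse_gpu_memory(sample)
for gpu in gpus:
    print(gpu["index"], gpu["free_mib"], has_room(gpu, 5.5))
```

On a real machine, `parse_gpu_memory(query_gpu_memory())` would replace the sample string.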

@huijuan88
Collaborator

huijuan88 commented Nov 27, 2018 via email

@viswalal

@huijuan88 Hello. I think GPU ID 0 is correct for me. The crash happens because my GPU has only 4 GB. I want to change the batch size used by script_test.sh, the way we set 'batch_size' in the network definition prototxt file (or maybe reducing the number of frames it loads at a time will help).
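
The idea of loading fewer frames at a time can be sketched generically: cut the video into shorter windows and process them one by one, so peak memory depends on the window length rather than on the whole video. The function below is a hypothetical illustration, not the project's code, and the thread does not say how the test script actually buffers frames. Later comments report that lowering the length alone did not remove the error, so this is not a confirmed fix.

```python
def split_into_windows(num_frames, length, stride=None):
    """Return (start, end) frame ranges covering a video in shorter windows.

    `stride` defaults to `length`, which gives non-overlapping windows;
    a smaller stride gives overlapping ones. The last window may be shorter.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if stride is None:
        stride = length
    if stride <= 0:
        raise ValueError("stride must be positive")

    windows = []
    start = 0
    while start < num_frames:
        end = min(start + length, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


# The 768-frame video from this thread, cut into windows of 256 frames.
windows = split_into_windows(768, 256)
print(windows)
```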

@huijuan88
Collaborator

huijuan88 commented Nov 27, 2018 via email

@viswalal

@huijuan88 Thank you. I will try that.

@viswalal

viswalal commented Nov 28, 2018

@huijuan88 Hi, I have tried length=256, 128, 64 and 32, and I also regenerated the data (by editing generate_roidb_512.py and rerunning it), but I still get the same error. I am stuck at this point.

@huijuan88
Collaborator

huijuan88 commented Nov 30, 2018 via email

@mxguo

mxguo commented Dec 26, 2018

> @huijuan88 Hi, I have tried length=256, 128, 64 and 32, and I also regenerated the data (by editing generate_roidb_512.py and rerunning it), but I still get the same error. I am stuck at this point.

@viswalal Hi, I have the same problem, and the error is still there even though I tried length=256, 128, 64, 32 and 16 in generate_roidb_512.py. Have you solved this problem? Much appreciated!

6 participants