
Training NetVLAD++ process killed #28

Closed
GriesserP opened this issue Sep 21, 2021 · 2 comments

Comments

@GriesserP

Hello,
while executing python src/main.py --SoccerNet_path=my/path/to/soccernet from the SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling folder, the process gets an out-of-memory kill signal from the kernel. This doesn't happen with the reduced features, i.e. with --features ResNET152_TF2_PCA512.npy.
The signal is sent when trying to run line 131 of SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling/src/dataset.py, which is called from line 24 in SoccerNetv2-DevKit/Task1-ActionSpotting/TemporallyAwarePooling/src/main.py.

Here is the log from my /var/log/syslog:
Sep 21 11:26:03 MS-7B79 kernel: [440320.649437] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1004.slice/session-752.scope,task=python,pid=126284,uid=1004
Sep 21 11:26:03 MS-7B79 kernel: [440320.649479] Out of memory: Killed process 126284 (python) total-vm:55748164kB, anon-rss:31574096kB, file-rss:0kB, shmem-rss:4kB, UID:1004 pgtables:63392kB oom_score_adj:0
Sep 21 11:26:03 MS-7B79 kernel: [440321.292489] oom_reaper: reaped process 126284 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

My config is:
OS: Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-84-generic x86_64)
RAM: 32 GB
GPU: NVIDIA GeForce RTX 2080

Maybe I simply don't have enough RAM? Do you have any suggestions?
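
For reference, here is how I could roughly estimate how much memory the pre-extracted features take (a quick sketch; the file path below is hypothetical, the real layout follows the --SoccerNet_path and --features arguments above):

import numpy as np

# Hypothetical path to the features of one half of one game;
# adjust to the actual SoccerNet directory layout.
feat = np.load("my/path/to/soccernet/league/season/game/1_ResNET152_TF2.npy")
print(feat.shape, feat.dtype, f"{feat.nbytes / 1e9:.2f} GB in RAM")

# The PCA512 variant is 512-dimensional, so if the full features are
# 2048-dimensional, loading the whole dataset presumably needs roughly
# 4x the RAM of the reduced features.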

@SilvioGiancola
Owner

Hi @GriesserP ,

I believe your issue is simply that you do not have enough RAM; I was running with 60 or 90 GB of RAM, I cannot recall exactly.

A first attempt at solving your issue would be to reduce the batch size to something smaller (--batch_size=256 by default).
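
For instance (the value 64 below is just an arbitrary smaller example, not a tuned recommendation):

python src/main.py --SoccerNet_path=my/path/to/soccernet --batch_size=64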

Another reason for such high memory consumption may be the dataloader, which pre-processes all games by extracting the clips to train on (see that class). You could instead read those clips in __getitem__, but that will be extremely slow; see the sketch below. Alternatively, you could train on fewer games.
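
To illustrate that trade-off, here is a minimal sketch (not the actual dataset class from this repo; the feature_files list and the np.load usage are assumptions) contrasting eager pre-loading with lazy loading in __getitem__:

import numpy as np
from torch.utils.data import Dataset

class EagerFeatureDataset(Dataset):
    """Pre-loads every game's features in __init__: fast epochs, high RAM usage."""
    def __init__(self, feature_files):
        # Everything stays in RAM for the whole training run.
        self.features = [np.load(path) for path in feature_files]

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx]

class LazyFeatureDataset(Dataset):
    """Loads features on demand in __getitem__: low RAM usage, much slower I/O."""
    def __init__(self, feature_files):
        self.feature_files = feature_files  # only the paths are kept in memory

    def __len__(self):
        return len(self.feature_files)

    def __getitem__(self, idx):
        # Every sample reads from disk again, which is why this is extremely slow.
        return np.load(self.feature_files[idx])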

I hope that helps, and sorry that NetVLAD++ is so RAM-hungry :)

@GriesserP
Author

Thank you very much for your answer! I will try to work around the issue with your suggestions.
