
GA3C source code has High CPU usage causing System freeze or crash #20

Open
developeralgo8888 opened this issue Mar 19, 2017 · 6 comments

Comments

@developeralgo8888

developeralgo8888 commented Mar 19, 2017

The code runs fine, but CPU and memory usage keep growing until the system crashes. I am monitoring it with the Glances diagnostic tool (pip install glances). If you leave the code running for a long time, you will notice that CPU context switches increase substantially and CPU and memory usage keep climbing until the code hangs or crashes. CPU usage rose from 6.7% to 64% and memory usage from 10% to 79%, at which point the system froze. Meanwhile, the NVIDIA TITAN X (Maxwell, 12 GB) is only using about 300 MB of its 12 GB. So although most of the heavy lifting should be offloaded to the GPU, that does not seem to be happening here. I have 8 x TITAN X (Maxwell) GPUs with 2 x Intel Xeon 2660 v3
(2 CPUs with 40 CPU cores in total) and 128 GB of DDR4 memory, and I can use any of them. I still get the same result: CPU usage keeps increasing.

Any insights?

Other implementations, whether the original A3C or various hybrid (CPU and GPU) versions, seem to offload most of the heavy lifting to the GPU and cause no system freezes, but that is not the case with GA3C.

I am testing it with various amounts of data and games.
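To put numbers on the growth described above for a single process (Glances reports system-wide figures), one option is to sample the process's resident memory at fixed intervals and fit a least-squares slope: a slope that stays clearly positive over hours points to a leak rather than warm-up. This is a minimal sketch, assuming Linux (it reads /proc/<pid>/statm); the helper names are illustrative and not part of GA3C.

```python
import os
import time


def current_rss_bytes(pid="self"):
    """Current resident set size of a process, in bytes (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        # The second field of /proc/<pid>/statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def growth_rate(samples):
    """Least-squares slope, in bytes per second, of (timestamp, rss_bytes) pairs."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var


def monitor(pid="self", interval_s=60.0, count=10):
    """Take `count` RSS samples `interval_s` seconds apart and return the growth rate."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        samples.append((time.monotonic(), current_rss_bytes(pid)))
    return growth_rate(samples)
```

Pointing `monitor` at the training process's PID with `interval_s=600` and `count=73` would cover 12 hours (72 ten-minute intervals) and return the average growth in bytes per second.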

@mbz
Contributor

mbz commented Mar 19, 2017

That's an interesting observation. I've tested the code on a Maxwell TITAN X myself and didn't observe such behavior. Can you please share the versions of your libraries (Python, TensorFlow, CUDA, ...)? My (blind) guess is that this is a problem with TensorFlow. It would also be great if you could share your motherboard specs, since PCI-E is the bottleneck here.
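Collecting the requested library versions can be scripted with the standard library alone. A minimal sketch: the package names queried are assumptions (both tensorflow and tensorflow-gpu are checked because either pip package may be installed), and the CUDA and driver versions still have to be read from `nvcc --version` or `nvidia-smi`.

```python
import platform
from importlib import metadata


def environment_report(packages=("tensorflow", "tensorflow-gpu", "numpy")):
    """Collect Python, OS and package versions to paste into a bug report."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```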

Two side notes:

  1. The low memory usage is due to the small model size. Please note that neither A3C nor GA3C has any "experience memory", so they do not use GPU memory as storage; the only stored object is the model itself. I would, however, be interested in your GPU utilization (check with the nvidia-smi command).
  2. The current version of the code is single-GPU, so you cannot currently use more than one GPU.
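The GPU-utilization check suggested in point 1 can also be logged programmatically, next to CPU and memory readings. A minimal sketch, assuming `nvidia-smi` is on the PATH: the query fields are standard `nvidia-smi --query-gpu` properties, while the function names are illustrative. With the single-GPU code, only one device would be expected to show activity.

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a numeric nvidia-smi field; values such as '[N/A]' become None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_query(csv_text):
    """Parse `--format=csv,noheader,nounits` output for QUERY_FIELDS into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = line.split(",")
        gpus.append({
            "index": _to_int(index),
            "utilization_pct": _to_int(util),    # percent
            "memory_used_mib": _to_int(used),    # MiB, because of `nounits`
            "memory_total_mib": _to_int(total),  # MiB
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_query(result.stdout)
```

From a shell, adding `-l 5` to the same nvidia-smi command prints a reading every 5 seconds instead.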

@ifrosio
Collaborator

ifrosio commented Mar 19, 2017 via email

@mbz
Contributor

mbz commented Mar 19, 2017

@ifrosio, that's a very good point. @developeralgo8888, please try with DYNAMIC_SETTINGS=False.
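If a flag such as DYNAMIC_SETTINGS is overridden from the command line as a KEY=VALUE string, booleans need an explicit parse: in Python, bool("False") is True, so casting the string with the default value's type silently leaves the flag enabled. A minimal sketch of a safer override parser, assuming a hypothetical Config class with class-level defaults (the class, its default values, and `apply_overrides` are illustrative, not GA3C's Config.py).

```python
class Config:
    """Hypothetical config holder; defaults are illustrative only."""
    DYNAMIC_SETTINGS = True
    AGENTS = 32


_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def _parse_bool(text):
    lowered = text.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def apply_overrides(config, args):
    """Apply KEY=VALUE strings to attributes of `config`, typed by their defaults."""
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not hasattr(config, key):
            raise ValueError(f"bad override: {arg!r}")
        default = getattr(config, key)
        if isinstance(default, bool):
            # bool("False") would be True, so parse the text explicitly.
            parsed = _parse_bool(value)
        else:
            parsed = type(default)(value)
        setattr(config, key, parsed)
```

A launcher would call `apply_overrides(Config, sys.argv[1:])`, so `DYNAMIC_SETTINGS=False` actually turns the flag off.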

@developeralgo8888
Author

High_CPU_and_memory.txt

@developeralgo8888
Author

Please find it attached. I restarted the run, and usage has started increasing again as it goes.

@developeralgo8888
Author

With DYNAMIC_SETTINGS=False, CPU usage remains stable, but there is still a memory leak: memory keeps increasing until the system freezes.

I have attached snapshots taken roughly 12 hours apart.
High_CPU_and_memory.txt
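With CPU usage now stable but memory still climbing, one way to narrow the leak down to a source line is to diff two tracemalloc snapshots taken some time apart. This only sees allocations made through Python's allocators; memory allocated natively (for example most of TensorFlow's C++ runtime) will not show up, and each process needs its own tracing if workers run in separate processes. A minimal sketch; where to take the snapshots in the training loop is left as an assumption, and the helper names are illustrative.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Source lines whose traced allocations grew the most between two snapshots."""
    diffs = after.compare_to(before, "lineno")  # sorted by |size_diff|, largest first
    return [d for d in diffs if d.size_diff > 0][:limit]


def report(diffs):
    """Format StatisticDiff entries as 'file:line +N KiB' strings."""
    lines = []
    for d in diffs:
        frame = d.traceback[0]
        lines.append(f"{frame.filename}:{frame.lineno} +{d.size_diff / 1024:.1f} KiB")
    return lines
```

In use, `tracemalloc.start()` would be called early in the process, one snapshot taken after warm-up and another hours later, then `print("\n".join(report(top_growth(first, second))))`.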
