BOINC may not use all CPUs in some cases #1775
Comments
There are also two cases when the GPU may not get work. I am not sure if they should go in this issue, but they look related:
* on systems with multiple GPUs, some of them may not get work if all CPUs are busy. Details and logs from someone with 4 Titans are here: https://boinc.berkeley.edu/dev/forum_thread.php?id=10746. I also had a similar problem with my 2 GPUs, and fixed it the same way: I created an app_config for the GPU apps to reduce the required CPU to a small value like 0.01;
* a similar problem also exists with GPU tasks which need multiple GPUs. The Moo! Wrapper project sends such WUs; it sent me ones which needed both of my 2 GPUs. For some reason the presence of such tasks in the work queue was also a problem for the scheduler; sometimes it assigned work to only 1 GPU. All other GPU apps were configured to use a small fractional CPU part, so it looks like something related to these Moo! Wrapper tasks. When I finished crunching all downloaded WUs, BOINC started working as expected again.
This was observed on a previous Windows BOINC version (I do not remember exactly; 7.6.23?). I did not try to reproduce it on the current version. |
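The workaround mentioned in this thread, lowering the CPU reserved per GPU task, is done with an `app_config.xml` file in the project's directory. A minimal sketch, with a placeholder app name (the real name comes from the project's `client_state.xml`):

```xml
<app_config>
    <app>
        <name>example_gpu_app</name>  <!-- placeholder: use the project's real app name -->
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>   <!-- one GPU per task -->
            <cpu_usage>0.01</cpu_usage>  <!-- reserve almost no CPU, so busy CPUs don't block GPU work -->
        </gpu_versions>
    </app>
</app_config>
```

After saving the file, the client picks it up on "Read config files" or restart.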
If possible, see if you can reproduce scheduling problems on the BOINC Client Emulator:
http://boinc.berkeley.edu/dev/sim_web.php
This makes it 100x easier for me to fix them.
-- David
On 1/30/2017 9:47 PM, sirzooro wrote:
There are also two cases when the GPU may not get work. I am not sure if they should go in this issue, but they look related:
* on systems with multiple GPUs, some of them may not get work if all CPUs are busy. Details and logs from someone with 4 Titans are here <https://boinc.berkeley.edu/dev/forum_thread.php?id=10746>. I also had a similar problem with my 2 GPUs, and fixed it the same way: I created an app_config for the GPU apps to reduce the required CPU to a small value like 0.01;
* a similar problem also exists with GPU tasks which need multiple GPUs. The Moo! Wrapper project sends such WUs; it sent me ones which needed both of my 2 GPUs. For some reason the presence of such tasks in the work queue was also a problem for the scheduler; sometimes it assigned work to only 1 GPU. All other GPU apps were configured to use a small fractional CPU part, so it looks like something related to these Moo! Wrapper tasks. When I finished crunching all downloaded WUs, BOINC started working as expected again.
|
Thanks for the link. I will try to play with it a bit. |
This also happens when the <max_concurrent> option is used in app_config.xml for one project to limit the number of tasks that run simultaneously. If BOINC has plenty of workunits for such a project, it doesn't request more work from the others and some cores remain dry |
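For reference, `<max_concurrent>` is a per-app option in `app_config.xml`; a minimal sketch of the kind of file being described (the app name is a placeholder):

```xml
<app_config>
    <app>
        <name>example_app</name>            <!-- placeholder for the project's real app name -->
        <max_concurrent>4</max_concurrent>  <!-- run at most 4 tasks of this app at once -->
    </app>
</app_config>
```

The reported problem is that once this cap is active and the queue holds enough tasks of the capped app, the client stops fetching work from other projects even though cores are idle.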
One more issue, just reported on the WUProp forum:
|
I see the same as sorosc on LHC, as they have a job limit of 24. If I set this project to unlimited for the Sixtrack app, then it will queue based on BOINC's cache settings; if I set it to 24, then it queues no tasks, it just runs up to that limit and when one task finishes it gets another. |
One more case (maybe a duplicate of one already mentioned): DENIS is performing some maintenance work now and it sends WUs, but the input files cannot be downloaded, so the WUs end with a "download error". This somehow prevents downloading WUs from Asteroids, my backup project. I saw this in the log when I tried to manually update the project to download new WUs:
It looks like these faulty DENIS WUs prevented downloads of other ones from the backup project. I had 16 of them in the queue. The remaining 16 CPUs were getting WUs from Asteroids as expected. This was on BOINC 7.6.22 for Linux. |
One more case, and this one is interesting. I am crunching "GFN-13 Prime Search" from the "PRIVATE GFN SERVER" (run by stream, https://www.primegrid.com/forum_thread.php?id=6511). One of the results for a completed WU could not be uploaded, and somehow this prevented downloading new WUs from this project; the BOINC client switched to the backup project. This is what I found in the log:
I aborted this upload and requested a project update. After doing this, new WUs were downloaded without problems:
I am not sure if this is a problem with the client or the server; it may be on either side. |
Another example here: if a task goes to the state "VM unmanageable", it depletes the queued tasks and just sits there with 1 bad task until you abort it; then it reloads n tasks. |
And the next one: I configured one project via app_config.xml to use 22 out of 32 cores. The remaining 10 were left for another project with very short tasks. That 2nd project also has a very limited WU supply, so BOINC was not able to build a buffer for it. As a result, BOINC kept downloading tasks from the 1st project until it filled the work queue. At that point it stopped trying to download tasks from the 2nd project because the queue was full, so the 10 cores reserved for it were idle. |
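The exact app_config.xml used above is not shown; assuming the 22-core cap was set per project rather than per app, it would look something like this sketch using the project-wide option:

```xml
<app_config>
    <!-- cap this project at 22 concurrent tasks, leaving 10 cores for the other project -->
    <project_max_concurrent>22</project_max_concurrent>
</app_config>
```

The same idle-core symptom follows either way: the cap limits what runs, but not how much the client is willing to buffer, so the buffered tasks of the capped project crowd out fetches for the other one.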
Can you reproduce this on the client emulator? |
My PC became VM unmanageable; here is a sim with the required files. I didn't look to see if the sim was blocked: https://boinc.berkeley.edu/dev/sim_web.php?action=show_simulation&scen=154&sim=0 Here is one with the 24 job limit |
I am cleaning up my work queue before the next PrimeGrid challenge, and found a case where the BOINC client does not run tasks on all available cores. Right now I have 3 rosetta@home tasks running on 3 out of 8 available CPUs. There are also some ATLAS@Home and Cosmology@Home tasks waiting, but they require 7 or 8 CPUs per WU. Most projects are currently set to not download new tasks, except for one with zero resource share set. It looks like BOINC only checks whether there are other tasks available in the queue, and does not try to download new ones from the zero-resource-share project while some downloaded tasks are waiting. This is wrong: it should also check the required CPU count of the waiting tasks and compare it with the current free CPU count to eliminate cases like this.
I suspect that other similar cases may also exist, e.g. when some tasks are waiting but there is not enough memory to run them; please take a look at those too.
Windows 10 64bit, BOINC 7.6.33
Edit: there is one more case. I suspended the Rosetta project and BOINC started crunching one Cosmology WU. It finished it and started an ATLAS WU. That one required more memory, so it stopped running (status "Waiting for memory"). Now BOINC does not use any CPU (except for the small fraction reserved for GPU and NCI tasks), even though there are other Cosmology tasks ready to start.
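The check proposed above can be sketched as follows. This is a hypothetical illustration, not actual BOINC client code; the function and its inputs are made up for the example. The idea: before concluding that waiting tasks make a work fetch unnecessary, compare each waiting task's CPU requirement against the CPUs that are actually free (memory could be checked analogously).

```python
# Hypothetical sketch of the proposed scheduler check (not real BOINC code).

def should_fetch_from_backup(ncpus, running, waiting):
    """running/waiting are lists of per-task CPU requirements.

    Return True if the backup (zero-resource-share) project should be
    asked for work, because no waiting task fits on the free CPUs.
    """
    free = ncpus - sum(running)
    if free <= 0:
        return False
    # A waiting task only helps if it actually fits on the free CPUs.
    runnable_waiting = [w for w in waiting if w <= free]
    return not runnable_waiting

# The reported case: 8 CPUs, 3 single-CPU rosetta tasks running,
# ATLAS/Cosmology tasks waiting that need 7 or 8 CPUs each.
print(should_fetch_from_backup(8, [1, 1, 1], [7, 8]))  # True: 5 CPUs free, nothing fits
```

A check like this would have fetched backup work in the 3-of-8-CPUs case above, since the 7- and 8-CPU tasks cannot start on the 5 free cores anyway.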