BOINC may not use all CPUs in some cases #1775

sirzooro · 2017-01-29T09:06:10Z

I am cleaning up my work queue before the next PrimeGrid challenge, and found a case where the BOINC client does not run tasks on all available cores. Right now I have 3 rosetta@home tasks running on 3 of 8 available CPUs. There are also some ATLAS@Home and Cosmology@Home tasks waiting, but they require 7 or 8 CPUs per WU. Most projects are now set to not download new tasks, except for one with resource share set to zero. It looks like BOINC only checks whether other tasks are available in the queue, and does not try to download new ones from the zero-resource-share project while downloaded tasks are waiting. This is wrong: it should also check the CPU count those waiting tasks require and compare it with the current free CPU count, to eliminate cases like this.

I suspect that other similar cases may also exist, e.g. when some tasks are waiting but there is not enough memory to run them. Please take a look at them too.

Windows 10 64bit, BOINC 7.6.33

Edit: there is one more case. I suspended the rosetta project and BOINC started crunching one Cosmology WU. It finished that and started an ATLAS WU. The ATLAS WU required more memory, so it stopped working (status is "Waiting for memory"). Now BOINC does not use any CPU (except for the small fraction reserved for GPU and NCI tasks), even though there are other Cosmology tasks ready to start.
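The check asked for in this comment can be sketched roughly as follows. This is only an illustration of the idea, not the BOINC client's actual scheduling code: `QueuedTask`, `fits` and `should_fetch_backup_work` are hypothetical names, and the memory figures are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedTask:
    """A downloaded task waiting to run (hypothetical model, not a BOINC type)."""
    name: str
    ncpus: float   # CPUs the task needs
    mem_mb: float  # memory the task needs, in MB (illustrative values only)


def fits(task: QueuedTask, free_cpus: float, free_mem_mb: float) -> bool:
    """True if the task could start with the resources that are free right now."""
    return task.ncpus <= free_cpus and task.mem_mb <= free_mem_mb


def should_fetch_backup_work(queue: List[QueuedTask],
                             free_cpus: float,
                             free_mem_mb: float) -> bool:
    """Ask a backup project for work when CPUs are idle and nothing queued fits."""
    if free_cpus <= 0:
        return False  # nothing is idle, so there is no reason to fetch
    return not any(fits(t, free_cpus, free_mem_mb) for t in queue)


# The situation from the report: 3 of 8 CPUs busy, waiting tasks need 7 or 8 CPUs.
queue = [QueuedTask("ATLAS", 8, 4000), QueuedTask("Cosmology", 7, 2000)]
print(should_fetch_backup_work(queue, free_cpus=5, free_mem_mb=16000))  # True
```

The point of the sketch is that "there are tasks in the queue" and "there are tasks in the queue that can start now" are different conditions, and only the second one should suppress a work request.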

sirzooro · 2017-01-31T05:47:57Z

There are also two cases where a GPU may not get work. I am not sure if they belong in this issue, but they look related:

- On systems with multiple GPUs, some of them may not get work if all CPUs are busy. Details and logs from someone with 4 Titans are here: https://boinc.berkeley.edu/dev/forum_thread.php?id=10746. I also had a similar problem with my 2 GPUs, and fixed it in the same way: I created an app_config for the GPU apps to reduce the required CPU to a small value like 0.01.
- A similar problem also exists with GPU tasks which need multiple GPUs. The Moo! Wrapper project sends such WUs; it sent me ones which needed both of my 2 GPUs. For some reason the presence of such tasks in the work queue was also a problem for the scheduler, and sometimes it assigned work to only 1 GPU. All other GPU apps were configured to use a small fractional CPU part, so it looks like something related to these Moo! Wrapper tasks. When I finished crunching all downloaded WUs, BOINC started working as expected again.

This was observed on a previous Windows BOINC version (I do not remember exactly which, 7.6.23?). I did not try to reproduce it on the current version.
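For anyone wanting to try the app_config workaround described above, here is a minimal sketch that generates such a file. It assumes the documented app_config.xml layout (`<app>`, `<name>`, `<gpu_versions>`, `<gpu_usage>`, `<cpu_usage>`); the helper `build_gpu_app_config` and the app name `example_gpu_app` are placeholders, so substitute the real application name used by the project.

```python
import xml.etree.ElementTree as ET


def build_gpu_app_config(app_name: str,
                         cpu_usage: float = 0.01,
                         gpu_usage: float = 1.0) -> str:
    """Return app_config.xml text that budgets a small CPU fraction per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "example_gpu_app" is a placeholder, not a real application name.
xml_text = build_gpu_app_config("example_gpu_app")
print(xml_text)
```

The generated text would go into app_config.xml in the project's directory, after which the client needs to re-read its config files. Note that this only changes how much CPU the client budgets for the task, not how much CPU the task actually uses.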

davidpanderson · 2017-01-31T07:58:07Z

If possible, see if you can reproduce scheduling problems on the BOINC Client Emulator: http://boinc.berkeley.edu/dev/sim_web.php This makes it 100x easier for me to fix them.

…

-- David

On 1/30/2017 9:47 PM, sirzooro wrote:

There are also two cases when GPU also may not have work. I am not sure if they should go to this issue, but they looks related:

* on systems with multiple GPUs some of them may not get work if all CPUs are busy. Details and logs from someone with 4 Titans are here <https://boinc.berkeley.edu/dev/forum_thread.php?id=10746>. I also had similar problem with my 2 GPUs, and fixed it in the same way - created app_config for GPU apps to reduce requires CPU to small value like 0.01;
* similar problem also exists with GPU tasks which needs multiple GPUs. Moo! Wrapper projects sends such WUs, it sent me ones which needed both of my 2 GPU. For some reason presence of such tasks in work queue also was a problem for scheduler, sometimes it also assigned work for only 1 GPU. All other GPU apps were configured to use small fractional CPU part, so it looks like something related to these Moo! Wrapper tasks. When I finished crunching all downloaded WUs, BOINC started working as expected again.

sirzooro · 2017-02-01T16:27:43Z

Thanks for the link. I will try to play with it a bit.

sorcrosc · 2017-02-03T22:12:05Z

This also happens when the <max_concurrent> option in app_config.xml is used for one project to limit the number of tasks that run simultaneously. If BOINC has plenty of workunits for that project, it doesn't request more work from others, and some cores remain idle.
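To make the arithmetic of this case concrete, here is a small sketch that estimates how many cores stay idle when a concurrency cap applies. It is a simplification that assumes single-core tasks and is not the client's real logic; `idle_cores` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def idle_cores(total_cores: int,
               projects: List[Tuple[int, Optional[int]]]) -> int:
    """Estimate idle cores, assuming every task uses exactly one core.

    Each project is (queued_tasks, max_concurrent); None means no cap.
    """
    busy = 0
    for queued, max_concurrent in projects:
        runnable = queued if max_concurrent is None else min(queued, max_concurrent)
        busy += runnable
    return max(0, total_cores - busy)


# One project with a full queue but capped at 4 running tasks on an 8-core host.
print(idle_cores(8, [(50, 4)]))              # 4
# With uncapped work from a second project, nothing stays idle.
print(idle_cores(8, [(50, 4), (10, None)]))  # 0
```

In the first call the queue looks full (50 tasks), which is presumably why no further work is requested, yet only 4 of those tasks can ever run at once.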

sirzooro · 2017-03-20T07:06:57Z

One more issue, just reported on the WUProp forum:

Just in case anyone encounters the same issue. A couple of inactive NCI projects prevented me from getting any work for any hardware on one system today. BM (7.6.33 [x64]) event log:
Not requesting tasks: don't need (CPU: not highest priority project; Miner ASIC: not highest priority project; NVIDIA GPU: not highest priority project)

Had run out of Asic & GPU work and was about to run out of CPU work (only 2/7 logical cores being used). BM just kept asking for nci work (PoD style) & ignored the other projects/devices completely.
Serious scheduler bug IMO + stupid error message (CPU isn't a project, even if the code deludes itself into thinking otherwise).

Toby-Broom · 2017-04-12T20:54:58Z

I see the same as sorcrosc on LHC, as they have a job limit of 24. If I set this project to unlimited for the Sixtrack app, it queues tasks based on the BOINC cache settings. If I set it to 24, it queues no tasks: it just runs up to that limit, and when one task finishes it gets another.

sirzooro · 2017-06-23T06:13:03Z

One more case (maybe a duplicate of one already mentioned): DENIS is performing some maintenance work now. It sends WUs, but their input files cannot be downloaded, so the WUs end with "download error". This somehow prevents downloading WUs from Asteroids, my backup project. I saw this in the log when I tried to manually update the project to download new WUs:

300324 Asteroids@home 2017-06-23 07:24:28 Sending scheduler request: Requested by user.
300325 Asteroids@home 2017-06-23 07:24:28 Not requesting tasks: don't need (not highest priority project)

It looks like these faulty DENIS WUs prevented downloads of other WUs from the backup project. I had 16 of them in the queue. The remaining 16 CPUs were getting WUs from Asteroids as expected. This was on BOINC 7.6.22 for Linux.

sirzooro · 2017-07-27T18:51:10Z

One more case, and this one is interesting. I am crunching "GFN-13 Prime Search" from "PRIVATE GFN SERVER" (run by stream, https://www.primegrid.com/forum_thread.php?id=6511). One of the results for a completed WU could not be uploaded, and somehow it prevented downloading of new WUs from this project; the BOINC client switched to the backup project. This is what I found in the log:

225112	PRIVATE GFN SERVER	2017-07-27 17:32:53	Requesting new tasks for CPU	
225113	PRIVATE GFN SERVER	2017-07-27 17:32:59	Scheduler request completed: got 0 new tasks	
225114	PRIVATE GFN SERVER	2017-07-27 17:32:59	Result gfn13_72132256_1499672386_1 is no longer usable	
225115	PRIVATE GFN SERVER	2017-07-27 17:32:59	No tasks sent

I aborted this upload and requested a project update. After doing this, new WUs were downloaded without a problem:

226443	PRIVATE GFN SERVER	2017-07-27 20:04:54	update requested by user	
226444	PRIVATE GFN SERVER	2017-07-27 20:04:56	Sending scheduler request: Requested by user.	
226445	PRIVATE GFN SERVER	2017-07-27 20:04:56	Reporting 1 completed tasks	
226446	PRIVATE GFN SERVER	2017-07-27 20:04:56	Requesting new tasks for CPU	
226447	PRIVATE GFN SERVER	2017-07-27 20:05:01	Scheduler request completed: got 15 new tasks

I am not sure if this is a problem with the client or the server; it may be on either side.

Toby-Broom · 2017-07-27T20:05:21Z

Another example: if a task goes into the "VM unmanageable" state, the client depletes the queued tasks and just sits there with 1 bad task until you abort it; then it reloads n tasks.

sirzooro · 2017-09-15T18:38:51Z

And the next one: I configured one project via app_config.xml to use 22 out of 32 cores. The remaining 10 were left for another project with very short tasks. That second project also has a very limited WU supply, so BOINC was not able to build a buffer for it. As a result, BOINC kept downloading tasks from the first project until it filled the work queue. At that point it stopped trying to download tasks from the second project because the queue was full, so the 10 cores reserved for it were idle.

davidpanderson · 2017-09-15T23:05:09Z

Can you reproduce this on the client emulator?
https://boinc.berkeley.edu/dev/sim_web.php
That makes it easier for me to fix the problem.

Toby-Broom · 2017-09-18T05:18:21Z

My PC hit the "VM unmanageable" state. Here is a sim with the required files; I didn't look to see whether the sim was blocked: https://boinc.berkeley.edu/dev/sim_web.php?action=show_simulation&scen=154&sim=0

Here is one with the 24-job limit:
https://boinc.berkeley.edu/dev/sim_web.php?action=simulation_form&scen=155

ChristianBeer added the labels C: Client - Scheduler Policy, E: to be determined, P: Major, T: Defect on Jan 30, 2017

ChristianBeer added this to the Client/Manager 8.0 milestone Apr 12, 2017

ChristianBeer added the C: Client - Logging label Apr 12, 2017

Ageless93 added this to Backlog in Client and Manager via automation Nov 11, 2017

AenBleidd added this to To do in BOINC Client/Manager Oct 28, 2019

AenBleidd mentioned this issue Apr 29, 2021

Only one GPU is used. #4352

Closed

AenBleidd removed this from Backlog in Client and Manager Aug 14, 2023
