Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] High CPU usage in certain circumstances #106

Closed
yochananmarqos opened this issue Nov 26, 2019 · 84 comments
Closed

[BUG] High CPU usage in certain circumstances #106

yochananmarqos opened this issue Nov 26, 2019 · 84 comments
Assignees
Labels

Comments

@yochananmarqos
Copy link
Contributor

Clight version the issue has been seen with:

4.0

Used distribution:

Manjaro GNOME

Describe the bug
A few things:

  • At the top of the log, it says:
(W)[10:32:14]{opts.c:194}	SCREEN forcefully disabled as Clightd was built without screen support.

Is there a missing Cmake flag in the PKGBUILD?

  • When I start my laptop (Dell Inspiron 5575) on battery power, the clight process uses a lot of CPU and no log is generated. If I kill clight, restart the clightd.service and start clight again, everything is fine.

  • When keyboard backlight calibration is on, the backlight does not come on at all in the dark. It kind of defeats the purpose of having a keyboard backlight.

  • When dpms is enabled, the screen turns of while watching a full-screen video. I use SMPlayer.

Expected behavior
Self-explanatory.

To Reproduce
Normal use.

clight.log

@FedeDP
Copy link
Owner

FedeDP commented Nov 26, 2019

Hi! Thanks for reporting these issues!

At the top of the log, it says:

Fixed in Clightd 4.0-2 (https://aur.archlinux.org/packages/clightd/). Thanks!

When dpms is enabled, the screen turns of while watching a full-screen video. I use SMPlayer.

SMplayer uses mpv as backend. Unfortunately org.freedesktop.ScreenSaver support is broken in mpv at the moment. I have the same issue. I just right click on clight desktop file and select "Pause backlight calibration"; or you can use eg: vlc!
Hopefully it will be fixed soon.

When keyboard backlight calibration is on, the backlight does not come on at all in the dark. It kind of defeats the purpose of having a keyboard backlight.

This is weird! Can you report here output of: busctl call org.freedesktop.UPower /org/freedesktop/UPower/KbdBacklight org.freedesktop.UPower.KbdBacklight GetMaxBrightness ? Thanks!

When I start my laptop (Dell Inspiron 5575) on battery power, the clight process uses a lot of CPU and no log is generated. If I kill clight, restart the clightd.service and start clight again, everything is fine.

This is weird; i cannot reproduce it here. Your log seems fine.
Can you paste here a log from a clight fault run?

@yochananmarqos
Copy link
Contributor Author

yochananmarqos commented Nov 26, 2019

Fixed in Clightd 4.0-2 (https://aur.archlinux.org/packages/clightd/).

Aha! I thought there was a Cmake flag missing. 😄️ Thanks!

output of: busctl call org.freedesktop.UPower /org/freedesktop/UPower/KbdBacklight org.freedesktop.UPower.KbdBacklight GetMaxBrightness

i 2

I re-enabled keyboard backlight calibration and I'll see if I can catch something abnormal in the log.

@FedeDP
Copy link
Owner

FedeDP commented Nov 26, 2019

Note how kbd backlight is set to (1.0 - backlight_level) * max_kbd_backlight (in your case 2): https://github.com/FedeDP/Clight/blob/master/src/modules/backlight.c#L268.
Moreover, it is rounded to nearest integer, thus it should be set to "2" on low ambient brightness, ie: your maximum!

Let me know about the other issue too! Thanks!

@NICHOLAS85
Copy link
Contributor

I’ve also experienced the issue with the clight process using a lot of cpu a couple times. I haven’t been able to pinpoint when this occurs or what causes it, I usually only notice once I feel my laptop warming up abnormally. Killing and starting the clight process fixes this. I’ll leave on the verbose setting and attempt to share the log the next time this occurs.

@FedeDP
Copy link
Owner

FedeDP commented Nov 26, 2019

Great, thanks for joining! I have been using clight devel for months and it didn't occur here, or I didn't notice...

EDIT: care to share your configs @yochananmarqos @NICHOLAS85 ?

@NICHOLAS85
Copy link
Contributor

@yochananmarqos
Copy link
Contributor Author

Here's my clight.conf.

@yochananmarqos
Copy link
Contributor Author

I noticed a webcam error in my journal after rebooting:

Nov 26 20:44:36 inspirognome kernel: uvcvideo: Failed to query (SET_CUR) UVC control 10 on unit 2: -32 (exp. 2).
Nov 26 20:44:36 inspirognome kernel: uvcvideo: Failed to query (SET_CUR) UVC control 4 on unit 1: -32 (exp. 4).

Only one error in the clight.log:

(D)[20:44:36]{bus.c:279}	get_screen_brightness(): Input/output error

@FedeDP
Copy link
Owner

FedeDP commented Nov 27, 2019

Here’s a link to my current config file:

Here's my clight.conf.

You both have dimmer dpms and gamma disabled. I will try with this config if i can reproduce the issue!

I noticed a webcam error in my journal after rebooting:

That should not be an issue.

Only one error in the clight.log:

This is interesting. It seems that XGetImage() fails here: https://github.com/FedeDP/Clightd/blob/master/src/modules/screen.c#L95.

@FedeDP FedeDP self-assigned this Nov 27, 2019
@FedeDP FedeDP added the bug label Nov 27, 2019
@yochananmarqos
Copy link
Contributor Author

I can't reproduce the issue with the keyboard backlight now. Maybe it was a fluke or kernel 5.4 fixed it. It fixed not being able to resume from suspend, so you never know. ¯\_(ツ)_/¯

@FedeDP
Copy link
Owner

FedeDP commented Nov 28, 2019

Let's focus on high cpu bug then!

@NICHOLAS85
Copy link
Contributor

clight.log
The high cpu bug has occurred again, attached is a log. Some info is I recently rebooted. Additionally, I built clightd with SCREEN support but this log states it was built without at the top. After restarting clight that message at the top dissapears.

@FedeDP
Copy link
Owner

FedeDP commented Nov 30, 2019

Mmmh I can't see anything wrong there. Is clightd.service enabled?

@NICHOLAS85
Copy link
Contributor

Yes, heres an output

$ systemctl status clightd
● clightd.service - Bus service to manage various screen related properties (gamma, dpms, backlight)
   Loaded: loaded (/lib/systemd/system/clightd.service; disabled; vendor preset: enabled)
   Active: active (running) since Sat 2019-11-30 03:21:47 EST; 1h 58min ago
 Main PID: 12989 (clightd)
   Memory: 3.0M
   CGroup: /system.slice/clightd.service
           └─12989 /usr/lib/clightd/clightd

Nov 30 03:21:47 Nicks-Laptop systemd[1]: Starting Bus service to manage various screen related properties (gamma, dpms, backlight)...
Nov 30 03:21:47 Nicks-Laptop clightd[12989]: Registered Als sensor.
Nov 30 03:21:47 Nicks-Laptop clightd[12989]: (watch_displays_using_udev) Starting
Nov 30 03:21:47 Nicks-Laptop clightd[12989]: Registered Camera sensor.
Nov 30 03:21:47 Nicks-Laptop systemd[1]: Started Bus service to manage various screen related properties (gamma, dpms, backlight).

@FedeDP
Copy link
Owner

FedeDP commented Nov 30, 2019

It seems disabled though:
Loaded: loaded (/lib/systemd/system/clightd.service; **disabled**; vendor preset: enabled)

Btw stil unable to reproduce it here unfortunately, even with your configs :(

@yochananmarqos
Copy link
Contributor Author

Here's a log where the keyboard backlight did not come on in a dark room and and it did not adjust the backlight at all.
clight.log

@FedeDP
Copy link
Owner

FedeDP commented Nov 30, 2019

(D)[20:03:08]{bus.c:279} get_screen_brightness(): Input/output error
(D)[20:03:08]{timer.c:40} Disarmed timerfd on fd 25.
(D)[20:03:09]{backlight.c:317} Average frames brightness: 0.000000.
(I)[20:03:09]{backlight.c:252} Ambient brightness: 0.000. Clogged capture detected.

It seems like Clightd is not working well here: SCREEN module is failing to retrieve screen brightness and webcam captures return 0.00000 ambient brightness.
Consequently, Clight treats 0.0000 value as clogged capture and keyboard backlight is not touched (neither is screen backlight).

I always though 0.0000 was an impossible number for ambient brightness...is your room this dark? I think Clightd is misbehaving here...

@yochananmarqos
Copy link
Contributor Author

Yes, it was very dark. I was watching videos at night. I just updated to kernel 5.4.1, I'll see if there's any change tonight.

@yochananmarqos
Copy link
Contributor Author

The log was the same tonight, but I caught it chewing on the CPU again. When it happens, it stops logging.

The uvcvideo errors in the journal have to have something to do with it as it wasn't happening before v4.0.

@FedeDP
Copy link
Owner

FedeDP commented Dec 2, 2019

Yes, it was very dark. I was watching videos at night. I just updated to kernel 5.4.1, I'll see if there's any change tonight.

Still i think that 0.00000 is an unreachable value...the issue itself can be quickly fixed (i should just set shutter_threshold default value to < 0 when conf option is commented), i'd like to understand if it is a normal value (if you can reproduce the "0.00000" ambient brightness it would be great!).
May be it is just you webcam's quirk :)

The log was the same tonight, but I caught it chewing on the CPU again. When it happens, it stops logging.

Interesting! I was not able to reproduce it yet.
Are you sure it is Clight and not Clightd that is using CPU? Did something happened before (eg: system was resumed from suspend...)?

The uvcvideo errors in the journal have to have something to do with it as it wasn't happening before v4.0.

It is because Clightd is now always resetting webcam default values before capturing to avoid captures with different parameters!
It the webcam still when those errors appear in dmesg? Can you check with any application?

@NICHOLAS85
Copy link
Contributor

@FedeDP, sorry for the late response. It is true clightd was somehow disabled at some point but still functioning. I've re-enabled it for now and will continue to try to narrow down the environment causing clight to get stuck using high cpu. I can also confirm that the process is clight and not clightd which has this CPU problem. I believe I've noticed it happening most after wakes from hibernation, but not 100% of the time.

@FedeDP
Copy link
Owner

FedeDP commented Dec 3, 2019

I've re-enabled it for now and will continue to try to narrow down the environment causing clight to get stuck using high cpu.

Thank you very much!

I can also confirm that the process is clight and not clightd which has this CPU problem.

Great information, thanks.

I believe I've noticed it happening most after wakes from hibernation, but not 100% of the time.

That is unfortunate...my laptop does not support hibernation very well thus i would not be able to test it...

@FedeDP
Copy link
Owner

FedeDP commented Dec 16, 2019

Any news on this topic?
I am really willing to kill this bug! :)

@yochananmarqos
Copy link
Contributor Author

The high CPU issue hasn't resurfaced at all for me.

@FedeDP
Copy link
Owner

FedeDP commented Dec 17, 2019

That is great! But it still leaves us with some unresolved questions...

@NICHOLAS85
Copy link
Contributor

High CPU resurfaced once again, it had been quite a while since it last occured. I rebooted after updating many packages including my kernel and such. Noticed the high cpu shortly after the reboot.

Log attached: clightlog.txt

@FedeDP
Copy link
Owner

FedeDP commented Dec 24, 2019

Once more i can only see an unusual error about get_screen_brightness call.
Can you try to disable SCREEN module and see if issue does persist?
Thank you and merry xmas!

@NICHOLAS85
Copy link
Contributor

I've set no_screen to true and I'll let you know if it occurs again. Merry Christmas!

@FedeDP
Copy link
Owner

FedeDP commented Jun 22, 2020

I believe this has happened consistently after waking from hibernation or post reboot.

Yup it is not an issue; ie: it means that Clight is restarted before your webcam/screen is properly restarted, thus it may fail to eg: change gamma temperature/get screen brightness/use webcam.

Yes this occurred with the latest Clight-git (767e184) unfortunately.

That is very unfortunate :(

@finkrer
Copy link

finkrer commented Jul 9, 2020

image
Oh no, here it is again.

The log looks like this.

@FedeDP
Copy link
Owner

FedeDP commented Jul 9, 2020

Unfortunately once more nothing is shown in there :(

I am currently unable to fix it anytime soon, as i was hit by that like 2 times in 6 months; i will probably release clight 4.1 soon, even if this bug persists, as it is too rare to block a release with so many improvements and bugfixes.

After that, i will hopefully find the time to carry on libmodule 6.0.0 (https://github.com/FedeDP/libmodule/tree/devel) and consequently clight + clightd porting to new API.
Finally, i really hope that new libmodule can fix it, as i am starting to think that this issue is a bug from libmodule indeed.

@FedeDP FedeDP pinned this issue Sep 8, 2020
@FedeDP
Copy link
Owner

FedeDP commented Dec 20, 2020

Hi everyone!
libmodule 5.0.1 was released; it contains a really small yet important fix on epoll loop error management.
I don't really know if that will fix this issue but it was worth a try.
Let me know if issue arises again with libmodule 5.0.1!

@finkrer
Copy link

finkrer commented Jan 6, 2021

libmodule 5.0.1-1, clight 4.2-2, clightd 5.1-1.

Still there unfortunately.

@NICHOLAS85
Copy link
Contributor

Just got hit with it again after months unfortunately with the latest commits as of today.
b44c1e1
FedeDP/libmodule@3f60063

@FedeDP
Copy link
Owner

FedeDP commented Jan 9, 2021

Sigh 😔 that was worth a try.
I am currently back at working on libmodule 6.0.0 that will bring loooots of changes and improvements.
Hopefully it will fix this too.
It feels bad being unable to fix this; i could add some debug print here and there but it takes so much time waiting on issue to appear that I fear it would be pretty useless.

@tduboys
Copy link

tduboys commented Jan 17, 2021

Hi all,
I just installed Arch on a new desktop hardware, and got the CPU-load error with libmodule 5.0.1-1, clight 4.2-2, clightd 5.1-1.

the only error I see on logs is : (D)[22:36:12]{bus.c:175} set_backlight_level(): Connection timed out
I disabled screen module because I got this error on loop. Since, I got it only once, but didnt change the CPU-load issue.
If there is someone I can do for help on this issue, please ask

I never had issue with clight on other installs.

@promeneur
Copy link
Collaborator

Clight and clightd master branch 12/2/2021
openSuse 15.2

Each time I start clight, after a delay it consumes 100 % of one core. I must terminate clight. I can't use clight.

No problem with package clight 4.2 and clightd 5.1.

@FedeDP
Copy link
Owner

FedeDP commented Feb 12, 2021

It this 100% reproducible? Does it happen every time?
(i obviously am not able to reproduce it, it's working here)

@promeneur
Copy link
Collaborator

It's a random phenomenon.

No problem for 20 mn.

@promeneur
Copy link
Collaborator

promeneur commented Feb 13, 2021

clight.conf.txt

clight.log

Again this morning.
After 3 h without any problem
Just (about 20 s) before dimmer switch off the screen (I get a black screen after 5 mn)
I had to terminate clight.

@FedeDP
Copy link
Owner

FedeDP commented Feb 13, 2021

Just (about 20 s) before dimmer switch off the screen (I get a black screen after 5 mn)

I don't think that's the root cause; anyway thank you very much! You are getting hitted by lots of weird bugs, i am sorry.
I pushed a new commit that should help tracking the module that is causing the high cpu usage. Can you upgrade clight?
Hopefully, as you are hit often, we will be able to track this down!

@promeneur
Copy link
Collaborator

I updated yesterday. Is there a new master branch version today ?

@FedeDP
Copy link
Owner

FedeDP commented Feb 13, 2021

Just pushed!

@FedeDP
Copy link
Owner

FedeDP commented Feb 15, 2021

To everyone: i recently added some more debug logs that should greatly help finding the root cause; i am optimist that we can fix this!
If you get hit by the bug, before killing clight, please do something like:

busctl --user set-property org.clight.clight /org/clight/clight/Conf org.clight.clight.Conf Verbose "b" true

Then wait some seconds and finally kill clight.
Then please share the log with me!
Thank you very much!

@FedeDP
Copy link
Owner

FedeDP commented Feb 19, 2021

User @NICHOLAS85 got it by this with new clight master. I can now tell that the issue comes from "Inhibit" dbus monitor that seems to get crazy and continuously report monitor fd as readable.
I added a new DEBUG print; it would be of great help if you all could upgrade to new clight master. Thank you!

@promeneur
Copy link
Collaborator

3 mn => 560 MB log !

i give you an extract : the beginning and the end.

clight_extract.log

The problem occurs at the same time there were many disk accesses.

@FedeDP
Copy link
Owner

FedeDP commented Feb 20, 2021

@promeneur yay!! Thank you very much! I will fix this bug finally O.O

FedeDP added a commit that referenced this issue Feb 20, 2021
…oping on our dup fd; instead, close our own monitor bus connection and open a new one.
@FedeDP
Copy link
Owner

FedeDP commented Feb 20, 2021

New git master should have a proper fix!
I eagerly await for your tests!
@promeneur you seem to get hit highly often, thus you are the better tester available :)
I'd give us a week of test, if no issue appear, this is fixed!
Moreover, please note that you will find a warn message "Connection reset by peer" in your logs if the issue would have happened with old version.

It seems like the "inhibit" dbus monitor connection is closed by peer (ie: dbus server most probably; not clight) for some reasons; and i was not actually checking the condition.
Now i will gracefully close my side of connection too, and open a new one to go on monitoring.

@FedeDP FedeDP unpinned this issue Feb 24, 2021
@FedeDP
Copy link
Owner

FedeDP commented Feb 25, 2021

This bug is (finally) fixed in latest Clight master!
Thanks to all of you, that greatly helped to debug it; really i appreciated every single minute you spent testing over and over, or just taking the time to chime in.
Thank you!

@FedeDP FedeDP closed this as completed Feb 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants