kswapd 100% CPU-usage #219

Closed
WereCatf opened this Issue Mar 9, 2016 · 36 comments

@WereCatf
Contributor

WereCatf commented Mar 9, 2016

I don't know what's wrong, but when testing a PSOne emulator on the OPi PC, for example, I noticed one CPU core was pegged at 100% by kswapd. I also noticed it happening during some compiles and the like. I hear jernej knows something about this; someone who can contact him should ask what's going on.

Member

igorpecovnik commented Mar 9, 2016

Yes, it happened to me once while just playing around on the desktop, video ... so far I have not been able to recreate it ... Clearly something is wrong.

Contributor

WereCatf commented Mar 9, 2016

It doesn't actually seem to be related to swap, despite it being kswapd that goes nuts -- every time I've seen it happen so far there has been literally 0 bytes in swap.
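The "0 bytes in swap" observation can be checked from /proc/meminfo. The following is only a sketch: the function name `swap_used_kb` and its optional path argument are made up for illustration and are not part of Armbian.

```shell
#!/bin/sh
# Print the amount of swap currently in use, in kB.
# Reads SwapTotal and SwapFree from a meminfo-style file; the optional
# argument (hypothetical, for trying it on a saved copy) defaults to
# /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

Calling `swap_used_kb` while kswapd is busy shows whether anything has actually been swapped out at that moment.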

kubajar commented Mar 9, 2016

Maybe it's related to https://bugzilla.kernel.org/show_bug.cgi?id=65201

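I can see 100% kswapd CPU usage if I upload about 100 MB of data to the Orange Pi 2 over Samba.

Temporary solution:
root@orangepi2mini:~# cat /bin/cpuload.sh
#!/bin/sh
# Read kswapd's %CPU (column 9 of top's batch output); head -n 1 keeps a single value.
CPU2=$(top -b -n 1 | grep kswapd | head -n 1 | awk '{print $9}')
# Strip the fractional part so the shell can do an integer comparison.
CPU=${CPU2%.*}
# Treat an empty value (no kswapd line found) as 0.
if [ "${CPU:-0}" -gt 90 ]; then
    echo 3 > /proc/sys/vm/drop_caches
    echo "$CPU"
fi

crontab -l
*/5 * * * * /bin/cpuload.sh

I will test some suggestions from the mentioned thread and post the results.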

Have a nice day.

JK

Contributor

WereCatf commented Mar 9, 2016

@kubajar I already tested that, but I am not getting any results. Are you?

kubajar commented Mar 10, 2016

What I did:
sysctl.conf: vm.swappiness=0
fstab: #/var/swap none swap sw 0 0 (note the leading #, which comments the swap entry out)
What is interesting: if I add vm.min_free_kbytes=67584 to sysctl.conf, then "echo 3 > /proc/sys/vm/drop_caches" doesn't work.

I analyzed some patches for kernel 3.7 (kswapd 100% CPU). It seems that some of them are partially applied to vmscan.c in 3.4.110; maybe some piece of code in those patches will help.

I also tried to update boot.scr using mkimage and add mem=somenumber, but after adding mem=968M the Orange Pi 2 is unable to boot.

Maybe some setenv bootargs tuning will help. Do you have any ideas?

Have a nice day.

JK
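When comparing results between boards it helps to record which values are actually in effect. A minimal sketch, assuming the usual /proc/sys/vm layout; the function name `show_vm_knobs` and its directory argument are hypothetical:

```shell
#!/bin/sh
# Print the two tunables discussed above in sysctl.conf syntax.
# The optional directory argument (illustration only) defaults to
# /proc/sys/vm, where the kernel exposes the live values.
show_vm_knobs() {
    dir=${1:-/proc/sys/vm}
    for knob in swappiness min_free_kbytes; do
        printf 'vm.%s=%s\n' "$knob" "$(cat "$dir/$knob")"
    done
}
```

Running `show_vm_knobs` before and after editing sysctl.conf confirms whether the new values were picked up.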

Contributor

WereCatf commented Mar 10, 2016

@kubajar Not right now, no. I'm busy improving desktop-extras and related things -- I really need to get them in proper shape and get some repos up. If you can't figure out what to do with kswapd, then we'll just have to let this issue linger a bit longer, I suppose. It's not like we don't already have a bunch of issues; what's one more on the pile, eh? ;)

kubajar commented Mar 11, 2016

Setting vm.swappiness=0 and vm.min_free_kbytes=0 eliminates this problem completely for me. The higher the vm.min_free_kbytes value, the more often the kswapd problem occurs, so I tried setting it to 0 and it works.

I also have "tmpfs /tmp tmpfs defaults,noatime,nosuid,size=100m 0 0" in fstab. I don't think it affects this problem, but I have not tested without it. /var/swap is removed from fstab.

I know that disabling swap is a bit dangerous, but swapping to an SD card is a bad idea too. What a pity that zram isn't present in 3.4.x...

Have a nice day.

JK
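For anyone who wants to try the same two settings without editing sysctl.conf by hand, here is a sketch that writes them to a separate fragment. The function name and the default file name are assumptions made for illustration; only the two values come from the comment above.

```shell
#!/bin/sh
# Write the two reported settings to a sysctl fragment.
# The default path is a hypothetical file name; pass another path to
# inspect the result first. Writing under /etc needs root.
write_kswapd_workaround() {
    target=${1:-/etc/sysctl.d/99-kswapd-workaround.conf}
    printf 'vm.swappiness=0\nvm.min_free_kbytes=0\n' > "$target"
}
```

After writing the file, `sysctl -p <file>` loads it without a reboot. Keep in mind the question raised in this thread about the downsides of setting vm.min_free_kbytes to 0.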

Contributor

WereCatf commented Mar 11, 2016

What are the downsides of setting it to 0, though?

Also, VanirAOSP/kernel_sony_msm8x27@a72a945 seems to have zswap backported, so it could possibly be added to the Armbian kernel. There are a few more commits there related to zswap; if you're feeling adventurous you could always try those.

Member

ThomasKaiser commented Mar 22, 2016

Any updates on this?

Contributor

jernejsk commented Mar 22, 2016

Maybe it is the same (not very well understood) issue that I have on OpenELEC. You can try compiling the kernel with CONFIG_CMA disabled and see if that helps. It always works for me. The downside of this workaround is that 256MB (or whatever value is set in CONFIG_ION_SUNXI_RESERVE_LIST) will not be visible to the system.
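To see whether a given kernel build has CONFIG_CMA enabled, the saved kernel config can be searched. This is a sketch under assumptions: the function name `kconfig_state` is made up, and it assumes the config is available as a plain-text file such as /boot/config-$(uname -r), which may not exist on every image.

```shell
#!/bin/sh
# Report how a kernel option is set in a saved .config-style file.
# Prints the "OPTION=value" line if present, otherwise "OPTION is not set".
# The default path is an assumption and may need adjusting.
kconfig_state() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    if grep -q "^${opt}=" "$cfg"; then
        grep "^${opt}=" "$cfg"
    else
        echo "${opt} is not set"
    fi
}
```

For example, `kconfig_state CONFIG_CMA` and `kconfig_state CONFIG_ION_SUNXI_RESERVE_LIST` show the two options mentioned in this comment.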

avinashga23 commented Apr 11, 2016

Following this issue very closely, as I am also affected by it while running 5.5.

Contributor

jernejsk commented Apr 11, 2016

There is a chance that a newer version of Allwinner's kernel, provided by FriendlyARM, doesn't have this issue...

Contributor

jernejsk commented Apr 12, 2016

Yes
On 12 Apr 2016 at 7:59 AM, Igor Pečovnik wrote: This one?
https://github.com/friendlyarm/h3_lichee/tree/master/linux-3.4

avinashga23 commented Apr 12, 2016

I built Armbian 5.7 headless after changing CONFIG_ION_SUNXI_RESERVE_LIST to 64MB. Now this issue is not observed even under heavy load (Cassandra and Zookeeper together) 👍 Also, the total memory available to the system is 937 MB on the Orange Pi PC. I have built an Orange Pi Plus image with 32 MB. I will receive a Plus 2 board today and will be testing it soon.

Member

igorpecovnik commented Apr 12, 2016

@jernejsk
Working on it. I hope we will have an alternative H3 kernel by the end of the day.

Member

ThomasKaiser commented Apr 12, 2016

@igorpecovnik If you really try to rebase all the stuff on FriendlyARM's kernel please drop the gc2035 patch entirely since a new version has to be built anyway.

Doctorslo commented Apr 16, 2016

Member

igorpecovnik commented Apr 16, 2016

OK, will try. Here is a kernel if you wanna join testing:
http://mirror.igorpecovnik.com/test/CMA-linux-image-sun8i_5.07_armhf.deb

avinashga23 commented Apr 16, 2016

Hi @igorpecovnik, is the new kernel based on FriendlyARM's kernel?

Member

igorpecovnik commented Apr 16, 2016

@avinashga23
yes

BTW: I have been running a Deluge download for almost two hours now. For half of that time I added extra stress from a cron job that runs every 2 minutes: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 20s

Everything is normal.

avinashga23 commented Apr 16, 2016

Great, I will try this on my Orange Pi Plus 2 and Orange Pi PC now.

avinashga23 commented Apr 16, 2016

@igorpecovnik I installed the package with dpkg -i. I have run stressful applications like Cassandra and Zookeeper while monitoring htop and the temperature. So far it is working OK without the kswapd problem (I am sure that if it were there, it would have shown signs by now). The good part is that the entire 1000 MB is available to the system. I would like to test this package on an Orange Pi Plus 2 (2GB) board.

Shall I follow the same installation process for the Plus 2 too?

Member

igorpecovnik commented Apr 16, 2016

My Opi+ is also working fine after 4 hours of hard work ... I guess we can close this issue and fingers crossed that we won't need to reopen it.

Opi+2 should work fine out of the box with this kernel.

deltasigh commented Apr 29, 2016

I ran the latest 5.07 to compile a package with make -j3, and kswapd0 shows up using 100% of a CPU!
I disabled the normal 128M swap with swapoff, inserted a USB flash drive with a 1.5 GB swap partition, and activated it with swapon.
Any information you'd like me to post?

Member

igorpecovnik commented Apr 29, 2016

5.07 is not fixed yet. You need to either build it yourself or use a kernel upgrade from this post.

deltasigh commented May 3, 2016

Igor, I downloaded 5.10, where it is mentioned that kswapd0 is "fixed". Unfortunately, it still comes up at 100% on one of the CPUs.

avinashga23 commented May 3, 2016

I upgraded to 5.10 yesterday and my Pis have been running for the past 12 hours; I have not observed this issue so far. What's the output of your uname -a?

Member

igorpecovnik commented May 3, 2016

Can you give me an example of how to reproduce this error? I have not been able to catch it since.

deltasigh commented May 7, 2016

For the last 5 days it became my obsession to recreate the problem, but I did not have much luck. It appears that if the system starts with a set of programs that do not wake up kswapd0, it will run OK for as long as I care to run it. If I start with another set of programs, kswapd0 will appear to run at 100%, but not for long! After 12 hours of operation and trying different actions, the most CPU time kswapd0 accumulated in total was under 00:02:00. Bottom line: I think you took care of this annoying problem! Thank you.
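The accumulated CPU time quoted above is the TIME value that ps reports. To compare runs numerically, that value can be converted to seconds. A small sketch; the helper name `cputime_to_seconds` is hypothetical:

```shell
#!/bin/sh
# Convert a ps-style cumulative CPU time ([DD-]HH:MM:SS) to seconds.
# awk does the arithmetic so that fields with leading zeros such as
# "08" are treated as decimal numbers.
cputime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        { exit 1 }'
}
```

On a running system the input could come from `ps -o time= -C kswapd0`; for the value reported here, `cputime_to_seconds 00:02:00` prints 120.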

ghost commented Jun 3, 2016

I get this behavior, but don't know how to reproduce it. It tends to revolve around FF on my OPi+. Usually the system comes to a grinding halt and the mouse is unresponsive, but the CPU monitor on the taskbar, while noticeably erratic, still continues. Using CTRL-ALT-F1 to try to log in as root doesn't work: I can enter 'root', but I am not prompted for a password until the issue resolves itself. The issue does resolve itself if you are patient. Yesterday, I turned off swap via an open terminal as the symptoms began. This worked, crashing FF in the process. dmesg provides no useful information other than that FF crashed due to being out of memory.

This box is usually left on 24/7, connected to my main TV via HDMI. I installed Armbian 5.10 and then ran nand-sata-install. Upgrading to 5.11 made this problem noticeably worse. I'm not sure I even had the issue before that, but after 5.11 it happened daily until I killed swap.

What can I do after the system "comes back" to acquire useful information?

ghost commented Jun 3, 2016

Oh, I should mention that there's never any serious use of swap space. Maybe 10 MB. But it's not like it's running out of space. I use a 128MB swap. When the system recovers, the load averages go back to normal with the 5 and 15 min numbers over 5.

avinashga23 commented Jun 4, 2016

@emullins the kswapd issue will only affect one CPU core. Even with this issue present you still have 3 CPU cores free for other tasks. I believe the issue you are facing is not kswapd related. Check what htop shows while it is happening.

avinashga23 commented Jun 4, 2016

This issue was fixed

ghost commented Jun 7, 2016

I'm not so sure it's fixed. Yes, it is limited to one core, but it still brings the desktop to its knees and mostly inhibits the mouse. I have seen it since upgrading to 5.13, but not in the last couple of days. My use of the device has changed due to other issues, and I don't spend as much time on it. It does seem related to FF, possibly just because that's a memory hog compared to everything else I run.

Browsing on the device is excruciating anyway, even running from eMMC. So I limit my FF use on the OPi+ now.

Contributor

jernejsk commented Jun 9, 2016

Igor, while I was preparing the H3 Linux repo with all the known fixes, I found out that while you were fixing the update patches (3.4.39 -> 3.4.112) you removed a bit too much in mm/vmscan.c. Check these two commits:
jernejsk/linux@dc93269
jernejsk/linux@f82b435

Please be aware that there is an additional change in the balance_pgdat() function which must be included to make this patch useful (around line 2927):

if (!zone_balanced(zone, testorder, 0, end_zone)) {
    all_zones_ok = 0;
    /*
     * We are still under min water mark.  This
     * means that we have a GFP_ATOMIC allocation
     * failure risk. Hurry up!
     */
    if (!zone_watermark_ok_safe(zone, order,
            min_wmark_pages(zone), end_zone, 0))
        has_under_min_watermark_zone = 1;
} else {
...
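The min watermark that this code compares against can also be observed from user space. As a rough sketch, assuming the usual /proc/zoneinfo layout (a "Node N, zone NAME" header followed by "pages free" and "min" lines, values in pages); the function name `zone_watermarks` is made up for illustration:

```shell
#!/bin/sh
# For each memory zone, print its name, free page count and min watermark.
# The optional argument (illustration only) defaults to /proc/zoneinfo.
zone_watermarks() {
    awk '/^Node/                       { zone = $4 }
         $1 == "pages" && $2 == "free" { free = $3 }
         $1 == "min"                   { printf "%s free=%s min=%s\n", zone, free, $2 }' \
        "${1:-/proc/zoneinfo}"
}
```

A zone whose free value sits close to or below min while kswapd is spinning would be consistent with the condition the patch handles, though that reading is an interpretation and not something confirmed in this thread.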