Slab leak in Android Kernel #57

grassead · 2020-04-14T19:32:54Z

Hi,

When trying Android 8 on a Sabrelite, I encountered an issue located in the kernel (4.9.88, commit 2324d06).

The "Slab memory" is leaking when the graphics system (SurfaceFlinger) does composition.

For example, I wrote an application that pops a Toast as fast as possible.

Before running the test, cat /proc/meminfo gives:
Slab: 42092 kB
SReclaimable: 15076 kB
SUnreclaim: 27016 kB

After running the application for 10-15 minutes:
Slab: 78960 kB
SReclaimable: 14828 kB
SUnreclaim: 64132 kB

After killing the application:
Slab: 80956 kB
SReclaimable: 14960 kB
SUnreclaim: 65996 kB

The "SUnreclaim" is not freed.
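
For anyone who wants to track the same figures, here is a minimal sketch (not part of the original test) that diffs two /proc/meminfo snapshots. The helper names parse_meminfo and slab_delta are made up for illustration; the sample values are the ones quoted above.

```python
import re

def parse_meminfo(text):
    """Return {field: kB} for every 'Name: value kB' line of a meminfo dump."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def slab_delta(before, after, keys=("Slab", "SReclaimable", "SUnreclaim")):
    """Return the per-field growth in kB between two meminfo snapshots."""
    b, a = parse_meminfo(before), parse_meminfo(after)
    return {key: a[key] - b[key] for key in keys}

# Values copied from the report: before the test and after killing the app.
BEFORE = """Slab: 42092 kB
SReclaimable: 15076 kB
SUnreclaim: 27016 kB"""

AFTER_KILL = """Slab: 80956 kB
SReclaimable: 14960 kB
SUnreclaim: 65996 kB"""

delta = slab_delta(BEFORE, AFTER_KILL)
for key, value in delta.items():
    print(f"{key}: {value:+d} kB")
```

With these numbers, nearly all of the Slab growth lands in SUnreclaim, while SReclaimable stays roughly flat.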

Could you please help me fix this issue?

Thanks,

gibsson · 2020-04-16T12:13:44Z

Hi Adrien,

Thanks for this report. We've seen issues with Android 8 indeed.

We'll look into it as soon as possible but we have many ongoing projects at this time.

Regards,
Gary

JeansHH · 2020-08-12T10:40:07Z

+1 Are there any updates?

ep-skolberg · 2020-08-13T06:30:54Z

+1
For us, this issue is blocking an upgrade to Android 8.

gibsson · 2020-08-17T08:04:35Z

Sorry we haven't had a chance to look at it yet.
We'll keep you posted as soon as we have an update.

JeansHH · 2020-08-20T06:58:40Z

I looked into the code and it looks like the fences created in viv_fence_create (os/linux/kernel/gc_hal_kernel_sync.c) are never released. Maybe I am wrong or this is not relevant. But if this is the problem, is there a way to release the fence?

gibsson · 2020-08-21T11:55:12Z

Quick update, now looking into this issue.
@JeansHH unfortunately as far as I can tell the userspace should be the one releasing it.
This would point to the Vivante binary blobs, which wouldn't be too surprising.
Good news is that I was able to reproduce the issue 100% of the time, will keep you guys posted.

gibsson · 2020-08-25T09:02:34Z

Another update:

@JeansHH I confirm that there is a leak in the fence code, but unfortunately it is not the main issue.
Enabling KMEMLEAK showed the fence leak:

unreferenced object 0xd3e19080 (size 128):
  comm "surfaceflinger", pid 301, jiffies 4294948055 (age 102.340s)
  hex dump (first 32 bytes):
    01 00 00 00 74 7b 6e c1 00 00 00 00 00 00 00 00  ....t{n.........
    90 90 e1 d3 90 90 e1 d3 c0 90 e1 d3 00 00 00 00  ................
  backtrace:
    [<c0291a58>] kmem_cache_alloc_trace+0x1a4/0x2b0
    [<c09945ac>] viv_fence_create+0x3c/0x1ec
    [<c0962560>] gckOS_CreateNativeFence+0x7c/0x11c
    [<c096cfe8>] gckKERNEL_Dispatch+0x468/0x12d4
    [<c096e124>] gckDEVICE_Dispatch+0x2d0/0x2d4
    [<c0966614>] drv_ioctl+0x140/0x320
    [<c02b3c9c>] do_vfs_ioctl+0xc0/0x9c4
    [<c02b461c>] SyS_ioctl+0x7c/0x8c
    [<c01088c0>] ret_fast_syscall+0x0/0x48
    [<ffffffff>] 0xffffffff
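
When a report contains many such records, grouping them by caller helps to see which allocation site dominates. Below is a minimal sketch, assuming the report text has the same layout as the record above; leaked_bytes_by_caller and the list of allocator prefixes to skip are illustrative choices, not something taken from the kernel tree.

```python
import re
from collections import Counter

OBJECT_RE = re.compile(r"unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")
# Assumed list of allocator entry points to skip when looking for the caller.
ALLOCATOR_PREFIXES = ("kmem_cache_alloc", "__kmalloc", "kmalloc", "kzalloc")

def leaked_bytes_by_caller(report):
    """Sum leaked bytes per first non-allocator frame of each record."""
    totals = Counter()
    pending_size = None  # size of the record whose caller is not yet found
    for line in report.splitlines():
        obj = OBJECT_RE.search(line)
        if obj:
            pending_size = int(obj.group(1))
            continue
        frame = FRAME_RE.search(line)
        if frame is None or pending_size is None:
            continue
        symbol = frame.group(1)
        if symbol.startswith(ALLOCATOR_PREFIXES):
            continue  # still inside the allocator, keep walking the backtrace
        totals[symbol] += pending_size
        pending_size = None  # ignore the remaining frames of this record
    return totals

# Shortened copy of the record quoted above.
SAMPLE = """\
unreferenced object 0xd3e19080 (size 128):
  comm "surfaceflinger", pid 301, jiffies 4294948055 (age 102.340s)
  backtrace:
    [<c0291a58>] kmem_cache_alloc_trace+0x1a4/0x2b0
    [<c09945ac>] viv_fence_create+0x3c/0x1ec
    [<c0962560>] gckOS_CreateNativeFence+0x7c/0x11c
"""

totals = leaked_bytes_by_caller(SAMPLE)
for symbol, size in totals.most_common():
    print(f"{symbol}: {size} bytes")
```

On a kernel built with CONFIG_DEBUG_KMEMLEAK, the report text would typically be read from /sys/kernel/debug/kmemleak.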

However, after updating part of the driver to p9, that fence leak is gone but the main leak isn't.
See my patch here if you want to give it a try:
https://gist.github.com/gibsson/45921590256a95c5b868e37387e1909d

I confirm that this issue is gone with the p9.0.0_2.2.0_ga release. If you're interested, our partner Kynetics has a release for Nitrogen platforms: https://www.kynetics.com/android-bsp/boundary-devices
I tried backporting the p9 Vivante binaries to o8, but it's more difficult than expected because:
1- memory allocation mechanism changed
2- new binaries depend on libraries that don't exist in i.MX Oreo (libdrm_vivante, libdrm_android)

JeansHH · 2020-08-26T06:14:13Z

@gibsson Thank you so much for your effort. I really appreciate it. I will give your patch a try.

I will check with my colleagues whether we can migrate to Android 9. Will you keep working on the issue?

Once again: thank you so much!

JeansHH · 2020-08-26T13:29:28Z

The behaviour is much better with this patch.

RomainNaour · 2020-09-23T13:15:04Z

Hello @gibsson,

I just tested with the latest kernel for Android 8 (boundary-imx-o8.0.0_1.0.0-ga branch)
5f75170
But the bug is still present.

In the end, the NXP firmware o8.0.0_1.0.0-ga is buggy... What about the o8.1.0_1.3.0 firmware?
Can we use it instead?

Best regards,
Romain

gibsson · 2020-09-23T13:21:53Z

Hi Romain,

Yes, there were 2 leaks before, that patch only fixes 1 of them (the one in the kernel).
The issue has been reproduced on the NXP EVK platform with their prebuilt images, so it clearly comes from the NXP release.
Not sure about the other Oreo releases, as none of them were GA for i.MX6Q.
To be honest, I've tried porting the libraries from o8.1.0_1.3.0 to o8.0.0, but it's not as straightforward as it sounds since NXP changed its graphics libs to depend on libdrm, which wasn't the case in o8.0.0.
All I can say is that the issue doesn't occur on Nougat or Pie, so we now strongly recommend not using Oreo.

Regards,
Gary

RomainNaour · 2020-09-23T15:07:44Z

Hi Gary,

Thank you for your quick reply!
What about convincing NXP to do a fix release of the o8.0.0_1.0.0-ga firmware?

Best regards,
Romain

gibsson · 2020-09-23T15:13:27Z

Hi Romain,

I've tried, with no luck. They recommended changing releases as well.
You can try too; it doesn't hurt, and maybe that will make them change their mind.
Sorry for the inconvenience.

Regards,
Gary

RomainNaour · 2020-09-23T15:16:53Z

Hi Gary,

No problem, my customer asked me to ask this question.
We'll try on our side as well.

Thanks,
Romain

RomainNaour · 2020-09-24T07:25:29Z

Hi Gary,

Here is the link to the NXP community forum where the request was posted yesterday:
https://community.nxp.com/t5/i-MX-Graphics/Memory-leak-spotted-on-i-MX6-with-Android-8/m-p/1157843#M15

Best regards,
Romain

gibsson · 2020-09-24T07:32:35Z

Hi Romain,
Thanks for creating that post. I've replied making sure to mention it was reproduced on SabreSD, otherwise the answer will be "it's because you don't use NXP platform".
Now, I think it would be best for you to share the apk there as well as repro steps and procedure to see the memory leak increase.
Thanks,
Gary

RomainNaour · 2020-09-24T09:26:38Z

Hi Gary,
Done:
https://community.nxp.com/t5/i-MX-Graphics/Memory-leak-spotted-on-i-MX6-with-Android-8/m-p/1158567/highlight/true#M17

Best regards,
Romain

RomainNaour · 2020-09-24T14:27:29Z

Hello @JeansHH @ep-skolberg,

Can you add a comment about your issue on the NXP forum?
It would help to convince NXP to take a look at our issue.

Thanks,
Romain

JeansHH · 2020-09-25T10:11:09Z

done

RomainNaour · 2020-10-08T15:44:19Z

Hello,

NXP did a test on a Pixel phone and reproduced the issue using our app.

https://community.nxp.com/t5/i-MX-Graphics/Memory-leak-spotted-on-i-MX6-with-Android-8/m-p/1161467/highlight/true#M21

So it's not clear whether it's really an i.MX6 issue or not.

gibsson · 2020-10-08T15:48:45Z

Has anyone other than NXP been able to verify that claim?

RomainNaour · 2020-10-08T16:20:55Z

Not yet. I was looking at testing Android 8 on a Le Potato board or a Raspberry Pi.

https://libre.computer/2018/09/27/android-release-for-tritium-and-le-potato

If the issue is really not related to the NXP firmware but to the Android AOSP part, shouldn't it be reproducible with an emulator?

Thanks again Gary!

RomainNaour · 2020-10-13T22:07:21Z

Hi Gary,

I tried to reproduce the leak using two different boards: a Raspberry Pi 3 with a LineageOS image (15.1, based on Android 8.0 but with a 4.4 kernel) and a Le Potato from the link above (Android 8.0.0, kernel 4.9.61).

I'm unable to reproduce the issue so far, even on the Le Potato board using an Android image very close to the image provided by Boundary Devices for the Sabrelite board.

I'm not sure how NXP is able to reproduce the issue on an Android Pixel phone...

Best regards,
Romain

gibsson · 2020-10-14T06:53:07Z

Hi,
Please share those findings on the community forum. To be honest, I don't believe the testing done on Pixel was correct.
Regards

grassead · 2020-10-17T19:10:12Z

Hi,

I reproduced this issue on my Pixel 2 (to a lesser extent) on android-8.0.0_r34 (the last AOSP release for Pixel 2) running on kernel 4.4.56-g594d847d09a1.

18h12
walleye:/ # cat /proc/meminfo
Slab: 135940 kB
SReclaimable: 47344 kB
SUnreclaim: 88596 kB

20h24
walleye:/ # cat /proc/meminfo
Slab: 137952 kB
SReclaimable: 47604 kB
SUnreclaim: 90348 kB
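
A rough comparison of growth rates, as a sketch only. It assumes the test app ran continuously between the two Pixel 2 snapshots (18h12 to 20h24) and uses the 10-15 minute window from the original Sabrelite report; growth_kb_per_min is a hypothetical helper.

```python
def growth_kb_per_min(start_kb, end_kb, minutes):
    """Average SUnreclaim growth in kB per minute over the test window."""
    return (end_kb - start_kb) / minutes

# Pixel 2: snapshots taken at 18h12 and 20h24.
pixel2_minutes = (20 * 60 + 24) - (18 * 60 + 12)
pixel2_rate = growth_kb_per_min(88596, 90348, pixel2_minutes)

# Sabrelite: 27016 kB -> 64132 kB after "10-15 minutes", so bracket the rate.
sabrelite_slow = growth_kb_per_min(27016, 64132, 15)
sabrelite_fast = growth_kb_per_min(27016, 64132, 10)

print(f"Pixel 2:   {pixel2_rate:.1f} kB/min over {pixel2_minutes} min")
print(f"Sabrelite: {sabrelite_slow:.1f} to {sabrelite_fast:.1f} kB/min")
```

Under those assumptions the Pixel 2 growth is about two orders of magnitude slower than on the Sabrelite.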

Thanks,

gibsson · 2020-11-13T16:48:15Z

I am closing this issue as the kernel no longer has any leak.
Also, it has been proven that an update of Vivante libraries can fix the issue.

gibsson closed this as completed Nov 13, 2020
