netmgrd-crash after reconnect to known wifi on Nexus 5X #334

Closed
octohex opened this Issue Jul 8, 2016 · 13 comments

octohex commented Jul 8, 2016

After disconnecting from a known Wi-Fi network for some time and reconnecting to it later, the system becomes unresponsive when using the network. The first time I connect to the Wi-Fi everything works as expected; the problem only occurs after the Wi-Fi has gone unused for a while (e.g. while using LTE) and I reconnect to it later.
A reboot fixes the issue. After examining the log, it seems that netmgrd crashes.

Stripped logcat: https://github.com/octohex/logs/blob/master/netmgrd-crash

I'm running the latest update (MTC19Z.2016.07.07.02.16.02), but this also occurred on previous releases.

Contributor

thestinger commented Jul 8, 2016

This is a known upstream memory corruption bug. Since netmgrd is proprietary, it's not feasible for us to fix. The crash itself is OpenBSD malloc working as intended: it is uncovering the bug.

It makes sense to have an issue open tracking it though.

Rudd-O commented Jul 9, 2016

07-10 01:37:42.950 15045 15120 F libc    : Fatal signal 11 (SIGSEGV), code 2, fault addr 0x7ec786b000 in tid 15120 (netmgrd)
07-10 01:37:43.054  3740  3740 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-10 01:37:43.055  3740  3740 F DEBUG   : Build fingerprint: 'Android/aosp_bullhead/bullhead:6.0.1/MTC19V/2016.07.03.03.18.57:user/release-keys'
07-10 01:37:43.056  4364  6343 W NativeCrashListener: Couldn't find ProcessRecord for pid 15045
07-10 01:37:43.056  3740  3740 F DEBUG   : Revision: 'rev_1.0'
07-10 01:37:43.058  3740  3740 F DEBUG   : ABI: 'arm64'
07-10 01:37:43.060  3740  3740 E DEBUG   : AM write failed: Broken pipe
07-10 01:37:43.062  3740  3740 F DEBUG   : pid: 15045, tid: 15120, name: netmgrd  >>> /system/bin/netmgrd <<<
07-10 01:37:43.063  3740  3740 F DEBUG   : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x7ec786b000
07-10 01:37:43.081  3740  3740 F DEBUG   :     x0   0000007ec786aad4  x1   0000007ec786b000  x2   dfdfdfdfdfdfdfdf  x3   dfdfdfdfdfdfdfdf
07-10 01:37:43.082  3740  3740 F DEBUG   :     x4   dfdfdfdfffffffff  x5   0000000000000000  x6   0000000000000000  x7   dededededededede
07-10 01:37:43.083  3740  3740 F DEBUG   :     x8   ffffffffffffffff  x9   dededededededede  x10  ffffffffffffffff  x11  0101010101010101
07-10 01:37:43.084  3740  3740 F DEBUG   :     x12  0000007ed752f717  x13  0000000000000061  x14  000000007fffff9e  x15  0000000000000000
07-10 01:37:43.084  3740  3740 F DEBUG   :     x16  0000007f6ef18d10  x17  0000007f6ee88be4  x18  0000007ed752f7a0  x19  0000007ed752f724
07-10 01:37:43.085  3740  3740 F DEBUG   :     x20  0000007ed752f828  x21  0000007ed752f730  x22  0000005555bf79da  x23  00000000ffffffff
07-10 01:37:43.086  3740  3740 F DEBUG   :     x24  0000007ed752fdc8  x25  0000000000000000  x26  0000007ec786aad4  x27  00000000fffffff8
07-10 01:37:43.086  3740  3740 F DEBUG   :     x28  0000007ed752f770  x29  0000007ed752f5a0  x30  0000007f6eecb7bc
07-10 01:37:43.087  3740  3740 F DEBUG   :     sp   0000007ed752f5a0  pc   0000007f6ee88bf4  pstate 0000000060000000
07-10 01:37:43.096  3740  3740 F DEBUG   : 
07-10 01:37:43.096  3740  3740 F DEBUG   : backtrace:
07-10 01:37:43.097  3740  3740 F DEBUG   :     #00 pc 000000000001abf4  /system/lib64/libc.so (strlen+16)
07-10 01:37:43.097  3740  3740 F DEBUG   :     #01 pc 000000000005d7b8  /system/lib64/libc.so (__vfprintf+8896)
07-10 01:37:43.098  3740  3740 F DEBUG   :     #02 pc 0000000000075cd4  /system/lib64/libc.so (snprintf+336)
07-10 01:37:43.099  3740  3740 F DEBUG   :     #03 pc 000000000005d124  /system/bin/netmgrd
07-10 01:37:43.099  3740  3740 F DEBUG   :     #04 pc 000000000005cd60  /system/bin/netmgrd
07-10 01:37:43.100  3740  3740 F DEBUG   :     #05 pc 000000000005d858  /system/bin/netmgrd
07-10 01:37:43.101  3740  3740 F DEBUG   :     #06 pc 000000000005dad8  /system/bin/netmgrd
07-10 01:37:43.101  3740  3740 F DEBUG   :     #07 pc 0000000000059b74  /system/bin/netmgrd
07-10 01:37:43.102  3740  3740 F DEBUG   :     #08 pc 0000000000059ba4  /system/bin/netmgrd
07-10 01:37:43.102  3740  3740 F DEBUG   :     #09 pc 00000000000595f8  /system/bin/netmgrd
07-10 01:37:43.103  3740  3740 F DEBUG   :     #10 pc 000000000002dea0  /system/bin/netmgrd
07-10 01:37:43.104  3740  3740 F DEBUG   :     #11 pc 0000000000037d38  /system/bin/netmgrd
07-10 01:37:43.104  3740  3740 F DEBUG   :     #12 pc 000000000003ecc0  /system/bin/netmgrd
07-10 01:37:43.105  3740  3740 F DEBUG   :     #13 pc 00000000000486c4  /system/bin/netmgrd
07-10 01:37:43.105  3740  3740 F DEBUG   :     #14 pc 000000000000b760  /vendor/lib64/libdsutils.so (stm2_process_input+492)
07-10 01:37:43.285  3740  3740 F DEBUG   : 
07-10 01:37:43.285  3740  3740 F DEBUG   : Tombstone written to: /data/tombstones/tombstone_02

Same thing here. Can you link to the upstream bug?

I suspect this is the reason why battery dies very fast with Wi-Fi. Every time netmgrd dies (and it dies pretty often according to my logcat), it writes a tombstone. It's basically a loop of death.
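
A note on reading the dump above: registers x2 and x3 hold the repeating byte 0xdf, and the fault address sits exactly on a page boundary. That would be consistent with strlen walking through junk-filled freed memory that contains no NUL terminator until it runs off the end of the mapping. The following is only a rough Python sketch of that check. The 0xdf free-junk value and the 4 KiB page size are assumptions, and the helper names are made up for illustration.

```python
import re

# Assumption: the hardened allocator overwrites freed memory with this byte.
FREE_JUNK_BYTE = 0xDF
# Assumption: 4 KiB pages.
PAGE_SIZE = 4096

# Matches general-purpose register entries such as "x2   dfdfdfdfdfdfdfdf".
REG_RE = re.compile(r"\b(x\d+)\s+([0-9a-f]{16})\b")
FAULT_RE = re.compile(r"fault addr 0x([0-9a-f]+)")


def parse_registers(lines):
    """Return {register name: value} from debuggerd-style register lines."""
    regs = {}
    for line in lines:
        for name, value in REG_RE.findall(line):
            regs[name] = int(value, 16)
    return regs


def is_junk(value, junk_byte=FREE_JUNK_BYTE):
    """True if all eight bytes of value equal junk_byte."""
    return value == int.from_bytes(bytes([junk_byte]) * 8, "big")


def summarize(lines):
    """Collect the junk-filled registers and the fault address from a dump."""
    regs = parse_registers(lines)
    fault = None
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            fault = int(m.group(1), 16)
            break
    return {
        "junk_registers": sorted(n for n, v in regs.items() if is_junk(v)),
        "fault_addr": fault,
        "fault_on_page_boundary": fault is not None and fault % PAGE_SIZE == 0,
    }


# Three lines copied from the tombstone in this thread (timestamps dropped).
sample = [
    "F DEBUG   : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x7ec786b000",
    "F DEBUG   :     x0   0000007ec786aad4  x1   0000007ec786b000  x2   dfdfdfdfdfdfdfdf  x3   dfdfdfdfdfdfdfdf",
    "F DEBUG   :     x4   dfdfdfdfffffffff  x5   0000000000000000  x6   0000000000000000  x7   dededededededede",
]
report = summarize(sample)
print(report)
```

This only flags a pattern worth a closer look; it cannot say which netmgrd code path freed the buffer.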

Contributor

thestinger commented Jul 9, 2016

I haven't looked for an upstream bug report. I'm just pointing out that it is an upstream memory corruption bug, in Qualcomm's proprietary code. There might be a way to configure malloc for netmgrd to make the crashes less frequent, but netmgrd is still going to have a use-after-free bug.
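
To illustrate why the allocator's behavior changes how often a use-after-free crashes, here is a toy model in Python, not the real allocator. It assumes the hardened malloc overwrites freed chunks with a junk byte (0xdf is assumed here), while a more permissive allocator leaves the old contents in place. The `ToyHeap` class and the "wlan0" string are hypothetical and exist only for this sketch.

```python
FREE_JUNK_BYTE = 0xDF  # assumption: fill value written over freed chunks


class ToyHeap:
    """A single fixed-size region with bump allocation, for illustration only."""

    def __init__(self, size, junk_on_free):
        self.mem = bytearray(size)
        self.junk_on_free = junk_on_free
        self.next_free = 0
        self.sizes = {}

    def malloc(self, size):
        addr = self.next_free
        if addr + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        self.next_free += size
        self.sizes[addr] = size
        return addr

    def free(self, addr):
        size = self.sizes.pop(addr)
        if self.junk_on_free:
            # Overwrite the freed chunk so stale readers see junk, not old data.
            self.mem[addr:addr + size] = bytes([FREE_JUNK_BYTE]) * size

    def store_cstring(self, addr, text):
        data = text.encode() + b"\0"
        self.mem[addr:addr + len(data)] = data

    def strlen(self, addr):
        """Scan for NUL like C strlen; running off the region models the SIGSEGV."""
        end = self.mem.find(b"\0", addr)
        if end == -1:
            raise RuntimeError("read past end of mapping (would be SIGSEGV)")
        return end - addr


def stale_read(junk_on_free):
    """Free a string buffer, then measure it through the dangling pointer."""
    heap = ToyHeap(64, junk_on_free)
    ptr = heap.malloc(64)
    heap.store_cstring(ptr, "wlan0")  # hypothetical contents
    heap.free(ptr)
    return heap.strlen(ptr)  # use-after-free: ptr is dangling here


print(stale_read(junk_on_free=False))  # the stale data is still readable
try:
    stale_read(junk_on_free=True)
except RuntimeError as exc:
    print("crash:", exc)
```

Without junk filling, the stale read appears to work and the bug stays hidden. With it, the same bug fails loudly. In both cases the bug is still there, which is the point made above.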

Rudd-O commented Jul 9, 2016

Well, fuck.

Any phones free of this bullshit?

Contributor

thestinger commented Jul 9, 2016

Haven't run into any of these issues on NVIDIA's Tegra platform, but there aren't any Tegra Nexus phones, only the Nexus 9 and Pixel C. Qualcomm's code is full of these memory corruption bugs in normal code paths. It's hard for us to deal with it. If they ran their code under Valgrind and AddressSanitizer, they would find most of these issues themselves.

I somewhat doubt that they would do anything about bug reports stating that OpenBSD malloc uncovers use-after-free bugs in lots of their code.

For bugs that are actually in AOSP, it's often easy to debug them and submit patches: https://android.googlesource.com/platform/bootable/recovery.git/+/c5631fc09666a9542d2882299d40500d18d1f68c.

Contributor

thestinger commented Jul 9, 2016

I'm not particularly good at debugging these issues... especially without access to source code. It's often pretty hard to figure out why the use-after-free is happening, unlike something like an out-of-bounds access where the code making the mistake is usually where the error is detected.

Contributor

thestinger commented Aug 11, 2016

Does this crash still happen for you?

Rudd-O commented Aug 11, 2016

I would have to check. Since I rebooted the phone, I get no crashes, but once the phone begins getting hot, that's when I see those.

Rudd-O commented Aug 11, 2016

Wait, I just checked. The crash is still happening reliably every 10 seconds or so. It happens just as WifiStateMachine says starting scan for "<current WIFI network>"WPA_PSK with 5180. Then SIGSEGV on netmgrd.

Then netmgrd restarts. Then there's an error message that says:

netmgrd W type=1400 audit(0.0:5522): avc: denied { read write } for name="diag" dev="tmpfs" ino=10218 scontext=u:r:netmgrd:s0 tcontext=u:object_r:diag_device:s0 tclass=chr_file permissive=0

These AVC messages precede netmgrd's death by a few seconds.
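
For anyone reading the denial above: it says a process in the netmgrd domain attempted read and write on a character device labeled diag_device, and that the denial was enforced (permissive=0). Below is a small Python sketch that splits such a line into fields. The regexes and the `parse_avc` helper are illustrative assumptions, not part of any Android tooling.

```python
import re

# "avc: denied { perms } for key=value ..." as it appears in logcat/dmesg.
AVC_RE = re.compile(
    r"avc:\s+(?P<result>denied|granted)\s+\{(?P<perms>[^}]*)\}\s+for\s+(?P<rest>.*)"
)
# key=value pairs; values may be double-quoted.
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Return a dict of fields from an AVC audit line, or None if it is not one."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    fields = {k: v.strip('"') for k, v in KV_RE.findall(m.group("rest"))}
    fields["result"] = m.group("result")
    fields["perms"] = m.group("perms").split()
    # SELinux contexts have the form user:role:type:level; keep the type too.
    for key in ("scontext", "tcontext"):
        if key in fields:
            parts = fields[key].split(":")
            if len(parts) >= 3:
                fields[key + "_type"] = parts[2]
    return fields


line = (
    'netmgrd W type=1400 audit(0.0:5522): avc: denied { read write } for '
    'name="diag" dev="tmpfs" ino=10218 scontext=u:r:netmgrd:s0 '
    'tcontext=u:object_r:diag_device:s0 tclass=chr_file permissive=0'
)
avc = parse_avc(line)
print(avc["scontext_type"], avc["perms"], avc["tcontext_type"], avc["tclass"])
```

Grouping parsed denials by source type, target type, and class makes it easier to tell a single repeated denial from many distinct ones.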

Rudd-O commented Aug 11, 2016

Turning Wi-Fi off makes netmgrd no longer crash. However, the AVC denial messages persist.

Rudd-O commented Aug 11, 2016

(They only happen once more after WiFi is turned off.)

Contributor

thestinger commented Aug 18, 2016

The AVC denials are not a CopperheadOS issue, and they are probably not a bug at all. It's expected that there will be AVC denials. Ideally, code can be changed to avoid trying something that won't succeed, but that's not always possible, and disabling auditing of the denials is not always sensible.

Contributor

thestinger commented Oct 30, 2016

This can be reopened if it ever recurs.

thestinger closed this Oct 30, 2016
