netmgrd-crash after reconnect to known wifi on Nexus 5X #334
Comments
thestinger added the Type: bug and upstream labels on Jul 8, 2016
thestinger
Jul 8, 2016
Contributor
This is a known upstream memory corruption bug. Since netmgrd is proprietary, it's not feasible for us to fix. This is OpenBSD malloc working as intended.
It makes sense to have an issue open tracking it though.
Rudd-O
Jul 9, 2016
07-10 01:37:42.950 15045 15120 F libc : Fatal signal 11 (SIGSEGV), code 2, fault addr 0x7ec786b000 in tid 15120 (netmgrd)
07-10 01:37:43.054 3740 3740 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-10 01:37:43.055 3740 3740 F DEBUG : Build fingerprint: 'Android/aosp_bullhead/bullhead:6.0.1/MTC19V/2016.07.03.03.18.57:user/release-keys'
07-10 01:37:43.056 4364 6343 W NativeCrashListener: Couldn't find ProcessRecord for pid 15045
07-10 01:37:43.056 3740 3740 F DEBUG : Revision: 'rev_1.0'
07-10 01:37:43.058 3740 3740 F DEBUG : ABI: 'arm64'
07-10 01:37:43.060 3740 3740 E DEBUG : AM write failed: Broken pipe
07-10 01:37:43.062 3740 3740 F DEBUG : pid: 15045, tid: 15120, name: netmgrd >>> /system/bin/netmgrd <<<
07-10 01:37:43.063 3740 3740 F DEBUG : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x7ec786b000
07-10 01:37:43.081 3740 3740 F DEBUG : x0 0000007ec786aad4 x1 0000007ec786b000 x2 dfdfdfdfdfdfdfdf x3 dfdfdfdfdfdfdfdf
07-10 01:37:43.082 3740 3740 F DEBUG : x4 dfdfdfdfffffffff x5 0000000000000000 x6 0000000000000000 x7 dededededededede
07-10 01:37:43.083 3740 3740 F DEBUG : x8 ffffffffffffffff x9 dededededededede x10 ffffffffffffffff x11 0101010101010101
07-10 01:37:43.084 3740 3740 F DEBUG : x12 0000007ed752f717 x13 0000000000000061 x14 000000007fffff9e x15 0000000000000000
07-10 01:37:43.084 3740 3740 F DEBUG : x16 0000007f6ef18d10 x17 0000007f6ee88be4 x18 0000007ed752f7a0 x19 0000007ed752f724
07-10 01:37:43.085 3740 3740 F DEBUG : x20 0000007ed752f828 x21 0000007ed752f730 x22 0000005555bf79da x23 00000000ffffffff
07-10 01:37:43.086 3740 3740 F DEBUG : x24 0000007ed752fdc8 x25 0000000000000000 x26 0000007ec786aad4 x27 00000000fffffff8
07-10 01:37:43.086 3740 3740 F DEBUG : x28 0000007ed752f770 x29 0000007ed752f5a0 x30 0000007f6eecb7bc
07-10 01:37:43.087 3740 3740 F DEBUG : sp 0000007ed752f5a0 pc 0000007f6ee88bf4 pstate 0000000060000000
07-10 01:37:43.096 3740 3740 F DEBUG :
07-10 01:37:43.096 3740 3740 F DEBUG : backtrace:
07-10 01:37:43.097 3740 3740 F DEBUG : #00 pc 000000000001abf4 /system/lib64/libc.so (strlen+16)
07-10 01:37:43.097 3740 3740 F DEBUG : #01 pc 000000000005d7b8 /system/lib64/libc.so (__vfprintf+8896)
07-10 01:37:43.098 3740 3740 F DEBUG : #02 pc 0000000000075cd4 /system/lib64/libc.so (snprintf+336)
07-10 01:37:43.099 3740 3740 F DEBUG : #03 pc 000000000005d124 /system/bin/netmgrd
07-10 01:37:43.099 3740 3740 F DEBUG : #04 pc 000000000005cd60 /system/bin/netmgrd
07-10 01:37:43.100 3740 3740 F DEBUG : #05 pc 000000000005d858 /system/bin/netmgrd
07-10 01:37:43.101 3740 3740 F DEBUG : #06 pc 000000000005dad8 /system/bin/netmgrd
07-10 01:37:43.101 3740 3740 F DEBUG : #07 pc 0000000000059b74 /system/bin/netmgrd
07-10 01:37:43.102 3740 3740 F DEBUG : #08 pc 0000000000059ba4 /system/bin/netmgrd
07-10 01:37:43.102 3740 3740 F DEBUG : #09 pc 00000000000595f8 /system/bin/netmgrd
07-10 01:37:43.103 3740 3740 F DEBUG : #10 pc 000000000002dea0 /system/bin/netmgrd
07-10 01:37:43.104 3740 3740 F DEBUG : #11 pc 0000000000037d38 /system/bin/netmgrd
07-10 01:37:43.104 3740 3740 F DEBUG : #12 pc 000000000003ecc0 /system/bin/netmgrd
07-10 01:37:43.105 3740 3740 F DEBUG : #13 pc 00000000000486c4 /system/bin/netmgrd
07-10 01:37:43.105 3740 3740 F DEBUG : #14 pc 000000000000b760 /vendor/lib64/libdsutils.so (stm2_process_input+492)
07-10 01:37:43.285 3740 3740 F DEBUG :
07-10 01:37:43.285 3740 3740 F DEBUG : Tombstone written to: /data/tombstones/tombstone_02
Same thing here. Can you link to the upstream bug?
I suspect this is the reason why battery dies very fast with Wi-Fi. Every time netmgrd dies (and it dies pretty often according to my logcat), it writes a tombstone. It's basically a loop of death.
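To put a number on "dies pretty often", a saved logcat can be scanned for the libc fatal-signal line shown in the dump above. This is only a minimal sketch: the names `FATAL_RE` and `count_fatal_signals` are made up for illustration, and the pattern is written against the single line format seen in this log, so other Android versions may need adjustments.

```python
import re
from collections import Counter

# Pattern for the libc "Fatal signal" line as it appears in the log above.
# The name in parentheses at the end is the name of the crashing thread.
FATAL_RE = re.compile(
    r"Fatal signal (?P<signo>\d+) \((?P<signame>\w+)\), code (?P<code>-?\d+), "
    r"fault addr (?P<addr>0x[0-9a-f]+) in tid (?P<tid>\d+) \((?P<name>[^)]+)\)"
)

def count_fatal_signals(lines):
    """Return a Counter mapping thread name -> number of fatal-signal lines."""
    counts = Counter()
    for line in lines:
        match = FATAL_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts

# Two lines copied from the log above: one crash line, one unrelated warning.
sample = [
    "07-10 01:37:42.950 15045 15120 F libc : Fatal signal 11 (SIGSEGV), code 2, "
    "fault addr 0x7ec786b000 in tid 15120 (netmgrd)",
    "07-10 01:37:43.056 4364 6343 W NativeCrashListener: Couldn't find "
    "ProcessRecord for pid 15045",
]
print(count_fatal_signals(sample))
```

Feeding it the lines of a full logcat dump instead of `sample` gives a per-thread crash count for whatever period the dump covers.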
thestinger
Jul 9, 2016
Contributor
I haven't looked for an upstream bug report. I'm just pointing out that it is an upstream memory corruption bug, in Qualcomm's proprietary code. There might be a way to configure malloc for netmgrd to make the crashes less frequent, but netmgrd is still going to have a use-after-free bug.
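One plausible reading of the tombstone above, sketched as a toy model rather than as a description of the real allocator: the freed chunk has been overwritten with junk bytes, so `strlen` on the stale pointer never finds a NUL and keeps reading until it hits an inaccessible page. Everything in the sketch is an assumption made for illustration: the junk byte `0xDF` is picked to match the pattern in x2..x4, the string `wlan0` is a placeholder, and the freed chunk is assumed to run to the end of its page.

```python
PAGE_SIZE = 4096
FREE_JUNK = 0xDF  # assumed junk byte, chosen to match the 0xdf pattern in x2..x4

class GuardPageFault(Exception):
    """Stands in for SIGSEGV when a read crosses into an inaccessible page."""

def free_with_junk(memory, start, size):
    """Model a junk-filling free(): overwrite the freed chunk with FREE_JUNK."""
    memory[start:start + size] = bytes([FREE_JUNK]) * size

def strlen_model(memory, start):
    """Scan for a NUL byte like strlen; fault if the scan leaves the mapping."""
    offset = start
    while True:
        if offset >= len(memory):
            raise GuardPageFault(offset)
        if memory[offset] == 0:
            return offset - start
        offset += 1

# One mapped page; anything past its end is treated as inaccessible.
memory = bytearray(PAGE_SIZE)
start = 0xAD4  # page offset of x0 (0x7ec786aad4) in the register dump
memory[start:start + 6] = b"wlan0\x00"  # hypothetical string contents
assert strlen_model(memory, start) == 5  # while the chunk is live, NUL is found

free_with_junk(memory, start, PAGE_SIZE - start)  # chunk freed, pointer kept
try:
    strlen_model(memory, start)
except GuardPageFault as fault:
    scanned = fault.args[0] - start
    print(f"fault at page offset {fault.args[0]:#x} after {scanned} bytes")
```

In the model the scan covers 0x52c (1324) bytes before faulting, which is the same distance as between x0 (0x7ec786aad4) and the fault address (0x7ec786b000) in the dump. That fits a use-after-free, but does not prove it.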
Rudd-O
Jul 9, 2016
Well, fuck. Any phones free of this bullshit?
thestinger
Jul 9, 2016
Contributor
Haven't run into any of these issues on NVIDIA's Tegra platform, but there aren't any Tegra Nexus phones, only the Nexus 9 and Pixel C. Qualcomm's code is full of these memory corruption bugs in normal code paths. It's hard for us to deal with it. If they would run their code with Valgrind and Address Sanitizer, they would find most of these issues themselves.
I somewhat doubt that they would do anything about bug reports stating that OpenBSD malloc uncovers use-after-free bugs in lots of their code.
For bugs that are actually in AOSP, it's often easy to debug them and submit patches: https://android.googlesource.com/platform/bootable/recovery.git/+/c5631fc09666a9542d2882299d40500d18d1f68c.
thestinger
Jul 9, 2016
Contributor
I'm not particularly good at debugging these issues... especially without access to source code. It's often pretty hard to figure out why the use-after-free is happening, unlike something like an out-of-bounds access where the code making the mistake is usually where the error is detected.
thestinger added this to the Release milestone on Jul 12, 2016
thestinger added the Component: hardened malloc label on Jul 12, 2016
Does this crash still happen for you?
Rudd-O
Aug 11, 2016
I would have to check. Since I rebooted the phone, I get no crashes, but once the phone begins getting hot, that's when I see those.
Rudd-O
Aug 11, 2016
Wait, I just checked. The crash is still happening reliably every 10 seconds or so. It happens just as WifiStateMachine logs `starting scan for "<current WIFI network>"WPA_PSK with 5180`, followed by a SIGSEGV in netmgrd.
Then netmgrd restarts. Then there's an error message that says:
netmgrd W type=1400 audit(0.0:5522): avc: denied { read write } for name="diag" dev="tmpfs" ino=10218 scontext=u:r:netmgrd:s0 tcontext=u:object_r:diag_device:s0 tclass=chr_file permissive=0
These AVC messages precede netmgrd's death by a few seconds.
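For readers unfamiliar with the audit line format, here is a minimal sketch that splits the denial above into its fields. The function name `parse_avc_denial` is made up for illustration, and the parsing assumes the plain `key=value` layout seen in this one line.

```python
import re

def parse_avc_denial(line):
    """Split an SELinux 'avc: denied' audit line into its key fields."""
    match = re.search(
        r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}\s+for\s+(?P<rest>.*)", line
    )
    if match is None:
        return None
    fields = {"permissions": match.group("perms").split()}
    # The remainder is whitespace-separated key=value pairs; values may be quoted.
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', match.group("rest")):
        fields[key] = value.strip('"')
    return fields

# The denial line quoted in the comment above.
line = (
    'netmgrd W type=1400 audit(0.0:5522): avc: denied { read write } for '
    'name="diag" dev="tmpfs" ino=10218 scontext=u:r:netmgrd:s0 '
    'tcontext=u:object_r:diag_device:s0 tclass=chr_file permissive=0'
)
denial = parse_avc_denial(line)
print(denial["scontext"], "->", denial["tcontext"], denial["permissions"])
```

Read this way, the line says that a process in the netmgrd domain was denied read and write access to a character device labelled diag_device, and `permissive=0` means the denial was enforced rather than just logged.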
Rudd-O
Aug 11, 2016
Turning Wi-Fi off makes netmgrd no longer crash. However, the AVC denial messages persist.
Rudd-O
Aug 11, 2016
(They only happen once more after WiFi is turned off.)
thestinger
Aug 18, 2016
Contributor
The avc denials are not a CopperheadOS issue, and they are probably not a bug at all. It's expected that there will be avc denials. Ideally, code can be changed to avoid attempting something that won't succeed, but that's not always possible, and disabling auditing of the denials is not always sensible.
This can be reopened if it ever reoccurs.
octohex commented Jul 8, 2016
After disconnecting from a known Wi-Fi network for some time and reconnecting to it later, the system becomes unresponsive when using the network. The first time I connect to the Wi-Fi network everything works as expected; the problem only occurs after the Wi-Fi has been out of use (e.g. while using LTE) and I reconnect to it later.
A reboot fixes the issue. After examining the log, it seems that netmgrd crashes.
Stripped logcat: https://github.com/octohex/logs/blob/master/netmgrd-crash
I am running the latest update (MTC19Z.2016.07.07.02.16.02), but this also occurred on previous releases.