system reboots due to memory corruption detected by heap canaries and UAF detection in zygote #254

Closed
canary5 opened this Issue May 5, 2016 · 17 comments

@canary5

canary5 commented May 5, 2016

I get system restarts almost daily. I'm not sure when it started; I think in March.
This error always appears in the logs:
Fatal signal 6 (SIGABRT), code -6
Do you need the full adb log?

@thestinger

thestinger commented May 5, 2016

I need more information than that.

@thestinger

thestinger commented May 5, 2016

The device you're using, logs, whether it's a pristine install (i.e. whether stuff like GApps has been sideloaded), etc.

@canary5

canary5 commented May 5, 2016

Device: Nexus 5X, with every release since March I think, but I'm not 100% sure. Clean OS, no GApps, just a few extra apps installed. I have an older log and today's; I'll post them a little later.

@canary5

canary5 commented May 5, 2016

Sent the logs by email.

@thestinger thestinger added the upstream label May 5, 2016

@thestinger

thestinger commented May 5, 2016

It appears to be an upstream bug detected by OpenBSD malloc, then. There's not much that can be done about it, because I don't have the resources to consider those in scope right now. You can turn off canaries in Settings -> Security -> Advanced, and it will probably turn back into silent memory corruption like on stock, rather than being detected.

@thestinger thestinger removed the unconfirmed label May 5, 2016

@canary5

canary5 commented May 5, 2016

But those security options are the reason I'm using CopperheadOS ;) Before March, this never happened, though. The March updates improved overall performance and battery life, so I can live with one system reboot a day. If you need more logs, I can provide them after every reboot.

@thestinger

thestinger commented May 5, 2016

The security features in the Settings app are only the ones deemed too expensive performance-wise to enable by default. Only a few features are exposed there. Most of the security features are already enabled by default.

CopperheadOS follows the upstream releases, which is where the performance and battery life improvements came from. Apparently, the new branch also introduced this memory corruption issue. It's not a bug in our code and we don't have the resources to address upstream bugs in Android. I was doing it before, but I no longer want to spend my free time on that.

@canary5

canary5 commented May 5, 2016

OK, thanks for the great work. So both canary options need to be disabled to fix the memory corruption, right? And is it just a 5X problem, or does the 6P have it as well?

@thestinger

thestinger commented May 5, 2016

Disabling the canary option (just that single one) is enough to avoid the issue from your logs. It won't fix the memory corruption. That's an upstream bug that is going to occur regardless. It's just not going to make the process abort, since the feature that's able to detect it will be disabled.
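To illustrate why disabling the option only removes the abort, here is a minimal C sketch of a heap canary, assuming a toy allocator. `toy_malloc`, `toy_free`, `toy_canary_intact`, the fixed `CANARY` value and the `canaries_enabled` toggle are all hypothetical names; this is not OpenBSD malloc's actual layout or code. The overflow happens either way; only the check at free time differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical canary value; a real allocator would pick a random secret. */
static const uint64_t CANARY = 0x5a17c0de5a17c0deULL;

/* Hypothetical stand-in for the Settings switch discussed above. */
static bool canaries_enabled = true;

/* Layout: [size_t size][size user bytes][8-byte canary].
 * (Alignment of the user pointer is simplified for this sketch.) */
void *toy_malloc(size_t size) {
    unsigned char *base = malloc(sizeof(size_t) + size + sizeof CANARY);
    if (base == NULL)
        return NULL;
    memcpy(base, &size, sizeof size);
    /* The canary sits directly after the last byte the caller asked for. */
    memcpy(base + sizeof(size_t) + size, &CANARY, sizeof CANARY);
    return base + sizeof(size_t);
}

/* True if the canary behind the allocation is unchanged. */
bool toy_canary_intact(const void *ptr) {
    const unsigned char *base = (const unsigned char *)ptr - sizeof(size_t);
    size_t size;
    uint64_t seen;
    memcpy(&size, base, sizeof size);
    memcpy(&seen, base + sizeof(size_t) + size, sizeof seen);
    return seen == CANARY;
}

/* The overflow is only noticed here, when the chunk is freed. */
void toy_free(void *ptr) {
    if (ptr == NULL)
        return;
    if (canaries_enabled && !toy_canary_intact(ptr)) {
        fputs("heap canary overwritten, aborting\n", stderr);
        abort(); /* raises SIGABRT, i.e. "Fatal signal 6" */
    }
    /* With the check disabled, the corruption simply goes unnoticed. */
    free((unsigned char *)ptr - sizeof(size_t));
}
```

Clearing `canaries_enabled` makes `toy_free` release a chunk with an overwritten canary without complaint. That is the "silent memory corruption like stock" case: the bug is still there, only the abort is gone.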

@thestinger thestinger changed the title from system reboots to system reboots due to memory corruption detected by heap canaries May 7, 2016

@canary5

canary5 commented May 7, 2016

I disabled it, but got another reboot today. Log sent by email.

@thestinger

thestinger commented May 7, 2016

That's another memory corruption issue from upstream (use-after-free). Not much that I can do about all of these issues.

@thestinger

thestinger commented May 7, 2016

Logs aren't enough to fix issues like this. It gives a backtrace from where it was detected (the free causing the allocation that was used after free to be moved from the quarantine), not where the issue actually occurred. It's designed as a hardening feature, not a debugging one. The same thing applies to canaries: it finds out on free, not when the canary was overwritten. There's very little that can be done without setting up debugging, but I can't reproduce the issues so there's nothing to do.
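The detect-on-eviction behaviour described above can be sketched in C as a FIFO quarantine, assuming freed chunks are filled with a junk byte and only checked when they are finally released. `quarantine_free`, `JUNK`, `MAX_SLOTS` and the `slots` setting are hypothetical names, and this is a simplification, not OpenBSD malloc's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 64
#define JUNK 0xdf /* hypothetical fill byte for quarantined memory */

struct chunk {
    unsigned char *ptr;
    size_t size;
};

static struct chunk ring[MAX_SLOTS]; /* FIFO of freed-but-not-released chunks */
static size_t slots = 16;            /* hypothetical quarantine size; 0 disables it */
static size_t head = 0, count = 0;

/* True if nothing wrote to the chunk while it sat in quarantine. */
static bool still_junk(const struct chunk *c) {
    for (size_t i = 0; i < c->size; i++)
        if (c->ptr[i] != JUNK)
            return false;
    return true;
}

/* Frees ptr via the quarantine. Returns false if the chunk evicted by this
 * call shows a write after free; a hardened allocator would abort() there.
 * `slots` must not change while chunks are queued. */
bool quarantine_free(void *ptr, size_t size) {
    assert(slots <= MAX_SLOTS);
    if (slots == 0) {
        free(ptr); /* no quarantine: released at once, nothing is checked */
        return true;
    }
    memset(ptr, JUNK, size);
    bool clean = true;
    if (count == slots) {
        /* The oldest chunk leaves the quarantine and is checked now, so any
         * backtrace belongs to this free, not to the code that wrote to it. */
        struct chunk *old = &ring[head];
        clean = still_junk(old);
        free(old->ptr);
        head = (head + 1) % slots;
        count--;
    }
    ring[(head + count) % slots] = (struct chunk){ ptr, size };
    count++;
    return clean;
}
```

With `slots = 2`, freeing `a`, then `b`, then writing to `a` is only reported when a third chunk is freed and pushes `a` out of the quarantine. The reported location is therefore that third free, not the stray write, which matches why the logs alone don't point at the root cause.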

@thestinger thestinger changed the title from system reboots due to memory corruption detected by heap canaries to system reboots due to memory corruption detected by heap canaries and UAF detection in zygote May 7, 2016

@thestinger

thestinger commented May 8, 2016

Ideally the root causes of these issues could be identified, so they could be reproduced. If it's a rare issue that's infeasible to reproduce, there's little hope of a fix.

@thestinger

thestinger commented May 8, 2016

(at least without trying to find it with ASan or Valgrind, but the canaries can be more precise... yet they aren't going to have enough debugging information to work from in such large processes)

@canary5

canary5 commented May 12, 2016

What is the recommended quarantine size for that new option? Is it related to the reboots?

@thestinger

thestinger commented May 12, 2016

It controls the memory usage dedicated to the quarantine. It's set to 32 by default, and only setting it to 0 would eliminate it as a bug-finding feature completely. None of these switches are meant to alter stability: there shouldn't be so many memory corruption bugs in normal code paths, and they need to be fixed upstream ASAP.

@thestinger

thestinger commented May 13, 2016

Fixing all memory corruption issues in Android is out-of-scope. I'm willing to do some work on it, but I need reproducible bugs, i.e. I need to know how to trigger this. It's not a bug in anything CopperheadOS added, the feature is working as intended (triggering aborts / faults when memory corruption happens).

@thestinger thestinger closed this May 13, 2016
