Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BluetoothLeScannerImplOreo startScan PendingIntent causes leak that eventually fails all future scans #58

Closed
paulpv opened this issue Jul 20, 2019 · 14 comments · Fixed by #97 or #99
Labels

Comments

@paulpv
Copy link
Contributor

paulpv commented Jul 20, 2019

This problem might manifest itself with can't Register GATT client, MAX client reached: 32 output in an unfiltered log.

Please also refer to my StackOverflow question and self-answer:
https://stackoverflow.com/q/57104535/252308

Basically, BluetoothLeScannerImplOreo startScanInternal has the following code (I added the log line in my fork):

        final PendingIntent pendingIntent = createStartingPendingIntent(nonNullFilters,
                nonNullSettings, context, callbackIntent);
        Log.v(TAG, "#GATT startScanInternal: pendingIntent=" + pendingIntent);
        return scanner.startScan(nativeFilters, nativeSettings, pendingIntent);

stopScanInternal has the following code (log line added in my fork):

        final PendingIntent pendingIntent = createStoppingPendingIntent(context, callbackIntent);
        Log.v(TAG, "#GATT stopScanInternal: pendingIntent=" + pendingIntent);
        scanner.stopScan(pendingIntent);

Here is a log of my app starting and stopping a few scans:

2019-07-19 21:14:51.230 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.startScan(...)
2019-07-19 21:14:58.424 16905-16905/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: pendingIntent=PendingIntent{1dfc67: android.os.BinderProxy@e560a14}
2019-07-19 21:14:58.468 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(948)] GATT_Register 3c2e1fd9-d3fd-6897-a113-863aa50df781
2019-07-19 21:14:58.468 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(968)] allocated gatt_if=7
2019-07-19 21:14:58.478 16905-16905/com.testapp V/BleScannerManager: scan: #GATT bleScanner.startScan(...); result=SUCCESS(0)
2019-07-19 21:14:58.479 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.startScan(...)
2019-07-19 21:14:58.481 16905-16905/com.testapp E/BleScannerManager: #GATT bleScannerStartScanCount=1, elapsedMillisSinceBleScannerStartScanFirstTime=0
2019-07-19 21:15:03.521 1447-1447/? D/BtGatt.ScanManager: awakened up at time 614650173
2019-07-19 21:15:08.250 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.stopScan(...)
2019-07-19 21:15:08.255 16905-16905/com.testapp V/BleScannerImplOreo: #GATT stopScanInternal: pendingIntent=PendingIntent{dc4115f: android.os.BinderProxy@84087ac}
2019-07-19 21:15:08.259 1447-1982/? E/BtGatt.ContextMap: Context not found for info com.android.bluetooth.gatt.GattService$PendingIntentInfo@77d2d25
2019-07-19 21:15:08.260 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.stopScan(...)
2019-07-19 21:15:08.556 1447-1447/? D/BtGatt.ScanManager: awakened up at time 614655208
2019-07-19 21:15:08.570 1447-1554/? E/BtGatt.GattService: Exception: android.app.PendingIntent$CanceledException
2019-07-19 21:15:20.556 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.startScan(...)
2019-07-19 21:15:20.570 16905-16905/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: pendingIntent=PendingIntent{2ce86d6: android.os.BinderProxy@84087ac}
2019-07-19 21:15:20.575 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(948)] GATT_Register 6b444e39-d61a-1d92-52d6-afd6e901b56b
2019-07-19 21:15:20.575 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(968)] allocated gatt_if=8
2019-07-19 21:15:20.576 16905-16905/com.testapp V/BleScannerManager: scan: #GATT bleScanner.startScan(...); result=SUCCESS(0)
2019-07-19 21:15:20.576 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.startScan(...)
2019-07-19 21:15:20.576 16905-16905/com.testapp E/BleScannerManager: #GATT bleScannerStartScanCount=2, elapsedMillisSinceBleScannerStartScanFirstTime=22097
2019-07-19 21:15:23.264 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.stopScan(...)
2019-07-19 21:15:23.269 16905-16905/com.testapp V/BleScannerImplOreo: #GATT stopScanInternal: pendingIntent=PendingIntent{ebafcb0: android.os.BinderProxy@ebb6729}
2019-07-19 21:15:23.272 1447-1982/? E/BtGatt.ContextMap: Context not found for info com.android.bluetooth.gatt.GattService$PendingIntentInfo@3abaefa
2019-07-19 21:15:23.272 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.stopScan(...)
2019-07-19 21:15:24.968 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.startScan(...)
2019-07-19 21:15:24.980 16905-16905/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: pendingIntent=PendingIntent{8b1626b: android.os.BinderProxy@ebb6729}
2019-07-19 21:15:24.990 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(948)] GATT_Register 0bbb0b8f-2c63-dfc2-268d-fbfe39360ddf
2019-07-19 21:15:24.990 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(968)] allocated gatt_if=9
2019-07-19 21:15:24.990 16905-16905/com.testapp V/BleScannerManager: scan: #GATT bleScanner.startScan(...); result=SUCCESS(0)
2019-07-19 21:15:24.990 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.startScan(...)
2019-07-19 21:15:24.991 16905-16905/com.testapp E/BleScannerManager: #GATT bleScannerStartScanCount=3, elapsedMillisSinceBleScannerStartScanFirstTime=26511
2019-07-19 21:15:25.019 1447-1554/? E/BtGatt.GattService: Exception: android.app.PendingIntent$CanceledException
2019-07-19 21:15:25.795 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(948)] GATT_Register 32a50296-06f2-ed4d-f83c-25da059a4a19
2019-07-19 21:15:25.795 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(968)] allocated gatt_if=10
2019-07-19 21:15:26.046 1447-1447/? D/BtGatt.ScanManager: awakened up at time 614672699
2019-07-19 21:15:27.000 16905-16905/com.testapp V/BleScannerManager: scan: #GATT +bleScanner.stopScan(...)
2019-07-19 21:15:27.005 16905-16905/com.testapp V/BleScannerImplOreo: #GATT stopScanInternal: pendingIntent=PendingIntent{40e339d: android.os.BinderProxy@6340012}
2019-07-19 21:15:27.008 1447-1982/? E/BtGatt.ContextMap: Context not found for info com.android.bluetooth.gatt.GattService$PendingIntentInfo@96b43b4
2019-07-19 21:15:27.009 16905-16905/com.testapp V/BleScannerManager: scan: #GATT -bleScanner.stopScan(...)
2019-07-19 21:15:28.679 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(948)] GATT_Register f119757a-00ff-29b4-72e0-952c5e0db695
2019-07-19 21:15:28.679 1447-1941/? I/bt_stack: [INFO:gatt_api.cc(968)] allocated gatt_if=10
2019-07-19 21:15:28.810 1447-1447/? D/BtGatt.ScanManager: awakened up at time 614675462
2019-07-19 21:15:28.822 1447-1554/? E/BtGatt.GattService: Exception: android.app.PendingIntent$CanceledException

Notice that the pendingIntent= logged during a startScan and stopScan do not match.
This means that the code never actually stops any PendingIntent started scans.
The pendingIndent passed to stopScan must be the exact same one that was passed to startScan.

You can usually only start about 28, max of 32, scans before the OS blocks all future scans.
This repros on every API >= 26 phone I have tried.

I can look in to a fix and send a PR, but I wanted to open this issue to see if there is anything else to discuss.

@philips77
Copy link
Member

philips77 commented Jul 20, 2019

Thanks for the report. Although, it's weird, as I remember checking, that Android uses PendingIntent#equals(..) to compare then, and I made sure they are equal... I'll have to look into it.

@paulpv
Copy link
Contributor Author

paulpv commented Jul 21, 2019

@philips77 that may be true, and is what I read and thought myself, but only when using this library, when stopScan(PendingIntent) is called after startScan(PendingIntent), there are symptoms that the scanning is still occurring, the two biggest being:

  1. The OS logs a lot of BLE scanner events even though if this library's code worked there shouldn't be anything scanning (on a freshly rebooted devices known to have no other BLE scanner apps installed)
  2. The gatt_if=# continues to increment the next time startScan(PendingIntent) is called, a tell-tale of a "leak"

FYI, my repro project here is:
https://github.com/paulpv/AndroidBleStartScanPendingIntentLeak

If I tweak the code to not use this library, and use direct calls to the android API, there is no "leak".
I can toggle back and forth between using this library and direct API calls and 100% reliably get respectively either a leak or no leak.

I will continue to look in to this myself, including if the .equals is supposed to work and isn't for some reason.
I have a production app using this library and runs in to this after ~170 seconds for all of my users rendering the phone's BLE scanning useless (and thus also my app) until BLE is turned off and back on again.
Not sob-storying you, just saying that I have a vested interest to help you solve this problem...but the problem has been in my users' hands for over a month, so a few more days isn't going to do much more damage than I have already done.
I am tempted to revert back to using the ScanCallback non-PendingIntent version and push that live to my users, but I think I/we can fix this in a few days.

Thanks for looking in to this.

@paulpv
Copy link
Contributor Author

paulpv commented Jul 22, 2019

Just starting to look back in to this.
I added the following code to Oreo startScanInternal:

final PendingIntent pendingIntent = createStartingPendingIntent(nonNullFilters,
                nonNullSettings, context, callbackIntent);
Log.v(TAG, "#GATT startScanInternal: pendingIntent=" + pendingIntent);

final PendingIntent tempPendingIntentStop = createStoppingPendingIntent(context, callbackIntent);
Log.v(TAG, "#GATT startScanInternal: tempPendingIntentStop=" + pendingIntent);

boolean pendingIntentsEqual = pendingIntent.equals(tempPendingIntentStop);
Log.v(TAG, "#GATT startScanInternal: pendingIntentsEqual=" + pendingIntentsEqual);

The log output is:

#GATT startScanInternal: pendingIntent=PendingIntent{334586: android.os.BinderProxy@a79e47}
#GATT startScanInternal: tempPendingIntentStop=PendingIntent{b823674: android.os.BinderProxy@4c0b79d}
#GATT startScanInternal: pendingIntentsEqual=false

This definitely needs an androidTest backing it. :/

@paulpv
Copy link
Contributor Author

paulpv commented Jul 22, 2019

I might have done that wrong. It is not the PendingIntents that need to equal, it seems to be that it is the underlying Intents themselves, evaluated via https://developer.android.com/reference/android/content/Intent.html#filterEquals(android.content.Intent).
Looking in to that...

@paulpv
Copy link
Contributor Author

paulpv commented Jul 22, 2019

They do equal:

2019-07-22 13:49:50.499 32339-32339/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: startingIntent=Intent { act=no.nordicsemi.android.support.v18.ACTION_FOUND cmp=com.testapp/no.nordicsemi.android.support.v18.scanner.PendingIntentReceiver (has extras) }
2019-07-22 13:49:50.499 32339-32339/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: stoppingIntent=Intent { act=no.nordicsemi.android.support.v18.ACTION_FOUND cmp=com.testapp/no.nordicsemi.android.support.v18.scanner.PendingIntentReceiver }
2019-07-22 13:49:50.499 32339-32339/com.testapp V/BleScannerImplOreo: #GATT startScanInternal: intentsEqual=true

@paulpv
Copy link
Contributor Author

paulpv commented Jul 22, 2019

I believe I found the problem.
I think it has nothing to do with the Intent or PendingIntent equaling or not.
I think it is much simpler than that.

In Oreo's createStoppingPendingIntent I changed PendingIntent.FLAG_CANCEL_CURRENT to PendingIntent.FLAG_UPDATE_CURRENT.
Using FLAG_CANCEL_CURRENT there seems to be a leak and I can only stop and start scanning ~28 times before allocated gatt_if=32 is logged and the next start results in can't Register GATT client, MAX client reached: 32.
Using FLAG_UPDATE_CURRENT I have no problem stopping and re-starting scanning over 28 times.

@philips77
Copy link
Member

Fix released in 1.4.1.

@pc-bszabo
Copy link

pc-bszabo commented Nov 20, 2019

The problem is not resolved, with 1.4.1! The "listent" PendingIntent is not broadcasted anymore, that's true, but the Bluetooth internal scanning is still not stopped, here is the proof :" BtGatt.ScanManager: awakened up at time".
Test device : Pixel2 with Android 10

Problem found :
BluetoothLeScannerImplOreo -> createStartingPendingIntent(...) and createStoppingPendingIntent(...)
in both methods this line of code breaks the normal flow :

  final int id = callbackIntent.hashCode();

In this way the started pendingIntent hashCode() will be different than the stop pendingIntent hashCode()!! this is bad! it's a bug, cause the doc clarifies that the two intents must be the same, and in this library case are totally different.

Solution :
Add "requestCode" as an extra argument to createStartingPendingIntent(..., @NonNull final int requestCode) and createStoppingPendingIntent(..., @NonNull final int requestCode) methods, and use that for id.

  final int id = requestCode; //callbackIntent.hashCode();

And yes, the requestCode must be the same for start and stop.

@paulpv
Copy link
Contributor Author

paulpv commented Nov 21, 2019

Thanks @pc-bszabo
I had thought of that too but assumed the background for using the callbackIntent.hashCode() was sound and proven as long as the caller does pass the exact same callbackIntent, which there is little to no reason why a caller should intentionally not.

As a semi-unrelated but semi-related problem, it is also possible to entirely orphan a background PendingIntent scan by having one active and then killing the process (either intentionally or unintentionally, including updating while debugging), then the scan will automatically re-start the process and you will be unable to stop it even with direct calls to the OS API using the same requestCode. The OS API underneath seems to not be able to find any matching PendingIntent to cancel and stopping the PendingIntent just results in a no-op.
The only way to cancel in this state is to turn Bluetooth off and back on, or reboot.

To detect this state I persist the started scan's process id (PID) to SharedPreferences and clear it when the scan is stopped; on every onReceived the started PID saved to SharedPreferences is checked with the current PID, and if they do not match then I know this onReceived is from a newly started process that will be unable to stop the scan; the app may then choose to notify the user with a pseudo-friendly message/notification of their choice (something along the lines of "Your phone's Bluetooth is unstable; please turn Bluetooth off and back on, or reboot your phone".)

@philips77 philips77 reopened this Nov 21, 2019
@wcauchois
Copy link

Hi, I just want to chime in that I encountered this issue several months ago. I haven't caught up on the entirety of this conversation but I found that callbackIntent.hashCode() seemed unstable regardless of whether the Intent had the same fields. It's been a while, but I think I even tried diving into the Android source code and couldn't figure out why this was the case.

Since my app has only one background scan, I ended up forking this library and just manually passing "0" as the request code. You can see that commit here: barkinglabs@a923b70

That's the version of the library I'm currently using but I would love to see this issue fixed upstream and be able to kill my fork!

@wcauchois
Copy link

In the release notes for version 1.4.1 it says that this issue is fixed. However, I've just tried reproducing the issue on the latest version of the library (1.4.3) and I've confirmed that the issue is still there.

@wanglei0309
Copy link

@philips77 The latest version of the SDK still has this problem. I have to restart Bluetooth to solve it. I don't have a better solution at present, will I optimize this library in the future?

@philips77
Copy link
Member

Yes, we will release a patch, but have other priorities at the moment. I think creating a PR would be the quickest way if you want to contribute.

@philips77
Copy link
Member

The time has come, I started looking into this issue now. Just after releasing 1.4.4... :(

@philips77 philips77 added the bug label Jun 4, 2021
philips77 added a commit that referenced this issue Jun 7, 2021
@philips77 philips77 mentioned this issue Jun 7, 2021
@philips77 philips77 linked a pull request Jun 7, 2021 that will close this issue
@philips77 philips77 mentioned this issue Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants