Android JNI code leaks local references in some cases #175
The current Android JNI code does not explicitly delete all of the local references it creates. This works in cases where DetachCurrentThread is called, because that function deletes all local references that were created. However, in cases where DetachCurrentThread is not called, the local references accumulate, which eventually causes the process to abort. Sample code and crash are below.
While fixing that issue, I noticed that the JNI method IDs were being looked up every time through ares_get_android_server_list. Those can be cached, so I also moved those lookups to ares_library_init_android. See also: https://www.ibm.com/developerworks/library/j-jni/#notc
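To illustrate the caching idea, here is a minimal sketch of the pattern, not the code in this pull request. It uses a hypothetical stand-in (`toy_get_method_id`) for `GetMethodID` so that it compiles without `jni.h`; every `toy_` name is an assumption made for illustration. The lookup runs once at initialization, and the per-call path only reads the cached ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for jmethodID; not the real JNI type. */
typedef int toy_method_id;

/* Counts how many times the (expensive) lookup has run. */
static int toy_lookup_count = 0;

/* Hypothetical stand-in for (*env)->GetMethodID(...). */
static toy_method_id toy_get_method_id(const char *name)
{
  assert(name != NULL);
  toy_lookup_count++;
  return (toy_method_id)(name[0] + 1); /* arbitrary nonzero ID */
}

/* Cached at init time; 0 means "not resolved". */
static toy_method_id cached_get_active_network = 0;

/* Models the init function: resolve the method ID once. */
int toy_library_init(void)
{
  cached_get_active_network = toy_get_method_id("getActiveNetwork");
  return cached_get_active_network != 0;
}

/* Models the per-channel path: reuse the cached ID, no lookup. */
toy_method_id toy_get_server_list(void)
{
  return cached_get_active_network;
}

/* Exposes the lookup counter so callers can check the caching. */
int toy_lookups_performed(void)
{
  return toy_lookup_count;
}
```

After one call to `toy_library_init`, any number of `toy_get_server_list` calls leaves the lookup count at one, which is the saving the caching is meant to provide.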
This does change the behavior of ares_library_init_android slightly. For older Android API versions that don't have these classes or methods, previously ares_library_init_android would return ARES_SUCCESS (although subsequent calls to ares_get_android_server_list would always return NULL). With this change, ares_library_init_android will return ARES_ENOTINITIALIZED (ares_get_android_server_list continues to return NULL). That seems more truthful to me, but we can keep the original semantics if needed.
This code snippet will crash when run in an Android app:
/* ares_library_init, ares_library_init_jvm, and ares_library_init_android must be called first. */
res = (*android_jvm)->GetEnv(android_jvm, (void **)&env, JNI_VERSION_1_6);
for (i = 0; i < 1000; i++)
The crash will look something like this:
JNI ERROR (app bug): local reference table overflow (max=512)
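The overflow can be illustrated with a toy model of a bounded local reference table. This is only a sketch of the bookkeeping, not real JNI code: the capacity comes from the `max=512` in the message above, while the `toy_` functions and the number of references per call are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity taken from "max=512" in the crash message. */
#define TOY_TABLE_MAX 512

/* Toy stand-in for the per-thread JNI local reference table. */
typedef struct {
  size_t live;       /* references currently held */
  int    overflowed; /* set once a reference could not be added */
} toy_ref_table;

/* Models a JNI call that returns a new local reference.
 * Returns 1 on success, 0 if the table is full. */
static int toy_new_local_ref(toy_ref_table *t)
{
  assert(t != NULL);
  if (t->live == TOY_TABLE_MAX) {
    t->overflowed = 1; /* the real runtime aborts at this point */
    return 0;
  }
  t->live++;
  return 1;
}

/* Models (*env)->DeleteLocalRef(env, ref). */
static void toy_delete_local_ref(toy_ref_table *t)
{
  assert(t != NULL && t->live > 0);
  t->live--;
}

/* Runs `iterations` lookups that each create `refs_per_call` references.
 * When `cleanup` is nonzero, each lookup deletes what it created before
 * returning. Returns 1 if the table never overflowed, 0 otherwise. */
int toy_run_lookups(int iterations, int refs_per_call, int cleanup)
{
  toy_ref_table t = { 0, 0 };
  int i, j;

  for (i = 0; i < iterations && !t.overflowed; i++) {
    int created = 0;
    for (j = 0; j < refs_per_call; j++)
      created += toy_new_local_ref(&t);
    if (cleanup)
      for (j = 0; j < created; j++)
        toy_delete_local_ref(&t);
  }
  return !t.overflowed;
}
```

With cleanup enabled, the number of live references never exceeds `refs_per_call`, so the loop can run indefinitely. Without it, the table fills once the total number of references created passes the capacity.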
Some of the references are indeed not being cleaned up and the code is not trying to rely on detach thread for clean up. That is an issue that needs to be resolved.
Caching is certainly possible. Honestly, I didn't do it because ares_init(_options) should be called so infrequently that the performance penalty would be negligible. More work and complexity for little real benefit. One channel should be used throughout an application's life cycle, except when there is an event that can cause the name servers to change, like switching from LTE to wifi... I’ll concede that this can happen often on Android, so caching might end up being a benefit.
The issue I see with the initialization returning ARES_ENOTINITIALIZED is an app running on an older Android version will see that return and fail to start. Or will error saying it can’t properly run. Or be in some other bad situation because it doesn’t think it can do DNS lookups even though it can using __system_property_get.
I’d rather only have ARES_ENOTINITIALIZED on a true failure where DNS cannot be used period. It’s currently returning success because the fallback to __system_property_get will most likely work on older versions that don’t have some of the needed ConnectivityManager functions.
Checking if it’s a true failure would be sufficient. For example, if there is a failure in the initialization code, check whether __system_property_get returns DNS servers, and only return ARES_ENOTINITIALIZED if it doesn’t. I think it's more misleading to return ARES_ENOTINITIALIZED when setting up Android even though DNS servers can still be queried.
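The decision rule proposed here can be sketched as a small pure function. This is a hedged illustration only: `sketch_init_status`, its parameters, and the `SKETCH_` constants are hypothetical stand-ins for the real c-ares status codes and for whatever the initialization code and the __system_property_get fallback would actually report.

```c
#include <assert.h>

/* Hypothetical stand-ins for ARES_SUCCESS and ARES_ENOTINITIALIZED. */
enum sketch_status {
  SKETCH_SUCCESS,
  SKETCH_ENOTINITIALIZED
};

/* jni_setup_ok:          nonzero if the class and method lookups succeeded.
 * fallback_server_count: number of DNS servers the property-based
 *                        fallback reported (assumed to be known here).
 * Reports failure only when neither path can provide DNS servers. */
int sketch_init_status(int jni_setup_ok, int fallback_server_count)
{
  assert(fallback_server_count >= 0);

  if (jni_setup_ok)
    return SKETCH_SUCCESS;
  if (fallback_server_count > 0)
    return SKETCH_SUCCESS; /* older Android: the fallback still works */
  return SKETCH_ENOTINITIALIZED; /* true failure: no usable DNS source */
}
```

Under this rule an app on an older Android version still starts normally, and the error is reserved for the case where no DNS servers can be obtained by either route.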
Other than the initialize return causing problems if the fallback method can still be used I don’t see anything wrong with the code.
Thanks very much for reviewing.
Curl calls ares_init every time an easy handle is created (https://github.com/curl/curl/blob/master/lib/asyn-ares.c#L122), so an application that does large numbers of transfers using Curl is going to benefit as well.
That's fair, but unfortunately the flip side is also true. Returning ARES_SUCCESS to an app running on a newer Android when one of the FindClass or GetMethodID calls fails will cause it to continue even though its DNS lookups will definitely fail. Admittedly, it should be very uncommon for those to fail on newer Android.
Agreed, it would be better if ARES_ENOTINITIALIZED was returned if and only if permanent failure is confirmed, but that's a bigger change than I'd like to make as part of this pull request. I've added another commit to this request that keeps the current semantics. Just to reiterate, the current semantics have the opposite problem, that ARES_SUCCESS may be returned when in fact there is a permanent failure, but I'm fine with keeping that behavior for now.
* Add Google LLC to AUTHORS.
* android: Explicitly delete all JNI local references, and cache JNI method IDs at initialization.
* android: Only return ARES_ENOTINITIALIZED on failures in initialization code.