Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow android Dispatchers.Main init #878

Closed
Tolriq opened this issue Dec 8, 2018 · 49 comments

Comments

@Tolriq
Copy link

commented Dec 8, 2018

It seems Android Main dispatcher initialization / discovery can be quite slow currently.

On a Nokia 8 device my application startup time went from 250ms to 500ms (on release mode). I know that 250ms does not sound much but it's pretty much 2 times slower startup time and it does impact the users as they often want the fastest possible startup time for this kind of application. And I still have to see on older devices if the impact is 250ms or linear with the device power.

See attached flamechart (It's from the debug app with full trace profiling so the times are insanely slow but was necessary to find the issue).

coroutine flame chart

Is there a way to bypass that slow factory discovery and statically create my own Main dispatcher on Android to avoid that slowdown?

@JakeWharton

This comment has been minimized.

Copy link
Contributor

commented Dec 8, 2018

ServiceLoader should definitely be avoided on Android. I filed https://issuetracker.google.com/issues/120436373 recently to rewrite it in release builds.

@elizarov

This comment has been minimized.

Copy link
Member

commented Dec 8, 2018

This is really strange. I see lot of time spent on reading manifest, but we do not use manifest to initialize Dispatchers.Main.

Is there any chance you can compare loading an app that uses Dispatchers.Main (dynamically loaded) vs static reference to Handler(Looper.getMainLooper()).asCoroutineDispatcher() to see where exactly this additional time comes from?

UPDATE: Now I finally saw what is going on... OMG.

What would be the options that we have to avoid ServiceLoader? One option is that we can package a separate android-specific jar-artifact that comes pre-bundled with Android Main dispatcher, but then it'll be a pain to use with coroutine-using non-android specific libraries.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 8, 2018

Does Handler(Looper.getMainLooper()).asCoroutineDispatcher() provide the exact same thing?

https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/HandlerDispatcher.kt seems to do a lot more.

Looking for temporary workaround as I can't really rollback now :)

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 8, 2018

What would be the options that we have to avoid ServiceLoader?

Class.forName :sigh:

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 8, 2018

Does Handler(Looper.getMainLooper()).asCoroutineDispatcher() provide the exact same thing?

More or less, modulo createAsync

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 9, 2018

Did a quick test by duplicating the original code and making the Main public.
(I don't know why but I was not able to use the usual package trick to access still some Kotlin mysteries)

The gain is more in the 100/125ms seems I have something else to find, Maybe 100ms is less important and can wait for the R8 solution, providing a simple public getter for the rare people that will notice and care about it in the meantime could be enough?

@elizarov

This comment has been minimized.

Copy link
Member

commented Dec 9, 2018

What and why is exactly slow in ServiceLoader on Android?

What if we write our own (compatible) implementation that uses ClassLoader.getResources to get the corresponding META-INF/service/xxx files and parses them. Would it be fast or just as slow?

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 10, 2018

In the attached flamegraph most of the time is spent in signed JAR verification.
Though I am not sure the same issue arises in release-mode dex: there is no separate JAR and it can be verified only once on startup, not during service loading.

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 10, 2018

It appears that JAR is verified all the time even in dex.
We can write our own ServiceLoader and avoid verification (it is a matter of one JarFile constructor parameter) if it gives a real performance boost.

UPD.
In emulator with shrinked dex sample code that reads a class name from JAR manifest, data points:
Without verification: 20, 30, 23 ms
With verification: 485, 366, 449 ms

But I am not sure whether we should write it: it will add maintenance burden, it may not work on some devices (especially on new androidx where even default SL breaks in some versions).

Example of how to read service name without verification on Android:

val service = "META-INF/services/kotlinx.coroutines.internal.MainDispatcherFactory"
val url = this.javaClass.classLoader.getResources(service).toList()[0]
val string = url.toString()
val separatorIndex = string.indexOf('!')
val pathToJar = string.substring(0, separatorIndex)
val entry = string.substring(separatorIndex + 2, string.length)
val file = JarFile(pathToJar.substring("jar:file:/".length), false)
BufferedReader(InputStreamReader(file.getInputStream(ZipEntry(entry)), "UTF-8")).useLines {
    it.forEach { println(it) } // prints AndroidDispatcherFactory
}
@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 10, 2018

Maybe I was not clear but the given values where from production app in release mode and heavily optimized ;)

If you do nothing I would highly appreciate to have a public getter in https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/HandlerDispatcher.kt for those who care about the delay in application init.

Currently it seems Kotlin prevent using a package trick to access the internal Main, leading to necessary code duplication just to pass the var public and avoid the issue.

Maybe add another public alias with a clear name / Kdoc about it's usage not recommended?

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 12, 2018

One more detail, I got some ANR in prod due to that too.

Probably due to accessing file from main thread on a busy file system.

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 flags=0 obj=0x71e769a8 self=0xb303a000
  | sysTid=8148 nice=0 cgrp=default sched=0/0 handle=0xb6e794a4
  | state=R schedstat=( 688284631 7931945867 493 ) utm=46 stm=22 core=0 HZ=100
  | stack=0xbe505000-0xbe507000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at java.util.HashMap.put (HashMap.java:611)
  at sun.security.util.ManifestDigester.<init> (ManifestDigester.java:170)
  at java.util.jar.JarVerifier.processEntry (JarVerifier.java:290)
- locked <0x0c34e60a> (a byte[])
  at java.util.jar.JarVerifier.update (JarVerifier.java:229)
  at java.util.jar.JarFile.initializeVerifier (JarFile.java:374)
  at java.util.jar.JarFile.getInputStream (JarFile.java:441)
- locked <0x06592e7b> (a java.util.jar.JarFile)
  at libcore.io.ClassPathURLStreamHandler$ClassPathURLConnection.getInputStream (ClassPathURLStreamHandler.java:177)
  at java.net.URL.openStream (URL.java:1058)
  at java.util.ServiceLoader.parse (ServiceLoader.java:305)
  at java.util.ServiceLoader.-wrap0 (ServiceLoader.java)
  at java.util.ServiceLoader$LazyIterator.hasNextService (ServiceLoader.java:358)
  at java.util.ServiceLoader$LazyIterator.hasNext (ServiceLoader.java:402)
  at java.util.ServiceLoader$1.hasNext (ServiceLoader.java:488)
  at kotlin.a.u.a (_Collections.kt:1132)
  at kotlin.a.u.f (_Collections.kt:1165)
  at kotlin.a.u.e (_Collections.kt:1156)
  at kotlinx.coroutines.bx.<clinit> (Dispatchers.kt:96)
  at kotlinx.coroutines.ax.b (Dispatchers.kt:53)
  at com.genimee.android.yatse.b.a.b.b.<init> (MainThreadBus.kt:14)
  at com.genimee.android.yatse.b.b.a.<clinit> (ServiceLocator.kt:24)
  at org.leetzone.android.yatsewidget.YatseApplication.<clinit> (YatseApplication.kt:390)
  at java.lang.Class.newInstance (Native method)
  at android.app.Instrumentation.newApplication (Instrumentation.java:1102)
  at android.app.Instrumentation.newApplication (Instrumentation.java:1087)
  at android.app.LoadedApk.makeApplication (LoadedApk.java:983)
  at android.app.ActivityThread.handleBindApplication (ActivityThread.java:5720)
  at android.app.ActivityThread.-wrap1 (ActivityThread.java)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1657)
  at android.os.Handler.dispatchMessage (Handler.java:106)
  at android.os.Looper.loop (Looper.java:164)
  at android.app.ActivityThread.main (ActivityThread.java:6499)
  at java.lang.reflect.Method.invoke (Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:442)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:807)
@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

There's a solution to have the Main dispatcher being automatically injected before Application.onCreate() is called. It is by using a ContentProvider (which are eagerly initialized) and taking advantage of the default manifest placeholder for applicationId so there's no clash between apps.
This needs to have the android artifact be published as an aar though. This could also be done with a separate artifact, provided there's a public API to make it possible.

You can see some example on something I done for this: https://github.com/LouisCAD/Splitties/blob/e77c909585f1b6d457af0fe18655e4794434ce50/initprovider/README.md

@JakeWharton

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

@JakeWharton Because it requires no additional code in consumer apps. in other words, it can fix the issue for the default process in a convenient way (having nothing to do apart from updating).

The current slow path can still be used for multiprocess use (so it doesn't break anything). Such processes are usually not the processes where the main UI is run anyway, so cold start time doesn't matter as much.

@JakeWharton

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

The current design doesn't allow static initialization because Dispatchers.Main is shared with non android kotlinx.coroutines, so that's why there was a ServiceLoader in the first place, and that's why I proposing an alternative to initialize it. I think I now answered why, do you agree?

About the ContentProvider hack, it is not just for getting access to a Context but equally to run before Application.onCreate is called, hence why it can fit this use case.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 12, 2018

Before talking about how, let's talk about if it should be done as @qwwdfsad seems to think it's not :(

When talking about how will come the do 'we do file access on main thread' (as possible solution proposed by @qwwdfsad do too), and does the solution needs to be global or optional? (Meaning should the faster path replace the Dispatchers.Main or be something else that won't interfere with other implementations and open a lot more solutions).

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 12, 2018

I see multiple solutions here and I'm especially interested in @JakeWharton opinion on that.

  1. Do nothing, wait until R8 starts desugaring service loaders into static access. Very tempting for kotlinx.coroutines, resolves the issue in the long term but does not help users on older toolchains.

  2. Use Class.forName(AndroidDispatcherFactory fqn). It is the fastest solution, but it can have a weird interaction with e.g. TestDispatcher (#749) and adds a lot of maintenance burden. We should keep in mind this behaviour when implementing new features, it is testable only when kotlinx-coroutines-android artifact is present in the classpath, it does not resolve the same issue with CoroutineExceptionHandler etc. I generally don't like this solution.

  3. Write our own ServiceLoader. According to my experiments, it speeds up load by an order of magnitude. Yes, it is still a file read from the main thread, but this effect is negligible, and it cuts down 90% of startup footprint. I don't see any problems with omitted JAR verification because we can statically reject any services that are not part of kotlinx.coroutines and/or we can make verification an opt-in mechanism for environments where JARs from untrusted sources is the real issue (and AFAIK Android is not one of these environments).
    Though I am not sure whether custom ServiceLoader will always work on every Android runtime, we can always fallback to the default ServiceLoader. This solution looks the most beneficial for me and we probably can even implement it in 1.1.0 release.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 12, 2018

Yes, it is still a file read from the main thread, but this effect is negligible

Let me disagree on that, was hard to reach top 3% of Play Store ANR (And can't go further due to Android 8+ bug) and any file access on main thread as small as it can be, can have impact on low end phones and block for many seconds before getting access due to some low level locks.
So yes 3) cuts a lot of the issue in nominal cases, but does not solve all cases, so as you said does it worth the maintenance cost if it's not 100% reliable and full solution?

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

@qwwdfsad Is there a reason you don't talk about adding a public (possibly experimental) API to inject an instance of the main dispatcher?
This would allow us to instantiate it ourselves, statically, from host application code, bypassing the usual lookup that unfortunately does I/O on main thread. I think it would be a great solution until R8 "desugars" ServiceLoaders (which could be distant in time, and ANRs…).

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 12, 2018

Yes, I've explained them in #810 (comment)

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

@qwwdfsad To me, the solution I mentioned, using the ContentProvider hack, is the only solution that can be implemented as a workaround without drawbacks or public API changes while waiting for R8 desugar support. It falls back to serviceloader on non default process, but most apps are single process as I said previously, and it's not worse than what we currently have anyway.

I can try to make a PR for this.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 13, 2018

Got confirmation that 1) won't be planned before quite some time as priority is R8 stability before those things and handling the jar change directly at OS is out of possibility minSdk 29 would be 10 years away :)

For my needs, I'll go to prod with duplication of the handler and not using the Dispatchers.Main.

In what cases / at what point is the CoroutineExceptionHandler serviceLoader started?

@sellmair

This comment has been minimized.

Copy link

commented Dec 13, 2018

@qwwdfsad

Yes, I've explained them in #810 (comment)

So the argument is that allowing to change the MainDispatcher publicly and globally is bad for any core library like coroutines because this could lead to third-party libraries conflicts, right?

Would this argument also hold, if one would just allow global configuration once and further changes are disallowed by the framework (by throwing a Runtime Exception that explains that re-assignment of the main dispatcher is not possible)

Isn't this conceptionally the same as defining the service class in META-INF/services/..., in regards of conflicts?:
Anyone could define his own service there.

The tradeoff would be compile-time safety vs performance here. I personally could live with the Runtime Exception when loading two 'UI coroutine libraries'.

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Dec 14, 2018

Isn't this conceptionally the same as defining the service class in META-INF/services/..., in regards of conflicts?

It is not. Service loader (and actual service interface) mechanism is purely internal. We do not allow third-parties to implement custom main dispatchers, we do not provide any backward compatibility guarantees and there are chances we even forbid implementations not known to our service loader.

Set-once is still a source of confusion, non-determinism and (as any other global mutable state provided by a library or a runtime) eventually will end up with reordering your project dependencies along with reflection hacks until you get the desired behaviour. Yes, it is better than "set anything anytime", but it still has the same flaws.

The tradeoff would be compile-time safety vs performance here.

There are no tradeoffs in your solution. The tradeoff between safety and performance is to start using Class.forName or rolling our own ServiceLoader, not opening dispatcher for global mutation.

I personally could live with the Runtime Exception when loading two 'UI coroutine libraries'.

Yes, everyone could. Until they have tens of dependencies that have 7 different versions of 4 conflicting frameworks* (yes, log4j, slf4j, jul and logback, I'm talking to you)

What you (and others) are proposing is a good solution for a local project or for a library which is used only within a single company, especially when you've already seen a similar pattern in RxJavaPlugins. But it is not good enough for a core library and eventually will harm the ecosystem.
Moreover, a global settable dispatcher is not really helpful: for tests, there is #749, if you really need settable dispatches -- you need dependency injection. You don't set global statics in order to mock a class dependency, do you?

* No jokes: https://github.com/eBay/Spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L162
Another good example is implicit shutdown hook dependencies.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 14, 2018

While I do understand and approve all the reasons, currently the main issue is that there's no workaround due to everything being internal / private.

Can't you just offer a way for those "small" or non multiplatform a way to access the Android Main dispatcher so that we can inject that and use that if we need to?

So not a way to inject or alter Dispatchers.Main but a way to access to X that would be the AndroidMainDispatcher that we can then use as we want? Avoid code duplication and maintenance for us too.

And can you answer this please?:
In what cases / at what point is the CoroutineExceptionHandler serviceLoader started?

I fear that's the reason of a slow down when Crashlytics initialize but can't pinpoint it :(

@elizarov

This comment has been minimized.

Copy link
Member

commented Dec 14, 2018

@Tolriq There is a public non-experimental API for you: https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-android/kotlinx.coroutines.android/android.os.-handler/as-coroutine-dispatcher.html

You can statically create a coroutine dispatcher from your handler and then inject it statically throughout your code. As long as you do not explicitly use Dispatchers.Main there is no dynamic loading (it is loaded only on the actual access to this property).

@Tolriq

This comment has been minimized.

Copy link
Author

commented Dec 14, 2018

@elizarov yes but as confirmed by @qwwdfsad it does not do exactly the same, I currently have copied the https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/HandlerDispatcher.kt made Main public and inject that and it works nicely, just sad to have to copy the file to have the var public. (Usually I would use the package trick to access it, but it seems for file level variable this does not work)

For the moment I can use that workaround, but it means I may need to update my code copy when I update coroutines as result might be unpredictable if I forget.

Do you have the answer to the question about CoroutineExceptionHandler serviceloader call? Trying to find why Crashlytics now impact my startup time even if initialized on background thread and want to know if it can be linked or not.

@elizarov

This comment has been minimized.

Copy link
Member

commented Dec 14, 2018

@Tolriq

In what cases / at what point is the CoroutineExceptionHandler serviceLoader started?

It happens the first time this handleCoroutineExceptionImpl function is invoked:

internal actual fun handleCoroutineExceptionImpl(context: CoroutineContext, exception: Throwable) {

It happens if and only if there is uncaught exception and it was not handled by your code. For Android it means that your app is about to crash.

I currently have copied the https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/HandlerDispatcher.kt

The only thing you might need to copy from there is this Looper.asHandler function:
https://github.com/Kotlin/kotlinx.coroutines/blob/master/ui/kotlinx-coroutines-android/src/HandlerDispatcher.kt#L61
However, depending on your API level it could be much simpler implemented. You don't need the rest of that code, because Handler.asCoroutineDispatcher is public, or I don't understand your use-case.

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2019

@Tolriq I just released version 3.0.0-alpha03 of Splitties, where Dispatchers.MainAndroid is now backed by an async Handler.

@Tolriq

This comment has been minimized.

Copy link
Author

commented Feb 6, 2019

Thanks but I tend to not use too much external libraries and have implemented my solution since mid December :) This is more for @Ornolfr

Let's hope an official no disk on, main thread solution is made one day.

SokolovaMaria pushed a commit that referenced this issue Mar 15, 2019
SokolovaMaria pushed a commit that referenced this issue Mar 15, 2019
SokolovaMaria pushed a commit that referenced this issue Mar 15, 2019
SokolovaMaria pushed a commit that referenced this issue Mar 18, 2019
@qwwdfsad qwwdfsad closed this in 15b6345 Apr 4, 2019
@Tolriq

This comment has been minimized.

Copy link
Author

commented Apr 4, 2019

For those who will find this, the "fix" mitigate the issue, but this is still disk IO on main thread.

On Android, this is still an issue and can still cause slowdown / ANR on busy devices. (Obviously no more than without that patch, but still an issue for those who seek 100% ANR free sessions)

@qwwdfsad

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

It is a very isolated IO for a single file that should be in a disk cache anyway.

Feel free to open a new issue if you still experience ANR after 1.2.0-alpha-2;
There are ways to speed up that loading even more, but only if there exists real evidence that the current solution is not enough

@Tolriq

This comment has been minimized.

Copy link
Author

commented Apr 4, 2019

I do Android apps since 8 years every single disk access on main thread can and will trigger an ANR or slow down at some point, flash is slow, 50$ phones have even more bad quality flash.

I no more use the official dispatcher but my own to address that, so won't be able to give "evidence" about something well known.

I had to provide Google with a patch for their Billing library because on recent updates Google Play did a simple file access on service connection, leading to tons of ANRs.

Most users do not check their ANRs stats or vitals and it's really hard to spot on tests without forcing the device to have a busy IO queue, so you probably won't have other users reporting the issue for a while, but it will still be present.

I totally understand that Android is not the only / main target and that 0,x% of slow down and 0,0X% of ANR is not a priority. But it's important that for those who matters, they know there's still a disk IO and the cause.

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

I just submitted a feature request for the Android Emulator to allow simulating these slow I/O performances while the storage is already busy and risks triggering an ANR because a file has been naively read from the main thread: https://issuetracker.google.com/issues/129900564

Hopefully, this will at some point allow us to reproduce these situations more easily and consistently.

@elizarov

This comment has been minimized.

Copy link
Member

commented Apr 7, 2019

What we really need is an R8 feature that would desugar ServiceLoader usage into static class initialization based on the META-INF/services files that are found when building project.

@JakeWharton

This comment has been minimized.

Copy link
Contributor

commented Apr 7, 2019

@elizarov

This comment has been minimized.

Copy link
Member

commented Apr 7, 2019

@JakeWharton We also need to figure out how we shall write the code so that:

if (newR8isAvailable) useRegualarServiceLoader() else useOurOwnWorkAround()
@JakeWharton

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

@nak411

This comment has been minimized.

Copy link

commented Apr 10, 2019

It is a very isolated IO for a single file that should be in a disk cache anyway.

Why was this issue closed? @qwwdfsad Are there any plans to address this so developers can develop with strict mode on?
I recently discovered this and can use some of the workarounds mentioned above to get around it but doing IO on the main thread is never appropriate for android. https://stackoverflow.com/questions/55584194/android-kotlin-coroutine-crashing-on-strict-mode

@elizarov

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

@nak411 Can you, please, create a separate issue with a reproducer code from that SO question, so that we have an appropriate place to discuss and track it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.