Dropped frames while scrolling in list with multiple spans #409

Closed
franciscofranco opened this issue May 19, 2020 · 42 comments
Labels
bug Something isn't working

Comments

@franciscofranco

I'm observing some pretty hefty main thread blockage using the latest Coil version (0.11.0), and older ones too. I just migrated this app from Glide (I've used Coil before in other projects, but never in a RecyclerView adapter), and everything was smooth until I tried scrolling the RecyclerView. I tested with both a Pixel 3 and a OnePlus 7T Pro, so it's clearly not a hardware bottleneck.

My setup is a vertically scrolling GridLayoutManager with 5 spans. One device has 300+ pictures and the other has ~50, so it doesn't seem related to the number of pictures. What I observe is that when I start scrolling while Coil is decoding the URIs and starting to display the thumbnails, something blocks the main thread and the whole operation seems to stall. Sometimes I have to wait 3-4 seconds for it to finish whatever it's doing before it resumes.

In the screen recording I'm attaching you can see a bit of the jank right after I open the app fresh. I made each thumbnail 8px/8px because it's my personal picture library, but you get the picture. The jank is obviously worse at the actual decoded size (which should be device width / span count, with a span count of 5 here). The jank is halved if I decrease the span count to 3 or 4. Here's the very simple code I'm using:

LoadRequest request = LoadRequest.builder(((ViewHolder) holder).b.picture.getContext())
        .key(item.getUri().toString())
        .data(item.getUri())
        .crossfade(false)
        .target(((ViewHolder) holder).b.picture)
        .build();
Coil.execute(request);

Logcat is filled with these GC related operations when doing the scrolling:

2020-05-19 20:36:44.160 12402-12415/com.franco.graphice I/franco.graphic: Background young concurrent copying GC freed 11330(820KB) AllocSpace objects, 48(1172KB) LOS objects, 0% free, 8438KB/8438KB, paused 124us total 157.193ms
2020-05-19 20:36:44.455 12402-12415/com.franco.graphice I/franco.graphic: Background concurrent copying GC freed 22434(2113KB) AllocSpace objects, 127(3216KB) LOS objects, 49% free, 7161KB/13MB, paused 438us total 288.432ms
2020-05-19 20:36:45.609 12402-12415/com.franco.graphice I/franco.graphic: Background young concurrent copying GC freed 28249(2948KB) AllocSpace objects, 190(4884KB) LOS objects, 0% free, 13MB/13MB, paused 590us total 693.969ms
2020-05-19 20:36:45.772 12402-12437/com.franco.graphice I/franco.graphic: Waiting for a blocking GC ProfileSaver
2020-05-19 20:36:46.126 12402-12415/com.franco.graphice I/franco.graphic: Background concurrent copying GC freed 46990(4091KB) AllocSpace objects, 222(5452KB) LOS objects, 49% free, 10MB/20MB, paused 1.728ms total 515.818ms
2020-05-19 20:36:46.126 12402-12437/com.franco.graphice I/franco.graphic: WaitForGcToComplete blocked ProfileSaver on ClassLinker for 354.762ms
2020-05-19 20:36:48.146 12402-12415/com.franco.graphice I/franco.graphic: Background young concurrent copying GC freed 111714(5587KB) AllocSpace objects, 253(6552KB) LOS objects, 0% free, 22MB/22MB, paused 300us total 851.007ms
2020-05-19 20:36:48.421 12402-12415/com.franco.graphice I/franco.graphic: Background concurrent copying GC freed 125680(7618KB) AllocSpace objects, 372(9332KB) LOS objects, 49% free, 8377KB/16MB, paused 654us total 258.322ms
2020-05-19 20:36:50.169 12402-12415/com.franco.graphice I/franco.graphic: Background concurrent copying GC freed 122438(7099KB) AllocSpace objects, 230(5824KB) LOS objects, 49% free, 5686KB/11MB, paused 48us total 438.112ms

It's also slightly better when using bitmap config RGB_565, but only a tiny bit. With the default bitmap config I see tons of these logcat messages:

2020-05-19 20:38:15.978 12923-12923/com.franco.graphice I/Choreographer: Skipped 40 frames!  The application may be doing too much work on its main thread.
2020-05-19 20:38:16.211 12923-12965/com.franco.graphice I/OpenGLRenderer: Davey! duration=914ms; Flags=0, IntendedVsync=5662959106048, Vsync=5663625772688, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=5663641456128, AnimationStart=5663641473576, PerformTraversalsStart=5663642302847, DrawStart=5663719940563, SyncQueued=5663865801775, SyncStart=5663866294692, IssueDrawCommandsStart=5663867129952, SwapBuffers=5663869075161, FrameCompleted=5663874209172, DequeueBufferDuration=187000, QueueBufferDuration=1022000, 
2020-05-19 20:38:16.321 12923-12936/com.franco.graphice I/franco.graphic: Background concurrent copying GC freed 108152(11MB) AllocSpace objects, 478(11MB) LOS objects, 50% free, 17MB/34MB, paused 201us total 1.090s

Any clues, or questions I can answer, to help you narrow it down?

Thanks, I hope this was clear enough.
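For a sense of scale, here is a rough sketch of decoded bitmap memory (width x height x bytes per pixel). It assumes a 1080 px wide screen split into 5 columns and a 4032x3024 camera photo; neither figure comes from the report. ARGB_8888 uses 4 bytes per pixel and RGB_565 uses 2, so switching to RGB_565 halves each allocation but leaves the number of decodes unchanged, which is consistent with it helping only a little. The `Config` enum is a hypothetical stand-in for `android.graphics.Bitmap.Config`.

```java
// Hypothetical helper: approximate size of a decoded bitmap in bytes.
final class BitmapMath {
    // Stand-in for android.graphics.Bitmap.Config, reduced to the two formats discussed.
    enum Config {
        ARGB_8888(4), // 8 bits each for alpha, red, green, blue
        RGB_565(2);   // 5 bits red, 6 green, 5 blue, no alpha

        final int bytesPerPixel;

        Config(int bytesPerPixel) {
            this.bytesPerPixel = bytesPerPixel;
        }
    }

    static long byteCount(int width, int height, Config config) {
        return (long) width * height * config.bytesPerPixel;
    }

    public static void main(String[] args) {
        int cell = 1080 / 5; // assumed screen width / span count = 216 px
        System.out.println(byteCount(cell, cell, Config.ARGB_8888)); // 186624
        System.out.println(byteCount(cell, cell, Config.RGB_565));   // 93312
        System.out.println(byteCount(4032, 3024, Config.ARGB_8888)); // 48771072
    }
}
```

Under these assumptions a full-size ARGB_8888 decode is roughly 260 times larger than a 5-column thumbnail, so decoding at the target size matters far more than the pixel format.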

@franciscofranco franciscofranco added the bug Something isn't working label May 19, 2020
@franciscofranco
Author

Here's the screenrecord https://imgur.com/a/oX3jmdn

@colinrtwhite
Member

Thanks for the report! This seems like it might be related to #378. Coil currently doesn't interrupt the thread during decode - it only supports coroutine cancellation. Just to be sure, can you enable logs and check that requests are being started/cancelled when you expect them to be? If that's the case, I'll prioritize #378.
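The difference between cooperative cancellation and thread interruption can be sketched in plain Java. This is an analogy only, not Coil's code: a blocking "decode" keeps running after `Future.cancel(false)`, much like a coroutine stuck inside a blocking call that never checks for cancellation, while `Future.cancel(true)` interrupts the worker thread so that blocking calls which honor interruption (here `Thread.sleep`) stop early. The 20 chunks of 10 ms each are an arbitrary stand-in for decode work, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class InterruptDemo {
    // Simulated blocking decode: 'chunks' steps of 'chunkMillis' each.
    // Thread.sleep throws InterruptedException when the thread is interrupted,
    // so this loop only stops early if something interrupts the worker.
    static int decodeChunks(int chunks, long chunkMillis, AtomicInteger progress)
            throws InterruptedException {
        for (int i = 0; i < chunks; i++) {
            Thread.sleep(chunkMillis);
            progress.incrementAndGet();
        }
        return chunks;
    }

    // Starts a 20-chunk decode, cancels it after ~30 ms, waits for the worker
    // thread to actually finish, and returns how many chunks were completed.
    static int runAndCancel(boolean mayInterruptIfRunning) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger progress = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        try {
            Future<Integer> task = pool.submit(() -> {
                started.countDown();
                return decodeChunks(20, 10, progress);
            });
            started.await();
            Thread.sleep(30);
            // false: the Future is marked cancelled but the thread keeps decoding.
            // true: the worker thread is interrupted as well.
            task.cancel(mayInterruptIfRunning);
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return progress.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancel(false): " + runAndCancel(false) + " chunks");
        System.out.println("cancel(true):  " + runAndCancel(true) + " chunks");
    }
}
```

Whether a real decode stops on interruption depends on whether the underlying I/O actually honors it; the sketch only shows why cancellation alone does not free the thread.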

@franciscofranco
Author

Yup, they seem to start and cancel when they should, so it certainly looks like the case you're describing. I don't use Kotlin, so I don't know how coroutines work and can't add much feedback there. The threads should be interrupted when the view is detached/recycled; otherwise this will happen. I tried with a big library (10k+ pictures), and if I start scrolling really fast everything is blocked indefinitely until everything is decoded, even if the views have been recycled.

Let me know if there's anything else I can help you with.

@colinrtwhite
Member

@franciscofranco I've merged interruption support into master. When you get the chance could you try using the latest snapshot and let me know if it fixes your issue?

@franciscofranco
Author

Awesome. I'll try in the next couple hours and get back to you ASAP.

@franciscofranco
Author

franciscofranco commented May 27, 2020

I'm afraid I don't see much, if any, improvement using the latest 0.12.0-SNAPSHOT :(.
I just waited 5-7 seconds for thumbnails to even show up after scrolling a bit on a fresh app launch.

Maybe I'm missing something and need to enable this new feature? I didn't find any API to do so, if that's the case.

How can I help?

@franciscofranco
Author

franciscofranco commented May 27, 2020

I also enabled logging to see what's going on, and just caught it taking more than 15 seconds between two cancellations:

2020-05-27 19:45:17.665 27341-27341/com.franco.graphice I/RealImageLoader: 🏗  Cancelled - content://media/external/images/media/838
2020-05-27 19:45:34.215 27341-27341/com.franco.graphice I/RealImageLoader: 🏗  Cancelled - content://media/external/images/media/842

And again, now with just 3 columns (so it's showing far fewer thumbnails):

2020-05-27 19:47:54.878 28117-28117/com.franco.graphice I/RealImageLoader: 🏗  Cancelled - content://media/external/images/media/852
2020-05-27 19:48:01.722 28117-28117/com.franco.graphice I/RealImageLoader: 💾 Successful (DISK) - content://media/external/images/media/832

@colinrtwhite
Member

Hmm, are these local file URIs, content URIs, or HTTP URIs? Can you try increasing maxRequests and maxRequestsPerHost for your Dispatcher (to, say, 50) when creating your OkHttpClient for Coil?

@franciscofranco
Author

These are local; there are no HTTP URIs in my app. Yes, let me try that.

@franciscofranco
Author

franciscofranco commented May 28, 2020

No change, same behaviour. No matter what kind of changes I make to the dispatcher or the ExecutorService it always seems to block every now and then.

Basically the only thing I can do to mitigate this is lower the span count of my grid from 5 to 3 items.

What else can I try? Can it be a bug on the coroutines themselves?

Sorry, I closed the issue by mistake.

@colinrtwhite
Member

@franciscofranco Ah darn, thanks for checking. If you have a sample project I could run locally that'd be best. Otherwise I'll create a list of a bunch of local images and debug likely this weekend. Do you know the average size of the images? Also are they mostly jpgs or another file type?

@franciscofranco
Author

franciscofranco commented May 30, 2020

There's nothing really fancy about the project... it's just a 5-span grid where each thumbnail is (displayWidth / spanCount), and each picture is a normal 12 MP shot from my camera (Pixel 3), so they're JPGs, all local URIs. The Coil code I'm using is the same I posted in this issue. I don't do any transformations or animations. You can check the app: https://play.google.com/store/apps/details?id=com.franco.graphice
The published version uses 3 columns and is relatively fast, but I had 5 before and it was just unbearable.
Tested with several different devices, all high end though.

Let me know if you can reproduce it; otherwise I'll give you access to my private repo with all the source code. Not ideal, but I'd rather help you figure this one out, since I like Coil more than Glide at this point.
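To put numbers on the (displayWidth / spanCount) sizing, here is a hedged sketch. It assumes a 1080 px wide screen and a 4032x3024 source image (typical of a 12 MP camera, not measured from this app) and uses the power-of-two rule from the `calculateInSampleSize` helper in Android's "Loading large bitmaps efficiently" guide. Coil's own sampling logic may differ; the class and method names are hypothetical.

```java
final class ThumbnailSizing {
    // Side of a square grid cell: screen width split evenly across the columns.
    static int cellSize(int displayWidthPx, int spanCount) {
        return displayWidthPx / spanCount;
    }

    // Largest power of two that keeps both decoded dimensions >= the requested size,
    // following the calculateInSampleSize helper from Android's bitmap-loading guide.
    static int inSampleSize(int srcWidth, int srcHeight, int reqWidth, int reqHeight) {
        int sample = 1;
        if (srcHeight > reqHeight || srcWidth > reqWidth) {
            int halfHeight = srcHeight / 2;
            int halfWidth = srcWidth / 2;
            while (halfHeight / sample >= reqHeight && halfWidth / sample >= reqWidth) {
                sample *= 2;
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        int cell = cellSize(1080, 5);                      // 216
        int sample = inSampleSize(4032, 3024, cell, cell); // 8
        // prints: 216 px cell, inSampleSize 8, decoded 504x378
        System.out.println(cell + " px cell, inSampleSize " + sample
                + ", decoded " + (4032 / sample) + "x" + (3024 / sample));
    }
}
```

Under these assumptions each 5-column cell is 216 px, and an `inSampleSize` of 8 decodes the photo at 504x378, about 0.76 MB in ARGB_8888 instead of roughly 48.8 MB at full size.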

@colinrtwhite
Member

I think I'm able to reproduce this (or at least another performance issue) by setting ImageListAdapter.numColumns to 5 in the sample app. I'm going to investigate locally - will keep you updated. Thanks again for the info.

@colinrtwhite
Member

colinrtwhite commented Jun 1, 2020

Okay so I think the dropped frames are a result of a combination of factors:

  • Slight overhead of launching a coroutine from the main thread.
  • The extra work (compared to Glide) Coil does on the main thread to support returning drawables from the memory cache synchronously.
  • Some other factors deep within Android's graphics pipeline.

With enough active image requests, it seems like the main thread starts thrashing. I've started some work here to optimize for high-concurrency scenarios like this, but it's going to take a while.

In the meantime, I've found that using Dispatchers.Default as the default dispatcher is an OK workaround. It limits the number of in-flight requests to the number of cores on the device, though this can delay other in-flight requests. If you're using the default coil artifact, you can set it when you create the singleton ImageLoader. 3 columns should also be faster once I merge this PR.
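A rough plain-Java analogue of this workaround (class and method names are hypothetical, and this is not Coil's code): a fixed pool with one thread per core never runs more decodes at once than it has threads, so excess requests queue instead of competing for CPU and memory. The `Math.max(2, cores)` sizing mirrors how the kotlinx.coroutines documentation describes `Dispatchers.Default` (number of CPU cores, but at least two); the 5 ms sleep stands in for decode work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedDecodePool {
    // One thread per core, but at least two.
    static int defaultParallelism() {
        return Math.max(2, Runtime.getRuntime().availableProcessors());
    }

    // Submits 'tasks' simulated decodes to a fixed pool and returns the highest
    // number of decodes that were ever running at the same moment.
    static int peakConcurrency(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5); // stand-in for decode work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every decode to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int threads = defaultParallelism();
        System.out.println("pool of " + threads + ", peak " + peakConcurrency(threads, 100));
    }
}
```

The same queueing is what can delay other in-flight requests: any task waiting for a free thread waits behind the image decodes.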

@colinrtwhite colinrtwhite changed the title Jank during RecyclerView scrolling Dropped frames while scrolling in list with 5 spans Jun 1, 2020
@colinrtwhite colinrtwhite changed the title Dropped frames while scrolling in list with 5 spans Dropped frames while scrolling in list with multiple spans Jun 1, 2020
@franciscofranco
Author

I'm glad you managed to reproduce this locally! Do you really need to pull the drawables from the cache synchronously? Couldn't you do it on a background thread and then post the result to the ImageView that was passed to the builder? I think it's relatively trivial to check whether the request's ImageView has been recycled or moved to a different position, so you can silently fail and move on to the new request.

I'll try your suggestion.
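The background-load-then-post idea relies on detecting that a recycled view now belongs to a different item. A minimal sketch of that stale check in plain Java, using a hypothetical `FakeImageView` in place of a real `ImageView` (on Android the result would also have to be posted back to the main thread, for example with `View.post`):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an ImageView: just a tag and whatever is displayed.
final class FakeImageView {
    Object tag;
    String shown;
}

final class StaleCheck {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    // On bind: give the view a fresh request id, replacing any older one.
    static long bind(FakeImageView view) {
        long id = NEXT_ID.incrementAndGet();
        view.tag = id; // autoboxed to Long
        return id;
    }

    // When a background load finishes: apply the result only if the view
    // still expects this request.
    static boolean deliver(FakeImageView view, long requestId, String result) {
        if (!Long.valueOf(requestId).equals(view.tag)) {
            return false; // view was re-bound to another item; drop the stale result
        }
        view.shown = result;
        return true;
    }
}
```

The trade-off, as the maintainer's reply points out, is that a memory-cache hit delivered this way can show the placeholder for one frame first.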

@franciscofranco
Author

franciscofranco commented Jun 1, 2020

It's SIGNIFICANTLY better using Dispatchers.Default. It's like night and day. I can set it to 6 columns, and while I can see some jank, it no longer blocks the thread to the point of waiting 20s for it to "unblock" itself and keep decoding. Good call, and I can't wait to see the improvements from your PR.

@colinrtwhite
Member

@franciscofranco Coil checks the memory cache synchronously to avoid flashes where it shows the placeholder for one frame then shows the memory cache item. It also allows supporting cool things like image sampling automatically. The check is overall very quick and I'm not convinced it's 100% causing the lag we're running into. There's also some work like interacting with AndroidX Lifecycles and any View methods that need to be called from the main thread.

I'm going to keep investigating the root cause of the lag and test if moving the memory cache check off the main thread solves the issue. If it does I think it makes sense to add a mode to move that work to the background dispatcher (at the cost of losing the benefits mentioned above).
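The synchronous check described here can be sketched as an LRU lookup at bind time. These are hypothetical classes, not Coil's memory cache, which is sized in bytes and sits behind its own API, whereas this one simply counts entries. A hit is displayed in the same frame, and only a miss falls back to the placeholder (and, in a real loader, a background load). `LinkedHashMap` with `accessOrder = true` plus `removeEldestEntry` is the standard JDK way to get LRU eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry-counted LRU cache; a real bitmap cache would be sized in bytes.
final class LruMemoryCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruMemoryCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() marks an entry as recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry when over capacity
    }
}

final class BindDecision {
    // Runs synchronously at bind time: a hit is shown in the very first frame,
    // a miss shows the placeholder (a real loader would also start a background load).
    static String firstFrame(LruMemoryCache<String, String> cache, String key, String placeholder) {
        String cached = cache.get(key);
        return cached != null ? cached : placeholder;
    }
}
```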

@Wrakor

Wrakor commented Jun 2, 2020

Just to add: I'm having the same problem when using ViewPager2, with heavy main thread usage. Should I just downgrade Coil for the time being? 0.9.5 was fine.

@franciscofranco
Author

@colinrtwhite can you try to see if setting trackWeakReferences(false) reduces your jank?

@colinrtwhite
Member

colinrtwhite commented Jun 3, 2020

@Wrakor Hmm I wouldn't expect 0.9.5 to perform better in this case, but I'll try 0.9.5 with my sample. If it fixes it it'll definitely help with debugging. EDIT: I see the same issue with 0.9.5.

@franciscofranco I tried trackWeakReferences(false) and still experienced dropped frames.

@Wrakor

Wrakor commented Jun 3, 2020

I reverted to v0.9.5 and the blockage is gone, as well as the "Davey!" logs, all back to being smooth.
It's strange it didn't help you @colinrtwhite, was expecting the cause to be similar, as ViewPager2 uses RV adapter and layout manager.

@colinrtwhite
Member

@Wrakor Hmm maybe it's a different issue you're running into.

I did more profiling yesterday and even if I move the memory cache check to a background thread I still see the dropped frames. It seems like onMeasure and onLayout are taking up most of the time despite the view hierarchy being very simple.

@franciscofranco
Author

I tried 0.9.5 and the blockage is indeed gone. Very weird.

@colinrtwhite
Member

@franciscofranco There were a number of performance enhancements that went into 0.13.0 and 1.0.0-rc2. In my tests I'm able to scroll through a list with 4-5 columns without dropping frames (especially after optimizing with R8). If you update, make sure to set launchInterceptorChainOnMainThread(false) to run the memory cache check on a background thread.

@franciscofranco
Author

Fantastic. I'll definitely test in the next few days

@Wrakor

Wrakor commented Sep 21, 2020

Just adding my two cents, I've tried the 0.13.0 update (both with and without launchInterceptorChainOnMainThread(false)) and my screen with a ViewPager becomes so laggy that it is unusable. Same with 1.0.0-rc2. Back to 0.9.5 unfortunately.

@colinrtwhite
Member

@Wrakor If you're able to create a sample project that reproduces the issue (or modify the coil-sample project in this repo), that would help out a lot with debugging.

@Wrakor

Wrakor commented Sep 22, 2020

I'm trying to set up a sample project, but I'm having trouble loading images with Coil (it shows a blank image view). Could you see if there's anything missing in my code?
https://github.com/Wrakor/coil-sample-app

@colinrtwhite
Member

@Wrakor The application loads the images correctly for me with no performance issues - it's likely a local setup issue.

@tdounnyy

I'm on Coil 1.0.0-rc2, facing exactly the same issue. withInterruptibleSource doesn't seem to solve this problem.

After some digging, I noticed that Dispatchers.Default, which Coil relies on, starts too many decode() threads at the same time, using too much memory. Glide v4.11.0, by comparison, runs only 4 or 5.

So, I feed the ImageRequest with a newFixedThreadPoolContext(nThreads = 4,...) as a workaround.

@Wrakor

Wrakor commented Sep 28, 2020

@colinrtwhite I've managed to narrow down my problem to SVG loading. Some SVGs, when loaded with a custom Target, are causing heavy main thread usage. I've updated my example project. Do you want me to open a new issue?

@colinrtwhite
Member

@Wrakor Thanks - I'll take a look. I'd keep discussion in this issue.

@Tolriq
Contributor

Tolriq commented Nov 3, 2020

So I wanted to give Coil a new try, now with interceptors and all, but the lack of parallelism control triggers this issue for me too.
On a screen with dozens of images, Coil tries to load all of them at the same time, generating huge CPU, disk, and memory usage with nothing to limit it.

While we can pass a fixed thread pool dispatcher, this causes more context switches than needed and prevents thread sharing with the rest of the app, largely negating the gains of using coroutines.

Are there any plans to add concurrency limits?

@colinrtwhite
Member

@Tolriq You can use Dispatchers.Default to restrict concurrency while sharing threads.

@Tolriq
Contributor

Tolriq commented Nov 3, 2020

I still want Dispatchers.Default to stay available for everything else and not be blocked doing IO work, which can take a long time when downloading large images. Using it would be even worse, as all its threads would be blocked doing image IO and the rest of the coroutines would be waiting to run.

@leondeklerk

Is there a solution for this yet? With 5 spans and launchInterceptorChainOnMainThread(false), my RecyclerView still experiences some jank, especially when scrolling fast through the grid.

@colinrtwhite
Member

@leondeklerk Unfortunately, there isn't a great solution for this at the moment aside from setting ImageLoader.Builder.dispatcher(Dispatchers.Default). That said, this bug should be completely fixed in 2.0. I'm able to run 5 columns in the sample app no problem. The alpha will be out soon.

@colinrtwhite
Member

colinrtwhite commented Oct 11, 2021

I've just published 2.0.0-alpha01 which should completely fix any performance issues related to scrolling. I've set the sample app to run with 5 columns and it doesn't drop any frames.

Among other performance improvements, 2.x throttles the output of BitmapFactoryDecoder. The number of parallel BitmapFactory decodes defaults to 4, but can be adjusted using ImageLoader.Builder.bitmapFactoryMaxParallelism. Increasing the number allows more parallel decodes, but may (slightly) reduce scrolling FPS. I felt 4 was a good default, but depending on how much work your app does on the main thread, your app may be able to handle more.
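Limiting parallel decodes without a dedicated thread pool, which is what Tolriq asked for earlier in the thread, can be sketched with a counting semaphore in front of the decode step. This is plain Java under stated assumptions (a hypothetical `ThrottledDecoder`, with a sleep standing in for `BitmapFactory` work) and does not claim to reproduce `BitmapFactoryDecoder`'s internals. Note that `java.util.concurrent.Semaphore.acquire` blocks the calling thread; a coroutine-based semaphore would suspend instead and leave the thread free for other work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledDecoder {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledDecoder(int maxParallelism) {
        this.permits = new Semaphore(maxParallelism);
    }

    // At most maxParallelism callers can be inside the decode section at once;
    // the rest block in acquire() until a permit is released.
    String decode(String key) throws InterruptedException {
        permits.acquire();
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            Thread.sleep(5); // stand-in for BitmapFactory work
            return "decoded:" + key;
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    int peakParallelism() {
        return peak.get();
    }

    // Runs 'jobs' decodes on a larger shared pool and reports the peak overlap.
    static int demo(int poolThreads, int maxParallelism, int jobs) {
        ThrottledDecoder decoder = new ThrottledDecoder(maxParallelism);
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                String key = "img" + i;
                Callable<String> job = () -> decoder.decode(key);
                results.add(pool.submit(job));
            }
            for (Future<String> f : results) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return decoder.peakParallelism();
    }
}
```

With, for example, 16 pool threads, a limit of 4, and 40 jobs, no more than 4 decodes overlap, mirroring the default of 4 described above while the remaining pool threads stay available.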

@marcouberti

@colinrtwhite this solved the issue in my app. Now my image gallery (LazyColumn with 3 image columns) is very smooth. Thank you!

@WANZIzZ

WANZIzZ commented Jan 4, 2022

@colinrtwhite I'm using PictureSelector with imageEngine() set to Coil. When I select pictures in the album and there are a lot of them (more than 500) and they are very large (more than 1M), scrolling up and down is very janky. With imageEngine() set to Glide, this doesn't happen. The Coil version I use is 2.0.0-alpha06. You can try it with CoilPictureSelector.

@WANZIzZ

WANZIzZ commented Jan 4, 2022

Is there a load limit? For example, when I load 1000 pictures, I swipe quickly and can only slide to the 500th position. After the 500 pictures are partially loaded, I can continue to slide down.

@colinrtwhite
Member

Going to close this out as the original issue is fixed in 2.0.0-rc01. If there are other performance issues please open a new ticket with a way to reproduce.

8 participants