New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed deadlocks, completion blocks async, race condition on cancel #781

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@bpoplauschi
Member

bpoplauschi commented Jun 24, 2014

Fixed issues #625 #507 aka deadlocks (replaces #509)

  • In order to fix the deadlock, reviewed the [SDImageCache diskImageExistsWithKey:] method. Based on the Apple doc for NSFileManager, using the defaultManager without the dispatch on the ioQueue to avoid the deadlocks. This instance is thread safe. Also created an async variant of this method [SDImageCache diskImageExistsWithKey:completion:]
    For consistency, added async methods in SDWebImageManager cachedImageExistsForURL:completion: and diskImageExistsForURL:completion:
  • More fixes against deadlocks. dispatch_sync on a serial queue is an anti pattern, as it can lead to deadlocks. I replaced all the dispatch_main_sync_safe with dispatch_main_async_safe, based on the fact that the completion blocks are not returning anything, so we don't need to wait for them to finish.
  • The biggest change (architectural) is that the completion block will no longer be executed inside the operation. But that is not an issue, as NSOperation does the same (completion block gets called after operation isFinished).

Fixed race condition In SDWebImageManager (replaces #699)

  • In SDWebImageManager if one operation is cancelled, the completion block must not be called, otherwise it might race with a newer completion for the same object
@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Jun 24, 2014

@rs @donholly @matej @cherpake @salling @lavoy @akhenakh I would appreciate if you guys could review this thoroughly. I will add more tests as I have time

@bpoplauschi bpoplauschi added this to the 3.7.0 milestone Jun 25, 2014

bpoplauschi added some commits Jun 23, 2014

#625 In order to fix the deadlock, reviewed the `[SDImageCache diskIm…
…ageExistsWithKey:]` method. Based on the Apple doc for NSFileManager, using the defaultManager without the dispatch on the ioQueue to avoid the deadlocks. This instance is thread safe. Also created an async variant of this method `[SDImageCache diskImageExistsWithKey:completion:]`

For consistency, added async methods in `SDWebImageManager` `cachedImageExistsForURL:completion:` and `diskImageExistsForURL:completion:`
#625 More fixes against deadlocks. dispatch_sync on a serial queue is…
… an anti pattern, as it can lead to deadlocks. I replaced all the dispatch_main_sync_safe with dispatch_main_async_safe, based on the fact that the completion blocks are not returning anything, so we don't need to wait for them to finish.

The biggest change (architectural) is that the completion block will no longer be executed inside the operation. But that is not an issue, as NSOperation does the same (completion block gets called after operation isFinished).
Replace #699 Fixed race condition in SDWebImageManager if one operati…
…on is cancelled, the completion block must not be called, otherwise it might race with a newer completion for the same object
@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Jun 27, 2014

Guys, could you please review this? Especially @rs

@rs

This comment has been minimized.

Member

rs commented Jun 27, 2014

I initially tried to have dispatch_sync whenever possible as async delays the processing and slows down the display of images.

@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Jun 29, 2014

Ok, I will do some profiling for sync vs async.

@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Jul 10, 2014

I profiled a bit the sync vs async solutions, using an iPhone 4.

I think the async one works better (as I would have expected), as it doesn't block any queue. It doesn't make sense why in the async version, the displaying would be slowed down. We are just registering a block for execution on the main queue, the only difference is if ioQueue needs to wait for the completion executed on the main queue or it can get to a new operation. See the screenshots (I can provide the Instruments trace files as well) - async is the first one.
async
sync

The only issue that popped on the async solution were some crashes inside sd_setImageLoadOperation because operation was nil. That happened because the operations were executed fast and deallocated before getting to store them. No issue there, just need to add a check for nil.

@rs I think this is the only major showstopper for the 3.7.0 that we are being asked to release asap.

@rs

This comment has been minimized.

Member

rs commented Jul 10, 2014

Let go with async. We'll see if we get complaints :)

@matej

This comment has been minimized.

Contributor

matej commented Jul 10, 2014

I think the important thing to keep with async calls is that (memory) cache hits get returned in the same run loop iteration. This way we don't get any image flickering going on for this operation, that really should be instantaneous.

Putting it another way. If I call -[SDWebImageManager downloadImageWithURL:options:progress:completed:] from the main thread and this results in a memory cache hit, the completion block should be invoked immediately so the image can be populated before the UI is re-drawn. The main thread check in dispatch_main_async_safe, should essentially make sure this does in fact happen, right?

@rs

This comment has been minimized.

Member

rs commented Jul 10, 2014

Yes, the whole sync dance is about not delaying the cache check so UI doesn't have a chance to redraw. We may find a better way.

@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Jul 14, 2014

Guys, I've given this more thought and I think the safest way to do this is:

  • release a 3.7.0 without the change from sync to async. I will only cherry pick the fixes for the deadlocks (46d23ad) and for race condition for canceled ops (a8f7a94)
  • for the next major version 4.0 we will incorporate the async calls and ask people to test using master.

This was due to the impossibility to predict the impact of changing those sync calls to async and the need to release the 3.7.0 version that a lot of developers are asking for.

@bpoplauschi bpoplauschi modified the milestones: 4.0.0, 3.7.0 Jul 14, 2014

Fixed crashes due to operation being nilled before being able to set …
…it, so we added a check for operation and key
@rsanchezsaez

This comment has been minimized.

rsanchezsaez commented Aug 8, 2014

The fix for #699 is causing an issue in our code where this method:

     - (id <SDWebImageOperation>)downloadImageWithURL:(NSURL *)url
                                     options:(SDWebImageOptions)options
                                    progress:(SDWebImageDownloaderProgressBlock)progressBlock
                                   completed:(SDWebImageCompletionWithFinishedBlock)completedBlock;

doesn't execute the SDWebImageCompletionWithFinishedBlock completion in some circumstances.

This should never ever happen. One should expect that the most external completion block gets executed, whether the action succeeds, cancel or timeouts.

Wouldn't the proper fix for this be (in addition of what was already committed) replacing this code

        operation.cancelBlock = ^{
            [subOperation cancel];

            @synchronized (self.runningOperations) {
                [self.runningOperations removeObject:weakOperation];
            }
        };

by this

        operation.cancelBlock = ^{
            [subOperation cancel];

            @synchronized (self.runningOperations) {
                [self.runningOperations removeObject:weakOperation];
            }

            dispatch_main_async_safe(^{
                completedBlock(nil, nil, SDImageCacheTypeNone, NO, url);
            });
        };

?

@rsanchezsaez

This comment has been minimized.

rsanchezsaez commented Aug 8, 2014

UPDATE: Hmm, I'm debugging further and I think it's a different issue. Let me investigate and get back to you.

Not sure what's causing it, but definitely

- (id <SDWebImageOperation>)downloadImageWithURL:(NSURL *)url
                                 options:(SDWebImageOptions)options
                                progress:(SDWebImageDownloaderProgressBlock)progressBlock
                               completed:(SDWebImageCompletionWithFinishedBlock)completedBlock;

is not calling the completion handler on some instances. This problem surfaced when updating from SDWebImage 3.6 to 3.7.1, but not sure which change exactly started to make it happen.

We are heavily using the SDWebImagePrefetcher in parallel to the aforementioned method.

@croath

This comment has been minimized.

croath commented Jul 12, 2016

Hey guys I don't know if you guys are still focusing on the issue. Actually I'm not involving with any issues here, just feeling curious about the sync things around the callback block, well, while reading the source code of the project.

After reading the explanations from @matej and @rs , and the great example about -[SDWebImageManager downloadImageWithURL:options:progress:completed:] hitting the memory cache, I kind of know the reason of using sync.

But talking about the UI redrawing, is the main thread just here waiting for memory cache returns back and will draw the image once the image comes back from memory, immediately? There are still lot of time and chances for UI to redraw between querying cache and sending back the completionBlock.

And what if the -[SDWebImageManager downloadImageWithURL:options:progress:completed:] call couldn't hit any cache and then just fetches it from internet? After the image being downloaded successfully, should we just use a async instead of sync to call the completionBlock?

Please let me know if I missed something or there was any misunderstanding.

Thanks!

bpoplauschi added a commit that referenced this pull request Sep 30, 2016

Replacing #781 - based on http://blog.benjamin-encz.de/post/main-queu…
…e-vs-main-thread/, background queue can execute code on the main thread, so we need to check for the main queue to assure safety

bpoplauschi added a commit that referenced this pull request Sep 30, 2016

Replacing #781 - replaced all the remaining `dispatch_main_sync_safe`…
… with `dispatch_main_async_safe`, so we no longer `dispatch_sync` on the main queue that can create issues.

bpoplauschi added a commit that referenced this pull request Sep 30, 2016

Replacing #781 - made sure all the completionBlocks called from `SDWe…
…bImageDownloaderOperation` are called on the main queue. Created 2 methods to simplify the code for calling the completions

bpoplauschi added a commit that referenced this pull request Sep 30, 2016

Replacing #781 - made sure completionBlock called from `SDWebImageMan…
…ager` is called on the main queue. Created 2 methods to simplify the code for calling the completions
@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Sep 30, 2016

Redone this via commits 062e50a f7e8246 c77adf4 265ace4 0c47bc3 on the 4.x branch. Same stuff: replaced all sync calls to the completion with async, cleaned up the code a bit.

Another important change is based on http://blog.benjamin-encz.de/post/main-queue-vs-main-thread. This article states that the main thread may execute code from other queues than the main queue, so we actually want to check for main queue to assure our UIKit operations.
We used: strcmp(dispatch_queue_get_label(DISPATCH_CURRENT_QUEUE_LABEL), dispatch_queue_get_label(dispatch_get_main_queue())

@bpoplauschi

This comment has been minimized.

Member

bpoplauschi commented Sep 30, 2016

@croath I see your point, but I feel safer with the async code, since we have a history of locking here and there. Take a look and let me know what you think.

@bpoplauschi bpoplauschi referenced this pull request Sep 30, 2016

Closed

Dead lock #507

@croath

This comment has been minimized.

croath commented Sep 30, 2016

@bpoplauschi Awesome! Yep the safety, just what I was talking about. So that I asked about why sync but not async. Maybe we can see the result differs with profiling work in some specific situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment