EXC_BAD_ACCESS in AWSS3 #2314

Closed
diesal11 opened this issue Feb 25, 2020 · 12 comments
Labels
bug Something isn't working s3 Issues related to S3

Comments

@diesal11
Contributor

diesal11 commented Feb 25, 2020

Describe the bug
We have seen a number of crashes in our error reporting that seem to come from the AWS SDK. As the attached stack trace shows, a call to fwrite within -[AWSS3TransferUtility createTemporaryFileForPart:partNumber:dataLength:error:] is causing an EXC_BAD_ACCESS/SIGBUS crash.

Not sure what to do about this one, but I thought it was worth reporting. I don't believe we have set anything up incorrectly here, but let me know if you require more context.

To Reproduce
Unable to reproduce locally

Which AWS service(s) are affected?
AWSS3 SDK

Environment(please complete the following information):

  • SDK Version: 2.12.6
  • Dependency Manager: Cocoapods
  • Swift Version : 5.0

Device Information (please complete the following information):

  • Device: Issues reported from an iPhone 6 and an iPhone 11
  • iOS Version: iOS 13.3.1
  • Specific to simulators: No

Additional context
Full stacktrace:

OS Version: iOS 13.3.1 (17D50)
Report Version: 104

Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: BUS_NOOP at 0x0000000000000068
Crashed Thread: 2

Application Specific Information:
Exception 1, Code 104, Subcode 8 >
Attempted to dereference garbage pointer 0x68.

Thread 2 Crashed:
0   libsystem_c.dylib               0x186e2f15c         flockfile
1   libsystem_c.dylib               0x186e31dbc         fwrite
2   AWSS3                           0x2b5b2a304         -[AWSS3TransferUtility createTemporaryFileForPart:partNumber:dataLength:error:] (AWSS3TransferUtility.m:1450)
3   AWSS3                           0x2b5b2a74c         -[AWSS3TransferUtility createUploadSubTask:subTask:startTransfer:internalDictionaryToAddSubTaskTo:] (AWSS3TransferUtility.m:1496)
4   AWSS3                           0x2b5b2b65c         -[AWSS3TransferUtility retryUploadSubTask:subTask:startTransfer:] (AWSS3TransferUtility.m:1606)
5   AWSS3                           0x2b5b2504c         -[AWSS3TransferUtility handleUnlinkedTransfers:tempTransferDictionary:] (AWSS3TransferUtility.m:755)
6   AWSS3                           0x2b5b247a0         __93-[AWSS3TransferUtility linkTransfersToNSURLSession:tempTransferDictionary:completionHandler:]_block_invoke (AWSS3TransferUtility.m:690)
7   CFNetwork                       0x31ef56900         _CFNetworkHTTPConnectionCacheSetLimit
8   CFNetwork                       0x31ee63bfc         cfnTranslateCFError
9   Foundation                      0x31493739c         __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__
10  Foundation                      0x3148410c4         -[NSBlockOperation main]
11  Foundation                      0x314939624         __NSOPERATION_IS_INVOKING_MAIN__
12  Foundation                      0x314840d5c         -[NSOperation start]
13  Foundation                      0x31493a01c         __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__
14  Foundation                      0x314939ae8         __NSOQSchedule_f
15  libdispatch.dylib               0x186e817d8         _dispatch_block_async_invoke2
16  libdispatch.dylib               0x186ecf180         _dispatch_client_callout
17  libdispatch.dylib               0x186e77a38         _dispatch_continuation_pop$VARIANT$mp
18  libdispatch.dylib               0x186e7718c         _dispatch_async_redirect_invoke
19  libdispatch.dylib               0x186e83fa0         _dispatch_root_queue_drain
20  libdispatch.dylib               0x186e8476c         _dispatch_worker_thread2
21  libsystem_pthread.dylib         0x186f1eb44         _pthread_wqthread

Thread 0
0   libsystem_kernel.dylib          0x330a8f634         mach_msg_trap
1   libsystem_kernel.dylib          0x330a8ea9c         mach_msg
2   CoreFoundation                  0x3301f3284         __CFRunLoopServiceMachPort
3   CoreFoundation                  0x3301ee3a4         __CFRunLoopRun
4   CoreFoundation                  0x3301edad8         CFRunLoopRunSpecific
5   GraphicsServices                0x3112be324         GSEventRunModal
6   UIKitCore                       0x334d32638         UIApplicationMain
7   attentionTRACE                  0x20071448c         main (VideoRecorder.swift:16)
8   libdyld.dylib                   0x32f29635c         start

Thread 1
0   libsystem_kernel.dylib          0x330a8f670         semaphore_wait_trap
1   libdispatch.dylib               0x186e7588c         _dispatch_sema4_wait$VARIANT$mp
2   libdispatch.dylib               0x186e75ed0         _dispatch_semaphore_wait_slow
3   libswiftDispatch.dylib          0x34e4bc354         OS_dispatch_semaphore.wait
4   Mixpanel                        0x1010a45a4         Decide.checkDecide (Decide.swift:150)
5   Mixpanel                        0x1010ef65c         MixpanelInstance.checkDecide (MixpanelInstance.swift:1715)
6   Mixpanel                        0x1010dd824         @callee_guaranteed
7   libdispatch.dylib               0x186ece60c         _dispatch_call_block_and_release
8   libdispatch.dylib               0x186ecf180         _dispatch_client_callout
9   libdispatch.dylib               0x186e7b400         _dispatch_lane_serial_drain$VARIANT$mp
10  libdispatch.dylib               0x186e7bdf4         _dispatch_lane_invoke$VARIANT$mp
11  libdispatch.dylib               0x186e85310         _dispatch_workloop_worker_thread
12  libsystem_pthread.dylib         0x186f1eb84         _pthread_wqthread

Thread 3
0   libsystem_kernel.dylib          0x330ab1a7c         __workq_kernreturn
1   libsystem_pthread.dylib         0x186f1ebd0         _pthread_wqthread

Thread 4
0   libsystem_pthread.dylib         0x186f21758         start_wqthread

Thread 5
0   libsystem_pthread.dylib         0x186f21758         start_wqthread

Thread 6 name: com.apple.uikit.eventfetch-thread
0   libsystem_kernel.dylib          0x330a8f634         mach_msg_trap
1   libsystem_kernel.dylib          0x330a8ea9c         mach_msg
2   CoreFoundation                  0x3301f3284         __CFRunLoopServiceMachPort
3   CoreFoundation                  0x3301ee3a4         __CFRunLoopRun
4   CoreFoundation                  0x3301edad8         CFRunLoopRunSpecific
5   Foundation                      0x31482a780         -[NSRunLoop(NSRunLoop) runMode:beforeDate:]
6   Foundation                      0x31482a660         -[NSRunLoop(NSRunLoop) runUntilDate:]
7   UIKitCore                       0x334dcae7c         -[UIEventFetcher threadMain]
8   Foundation                      0x31495b098         __NSThread__start__
9   libsystem_pthread.dylib         0x186f1dd88         _pthread_start

Thread 7
0   libsystem_pthread.dylib         0x186f21758         start_wqthread

Thread 9
0   libsystem_kernel.dylib          0x330ab1a7c         __workq_kernreturn
1   libsystem_pthread.dylib         0x186f1ebd0         _pthread_wqthread

Thread 10
0   libsystem_kernel.dylib          0x330ab1240         __semwait_signal
1   libsystem_c.dylib               0x186e6665c         nanosleep
2   libsystem_c.dylib               0x186e6645c         sleep
3   Sentry                          0x285e482a0         monitorCachedData (SentryCrashCachedData.c:151)
4   libsystem_pthread.dylib         0x186f1dd88         _pthread_start

Thread 11 name: SentryCrash Exception Handler (Secondary)
0   libsystem_kernel.dylib          0x330a8f634         mach_msg_trap
1   libsystem_kernel.dylib          0x330a8ea9c         mach_msg
2   Sentry                          0x285e53a10         handleExceptions (SentryCrashMonitor_MachException.c:281)
3   libsystem_pthread.dylib         0x186f1dd88         _pthread_start

Thread 13
0   libsystem_kernel.dylib          0x330a8f634         mach_msg_trap
1   libsystem_kernel.dylib          0x330a8ea9c         mach_msg
2   CoreFoundation                  0x3301f3284         __CFRunLoopServiceMachPort
3   CoreFoundation                  0x3301ee3a4         __CFRunLoopRun
4   CoreFoundation                  0x3301edad8         CFRunLoopRunSpecific
5   CoreFoundation                  0x3301ee824         CFRunLoopRun
6   CoreMotion                      0x322c113b4         CLClientCreateIso6709Notation
7   libsystem_pthread.dylib         0x186f1dd88         _pthread_start
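
A note on reading the trace above: the faulting address 0x68 is a small offset from zero, and the crash is inside flockfile, called from fwrite. That pattern is consistent with fwrite being handed a NULL FILE * (for instance, because an earlier fopen failed), though the thread does not confirm this. The following is a minimal C sketch of the defensive pattern, not the SDK's code; `write_part_file` and its return convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not the SDK's implementation.
 * Writes `len` bytes from `data` to `path`.
 * Returns 0 on success, or a nonzero errno-style value on failure. */
int write_part_file(const char *path, const void *data, size_t len) {
    assert(path != NULL && (data != NULL || len == 0));

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        /* Never pass a NULL stream to fwrite; report the failure instead. */
        return errno ? errno : EIO;
    }

    int err = 0;
    if (len > 0 && fwrite(data, 1, len, fp) != len) {
        /* Short write, for example when the disk is full (ENOSPC). */
        err = errno ? errno : EIO;
    }
    if (fclose(fp) != 0 && err == 0) {
        /* Buffered data can still fail to flush at close time. */
        err = errno ? errno : EIO;
    }
    return err;
}
```

A caller would treat any nonzero return as a failed part and surface it as an error instead of continuing with the upload.
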
@danieldekerlegand

Any updates on this issue? We're on version 2.19.1, and we've been receiving reports of crashes that appear to be related to TransferUtility multipart uploads. We seem to receive more crash reports right after releasing a new version of the app. For months we weren't receiving any crash logs, but now that we've received a few, they all appear to be related to createTemporaryFileForPart. Most of the stack traces appear almost identical to the one above, involving retryUploadSubTask via linkTransfersToNSURLSession. However, we do have one outlier stemming from internalUploadFileUsingMultiPart.

Until there is a fix, we've decided to avoid multipart uploads for files < 5 GB. However, is there a way to clear all existing subtasks (i.e., from the database)? We were hoping that doing so on startup might help prevent the app from crashing.

@palpatim
Member

palpatim commented Mar 2, 2021

@diesal11 & @danieldekerlegand Thanks for reporting the issue and for the context.

The implicated method (linkTransfersToNSURLSession:tempTransferDictionary:completionHandler:) is invoked during creation of a new TransferUtility instance (e.g., from +[AWSS3TransferUtility registerS3TransferUtilityWithConfiguration:transferUtilityConfiguration:forKey:completionHandler:] or one of the TransferUtility init methods). How are your apps creating a TransferUtility instance? How many TransferUtility instances do they create, and at what points in the app lifecycle?

@palpatim palpatim added pending-community-response Issue is pending response from the issue requestor and removed pending-triage Issue is pending triage labels Mar 2, 2021
@danieldekerlegand

@palpatim Thanks for getting back to me. After your question, we did a code review. In general, the TransferUtility instance should only have been created once, at app startup, but we realized that if the app switched back and forth between non-multipart and multipart uploads, it could potentially create a second instance. I don't think that's necessary, so we've refactored the code so that only one instance can ever be created, which is done at startup as follows:

  // Configure the Transfer Utility once, at app startup.
  AWSS3TransferUtilityConfiguration *transferUtilityConfiguration = [[AWSS3TransferUtilityConfiguration alloc] init];
  transferUtilityConfiguration.timeoutIntervalForResource = 60 * 60 * 24; // 24 hours, in seconds
  transferUtilityConfiguration.retryLimit = 10;

  // `configuration` (an AWSServiceConfiguration) and `instanceKey` are defined elsewhere in the app.
  [AWSS3TransferUtility registerS3TransferUtilityWithConfiguration:configuration transferUtilityConfiguration:transferUtilityConfiguration forKey:instanceKey];

Do you think it's possible that the issue was due to having multiple instances, where one instance was attempting to rehydrate parts from the database that didn't "belong" to it? Ideally, because we aren't concerned with being able to resume an upload after the app has been force quit, we would like to try clearing the DB of parts on startup, so that we can hopefully avoid the crash.

@palpatim
Member

Creating the Transfer Utility (TU) instance at startup (or once before you begin processing transfers) is the recommended way of managing TU. The NSURLSessions are bound to the Transfer Utility's key (in your case, the instanceKey parameter), so if you were creating utilities with the same key value, there could be a conflict.

...one instance was attempting to rehydrate parts from the database that didn't "belong" to it

That seems like the most likely scenario. If the two TU instances were registered with the same key value at the same time, then they would have been running through their init methods at the same time, and contending for the file system locks on their multi-upload parts.

Let me know if your refactor works to eliminate the crash.
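
To illustrate the "one instance per key" rule discussed above, here is a minimal C sketch of a thread-safe guard an app could consult before registering a utility. It is an illustration only: `claim_key`, the fixed-size table, and its limits are hypothetical, and the SDK's own handling of duplicate keys is not described in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_KEYS 8   /* hypothetical limit for this sketch */
#define KEY_LEN 64   /* hypothetical maximum key length, including NUL */

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static char registered_keys[MAX_KEYS][KEY_LEN];
static int registered_count = 0;

/* Returns 1 if `key` was newly claimed, meaning the caller should go on to
 * register its utility. Returns 0 if the key was already claimed, the key is
 * too long, or the table is full. */
int claim_key(const char *key) {
    assert(key != NULL);
    int claimed = 0;

    pthread_mutex_lock(&registry_lock);
    int seen = 0;
    for (int i = 0; i < registered_count; i++) {
        if (strcmp(registered_keys[i], key) == 0) {
            seen = 1;
            break;
        }
    }
    if (!seen && registered_count < MAX_KEYS && strlen(key) < KEY_LEN) {
        strcpy(registered_keys[registered_count], key); /* length checked above */
        registered_count++;
        claimed = 1;
    }
    pthread_mutex_unlock(&registry_lock);

    return claimed;
}
```

Because the check and the insertion happen under one lock, two threads racing to register the same key cannot both succeed. In an Objective-C app the same effect is often achieved with dispatch_once.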

@danieldekerlegand

@palpatim We've refactored our code to make sure that only one TU instance per key can exist at a time, and we haven't been able to replicate any createTemporaryFileForPart crashes coming from linkTransfersToNSURLSession. However, we have been able to replicate one coming from internalUploadFileUsingMultiPart. We experienced this crash while attempting to upload a ~ 5.5 GB file with a reported available space of ~20 GB according to the iOS Settings page.

[Image: Screen Shot 2021-03-18 at 5 30 29 AM]

Along with this stack trace, we were able to find a few other TransferUtility-related device logs during crashes encountered while starting uploads with lower available disk space, such as < 10 GB, which appeared to occur even after restarting the app:

2021-03-17 23:25:25.444792-0500 tuploader[1247:224071] [logging-persist] os_unix.c:44580: (0) open(/var/mobile/Containers/Data/Application/5E9E9891-29A0-4C52-A185-D9AEF7B0669C/Library/Caches/S3TransferUtility/com/amazonaws/AWSS3TransferUtility/transfer_utility_database-journal) - Undefined error: 0
2021-03-17 23:25:25.444889-0500 tuploader[1247:224071] Unknown error calling sqlite3_step (14: unable to open database file) eu
2021-03-17 23:25:25.444933-0500 tuploader[1247:224071] DB Query: DELETE FROM awstransfer WHERE transfer_id=:transfer_id
2021-03-17 23:25:25.444973-0500 tuploader[1247:224071] Unknown error finalizing or resetting statement (14: unable to open database file)
2021-03-17 23:25:25.445003-0500 tuploader[1247:224071] DB Query: DELETE FROM awstransfer WHERE transfer_id=:transfer_id
2021-03-17 23:25:30.015113-0500 tuploader[1247:224194] Task <C1FED4B7-51C4-4B7D-8A83-77F12824F7D3>.<975> finished with error [28] Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device"

We're finding that we can fairly reliably reproduce crashing behaviors when attempting to upload a large file > 5 GB with available disk space of <= 20 GB, with crashes and failed uploads becoming more and more frequent as disk space decreases.

@palpatim
Member

My initial thought is that iOS is reaping temporary files to reclaim space, and the multipart files are being removed before they're uploaded.

Since the system behavior isn't under our control, if my hypothesis is correct, the best option I can think of is to move multipart temp files into a non-Cache, non-Temporary app-specific directory like Documents/some/transferutility/path, that is also tagged to not be backed up to iCloud.

The downside to that would be that if we have any catastrophic failures that prevent us from properly deleting/cleaning up orphaned files (e.g., crashes that corrupt the internal state of the database queue), those files would persist indefinitely since iOS wouldn't attempt to clean them up. We'll add this to our backlog to consider the best ways of handling this.
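
The downside described above (orphaned part files that iOS would no longer purge) suggests that the app or SDK would need its own cleanup pass. A minimal C sketch under that assumption follows; `sweep_stale_parts`, the age-based policy, and the flat directory layout are all hypothetical and not part of the SDK.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sweep: deletes regular files directly inside `dir` that were
 * last modified more than `max_age` seconds ago.
 * Returns the number of files removed, or -1 if `dir` cannot be opened. */
int sweep_stale_parts(const char *dir, time_t max_age) {
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL) {
        return -1;
    }

    time_t now = time(NULL);
    int removed = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char path[1024];
        int n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            continue; /* path too long for this sketch; skip it */
        }
        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) {
            continue; /* skips ".", "..", subdirectories, unreadable entries */
        }
        if (now - st.st_mtime > max_age && unlink(path) == 0) {
            removed++;
        }
    }
    closedir(d);
    return removed;
}
```

Running such a sweep at startup, before the Transfer Utility is registered, would avoid racing with in-flight uploads. Choosing a `max_age` longer than the longest expected upload (the configuration earlier in the thread uses a 24-hour resource timeout) is one conservative option.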

@palpatim palpatim added pending-triage Issue is pending triage and removed pending-community-response Issue is pending response from the issue requestor labels Mar 24, 2021
@danieldekerlegand

@palpatim Thanks, that makes sense and is along the lines of what I was anticipating. Let me know if you need any more information on reproducing the crash.

@diegocstn diegocstn added the follow up Requires follow up from maintainers label Jul 13, 2021
@royjit royjit added bug Something isn't working and removed pending-triage Issue is pending triage labels Jul 29, 2021
@royjit
Contributor

royjit commented Aug 2, 2021

Further investigation: TU creates temporary files during upload here, and the background session for the upload also creates temporary files, as per this Apple doc. This puts a strain on the available memory on the OS. TU creates these temporary files when the upload subtask is created; we need to figure out a better way to handle this.

@royjit
Contributor

royjit commented Aug 2, 2021

Adding to the list of items to check: see whether a single instance of TU has any threading issues and whether the file write works in a thread-safe manner.

@brennanMKE brennanMKE self-assigned this Sep 17, 2021
brennanMKE pushed a commit that referenced this issue Sep 23, 2021
* method has experienced crashes and uses low level POSIX API
* uses higher level FileManager API with error checking
* includes unit test for new implementation
brennanMKE added a commit that referenced this issue Sep 23, 2021
* fix(S3): rewrite create partial file function (#2314)

* method has experienced crashes and uses low level POSIX API
* now uses higher level FileManager API with error checking
* includes unit test for new implementation
@brennanMKE brennanMKE added the pending-release Code has been merged but pending release label Sep 23, 2021
@brennanMKE
Contributor

@diesal11 The code for creating the partial file has been updated so that it is more reliable. It will be included in the next release.

@brennanMKE brennanMKE added closing soon and removed follow up Requires follow up from maintainers labels Sep 23, 2021
@lawmicha
Member

lawmicha commented Oct 1, 2021

2.26.1 has been released https://github.com/aws-amplify/aws-sdk-ios/releases/tag/2.26.1

@brennanMKE brennanMKE removed pending-release Code has been merged but pending release closing soon labels Oct 4, 2021
@brennanMKE
Contributor

@diesal11 Please let us know whether you can still reproduce the crash with this update. We can reopen the issue if the crash is still happening. The new code replaces the low-level POSIX calls with higher-level Objective-C methods, which Apple supports and keeps in step with the changes made to the filesystem implementation. My expectation is that NSFileHandle will behave better and that, if there is a problem, the reason will be logged from the error set by Apple's API. There is a revised API, available since iOS 13, that communicates failures from the internal implementation. If anything does happen, we can get more detail this way. Most users are likely running iOS 13 or later and will use these newer methods.

gabek pushed a commit to KeepSafe/aws-sdk-ios that referenced this issue Aug 31, 2023
…-amplify#3786)

* fix(S3): rewrite create partial file function (aws-amplify#2314)

* method has experienced crashes and uses low level POSIX API
* now uses higher level FileManager API with error checking
* includes unit test for new implementation

8 participants