Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Configuration Threading Logic #807

Merged
merged 17 commits into from
Apr 14, 2022
Merged

Conversation

sarahkoop
Copy link
Contributor

@sarahkoop sarahkoop commented Apr 12, 2022

Summary of changes

  • Update demo app to vault Venmo and use Payment Context
  • Update demo app to not embed & sign Magnes framework (static framework should not be embedded)
  • Update configuration cache request policy
  • Moved logic for dispatch queue of configuration calls
  • Remove unnecessary flushing of analytics when app enters background
  • Throttle requests for configuration to reduce load on servers and network connection lost errors
  • Add analytics events for network connection lost errors in Venmo flow

Note: The scope of impact for this is older devices running older iOS and Venmo app versions

Checklist

  • Added a changelog entry

Authors

List GitHub usernames for everyone who contributed to this pull request.

sarahkoop and others added 15 commits April 8, 2022 11:26
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Co-authored-by: Sarah Koop <skoop@paypal.com>
Co-authored-by: Sarah Koop <skoop@paypal.com>
Co-authored-by: Sarah Koop <skoop@paypal.com>
Co-authored-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Co-authored-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
Signed-off-by: Sarah Koop <skoop@paypal.com>
@sarahkoop sarahkoop requested a review from a team as a code owner April 12, 2022 21:44
waitForElementToBeHittable(mockVenmo.buttons["SUCCESS WITH PAYMENT CONTEXT"])
mockVenmo.buttons["SUCCESS WITH PAYMENT CONTEXT"].tap()

XCTAssertTrue(demoApp.buttons["Got a nonce. Tap to make a transaction."].waitForExistence(timeout: 15))
XCTAssertTrue(demoApp.buttons["Failed to store Venmo Account in vault"].waitForExistence(timeout: 15))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UI test if only used to test that the SDK successfully switches to the mock Venmo app and returns. Since we updated the demo app to include vaulting, this flow now ultimately fails, because the mock Venmo app does not return a valid nonce for vaulting. We decided to just update this error message, so that we can keep vaulting in the demo app flow for future testing, and since this test still covers its intended scope.

@@ -90,7 +90,8 @@ - (nullable instancetype)initWithAuthorization:(NSString *)authorization sendAna
configurationCache = [[NSURLCache alloc] initWithMemoryCapacity:1 * 1024 * 1024 diskCapacity:0 diskPath:nil];
});
configuration.URLCache = configurationCache;
configuration.requestCachePolicy = NSURLRequestReturnCacheDataElseLoad;
// Use the caching logic defined in the protocol implementation, if any, for a particular URL load request.
configuration.requestCachePolicy = NSURLRequestUseProtocolCachePolicy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random q: does config endpoint enforce caching with HTTP headers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like passing a Cache-Control header? If so the preferred method for NSURLSession is to use the requestCachePolicy on the configuration as there is some weirdness with headers not being respected (for example NSURLSession also ignores Keep-Alive headers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right yeah I guess that's a question for the gateway actually. If config has caching behavior built in we could leverage that on Android too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah certainly something to look into for both SDKs

#pragma mark - Private methods

- (void)appWillResign:(NSNotification *)notification {
UIApplication *application = notification.object;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Are analytics all processed in the foreground now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't change the thread where analytics are being processed which is done in the sendAnalyticsEvent function. Here we were unnecessarily flushing analytics on the background thread and updating this results in all of the analytics events being sent as expected, so was unnecessarily adding strain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah background tasks do still run when the app is closed by the user, it may be good to do this in the future. On Android we're using WorkManager to do timed uploads every 30 seconds, I'm wondering if iOS has something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, previously we were getting a ton of flush analytics errors which this resolves. I think we should certainly look into something similar in the future. When we re-write the core module in Swift I think there is certainly a lot of optimization to be done as we move to newer APIs with those changes in the future.

}];
[task resume];
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the httpRequestWithCaching: only used for BTConfiguration requests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! We only want to cache configuration requests (not anything else)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the overall solution to prevent multiple BT API requests within a 1 second timeframe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, essentially we are throttling this request by 0.1 seconds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh aight Analytics HTTP requests happen as soon as they're triggered on iOS that is rough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mean to nit I genuinely have questions lol. Did we choose 0.1 because it worked best in testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we tried 0.05 and 0.075 but we still saw an increased number of Network connection lost errors. 0.1 seems to be the least amount of time that still improved the errors.

Copy link
Member

@demerino demerino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it accurate to say that testing this solution is heavily dependent on using an older device? What is the monitoring strategy to identify these dropped requests? This looks like a good pass at the fix, any reservations I have are around observability and testing moving forward to catch future issues.


// The increase in speed of API calls with cached configuration caused an increase in "network connection lost" errors.
// Adding this delay allows us to throttle the network requests slightly to reduce load on the servers and decrease connection lost errors.
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 0.1 * NSEC_PER_SEC), dispatch_get_main_queue(), ^{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to put the 0.1 as a constant somewhere? What are the odds that we will want to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. we probably shouldn't change this value without the same amount of testing we did initially (we also tested smaller increments and this seemed to be the shortest amount of time that improved the issue). It's also not used elsewhere so I think leaving it here is ok, but happy to change it if others disagree

@sarahkoop
Copy link
Contributor Author

Is it accurate to say that testing this solution is heavily dependent on using an older device? What is the monitoring strategy to identify these dropped requests? This looks like a good pass at the fix, any reservations I have are around observability and testing moving forward to catch future issues.

Yes, testing heavily depended on an older Venmo app version on an older device. We believe the older Venmo app version was more crucial to testing because there was a big change in loading time between Venmo app versions (newer versions of the app have a loading screen and the app switch process takes much longer than older versions, where we switch back to the merchant app almost immediately and continue with API calls). We didn't have a newer iOS versioned device with an older Venmo app, so couldn't confirm if the iOS version also plays a large role.

We are planning to also meet with the observability team to discuss client side monitoring in general and hopefully improve our process for catching these issues sooner.

Copy link
Contributor

@sshropshire sshropshire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a good reference PR for starting the Swift refactor. A serial dispatch queue could help in the future to halt outbound requests until the config is available.

@jaxdesmarais jaxdesmarais merged commit 8cdd80a into master Apr 14, 2022
@jaxdesmarais jaxdesmarais deleted the config-cache-cleanup branch April 14, 2022 14:29
@scannillo
Copy link
Contributor

👋 Just catching up on some PRs while I was away - this is definitely an interesting one. I'm curious @jaxdesmarais - do you have background info on how this issue arose? Was it a specific merchant request? What exactly did they report?

@jaxdesmarais
Copy link
Contributor

Yeah, this was a fun one for sure @scannillo 🙈

This issue started when caching was updated for us to actually cache configurations in PR #789 (which was not previously being done properly). This was a big brought to us by a large merchant where we were making millions of config calls instead of utilizing a cache on iOS.

After fixing that bug, we inadvertently introduced the bug resolved in this PR. Older versions of the Venmo app on older iOS versions do not include the same loading indicator that is included now as part of the Venmo flow. With us now caching the configuration, we were making it so that calls (and the return from the Venmo app) were happing too fast causing the connection to fail while we were waiting for the response. Previously we were relying on the call to fetchAndReturnRemoteConfig to essentially act as a time buffer to allow the background calls to be made.

The solution here is to throttle the requests slightly to avoid losing the connection for other requests during the Venmo app switch. The merchant was able to test this as well before merge to ensure they were no longer seeing the "Network Connection Lost" errors and we implemented the analytics events in this PR to allow us to monitor once this version of the SDK was ramped to 100 on the merchant side. This was also only an issue in Venmo app versions without the loading indicator buffer to allow for network calls and was not present in any other app switch flows.

Hope that explanation makes sense but also happy to chat more about it!

@scannillo
Copy link
Contributor

👋 @jaxdesmarais - thanks for that thorough explanation, this helps a lot!

I'm curious - so in the original PR #789, we are manually wiping and then and restoring URL responses to the cache, using the NSURLCache.sharedURLCache shared instance. I also see this cache instance. Is this leftover/not being used?

@jaxdesmarais
Copy link
Contributor

Yeah, the tl;dr is that we were accessing the cache in BTAPIClient, but we were never actually adding the config to the cache under the hood in BTHTTP.

This resulted in every time we attempted to access the cache here, nothing was present so we would then call fetchOrReturnRemoteConfiguration again, which called into BTHTTP to access the cache, but since nothing was actually cached just resulting in us always calling the API for a new config. PR #789 made it so that we would actually cache the config during the underlying method calls from BTAPIClient > BTHTTP to cache the config if there were no errors as well as access and use that cache if present.

We can likely clean this up even further during the Swift conversion though as this cache instance I don't believe is ever used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants