Gateway: Adds parallel retry logic to get account information. #2355

Merged
j82w merged 12 commits into master from users/jawilley/ha/regionOutage
Apr 1, 2021


@j82w j82w commented Mar 31, 2021

Pull Request Template

Description

When the primary region gateway is down, this change reduces the time to get the account information from over 70 seconds to a little over 5 seconds, provided the user has preferred regions configured. The account information is needed on SDK initialization and after certain failures, which require fetching the updated account information to check for failovers.

Current SDK logic
Global Endpoint -> 5 sec
Global Endpoint -> 10 sec
Global Endpoint -> 20 sec
Primary Endpoint (which most of the time is the same as the global endpoint) -> 5 sec
Primary Endpoint -> 10 sec
Primary Endpoint -> 20 sec
Secondary Endpoint -> success

Total time to get a success from the secondary region is over 70 seconds (two endpoints, each timing out after 5, 10, and 20 seconds) plus the time for the secondary endpoint to respond.
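The 70-second figure follows directly from the timeouts listed above. As a quick check, here is a small Python sketch of the arithmetic. The names `TIMEOUTS`, `FAILED_ENDPOINTS`, and `sequential_wait` are made up for illustration and are not SDK identifiers. It counts only the timeouts themselves, so any delay between attempts would push the real number higher.

```python
# Per-attempt timeouts in seconds, taken from the "Current SDK logic" list.
TIMEOUTS = [5, 10, 20]

# Endpoints that exhaust every attempt before the secondary endpoint is tried.
FAILED_ENDPOINTS = ["global", "primary"]


def sequential_wait(timeouts, failed_endpoints):
    """Seconds spent waiting on timeouts when endpoints are tried one after another."""
    return len(failed_endpoints) * sum(timeouts)


old_wait = sequential_wait(TIMEOUTS, FAILED_ENDPOINTS)
print(old_wait)  # 2 * (5 + 10 + 20) = 70
```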

New SDK logic

        /// This gets the account information
        /// 
        /// Source Task        
        /// Creates Task 1,2 ->                |    Task 1                         |    Task 2          |                 
        ///                                    | Global endpoint -> 10 sec -> fail | Timer wait 5 sec   |
        /// Waits for Any on (Task1, Task2)    | still waiting on response         | Timer is done      |
        /// Creates Task3, Task4                                                                        |     Task 3                                |     Task 4                                 |   
        ///                                                                                             | 1st preferred location -> 10 sec -> fail  | 2nd preferred location -> 1 sec -> success |
        ///                                                                                             | still waiting on response                 | returns success                            |
        /// Waits for Any on (Task1, Task3, Task4). Task 4 is done, so return the account information.
        /// Other tasks log the exception or just ignore the response.
        ///
        /// This gets the account information with multiple preferred region failures
        /// 
        /// Source Task        
        /// Creates Task 1,2 ->                |    Task 1                         |    Task 2          |                 
        ///                                    | Global endpoint -> 10 sec -> fail | Timer wait 5 sec   |
        /// Waits for Any on (Task1, Task2)    | still waiting on response         | Timer is done      |
        /// Creates Task3, Task4                                                                        |     Task 3                                |     Task 4                               |   
        ///                                                                                             | 1st preferred location -> 10 sec -> fail  | 2nd preferred location 3 sec -> fail     |
        ///                                                                                             | 4th preferred location 7 sec -> fail      | 3rd preferred location 1 sec -> fail     |
        ///                                                                                             | still waiting on response                 | 5th preferred location 1 sec -> success  |
        /// Waits for Any on (Task1, Task3, Task4). Task 4 is done, so return the account information.
        /// Other tasks log the exception or just ignore the response.

Total time to get a success from the secondary region is 5 seconds plus the time for the secondary endpoint to respond.
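The flow in the diagrams above can be sketched roughly as follows. This is a Python `asyncio` illustration of the idea, not the SDK's code: `get_account_info`, `fetch`, `make_fake_fetch`, `hedge_delay`, and `workers` are all hypothetical names. It assumes two concurrent workers (mirroring Task 3 and Task 4) that pull preferred locations from a shared queue, and it assumes that an early failure of the global endpoint starts those workers without waiting for the timer. The exact assignment of locations to tasks may differ from the diagram.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Fetch = Callable[[str], Awaitable[dict]]


async def get_account_info(
    fetch: Fetch,
    global_endpoint: str,
    preferred_locations: Sequence[str],
    hedge_delay: float = 5.0,
    workers: int = 2,
) -> dict:
    """Return the first successful response; raise if every endpoint fails."""
    errors: List[Tuple[str, Exception]] = []

    async def attempt(endpoints: Iterable[str]) -> Optional[dict]:
        # Try endpoints in order; record failures and move on to the next one.
        for endpoint in endpoints:
            try:
                return await fetch(endpoint)
            except Exception as exc:
                errors.append((endpoint, exc))
        return None

    # Task 1: the global endpoint. The wait timeout plays the role of Task 2.
    pending = {asyncio.create_task(attempt([global_endpoint]))}
    done, pending = await asyncio.wait(pending, timeout=hedge_delay)

    remaining = iter(preferred_locations)  # shared by all workers
    hedged = False
    try:
        while True:
            for task in done:
                result = task.result()
                if result is not None:
                    return result
            if not hedged:
                # Task 3, Task 4: each worker takes the next untried location.
                hedged = True
                pending |= {asyncio.create_task(attempt(remaining)) for _ in range(workers)}
            if not pending:
                raise ConnectionError(f"all endpoints failed: {errors}")
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
    finally:
        # Slower requests are no longer needed once a result is available.
        for task in pending:
            task.cancel()


def make_fake_fetch(behaviour: Dict[str, Tuple[float, bool]]) -> Fetch:
    """Build a simulated fetch; behaviour maps endpoint -> (delay_seconds, succeeds)."""

    async def fetch(endpoint: str) -> dict:
        delay, succeeds = behaviour[endpoint]
        await asyncio.sleep(delay)
        if not succeeds:
            raise ConnectionError(f"{endpoint} unavailable")
        return {"endpoint": endpoint}

    return fetch


if __name__ == "__main__":
    # First diagram, with seconds scaled down by 100 so the demo finishes quickly.
    demo_fetch = make_fake_fetch(
        {"global": (0.10, False), "loc1": (0.10, False), "loc2": (0.01, True)}
    )
    print(asyncio.run(get_account_info(demo_fetch, "global", ["loc1", "loc2"], hedge_delay=0.05)))
```

Sharing one iterator between the workers keeps each location from being tried twice and lets a worker that fails quickly move straight on to the next location, which is what keeps the total close to the 5-second timer plus the fastest healthy region's response time.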


…educes the latency of the SDK being able to handle if there is a regional outage.
@j82w j82w merged commit 7ccdc83 into master Apr 1, 2021
@j82w j82w deleted the users/jawilley/ha/regionOutage branch April 1, 2021 21:11