[FOSS] hot fix and improvement #2207

mingkun2020 · 2021-11-04T22:20:25Z

Companion stack for MSK, MQ tests
Throttling handle mechanism
Option for adding pipeline name as testing stack prefix
Update template runtime version

integration/combination/test_function_with_alias.py

integration/helpers/stack.py

integration/combination/test_function_with_sns.py

integration/combination/test_api_with_authorizers.py

integration/resources/templates/combination/function_with_all_event_types.yaml

integration/single/test_basic_api.py

CoshUS · 2021-11-23T09:37:50Z

integration/helpers/deployer/utils/retry.py

+                try:
+                    return func(*args, **kwargs)
+                except exc:
+                    sleep_time = random.uniform(0, math.pow(2, retry_attempt) * delay)


What's the reason for adding jitter?
Shouldn't the sleep time be calculated as delay +/- jitter instead of a random value between 0 and the max delay?
On the 10th retry, the current approach can sleep anywhere from 0 to 512 seconds. That seems like a large range, and I'm not sure what it accomplishes.

CloudFormation's describe_stack API, which we use to check the stack status, has a hard limit (maybe 10 calls per second) that cannot be changed. Since we run the tests in parallel (for example, 10 concurrent threads), we can end up with 10 calls at the same time (imagine a big cluster of calls in a small time frame). So we need to add jitter (some randomness) to spread the calls over a larger time frame and avoid throttling. If we only use exponential backoff, the cluster of calls reappears after each backoff interval and the issue isn't solved, which is why we need the jitter.
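For illustration, here is a minimal sketch of a retry decorator that combines exponential backoff with this kind of jitter. It is not the code from this PR: the decorator name, its parameters, the `ThrottledError` class, and the `describe_stack_status` function are all hypothetical, and the `sleep` argument exists only so the example can record delays instead of actually waiting.

```python
import functools
import math
import random
import time


def retry_with_full_jitter(exc, attempts=3, delay=0.05, sleep=time.sleep):
    """Retry the wrapped function on `exc`, up to `attempts` times.

    Hypothetical helper: before each retry it sleeps for a random time
    drawn uniformly from [0, 2**retry_attempt * delay].
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for retry_attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exc:
                    # Jitter over the whole exponential window, so parallel
                    # callers do not all wake up at the same instant.
                    sleep(random.uniform(0, math.pow(2, retry_attempt) * delay))
            # One final call after the last sleep; any error now propagates.
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from the service."""


calls = {"count": 0}
sleeps = []  # delays are recorded here instead of being slept


@retry_with_full_jitter(ThrottledError, attempts=5, delay=0.05, sleep=sleeps.append)
def describe_stack_status():
    # Simulated call: throttled twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottledError("Rate exceeded")
    return "CREATE_COMPLETE"


status = describe_stack_status()
print(status, calls["count"], sleeps)
```

The two recorded delays are drawn from windows of 0.05 and 0.1 seconds respectively, so each parallel caller ends up with a different retry time.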

delay +/- jitter is similar to Equal Jitter, and the one I used here, which ranges from 0 to the exponential delay, is Full Jitter. They don't differ much in performance. More detail can be found in this blog post: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
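As a rough sketch of the difference between the two strategies, following how the linked blog post describes them: the function names below are made up for illustration, and the example assumes a zero-based attempt counter with a 1-second base delay, which is what would make the 10th retry's window 512 seconds.

```python
import random


def backoff_cap(retry_attempt, delay):
    """Upper bound of the backoff window for a zero-based attempt number."""
    return (2 ** retry_attempt) * delay


def full_jitter(retry_attempt, delay, rng=random):
    # Full Jitter: anywhere between 0 and the exponential cap.
    return rng.uniform(0, backoff_cap(retry_attempt, delay))


def equal_jitter(retry_attempt, delay, rng=random):
    # Equal Jitter: half of the cap is fixed, the other half is random.
    half = backoff_cap(retry_attempt, delay) / 2
    return half + rng.uniform(0, half)


rng = random.Random(0)  # seeded so the run is repeatable
full_samples = [full_jitter(9, 1.0, rng) for _ in range(1000)]
equal_samples = [equal_jitter(9, 1.0, rng) for _ in range(1000)]

print("full jitter range:", min(full_samples), max(full_samples))
print("equal jitter range:", min(equal_samples), max(equal_samples))
```

With these assumptions, Full Jitter samples fall anywhere in [0, 512] while Equal Jitter samples stay in [256, 512].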

Why is the final delay generated as a random value between 0 and the max delay instead of the max delay plus or minus a percentage?
In the example of the 10th retry, the delay could be only 1 second, which defeats the purpose of exponential backoff.

Green is the minimum, red is the maximum, and purple is the average.
This is the current implementation:

The minimum seems off to me, and the range gets really large as the retry count increases.

With a percentage-based jitter:

Our goal is to spread the API calls across the test time frame as evenly as possible (if we use delay + jitter, we might have some time blocks with no calls). I'm now thinking about how to run a simulation of this.
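A very rough simulation of this idea might look like the sketch below. It is only an illustration under strong assumptions: every worker is throttled at time 0 and keeps being throttled on every retry (a worst case), and the retry budget, base delay, and bucket width are made-up values, not the ones used by the tests.

```python
import random
from collections import Counter

WORKERS = 10   # parallel test threads, as in the example above
RETRIES = 5    # hypothetical retry budget
DELAY = 1.0    # hypothetical base delay in seconds
BUCKET = 0.1   # width of the time buckets used for counting, in seconds


def retry_times(sleep_for, rng):
    """Return the times at which one worker re-issues its call."""
    now, times = 0.0, []
    for attempt in range(RETRIES):
        now += sleep_for(attempt, rng)
        times.append(now)
    return times


def no_jitter(attempt, rng):
    # Plain exponential backoff: every worker sleeps exactly the same time.
    return (2 ** attempt) * DELAY


def full_jitter(attempt, rng):
    # Random sleep between 0 and the exponential cap.
    return rng.uniform(0, (2 ** attempt) * DELAY)


def peak_calls(sleep_for, seed=0):
    """Largest number of calls that land in the same time bucket."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(WORKERS):
        for t in retry_times(sleep_for, rng):
            buckets[int(t / BUCKET)] += 1
    return max(buckets.values())


peak_without_jitter = peak_calls(no_jitter)
peak_with_full_jitter = peak_calls(full_jitter)
print("peak calls per bucket without jitter:", peak_without_jitter)
print("peak calls per bucket with full jitter:", peak_with_full_jitter)
```

Without jitter, all workers retry at identical instants, so every retry round puts all 10 calls into one bucket. With jitter the calls are scattered across buckets.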

Tracking:

The current implementation follows the blog post: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/

One assumption the blog post makes is that calls continue until the job is complete. In our case, however, the test fails once it has failed after x retries. Although the total amount of "work" is lower, a job could roll low numbers 5 times in a row and fail the execution.
I think there is a better solution for our use case than either full jitter or equal jitter.

@mingkun2020 will work on a simulator for our use case specifically to test out different approaches.
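To make the concern about a bounded retry budget concrete, here is a small Monte Carlo sketch. It is not the planned simulator: the retry budget, base delay, and the "a quarter of the maximum" threshold are arbitrary assumptions chosen for illustration. It estimates how often the total time spent waiting across all retries turns out to be short under each strategy.

```python
import random

RETRIES = 5     # hypothetical retry budget before the test is failed
DELAY = 1.0     # hypothetical base delay in seconds
RUNS = 10_000   # number of simulated jobs per strategy


def full_jitter(cap, rng):
    # Random sleep between 0 and the exponential cap.
    return rng.uniform(0, cap)


def equal_jitter(cap, rng):
    # Half of the cap is fixed, the other half is random.
    return cap / 2 + rng.uniform(0, cap / 2)


def total_wait(jitter, rng):
    """Total time one job spends sleeping across all of its retries."""
    return sum(jitter((2 ** attempt) * DELAY, rng) for attempt in range(RETRIES))


rng = random.Random(0)  # seeded so the run is repeatable
full_totals = [total_wait(full_jitter, rng) for _ in range(RUNS)]
equal_totals = [total_wait(equal_jitter, rng) for _ in range(RUNS)]

# Longest possible total wait: 1 + 2 + 4 + 8 + 16 = 31 seconds here.
max_budget = sum((2 ** attempt) * DELAY for attempt in range(RETRIES))
threshold = max_budget / 4


def share_below(totals):
    """Fraction of jobs whose total wait is under the threshold."""
    return sum(t < threshold for t in totals) / len(totals)


print("full jitter, share of short waits:", share_below(full_totals))
print("equal jitter, share of short waits:", share_below(equal_totals))
```

Under these assumptions Equal Jitter can never wait less than half of the maximum budget, while Full Jitter occasionally uses up all of its retries in a small fraction of it.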

This comment is not blocking.

CoshUS

LGTM! Thanks for addressing the comments!

codecov-commenter · 2021-11-25T20:58:08Z

Codecov Report

Merging #2207 (aeeb334) into develop (0bc383f) will decrease coverage by 0.10%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #2207      +/-   ##
===========================================
- Coverage    94.43%   94.33%   -0.11%     
===========================================
  Files           95       95              
  Lines         6558     6598      +40     
  Branches      1325     1331       +6     
===========================================
+ Hits          6193     6224      +31     
- Misses         169      179      +10     
+ Partials       196      195       -1

Impacted Files	Coverage Δ
samtranslator/model/api/api_generator.py	`93.24% <0.00%> (-1.42%)`	⬇️
samtranslator/swagger/swagger.py	`93.27% <0.00%> (-0.73%)`	⬇️
samtranslator/model/api/http_api_generator.py	`91.21% <0.00%> (-0.63%)`	⬇️
samtranslator/model/lambda_.py	`93.10% <0.00%> (ø)`
samtranslator/model/apigateway.py	`97.69% <0.00%> (ø)`
samtranslator/model/sam_resources.py	`92.35% <0.00%> (ø)`
...translator/plugins/api/implicit_rest_api_plugin.py	`100.00% <0.00%> (ø)`
samtranslator/model/eventsources/push.py	`92.33% <0.00%> (+0.02%)`	⬆️
...translator/plugins/api/implicit_http_api_plugin.py	`100.00% <0.00%> (+2.40%)`	⬆️
samtranslator/model/eventsources/pull.py	`92.89% <0.00%> (+2.84%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0bc383f...aeeb334.

qingchm · 2021-11-25T23:45:28Z

integration/combination/test_api_settings.py

@@ -1,5 +1,7 @@
 import hashlib

+import pytest


Is this import used?

Unused import, removed it.

qingchm · 2021-11-25T23:57:58Z

integration/combination/test_function_with_deployment_preference.py

-        deployments = self.client_provider.code_deploy_client.list_deployments()["deployments"]
+        deployments = self.client_provider.code_deploy_client.list_deployments(
+            applicationName=application_name, deploymentGroupName=deployment_group
+        )["deployments"]


For client responses like this, are we catching any possible client errors? Also, if the client returned a None response due to some error, wouldn't ["deployments"] be error-prone? (Maybe get("deployments") would be a better option.)

If it returns None, it will break and the test will fail as expected (it shouldn't be None).
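A small sketch of the trade-off discussed here, using plain dictionaries as hypothetical stand-ins for a client response (no real client is called, and the helper names are made up). Direct indexing fails immediately on a malformed response, while `.get` hides a missing key and still fails on a `None` response.

```python
def get_deployments_strict(response):
    # Direct indexing: raises right away if the key is missing.
    return response["deployments"]


def get_deployments_lenient(response):
    # .get(): returns None for a missing key, deferring the failure.
    return response.get("deployments")


malformed = {}  # hypothetical response that lacks the "deployments" key

try:
    get_deployments_strict(malformed)
    strict_outcome = "no error"
except KeyError:
    strict_outcome = "KeyError"

lenient_outcome = get_deployments_lenient(malformed)

try:
    # A None response breaks .get() as well, just with a different error.
    get_deployments_lenient(None)
    none_outcome = "no error"
except AttributeError:
    none_outcome = "AttributeError"

print(strict_outcome, lenient_outcome, none_outcome)
```

In a test, failing at the point of access is usually what you want, which matches the reasoning in the reply above.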

The reviewer is busy with other tasks, and this PR already has 2 approvals.

mingkun2020 and others added 16 commits March 2, 2021 13:22

change yaml.load to yaml.safe_load for the security best practice

2aacd6d

use yaml_parse for consistant style

060102c

Merge branch 'develop' of https://github.com/aws/serverless-application-model into develop

a6ec11e

Merge branch 'develop' of https://github.com/aws/serverless-application-model into develop

b66e8bb

remove pillow library for image comparing, use hash instead

8bf03ce

make it compatible with py2

ceb3268

some bug fixes for pipeline failures

be3b418

Merge branch 'aws:develop' into foss-hot-fix

893735e

better handling throttling issue

58b00fe

Merge branch 'foss-hot-fix' of https://github.com/mingkun2020/serverless-application-model into foss-hot-fix

b20e646

add companion stack for testing

975bc06

add companion stack

b79e69f

add companion stack exist check

26cd2f6

add stack prefix

3c15955

update runtime

b1fff6f

Merge branch 'develop' into foss-hot-fix

bd15f33

mingkun2020 changed the title ~~Foss hot fix~~ [FOSS] hot fix and improvement Nov 4, 2021

mndeveci previously requested changes Nov 5, 2021

integration/combination/test_function_with_alias.py (outdated, resolved)

integration/helpers/stack.py (outdated, resolved)

integration/combination/test_function_with_sns.py (outdated, resolved)

moelasmar added the pr/internal label Nov 10, 2021

avoid deleting companion stack while prefix is provided

092d4e4

CoshUS suggested changes Nov 23, 2021

mingkun2020 added 5 commits November 23, 2021 14:14

fix some throttling issues

8fe45ba

move service name to constants, addressed comments

7b8e008

add service name constant file

3f82302

remove unused testing lines

53474cf

fix error

5330727

mingkun2020 requested a review from mndeveci November 25, 2021 20:23

pass schedule name as parameter

a4c1da6

mingkun2020 requested a review from CoshUS November 25, 2021 20:50

black reformat

a35def2

CoshUS approved these changes Nov 25, 2021

qingchm reviewed Nov 25, 2021

remove unused imort

aeeb334

qingchm reviewed Nov 25, 2021

qingchm approved these changes Nov 26, 2021

mingkun2020 merged commit d2cf4b7 into aws:develop Dec 3, 2021

mingkun2020 deleted the foss-hot-fix branch December 3, 2021 00:23
