Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not overwrite last error if it was previously set - fixes flaky tests #115

Merged
merged 1 commit into from Feb 11, 2020

Conversation

@aaslamin
Copy link
Contributor

aaslamin commented Feb 11, 2020

I believe the root cause of the flakiness is because the last status from the server is being overwritten in the event of a retry that resulted in the request context exceeding its deadline (or a network error). See:

https://github.com/aporeto-inc/manipulate/blob/master/maniphttp/manipulator.go#L670
https://github.com/aporeto-inc/manipulate/blob/master/maniphttp/manipulator.go#L678

Instead, what I believe it should do is only assign the NewErrCannotCommunicate to lastError IFF lastError != nil because there could have been a connection to the server which failed due to any of the following HTTP status codes: https://github.com/aporeto-inc/manipulate/blob/master/maniphttp/manipulator.go#L696-L719

The tests are failing because the last status is sometimes thrown away if there was a network connectivity error or the request context deadline exceeds (during a retry attempt to the test server).

For example, we have a test that brings up a server that returns 429 and we assert that we get back a manipulate.ErrTooManyRequests error. However, this test sometimes fails because a retry results in a manipulate.NewErrCannotCommunicate error.

This is happening because the last status from the server was thrown away in the two snippets I posted above 🔝

Flow:

  • Hit server, got back 429 (ok I will retry)
  • Hit server, got back 429 (ok I will retry)
  • Hit server, get back 429 (ok I will retry)
  • Hit server, get back 429 (ok I will retry)
  • (last retry attempt) This guy now fails randomly due to a network error or the sub-context deadline exceeding - the lastError is now overwritten w/ manipulate.NewErrCannotCommunicate

Now the function returns with a manipulate.NewErrCannotCommunicate error instead of a manipulate.ErrTooManyRequests error even though we reached the server in some of the retry attempts

@codecov

This comment has been minimized.

Copy link

codecov bot commented Feb 11, 2020

Codecov Report

Merging #115 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #115      +/-   ##
==========================================
+ Coverage   69.72%   69.74%   +0.02%     
==========================================
  Files          38       38              
  Lines        3022     3024       +2     
==========================================
+ Hits         2107     2109       +2     
  Misses        834      834              
  Partials       81       81              

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4824241...30dd929. Read the comment docs.

@primalmotion primalmotion merged commit 687deec into master Feb 11, 2020
4 checks passed
4 checks passed
codecov/patch 100.00% of diff hit (target 69.72%)
Details
codecov/project 69.74% (+0.02%) compared to 4824241
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@primalmotion primalmotion deleted the fix-errors branch Feb 11, 2020
CyrilPeponnet added a commit that referenced this pull request Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.