Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POST operations need to handle intermittent network failure. #106

Open
samoconnor opened this issue Mar 17, 2018 · 5 comments · Fixed by JuliaWeb/HTTP.jl#236
Open

POST operations need to handle intermittent network failure. #106

samoconnor opened this issue Mar 17, 2018 · 5 comments · Fixed by JuliaWeb/HTTP.jl#236

Comments

@samoconnor
Copy link

The GitHub doc says that they have been careful to use HTTP methods with semantics appropriate to the API operations.

HTTP.jl retries most request automatically in the event of network failure. However, POST operations are not generally idempotent and cannot be automatically retried. GitHub.jl should implement system-state-aware retry loops around POST requests. i.e. when a POST operation throws an HTTP.IOError GitHub.jl should retry the operation in a way that avoids duplicate changes to system state.

In the case of the create_status operation that is the subject of JuliaWeb/HTTP.jl#220, POST /repos/:owner/:repo/statuses/:sha, it seems that double-posting would create two statuses, https://developer.github.com/v3/repos/statuses/, so in the event of a failure, the GitHub.jl create_status function should do a GET to check if the status was created before retrying (or, if duplicate statuses are not considered harmful to overall system correctness, just set retry_non_idempotent=true).

@api_default function create_status(api::GitHubAPI, repo, sha; options...)
result = gh_post_json(api, "/repos/$(name(repo))/statuses/$(name(sha))"; options...)
return Status(result)
end

See also JuliaWeb/HTTP.jl#214.

This issue may apply in other places that use gh_post_json.

@ararslan
Copy link
Member

Help with this would be very much appreciated. This is causing significant issues with Nanosoldier, which is critical community infrastructure.

@KristofferC
Copy link
Collaborator

Do you know what postrequest that is failing? If it is an idempotent one then we could just run with retry_non_idempotent=true for that request?

@samoconnor
Copy link
Author

@KristofferC the request that is failing is create_status. As described above "it seems that double-posting would create two statuses", i.e. it is not idempotent.

I would like to help out, but I need to see more detailed logs of what is going on with Nanosoldier: JuliaWeb/HTTP.jl#220 (comment)
JuliaWeb/HTTP.jl#220 (comment).

@samoconnor
Copy link
Author

See log analysis: JuliaWeb/HTTP.jl#220 (comment)

To deal with the immediate practical problem in the Nanosoldier's use case I recommend modifying GitHub.jl to:

  • set retry_non_idempotent=true, restoring the old behaviour of blindly retrying POST requests; or
  • set idle_timeout=30 or idle_timeout=20 to remove/reduce the incidence of attempting to reuse a connection that has in-fact become unusable; or
  • (as a last resort) set reuse_limit=0 to disable connection re-use completely.

(To be reliable, GitHub.jl will still need to implement state-aware retry of non-idepmontent operations, but the changes above should improve failure rates in the short term).

vtjnash pushed a commit to JuliaWeb/HTTP.jl that referenced this issue Apr 17, 2018
When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106
quinnj pushed a commit to JuliaWeb/HTTP.jl that referenced this issue May 31, 2018
* ConnectionPool: monitor idle connections

When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106

*  - Encapsulate read|writebusy/sequence/count logic in new isbusy function.
 - Move close() on eof() || !isbusy() to new monitor_idle_connection function.
 - Make monitor_idle_connection() a noop for ::Connection{SSLContext}

* require Julia 0.6.3 #236 (comment)
@samoconnor
Copy link
Author

I think this probably should stay open per #106 (comment): "still need to implement state-aware retry of non-idepmontent operations"

@KristofferC KristofferC reopened this May 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants