Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/cloud: ResumingReader incorrectly forgets to call .Close() on its ReadCloser #75143

Closed
knz opened this issue Jan 19, 2022 · 2 comments · Fixed by #75149
Closed

pkg/cloud: ResumingReader incorrectly forgets to call .Close() on its ReadCloser #75143

knz opened this issue Jan 19, 2022 · 2 comments · Fixed by #75149
Assignees
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery

Comments

@knz
Copy link
Contributor

knz commented Jan 19, 2022

Describe the problem

Since go 1.17.6, it has become critical to call .Close() on the Body field of a http Response object, for otherwise we get goroutine leaks.
(see #74657, golang/go#50652 and #75136)

Thanks to #75136, we're now doing a pretty good job of doing so in most cases, except for the following:

  • cloud/httpsink/(*httpStorage).ReadFileAt calls openStreamAt which returns a http.Response
  • that Response's Body is then passed as an io.ReadCloser to cloud.NewResumingReader,
  • which, in turns does not call .Close() when the Reader is reset in Read(), when RetryOnErrFn returns true

(NB I've found it hard to read through the implementation of the methods of cloud.ReadResumer, it was expecting some kind of disciplined centralization of the invariants on the ReadCloser - maybe there's some improvement to make there)

cc @adityamaru for triage

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-disaster-recovery T-disaster-recovery labels Jan 19, 2022
@knz knz added this to Triage in Disaster Recovery Backlog via automation Jan 19, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jan 19, 2022

cc @cockroachdb/bulk-io

@adityamaru
Copy link
Contributor

Thanks for digging into this! I think we can add the Close() call before we nil it out over here https://github.com/cockroachdb/cockroach/blob/a9077697a77aa41f1cd09d3173616be37e85e1bf/pkg/cloud/cloud_io.go#L234-L234, I can send a PR.

@adityamaru adityamaru self-assigned this Jan 19, 2022
craig bot pushed a commit that referenced this issue Jan 19, 2022
74920: server: create new endpoint that return the roles of the sql user r=maryliag a=maryliag

The commit creates a new endpoint `/sqlroles` that returns
a list of roles of the SQL user logged in.

Partially addresses #74817

Release note (api change): Creation of new endpoint `/sqlroles` that
returns a list of the SQL roles for the SQL user logged in.

75088: sql: migrate has_sequence_privilege from evalPrivilegeCheck to ctx.Pl… r=otan a=ecwall

…anner.HasPrivilege

refs #66173

Migrate has_sequence_privilege from evalPrivilegeCheck to ctx.Planner.HasPrivilege.

Release note: None

75116: bazel: update comments in `BUILD.bazel`, include reference to `dev -h` r=irfansharif a=rickystewart

Most of this stuff is out-of-date at this point.

Release note: None

75145: sql: deflake TestTelemetry r=rytaft a=rytaft

This commit deflakes `TestTelemetry` by adding a more precise
`feature-allowlist`.

Fixes #75138

Release note: None

75149: cloud: close Reader before resetting in ResumingReader r=knz a=adityamaru

This change `Close()`s the Reader before resetting it when we
encounter a resumable error in the ResumingReader. This is particularly
important for the http external storage provide, since forgetting to
call Close() results in goroutine leaks from go1.17.6 onwards.

See: golang/go#50652

Fixes: #75143

Release note: None

Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
@craig craig bot closed this as completed in #75149 Jan 19, 2022
Disaster Recovery Backlog automation moved this from Triage to Done Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery
Development

Successfully merging a pull request may close this issue.

2 participants