rgw: fix boost::asio::async_write() does not return error... #35904
I'm not knowledgeable enough to say this works, but it looks very plausible; I take it this prevents further calls on the connection?
sorry, didn't mean to approve before getting feedback
src/rgw/rgw_asio_frontend.cc
Outdated
| @@ -69,6 +69,9 @@ class StreamIO : public rgw::asio::ClientIO { | |||
| yield[ec]); | |||
| if (ec) { | |||
| ldout(cct, 4) << "write_data failed: " << ec.message() << dendl; | |||
| if (ec==boost::asio::error::broken_pipe) { | |||
| stream.lowest_layer().shutdown(tcp::socket::shutdown_both, ec); | |||
we're throwing ec as a rgw::io::Exception just below, but this call to shutdown() will mutate ec. maybe use a separate variable boost::system::error_code ec_ignored for the call to shutdown()?
the intention was to override ec to indicate that the specific condition has occurred
for example, the error printed to the log at
Line 292 in 4425f3e
| dout(0) << "ERROR: client_io->complete_request() returned " |
changes from
7fb47eedc700 0 ERROR: client_io->complete_request() returned Broken pipe
to
7fc0e14da700 0 ERROR: client_io->complete_request() returned Transport endpoint is not connected
it is not essential, so i am changing it to ec_ignored
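A minimal sketch of the point discussed in this thread, assuming a hypothetical `FakeSocket` and `failing_write()` as stand-ins (they are not the Boost.Asio API, and `std::error_code` is used in place of `boost::system::error_code`). It shows that passing the same `ec` to the shutdown call replaces the original broken-pipe error, while a separate `ec_ignored` preserves it for the exception thrown afterwards.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical stand-in for a stream socket whose peer has already closed
// the connection. This is NOT the Boost.Asio socket type.
struct FakeSocket {
  bool shut_down = false;

  // Mimics an error_code-reporting shutdown: because the peer is gone,
  // it reports "not connected" through the supplied error code.
  void shutdown(std::error_code& ec) {
    shut_down = true;
    ec = std::make_error_code(std::errc::not_connected);
  }
};

// Hypothetical stand-in for a write that fails with a broken pipe.
inline void failing_write(std::error_code& ec) {
  ec = std::make_error_code(std::errc::broken_pipe);
}

// Variant that reuses ec: the shutdown result overwrites the write error.
inline std::error_code write_data_reusing_ec(FakeSocket& sock) {
  std::error_code ec;
  failing_write(ec);
  if (ec == std::errc::broken_pipe) {
    sock.shutdown(ec);  // ec no longer holds broken_pipe after this call
  }
  return ec;  // the caller would throw this
}

// Variant suggested in review: a separate variable keeps ec intact.
inline std::error_code write_data(FakeSocket& sock) {
  std::error_code ec;
  failing_write(ec);
  if (ec == std::errc::broken_pipe) {
    std::error_code ec_ignored;
    sock.shutdown(ec_ignored);  // result deliberately discarded
  }
  return ec;  // still broken_pipe
}
```

With the first variant the caller sees the "not connected" error, which corresponds to the second log line quoted above; with the second variant the caller sees the original broken pipe.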
have you narrowed down exactly which parts of the RGWListBuckets op were making additional calls to write after errors were returned? it sounds like we're missing some error handling, and i'd like to address that too if we can
be5e45a
to
f7d43f3
The flow, starting from process_request(...), is as follows: Line 169 in 4425f3e
the request is submitted at: Line 235 in 4425f3e
the request is completed at: Lines 288 to 290 in 4425f3e
and the exception is caught at: Line 291 in 4425f3e
src/rgw/rgw_process.cc
Outdated
| @@ -291,6 +291,8 @@ int process_request(rgw::sal::RGWRadosStore* const store, | |||
| } catch (rgw::io::Exception& e) { | |||
| dout(0) << "ERROR: client_io->complete_request() returned " | |||
| << e.what() << dendl; | |||
| perfcounter->inc(l_rgw_qlen, -1); | |||
| perfcounter->inc(l_rgw_qactive, -1); | |||
can we address this issue in a separate pr? the counters should be encapsulated by the frontends, and shouldn't leak into process_request()
Done & working on handling it at the frontend level.
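One way to keep the counters out of process_request() is a scope guard owned by the frontend. The sketch below is only an illustration under stated assumptions: `Counters`, `RequestGuard`, and `handle_request()` are hypothetical names, plain integers stand in for `l_rgw_qlen` and `l_rgw_qactive`, and how the real frontends maintain these counters is not shown in this thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counters standing in for l_rgw_qlen and l_rgw_qactive.
struct Counters {
  int qlen = 0;
  int qactive = 0;
};

// Increments both counters on construction and decrements them on
// destruction, so they are restored even when an exception unwinds the stack.
class RequestGuard {
 public:
  explicit RequestGuard(Counters& c) : c_(c) {
    ++c_.qlen;
    ++c_.qactive;
  }
  ~RequestGuard() {
    --c_.qlen;
    --c_.qactive;
  }
  RequestGuard(const RequestGuard&) = delete;
  RequestGuard& operator=(const RequestGuard&) = delete;

 private:
  Counters& c_;
};

// Hypothetical frontend entry point. `fail` simulates the request throwing,
// as complete_request() does when the write fails.
inline bool handle_request(Counters& c, bool fail) {
  try {
    RequestGuard guard(c);
    if (fail) {
      throw std::runtime_error("Broken pipe");
    }
    return true;
  } catch (const std::exception&) {
    // The guard's destructor has already run by the time we get here.
    return false;
  }
}
```

Because the decrement lives in a destructor, neither the success path nor the catch block needs an explicit perfcounter call.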
hi @mkogan1, has the pr that fixes perfcounter->inc(l_rgw_qlen, -1); already been submitted? i searched and found no result for it. if it has not been submitted, i would like to submit a new pr for it
f7d43f3
to
e02aff0
although remote has closed the connection
Fixes: https://tracker.ceph.com/issues/46332
Signed-off-by: Mark Kogan <mkogan@redhat.com>
e02aff0
to
c997eb6