Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jewel: osd: do not send ENXIO on misdirected op by default #13255

Merged
merged 1 commit into from Feb 22, 2017

Conversation

liewegas
Copy link
Member

@liewegas liewegas commented Feb 3, 2017

In practice this tends to get bubbled up the stack as an error on
the caller, and they usually do not handle it properly.  For example,
with librbd, this turns into EIO and break the VM.

Instead, this will manifest as a hung op on the client.  That is
also not ideal, but given that the root cause here is generally a
bug, it's not clear what else would be better.

We already log an error in the cluster log, so teuthology runs will
continue to fail.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 923e7f5)

# Conflicts:
#	PendingReleaseNotes
#	src/common/config_opts.h
@liewegas liewegas added this to the jewel milestone Feb 3, 2017
@ghost ghost added the core label Feb 13, 2017
@ghost ghost self-assigned this Feb 13, 2017
@smithfarm smithfarm changed the title osd: do not send ENXIO on misdirected op by default jewel: osd: do not send ENXIO on misdirected op by default Feb 13, 2017
@smithfarm
Copy link
Contributor

@jdurgin @liewegas This passed an upgrade/hammer-x run at http://tracker.ceph.com/issues/17851#note-82 and it also underwent a rados run at http://tracker.ceph.com/issues/17851#note-88 with the following results:

I re-ran the "4 other failures" with the following results:

  • 3 jobs passed
  • one failure reproduced from the previous run: "rados/verify/{1thrash/none.yaml clusters/{fixed-2.yaml openstack.yaml} fs/btrfs.yaml msgr-failures/few.yaml msgr/async.yaml rados.yaml tasks/rados_api_tests.yaml validater/lockdep.yaml}"
2017-02-22T15:26:29.695 INFO:tasks.workunit.client.0.smithi114.stdout:                 api_misc: test/librados/misc.cc:71: Failure
2017-02-22T15:26:29.695 INFO:tasks.workunit.client.0.smithi114.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2017-02-22T15:26:29.695 INFO:tasks.workunit.client.0.smithi114.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (43 ms)

Log of the reproducibly failed test: http://qa-proxy.ceph.com/teuthology/smithfarm-2017-02-22_15:06:17-rados-pr-13255-distro-basic-smithi/847000/teuthology.log

Do you think the above test results are sufficient to merge this PR?

@jdurgin
Copy link
Member

jdurgin commented Feb 22, 2017

the rados_api_test failure is a known issue - http://tracker.ceph.com/issues/15368

The rest are unrelated to this change, so I think this is ready to merge

@smithfarm smithfarm merged commit 9f36610 into ceph:jewel Feb 22, 2017
We now only send the ENXIO reply if the osd_enxio_on_misdirected_op option
is enabled (it's off by default). This means that a VM using librbd that
previously would have gotten an EIO and gone read-only will now see a
blocked/hung IO instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas This didn't make it into the 10.2.6 release notes, so it can be deleted now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean... if the patch is in 10.2.6 then we should add this to the 10.2.6 release notes. (It's a bit confusing because these pending notes live in the jewel etc branches but the notes are all in master :/ )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you send it in the release notes, I just merged it in master before announcing the release

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's in 10.2.6 - @theanalyst Can we squeeze this in still?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lets do it, I'll update the blog, can send an addition in mail too if we want

@liewegas liewegas deleted the wip-enxio-jewel branch March 8, 2017 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants