osd: force promote for ops which ec base pool can't handle #5776
Conversation
Awesome (and really quick ;-). ./virtualenv/bin/teuthology-suite --priority 101 --suite rbd --filter='rbd/qemu/{cache/writethrough.yaml cachepool/ec-cache.yaml clusters/fixed-3.yaml fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}' --suite-branch master --distro ubuntu --ceph wonzhq-tmap-update --machine-type plana,burnupi,mira should confirm it works, assuming you have built wonzhq-tmap-update with this pull request in the gitbuilders.
http://pulpito.ceph.com/loic-2015-09-02_15:41:18-rbd-master---basic-multi/ is running the suite above on master to confirm it always fails and to provide a simple way to reproduce the error.
@wonzhq this should target infernalis, not master.
@dachary though I can access sepia to run teuthology, I'm not able to create a branch in the ceph repo. I'll need to ask someone else to do it for me.
@wonzhq I'll push the branch for you
@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi
Please hold on this. The commit only addresses the problem for v1 images. For v2 images, we need to play around the
@wonzhq I'll repush to the gitbuilders whenever you tell me to
@liewegas can you take a look and check whether the fix makes sense? Thanks!
shouldn't this apply to all ops ec pools can't handle, i.e. everything that's not a class write, aligned append, or writefull?
@jdurgin you are right. For all ops which the ec base pool can't handle, we can't proxy them. Besides WRITE, ZERO and TRUNCATE, the others are SYNC_READ, MAPEXT, SPARSE_READ, CLONERANGE, TMAPGET, TMAPPUT, TMAPUP, OMAPSETVALS, OMAPSETHEADER, OMAPCLEAR, OMAPRMKEYS. I'll make the changes for them.
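The force-promote check being described can be sketched as a small predicate over the op codes listed above. This is an illustrative mock, not Ceph's actual code: the enum values and the helper name are hypothetical stand-ins for the real CEPH_OSD_OP_* constants and the cache-tier decision logic in the OSD.

```cpp
// Hypothetical stand-ins for the real CEPH_OSD_OP_* constants.
enum OsdOp {
  OP_READ, OP_APPEND, OP_WRITEFULL,
  OP_WRITE, OP_ZERO, OP_TRUNCATE,
  OP_SYNC_READ, OP_MAPEXT, OP_SPARSE_READ, OP_CLONERANGE,
  OP_TMAPGET, OP_TMAPPUT, OP_TMAPUP,
  OP_OMAPSETVALS, OP_OMAPSETHEADER, OP_OMAPCLEAR, OP_OMAPRMKEYS,
};

// Returns true when an op cannot be proxied to an EC (rollback-requiring)
// base pool and must instead force a promote into the cache tier.
bool must_promote_for_ec_base(OsdOp op, bool base_requires_rollback) {
  if (!base_requires_rollback)
    return false;  // a replicated base pool can handle any proxied op
  switch (op) {
  case OP_WRITE: case OP_ZERO: case OP_TRUNCATE:
  case OP_SYNC_READ: case OP_MAPEXT: case OP_SPARSE_READ: case OP_CLONERANGE:
  case OP_TMAPGET: case OP_TMAPPUT: case OP_TMAPUP:
  case OP_OMAPSETVALS: case OP_OMAPSETHEADER: case OP_OMAPCLEAR: case OP_OMAPRMKEYS:
    return true;   // EC base pool cannot execute these; promote first
  default:
    return false;  // e.g. plain reads can be proxied
  }
}
```

As the later review comments note, a blacklist like this is fragile; flipping it into a whitelist of ops the EC pool does support would be easier to keep up to date.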
@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi
@dachary thanks! I've run the teuthology test for this in http://pulpito.ceph.com/yuan-2015-09-03_19:24:10-rbd-wonzhq-tmap-update---basic-multi/. It ran successfully.
@jdurgin do you think it is good to merge?
iter->op.op == CEPH_OSD_OP_TMAPGET ||
iter->op.op == CEPH_OSD_OP_TMAPPUT ||
iter->op.op == CEPH_OSD_OP_TMAPUP) {
if (base_pool->require_rollback()) { |
let's move the base_pool if so that it's the outer one?
I think a safer approach would be to whitelist the things that the erasure pool can do (instead of blacklisting stuff it can't). And make a helper to determine that.. like pg_pool_t::support_osd_op(int opcode)? Similar to supports_omap()...
Eh, actually a helper is probably overkill since this method is the only user, and it may need to take additional information into account (like the whole OSDOp).
I still think flipping this around to be a whitelist is an improvement, though!
yeah, a whitelist would be easier to keep up to date. For appends I think we need to check the length against the EC pool's alignment requirement.
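The whitelist approach suggested in this review thread could look roughly like the sketch below. Everything here is illustrative, not Ceph's real API: the op enum, the helper name ec_pool_supports_op, and the stripe_width parameter are hypothetical stand-ins for something like the proposed pg_pool_t::support_osd_op(int opcode), extended to take the append length into account.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real CEPH_OSD_OP_* constants.
enum EcOp { EC_READ, EC_STAT, EC_DELETE, EC_APPEND, EC_WRITEFULL, EC_WRITE, EC_TMAPUP };

// Whitelist: can an EC base pool execute this op directly (without a
// promote into the cache tier)? 'len' only matters for appends, which
// must land on the EC pool's stripe alignment.
bool ec_pool_supports_op(EcOp op, uint64_t len, uint64_t stripe_width) {
  switch (op) {
  case EC_READ:
  case EC_STAT:
  case EC_DELETE:
  case EC_WRITEFULL:   // full-object overwrite is re-encodable as a whole
    return true;
  case EC_APPEND:      // only aligned appends are acceptable
    return stripe_width != 0 && len % stripe_width == 0;
  default:
    return false;      // overwrites, tmap/omap, etc. need a promote
  }
}
```

Compared with the blacklist in the patch, new op codes added later default to "unsupported" here, which fails safe: the worst case is an unnecessary promote rather than an ENOTSUPP from the EC pool.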
(iter->op.op != CEPH_OSD_OP_COPY_GET_CLASSIC) &&
(iter->op.op != CEPH_OSD_OP_COPY_GET) &&
(iter->op.op != CEPH_OSD_OP_COPY_FROM)) {
op->set_promote(); |
we can remove
- cache_* (can't use ec for cache pools)
- notify, watch (no watch/notify on ec pool)
from the list
   -29> 2015-09-07 00:46:37.840258 7fa36b07b700  5 -- op tracker -- seq: 13141, time: 2015-09-07 00:46:37.840192, event: throttled, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
   -28> 2015-09-07 00:46:37.840265 7fa36b07b700  5 -- op tracker -- seq: 13141, time: 2015-09-07 00:46:37.840227, event: all_read, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
   -27> 2015-09-07 00:46:37.840271 7fa36b07b700  5 -- op tracker -- seq: 13141, time: 0.000000, event: dispatched, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
   -26> 2015-09-07 00:46:37.840282 7fa36b07b700 20 osd.2 1016 should_share_map client.4547 10.214.136.114:0/1011485 1013
   -25> 2015-09-07 00:46:37.840287 7fa36b07b700 20 osd.2 1016 client session last_sent_epoch: 0 versus osdmap epoch 1016
     0> 2015-09-07 00:46:37.843087 7fa36b07b700 -1 *** Caught signal (Segmentation fault) ** in thread 7fa36b07b700
 ceph version 9.0.3-1469-g2d7e480 (2d7e480)
 1: (()+0x7ec80a) [0x55d2c7a0b80a]
 2: (()+0x10340) [0x7fa372bc5340]
 3: (OSD::init_op_flags(std::shared_ptr&)+0x125) [0x55d2c75374c5]
 4: (OSD::handle_op(std::shared_ptr&, std::shared_ptr&)+0xbab) [0x55d2c7565c0b]
 5: (OSD::dispatch_op_fast(std::shared_ptr&, std::shared_ptr&)+0x19e) [0x55d2c756695e]
 6: (OSD::dispatch_session_waiting(OSD::Session, std::shared_ptr)+0x88) [0x55d2c7566c18]
 7: (OSD::ms_fast_dispatch(Message)+0x234) [0x55d2c7566f64]
 8: (AsyncConnection::process()+0x125a) [0x55d2c7bf3a0a]
 9: (EventCenter::process_events(int)+0x41b) [0x55d2c7bb2d8b]
 10: (Worker::entry()+0xd8) [0x55d2c7b92498]
 11: (()+0x8182) [0x7fa372bbd182]
 12: (clone()+0x6d) [0x7fa370f0447d]
That's the version prior to your last push, but I'm not sure the change would affect it crashing?
@liewegas is it possible
For ops which the ec base pool can't handle, if they are proxied to the base ec pool, ENOTSUPP is returned. Need to force promote the objects into the cache pool.

Fixes: ceph#12903
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
Looks like it's possible
@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi . You should be able to run teuthology-suite --priority 101 --suite rbd --filter='rbd/qemu/{cache/writethrough.yaml cachepool/ec-cache.yaml clusters/fixed-3.yaml fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}' --suite-branch master --distro ubuntu --ceph wonzhq-tmap-update --machine-type plana,burnupi,mira to confirm it does the right thing once it's finished.
On Mon, 7 Sep 2015, wonzhq wrote:
Hmm, only if this is called before we do the osdmap epoch check in the
The rbd test case ran successfully at http://pulpito.ceph.com/yuan-2015-09-08_19:06:22-rbd-wonzhq-tmap-update---basic-multi/ |
lgtm |
osd: force promote for ops which ec base pool can't handle

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
For ops which the ec base pool can't handle, if they are proxied to the
base ec pool, ENOTSUPP is returned. Need to force promote the objects
into the cache pool.
Fixes: #12903