Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: force promote for ops which ec base pool can't handle #5776

Merged
merged 1 commit into from Sep 12, 2015

Conversation

wonzhq
Copy link
Contributor

@wonzhq wonzhq commented Sep 2, 2015

For ops which the ec base pool can't handle, if they are proxied to the
base ec pool, ENOTSUPP is returned. Need to force promote the objects
into the cache pool.

Fixes: #12903

@ghost
Copy link

ghost commented Sep 2, 2015

Awesome (and really quick ;-)

./virtualenv/bin/teuthology-suite --priority 101 --suite rbd --filter='rbd/qemu/{cache/writethrough.yaml cachepool/ec-cache.yaml clusters/fixed-3.yaml fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}' --suite-branch master --distro ubuntu --ceph wonzhq-tmap-update --machine-type plana,burnupi,mira

should confirm it works assuming you have built wonzhq-tmap-update with this pull request in the gitbuilders.

@ghost ghost self-assigned this Sep 2, 2015
@ghost
Copy link

ghost commented Sep 2, 2015

http://pulpito.ceph.com/loic-2015-09-02_15:41:18-rbd-master---basic-multi/ is running the suite above on master to confirm it always fails and provide a simple way to reproduce the error.

@ghost
Copy link

ghost commented Sep 2, 2015

@wonzhq this should target infernalis and not master

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 2, 2015

@dachary though I can access sepia to run teuthology, I'm not able to create a branch in the ceph repo. I'll need to ask someone else to do it for me.

@ghost
Copy link

ghost commented Sep 2, 2015

@wonzhq I'll push the branch for you

@ghost
Copy link

ghost commented Sep 2, 2015

@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 2, 2015

Please hold on this. The commit only addresses the problem for v1 image. For v2 image, we need to play around the CEPH_OSD_OP_CALL op.

@ghost
Copy link

ghost commented Sep 2, 2015

@wonzhq I'll repush to the gitbuilders whenever you tell me to

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 2, 2015

@liewegas can you help to take a look if the fix makes sense? Thanks!

@jdurgin
Copy link
Member

jdurgin commented Sep 2, 2015

shouldn't this apply to all ops ec pools can't handle, i.e. everything that's not a class write, aligned append, or writefull?

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 3, 2015

@jdurgin you are right. For all ops which the ec base pool can't handle, we can't proxy them. Besides WRITE, ZERO and TRUNCATE, the others are SYNC_READ, MAPEXT, SPARSE_READ, CLONERANGE, TMAPGET, TMAPPUT, TMAPUP, OMAPSETVALS, OMAPSETVALS, OMAPSETHEADER, OMAPCLEAR, OMAPRMKEYS. I'll make the changes for them.

@wonzhq wonzhq changed the title osd: tmap update can't be proxied to an ec base tier osd: force promote for ops which ec base pool can't handle Sep 3, 2015
@ghost
Copy link

ghost commented Sep 3, 2015

@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 4, 2015

@dachary thanks! I've run the teuthology test for this in http://pulpito.ceph.com/yuan-2015-09-03_19:24:10-rbd-wonzhq-tmap-update---basic-multi/. It ran successfully.

@ghost
Copy link

ghost commented Sep 4, 2015

@jdurgin do you think it is good to merge ?

@ghost ghost assigned jdurgin and unassigned ghost Sep 4, 2015
@ghost ghost added this to the infernalis milestone Sep 4, 2015
iter->op.op == CEPH_OSD_OP_TMAPGET ||
iter->op.op == CEPH_OSD_OP_TMAPPUT ||
iter->op.op == CEPH_OSD_OP_TMAPUP) {
if (base_pool->require_rollback()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move the base_pool if so that it's the outter one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a safer approach would be to whitelist the things that the erasure pool can do (instead of blacklisting stuff it can't). And make a helper to determine that.. like pg_pool_t::support_osd_op(int opcode)? Similar to supports_omap()...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, actually a helper is probably overkill since this method is the only user, and it may need to take additional information into account (like the whole OSDOp).

I still think flipping this around to be a whitelist is an improvement, though!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, a whitelist would be easier to keep up to date. For appends I think we need to check the length with the EC pool's alignment requirement

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 6, 2015

@liewegas @jdurgin updated this to use a whitelist.

For append to an ec base pool, when the object is not in the cache tier, we don't know its size. Thus we are unable to determine if it's aligned or not. I think we have to promote this object anyway.

(iter->op.op != CEPH_OSD_OP_COPY_GET_CLASSIC) &&
(iter->op.op != CEPH_OSD_OP_COPY_GET) &&
(iter->op.op != CEPH_OSD_OP_COPY_FROM)) {
op->set_promote();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove

cache_* (can't use ec for cache pools)
notify, watch (no watch/notify on ec pool)

from the list

@liewegas
Copy link
Member

liewegas commented Sep 7, 2015

-29> 2015-09-07 00:46:37.840258 7fa36b07b700 5 -- op tracker -- seq: 13141, time: 2015-09-07 00:46:37.840192, event: throttled, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
-28> 2015-09-07 00:46:37.840265 7fa36b07b700 5 -- op tracker -- seq: 13141, time: 2015-09-07 00:46:37.840227, event: all_read, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
-27> 2015-09-07 00:46:37.840271 7fa36b07b700 5 -- op tracker -- seq: 13141, time: 0.000000, event: dispatched, op: osd_op(client.4547.0:3 0.obj [watch unwatch cookie 94197997405968] 174.db6ee63a ondisk+write+known_if_redirected e1013)
-26> 2015-09-07 00:46:37.840282 7fa36b07b700 20 osd.2 1016 should_share_map client.4547 10.214.136.114:0/1011485 1013
-25> 2015-09-07 00:46:37.840287 7fa36b07b700 20 osd.2 1016 client session last_sent_epoch: 0 versus osdmap epoch 1016
0> 2015-09-07 00:46:37.843087 7fa36b07b700 -1 *** Caught signal (Segmentation fault) **
in thread 7fa36b07b700
ceph version 9.0.3-1469-g2d7e480 (2d7e480)
1: (()+0x7ec80a) [0x55d2c7a0b80a]
2: (()+0x10340) [0x7fa372bc5340]
3: (OSD::init_op_flags(std::shared_ptr&)+0x125) [0x55d2c75374c5]
4: (OSD::handle_op(std::shared_ptr&, std::shared_ptr&)+0xbab) [0x55d2c7565c0b]
5: (OSD::dispatch_op_fast(std::shared_ptr&, std::shared_ptr&)+0x19e) [0x55d2c756695e]
6: (OSD::dispatch_session_waiting(OSD::Session, std::shared_ptr)+0x88) [0x55d2c7566c18]
7: (OSD::ms_fast_dispatch(Message)+0x234) [0x55d2c7566f64]
8: (AsyncConnection::process()+0x125a) [0x55d2c7bf3a0a]
9: (EventCenter::process_events(int)+0x41b) [0x55d2c7bb2d8b]
10: (Worker::entry()+0xd8) [0x55d2c7b92498]
11: (()+0x8182) [0x7fa372bbd182]
12: (clone()+0x6d) [0x7fa370f0447d]

That's the version prior to your last push, but I'm not sure the change would affect it crashing?

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 8, 2015

@liewegas is it possible osdmap->get_pg_pool returns NULL at this point?

For ops which the ec base pool can't handle, if they are proxied to the
base ec pool, ENOTSUPP is returned. Need to force promote the objects
into the cache pool.

Fixes: ceph#12903

Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 8, 2015

Looks like it's possible osdmap->get_pg_pool returns NULL in this function. I've changed the commit a little bit.

@ghost
Copy link

ghost commented Sep 8, 2015

@wonzhq wonzhq-tmap-update is building in http://ceph.com/gitbuilder.cgi . You should be able to run

teuthology-suite --priority 101 --suite rbd --filter='rbd/qemu/{cache/writethrough.yaml cachepool/ec-cache.yaml clusters/fixed-3.yaml fs/btrfs.yaml msgr-failures/few.yaml workloads/qemu_xfstests.yaml}' --suite-branch master --distro ubuntu --ceph wonzhq-tmap-update --machine-type plana,burnupi,mira

to confirm it does the right thing once it's finished.

@liewegas
Copy link
Member

liewegas commented Sep 8, 2015

On Mon, 7 Sep 2015, wonzhq wrote:

@liewegas is it possible osdmap->get_pg_pool returns NULL at this point?

Hmm, only if this is called before we do the osdmap epoch check in the
request and possibly put the op on wait_for_osdmap...

@wonzhq
Copy link
Contributor Author

wonzhq commented Sep 9, 2015

@jdurgin
Copy link
Member

jdurgin commented Sep 10, 2015

lgtm

liewegas added a commit that referenced this pull request Sep 12, 2015
osd: force promote for ops which ec base pool can't handle

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
@liewegas liewegas merged commit 2e44373 into ceph:master Sep 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants