-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osd: continue recovery optimization for overwrite Ops #19569
Conversation
8248d36
to
5b38058
Compare
@jdurgin Could you review it for me ? thanks |
ping @jdurgin |
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
@redenval this looks quite good - I need some more time to look closely at the encoding changes, but I think this will be a very useful addition to ceph - please re-open |
db4b3a5
to
fe05f7a
Compare
@jdurgin OK, I've reopen it, please review it, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's taken so long to get back to this. I finally had a close look, and it's in pretty good shape.
Has this been through any rados suites via teuthology yet?
src/osd/osd_types.cc
Outdated
|
||
void ObjectCleanRegions::encode(bufferlist &bl) const | ||
{ | ||
using ceph::encode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add versioning to this encoding - I expect we'll want to change it in the future, e.g. adding omap ranges too
@@ -1370,20 +1370,23 @@ struct PGLog : DoutPrefixProvider { | |||
|
|||
set<hobject_t> did; | |||
set<hobject_t> checked; | |||
set<hobject_t> del; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
del is never read
src/include/ceph_features.h
Outdated
@@ -164,7 +164,7 @@ DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap | |||
DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap | |||
DEFINE_CEPH_FEATURE(60, 1, OSD_RECOVERY_DELETES) // *do not share this bit* | |||
|
|||
DEFINE_CEPH_FEATURE(61, 1, RESERVED2) // unused, but slow down! | |||
DEFINE_CEPH_FEATURE(61, 1, OSD_PARTIAL_RECOVERY) // unused, but slow down! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we only need to check peer features, we can just use SERVER_MIMIC instead of adding a new feature bit
src/osd/osd_types.h
Outdated
} else if (is_missing_divergent_item) { | ||
missing_it->second = item(e.version, eversion_t(), e.is_delete(), false, false); // .have = nil | ||
(missing_it->second).need = e.version; | ||
(missing_it->second).have = eversion_t(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need and have are already set by the constructor
src/osd/ReplicatedBackend.cc
Outdated
} | ||
else { | ||
// If omap is not changed, we need recovery omap when recovery cannot be completed once | ||
if (progress.first && progress.omap_complete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if can be combined into else, 'else {' should be on same line as '}'
@@ -7365,6 +7373,8 @@ inline int PrimaryLogPG::_delete_oid( | |||
interval_set<uint64_t> ch; | |||
ch.insert(0, oi.size); | |||
ctx->modified_ranges.union_of(ch); | |||
ctx->clean_regions.mark_data_region_dirty(0, oi.size); | |||
ctx->clean_regions.mark_omap_dirty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a comment here about why xattrs aren't mentioned.
if I understand correctly, this handles xattrs correctly because during recovery we either 1) delete the object (if there are no subsequent writes) or 2) when recovering subsequent writes to the object, always remove and then copy all the xattrs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it is.
src/osd/osd_types.cc
Outdated
@@ -3968,6 +4083,11 @@ void pg_log_entry_t::dump(Formatter *f) const | |||
mod_desc.dump(f); | |||
f->close_section(); | |||
} | |||
{ | |||
f->open_object_section("clean_reginos"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in 'regions'
what we need to consider is how to deal with the object data, | ||
object data makes up of omap_header, xattrs, omap, data: | ||
|
||
case 1 -- first && complete: since object recovering is finished in a single PushOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this based on the current implementation? in particular there's no cloning of omap anymore, I'm not sure if everything else is still the same
cr1_expect.insert(204800, 8192); | ||
ASSERT_TRUE(cr1_expect.subset_of(cr1.get_dirty_regions())); | ||
ASSERT_TRUE(cr1.omap_is_dirty()); | ||
} | ||
|
||
TEST(pg_missing_t, constructor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could add some PGLog tests for read_log_and_missing that include ObjectCleanRegions, also for pg_missing_t's add_next_event
Also, @jdurgin can you guide how to run rados suites via teuthology. we find it hard to prepare the teuthology env and make it work properly. |
@mslovy Yes I can help with teuthology. I'm guessing you've already seen http://docs.ceph.com/teuthology/docs/LAB_SETUP.html - what problems are you seeing? |
yeah, I have several questions here:
|
You can see the results here: |
@jdurgin , thanks. got whole picture of the teuthology framework and I will try it by myself. when I follow the guide in http://docs.ceph.com/teuthology/docs/LAB_SETUP.html , wget -O ~/bin/worker_start https://raw.githubusercontent.com/ceph/teuthology/master/docs/_static/worker_start.sh, and use the worker_start.sh script to bring up the worker. It will clone the ceph-ci.git repository and use the master branch in it. However, I can find ceph-ci.git now has a master branch, and only previous-master branch exist. Traceback (most recent call last): This is the traceback, and how could I deal with it, can I use a different branch in ceph-ci.git? |
@mslovy I think you need to add these settings (possibly with https github urls, or your own mirror) in ~/.teuthology.yaml of the user running worker_start:
This changed since the docs were last updated - in the past ceph-qa-suite was a separate repository, but now that it is under ceph.git/qa, the defaults for some commands like this are to use github.com/ceph/ceph-ci, which is where test branches are built, but which does not contain the stable branches or master. The run I scheduled shows many failures. You can find full osd logs for each from the web ui. One with a crash in osd.1, for example, is here: From remote/smithi007/log/ceph-osd.1.log.gz:
|
16b37ba
to
0345a8a
Compare
Signed-off-by: qiuming <qiuming@unitedstack.com>
0345a8a
to
e98c6b2
Compare
Signed-off-by: qiuming qiuming@unitedstack.com