
osd: continue recovery optimization for overwrite Ops #19569

Closed
wants to merge 5 commits

Conversation

@qiuming-best (Author) commented Dec 18, 2017

Signed-off-by: qiuming <qiuming@unitedstack.com>

@ZVampirEM77 (Contributor)

@redenval I think it is better to rebase the commits from #7325 onto the current master branch.

@tchaikov tchaikov added the core label Dec 19, 2017
@qiuming-best qiuming-best deleted the wip-parial-recovery branch December 22, 2017 11:27
@qiuming-best qiuming-best reopened this Dec 23, 2017
@qiuming-best qiuming-best force-pushed the wip-parial-recovery branch 2 times, most recently from 8248d36 to 5b38058 Compare December 23, 2017 07:07
@qiuming-best (Author)

@jdurgin Could you review it for me? Thanks.

@qiuming-best (Author)

ping @jdurgin

qiuming-best and others added 3 commits January 15, 2018 19:29
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
Signed-off-by: qiuming <qiuming@unitedstack.com>
@jdurgin (Member) commented Jan 16, 2018

@redenval this looks quite good - I need some more time to look closely at the encoding changes, but I think this will be a very useful addition to ceph - please re-open

@qiuming-best (Author)

@jdurgin OK, I've reopened it; please review it. Thanks!

@qiuming-best (Author)

ping @jdurgin @liewegas

@jdurgin (Member) left a comment

Sorry it's taken so long to get back to this. I finally had a close look, and it's in pretty good shape.

Has this been through any rados suites via teuthology yet?


void ObjectCleanRegions::encode(bufferlist &bl) const
{
using ceph::encode;
jdurgin (Member):

should add versioning to this encoding - I expect we'll want to change it in the future, e.g. adding omap ranges too
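For reference, Ceph structures normally get versioned encodings via the ENCODE_START/ENCODE_FINISH macros. A minimal sketch of what that could look like here, assuming illustrative member names (clean_offsets and clean_omap are placeholders, not necessarily this PR's exact fields):

void ObjectCleanRegions::encode(bufferlist &bl) const
{
  ENCODE_START(1, 1, bl);      // struct_v = 1, compat = 1; bump struct_v when e.g. omap ranges are added
  encode(clean_offsets, bl);   // placeholder members, for illustration only
  encode(clean_omap, bl);
  ENCODE_FINISH(bl);
}

void ObjectCleanRegions::decode(bufferlist::iterator &bl)
{
  DECODE_START(1, bl);
  decode(clean_offsets, bl);
  decode(clean_omap, bl);
  DECODE_FINISH(bl);
}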

@@ -1370,20 +1370,23 @@ struct PGLog : DoutPrefixProvider {

set<hobject_t> did;
set<hobject_t> checked;
set<hobject_t> del;
jdurgin (Member):

del is never read

@@ -164,7 +164,7 @@ DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap
DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap
DEFINE_CEPH_FEATURE(60, 1, OSD_RECOVERY_DELETES) // *do not share this bit*

DEFINE_CEPH_FEATURE(61, 1, RESERVED2) // unused, but slow down!
DEFINE_CEPH_FEATURE(61, 1, OSD_PARTIAL_RECOVERY) // unused, but slow down!
jdurgin (Member):

since we only need to check peer features, we can just use SERVER_MIMIC instead of adding a new feature bit
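For illustration, the check could then use the existing HAVE_FEATURE macro instead of a new bit; a rough sketch (the helper function and the peer_features variable are assumptions, not code from this PR):

bool peers_support_partial_recovery(uint64_t peer_features)
{
  // all peers must be mimic or later before sending the new encoding
  return HAVE_FEATURE(peer_features, SERVER_MIMIC);
}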

} else if (is_missing_divergent_item) {
missing_it->second = item(e.version, eversion_t(), e.is_delete(), false, false); // .have = nil
(missing_it->second).need = e.version;
(missing_it->second).have = eversion_t();
jdurgin (Member):

need and have are already set by the constructor
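In other words, the hunk could collapse to the single constructor call; a sketch of the suggested simplification:

} else if (is_missing_divergent_item) {
  // item's constructor already sets need = e.version and have = eversion_t(),
  // so the explicit assignments are redundant
  missing_it->second = item(e.version, eversion_t(), e.is_delete(), false, false);
}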

src/osd/ReplicatedBackend.cc (outdated review comment, resolved)
}
else {
// If omap is not changed, we need recovery omap when recovery cannot be completed once
if (progress.first && progress.omap_complete)
jdurgin (Member):

nit: this if can be folded into the else as an 'else if', and 'else {' should be on the same line as the closing '}'
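That is, something shaped like this (a sketch of the suggested style fix, not the PR's final code):

} else if (progress.first && progress.omap_complete) {
  // If omap is not changed, we still need to recover omap when
  // recovery cannot be completed in a single PushOp
  ...
}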

@@ -7365,6 +7373,8 @@ inline int PrimaryLogPG::_delete_oid(
interval_set<uint64_t> ch;
ch.insert(0, oi.size);
ctx->modified_ranges.union_of(ch);
ctx->clean_regions.mark_data_region_dirty(0, oi.size);
ctx->clean_regions.mark_omap_dirty();
jdurgin (Member):

could use a comment here about why xattrs aren't mentioned.
if I understand correctly, this handles xattrs correctly because during recovery we either 1) delete the object (if there are no subsequent writes) or 2) when recovering subsequent writes to the object, always remove and then copy all the xattrs
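A comment along those lines might read as follows (a sketch built from the reviewer's own explanation above):

ctx->clean_regions.mark_data_region_dirty(0, oi.size);
ctx->clean_regions.mark_omap_dirty();
// xattrs need no dirty-tracking here: during recovery we either
// delete the object (if there are no subsequent writes), or, when
// recovering subsequent writes, remove and re-copy all xattrs anyway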

Contributor reply:

yeah, it is.

@@ -3968,6 +4083,11 @@ void pg_log_entry_t::dump(Formatter *f) const
mod_desc.dump(f);
f->close_section();
}
{
f->open_object_section("clean_reginos");
jdurgin (Member):

typo in 'regions'

what we need to consider is how to deal with the object data;
object data is made up of omap_header, xattrs, omap, and data:

case 1 -- first && complete: since object recovering is finished in a single PushOp,
jdurgin (Member):

can you update this based on the current implementation? in particular there's no cloning of omap anymore, I'm not sure if everything else is still the same

cr1_expect.insert(204800, 8192);
ASSERT_TRUE(cr1_expect.subset_of(cr1.get_dirty_regions()));
ASSERT_TRUE(cr1.omap_is_dirty());
}

TEST(pg_missing_t, constructor)
jdurgin (Member):

could add some PGLog tests for read_log_and_missing that include ObjectCleanRegions, also for pg_missing_t's add_next_event
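For instance, an encode/decode round trip could be covered roughly like this (assuming the versioned encode/decode discussed above; similar tests could then drive pg_missing_t::add_next_event and PGLog::read_log_and_missing with entries carrying ObjectCleanRegions):

TEST(ObjectCleanRegions, encode_decode_roundtrip)
{
  ObjectCleanRegions cr;
  cr.mark_data_region_dirty(4096, 8192);
  cr.mark_omap_dirty();

  bufferlist bl;
  cr.encode(bl);

  ObjectCleanRegions cr2;
  auto it = bl.begin();
  cr2.decode(it);   // decode signature assumed to mirror encode

  ASSERT_TRUE(cr2.omap_is_dirty());
  ASSERT_TRUE(cr.get_dirty_regions().subset_of(cr2.get_dirty_regions()));
}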

@mslovy (Contributor) commented Mar 23, 2018

Also, @jdurgin, can you give some guidance on how to run the rados suites via teuthology? We find it hard to prepare the teuthology environment and make it work properly.

@jdurgin (Member) commented Mar 27, 2018

@mslovy Yes I can help with teuthology. I'm guessing you've already seen http://docs.ceph.com/teuthology/docs/LAB_SETUP.html - what problems are you seeing?

@mslovy (Contributor) commented Mar 29, 2018

Yeah, I have several questions:

  1. Test nodes can be any virtual machines or bare-metal hosts that have several disks and can have ceph deployed on them, right?
  2. How does the teuthology node know about these test nodes? I think we have to tell the teuthology node where they are, right? Does the SUBMITTING NODES step register the test nodes with teuthology?
  3. Finally, what kind of tasks should we run from the qa suite, and is there a minimum number of test nodes needed to run them?

@jdurgin (Member) commented Mar 30, 2018

  1. Test nodes can be virtual or bare metal, ceph-cm-ansible is run on them to install appropriate packages and do other setup during a test run

  2. paddles uses a postgres database to store which nodes exist. That 'SUBMITTING NODES' step inserts the nodes into that database. Alternatively, if you have an OpenStack cloud you can point teuthology at that: http://docs.ceph.com/teuthology/docs/openstack_backend.html#openstack-backend

  3. run a subset of the rados suite - to expedite this as you're looking at standing up teuthology, I've built packages for your branch https://shaman.ceph.com/builds/ceph/wip-partial-recovery-2018-03-29/ and started a run via:

~/teuthology/virtualenv/bin/teuthology-suite -v -c wip-partial-recovery-2018-03-29 -s rados -m smithi -k distro -e jdurgin@redhat.com --owner joshd@sietchtabr -p 90 --subset 8/2000

You can see the results here:
http://pulpito.ceph.com/joshd-2018-03-29_23:44:12-rados-wip-partial-recovery-2018-03-29-distro-basic-smithi/

@mslovy (Contributor) commented Mar 30, 2018

@jdurgin, thanks. I've got the whole picture of the teuthology framework now and will try it out myself.

When I follow the guide at http://docs.ceph.com/teuthology/docs/LAB_SETUP.html, I run wget -O ~/bin/worker_start https://raw.githubusercontent.com/ceph/teuthology/master/docs/_static/worker_start.sh and use the worker_start.sh script to bring up the worker. It clones the ceph-ci.git repository and uses the master branch from it. However, I find that ceph-ci.git no longer has a master branch; only a previous-master branch exists.

Traceback (most recent call last):
File "/home/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 11, in
load_entry_point('teuthology', 'console_scripts', 'teuthology-worker')()
File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
teuthology.worker.main(parse_args())
File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 83, in main
fetch_qa_suite('master')
File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 310, in fetch_qa_suite
branch, lock=lock)
File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 265, in fetch_repo
enforce_repo_state(url, dest_path, branch)
File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 84, in enforce_repo_state
clone_repo(repo_url, dest_path, branch)
File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 133, in clone_repo
raise BranchNotFoundError(branch, repo_url)
teuthology.exceptions.BranchNotFoundError: Branch 'master' not found in repo: https://github.com/ceph/ceph-ci.git!

This is the traceback. How should I deal with it? Can I use a different branch of ceph-ci.git?

@jdurgin (Member) commented Apr 3, 2018

@mslovy I think you need to add these settings (possibly with https github urls, or your own mirror) in ~/.teuthology.yaml of the user running worker_start:

ceph_git_base_url: git://git.ceph.com/git/
ceph_git_url: git://git.ceph.com/ceph.git
ceph_qa_suite_git_url: git://git.ceph.com/ceph.git

This changed since the docs were last updated: in the past, ceph-qa-suite was a separate repository, but now that it lives under ceph.git/qa, commands like this default to github.com/ceph/ceph-ci, which is where test branches are built but which does not contain master or the stable branches.

The run I scheduled shows many failures. You can find full osd logs for each from the web ui. One with a crash in osd.1, for example, is here:

http://qa-proxy.ceph.com/teuthology/joshd-2018-03-29_23:44:12-rados-wip-partial-recovery-2018-03-29-distro-basic-smithi/2334365/

From remote/smithi007/log/ceph-osd.1.log.gz:

     0> 2018-03-30 00:27:35.252 7f1d9bdb9700 -1 /build/ceph-13.0.1-1049-gfe05f7a/src/osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::submit_push_data(const ObjectRecoveryInfo&, bool, bool, bool, bool, interval_set<long unsigned int>&, const interval_set<long unsigned int>&, ceph::bufferlist, ceph::bufferlist, const std::map<std::__cxx11::basic_string<char>, ceph::buffer::list>&, const std::map<std::__cxx11::basic_string<char>, ceph::buffer::list>&, ObjectStore::Transaction*)' thread 7f1d9bdb9700 time 2018-03-30 00:27:35.254313
/build/ceph-13.0.1-1049-gfe05f7a/src/osd/ReplicatedBackend.cc: 1638: FAILED assert(r == 0)

 ceph version 13.0.1-1049-gfe05f7a (fe05f7a9daa6a481a6f7f2303693c8bdbf1de5a6) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f1db86fc6c5]
 2: (ReplicatedBackend::submit_push_data(ObjectRecoveryInfo const&, bool, bool, bool, bool, interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >&, interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&, ceph::buffer::list, ceph::buffer::list, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const&, ObjectStore::Transaction*)+0x1486) [0x56096146e9a6]
 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x20b) [0x56096146ecdb]
 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x124) [0x56096146eff4]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x278) [0x560961475358]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x560961382de7]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x675) [0x56096133ada5]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x341) [0x56096118f061]
 9: (PGOpItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5609614061b2]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xdfd) [0x56096119676d]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4f2) [0x7f1db87025e2]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f1db8704940]
 13: (()+0x76ba) [0x7f1db71d26ba]
 14: (clone()+0x6d) [0x7f1db69fb41d]

@qiuming-best qiuming-best force-pushed the wip-parial-recovery branch 2 times, most recently from 16b37ba to 0345a8a Compare April 8, 2018 14:25
Signed-off-by: qiuming <qiuming@unitedstack.com>