New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: support export pinning on directories #14598

Merged
merged 38 commits into from May 11, 2017

Conversation

Projects
None yet
3 participants
@batrick
Member

batrick commented Apr 18, 2017

Making a PR now to solicit feedback. Only major change left to make is preventing inodes that are pinned from being exported.

batrick added some commits Jan 12, 2017

mds: order MDBalancer header
This follows the coding style guidelines:

https://google.github.io/styleguide/cppguide.html#Declaration_Order

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: remove unimplemented function
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: organize Migrator headers
This follows the coding style guidelines:

https://google.github.io/styleguide/cppguide.html#Declaration_Order

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: simplify loops to range-for
No functional change.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: add comment for subtrees MDCache member
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
messages: add missing header to MMDSLoadTargets
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: organize headers
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: use uint64_t for sum of exported inodes
An int is too likely to overflow.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
common/DecayCounter: remove redundant qualifiers
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
common/DecayCounter: add delta to value for ::get
This resolves a bug where getting the DecayCounter value before one second has
elapsed will result in 0 always being returned. This is because ::hit will add
to the delta, not the value. The delta is added to the value only in the decay
function which only processes changes after 1 second has elapsed since the last
decay.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: check projected parent to avoid unsafe access
An anonymous inode may not have a stable parent so immediate migration would
cause a segfault when checking for strays.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: dispatch export request
Funnel dispatch through MDCache::dispatch_request so we have only one call site
for Migrator::dispatch_export_dir.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
common: assoc. DecayRate with DecayCounter
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@@ -73,7 +73,7 @@ class DecayCounter {
double get(utime_t now, const DecayRate& rate) {
decay(now, rate);
return val;
return val+delta;

This comment has been minimized.

@batrick

batrick Apr 18, 2017

Member

@liewegas does this commit seem reasonable to you?

batrick added some commits Apr 18, 2017

mds: unify export_targets handling for all exports
This commit moves the MDSMap export_targets updates/handling to MDSRank.  It is
necessary to wait for export_targets to be updated prior to doing any exports
as clients must have sessions open with targets of exports before any export
can be performed. Before this commit, this handling was only done for
migrations initiated by the balancer and not for manual migrations done by the
admin socket.

This fix and refactoring does the following:

o MDSRank now manages export_targets via a map of ranks with DecayCounters.
  MDSRank::hit_export_target enables the Migrator/MDBalancer to hit a rank to
  indicate migration is either desired or in progress. Importantly, updating
  export_targets is now no longer tied to the previous MDBalancer cycle.

o mds_bal_target_removal_min and mds_bal_target_removal_max are removed in
  favor of a DecayRate, via mds_bal_target_decay, which is independent of the
  tick rate.

o Certain balancing state has been pulled out of the MDBalancer into a stack
  variable type (balance_state_t). This is to avoid unnecessary persistence
  of my_targets, imported, and exported maps which made the code confusing.

o try_rebalance is no longer called on MDSMap updates. This was done before
  export target checking was part of the balancer, in 3e36d32.

o The Migrator now hits a rank in export_targets via MDSRank::hit_export_target
  proportional to how much is being exported and periodically during the
  course of the export. In my testing, one "default" hit (-1) will at least
  keep the target in the export map for about 2 minutes.

o The Migrator will wait until a target is in the export_targets before
  it actually does the export, or abort the export if the target is not
  added in a timely manner.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: add export_pin feature
This allows the client/admin to pin a directory tree to a particular rank,
preventing its export by the dynamic balancer.

Fixes: http://tracker.ceph.com/issues/17834

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: prevent export of pinned inodes
This adds a chain of linked lists to CInode which can be followed to CInodes
that are export pinned to this rank.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

@batrick batrick changed the title from [WIP] support export pinning on directories to mds: support export pinning on directories Apr 19, 2017

@batrick batrick requested a review from jcsp Apr 19, 2017

@batrick

This comment has been minimized.

Member

batrick commented Apr 19, 2017

@jcsp this is ready now I believe.

Here's a test run: http://pulpito.ceph.com/pdonnell-2017-04-19_21:04:50-multimds-wip-export-pin-testing-basic-mira/

Not really sure what's going on with the kclient failures but the FUSE configuration passes.

@@ -781,7 +781,7 @@ void Migrator::export_dir(CDir *dir, mds_rank_t dest)
return;
}
if (!dir->inode->is_base() && dir->get_parent_dir()->get_inode()->is_stray() &&
if (!dir->inode->is_base() && dir->get_inode()->get_projected_parent_dir()->get_inode()->is_stray() &&
dir->get_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)) {

This comment has been minimized.

@ukernel

ukernel Apr 21, 2017

Member

the MDS_INO_MDSDIR check also should use projected parent

This comment has been minimized.

@batrick

batrick Apr 21, 2017

Member

Fixed, thanks!

@ukernel

thank to the dirfrag size and MExportDir message size limitation, it's unlikely to overflow

@@ -1269,7 +1269,7 @@ void Migrator::export_go_synced(CDir *dir, uint64_t tid)
// fill export message with cache data
MExportDir *req = new MExportDir(dir->dirfrag(), it->second.tid);
map<client_t,entity_inst_t> exported_client_map;
int num_exported_inodes = encode_export_dir(req->export_data,
uint64_t num_exported_inodes = encode_export_dir(req->export_data,
dir, // recur start point
exported_client_map,
now);

This comment has been minimized.

@ukernel

ukernel Apr 21, 2017

Member

thank to the dirfrag size and MExportDir message size limitation, it's unlikely to overflow

This comment has been minimized.

@batrick

batrick Apr 21, 2017

Member

Okay to leave it as uint64_t?

This comment has been minimized.

@ukernel
@ukernel

This comment has been minimized.

Member

ukernel commented Apr 21, 2017

I can try writing code that creates local subtree bound inside exporting directory. It should simplify the code.

if (!dir->inode->is_base() && dir->get_parent_dir()->get_inode()->is_stray() &&
dir->get_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)) {
if (!dir->inode->is_base() && dir->get_inode()->get_projected_parent_dir()->get_inode()->is_stray() &&
dir->get_projected_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)) {

This comment has been minimized.

@jcsp

jcsp Apr 24, 2017

Contributor

The second one is s/dir->get_projected/dir->get_inode()->get_projected (doesn't compile)

This comment has been minimized.

@batrick

batrick Apr 24, 2017

Member

Thanks, fixed!

batrick added some commits Apr 21, 2017

mds: use projected parent to avoid unsafe access
Add-on to: 26a08f3

Credit to Zheng for noticing.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: break ancestor walk if node is export_pinned
i.e. don't keep walking the parents after adding the ancestor (or the node
itself) to the export_pin_queue.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@ukernel

This comment has been minimized.

Member

ukernel commented May 3, 2017

how about patch ukernel@4b2b874

mds: check export_pin on dirfrag load
Zheng's suggestion: "[maybe_export_pin is called by MDCache::add_inode() and
Migrator. For the first case, inode has no dirfrag (I think it's better to call
this function in CInode::get_or_open_dirfrag)"

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
if (!dir->inode->is_base() && dir->get_inode()->get_projected_parent_dir()->get_inode()->is_stray() &&
dir->get_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)) {
if (!dir->inode->is_base() && dir->inode->get_projected_parent_dir()->inode->is_stray() &&
dir->inode->get_projected_parent_dir()->get_parent_dir()->inode->is_mdsdir()) {

This comment has been minimized.

@ukernel

ukernel May 4, 2017

Member

if stray inode has subtree, we need to export the subtree to stray inode's auth mds. otherwise mds can't purge stray inode . this change breaks this function

This comment has been minimized.

@ukernel

ukernel May 4, 2017

Member

similarly, we should ignore the export pin check for stray inode or reset export pin when rmdir

batrick added some commits May 4, 2017

mds: remove unnecessary check for parent pins
With Zheng's help, now that the code has captured all the paths where an inode
should be checked for export pins, we don't need to look at parents anymore.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: do not try to export pin special directories
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick

This comment has been minimized.

Member

batrick commented May 5, 2017

I'll push a new version tomorrow after I fix up the commits. Thanks Zheng for the thorough review!

@ukernel

This comment has been minimized.

Member

ukernel commented May 5, 2017

I found bug in aux subtree code. Fixed by ukernel@54ce2c7

batrick and others added some commits May 5, 2017

qa: improve time handling for test_exports test
Also catches corner-case found by Zheng where an unjournaled directory will
cause export pinning to fail because it cannot be made a subtree until its
parent is stable.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: handle export pin on unjournaled directory
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: call maybe_export_pin on all fetched dirfrags
If the dirfrag is pinned to the current mds, it needs to be made into an aux
subtree.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: handle aux subtree when splitting/merging dirfrag
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
if (!dir->inode->is_base() && dir->get_parent_dir()->get_inode()->is_stray() &&
dir->get_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)) {
if (!dir->inode->is_base() && dir->inode->get_projected_parent_dir()->inode->is_stray() &&
dir->inode->get_projected_parent_dir()->get_parent_dir()->inode->is_mdsdir()) {

This comment has been minimized.

@ukernel

ukernel May 6, 2017

Member

this line should be dir->inode->get_projected_parent_dir()->get_parent_dir()->ino() != MDS_INO_MDSDIR(dest)

mds: check mdsdir against dest
Introduced by aebc1ca. Found by Zheng.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@ukernel

ukernel approved these changes May 8, 2017

@ukernel

This comment has been minimized.

Member

ukernel commented May 8, 2017

@ukernel

This comment has been minimized.

Member

ukernel commented May 8, 2017

got below crash when changing ceph.dir.pin twice

/home/zhyan/Ceph/ceph/src/mds/Migrator.cc: 780: FAILED assert(dest != mds->get_nodeid())

 ceph version 12.0.1-2188-g755a4e8 (755a4e802ba94b91f1ebc83dee51c1a317ed9d7b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x119) [0x55c0ba8d12c9]
 2: (Migrator::export_dir(CDir*, int)+0xdf2) [0x55c0ba71b94a]
 3: (MDBalancer::handle_export_pins()+0x7a6) [0x55c0ba74e65e]
 4: (MDBalancer::tick()+0x200) [0x55c0ba74f298]
 5: (MDSRankDispatcher::tick()+0x509) [0x55c0ba56feb9]
 6: (Context::complete(int)+0x11) [0x55c0ba55f3c1]
 7: (SafeTimer::timer_thread()+0x477) [0x55c0ba8ce007]
 8: (SafeTimerThread::entry()+0xd) [0x55c0ba8cf4ed]
 9: (()+0x76ca) [0x7f13034ce6ca]
 10: (clone()+0x5f) [0x7f1302332f7f]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
mds: update export targets even when not active
Problem caught test_migration_on_shutdown. Migration could not proceed because
export_targets were not being updated.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick

This comment has been minimized.

Member

batrick commented May 9, 2017

@ukernel most of the failures were caused by stalled migration due to export_targets not getting updated when an MDS is stopping. That's fixed in the latest commit. I have another test run that will be scheduled in 4 hours when the builds for my integration branch are done.

dout(7) << "dest is not yet an export target" << dendl;
if (count > 3) {
dout(5) << "dest has not been added as export target after three MDSMap epochs, canceling export" << dendl;
mds->mdcache->request_finish(mdr);

This comment has been minimized.

@ukernel

ukernel May 9, 2017

Member

should call export_try_cancel() here

ukernel added some commits May 9, 2017

mds: properly cleanup export states
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
mds: fix dir auth calculation in CDir::merge
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
@ukernel

This comment has been minimized.

Member

ukernel commented May 9, 2017

@batrick

This comment has been minimized.

Member

batrick commented May 10, 2017

I've found an assertion failure in testing, don't merge now.

mds: delay export until dir is stable
This catches an assertion failure due to exports on unjournaled directories:

    2017-05-10 12:45:18.193640 7eff65a04700  7 mds.0.migrator export_dir [dir 10000000003 /a/ [2,head] auth pv=3 v=1 cv=0/0 ap=1+2+2 state=1073742082|complete f()->f(v0 m2017-05-10 12:45:14.977151 1=0+1) n()->n(v0 rc2017-05-10 12:45:14.977151 1=0+1) hs=0+1,ss=0+0 | child=1 authpin=1 0x55ef655de000] to 1
    2017-05-10 12:45:18.193654 7eff65a04700 15 mds.0.18 hit export target 10 @ 2017-05-10 12:45:18.193653
    2017-05-10 12:45:18.193661 7eff65a04700 10 mds.0.cache.dir(10000000003) auth_pin by 0x55ef650cc130 on [dir 10000000003 /a/ [2,head] auth pv=3 v=1 cv=0/0 ap=2+2+2 state=1073742082|complete f()->f(v0 m2017-05-10 12:45:14.977151 1=0+1) n()->n(v0 rc2017-05-10 12:45:14.977151 1=0+1) hs=0+1,ss=0+0 | child=1 authpin=1 0x55ef655de000] count now 2 + 2
    2017-05-10 12:45:18.193701 7eff65a04700  7 mds.0.cache request_start_internal request(mds.0:4) op 5377
    2017-05-10 12:45:18.193708 7eff65a04700  7 mds.0.migrator dispatch_export_dir request(mds.0:4)
    2017-05-10 12:45:18.198281 7eff66205700 20 mgrc operator() 234 counters, of which 0 new
    2017-05-10 12:45:18.198294 7eff66205700 20 mgrc send_report encoded 2006 bytes
    2017-05-10 12:45:18.198298 7eff66205700  1 -- 127.0.0.1:6825/693985647 --> 127.0.0.1:6826/1025 -- mgrreport(+0-0 packed 2006) v2 -- 0x55ef655abe00 con 0
    2017-05-10 12:45:18.198364 7eff6a9ab700 10 _calc_signature seq 14 front_crc_ = 2388339344 middle_crc = 0 data_crc = 0 sig = 5026349537430007662
    2017-05-10 12:45:18.198376 7eff6a9ab700 20 Putting signature in client message(seq # 14): sig = 5026349537430007662
    2017-05-10 12:45:18.198411 7eff65a04700 -1 /home/pdonnell/ceph/src/mds/MDCache.cc: In function 'void MDCache::make_trace(std::vector<CDentry*>&, CInode*)' thread 7eff65a04700 time 2017-05-10 12:45:18.194136
    /home/pdonnell/ceph/src/mds/MDCache.cc: 8225: FAILED assert(parent)

     ceph version 12.0.1-2198-ge757c02 (e757c025fa3270b12fb2fca17cf159fa1bd72747)
     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55ef5b780f02]
     2: (MDCache::make_trace(std::vector<CDentry*, std::allocator<CDentry*> >&, CInode*)+0x1cb) [0x55ef5b4fa12b]
     3: (Migrator::get_export_lock_set(CDir*, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&)+0x55) [0x55ef5b5e8215]
     4: (Migrator::dispatch_export_dir(boost::intrusive_ptr<MDRequestImpl>&, int)+0xa74) [0x55ef5b5f72d4]
     5: (Migrator::export_dir(CDir*, int)+0x9ca) [0x55ef5b5ea75a]
     6: (MDBalancer::handle_export_pins()+0x7b4) [0x55ef5b61ab24]
     7: (MDBalancer::tick()+0x1e8) [0x55ef5b61b748]
     8: (MDSRankDispatcher::tick()+0x5f1) [0x55ef5b44bdb1]
     9: (Context::complete(int)+0x9) [0x55ef5b43bcc9]
     10: (SafeTimer::timer_thread()+0x452) [0x55ef5b77dd52]
     11: (SafeTimerThread::entry()+0xd) [0x55ef5b77f15d]
     12: (()+0x76ba) [0x7eff6d4bb6ba]
     13: (clone()+0x6d) [0x7eff6c52782d]

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick

This comment has been minimized.

Member

batrick commented May 10, 2017

Okay, this should be good to merge now.

@ukernel

This comment has been minimized.

Member

ukernel commented May 11, 2017

LGTM

@ukernel ukernel merged commit b67a599 into ceph:master May 11, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@batrick batrick deleted the batrick:mds-balancer-pin branch May 11, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment