
mds: add `drop cache` command #21566

Merged: 4 commits from rishabh-d-dave:mds-drop-cache-command into ceph:master on Oct 1, 2018

Conversation

rishabh-d-dave (Contributor) commented Apr 20, 2018

Adds the drop cache command for the MDS and an extra message so that clients can acknowledge to the MDS that the cache has been released. See http://tracker.ceph.com/issues/23362

@rishabh-d-dave rishabh-d-dave changed the title Adds the drop_cache command for MDS. See - http://tracker.ceph.com/issues/23362 mds: add drop_cache command Apr 20, 2018
tchaikov (Contributor) commented Apr 21, 2018

retest this please.

tchaikov (Contributor) commented Apr 21, 2018

retest this please

tchaikov (Contributor) commented Apr 21, 2018

@rishabh-d-dave this change does not compile

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch from 2e5a4dc to 9fb16cf Apr 23, 2018
rishabh-d-dave (Contributor, Author) commented Apr 23, 2018

> @rishabh-d-dave this change does not compile

@tchaikov Made a mistake while cleaning up the debug code. Fixed it. Thanks!

@tchaikov tchaikov added the cephfs label Apr 23, 2018
batrick (Member) commented Apr 23, 2018

Please annotate the commit which fixes/resolves a Ceph tracker issue with:

Fixes: http://tracker.ceph.com/issues/...

This is essential when examining the history of the repository (to see which issue a commit fixes) and it helps merge scripts identify issues that have been resolved by a merge.
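
For this change, for example, the relevant commit's message would carry the line:

    Fixes: http://tracker.ceph.com/issues/23362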

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch 2 times, most recently from 8b3e920 to ebba6d1 Apr 24, 2018
// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);
// TODO: do we want output from `flush journal`?

rishabh-d-dave (Author, Contributor) commented Apr 25, 2018

Running ceph tell mds.a drop cache displays output from MDSRank::command_flush_journal() as well as from MDSRank::cache_status().

{
    "message": "",
    "return_code": 0
}
{
    "pool": {
        "items": 482,
        "bytes": 52398
    }
}

Do we want both or only the latter?

batrick (Member) commented Apr 25, 2018

Both; combine them into one object.

batrick (Member) commented Apr 25, 2018

should be:

{
    "message": "",
    "return_code": 0,
    "cache_status": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}

rishabh-d-dave (Author, Contributor) commented Apr 26, 2018

Okay. From a general user's POV, I think it will be difficult to understand the context of the first two lines. Perhaps it would be better to add a section titled "flush_journal", like the following snippet?

{
    "flush_journal": {
        "message": "",
        "return_code": 0
    },
    "cache": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}

rishabh-d-dave (Author, Contributor) commented Apr 26, 2018

Updating the PR with the "flush_journal" section included. Let me know if you think it's not a good idea.

@@ -2240,7 +2240,7 @@ int MDSRank::_command_flush_journal(std::stringstream *ss)
{
assert(ss != NULL);

-Mutex::Locker l(mds_lock);
+mds_lock.Lock();

rishabh-d-dave (Author, Contributor) commented Apr 25, 2018

@batrick I don't know if this change is really appropriate, but without it the MDS crashes after executing the command, at this line [1], since it expects the MDS to be locked.

[1] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143

batrick (Member) commented Apr 25, 2018

I'd suggest moving Mutex::Locker l(mds_lock) to surround the call to _command_flush_journal in command_flush_journal:

const int r = _command_flush_journal(&ss);

to be

{
  Mutex::Locker l(mds_lock);
  const int r = _command_flush_journal(&ss);
}

Then add a comment to MDSRank::_command_flush_journal noting that it expects mds_lock to be held.
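
A minimal sketch of that wrapper (with r declared outside the braces so the result stays usable after the lock is released):

int r;
{
  Mutex::Locker l(mds_lock);        // hold mds_lock only for the flush
  r = _command_flush_journal(&ss);  // expects mds_lock to be held
}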

.set_default(30)
.set_description("how long should the MDS wait for receiving "
"CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions "
"while dropping cache."),

batrick (Member) commented Apr 25, 2018

I think we can keep this vague. Don't mention dropping cache. Also, rename to mds_session_recall_ack_timeout.

} else {
return false;
}
}

void MDSRank::command_drop_cache(stringstream *ds)
{

batrick (Member) commented Apr 25, 2018

add Mutex::Locker l(mds_lock);

rishabh-d-dave (Author, Contributor) commented Apr 26, 2018

@batrick Using Mutex::Locker l(mds_lock); creates the same issue as before [1]. At least for the time being, I am replacing it with mds_lock.Lock() so that the MDS stays locked until control reaches here [2] after exiting command_drop_cache(); this prevents the MDS from crashing.

[1] #21566 (comment)
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143

batrick (Member) commented Apr 28, 2018

Oh, right that lock is not necessary because it's locked in MDSDaemon::ms_dispatch.


// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);

batrick (Member) commented Apr 25, 2018

should call _command_flush_journal.

"CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions." << dendl;

// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);

batrick (Member) commented Apr 25, 2018

Use:

std::unique_ptr<Formatter> fmtr(new JSONFormatter(true));

then

fmtr.reset(new JSONFormatter(true));

when you want to create a new one.

if (r != 0)
*ds << "Failed to get cache status: " << cpp_strerror(r);
fmtr->flush(*ds);
}

batrick (Member) commented Apr 25, 2018

As I noted elsewhere, what you should actually do is concatenate all of the commands' outputs into one JSON object, so you should be using the same JSONFormatter for all of these commands. Just create a new object section before calling each command.
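
A rough sketch of that pattern with the Ceph Formatter API, writing into the stringstream *ds that command_drop_cache() already receives (section names here are illustrative, not the final ones):

std::unique_ptr<Formatter> f(new JSONFormatter(true));
f->open_object_section("result");

f->open_object_section("flush_journal");
// ... dump the flush-journal return code and message into f ...
f->close_section(); // flush_journal

f->open_object_section("cache");
// ... dump the cache status into f ...
f->close_section(); // cache

f->close_section(); // result
f->flush(*ds);      // emit everything as a single JSON object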


time_passed = mono_clock::now() - time_at_beg;
if (TIMEOUT - time_passed.count() > 2)
sleep(2);

batrick (Member) commented Apr 25, 2018

This needs to use waiter/gather continuations. @ukernel, can you suggest the right approach?

ukernel (Member) commented May 9, 2018

See the code in Server::flush_client_sessions() and how it gets used.

rishabh-d-dave (Author, Contributor) commented Jun 4, 2018

I originally wrote this code to make the MDS consume less CPU. Should it be removed? I am confused, since the discussion of waiter/gather continuations began here.

ukernel (Member) commented Jun 8, 2018

I don't see how 'sleep' consumes less CPU. Please get rid of it.

rishabh-d-dave (Author, Contributor) commented Jun 11, 2018

Done.

I was under the impression that sleeping would lead to fewer iterations (when ACKs are not received immediately), allowing other threads/processes to use that CPU time instead.

@@ -465,6 +465,11 @@ void Server::handle_client_session(MClientSession *m)
mdlog->flush();
break;

case CEPH_SESSION_RECALL_STATEMSG_ACK:
if (this->report_acks)
session->recall_acked = true;

batrick (Member) commented Apr 25, 2018

The right way to do this is to record a sequence number for the recall messages the MDS sends so that the client can ACK the seq number. Then we know which recall messages it has handled. Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message.".

Please modify those messages (and increment the message version) to include the sequence number.

rishabh-d-dave (Author, Contributor) commented Apr 26, 2018

Please correct me if I am wrong: in my understanding, I need to assign sequence numbers to the CEPH_SESSION_RECALL_STATE messages sent by the MDS in Server::recall_client_state, store them, and then, while receiving ACKs for these messages (until mds_session_recall_ack_timeout), check them off and report in the logs the ones for which I haven't received an ACK.

> Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message.".

Yeah, I couldn't comprehend the "seq" part back then.

rishabh-d-dave (Author, Contributor) commented Apr 26, 2018

Updating the PR as per my understanding above.

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch from ebba6d1 to 821c4ad Apr 26, 2018
@@ -91,6 +93,7 @@ class Server {

public:
bool terminating_sessions;
set<int> crs_msg_seq_nums;

batrick (Member) commented May 9, 2018

This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.

But now that I'm reading the code more in SessionMap.h, I'm wondering if we can reuse the existing CEPH_SESSION_FLUSHMSG code for this purpose. There is already a sequence number in the session to use (see the code for Session::cap_push_seq). @ukernel, would this be appropriate or should we use a separate sequence number?

ukernel (Member) commented May 9, 2018

using Session::cap_push_seq is appropriate.

If the client always sends the ACK immediately after receiving the drop cache request, the existing Session::{wait_for_flush,finish_flush} code can serve this purpose completely. If not, we need to introduce a new waiter list (similar to Session::waitfor_flush).
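
For illustration, the per-session seq/ack bookkeeping that the FLUSHMSG path already does works roughly like this (simplified, standalone C++, not the actual Ceph types):

#include <cstdint>
#include <functional>
#include <map>
#include <vector>

struct SessionSeqTracker {
  uint64_t last_sent = 0;                                   // seq of the last recall/flush sent
  std::map<uint64_t, std::vector<std::function<void()>>> waiters;

  uint64_t send_recall() { return ++last_sent; }            // stamp the outgoing message

  void wait_for(uint64_t seq, std::function<void()> cb) {   // register a waiter for that seq
    waiters[seq].push_back(std::move(cb));
  }

  void handle_ack(uint64_t acked_seq) {                     // client echoed the seq back
    auto end = waiters.upper_bound(acked_seq);
    for (auto it = waiters.begin(); it != end; ++it)
      for (auto& cb : it->second) cb();                     // everything up to acked_seq is done
    waiters.erase(waiters.begin(), end);
  }
};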

rishabh-d-dave (Author, Contributor) commented May 15, 2018

@batrick

> This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.

If the sequence number of the message sent by the MDS is different from the one sent back by the client, I wouldn't know which sessions have not been trimmed, and it would not be possible to report those sessions in the log. I could still see how many sessions have not trimmed, though. Will that be okay?

ukernel (Member) commented May 16, 2018

Check the code that handles CEPH_SESSION_FLUSHMSG/CEPH_SESSION_FLUSHMSG_ACK; they use the same seq number. Please reuse cap_push_seq instead of introducing a new one.

rishabh-d-dave (Author, Contributor) commented May 31, 2018

@ukernel Done. Thanks. :)

@@ -2078,6 +2078,9 @@ void Client::handle_client_session(MClientSession *m)

case CEPH_SESSION_RECALL_STATE:
trim_caps(session, m->get_max_caps());
session->con->send_message(new MClientSession(
CEPH_SESSION_RECALL_STATEMSG_ACK,
m->get_seq()));

ukernel (Member) commented May 9, 2018

Does the client always send the ACK immediately?

rishabh-d-dave (Author, Contributor) commented May 14, 2018

@ukernel The client will always send ACKs, but I don't know if it is fast enough to be immediate, as it calls trim_caps() first.

rishabh-d-dave (Author, Contributor) commented May 15, 2018

Recalling one of Patrick's replies on the mailing list: clients that have caps pinned locally will not return those caps. In that case, the ACK may not be returned.

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch 2 times, most recently from 336f2d2 to 90d24e7 May 31, 2018
rishabh-d-dave (Contributor, Author) commented Jun 4, 2018

Should the command drop cache be changed to cache drop? There are two more cache-related commands - cache status and dump cache - but they do not make the preferred order clear. I think cache preceding the operation is better.

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch from 90d24e7 to 3aeed0d Jun 11, 2018
batrick (Member) commented Jun 11, 2018

cache drop sounds good to me.

}

if (not mdcache->trim(UINT64_MAX))
return;

batrick (Member) commented Jun 11, 2018

I think trimming the cache should occur after we get all the client recall acks. We want to trim the MDS cache when the client caps are released.

fmtr.get()->open_object_section("result");

int retval = _command_flush_journal(&ss);
fmtr.get()->open_object_section("flush_journal");

batrick (Member) commented Jun 11, 2018

can be fmtr->open_object_section...


sessionmap.get_client_session_set(unacked_sessions);

while (time_passed.count() < TIMEOUT) {

batrick (Member) commented Jun 11, 2018

The MDS should do a wait, not an infinite loop. You need to add a waiter interface for this.

rishabh-d-dave (Author, Contributor) commented Jun 13, 2018

> The MDS should do a wait, not an infinite loop.

It's not infinite. The looping stops on timeout.

batrick (Member) commented Jun 13, 2018

Sorry, my wording was wrong. It's not infinite, but it is a busy-wait: the CPU loops until the condition is true, for up to TIMEOUT seconds. This isn't acceptable.
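
For illustration (generic C++, not the MDS code): a busy-wait keeps a thread spinning, while a timed condition-variable wait lets it sleep until either the ACKs arrive or the timeout expires.

#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool all_acks_received = false;   // set by the ACK handler, which then calls cv.notify_all()

// Instead of: while (!all_acks_received && !timed_out()) sleep(2);
bool wait_for_acks(std::chrono::seconds timeout) {
  std::unique_lock<std::mutex> lk(m);
  return cv.wait_for(lk, timeout, [] { return all_acks_received; });
}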

rishabh-d-dave (Author, Contributor) commented Jun 14, 2018

Got it.

@@ -301,6 +301,7 @@ enum {
CEPH_SESSION_RENEWCAPS,
CEPH_SESSION_STALE,
CEPH_SESSION_RECALL_STATE,
CEPH_SESSION_RECALL_STATEMSG_ACK,

batrick (Member) commented Jun 11, 2018

This needs to go at the end of the enum; otherwise the binary compatibility of the session state messages is compromised.
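
The reason, sketched (a trimmed, illustrative enum, not the real ceph_fs.h list or values): enumerators take their values from position, and those values are part of the wire format.

// Illustrative only -- not the real entries or values.
enum {
  CEPH_SESSION_RENEWCAPS,            // existing peers were built with these values
  CEPH_SESSION_STALE,
  CEPH_SESSION_RECALL_STATE,
  CEPH_SESSION_FLUSHMSG,
  CEPH_SESSION_FLUSHMSG_ACK,
  // Inserting RECALL_STATEMSG_ACK right after RECALL_STATE would shift
  // FLUSHMSG and everything below it by one, so old and new binaries
  // would disagree on what each value means on the wire. Appending it
  // at the end leaves every existing value untouched:
  CEPH_SESSION_RECALL_STATEMSG_ACK,
};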

case CEPH_SESSION_RECALL_STATEMSG_ACK:
finish_flush_session(session, m->get_seq());
if (this->report_acks)
session->recall_acked = true;

batrick (Member) commented Jun 11, 2018

recall_acked = m->get_seq()

{
MClientSession *m;
static MDSGatherBuilder gather(g_ceph_context);

batrick (Member) commented Jun 11, 2018

why static?

rishabh-d-dave (Author, Contributor) commented Jun 13, 2018

I saw the MDS crash without it:

(gdb)
1155      set<Session*> sessions;
(gdb)
1133      MDSGatherBuilder gather(g_ceph_context);
(gdb)

Program received signal SIGABRT, Aborted.
0x00007f7fd283b277 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)
handle_fatal_signal (signum=6)
    at /home/centos/repos/ceph/src/global/signal_handler.cc:92
92      {
(gdb) c
Continuing.

I do not have a deep understanding of how the MDS runs as a daemon, but I saw the line being executed multiple times, so making it static seemed like a natural solution.

{
MClientSession *m;

batrick (Member) commented Jun 11, 2018

Why did you move this var?

rishabh-d-dave (Author, Contributor) commented Jun 13, 2018

Moved it during one of the rewrites. Undid this change.

if (not mdcache->trim(UINT64_MAX))
return;

server->recall_client_state(1, true);

batrick (Member) commented Jun 11, 2018

pass a pointer to

MDSGatherBuilder gather(g_ceph_context);

to server->recall_client_state which adds to the gather:

https://github.com/ceph/ceph/pull/21566/files#diff-eecf840e09d7481f7b8f415789852c7dR1177

Then you can do a gather.activate() in _wait_for_client_recall_state_acks instead of the CPU-intensive loop.
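
Roughly the shape being suggested (a sketch only; the extra gather parameter and C_OnRecallAcked are placeholders, not existing code):

MDSGatherBuilder gather(g_ceph_context);

// (hypothetical signature) recall_client_state() would call gather.new_sub()
// once per session it recalls; each sub is completed when that session's
// CEPH_SESSION_RECALL_STATEMSG_ACK arrives.
server->recall_client_state(1, true, &gather);

if (gather.has_subs()) {
  // C_OnRecallAcked: placeholder MDSInternalContext that continues with the
  // cache trim / journal flush once every sub has completed.
  gather.set_finisher(new C_OnRecallAcked(this));
  gather.activate();   // no polling loop; ACK handling completes the subs
}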

rishabh-d-dave (Author, Contributor) commented Jun 18, 2018

@batrick I can't figure out how to replace the busy-wait loop with gather.activate(). Also, calling gather.activate() in _wait_for_client_recall_state_acks leads to a crash due to the assert in gather.activate() [1].

And would these changes mean no timeout is needed? If so, the option mds_session_recall_ack_timeout would be redundant.

[1] https://github.com/ceph/ceph/blob/master/src/include/Context.h#L476

@rishabh-d-dave rishabh-d-dave force-pushed the rishabh-d-dave:mds-drop-cache-command branch from 3aeed0d to 922f5d4 Jun 18, 2018
batrick (Member) commented Aug 16, 2018

This needs to be rebased to handle the new message handling introduced by #22555.

@vshankar vshankar force-pushed the rishabh-d-dave:mds-drop-cache-command branch 3 times, most recently from aeb14bf to df8b700 Aug 27, 2018
batrick (Member) commented Aug 28, 2018

Needs to be rebased again.

@vshankar vshankar force-pushed the rishabh-d-dave:mds-drop-cache-command branch from df8b700 to 8b10e79 Aug 30, 2018
vshankar (Contributor) commented Aug 30, 2018

@batrick rebased and updated

The hang I mentioned earlier was not really a deadlock -- the "drop cache" command used to wait for the timeout to expire even if there were no sessions. That's fixed in this update. Also, I removed the custom gather class that didn't do much and used the existing MDSGatherBuilder class. Plus, I added a couple of command-related tests.

@batrick I could not reproduce the deadlock issue you had mentioned earlier. Also, w.r.t. the asynchronous execution of "ceph tell" -- I think the command works fine -- although I haven't looked at the tell code closely. I'll try to take a look sometime tomorrow, but for now, the rest of the PR is reviewable.

dout(10) << "recall_client_state " << ratio
<< ", caps per client " << min_caps_per_client << "-" << max_caps_per_client
<< dendl;
dout(10) << __func__ << ": ration=" << ratio << ", caps per client "

batrick (Member) commented Sep 6, 2018

typo "ratio"

}

JSONFormatter f(true);
if (!command_cache_drop(f, timeout)) {

batrick (Member) commented Sep 6, 2018

I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)

I think it will be necessary to convert this drop cache command into an asynchronous Context which releases the dispatch thread back to the messenger. Consider that this is a problem we share with asynchronous scrub which currently executes in an unaccountable way in the MDS. It'd be valuable to have a generic "command" module of some kind which allows the operator to manipulate/stop these operations once started.

This doesn't need to be written now but it's worth thinking about. I think we can solve drop cache by just using contexts to drive the drop to completion and eventually send an MCommandReply once finished. The dispatch thread should be released ASAP. If it simplifies the code, go ahead and drop the asok version.
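
A sketch of the asynchronous shape being described (class, member names and step split are placeholders for illustration; the eventual implementation in this PR differs in detail):

class C_Drop_Cache : public Context {          // driven by callbacks, not a blocked thread
public:
  void send() {
    recall_client_state();                     // 1. ask clients to release caps;
  }                                            //    the dispatch thread returns to the messenger here
private:
  void recall_client_state();                  //    a gather finisher invokes handle_recall()
  void handle_recall(int r) { flush_journal(); }
  void flush_journal();                        // 2. asynchronous journal flush
  void handle_flush(int r) { cache_status(); }
  void cache_status();                         // 3. collect cache stats
  void finish(int r) override {
    // 4. format the combined JSON and send the MCommandReply to the caller
  }
};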

rishabh-d-dave (Author, Contributor) commented Sep 7, 2018

batrick wrote:

> I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)

It indeed does not remain blocked:

$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep  7 10:22:51 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep  7 10:22:51 UTC 2018

... unlike using the asok command -

$ date && ./bin/ceph daemon mds.a cache drop 20 && date
Fri Sep  7 10:19:38 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 729,
            "bytes": 67320
        }
    }
}
Fri Sep  7 10:19:38 UTC 2018

The tell command waits until the timeout only when session->caps.size() > newlim [1]:

$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep  7 10:23:16 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep  7 10:23:17 UTC 2018

For that, mds min caps per client needs to be set (in ceph.conf) to a lower value (depending on the FS).

vshankar (Contributor) commented Sep 10, 2018

Thanks @batrick -- I understand the issue with ceph tell now. With @rishabh-d-dave's testing above, I expected the client_recall status to return ETIMEDOUT.

It's probably worth driving command_cache_drop() via a context. I'll look into this and update the PR.

@vshankar vshankar force-pushed the rishabh-d-dave:mds-drop-cache-command branch from 8b10e79 to 73080b3 Sep 14, 2018
vshankar (Contributor) commented Sep 14, 2018

@batrick The updated PR implements "drop cache" as an asynchronous state machine. This required the journal flush to be converted to an asynchronous state machine as well. The major difference with asynchronous execution is that mds_lock is not held for the entire duration of command execution -- it's dropped when invoking an async call and gets reacquired in the context callback. Would that be fine, or would we want it to behave as it currently does (in the synchronous version)?

Also, the async implementation might not look elegant as of now -- but I wanted to get this out for review, and also because we probably want to convert other (tell?) commands to execute asynchronously (for problems similar to what we faced when recalling client state). I have the async state machines as separate commits -- so if we want more discussion/eyes on that, I can resend this with just the asok interface for now.

@batrick batrick self-assigned this Sep 17, 2018
batrick (Member) commented Sep 18, 2018

@vshankar I still need to look at your update but I want to add onto this PR a note that we should also think about an option to this command to ask clients to drop their object buffers/caches (for all types of clients, kernel and ceph-fuse). This could be a followup ticket or, if you're interested, we can add it to this PR.

vshankar (Contributor) commented Sep 20, 2018

> @vshankar I still need to look at your update but I want to add onto this PR a note that we should also think about an option to this command to ask clients to drop their object buffers/caches (for all types of clients, kernel and ceph-fuse). This could be a followup ticket or, if you're interested, we can add it to this PR.

I would vote for having that as a follow-up enhancement.

batrick (Member) commented Sep 21, 2018

needs rebase

batrick (Member) left a review comment

Otherwise looks like great work! Thanks @vshankar

ss(ss), on_finish(on_finish) {
}

void send() {

batrick (Member) commented Sep 21, 2018

start would be better I think

vshankar (Contributor) commented Sep 22, 2018

that's what I got used to when writing async state machines in rbd ;)

#define dout_context g_ceph_context
#define dout_subsys ceph_subsys_mds
#undef dout_prefix
#define dout_prefix *_dout << "mds." << whoami << '.' << incarnation << ' '

class C_Flush_Journal : public Context {

batrick (Member) commented Sep 21, 2018

Let's inherit from MDSInternalContext so it hooks into some general purpose logging too.

vshankar (Contributor) commented Sep 25, 2018

makes sense

Context *on_finish;
};

class C_Drop_Cache : public Context {

batrick (Member) commented Sep 21, 2018

MDSInternalContext here too

// case we change the logic here.
assert(mdsrank->mds_lock.is_locked());

if (recall_timeout < 0) {

batrick (Member) commented Sep 21, 2018

Why not just use a uint64_t then?

gather.activate();

mdsrank->mds_lock.Unlock();
retval = sleeper->wait_for(recall_timeout);

batrick (Member) commented Sep 21, 2018

We don't want to block the thread at all here. Let's use a SafeTimer (? maybe the right choice) which runs its finisher after recall_timeout. I think this will require two detachable contexts (like MDSContextSleeper). One waits for all clients to trim caps and the other waits for the Timer to complete. We won't be using a condition variable here because we don't want to wait at all. (MDSContextSleeper may get axed since I don't think we will use it.)

batrick (Member) commented Sep 21, 2018

Looking at this again: specifically, it will block the finisher thread, which is a problem. (The dispatch thread is fortunately free now, which is good!)
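
One possible non-blocking shape, sketched (DropCacheState and C_Fire are placeholder names; in the MDS these would be MDSInternalContexts): arm both the timer and the gather with lightweight contexts that share a one-shot flag, so whichever completes first drives the rest of the command and no thread waits.

struct DropCacheState {
  std::atomic<bool> done{false};
  Context *next;                         // continues with trim / flush / reply
  void fire(int r) {
    if (!done.exchange(true))            // first completion wins,
      next->complete(r);                 // the other becomes a no-op
  }
};

struct C_Fire : public Context {         // completed by the timer OR the gather
  DropCacheState *state;
  explicit C_Fire(DropCacheState *s) : state(s) {}
  void finish(int r) override { state->fire(r); }
};

// armed instead of any blocking wait_for():
//   timer.add_event_after(recall_timeout, new C_Fire(state));  // timeout path
//   gather.set_finisher(new C_Fire(state));                    // all-ACKs path
//   gather.activate();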

@@ -36,11 +36,244 @@

#include "MDSRank.h"

#include <boost/scope_exit.hpp>

batrick (Member) commented Sep 27, 2018

Let's use src/include/scope_guard.h instead.

vshankar (Contributor) commented Sep 28, 2018

this will be removed with the latest update...

vshankar added 4 commits Aug 22, 2018
Fixes: http://tracker.ceph.com/issues/23362
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
With this command, the MDS would request clients to release
caps followed by trimming its own cache and a journal flush.
The command accepts a timeout to wait for clients to respond
to session recall and flush messages.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
@vshankar vshankar force-pushed the rishabh-d-dave:mds-drop-cache-command branch from 73080b3 to 43d1b8e Sep 28, 2018
vshankar (Contributor) commented Sep 28, 2018

retest this please

vshankar (Contributor) commented Sep 28, 2018

@batrick Fixed and updated with the following changes:

  • The journal and cache drop async state machines now subclass MDSInternalContext. Also included are dout logs carried over from the earlier implementation.
  • Waiting for clients (with a timeout) to trim their caps is now non-blocking -- a custom C_Drop_Cache::C_ContextTimeout context is now used to trigger the timeout. This context is also passed to MDSGatherBuilder, which gets completed if client recall happens before the timeout. Also note that MDSGatherBuilder takes care of deleting its finisher (our context callback in this case), so in the case of a client recall timeout it's left to the gather class to eventually free this context (later?). This change gets rid of the custom MDSContextSleeper class and the changes in class Cond for the supporting member functions.
  • Added tests for dropping cache using the tell interface.
  • Added range=1 for the asok and tell interfaces for the timeout argument.
@batrick batrick changed the title mds: add drop_cache command mds: add `drop cache` command Sep 28, 2018
@batrick batrick mentioned this pull request Sep 28, 2018
batrick approved these changes Oct 1, 2018
batrick merged commit 43d1b8e into ceph:master Oct 1, 2018
5 checks passed:
Docs: build check OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded
batrick added a commit that referenced this pull request Oct 1, 2018
* refs/pull/21566/head:
	test: add test for mds drop cache command
	mds: command to trim mds cache and client caps
	mds: implement journal flush as asynchronous context execution
	mds: cleanup some asok commands

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
@rishabh-d-dave rishabh-d-dave deleted the rishabh-d-dave:mds-drop-cache-command branch Jul 23, 2019