New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds/Server.cc: Don't evict a slow client if... #12935

Merged
merged 1 commit into from Apr 24, 2017

Conversation

Projects
None yet
6 participants
@stiopaa1
Contributor

stiopaa1 commented Jan 15, 2017

... it's the only client

Fixes: http://tracker.ceph.com/issues/17855
Signed-off-by: Michal Jarzabek stiopa@gmail.com

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 16, 2017

Looks good to me, what do you think @ukernel @gregsfortytwo ?

@jcsp

jcsp approved these changes Jan 16, 2017

@gregsfortytwo

This comment has been minimized.

Member

gregsfortytwo commented Jan 16, 2017

Yeah, this is a nice simple one. I was debating if we should try and do something even a little more sophisticated like see if the newest client is new enough to avoid being evicted?
That doesn't seem much harder but then of course it's a pretty continuous slope.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 17, 2017

see if the newest client is new enough to avoid being evicted?

@gregsfortytwo I don't think I follow?

@gregsfortytwo

This comment has been minimized.

Member

gregsfortytwo commented Jan 17, 2017

This logic is preventing us from booting a single client. A marginally more complicated thing is to see if all the clients are in the same laggy/not-laggy state, and not evict them if they are all the same even if they're laggy.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 17, 2017

Oh right, that makes sense.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 17, 2017

Unrelated: the test for this seems to be failing:

2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:======================================================================
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:FAIL: test_evict_client (tasks.cephfs.test_misc.TestMisc)
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-jcsp-testing-20170116/qa/tasks/cephfs/test_misc.py", line 62, in test_evict_client
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:    self.assert_session_count(1, ls_data)
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-jcsp-testing-20170116/qa/tasks/cephfs/cephfs_test_case.py", line 211, in assert_session_count
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:    expected, len(ls_data)
2017-01-17T03:09:52.250 INFO:tasks.cephfs_test_runner:AssertionError: Expected 1 sessions, found 2

@stiopaa1 is it passing when you run locally with vstart_runner.py?

@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Jan 17, 2017

@jcsp
Yes, it it was definitely passing when I was testing it locally

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jan 17, 2017

@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Jan 17, 2017

@jcsp
I think it should work now.

The previous version worked for me locally only because I had
mds_session_autoclose equal to mds_session_timeout in ceph.conf

@ukernel

This comment has been minimized.

Member

ukernel commented Jan 18, 2017

mds may send session flush message to client during subtree migration. If the only client is really dead, it hangs the subtree migration process

@jcsp

This comment has been minimized.

Contributor

jcsp commented Feb 1, 2017

Discussed this briefly on CDM call -- I think we need some logic here during export to evict the slow client if we left it alive because it was the only one.

There is also the case where the client is slow on mds A but migration is from B to C -- in this case we just don't care.

@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Feb 15, 2017

@jcsp

Discussed this briefly on CDM call -- I think we need some logic here during export to evict the slow client if we left it alive because it was the only one.

Not familiar yet with exporting, but just had a quick look and found Migrator::export_dir function. Is it more or less where I should be looking?

@ukernel

This comment has been minimized.

Member

ukernel commented Feb 23, 2017

I think we can enable this for single mds only

@jcsp

This comment has been minimized.

Contributor

jcsp commented Mar 8, 2017

@stiopaa1 for now, let's just make this code only run when there is a single MDS.

So something like this:

if (mds->sessionmap.get_sessions().size() == 1 && mds->mdsmap->get_num_in_mds() == 1)

Sorry for the slow response.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Mar 8, 2017

(& this needs a rebase)

@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Mar 21, 2017

@jcsp
will have a look into this

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Apr 10, 2017

@stiopaa1 ping?

@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Apr 10, 2017

@tchaikov
Sorry, I've been a bit busy recently. Will try to have a look at this at the weekend.

mds/Server.cc: Don't evict a slow client if...
... it's the only client

Fixes: http://tracker.ceph.com/issues/17855
Signed-off-by: Michal Jarzabek <stiopa@gmail.com>
@stiopaa1

This comment has been minimized.

Contributor

stiopaa1 commented Apr 23, 2017

@tchaikov @jcsp
I have added:
mds->mdsmap->get_num_in_mds() == 1

and rebased

@tchaikov tchaikov requested a review from jcsp Apr 23, 2017

@jcsp

jcsp approved these changes Apr 23, 2017

@jcsp

This comment has been minimized.

Contributor

jcsp commented Apr 24, 2017

Thanks @stiopaa1, I've pulled this into my latest testing branch so it should merge soon

@jcsp jcsp merged commit d0d3a4a into ceph:master Apr 24, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment