cephfs: fix mount point break-off problem after mds switch occurred #14267
The hot-standby becomes active as we expected, but the mount point breaks strangely when the active mds goes down. The root cause is that the new mds uses the last_cap_renew decoded from ESession::replay in find_idle_sessions and wrongly kills the session. Maybe we should reset session->last_cap_renew to the current time when the server sends OPEN to the client in the reconnect stage.
So usually the client sends a renewcaps right after opening the session, and the MDS doesn't start evicting idle sessions until it reaches the clientreplay state. But I guess there's nothing to guarantee that, so we have a race here? Still, proceeding through the MDS states usually takes several seconds due to mdsmap updates on the mons, so I'm surprised an MDS ever reached clientreplay before it received its first renewcaps from the client.
I'd be interested to see a log of the case where it is incorrectly evicting a client, as it seems like kind of an unlikely race.
Okay, thank you for answering me. To tell the truth, this is my first commit to the open source community and I feel so cool, haha.
The root cause is that the latest session->last_cap_renew was set by ESession::replay during standby-replay, so it lagged far behind the current time:
2017-03-30 22:36:45.812142 7f99147d9700 1 mds.0.journal ESession::replay after get_or_add_session last_cap_renew is 2017-03-30 22:36:45.812140 session is client.590301 192.168.10.9:0/626254714 session addr 0x7f992c86b180
2017-03-30 22:43:01.959201 7f991b4eb700 1 mds.0.server enter handle_client_session client_session(request_renewcaps seq 22) v1 from client.590301