cephfs: fix future rctimes #37938

Closed
wants to merge 2 commits

Conversation

@thmour commented Nov 3, 2020

In our CephFS cluster we get users who import files with invalid (future) ctimes.
This propagates to the rctime of every ancestor directory up to the root, and those rctimes are not updated by new writes until the future date has passed.
Applications that rely on rctime to detect file changes efficiently (e.g. rsync-style synchronisation) can no longer do so.
Make rctime modifiable so sysadmins can repair it after correcting the invalid file's ctime.
Also stop mtime from updating rctime, because mtime is a user-modifiable field and can be set to a future date;
rctime should instead change only based on ctime.
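
For illustration, once the offending file's ctime has been corrected, a sysadmin could reset each ancestor directory's rctime through this vxattr from a user-space client. Below is a minimal sketch using the libcephfs C API; the path is made up, and the value format is assumed to be the same "seconds.nanoseconds" string that ceph.dir.rctime reports today (check the new handle_set_vxattr parsing for the exact accepted format):

#include <cephfs/libcephfs.h>
#include <cstdio>
#include <cstring>

int main() {
  struct ceph_mount_info *cmount = nullptr;
  if (ceph_create(&cmount, nullptr) < 0 ||
      ceph_conf_read_file(cmount, nullptr) < 0 ||
      ceph_mount(cmount, "/") < 0) {
    fprintf(stderr, "failed to mount cephfs\n");
    return 1;
  }

  // Assumed value format: "<seconds>.<nanoseconds>", matching what
  // getfattr -n ceph.dir.rctime reports for a directory today.
  const char *rctime = "1604361600.0";
  int r = ceph_setxattr(cmount, "/volumes/mygroup/mydir", "ceph.dir.rctime",
                        rctime, strlen(rctime), 0);
  if (r < 0)
    fprintf(stderr, "setting ceph.dir.rctime failed: %d\n", r);

  ceph_unmount(cmount);
  ceph_release(cmount);
  return r < 0 ? 1 : 0;
}

Repeating this from the fixed file's parent directory up to the root (bottom-up) clears the future rctime along the whole path.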

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@batrick added the cephfs (Ceph File System) and needs-review labels Nov 3, 2020
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

stale bot commented Jun 11, 2021

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

stale bot added the stale label Jun 11, 2021
@thmour (Author) commented Jun 23, 2021

@vshankar @batrick at CERN we run several iterations of ownCloud file sharing with a custom storage backend, and we are trying to create an ownCloud module that uses a Ceph backend with CephFS. In our clusters some files have future mtimes, which has affected the recursive ctimes we use to detect which files have changed and need to be synchronised. Because of the future rctimes we have to check the whole tree for changes (a huge performance impact). Can you have a look at this, please?
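
For context, the change-detection pattern described here boils down to comparing a directory's ceph.dir.rctime against the time of the last successful sync. A rough sketch with the libcephfs C API (the helper name and last_sync value are illustrative; cmount is an already mounted libcephfs handle):

#include <cephfs/libcephfs.h>
#include <cstdlib>

// Decide whether a subtree needs re-scanning by comparing its recursive
// ctime with the last successful sync time (seconds since the epoch).
bool subtree_changed_since(struct ceph_mount_info *cmount,
                           const char *dir, double last_sync)
{
  char buf[64] = {0};
  // ceph.dir.rctime is reported as "<seconds>.<nanoseconds>"
  int n = ceph_getxattr(cmount, dir, "ceph.dir.rctime", buf, sizeof(buf) - 1);
  if (n <= 0)
    return true;                // on error, fall back to a full scan
  return atof(buf) > last_sync; // second granularity is enough here
}

A future rctime makes this test return true forever, which is exactly the full-tree rescan problem described above.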

stale bot removed the stale label Jun 23, 2021
@vshankar requested a review from mchangir June 23, 2021 09:47
@vshankar (Contributor)

> @vshankar @batrick at CERN we run several iterations of ownCloud file sharing with a custom storage backend, and we are trying to create an ownCloud module that uses a Ceph backend with CephFS. In our clusters some files have future mtimes, which has affected the recursive ctimes we use to detect which files have changed and need to be synchronised. Because of the future rctimes we have to check the whole tree for changes (a huge performance impact). Can you have a look at this, please?

@thmour -- @mchangir is making really nice progress on fixing recursive stats in CephFS (especially for snapshots) -- requesting his review.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dajrivera

I would also like to put in my vote for this to be merged. Currently I cannot use rctime as some of the directories have rctimes ~70 years in the future.

stale bot commented Jan 9, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

stale bot added the stale label Jan 9, 2022
@thmour (Author) commented Jan 10, 2022

@mchangir can you check this out? We need it for CephFS mirroring and for a project where we use Ceph as a backend for ownCloud. Some of our clusters have future rctimes, so we can't use these file synchronisation tools.

stale bot removed the stale label Jan 10, 2022
@vshankar (Contributor)

> @mchangir can you check this out? We need it for CephFS mirroring and for a project where we use Ceph as a backend for ownCloud. Some of our clusters have future rctimes, so we can't use these file synchronisation tools.

I'm mostly ok with this change as it provides a way to fix rctime inconsistencies for the user (note that fixing rctime esp. for snapshots is going to be a non-trivial effort).

I'll let @mchangir and @batrick review the details of the change though.

@vshankar (Contributor)

> @mchangir can you check this out? We need it for CephFS mirroring and for a project where we use Ceph as a backend for ownCloud. Some of our clusters have future rctimes, so we can't use these file synchronisation tools.

@thmour Side note - CephFS supports asynchronous mirroring of snapshots via the cephfs-mirror tool (available from the Ceph Pacific release onward).

@mchangir (Contributor) left a comment

LGTM
@vshankar fyi

Just a small question to @thmour
If rctime is updated manually, that rctime has to bubble up to the root dir or snap-root dir appropriately.
I don't see that happening in handle_set_vxattr()

@gregsfortytwo (Member) left a comment

Editing rctimes seems like a reasonable solution to me, but two issues:

  1. The wrong lock I commented on.
  2. Do we need to worry about the security model? I guess maybe it's the same as accepting client ctimes on files? Which is interesting because client timestamps used to be the default everywhere but we've been gradually moving away from it... :/

src/mds/Server.cc (outdated)
src/mds/Locker.cc
src/mds/Locker.cc
src/mds/MDCache.cc
@@ -5823,6 +5823,34 @@ void Server::handle_set_vxattr(MDRequestRef& mdr, CInode *cur)
    auto pi = cur->project_inode(mdr);
    cur->setxattr_ephemeral_dist(val);
    pip = pi.inode.get();
  } else if (name == "ceph.dir.rctime") {
    if (!cur->is_dir()) {
(Member)

How do you update an inode's rctime? You could fix all of the rctimes but none of the inodes?

@vshankar (Contributor)

> Recursive stats need to be maintained inside the MDS, or I'd expect all kinds of things to break when they later do get pushed around by changes. We shouldn't be twiddling them in user-accessible calls in ways which prevent that reconciliation from happening; @mchangir's next comment suggests this PR is behaving right so I think we're okay, though.

OK - nice. I was thinking about introducing more changes to this PR for that, which is unneeded - so +1

    }
    utime_t val(sec, nsec);

    if (!xlock_nestlock(mdr, cur))
(Contributor)

Fails with the following crash here: https://pulpito.ceph.com/vshankar-2022-02-05_17:27:49-fs-wip-vshankar-testing-20220201-113815-testing-default-smithi/6663067/

/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10531-g09fc2e9e/rpm/el8/BUILD/ceph-17.0.0-10531-g09fc2e9e/src/mds/Locker.cc: 4782: ceph_abort_msg("abort() called")

 ceph version 17.0.0-10531-g09fc2e9e (09fc2e9e470435f1e6dbf462a51853391f4a4cc0) quincy (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f83dd2ec91f]
 2: (Locker::simple_lock(SimpleLock*, bool*)+0x588) [0x560d6e1a4108]
 3: (Locker::xlock_start(SimpleLock*, boost::intrusive_ptr<MDRequestImpl>&)+0x380) [0x560d6e1a5dd0]
 4: (Locker::acquire_locks(boost::intrusive_ptr<MDRequestImpl>&, MutationImpl::LockOpVec&, CInode*, bool)+0x24a0) [0x560d6e1bb8c0]
 5: (Server::xlock_nestlock(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0xd2) [0x560d6e00fab2]
 6: (Server::handle_set_vxattr(boost::intrusive_ptr<MDRequestImpl>&, CInode*)+0x14d4) [0x560d6e036364]
 7: (Server::handle_client_setxattr(boost::intrusive_ptr<MDRequestImpl>&)+0xcd) [0x560d6e03979d]
 8: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xc7b) [0x560d6e0600ab]
 9: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x566) [0x560d6e06bda6]
 10: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x133) [0x560d6e074fe3]
 11: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0xd3c) [0x560d6dfcb74c]
 12: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7cb) [0x560d6dfce39b]
 13: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x560d6dfce9bc]
 14: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x560d6dfbd8b8]
 15: (DispatchQueue::entry()+0x14fa) [0x7f83dd57240a]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f83dd628891]
 17: /lib64/libpthread.so.0(+0x814a) [0x7f83dc2c614a]
 18: clone()

I think this needs to be a wrlock rather than an xlock.

Also, BTW, as per the comment here(*), it seems we may want to acquire some additional locks? @batrick @gregsfortytwo What is dirlock? It seems to me like it is something that existed and got removed?

(Member)

> Also, BTW, as per the comment here(*), it seems we may want to acquire some additional locks? @batrick @gregsfortytwo What is dirlock? It seems to me like it is something that existed and got removed?

I ran git log -G dirlock -- src/mds/

and got

commit fca6c27
Author: Sage Weil sweil@redhat.com
Date: Thu Dec 18 10:36:47 2008 -0800

mds: kill the dirlock

LOL
sigh

So it looks like Locker::xlock_start() checks the current lock state and calls either simple_xlock() or simple_lock() based on that.

if (lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_XLOCKDONE) {
  mut->start_locking(lock);
  simple_xlock(lock);
} else {
  simple_lock(lock);
}

And we ended up in the simple_lock branch and it aborted because it doesn't recognize the current state when we get there. I really don't know why that might have happened — it might be that we just need the wrlock, but it also might be that we need more state guards in place elsewhere whose semantics I don't have in my head any more.

(Contributor)

Yeah, that state isn't valid for a scatterlock.

For the lock state machine in general, we would need an extensive design write-up covering each state and its nuances to really make sense of which states to add, and that isn't going to be straightforward. In the interest of time, switching to a wrlock might just be OK.
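
To make that concrete, switching the helper from an xlock to a wrlock on the inode's nestlock might look roughly like the sketch below. This is only an illustration against the interfaces visible in the backtrace above (MutationImpl::LockOpVec, Locker::acquire_locks); the helper name is hypothetical and this is not the PR's actual diff:

// Hypothetical helper, not the PR's code: take a wrlock on the nestlock
// instead of an xlock, so handle_set_vxattr() does not drive the scatterlock
// through the state transition that aborted above.
bool Server::wrlock_nestlock(MDRequestRef& mdr, CInode *cur)
{
  MutationImpl::LockOpVec lov;
  lov.add_wrlock(&cur->nestlock);   // instead of lov.add_xlock(&cur->nestlock)
  return mds->locker->acquire_locks(mdr, lov);
}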

It has been observed that clients create files with invalid ctimes on CephFS.
Even if those files are fixed, the parent directories up to the root keep their rctime set to an invalid value.
Make rctime modifiable so sysadmins can fix those invalid rctimes recursively (bottom-up).

Signed-off-by: Theofilos Mouratidis <t.mour@cern.ch>
@vshankar (Contributor)

Will run it through testing this week. Thanks, @thmour.

@vshankar (Contributor) commented Mar 2, 2022

Setting this xattr from the kclient fails since it's considered a read-only attribute - https://github.com/ceph/ceph-client/blob/for-linus/fs/ceph/xattr.c#L365

Test failure here - https://pulpito.ceph.com/vshankar-2022-02-28_04:32:15-fs-wip-vshankar-testing-20220226-211550-testing-default-smithi/6710176/

This would require the kclient to allow ceph.dir.rctime to be modified (this PR allows the user-space driver to set this xattr).

@jtlayton ^^

@mchangir (Contributor) commented Mar 2, 2022

@jtlayton If my getvxattr kclient PR is merged into the testing kernel, then this should have just worked?
But my userland PR is not merged yet.

@jtlayton (Contributor) commented Mar 2, 2022

> @jtlayton If my getvxattr kclient PR is merged into the testing kernel, then this should have just worked? But my userland PR is not merged yet.

No, I don't think so. This doesn't rely on GETVXATTR. The problem is that the rstats xattrs are marked READONLY so they aren't allowed to be changed currently. You'd need to add support for setting them to the kclient.

@vshankar (Contributor) commented Mar 2, 2022

>> @jtlayton If my getvxattr kclient PR is merged into the testing kernel, then this should have just worked? But my userland PR is not merged yet.
>
> No, I don't think so. This doesn't rely on GETVXATTR. The problem is that the rstats xattrs are marked READONLY so they aren't allowed to be changed currently. You'd need to add support for setting them to the kclient.

I'll send a fix for that.

    }
  }

  if (changed) {
    pi->ctime = std::max(pi->ctime, ceph_clock_now());
(Member)

The ctime change here seems buggy. The client could cache the ctime change, for example if the Fx or Ax caps are issued. What if the cached ctime is flushed to the MDS one minute later? The ctime here would then always be bumped forward by one minute?

    if (new_mtime > pi.inode->rstat.rctime)
      pi.inode->rstat.rctime = new_mtime;
  }
  pi.inode->ctime = ceph_clock_now();

@lxbsz (Member) commented Mar 17, 2022

Yeah, and only recovery will set the new_size; the mtime will also be read from the data pool objects. I don't see any case where a user-modified mtime could affect the ctime here.

@djgalloway changed the base branch from master to main July 7, 2022 00:00
github-actions bot commented Sep 5, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

github-actions bot removed the stale label Oct 28, 2022
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

github-actions bot added the stale label Dec 27, 2022
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

github-actions bot closed this Jan 26, 2023