
mds: handle rctime updates for snaps #42894

Closed

Conversation

mchangir
Contributor

Conditionally update snap rctime only if snap btime is greater than new
rctime.

If snapshots are taken frequently while lazyio is in use, the rctime may
not be updated for the snapshot whose capabilities are flushed by the
client, but for a subsequent snapshot. This is because this patch
assumes that all relevant data writes have been persisted to the OSDs
and caps flushed to the MDS before the snapshot is taken.

To avoid skipping a snapshot or two for correct rctime updates to
snapshot directories, the client should wait for at least twice the
client_oc_max_dirty_age duration before creating the snapshot dir.

Fixes: https://tracker.ceph.com/issues/50238
Signed-off-by: Milind Changire mchangir@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@github-actions github-actions bot added the cephfs Ceph File System label Aug 23, 2021
@jtlayton
Contributor

To avoid skipping a snapshot or two for correct rctime updates to
snapshot directories, the client should wait for at least twice the
client_oc_max_dirty_age duration before creating the snapshot dir.

Is that enforced in any way? What happens if you don't wait? Does this apply to the kclient too?

pin->dirty_old_rstats.insert(last);
} else {
break;
}
Contributor


This indentation looks really weird.

Contributor Author

@mchangir mchangir Aug 23, 2021


That's probably how the TAB character is rendered by the web code-display component;
things look okay in the actual code (viewed in the Vim editor).

@mchangir
Contributor Author

To avoid skipping a snapshot or two for correct rctime updates to
snapshot directories, the client should wait for at least twice the
client_oc_max_dirty_age duration before creating the snapshot dir.

Is that enforced in any way? What happens if you don't wait? Does this apply to the kclient too?

The waiting period is not enforced by the MDS in any way.
The waiting period applies to the kclient as well.
The waiting period is a recommendation to the client (end user), or rather to the entity that initiates the mkdir to create a snap.
As mentioned in the commit message, a designated snap might not get updated with the latest rstats as expected and might reflect an old rctime;
the data does get written to the correct snap on the OSD side.

Since snap creation and the data/metadata flush are asynchronous, it's difficult to uniformly enforce the rstat updates across the various rstat update scenarios.
The primary idea is to treat snap creation and the data/metadata update as synchronous, and to enforce rstat updates as if the data were flushed completely, followed by the final mkdir metadata operation for snap creation. Under that condition, you would never have an rctime greater than the snap btime.

So, this patch updates snap rstat only for snaps whose btime is greater than the impending rctime update.

@stale

stale bot commented Jan 9, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow up on this pull request to identify what steps the author should take to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jan 9, 2022
@vshankar vshankar closed this Feb 2, 2022
@vshankar
Contributor

vshankar commented Feb 2, 2022

Fixing rctime is probably going to take a redesign of how recursive stats are tracked in the MDS.

Until that gets prioritized, designed, and implemented, users can set the (correct) rctime as a workaround. This is tracked by #37938

@gregsfortytwo
Member

Fixing rctime is probably going to take a redesign of how recursive stats are tracked in the MDS.

Till that gets prioritized, designed and implemented, users can set (correct) rctime as a workaround. This is tracked by #37938

Hmm, does that PR work on snapshots? I would have thought not since users are generally prevented from writing updates in to them.

@vshankar
Contributor

vshankar commented Feb 9, 2022

Hmm, does that PR work on snapshots? I would have thought not since users are generally prevented from writing updates in to them.

PR #37938 works on non-snapshotted inodes.

This PR was aimed at fixing issues with rctime in general. @mchangir IIRC, you ran into rctime issues with non-snapshotted inodes too?
