rbd-mirror: improved replication statistics #34408

Merged
merged 8 commits into ceph:master from wip-44727 on Apr 9, 2020

Conversation


@dillaman (Author) commented Apr 3, 2020

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


Jason Dillaman added 7 commits April 3, 2020 13:14
The initial version mimics the existing ProgressContext callback
interface. Later commits will add additional deep-copy unique methods.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
These simple stats will be utilized by rbd-mirror to compute throughput
metrics for snapshot-based mirroring.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
When metrics are incorporated, there might not be a forced status update
if no new data is available to replicate. However, we will want the metrics
to decrease over time.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The free-form journal replay status description is now JSON-encoded. The
"master"/"mirror" designators have been changed to "primary"/"non_primary"
to better align with RBD terminology.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The mirror image status for replaying journal-based images now includes
bytes and entries per second in addition to an estimated number of seconds
until the image is fully synced.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This will make it cleaner and easier to add additional data fields
to the existing JSON replaying status.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The mirror image status for replaying snapshot-based images now includes
bytes per second and per snapshot, in addition to an estimated number of
seconds until the image is fully synced.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
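For context, a minimal sketch (hypothetical names, not the PR's actual code) of how per-second replication metrics like those described in the commits above could be derived from two sampled counters taken at consecutive status updates:

#include <chrono>
#include <cstdint>

// Illustrative sketch only. Two samples of monotonically increasing replay
// counters are turned into bytes/entries per second for the mirror status.
struct ReplayCounters {
  std::chrono::steady_clock::time_point time;
  uint64_t bytes_replayed = 0;
  uint64_t entries_replayed = 0;
};

struct ReplayRates {
  double bytes_per_second = 0;
  double entries_per_second = 0;
};

ReplayRates compute_rates(const ReplayCounters& prev,
                          const ReplayCounters& curr) {
  ReplayRates rates;
  std::chrono::duration<double> elapsed = curr.time - prev.time;
  if (elapsed.count() <= 0) {
    return rates;  // no elapsed time yet -- report zero throughput
  }
  rates.bytes_per_second =
      (curr.bytes_replayed - prev.bytes_replayed) / elapsed.count();
  rates.entries_per_second =
      (curr.entries_replayed - prev.entries_replayed) / elapsed.count();
  return rates;
}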

if (m_entries_behind_master > 0 && entries_per_second > 0) {
  auto seconds_until_synced = round_to_two_places(
    m_entries_behind_master / entries_per_second);
@trociny (Contributor) commented:

Not sure if we want to improve this, but I think it will be an underestimate if the primary image is actively writing at that time. If we had something like new_journal_entries_per_second, then for the case new_journal_entries_per_second << entries_per_second the formula could be

seconds_until_synced = entries_behind_master / entries_per_second * (1 + new_journal_entries_per_second / entries_per_second)

For larger new_journal_entries_per_second it would need to include additional terms, and if new_journal_entries_per_second >= entries_per_second it would simply mean that the sync never completes.

Alternatively, I think we could estimate seconds_until_synced by measuring the rate of change of entries_behind_master. If it is growing, the sync will never complete. If it is decreasing, seconds_until_synced = entries_behind_master / rate.
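A minimal sketch of that rate-based alternative (hypothetical names, not the PR's code): the estimate comes from how quickly entries_behind_master shrinks between two status samples.

#include <chrono>
#include <cstdint>
#include <optional>

// Illustrative sketch only. Estimates seconds until synced from the observed
// rate of change of entries_behind_master between two samples; returns
// std::nullopt when the backlog is growing (the sync would never complete).
struct BacklogSample {
  std::chrono::steady_clock::time_point time;
  uint64_t entries_behind_master = 0;
};

std::optional<double> estimate_seconds_until_synced(const BacklogSample& prev,
                                                    const BacklogSample& curr) {
  std::chrono::duration<double> elapsed = curr.time - prev.time;
  if (elapsed.count() <= 0) {
    return std::nullopt;
  }
  // Positive drain rate means the backlog is shrinking.
  double drain_rate =
      (static_cast<double>(prev.entries_behind_master) -
       static_cast<double>(curr.entries_behind_master)) / elapsed.count();
  if (drain_rate <= 0) {
    return std::nullopt;  // growing or flat backlog -- never syncs at this rate
  }
  return curr.entries_behind_master / drain_rate;
}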

@dillaman (Author) replied:

I thought about that and had put something in initially to track it, but the issue is that it's going to be a highly inaccurate stat since we only discover new master entries every 30 seconds or so. Since we don't get the actual timestamp from the master position commit, we just have to guess that it's X entries per 30 seconds.

auto pending_bytes = bytes_per_snapshot * m_pending_snapshots;
if (bytes_per_second > 0 && m_pending_snapshots > 0) {
  auto seconds_until_synced = round_to_two_places(
    pending_bytes / bytes_per_second);
@trociny (Contributor) commented:

The same here: this estimate does not include new bytes that are going to be added to the image during the sync. If we had something like new_snapshots_per_second, it could be:

seconds_until_synced = bytes_per_snapshot * pending_snapshots / bytes_per_second * (1 + new_snapshots_per_second * bytes_per_snapshot / bytes_per_second)
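A rough sketch of that adjusted estimate (hypothetical names; it only restates the formula above and is not the PR's code):

#include <cstdint>

// Illustrative sketch only. Takes the base ETA from the pending snapshot
// backlog and inflates it by the rate at which new snapshot bytes are expected
// to arrive while the sync is running (first-order correction, valid only
// while new_snapshots_per_second * bytes_per_snapshot << bytes_per_second).
double estimate_seconds_until_synced(double bytes_per_snapshot,
                                     uint64_t pending_snapshots,
                                     double bytes_per_second,
                                     double new_snapshots_per_second) {
  double pending_bytes = bytes_per_snapshot * pending_snapshots;
  double base_eta = pending_bytes / bytes_per_second;
  double incoming_ratio =
      new_snapshots_per_second * bytes_per_snapshot / bytes_per_second;
  return base_eta * (1 + incoming_ratio);
}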

@dillaman (Author) replied Apr 6, 2020:

If we can measure new snapshots in "snapshots per second", I think we have a problem. 😉

Once a new snapshot is added, the stats will update after the replayer rescans the image, since we now track how many snapshots we need to sync plus the average number of bytes we copy per snapshot.

@trociny (Contributor) replied:

Sure, "snapshots per second" is expected to be a float less than 1, e.g. if snapshots are created hourly it would be 1 / 3600. Anyway, thinking more about it, I tend to agree there is no need to overcomplicate this.

@trociny (Contributor) left a review comment:

LGTM

@trociny (Contributor) commented Apr 8, 2020

@dillaman Running this on teuthology, I am observing rbd-mirror crashes [1], though I don't see how they could be related to this PR.

For that run I had 3 crashes, with the same backtrace. Here is one case:

/ceph/teuthology-archive/trociny-2020-04-07_18:48:34-rbd-wip-mgolub-testing-distro-basic-smithi/4932067/remote/smithi057/log/cluster1-client.mirror.1.30307.log.gz

2020-04-08T07:31:44.943+0000 7f68941a1700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-432-g2e5188a/rpm/el8/BUILD/ceph-16.0.0-432-g2e5188a/src/tools/rbd_mirror/InstanceWatcher.cc: In function 'rbd::mirror::InstanceWatcher<ImageCtxT>::~InstanceWatcher() [with ImageCtxT = librbd::ImageCtx]' thread 7f68941a1700 time 2020-04-08T07:31:44.944409+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-432-g2e5188a/rpm/el8/BUILD/ceph-16.0.0-432-g2e5188a/src/tools/rbd_mirror/InstanceWatcher.cc: 340: FAILED ceph_assert(m_requests.empty())

 ceph version 16.0.0-432-g2e5188a (2e5188a30ff76399e4f30f0ad2286465fb96e8cb) octopus (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f689a0ed030]
 2: (()+0x27b24a) [0x7f689a0ed24a]
 3: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0x145) [0x56262a01b455]
 4: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0xd) [0x56262a01b49d]
 5: (rbd::mirror::NamespaceReplayer<librbd::ImageCtx>::handle_shut_down_instance_watcher(int)+0x85) [0x562629ff3a45]
 6: (ThreadPool::PointerWQ<Context>::_void_process(void*, ThreadPool::TPHandle&)+0x148) [0x562629fb3148]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xe64) [0x7f689a1d9ea4]
 8: (ThreadPool::WorkThread::entry()+0x15) [0x7f689a1da705]
 9: (()+0x82de) [0x7f689952a2de]
 10: (clone()+0x43) [0x7f6897a99133]

In the log I traced this pending request to request_id=6:

2020-04-08T07:19:09.595+0000 7f9348484700 10 rbd::mirror::InstanceWatcher: 0x555cc14f0380 prepare_request: instance_id=4435, request_id=6
2020-04-08T07:19:09.595+0000 7f9348484700 10 rbd::mirror::InstanceWatcher: 0x555cc14f0380 handle_sync_start: instance_id=4435, sync_id=11491b28a49a
2020-04-08T07:19:09.595+0000 7f935549e700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x555cc44cbec0 finish: r=0
2020-04-08T07:19:09.595+0000 7f935549e700 10 rbd::mirror::InstanceWatcher: C_SyncRequest: 0x555cc46f8b90 finish: r=0
2020-04-08T07:19:09.595+0000 7f935549e700 10 rbd::mirror::InstanceWatcher: 0x555cc14f0380 handle_notify_sync_request: sync_id=11491b28a49a, r=0
2020-04-08T07:19:09.595+0000 7f935549e700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555cc724ec00 handle_request_sync: r=0
2020-04-08T07:19:09.595+0000 7f935549e700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555cc724ec00 copy_image: remote_snap_id_start=74, remote_snap_id_end=76, local_snap_id_start=77, last_copied_object_number=0, snap_seqs={75=79,76=18446744073709551614}

And for this replayer it seems to have gotten stuck in an unlink_peer request:

2020-04-08T07:19:09.604+0000 7f932f452700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x555cc724ec00 unlink_peer: remote_snap_id=74

Unfortunately I didn't find logs for the remote context to look deeper there.

I looked at another case: /ceph/teuthology-archive/trociny-2020-04-07_18:48:34-rbd-wip-mgolub-testing-distro-basic-smithi/4932031/remote/smithi079/log/cluster1-client.mirror.1.15789.log.gz, and that time it also got stuck on unlink_peer (pending request_id=13, snapshot::Replayer: 0x55f6ac036f00).

[1] http://pulpito.ceph.com/trociny-2020-04-07_18:48:34-rbd-wip-mgolub-testing-distro-basic-smithi/

@dillaman (Author) commented Apr 8, 2020

That is something I have fixed in my in-progress work. There is a missing lock in Replayer::request_sync that can race with shutdown and cancellation of the sync request. I can pull that one-liner out into a new PR or into this PR to unstick testing.
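For illustration, the general shape of that kind of guard (a minimal standalone sketch with hypothetical names, not the actual one-line fix):

#include <mutex>

// Illustrative sketch only. Request registration and shut-down serialize on
// the same mutex, so a shut-down either cancels an already-registered request
// or is observed before a new request is registered -- they cannot race.
class SyncRequester {
 public:
  void request_sync() {
    std::lock_guard<std::mutex> locker{m_lock};
    if (m_shut_down) {
      return;  // shutting down; do not register a new sync request
    }
    ++m_pending_requests;  // stands in for registering with the InstanceWatcher
  }

  void shut_down() {
    std::lock_guard<std::mutex> locker{m_lock};
    m_shut_down = true;
    m_pending_requests = 0;  // stands in for canceling pending sync requests
  }

 private:
  std::mutex m_lock;
  bool m_shut_down = false;
  unsigned m_pending_requests = 0;
};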

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
@dillaman (Author) commented Apr 8, 2020

@trociny I pushed the sync race fix as the last commit.

@trociny merged commit 978a2e3 into ceph:master on Apr 9, 2020
@dillaman deleted the wip-44727 branch on April 9, 2020 at 11:26
@caisan commented Jul 9, 2020

Hi, are there any plans for backporting this to the N (Nautilus) or M (Mimic) releases?

@trociny (Contributor) commented Jul 9, 2020

There is a plan to backport it to Nautilus. See https://tracker.ceph.com/issues/44727 and related tickets.

@caisan commented Jul 14, 2020

> There is a plan to backport it to Nautilus. See https://tracker.ceph.com/issues/44727 and related.

thanks.
