
rgw: read incremental metalog from master cluster based on truncate variable #36202

Closed
wants to merge 2 commits

Conversation

moningchao
Contributor

When the log entry in the meta.log object of the secondary cluster is empty,
the value of max_marker is also empty, which cannot satisfy the requirement that
mdlog_marker <= max_marker. As a result, the secondary cluster cannot fetch
new log entries from the master cluster and loops indefinitely; in the end, the
secondary cluster's metadata cannot catch up with the master cluster. When truncate
is false, it means that the secondary cluster's meta.log is empty and we can read
more from the master cluster.
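
For context, the stuck state described above can be observed on the secondary zone with
standard radosgw-admin subcommands; the commands exist as written, but the behaviour noted
in the comments is only an illustration of the symptom, not captured output:

# Sync status on the secondary keeps reporting metadata shards as behind
# and never catches up, even though the master has new entries.
radosgw-admin metadata sync status

# Per-shard metadata log state; for the affected shards the marker
# stays empty/unchanged across runs.
radosgw-admin mdlog status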

Fixes: https://tracker.ceph.com/issues/46563

Signed-off-by: gengjichao <gengjichao@jd.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@tchaikov tchaikov added the rgw label Jul 20, 2020
@moningchao
Contributor Author

@cbodley Can you help me review the code?

@stale

stale bot commented Nov 7, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Nov 7, 2020
@frittentheke

@cbodley I just commented in https://tracker.ceph.com/issues/46563 as I apparently hit this very issue with a current Ceph Octopus release. Any chance for this proposed fix to get merged? Any other workaround?

@stale stale bot removed the stale label Nov 4, 2021
@frittentheke

frittentheke commented Nov 5, 2021

@moningchao could you maybe take a look at the failing checks and rebase your PR onto master?

Edit: I triggered a retest to see what is actually still failing.

@frittentheke

jenkins retest this please

@moningchao
Contributor Author

@moningchao could you maybe take a look at the failing checks and rebase your PR onto master?

Edit: I triggered a retest to see what is actually still failing.

rebase done

@frittentheke

@cbodley @dang could you maybe kindly take a look at this PR? I don't know whether I could have assigned this or raised awareness somehow.

But I am able to reproduce this issue over and over with a rolling restart of radosgw instances on Ceph Octopus. I usually end up with one metadata shard being behind.
The (only) fix is then to stop all the secondary instances and to issue a metadata sync init and restart them. See corresponding issue: https://tracker.ceph.com/issues/46563
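
As a sketch of that workaround (the systemd unit name is a placeholder for whatever your
deployment uses to run radosgw on the secondary zone hosts):

# Stop all radosgw instances of the secondary zone.
systemctl stop ceph-radosgw@rgw.<instance-name>

# Re-initialize the metadata sync state of the secondary zone.
radosgw-admin metadata sync init

# Start the radosgw instances again so a full metadata sync runs and catches up.
systemctl start ceph-radosgw@rgw.<instance-name>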

@frittentheke

@mattbenjamin @cbodley I just observed the condition of some metadata shards being stuck after restarts of RADOSGW.
Could you kindly take a look at this? I am just hoping this PR fixes the issue for good, as otherwise we would need to dig further into it. I am also puzzled that only a few people have observed this issue, given that we are able to provoke it quite reliably by restarting RADOSGW instances of the zonemaster.

@mattbenjamin
Contributor

thanks, @frittentheke ; looking for feedback from @cbodley

@mattbenjamin
Contributor

or perhaps @adamemerson

@cbodley
Contributor

cbodley commented Jan 4, 2022

i still haven't been able to reproduce this one on master. if you're only hitting this on octopus, it may be because https://tracker.ceph.com/issues/51784 hasn't been backported

@frittentheke

i still haven't been able to reproduce this one on master. if you're only hitting this on octopus, it may be because https://tracker.ceph.com/issues/51784 hasn't been backported

Thanks for your reply, @cbodley. Is there any additional info or debugging I should gather to determine what is actually the issue with the stuck metadata replication, i.e. which of the two issues we are actually hitting?

As for reproducing the issue on your end: we are running the replication with a list of 3 RADOSGW hosts on each end, so this is NOT the usual RADOSGW->LB->RADOSGW setup. I know there is locking of shards happening, but maybe RADOSGW behaves differently when working with a list of endpoints for the other zone rather than with just a single (LB-backed) endpoint?
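
For reference, this is roughly the state I could collect when a shard gets stuck
(standard radosgw-admin subcommands; the debug levels on the last line are just
commonly used values, adjust as needed):

# Overall multisite sync state as seen from the secondary zone.
radosgw-admin sync status

# Per-shard metadata sync markers; stuck shards keep showing up as behind here.
radosgw-admin metadata sync status

# Metadata log state on the master zone, for comparison with the secondary.
radosgw-admin mdlog status

# Re-run the metadata sync with higher verbosity to capture the failing requests.
radosgw-admin metadata sync run --debug-rgw=20 --debug-ms=1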

@frittentheke

frittentheke commented Jan 26, 2022

@cbodley

  1. https://tracker.ceph.com/issues/51784 looks quite promising as a cause of our issue, since we do not use a loadbalancer to handle the replication connections coming from the other cluster's RGWs. Restarting an RGW instance is therefore not transparent; replication requests will actually fail while cluster A does a rolling restart of its RGW instances. Is there any chance this backport will happen for Octopus?
  2. What about the code of this PR? Is this not a valid fix?

@frittentheke

@cbodley sorry for being a PITA, but I just observed an "interesting" crash at https://tracker.ceph.com/issues/46563#note-9 which then left our multisite setup with a single stuck metadata shard not being synced. Maybe this helps narrow down the whole issue?

@mgugino-uipath

i still haven't been able to reproduce this one on master. if you're only hitting this on octopus, it may be because https://tracker.ceph.com/issues/51784 hasn't been backported

@cbodley

I encountered metadata never catching up, with the message 'metadata sync syncing', on v16.2.6. Initially, sync was failing due to the scale-down profile noted in v16.2.7's release notes. This was causing the default pool <zone-name>.rgw.otp to fail to be created, and metadata sync was getting a 400 on the metadata/otp key. You can see the 400 error by increasing the verbosity of the radosgw-admin command during metadata sync run, or in the rgw logs. You can also observe it off-cluster with the following command:
awscurl --service s3 'https://rgw1/admin/metadata/otp' --access_key <access> --secret_key <secret> -vvv

After using a variety of commands to set the default profile to 'scale-up' on both Ceph clusters, the otp pool was eventually created. Executing radosgw-admin metadata sync run caught the cluster up: all existing unsynced buckets (metadata) created on rgw1 were synced to rgw2 (data was already synced). Afterwards, buckets created on rgw2 were synced to rgw1. However, new buckets created on rgw1 after the manual sync never made it to rgw2. Restarting rgw2 corrected the issue.
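
A rough sketch of the checks and recovery steps described above (the pool name follows
the <zone-name>.rgw.otp pattern mentioned earlier):

# Verify that the otp pool now exists on both clusters.
ceph osd pool ls | grep rgw.otp

# Run a manual metadata sync on the secondary zone and confirm the shards
# report as caught up afterwards.
radosgw-admin metadata sync run
radosgw-admin metadata sync status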

I'm not sure if this PR would correct this set of circumstances (a prolonged problem with metadata syncing that persisted for a couple of days), but some form of it seems to be present as recently as 16.2.6.

Contributor

@adamemerson adamemerson left a comment


Clean this up to get rid of the merge commit, please.

@djgalloway djgalloway changed the base branch from master to main July 9, 2022 00:00
@github-actions

github-actions bot commented Sep 7, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 7, 2022
@cbodley
Contributor

cbodley commented Sep 7, 2022

thanks @moningchao, it looks like this merged with #46148

@cbodley cbodley closed this Sep 7, 2022