New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw: sync status compares the current master period #12907
Conversation
still incomplete, and seeing that metadata sync still always reports that it is behind by a few shards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for taking this on!
|
||
ret = sync.read_master_log_shards_info(&master_period, &master_shards_info); | ||
/* Set the master zonegroup as the remote */ | ||
RGWPeriod current_period(local_period); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the naming of local_period
and current_period
here is confusing. the local/current period is already initialized as RGWRados::current_period
, and can be queried with store->get_current_period_id()
. the master zone is aggressive about making sure other zones have the latest period, so it's safe to trust our local copy instead of querying it from the master's realm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, didn't know that, will modify
status.push_back(string("failed to fetch realm info from master: ") + cpp_strerror(-ret)); | ||
return; | ||
} | ||
ret = sync.read_master_log_shards_info(&master_period, &master_shards_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync_status.sync_info.period
records our sync progress in the master's mdlogs, which are split up by period. master_period
from sync.read_master_log_shards_info()
just tells new zones which period to start on for incremental sync, so we shouldn't consult master_period
here at all
the comparison below should just be if (sync_status.sync_info.period != store->get_current_period_id())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah earlier we were checking against the current_period
againstmaster_period
from the above call, which is why I thought the md sync status was always wrong (since that is the first period to start sync from). Will modify to get_current_period_id
.
looks good, thanks. can we drop the changes to |
This ensures that we get the current period in contrast to the admin log which gets the master's earliest period. Fixes: http://tracker.ceph.com/issues/18064 Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Dropped the other commits, now while the sync status no longer shows the period error, I'm often seeing that the metadata in secondary is always behind by one shard, realm 3b284993-df62-448a-acdd-02fc217b71bf (gold)
zonegroup 89cd2f99-0f4b-4c0d-86d1-9fc2c9071c45 (us)
zone 97ea1518-abe3-4c6b-acfd-2113929ac77a (us-west)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is behind on 1 shards
data sync source: b4c5323d-30a2-4a4f-b84e-59dac54f5ba8 (us-east-1)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source and this shard seems to be mdlog from the master's first period which was the zone user creation, though the zone user is created in the secondary, I'm not seeing a mdlog entry about this. |
okay, that's an issue with -int RGWRemoteMetaLog::read_master_log_shards_info(string *master_period, map<int, RGWMetadataLogInfo> *shards_info)
+int RGWRemoteMetaLog::read_master_log_shards_info(const string& period, map<int, RGWMetadataLogInfo> *shards_info)
{
if (store->is_meta_master()) {
return 0;
}
rgw_mdlog_info log_info;
int ret = read_log_info(&log_info);
if (ret < 0) {
return ret;
}
-
- *master_period = log_info.period;
- return run(new RGWReadRemoteMDLogInfoCR(&sync_env, log_info.period, log_info.num_shards, shards_info));
+ return run(new RGWReadRemoteMDLogInfoCR(&sync_env, period, log_info.num_shards, shards_info));
} |
I see, I thought there were other consumers of the master_period, looks like there isn't, in which case we can make this change |
This is needed for rgw admin's sync status or else we end up always publishing that we're behind since we are always checking against master's first period to sync from Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Synced state:
Syncing state
|
Also make the sync output look similar to the output of data sync Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
test this please |
Previously sync status compared master's oldest period against the current local period, leading to an error message. Fixing this by getting the current period from realm.
Fixes: http://tracker.ceph.com/issues/18064