
rgw: sync status compares the current master period #12907

Merged
merged 3 commits into ceph:master on Jan 18, 2017

Conversation

theanalyst
Member

Previously, sync status compared the master's oldest period against the current local period, leading to an error message. Fix this by getting the current period from the realm.

Fixes: http://tracker.ceph.com/issues/18064

@theanalyst
Member Author

Still incomplete; I'm also seeing that metadata sync still always reports being behind by a few shards.

Contributor

@cbodley cbodley left a comment


thanks for taking this on!


ret = sync.read_master_log_shards_info(&master_period, &master_shards_info);
/* Set the master zonegroup as the remote */
RGWPeriod current_period(local_period);
Contributor

the naming of local_period and current_period here is confusing. the local/current period is already initialized as RGWRados::current_period, and can be queried with store->get_current_period_id(). the master zone is aggressive about making sure other zones have the latest period, so it's safe to trust our local copy instead of querying it from the master's realm
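
A minimal sketch of what that simplification amounts to (illustrative only, not the merged code, and assuming the surrounding RGW objects such as store):

    // The period id the local zone already tracks can be read directly,
    // so the extra local_period/current_period objects are unnecessary here.
    const std::string current_period_id = store->get_current_period_id();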

Member Author

ah ok, didn't know that, will modify

status.push_back(string("failed to fetch realm info from master: ") + cpp_strerror(-ret));
return;
}
ret = sync.read_master_log_shards_info(&master_period, &master_shards_info);
Contributor

sync_status.sync_info.period records our sync progress in the master's mdlogs, which are split up by period. master_period from sync.read_master_log_shards_info() just tells new zones which period to start on for incremental sync, so we shouldn't consult master_period here at all

the comparison below should just be if (sync_status.sync_info.period != store->get_current_period_id())
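
A small standalone sketch of that check, using stand-in structs in place of the real RGW classes (only the sync_info.period field and the get_current_period_id() accessor quoted above are being mimicked):

    #include <iostream>
    #include <string>

    // Stand-ins that only mimic the fields involved in the suggested check.
    struct SyncInfo { std::string period; };   // mimics sync_status.sync_info.period
    struct Store {
      std::string current_period;              // mimics the locally cached current period
      const std::string& get_current_period_id() const { return current_period; }
    };

    int main() {
      SyncInfo sync_info{"period-A"};          // the period our mdlog sync position refers to
      Store store{"period-B"};                 // the zone's current period
      if (sync_info.period != store.get_current_period_id()) {
        std::cout << "metadata sync is still working through an older period\n";
      } else {
        std::cout << "metadata sync is on the current period\n";
      }
      return 0;
    }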

Member Author

yeah, earlier we were checking the current_period against master_period from the above call, which is why I thought the md sync status was always wrong (since that is the first period to start sync from). Will modify to use get_current_period_id.

@cbodley
Contributor

cbodley commented Jan 12, 2017

looks good, thanks. can we drop the changes to do_realm_get()?

This ensures that we get the current period, in contrast to the admin log,
which gets the master's earliest period.

Fixes: http://tracker.ceph.com/issues/18064
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@theanalyst
Member Author

Dropped the other commits. Now, while the sync status no longer shows the period error, I'm often seeing that the metadata on the secondary is behind by one shard:

          realm 3b284993-df62-448a-acdd-02fc217b71bf (gold)
      zonegroup 89cd2f99-0f4b-4c0d-86d1-9fc2c9071c45 (us)
           zone 97ea1518-abe3-4c6b-acfd-2113929ac77a (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
      data sync source: b4c5323d-30a2-4a4f-b84e-59dac54f5ba8 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

This shard seems to be the mdlog from the master's first period, which contained the zone user creation; though the zone user is created on the secondary, I'm not seeing an mdlog entry about this.

@cbodley
Contributor

cbodley commented Jan 13, 2017

and this shard seems to be mdlog from the master's first period

okay, that's an issue with RGWRemoteMetaLog::read_master_log_shards_info(). it's reading the rgw_mdlog_info from the master, then requesting the mdlog shards for rgw_mdlog_info::period, which is the master's oldest mdlog period. what we want is the shards for the current period - so we should either pass in our store->get_current_period_id() or an empty string (which the master will interpret to mean 'current period'). something like this?

-int RGWRemoteMetaLog::read_master_log_shards_info(string *master_period, map<int, RGWMetadataLogInfo> *shards_info)
+int RGWRemoteMetaLog::read_master_log_shards_info(const string& period, map<int, RGWMetadataLogInfo> *shards_info)
 {
   if (store->is_meta_master()) {
     return 0;
   }
 
   rgw_mdlog_info log_info;
   int ret = read_log_info(&log_info);
   if (ret < 0) {
     return ret;
   }
-
-  *master_period = log_info.period;
 
-  return run(new RGWReadRemoteMDLogInfoCR(&sync_env, log_info.period, log_info.num_shards, shards_info));
+  return run(new RGWReadRemoteMDLogInfoCR(&sync_env, period, log_info.num_shards, shards_info));
 }
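
Presumably the call site then changes along these lines (a sketch only, not the exact merged diff), passing the locally known current period id, or an empty string to mean the current period:

    -  ret = sync.read_master_log_shards_info(&master_period, &master_shards_info);
    +  ret = sync.read_master_log_shards_info(store->get_current_period_id(),
    +                                         &master_shards_info);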

@theanalyst
Member Author

I see, I thought there were other consumers of master_period, but it looks like there aren't, in which case we can make this change.

This is needed for rgw admin's sync status; otherwise we always end up
reporting that we're behind, since we are checking against the master's
first period to sync from.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@theanalyst
Member Author

Synced state:

          realm d01f13a6-45ad-45fb-8b0b-cd03a05382c8 (gold)
      zonegroup 6b1b12f5-051b-4c81-b6bf-b5f3ba82bba2 (us)
           zone 6ad3f4a2-afed-49e8-8eae-1c92b83c46c4 (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 56301478-7c86-4c66-ab96-e85bcaf6d6e6 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Syncing state:

      zonegroup 6b1b12f5-051b-4c81-b6bf-b5f3ba82bba2 (us)
           zone 6ad3f4a2-afed-49e8-8eae-1c92b83c46c4 (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 3 shards
                oldest incremental change not applied: 2017-01-13 16:24:39.0.337583s
      data sync source: 56301478-7c86-4c66-ab96-e85bcaf6d6e6 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Also make the sync output look similar to the output of data sync
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
@theanalyst
Member Author

test this please

@theanalyst theanalyst changed the title from "wip: rgw: sync status compares the current master period" to "rgw: sync status compares the current master period" on Jan 16, 2017
@cbodley cbodley merged commit a18c7e1 into ceph:master Jan 18, 2017