
jewel: rgw: multisite: sync status reports master is on a different period #13175

Merged
smithfarm merged 3 commits into ceph:jewel from wip-18684-jewel on Feb 1, 2017

Conversation

smithfarm (Contributor)

This ensures that we get the current period, in contrast to the admin log, which gets the master's earliest period.

Fixes: http://tracker.ceph.com/issues/18064
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
(cherry picked from commit 4ca18df)

This is needed for rgw admin's sync status; otherwise we always end up publishing that we're behind, since we are always checking against the master's first period to sync from.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
(cherry picked from commit 063c949)

Also make the sync output look similar to the output of data sync.

Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
(cherry picked from commit cc306c5)
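
For context, the warning these commits address appears in the metadata sync section of "radosgw-admin sync status" on a non-master zone. A minimal sketch of how to observe it (the period IDs below are placeholders, not real values):

# Run on a non-master zone. Before this fix, the status check compared against
# the master's first period rather than its current one, so the warning below
# showed up even when metadata was fully caught up:
radosgw-admin sync status
#   metadata sync syncing
#                 full sync: 0/64 shards
#                 master is on a different period: master_period=<master-id> local_period=<local-id>
#                 incremental sync: 64/64 shards
#                 metadata is caught up with master
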
@smithfarm smithfarm self-assigned this Jan 29, 2017
@smithfarm smithfarm added this to the jewel milestone Jan 29, 2017
@smithfarm smithfarm changed the title jewel: multisite: sync status reports master is on a different period jewel: rgw: multisite: sync status reports master is on a different period Jan 29, 2017
@smithfarm smithfarm merged commit f46c125 into ceph:jewel Feb 1, 2017
@smithfarm smithfarm deleted the wip-18684-jewel branch February 1, 2017 22:55
@smithfarm (Contributor, Author) commented:

(11:46:45 AM) smithfarm: owasserm: thanks. For jewel integration rgw, then, what it comes down to is verifying that these 6 valgrind failures are all libtcmalloc-related: http://pulpito.front.sepia.ceph.com/smithfarm-2017-01-31_12:35:14-rgw-wip-jewel-backports-distro-basic-smithi/
(11:46:58 AM) smithfarm: owasserm: I will do that now
(11:47:05 AM) owasserm: smithfarm, thanks
(11:47:33 AM) smithfarm: owasserm: and assuming they are tcmalloc related, you said I can directly merge all the rgw PRs? Or do you want me to ask you for review in the PRs first?
(11:47:53 AM) owasserm: smithfarm, yes you can merge them
(11:48:19 AM) smithfarm: ok, will merge and do at least one or two more rgw runs before passing 10.2.6 to QE

@dbiazus commented Mar 12, 2017:

After running 10.2.6, we are still reproducing the same issue when changing a secondary zone to the master zone.

Steps to reproduce:

On the secondary cluster:
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

At this point the secondary cluster is working correctly as the master.

After the failed (old master) cluster comes back up:

radosgw-admin period pull --url={url-to-master-zone-gateway} --access-key={access-key} --secret={secret}
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Now we remove the master flag from the secondary cluster:

radosgw-admin zone modify --rgw-zone={zone-name} --master=false
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Then, running "radosgw-admin sync status" on the secondary cluster, we get:

master is on a different period: master_period=0c31136b-f2bd-402a-a65d-ed03a1956683 local_period=abd599c5-36d4-492e-a2f5-2a10eb2b6a93
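
As a quick cross-check (a sketch only; compare the IDs by hand), the committed period on each cluster can be read directly:

# On the secondary cluster:
radosgw-admin period get-current
# On the restored (old master) cluster:
radosgw-admin period get-current
# If both clusters return the same "current_period", they have committed the
# same period and the warning above is a reporting problem; if they differ,
# the latest period commit has not reached one of them.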

Thanks!

@smithfarm (Contributor, Author) commented:

@dbiazus Are you saying that http://tracker.ceph.com/issues/18064 is still reproducible in Jewel 10.2.6?

@dbiazus commented Mar 12, 2017:

Yes, I'm able to reproduce the same behaviour in Jewel 10.2.6, even on a fresh install.

@smithfarm (Contributor, Author) commented:

@theanalyst Ping - see the comments above.

@dbiazus commented Apr 10, 2017:

I could also reproduce this in Kraken:
ceph version 11.2.0 (f223e27)


          realm 26184571-bb6f-4b7b-8c71-a0d3d7750090 (am)
      zonegroup c1eb737c-d9b0-4850-a951-e1f4f223ebec (us)
           zone 5f945856-f27a-4a33-80c9-0d7f226a4acc (ca-central-2)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period=69ab3325-f996-4fcb-a10e-2ce501c5b4ee local_period=ec726bd1-ca84-4db4-9a09-f53dc52499cc
                metadata is caught up with master
                incremental sync: 64/64 shards
      data sync source: c066f644-1c72-4b1f-8a6a-f6aff1237c09 (ca-central-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Best Regards

@theanalyst (Member) commented:

@dbiazus thanks, will check it out.

@cbodley (Contributor) commented Apr 10, 2017:

@dbiazus I'm guessing that your issues are related to http://tracker.ceph.com/issues/18639, which tracks some problems with metadata sync across master changes. If that's the case, then radosgw-admin sync status is correctly reporting that it's on an old period, and it's not due to this bug in radosgw-admin.

To verify, run the non-master gateway with --debug-rgw=20, wait a few minutes, then search for the last occurrence of "RGWMetaSyncCR on" in its log. If it says "RGWMetaSyncCR on period=X, next=Y", then you're experiencing the issues in http://tracker.ceph.com/issues/18639. If it says "RGWMetaSyncCR on current period=X", then you are actually reproducing this radosgw-admin bug in http://tracker.ceph.com/issues/18064.
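
A rough sketch of that check, assuming a systemd-managed gateway that logs to the default location under /var/log/ceph (adjust the instance name and log path to your deployment):

# 1. Enable verbose rgw logging on the non-master gateway, e.g. by adding
#    "debug rgw = 20" to its ceph.conf section, then restart it:
systemctl restart ceph-radosgw@*
# 2. Wait a few minutes, then look at the last RGWMetaSyncCR line in its log:
grep 'RGWMetaSyncCR on' /var/log/ceph/ceph-client.rgw.*.log | tail -n 1
#    "RGWMetaSyncCR on period=X, next=Y"  -> the metadata sync issues in tracker 18639
#    "RGWMetaSyncCR on current period=X"  -> the radosgw-admin bug in tracker 18064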

@dbiazus commented Apr 28, 2017:

Hey @cbodley, I'm a little confused here:

1. Running "radosgw-admin sync status" on the non-master zone, I got:
          realm c3278efc-56dc-4d1f-b8f2-0693400dddda (am)
      zonegroup 57ae1d4c-e653-4179-90bc-fdda7cde7baa (us)
           zone 97a1eea8-9c9e-4a91-a40d-cf0a7b659d06 (stage-ca-central-2)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period=36420dbb-f33b-475a-99cb-3ce0f41cbda7 local_period=6fdf4373-9873-4da1-9403-10f4f0578461
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 47cd13dc-629b-409d-9b20-746a529e273b (stage-ca-central-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
2. However, when I run "radosgw-admin period get-current", the current period is:
{
    "current_period": "36420dbb-f33b-475a-99cb-3ce0f41cbda7"
}
3. And the last occurrence of RGWMetaSyncCR tells me that the current period is actually "6fdf4373-9873-4da1-9403-10f4f0578461":
2017-04-28 16:56:45.748164 7f6e17fff700 20 cr:s=0x7f6e18039140:op=0x7f6e180387b0:26RGWReadSyncStatusCoroutine: operate()
2017-04-28 16:56:45.748167 7f6e17fff700 20 run: stack=0x7f6e18039140 is done
2017-04-28 16:56:45.748176 7f6e17fff700 20 rgw meta sync: run_sync(): sync
2017-04-28 16:56:45.748241 7f6e17fff700 20 cr:s=0x7f6e1800b090:op=0x7f6e180387b0:13RGWMetaSyncCR: operate()
2017-04-28 16:56:45.748259 7f6e17fff700 10 rgw meta sync: RGWMetaSyncCR on current period=6fdf4373-9873-4da1-9403-10f4f0578461

Any idea?

Thanks!
