rgwlc: permit lifecycle to reduce data conditionally in archive zone #46928
Conversation
nice test cases!
Force-pushed from a5af127 to eca1cf1 (Compare)
@soumyakoduri @dang could you check over the zipper part of this?
@@ -2994,6 +2994,11 @@ const std::string& RadosZone::get_realm_id()
  return store->svc()->zone->get_realm().get_id();
}

const std::string& RadosZone::get_tier_type()
{
  return store->svc()->zone->get_zone().tier_type;
How does this tier type relate to the one in RGWZoneGroupPlacementTier (exposed by PlacementTier)?
this zone tier_type specifies which multisite 'sync module' to run for that zone: archive, log, elasticsearch, cloud, pubsub, or rgw (default)
right, it's not the same, unfortunately, a bit confusing
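As a concrete illustration (not code from this PR), a caller could branch on the zone's sync-module tier type like the sketch below; is_archive_zone and the usage comment are assumed names for illustration, while get_tier_type() is the accessor added in the diff above:

#include <string>

// sketch only: decide whether ArchiveZone-specific lifecycle rules apply,
// based on the zone's sync-module tier type ("archive", "log",
// "elasticsearch", "cloud", "pubsub", or the default "rgw")
inline bool is_archive_zone(const std::string& tier_type) {
  return tier_type == "archive";
}

// usage sketch: if (is_archive_zone(zone->get_tier_type())) { /* run ArchiveZone-only rules */ }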
src/rgw/rgw_sal_motr.h (Outdated)
@@ -457,7 +457,7 @@ class MotrZone : public Zone {
  virtual const RGWAccessKey& get_system_key() { return zone_params->system_key; }
  virtual const std::string& get_realm_name() { return realm->get_name(); }
  virtual const std::string& get_realm_id() { return realm->get_id(); }

  virtual const std::string& get_tier_type() { return "fixme"; }
I have no problem (in principle) with returning a fixed string here, but maybe it should be the default tier type (whatever that is...) rather than "fixme"? @soumyakoduri what do you think?
ah
for dbstore, i see that DBZone has access to this via zone_public_config->tier_type, but i'm not sure that ever gets initialized
for now, I'd propose to return "rgw" rather than "fixme" :)
yes DBZone & MotrZone are just place-holders and are not properly initialized. Returning default value "rgw" seems fine to me (along with "fixme" comment maybe) for now.
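A minimal sketch of how a placeholder zone could return that default safely; PlaceholderZone and the body below are assumptions for illustration, not the patch itself (returning a string literal from a function declared const std::string& would bind to a temporary, so a function-local static is used):

#include <string>

// sketch only: a stand-in for the MotrZone/DBZone placeholders discussed above
struct PlaceholderZone {
  virtual ~PlaceholderZone() = default;

  // return the default sync-module tier type; the static avoids handing back
  // a dangling reference to a temporary built from a string literal
  virtual const std::string& get_tier_type() {
    static const std::string default_tier = "rgw"; // FIXME: placeholder, not properly initialized
    return default_tier;
  }
};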
auto lc_it = bci.attrs.find(RGW_ATTR_LC);
if (lc_it != bci.attrs.end()) {
  ldpp_dout(dpp, 20) << "set lc config for " << bci.info.bucket.name << dendl;
  ret = lc->set_bucket_config(bucket.get(), bci.attrs, nullptr);
IIUC, this will copy lc_config to all the zones, not just the archive zone. Is it intentional? i.e., can every zone now execute LC rules (provided the ArchiveZone filter is not applied)?
this has been a long-standing issue, tracked in https://tracker.ceph.com/issues/44268 and https://tracker.ceph.com/issues/55487
the current model, where only the primary zone runs lifecycle processing, works okay for the 'normal' disaster recovery use case where the primary and secondary zones have the same set of data. any expirations/transitions that happen on the primary would sync to the secondary, so we should get the same result as if we'd run lifecycle processing on both zones
however, with per-bucket replication in the picture, zones may not have the same data sets. for example, a bucket replication policy may specify that only a subset of objects (like those beginning with prefix 'foo') should replicate from the secondary zone to the primary. lifecycle processing on the primary zone would only run on that subset of objects and skip the rest. so in this case, we really do need to run lifecycle processing on every zone to get the expected result
had we fixed this issue earlier, we would have broken the archive zone feature because lifecycle could delete the archived object versions. so we're fixing both at the same time :)
yes, that's a better answer than what I had in mind. for full sync, it's probably not even as efficient to run lc everywhere, but it still feels more correct for symmetry. per-bucket replication does change things, though.
okay, thanks for confirming.
src/rgw/rgw_lc.h (Outdated)
}

  // Determine if we need AND tag when creating xml
  bool has_multi_condition() const {
-   if (obj_tags.count() > 1)
+   if (obj_tags.count() + int(has_prefix()) > 1) // Prefix is a member of Filter
shouldn't it include has_flags() as well? the ArchiveZone flag can be set even with just one obj_tag, as in this example:
https://github.com/ceph/ceph/pull/46928/files#diff-00cdaf79e8833ed31a361d881a37c7434678d84f916789f4e1bc5eab32b25427R50
ah, I think so
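A sketch of the adjusted check being discussed; FilterSketch and its members below are illustrative stand-ins (not Ceph's actual LCFilter definition), just to show the condition count including flags:

#include <cstdint>
#include <map>
#include <string>

// illustrative stand-in for the lifecycle filter in the snippet above
struct FilterSketch {
  std::map<std::string, std::string> obj_tags; // tag key/value pairs
  std::string prefix;
  uint32_t flags = 0;                          // extension flags, e.g. ArchiveZone

  bool has_prefix() const { return !prefix.empty(); }
  bool has_flags() const { return flags != 0; }

  // Determine if we need the <And> wrapper when generating xml: true when
  // more than one condition (tags, Prefix, extension flags) is present
  bool has_multi_condition() const {
    return int(obj_tags.size()) + int(has_prefix()) + int(has_flags()) > 1;
  }
};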
Force-pushed from 7e8bc6c to 81e9bbb (Compare)
@dang do you know what's going on here?
never mind, I need to change the signature of get_tier_type()
rgwlc: add uint32_t flags bitmap to LCFilter This is intended to support a concise set of extensions to S3 LifecycleConfiguration, initially, just a flag that indicates a rule is intended for execution on RGW ArchiveZone.
rgwlc: add machinery to define and recognize LCFilter flags Add a concept of filter flags to lifecycle filter rules, an RGW extension. The initial purpose of flags is to permit marking specific lifecycle rules as specific to an RGW archive zone, but other flags could be added in future.
rgwlc: add new unittest_rgw_lc to run internal checks, add a few valid and invalid lifecycle configuration xml parses for now. Fixes: https://tracker.ceph.com/issues/53361 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Suggested by Casey in review, this makes the XML prettier. also: fix filter parsing, remove unused code Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Lifecycle rules with the ArchiveZone flag must execute on archive zones, but must not execute on others. Fixes: https://tracker.ceph.com/issues/56440 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
The basic idea of this change is the same as the proposal by Ilsoo Byun <ilsoobyun@linecorp.com>, but some details have changed. The main differences are to use the existing RGWLC::set(remove)_bucket_config methods, and to use the RGWBucketInstanceMetadataHandler infrastructure to dispatch the corresponding calls. Thank you! Fixes: https://tracker.ceph.com/issues/44268 Related PR: ceph#33524 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Also updates the location of Prefix, which is supposed to *generate* as <Filter><Prefix/></Filter>, regardless of how we parsed it. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Found by Soumya Koduri <skoduri@redhat.com> Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Valid values are all small strings, often static. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
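To make the "uint32_t flags bitmap" idea from the first commit message concrete, a small illustrative sketch follows; the names are assumptions for illustration, not the PR's exact identifiers:

#include <cstdint>

// illustrative only: lifecycle-filter extension flags as a bitmap, with
// ArchiveZone as the first (and currently only) flag
enum class LCFlagSketch : uint32_t {
  none        = 0,
  ArchiveZone = 1u << 0,  // rule should execute only on an RGW archive zone
  // future extension flags would claim the remaining bits
};

inline bool flag_set(uint32_t flags, LCFlagSketch f) {
  return (flags & static_cast<uint32_t>(f)) != 0;
}

// usage sketch: a rule whose filter sets the ArchiveZone flag is skipped on
// zones whose tier type is not "archive" (per the commit message above,
// flagged rules must not execute on other zones)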
Force-pushed from fddcc17 to ff10064 (Compare)
I think this was a pass; however, 1) there are no specific s3-tests for this feature, and 2) this test run was generally suspicious: http://qa-proxy.ceph.com/teuthology/mbenjamin-2022-07-17_22:24:15-rgw-wip-rgwlc-azone-distro-default-smithi/6935342/teuthology.log
looks ok to merge. i've raised https://tracker.ceph.com/issues/56500 to urgent priority
i also opened https://tracker.ceph.com/issues/56609 to track the s3test failures, which appear to be timeouts from very long delays
  if (ret < 0) {
    ldpp_dout(dpp, 0) << __func__ << " failed to set lc config for "
                      << bci.info.bucket.name
                      << dendl;
    return ret;
  }

} else {
  ldpp_dout(dpp, 20) << "remove lc config for " << bci.info.bucket.name << dendl;
  ret = lc->remove_bucket_config(bucket.get(), bci.attrs);
  if (ret < 0) {
    ldpp_dout(dpp, 0) << __func__ << " failed to remove lc config for "
                      << bci.info.bucket.name
                      << dendl;
    return ret;
this error path seems to be causing the regression in metadata sync. i've opened https://tracker.ceph.com/issues/56997 with what we've learned so far
cc @yuvalif