
rgwlc: permit lifecycle to reduce data conditionally in archive zone #46928

Merged: 8 commits merged into ceph:main on Jul 19, 2022

Conversation

@mattbenjamin (Contributor) commented Jul 1, 2022

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@mattbenjamin self-assigned this Jul 1, 2022
@mattbenjamin changed the title from "Wip rgwlc azone" to "[DNM] permit lifecycle to reduce data conditionally in archive zone" Jul 1, 2022
@mattbenjamin changed the title from "[DNM] permit lifecycle to reduce data conditionally in archive zone" to "[DNM] rgwlc: permit lifecycle to reduce data conditionally in archive zone" Jul 1, 2022
@mattbenjamin requested a review from cbodley on July 1, 2022 13:05
@cbodley (Contributor) left a comment:

nice test cases!

src/rgw/rgw_lc_s3.cc (review thread, outdated; resolved)
src/test/rgw/test_rgw_lc.cc (review thread, outdated; resolved)
@mattbenjamin (Author) commented:

@soumyakoduri @dang could you check over the zipper part of this?

@@ -2994,6 +2994,11 @@ const std::string& RadosZone::get_realm_id()
   return store->svc()->zone->get_realm().get_id();
 }
+
+const std::string& RadosZone::get_tier_type()
+{
+  return store->svc()->zone->get_zone().tier_type;
+}
Contributor:
How does this tier type relate to the one in RGWZoneGroupPlacementTier (exposed by PlacementTier)?

Contributor:
this zone tier_type specifies which multisite 'sync module' to run for that zone: archive, log, elasticsearch, cloud, pubsub, or rgw (default)

@mattbenjamin (Author):
right, it's not the same, unfortunately, a bit confusing
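
For context, a minimal sketch of how lifecycle processing can gate rules on this zone tier type (the helper and its second branch are illustrative assumptions, not code from this PR):

    #include <string>

    // illustrative sketch: skip ArchiveZone-flagged rules outside archive
    // zones ("archive" selects the archive sync module; "rgw" is the
    // default tier type)
    static bool rule_applies_in_zone(const std::string& tier_type,
                                     bool rule_has_archive_flag)
    {
      const bool is_archive_zone = (tier_type == "archive");
      if (rule_has_archive_flag && !is_archive_zone) {
        return false; // flagged rules must not execute on non-archive zones
      }
      if (!rule_has_archive_flag && is_archive_zone) {
        return false; // assumption: unflagged rules don't reduce archived data
      }
      return true;
    }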

@@ -457,7 +457,7 @@ class MotrZone : public Zone {
virtual const RGWAccessKey& get_system_key() { return zone_params->system_key; }
virtual const std::string& get_realm_name() { return realm->get_name(); }
virtual const std::string& get_realm_id() { return realm->get_id(); }

virtual const std::string& get_tier_type() { return "fixme"; }
Contributor:
I have no problem (in principle) with returning a fixed string here, but maybe it should be the default tier type (whatever that is...) rather than "fixme"? @soumyakoduri what do you think?

@mattbenjamin (Author):
ah

Contributor:
for dbstore, i see that DBZone has access to this via zone_public_config->tier_type, but i'm not sure that ever gets initialized

@mattbenjamin (Author):
for now, I'd propose to return "rgw" rather than "fixme" :)

Contributor:
yes, DBZone & MotrZone are just placeholders and are not properly initialized. Returning the default value "rgw" (maybe along with a "fixme" comment) seems fine to me for now.
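
For reference, a minimal sketch of the placeholder being discussed (illustrative, not the merged diff). Note that returning a string literal through a const std::string& would dangle, so the sketch uses a static local; the signature change adopted later (see below) makes this moot:

    // sketch: return the default tier type "rgw" instead of "fixme";
    // the static local avoids returning a reference to a temporary
    // FIXME: MotrZone/DBZone do not yet initialize this from zone config
    virtual const std::string& get_tier_type() {
      static const std::string default_tier_type = "rgw";
      return default_tier_type;
    }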

auto lc_it = bci.attrs.find(RGW_ATTR_LC);
if (lc_it != bci.attrs.end()) {
ldpp_dout(dpp, 20) << "set lc config for " << bci.info.bucket.name << dendl;
ret = lc->set_bucket_config(bucket.get(), bci.attrs, nullptr);
Contributor:
IIUC, this will copy the lc config to all zones, not just the archive zone. Is that intentional? I.e., can every zone now execute LC rules (provided the ArchiveZone filter is not applied)?

Contributor:
this has been a long-standing issue, tracked in https://tracker.ceph.com/issues/44268 and https://tracker.ceph.com/issues/55487

the current model, where only the primary zone runs lifecycle processing, works okay for the 'normal' disaster recovery use case where the primary and secondary zones have the same set of data. any expirations/transitions that happen on the primary would sync to the secondary, so we should get the same result as if we'd run lifecycle processing on both zones

however, with per-bucket replication in the picture, zones may not have the same data sets. for example, a bucket replication policy may specify that only a subset of objects (like those beginning with prefix 'foo') should replicate from the secondary zone to the primary. lifecycle processing on the primary zone would only run on that subset of objects and skip the rest. so in this case, we really do need to run lifecycle processing on every zone to get the expected result

had we fixed this issue earlier, we would have broken the archive zone feature because lifecycle could delete the archived object versions. so we're fixing both at the same time :)

@mattbenjamin (Author):
yes, that's a better answer than what I had in mind. For full sync it's probably not even as efficient to run lc everywhere--but it still feels more correct for symmetry. Per-bucket replication does change things, though.

Contributor:
okay..thanks for confirming.

src/rgw/rgw_lc.h (review thread, outdated)
}

  // Determine if we need AND tag when creating xml
  bool has_multi_condition() const {
-    if (obj_tags.count() > 1)
+    if (obj_tags.count() + int(has_prefix()) > 1) // Prefix is a member of Filter
Contributor:
shouldn't it include has_flags() as well? The ArchiveZone flag can be set even with a single obj_tag, as in this example:
https://github.com/ceph/ceph/pull/46928/files#diff-00cdaf79e8833ed31a361d881a37c7434678d84f916789f4e1bc5eab32b25427R50

@mattbenjamin (Author):
ah, I think so
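
A sketch of the adjusted predicate, assuming an LCFilter::has_flags() helper along the lines suggested above (names illustrative):

    // an <And> wrapper is needed whenever the Filter carries more than
    // one predicate: tags, a Prefix, or a flag such as ArchiveZone
    bool has_multi_condition() const {
      return (obj_tags.count() + int(has_prefix()) + int(has_flags())) > 1;
    }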

@mattbenjamin changed the title from "[DNM] rgwlc: permit lifecycle to reduce data conditionally in archive zone" to "rgwlc: permit lifecycle to reduce data conditionally in archive zone" Jul 15, 2022
@mattbenjamin added the needs-qa label and removed the DNM label Jul 15, 2022
@mattbenjamin (Author) commented:

@dang do you know what's going on here?
"Header issue in ../src/rgw/rgw_sal_dbstore.h
Indication 1"

@mattbenjamin (Author) commented:

> @dang do you know what's going on here? "Header issue in ../src/rgw/rgw_sal_dbstore.h Indication 1"

never mind, I need to change the signature of get_tier_type()
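
The commit below ("Valid values are all small strings, often static.") points at the resolution: return the tier type by value, so placeholder implementations can return a literal without binding a reference to a temporary. A sketch of the adjusted declaration (assumed, not quoted from the diff):

    // sketch: return by value; tier-type values are short strings, so the
    // copy is cheap, and implementations may safely return a literal
    virtual const std::string get_tier_type() = 0;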

rgwlc: add uint32_t flags bitmap to LCFilter

This is intended to support a concise set of extensions to S3
LifecycleConfiguration, initially, just a flag that indicates a
rule is intended for execution on RGW ArchiveZone.

rgwlc: add machinery to define and recognize LCFilter flags

Add a concept of filter flags to lifecycle filter rules, an RGW
extension.  The initial purpose of flags is to permit marking
specific lifecycle rules as specific to an RGW archive zone, but
other flags could be added in future.

rgwlc: add new unittest_rgw_lc to run internal checks; add a few
valid and invalid lifecycle configuration xml parses for now.

Fixes: https://tracker.ceph.com/issues/53361

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
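
An illustrative sketch of the flags extension these commit messages describe (the exact names in the merged code may differ):

    #include <cstdint>

    // sketch of the LCFilter extension: a bitmap of RGW-specific filter
    // flags, with ArchiveZone as the first defined bit
    static constexpr uint32_t LCFLAG_NONE         = 0x0000;
    static constexpr uint32_t LCFLAG_ARCHIVE_ZONE = 0x0001; // rule runs only on an archive zone

    uint32_t flags{LCFLAG_NONE}; // new LCFilter member

    bool has_flags() const { return flags != LCFLAG_NONE; }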
Suggested by Casey in review, this makes the XML prettier.

also: fix filter parsing, remove unused code

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Lifecycle rules with the ArchiveZone flag must execute on archive zones,
but must not execute on others.

Fixes: https://tracker.ceph.com/issues/56440

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
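
For illustration, a rule carrying the flag might look like the following, in the style of the valid-configuration parses added in unittest_rgw_lc (element placement is a sketch; see the PR's test cases for the authoritative form):

    // sketch of a lifecycle configuration using the ArchiveZone filter
    // flag: on an archive zone this rule expires objects under foo/ after
    // 30 days, and it is skipped on all other zones
    static const char* archive_rule_xml = R"(
      <LifecycleConfiguration>
        <Rule>
          <ID>expire-archived-foo</ID>
          <Filter>
            <And>
              <Prefix>foo/</Prefix>
              <ArchiveZone />
            </And>
          </Filter>
          <Status>Enabled</Status>
          <Expiration><Days>30</Days></Expiration>
        </Rule>
      </LifecycleConfiguration>)";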
The basic idea of this change is the same as the proposal by
Ilsoo Byun <ilsoobyun@linecorp.com>, but some details have changed.

The main differences are to use the existing
RGWLC::set(remove)_bucket_config methods, and to use the
RGWBucketInstanceMetadataHandler infrastructure to dispatch
the corresponding calls.  Thank you!

Fixes: https://tracker.ceph.com/issues/44268
Related PR: ceph#33524

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Also updates the location of Prefix, which is supposed to *generate*
as <Filter><Prefix/></Filter>, regardless of how we parsed it.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Found by Soumya Koduri <skoduri@redhat.com>

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Valid values are all small strings, often static.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
@mattbenjamin (Author) commented Jul 18, 2022

I think this was a pass; however, (1) there are no specific s3-tests for this feature, and (2) this test run was generally suspicious: http://qa-proxy.ceph.com/teuthology/mbenjamin-2022-07-17_22:24:15-rgw-wip-rgwlc-azone-distro-default-smithi/6935342/teuthology.log
I think I've proved there's no lc issue here, but there are a ton of valgrind "InvalidRead of size 8" errors ("Address 0x57f4fc90 is on thread 132's stack") in the valgrind log. If this is normal, maybe we should explicitly suppress it?

@cbodley (Contributor) commented Jul 18, 2022

looks ok to merge. i've raised https://tracker.ceph.com/issues/56500 to urgent priority

@cbodley (Contributor) commented Jul 18, 2022

i also opened https://tracker.ceph.com/issues/56609 to track the s3test failures, which appear to be timeouts from very long delays

@mattbenjamin merged commit 9b92e15 into ceph:main Jul 19, 2022
Comment on lines +2394 to +2408
if (ret < 0) {
ldpp_dout(dpp, 0) << __func__ << " failed to set lc config for "
<< bci.info.bucket.name
<< dendl;
return ret;
}

} else {
ldpp_dout(dpp, 20) << "remove lc config for " << bci.info.bucket.name << dendl;
ret = lc->remove_bucket_config(bucket.get(), bci.attrs);
if (ret < 0) {
ldpp_dout(dpp, 0) << __func__ << " failed to remove lc config for "
<< bci.info.bucket.name
<< dendl;
return ret;
Contributor:
this error path seems to be causing the regression in metadata sync. i've opened https://tracker.ceph.com/issues/56997 with what we've learned so far

cc @yuvalif
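
For context, one possible shape of a mitigation (a sketch only; the actual fix is tracked in https://tracker.ceph.com/issues/56997): make the LC-config apply best-effort, so a failure does not abort the whole metadata-sync operation.

    // sketch only, not the merged fix: log the failure and continue
    // rather than propagating ret, so metadata sync is not blocked
    ret = lc->set_bucket_config(bucket.get(), bci.attrs, nullptr);
    if (ret < 0) {
      ldpp_dout(dpp, 0) << __func__ << " WARNING: failed to set lc config for "
                        << bci.info.bucket.name << dendl;
      // fall through instead of returning ret
    }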
