Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw/archive: Disable logging for archive zone #50676

Merged
merged 2 commits into from May 4, 2023

Conversation

soumyakoduri
Copy link
Contributor

@soumyakoduri soumyakoduri commented Mar 26, 2023

Since archive zone is not used as source for the incremental sync, disable its data logging. In addition, correct bucket sync status cmd ouput for the zones with archive zone listed as source.

Fixes: https://tracker.ceph.com/issues/59106
Signed-off-by: Soumya Koduri skoduri@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@soumyakoduri soumyakoduri requested a review from a team as a code owner March 26, 2023 17:16
@github-actions github-actions bot added the rgw label Mar 26, 2023
@soumyakoduri soumyakoduri requested review from yehudasa, cbodley and mattbenjamin and removed request for a team March 26, 2023 17:16
Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks plausible to me; can you coordinate with me on downstream backports for testing?

@soumyakoduri
Copy link
Contributor Author

this looks plausible to me; can you coordinate with me on downstream backports for testing?

sure.. Also please note that this change doesn't handle deleting the logs (if any created in the old release or mixed-mode clusters). They need to be cleaned up manually.

@soumyakoduri
Copy link
Contributor Author

soumyakoduri commented Mar 27, 2023

@mattbenjamin @cbodley Other thing I noticed while working on this PR is that archive zone creates multiple versions of the same object depending on the number of zones configured.

For eg., if there are two zones z1, z2 (apart from the archive-zone) configured, when an object ('test-object') is created, archive zone seems to be replicating the object from both the zones and storing two separate copies (though with different version-ids) .. This seems incorrect and a bug to me. Based on etag/src_mtime, it should have ideally filtered the objects already sync'ed from other zones.

Can you please confirm.

@cbodley
Copy link
Contributor

cbodley commented Mar 27, 2023

Based on etag/src_mtime, it should have ideally filtered the objects already sync'ed from other zones.

in normal sync, the destination zone sends its current object mtime in the If-Modified-Since header to avoid duplicating sync from multiple source zones

it looks like that logic is broken by RGWArchiveDataSyncModule::sync_object(), which generates a random version id for the destination object; so RGWObjFetchCR would try to look up the mtime for that version and fail, so wouldn't be able to send a If-Modified-Since header

Comment on lines 1366 to 1367
if (zone.tier_type == "archive") {
zone.log_data = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but two comments:

  • in existing deployments, this wouldn't take effect until a radosgw-admin zone modify calls this function
  • ideally this would use RGWSyncModulesManager::supports_data_export(zone.tier_type) so we do the right thing for other sync modules like elasticsearch and cloud sync too

unfortunately, i think that sync module stuff is specific to rados; so maybe this should all be in RGWSyncPolicyCompat::convert_old_sync_config(), where we have access to the 'sync module service'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* in existing deployments, this wouldn't take effect until a `radosgw-admin zone modify` calls this function

Right..Thanks! Initially I tried doing this check as part of RGWSI_Zone::do_start() using RGWSyncModulesManager::supports_data_export() but that doesn't update the zone params stored at the backend.
So for existing deployments, do you suggest we need these changes in RGWSI_Zone::do_start() as well (in case radosgw-admin zone modify is not executed)? Or can we update release notes to execute the cmd post upgrade?

* ideally this would use `RGWSyncModulesManager::supports_data_export(zone.tier_type)` so we do the right thing for other sync modules like elasticsearch and cloud sync too

unfortunately, i think that sync module stuff is specific to rados; so maybe this should all be in RGWSyncPolicyCompat::convert_old_sync_config(), where we have access to the 'sync module service'?

Ok. I observed that at times RGWSyncModulesManager may not be initialized by the time we want to use supports_data_export(..).. I will recheck in this code-path and update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about leaving the zone's log_data flag alone, and relying exclusively on RGWBucketSyncPolicyHandler::bucket_exports_data() to do the right thing? that would make things easier on existing users

these services are a mess of circular dependencies :( from RGWServices_Def::init(), it looks like the order is zone_svc -> sync_modules_svc -> bucket_sync_sobj

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about leaving the zone's log_data flag alone, and relying exclusively on RGWBucketSyncPolicyHandler::bucket_exports_data() to do the right thing? that would make things easier on existing users

yes..I initially started with those changes and it helps to disable the logging internally. But bucket sync status and zone_params->log_data still shows incorrect values, which did not seem right. If that is acceptable, I will redo those changes.

these services are a mess of circular dependencies :( from RGWServices_Def::init(), it looks like the order is zone_svc -> sync_modules_svc -> bucket_sync_sobj

Thanks for confirming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for bucket sync status, i see that bucket_sync_status() is using RGWBucketSyncPolicyHandler::bucket_imports_data() to tell whether the bucket has any sources. but bucket_source_sync_status() is using RGWZone::syncs_from(zone_name) which only consults the sync_from/sync_from_all fields. maybe we should use bucket_exports_data() there instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. log_index_operation() in cls/rgw/cls_rgw.cc runs on the osd side, and bool log_op is a parameter for the bucket index operations in cls/rgw/cls_rgw_client.h; so we'd use bucket_exports_data() on the rgw side to determine what to pass for log_op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley @smanjara The changes updated now disables both datalog and bilog for the archive zone. Kindly review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was hoping that we could add the per-bucket control as well here (using bucket_exports_data() instead of need_to_log_data()), but this looks correct 👍

@smanjara should we follow up on the per-bucket part later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley sure we could follow up with another pr.

@soumyakoduri
Copy link
Contributor Author

Based on etag/src_mtime, it should have ideally filtered the objects already sync'ed from other zones.

in normal sync, the destination zone sends its current object mtime in the If-Modified-Since header to avoid duplicating sync from multiple source zones

it looks like that logic is broken by RGWArchiveDataSyncModule::sync_object(), which generates a random version id for the destination object; so RGWObjFetchCR would try to look up the mtime for that version and fail, so wouldn't be able to send a If-Modified-Since header

Thanks! Will file a tracker to fix it then.

@smanjara
Copy link
Contributor

Based on etag/src_mtime, it should have ideally filtered the objects already sync'ed from other zones.

in normal sync, the destination zone sends its current object mtime in the If-Modified-Since header to avoid duplicating sync from multiple source zones
it looks like that logic is broken by RGWArchiveDataSyncModule::sync_object(), which generates a random version id for the destination object; so RGWObjFetchCR would try to look up the mtime for that version and fail, so wouldn't be able to send a If-Modified-Since header

Thanks! Will file a tracker to fix it then.

thanks soumya. seems to be coming from my change in bcea963

@soumyakoduri
Copy link
Contributor Author

thanks soumya. seems to be coming from my change in bcea963

Thanks Shilpa. Raised https://tracker.ceph.com/issues/59184 for the same.

Comment on lines 2512 to 2513
if (!zone.syncs_from(source.name) ||
(!static_cast<rgw::sal::RadosStore*>(driver)->svc()->zone->zone_syncs_from(static_cast<rgw::sal::RadosStore*>(driver)->svc()->zone->get_zone(), source))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!driver->svc()->zone->zone_syncs_from(zone, source)) {
  • driver is already a RadosStore* so we don't need the casts
  • RGWSI_Zone::zone_syncs_from() already checks target_zone.syncs_from(source_zone)
  • driver->svc()->zone->get_zone() is just zone here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! addressed it.

@soumyakoduri
Copy link
Contributor Author

jenkins test windows

Zones (such as archive zone) which do not export data should have
sync logging disabled.

Fixes# https://tracker.ceph.com/issues/59106

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
@soumyakoduri
Copy link
Contributor Author

jenkins test make check

@soumyakoduri
Copy link
Contributor Author

jenkins test make check arm64

@soumyakoduri
Copy link
Contributor Author

jenkins test make check arm64

@soumyakoduri soumyakoduri merged commit 9d61a2e into ceph:main May 4, 2023
10 of 11 checks passed
@soumyakoduri soumyakoduri mentioned this pull request May 23, 2023
14 tasks
@soumyakoduri soumyakoduri mentioned this pull request May 29, 2023
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants