Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/SnapMapper: fix legacy key conversion in snapmapper class #46908

Merged
merged 1 commit into from Jul 15, 2022

Conversation

mlausch
Copy link
Contributor

@mlausch mlausch commented Jun 30, 2022

Octopus modified the SnapMapper key format from

<LEGACY_MAPPING_PREFIX><snapid>_<shardid>_<hobject_t::to_str()>

to

<MAPPING_PREFIX><pool>_<snapid>_<shardid>_<hobject_t::to_str()>

When this change was introduced, 94ebe0e also introduced a conversion
with a crucial bug which essentially destroyed legacy keys by mapping them
to

<MAPPING_PREFIX><poolid>_<snapid>_

without the object-unique suffix. This commit fixes this conversion going
forward, but a fix for existing clusters still needs to be developed.

Unit test result:

[ RUN      ] SnapMapperTest.Legacy
 Converted: MAP_0000000000000008_.1_0000000000000000.E3826231.138.srumroypjftinm.gbdloxfphi.dzitxoz
 To:        SNA_0_0000000000000008_.1_0000000000000000.E3826231.138.srumroypjftinm.gbdloxfphi.dzitxoz 
 New key:   SNA_0_0000000000000008_.1_0000000000000000.E3826231.138.srumroypjftinm.gbdloxfphi.dzitxoz
[       OK ] SnapMapperTest.Legacy (0 ms)

Fixes: https://tracker.ceph.com/issues/56147
Signed-off-by: Manuel Lausch manuel.lausch@1und1.de
Signed-off-by: Matan Breizman mbreizma@redhat.com

@mlausch mlausch requested a review from a team as a code owner June 30, 2022 12:40
@github-actions github-actions bot added the core label Jun 30, 2022
@ronen-fr ronen-fr requested review from aclamk and Matan-B June 30, 2022 13:43
@Matan-B
Copy link
Contributor

Matan-B commented Jul 3, 2022

Introduced here: 94ebe0e
We should also be able to convert already (Octopus) updated clusters since the conversion here only applies upon running Octopus for the first time. (e.g calling convert_legacy() when finding old structured SnapMapper key)

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlausch
Copy link
Contributor Author

mlausch commented Jul 4, 2022

Hi @Matan-B

The commit / PR title should be prefixed with osd: ... , or do I miss something else?

Do you think there could old structures remaining after successful conversion? As far as I see, after successful conversion the superblock will be updated.
Anyway. I fear more complex changes will exceed my coding skills. It would be nice If you or someone else could support me with this.

@Matan-B
Copy link
Contributor

Matan-B commented Jul 5, 2022

Hi @Matan-B

The commit / PR title should be prefixed with osd: ... , or do I miss something else?

Yes, osd/SnapMapper:

Do you think there could old structures remaining after successful conversion? As far as I see, after successful conversion the superblock will be updated.

I'm afraid that clone objects created before upgrading may not be deleted correctly, as the tracker implies.
While this patch will result in successful conversion for future upgrades, already updated clusters should also be able to delete (old) clone objects.

Anyway. I fear more complex changes will exceed my coding skills. It would be nice If you or someone else could support me with this.

I'm looking into this and I'm available for any question! 👍

@mlausch mlausch changed the title RADOS: fix legacy key conversion in snapmapper class osd/SnapMapper: fix legacy key conversion in snapmapper class Jul 5, 2022
@mlausch
Copy link
Contributor Author

mlausch commented Jul 5, 2022

I'm afraid that clone objects created before upgrading may not be deleted correctly, as the tracker implies.
While this patch will result in successful conversion for future upgrades, already updated clusters should also be able to delete (old) clone objects.

Ah yes, I see your point. Unfortunately the faulty conversion deleted the old MAP_ entries. So I'm not quite sure how to recover this.
And I think the "repair" should only run once if the error is present. Iterating over the whole omap, or something else, just in case, might slow down the startup.

@Matan-B Matan-B self-requested a review July 6, 2022 08:09
@Matan-B
Copy link
Contributor

Matan-B commented Jul 11, 2022

jenkins test make check

@Matan-B
Copy link
Contributor

Matan-B commented Jul 11, 2022

Upgrade suite Teuthology runs:
https://gist.github.com/Matan-B/9913f9bf413f7805d7a52dac22a589c1

@ifed01 ifed01 self-requested a review July 11, 2022 15:47
src/osd/SnapMapper.cc Show resolved Hide resolved
@github-actions github-actions bot added the tests label Jul 13, 2022
@rzarzynski
Copy link
Contributor

@athanatos: do the updated commits addresses the worry for having the fix in octopus?

@athanatos
Copy link
Contributor

athanatos commented Jul 13, 2022

Squash all three commits to a single commit. In general (and particularly since we need to backport this) the commit message needs to do a good job of explaining what the commit does. Since it's a backport it also must include the Fixes line in the commit message itself. How about:

osd/SnapMapper: fix pacific legacy key conversion and introduce test

Pacific modified the SnapMapper key format from

  <LEGACY_MAPPING_PREFIX><snapid>_<shardid>_<hobject_t::to_str()>

to

  <MAPPING_PREFIX><pool>_<snapid>_<shardid>_<hobject_t::to_str()>

When this change was introduced, 94ebe0ea also introduced a conversion
with a crucial bug which essentially destroyed legacy keys by mapping them
to

  <MAPPING_PREFIX><poolid>_<snapid>_

without the object-unique suffix.  This commit fixes this conversion going
forward, but a fix for existing clusters still needs to be developed.

Fixes: https://tracker.ceph.com/issues/56147
Signed-off-by: Manuel Lausch <manuel.lausch@1und1.de>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>

edit: had the version wrong

Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty close, I suggested an explanatory comment, a name change, and the commit message needs to be fixed.

src/osd/SnapMapper.cc Show resolved Hide resolved
src/osd/SnapMapper.cc Outdated Show resolved Hide resolved
src/osd/SnapMapper.cc Show resolved Hide resolved
src/test/test_snap_mapper.cc Outdated Show resolved Hide resolved
@athanatos athanatos requested review from neha-ojha and removed request for ifed01 and aclamk July 13, 2022 22:44
@athanatos
Copy link
Contributor

athanatos commented Jul 13, 2022

@neha-ojha Given that we'll want to backport this relatively quickly, can you look over it as well?

Octopus modified the SnapMapper key format from

  <LEGACY_MAPPING_PREFIX><snapid>_<shardid>_<hobject_t::to_str()>

to

  <MAPPING_PREFIX><pool>_<snapid>_<shardid>_<hobject_t::to_str()>

When this change was introduced, 94ebe0e also introduced a conversion
with a crucial bug which essentially destroyed legacy keys by mapping them
to

  <MAPPING_PREFIX><poolid>_<snapid>_

without the object-unique suffix.  This commit fixes this conversion going
forward, but a fix for existing clusters still needs to be developed.

Fixes: https://tracker.ceph.com/issues/56147
Signed-off-by: Manuel Lausch <manuel.lausch@1und1.de>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B Matan-B requested a review from athanatos July 14, 2022 08:18
Copy link
Member

@neha-ojha neha-ojha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will run it through testing and we'll include the octopus backport of this patch in the final octopus point release, if testing goes fine.

@neha-ojha
Copy link
Member

Upgrade suite Teuthology runs: https://gist.github.com/Matan-B/9913f9bf413f7805d7a52dac22a589c1

@Matan-B Please raise a separate PR for the teuthology test you added to catch this bug

@ljflores
Copy link
Contributor

jenkins test windows

@ljflores
Copy link
Contributor

ljflores commented Jul 15, 2022

https://pulpito.ceph.com/?branch=snapshot_key_conversion

Unrelated Failures:
1. https://tracker.ceph.com/issues/56575 -- new Tracker
2. https://tracker.ceph.com/issues/56384
3. https://tracker.ceph.com/issues/55853
4. https://tracker.ceph.com/issues/56573 -- new Tracker
5. https://tracker.ceph.com/issues/53789
6. https://tracker.ceph.com/issues/53939
7. https://tracker.ceph.com/issues/56574 -- new Tracker
8. https://tracker.ceph.com/issues/45423

Details:
1. test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fails from "method lock.get_info tried to update object but is not marked WR" - Ceph - RADOS
2. ceph/test.sh: check_response erasure-code didn't find erasure-code in output - Ceph - CephFS
3. test_cls_rgw.sh: failures in 'cls_rgw.index_list' and 'cls_rgw.index_list_delimited` - Ceph - RGW
4. test_cephadm.sh: KeyError: 'TYPE' - Ceph - Orchestrator
5. CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados.TestWatchNotify.test_aio_notify to fail - Ceph - RADOS
6. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Rook
7. rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in cluster log - Ceph - RADOS
8. api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite - Ceph - RADOS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants