
osd/ECTransaction: Remove incorrect asserts in generate_transactions #56924

Merged
merged 1 commit into ceph:main on May 3, 2024

Conversation

markhpc
Member

@markhpc markhpc commented Apr 16, 2024

Back in PR #11701, when EC Overwrites were added, there was significant churn in the ECTransaction code. Several asserts were added in generate_transactions, but the code changed several times, and it appears some of those asserts don't make sense in the final version of the PR.

This PR removes two of those asserts. The first doesn't appear to be necessary given how the surrounding conditional block changed. The second appears to be an incorrect assert left over from an earlier commit; it no longer makes sense given the logic that was added to handle truncate (which roughly mirrors what we do in ReplicatedBackend).

It's possible the asserts for entry and/or obc are also unnecessary; however, I left those in place for now.

Fixes: https://tracker.ceph.com/issues/65509

Signed-off-by: Mark Nelson <mark.nelson@clyso.com>
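
For reference, a minimal sketch of the conditional block these asserts live in (simplified, not the exact upstream code; the actual diff is shown further down), with comments on why each removed assert is problematic:

// Sketch of the block in ECTransaction::generate_transactions.
// op.truncate is treated as an optional pair of truncate sizes; the exact
// semantics of first/second here are an assumption based on this discussion.
if (op.is_none() && op.truncate && op.truncate->first == 0) {
  // Removed assert #1: redundant, because the enclosing condition already
  // requires op.truncate->first == 0.
  // ceph_assert(op.truncate->first == 0);

  // Removed assert #2: appears to be a leftover from an earlier revision of
  // PR #11701; with the truncate-handling logic added later, first and second
  // are not required to match, so this assert could abort an OSD on a legal op.
  // ceph_assert(op.truncate->first == op.truncate->second);

  ceph_assert(entry);  // kept in place for now
  ceph_assert(obc);    // kept in place for now
  // ...
}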
@jmlowe

jmlowe commented Apr 16, 2024

This is currently impacting our operations; we have 2 pools inoperable because of it. It appears to be the second time this bug has taken down our cluster following a full, cluster-wide recovery.

Contributor

@athanatos athanatos left a comment


Yep, that looks right to me.

@ifed01
Contributor

ifed01 commented Apr 17, 2024

@athanatos - do we really need QA for changes like that?

@athanatos
Contributor

@ifed01 For this change specifically, no. It's the removal of an assert and so can't actually create a failure condition. However, I prefer that core contributions go through testing regardless unless there's a specific, pressing need to merge without testing. Is there such a pressing need here? This bug is very old.

@yuriw Can we get this into the next testing batch?

@markhpc
Member Author

markhpc commented Apr 17, 2024

> @ifed01 For this change specifically, no. It's the removal of an assert and so can't actually create a failure condition. However, I prefer that core contributions go through testing regardless unless there's a specific, pressing need to merge without testing. Is there such a pressing need here? This bug is very old.

@athanatos I won't comment on whether or not we've reached that threshold, but here's the failure condition that Mike mentioned in slack that might inform that decision:

> So it looks like if you have clients that do discard/trim on an erasure coded pool you have a cluster time bomb and we really kind of need #56924. We can reliably crash osd's by issuing fstrim in a vm against an EC rbd device.

@jmlowe

jmlowe commented Apr 17, 2024

The most confounding part of this is that the affected cluster has been in operation for 3 years without tripping this bug. I don't know what changed in client behavior such that steady-state operations triggered the bug, but it caused a 52-hour outage until we could recover the cluster and stay healthy by forcibly disabling trim/discards from clients.

@rzarzynski
Contributor

rzarzynski commented Apr 17, 2024

> @athanatos - do we really need QA for changes like that?

I think we should put even negligible-risk changes through QA, if only to honor the workflow.
What's worth doing in such cases, IMHO, is to explicitly tell @yuriw or @ljflores that a PR is of ~0 risk, so they can take it into any testing branch without consuming its regular slots.

> The most confounding part of this is that the affected cluster has been in operation for 3 years without tripping this bug. I don't know what changed in client behavior such that steady-state operations triggered the bug, but it caused a 52-hour outage until we could recover the cluster and stay healthy by forcibly disabling trim/discards from clients.

OK, I added information about the workaround to the tracker. The ticket is High (but not Urgent), so let's backport this without delay, but let's also put it through the due QA.

@@ -193,9 +193,6 @@ void ECTransaction::generate_transactions(
     xattr_rollback[ECUtil::get_hinfo_key()] = old_hinfo;
 
     if (op.is_none() && op.truncate && op.truncate->first == 0) {
-      ceph_assert(op.truncate->first == 0);
-      ceph_assert(op.truncate->first ==
-                  op.truncate->second);
       ceph_assert(entry);
       ceph_assert(obc);
Contributor

@rzarzynski rzarzynski left a comment

A dozen lines below there is the if (obc), which seems to always be taken. Maybe it's worth a follow-up?
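
To illustrate the pattern being pointed out (a rough sketch, not the exact upstream code):

ceph_assert(obc);   // from here on, obc is guaranteed to be non-null
// ... a dozen lines later ...
if (obc) {          // always taken, so the guard is effectively dead code;
  // ...            // a follow-up could drop the check or turn it into an assert
}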

Member Author

@markhpc markhpc left a comment

@rzarzynski yeah, I considered the same thing, but ultimately decided to leave it for a future PR.

@fdebotfairbanks

Hi, I was also involved in finding this. I'm trying to reproduce it, but have not quite succeeded.

In the logs, I find this entry notable:

-73> 2024-04-16T15:36:23.282+0000 7b9120657700 10 osd.262 pg_epoch: 1767514 pg[27.fbs1( v 1767399'84791252 lc 1767366'84791250 (1762977'84782620,1767399'84791252] local-lis/les=1767513/1767514 n=115845 ec=1570233/1419873 lis/c=1767513/1762264 les/c/f=1767514/1762265/0 sis=1767513) [NONE,262,737,609,322,314,57,275,155,464]p262(1) r=1 lpr=1767513 pi=[1762264,1767513)/2 crt=1767399'84791252 lcod 0'0 mlcod 0'0 active+undersized+degraded m=2 mbc={0={(0+0)=2},1={(0+0)=2},2={(1+0)=2},3={(1+0)=2},4={(1+0)=2},5={(1+0)=2},6={(1+0)=2},7={(1+0)=2},8={(1+0)=2},9={(1+0)=2}}] new_repop rep_tid 0 on osd_op(client.10169051886.0:19946 27.fbs1 27:df5e71b0:::rbd_data.3.abac3fb624614b.0000000000000520:head [call rbd.sparse_copyup in=24b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 0,truncate 1048576] snapc 0=[] RETRY=47 ondisk+retry+write+known_if_redirected+supports_pool_eio e1767513) v8

Note the call rbd.sparse_copyup in=24b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 0,truncate 1048576 in particular. The reproduction steps are: have a qemu VM talk to rbd with discard=unmap enabled, write data, and fstrim the filesystem.
In my reproduction attempts, the logs only ever show a single truncate, like [call rbd.sparse_copyup in=7340072b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 7340032]

Perhaps someone has an idea what could be missing here to fully reproduce it.
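
One possible reading of the quoted op: if the two truncates ("truncate 0,truncate 1048576") end up as op.truncate->first and op.truncate->second, the removed assert is exactly what fires. A minimal standalone sketch of that situation follows; the mapping is an assumption, not verified against the OSD code, and plain assert stands in for ceph_assert:

#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

int main() {
  // Hypothetical mapping of "truncate 0,truncate 1048576" from the quoted op.
  std::optional<std::pair<uint64_t, uint64_t>> truncate;
  truncate.emplace(0, 1048576);

  // The enclosing condition in generate_transactions requires first == 0,
  // which this op satisfies ...
  assert(truncate && truncate->first == 0);

  // ... but the removed assert also demanded first == second, which this op
  // violates, so the OSD aborted before this PR.
  assert(truncate->first == truncate->second);  // fires for this op
  return 0;
}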

@yuriw yuriw merged commit 9a41cd7 into ceph:main May 3, 2024
11 of 12 checks passed