
osd/ECTransaction: Remove incorrect asserts in generate_transactions #56924

Merged
merged 1 commit into ceph:main on May 3, 2024

Conversation

markhpc
Member

@markhpc markhpc commented Apr 16, 2024

Back in PR #11701, when EC Overwrites were added, there was significant churn in the ECTransaction code. Several asserts were added in generate_transactions, but the code changed several times, and it appears some of those asserts don't make sense in the final version of the PR.

This PR removes two of those asserts. The first doesn't appear to be necessary given how the surrounding conditional block changed. The second appears to be an incorrect assert left over from an earlier commit; it no longer makes sense given the logic that was added to handle truncate (which roughly mirrors what we do in ReplicatedBackend).

It's possible the asserts for entry and/or obc are also unnecessary; however, I left those in place for now.

Fixes: https://tracker.ceph.com/issues/65509

Signed-off-by: Mark Nelson <mark.nelson@clyso.com>
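
For reference, a minimal sketch of the conditional block these asserts live in (simplified, not the exact upstream code; the actual diff is shown further down), with comments on why each removed assert is problematic:

// Sketch of the block in ECTransaction::generate_transactions.
// op.truncate is treated as an optional pair of truncate sizes; the exact
// semantics of first/second here are an assumption based on this discussion.
if (op.is_none() && op.truncate && op.truncate->first == 0) {
  // Removed assert #1: redundant, because the enclosing condition already
  // requires op.truncate->first == 0.
  // ceph_assert(op.truncate->first == 0);

  // Removed assert #2: appears to be a leftover from an earlier revision of
  // PR #11701; with the truncate-handling logic added later, first and second
  // are not required to match, so this assert could abort an OSD on a legal op.
  // ceph_assert(op.truncate->first == op.truncate->second);

  ceph_assert(entry);  // kept in place for now
  ceph_assert(obc);    // kept in place for now
  // ...
}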
@jmlowe

jmlowe commented Apr 16, 2024

This is currently impacting our operations; we have 2 pools inoperable because of it. It appears to be the second time this bug has taken down our cluster following a full, cluster-wide recovery.

Contributor

@athanatos athanatos left a comment


Yep, that looks right to me.

@ifed01
Contributor

ifed01 commented Apr 17, 2024

@athanatos - do we really need QA for changes like that?

@athanatos
Contributor

@ifed01 For this change specifically, no. It's the removal of an assert and so can't actually create a failure condition. However, I prefer that core contributions go through testing regardless unless there's a specific, pressing need to merge without testing. Is there such a pressing need here? This bug is very old.

@yuriw Can we get this into the next testing batch?

@markhpc
Member Author

markhpc commented Apr 17, 2024

> @ifed01 For this change specifically, no. It's the removal of an assert and so can't actually create a failure condition. However, I prefer that core contributions go through testing regardless unless there's a specific, pressing need to merge without testing. Is there such a pressing need here? This bug is very old.

@athanatos I won't comment on whether or not we've reached that threshold, but here's the failure condition that Mike mentioned in slack that might inform that decision:

> So it looks like if you have clients that do discard/trim on an erasure coded pool you have a cluster time bomb and we really kind of need #56924. We can reliably crash osd's by issuing fstrim in a vm against an EC rbd device.

@jmlowe

jmlowe commented Apr 17, 2024

The most confounding part of this is that the affected cluster has been in operation for 3 years without tripping this bug. I don't know what changed in client behavior such that steady-state operations triggered the bug, but it caused a 52-hour outage until we could recover the cluster and stay healthy by forcibly disabling trim/discards from clients.

@rzarzynski
Contributor

rzarzynski commented Apr 17, 2024

> @athanatos - do we really need QA for changes like that?

I think we should put even negligible-risk changes through QA, if only to honor the workflow.
What's worth doing in such cases, IMHO, is to explicitly tell @yuriw or @ljflores that a PR is of ~0 risk, so they can take it into any testing branch without consuming its regular slots.

> The most confounding part of this is that the affected cluster has been in operation for 3 years without tripping this bug. I don't know what changed in client behavior such that steady-state operations triggered the bug, but it caused a 52-hour outage until we could recover the cluster and stay healthy by forcibly disabling trim/discards from clients.

OK, I added information about the workaround to the tracker. The ticket is High (but not Urgent), so let's backport this without delay, but let's also put it through the due QA.

@@ -193,9 +193,6 @@ void ECTransaction::generate_transactions(
     xattr_rollback[ECUtil::get_hinfo_key()] = old_hinfo;
 
     if (op.is_none() && op.truncate && op.truncate->first == 0) {
-      ceph_assert(op.truncate->first == 0);
-      ceph_assert(op.truncate->first ==
-                  op.truncate->second);
       ceph_assert(entry);
       ceph_assert(obc);
Contributor

@rzarzynski rzarzynski left a comment

A dozen lines below there is the if (obc), which seems to always be taken. Maybe it's worth a follow-up?
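
To illustrate the pattern being pointed out (a rough sketch, not the exact upstream code):

ceph_assert(obc);   // from here on, obc is guaranteed to be non-null
// ... a dozen lines later ...
if (obc) {          // always taken, so the guard is effectively dead code;
  // ...            // a follow-up could drop the check or turn it into an assert
}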

Member Author

@markhpc markhpc left a comment

@rzarzynski yeah, I considered the same thing, but ultimately decided to leave it for a future PR.

@fdebotfairbanks

Hi, I was also involved in finding this. I'm trying to reproduce it, but have not quite succeeded.

In the logs, I find this entry notable:

-73> 2024-04-16T15:36:23.282+0000 7b9120657700 10 osd.262 pg_epoch: 1767514 pg[27.fbs1( v 1767399'84791252 lc 1767366'84791250 (1762977'84782620,1767399'84791252] local-lis/les=1767513/1767514 n=115845 ec=1570233/1419873 lis/c=1767513/1762264 les/c/f=1767514/1762265/0 sis=1767513) [NONE,262,737,609,322,314,57,275,155,464]p262(1) r=1 lpr=1767513 pi=[1762264,1767513)/2 crt=1767399'84791252 lcod 0'0 mlcod 0'0 active+undersized+degraded m=2 mbc={0={(0+0)=2},1={(0+0)=2},2={(1+0)=2},3={(1+0)=2},4={(1+0)=2},5={(1+0)=2},6={(1+0)=2},7={(1+0)=2},8={(1+0)=2},9={(1+0)=2}}] new_repop rep_tid 0 on osd_op(client.10169051886.0:19946 27.fbs1 27:df5e71b0:::rbd_data.3.abac3fb624614b.0000000000000520:head [call rbd.sparse_copyup in=24b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 0,truncate 1048576] snapc 0=[] RETRY=47 ondisk+retry+write+known_if_redirected+supports_pool_eio e1767513) v8

Note the call rbd.sparse_copyup in=24b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 0,truncate 1048576 in particular. The reproduction steps are: have a qemu VM talk to rbd with discard=unmap enabled, write data, and fstrim the filesystem.
In my reproduction attempts, the logs only ever show a single truncate, like [call rbd.sparse_copyup in=7340072b,set-alloc-hint object_size 8388608 write_size 8388608,truncate 7340032]

Perhaps someone has an idea what could be missing here to fully reproduce it.
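
One possible reading of the quoted op: if the two truncates ("truncate 0,truncate 1048576") end up as op.truncate->first and op.truncate->second, the removed assert is exactly what fires. A minimal standalone sketch of that situation follows; the mapping is an assumption, not verified against the OSD code, and plain assert stands in for ceph_assert:

#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

int main() {
  // Hypothetical mapping of "truncate 0,truncate 1048576" from the quoted op.
  std::optional<std::pair<uint64_t, uint64_t>> truncate;
  truncate.emplace(0, 1048576);

  // The enclosing condition in generate_transactions requires first == 0,
  // which this op satisfies ...
  assert(truncate && truncate->first == 0);

  // ... but the removed assert also demanded first == second, which this op
  // violates, so the OSD aborted before this PR.
  assert(truncate->first == truncate->second);  // fires for this op
  return 0;
}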

@yuriw yuriw merged commit 9a41cd7 into ceph:main May 3, 2024
11 of 12 checks passed