
[DNM: TOO RISKY] nautilus: osd/ECBackend: don't remove in-flight backfill when missing is not primary #41293

Closed
wants to merge 4 commits

Conversation

trociny
Contributor

@trociny trociny commented May 12, 2021

First commit:

This is a direct commit to nautilus branch to recreate the
behaviour we have post-nautilus, after refactoring made in
8a8947d, which seems to be
correct.

The in-flight backfill prevents updating of "backfill complete"
position, which remains on the object before the missing oid. So
when pg is re-peered it retries backfill from this position
instead of entering the clean state.

Fixes: https://tracker.ceph.com/issues/50747
Signed-off-by: Mykola Golub <mgolub@suse.com>


Second commit:

backport tracker: https://tracker.ceph.com/issues/50792

backport of: #41270

parent tracker: https://tracker.ceph.com/issues/50351
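
For illustration only, a minimal sketch of the pinning behaviour the first commit describes; the type and helper names are assumptions (ObjectId stands in for hobject_t), not the actual Ceph code. The "backfill complete" position can only advance up to the earliest object still listed in backfills_in_flight, so leaving the failed oid in that set keeps the position just before the missing object:

    #include <algorithm>
    #include <set>
    #include <string>

    using ObjectId = std::string;  // stand-in for hobject_t

    // Hypothetical helper: the position may only advance up to the earliest
    // object that is still being backfilled.
    ObjectId backfill_complete_pos(const std::set<ObjectId>& backfills_in_flight,
                                   const ObjectId& scan_pos) {
      if (backfills_in_flight.empty())
        return scan_pos;                                        // nothing pending: advance freely
      return std::min(*backfills_in_flight.begin(), scan_pos);  // pinned at the earliest in-flight oid
    }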

@trociny trociny added this to the nautilus milestone May 12, 2021
@trociny
Contributor Author

trociny commented May 12, 2021

It will also require the backport of #41270

In PrimaryLogPG::on_failed_pull, we unconditionally remove soid
from recovering list, but remove it from backfills_in_flight only
when the backfill source is the primary osd.

Fixes: https://tracker.ceph.com/issues/50351
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 9b78e00)
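
As an illustration of the guard this commit message describes (the signature and names below are assumptions, not the verbatim PrimaryLogPG::on_failed_pull interface): the oid is always dropped from the recovering list, but stays in backfills_in_flight unless the missing shard is on the primary.

    #include <set>
    #include <string>

    using ObjectId = std::string;  // stand-in for hobject_t

    // Hedged sketch only: parameter names are hypothetical.
    void on_failed_pull_sketch(std::set<ObjectId>& recovering,
                               std::set<ObjectId>& backfills_in_flight,
                               const ObjectId& soid,
                               bool missing_on_primary) {
      recovering.erase(soid);               // unconditional, as before
      if (missing_on_primary)
        backfills_in_flight.erase(soid);    // only when the primary itself is missing the shard
      // otherwise the in-flight entry keeps the backfill position pinned before soid
    }
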
@smithfarm
Contributor

jenkins test make check

@smithfarm smithfarm added nautilus-batch-1 nautilus point releases needs-qa labels May 13, 2021
@jdurgin jdurgin changed the base branch from nautilus to nautilus-saved May 14, 2021 21:58
@jdurgin jdurgin changed the base branch from nautilus-saved to nautilus May 14, 2021 21:58
Member

@neha-ojha neha-ojha left a comment


This approach looks reasonable to me! @athanatos I'd like to get your thoughts as well.
On a different note, it'd be great to add a test (in master and then backport) to replicate what was done manually in https://tracker.ceph.com/issues/50351, to catch future bugs in this area.

@neha-ojha neha-ojha requested a review from athanatos May 19, 2021 21:27
@athanatos
Contributor

I'm not sure about this. Why don't we call on_primary_error or similar? I'm also kind of leery of backporting something like this to nautilus. What's the actual consequence of this bug?

@trociny
Contributor Author

trociny commented May 20, 2021

@athanatos The actual consequence of this bug is that, without this change, the last_backfill position does not stop before the "unfound" object but advances, and after the backfill is finished it is at MAX. The pg enters the backfill_unfound state. But when re-peering happens, the state is reset to clean. It looks confusing to users, who believe the unfound issue is resolved while it actually is not.

With the fix the in-flight backfill prevents updating the last_backfill position, and it remains on the object before the missing oid. So when the pg is re-peered it retries backfill from this position instead of entering the clean state.

You can find more details in my emails to ceph-dev [1].

Why don't we call on_primary_error or similar?

I am not sure what you mean. When the error is on the primary, the current code behaves correctly (that is, the same as on master). It behaves differently when the missing shard is not on the primary -- we want to skip removing the oid from backfills_in_flight in this case. I am not sure how on_primary_error could be used here.

My approach looks a bit hackish, but as it is a direct commit to a maintenance branch I wanted to keep it as simple as possible.

[1] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/OJEZB4YEGWV3EUJPFGLG2O3VHERR5ATI/

@trociny
Contributor Author

trociny commented May 20, 2021

@neha-ojha thanks. I will work on the test and will push a separate PR for it.

@yuriw
Contributor

yuriw commented May 20, 2021

@athanatos
Contributor

athanatos commented May 20, 2021

It doesn't sound like that would prevent an upgrade. This pathway is pretty complicated, and this seems like quite a risky patch to a stable branch without a test, particularly considering that that pathway is poorly tested anyway.

@athanatos
Contributor

athanatos commented May 20, 2021

PrimaryLogPG::primary_error on nautilus appears to update the primary's missing set as on current master, so that should be ok.

I think this patch replicates the behavior on master w.r.t. backfills_in_flight. @trociny If you've carefully tested this and related scenarios, it might be ok to merge, but I'd characterize this as a very risky backport. It might actually be safer to try to backport the refactor (just the patch in question, not the whole PeeringState refactor).

@athanatos
Contributor

athanatos commented May 20, 2021

With this patch, does restarting the primary still clear the unfound state?

@neha-ojha
Member

neha-ojha commented May 20, 2021

@trociny @athanatos I agree this patch is risky (especially at this stage in nautilus) and needs more testing than one rados suite run. It would really help to add a test to reproduce the issue and confirm that this patch fixes the problem with no side effects. Also, is there a workaround that can be used for https://tracker.ceph.com/issues/50351 instead?

@trociny
Contributor Author

trociny commented May 21, 2021

It might actually be safer to try to backport the refactor (just the patch in question, not the whole PeeringState refactor).

@athanatos Although 8a8947d is cherry-pickable (just with some rather trivial conflicts), it relies on the changes introduced earlier in d33a8b8. And backporting that (or just the needed parts) looks like hell and much more risky.

With my current approach I am sure that I introduce the change (and potential issues) only for ECBackend, and only for the case when an error occurs for a non-primary shard during backfill. Backporting 8a8947d (and all the other bits it would need) would affect both the EC and Replicated backends and a much larger variety of cases.

@trociny
Contributor Author

trociny commented May 21, 2021

With this patch, does restarting the primary still clear the unfound state?

Yes. The same as on master currently. When the primary is temporarily down, another osd becomes primary. The last_backfill position is still (properly) set to an object before the "unfound" one, but as the unfound object's shard is missing on this new primary osd, the object is not returned by collection_list here and is skipped from backfilling. So the issue is not detected, the backfill finishes successfully and the pg enters the clean state. It becomes similar to what was recently discussed in this thread [1].

[1] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/GYZAXV2JO5JK4J3L2G4NCYVFFWRIX5VD/
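
A toy sketch of the effect described above (not the actual Ceph ObjectStore API; the helper is hypothetical): the backfill scan only enumerates objects present in the local store, so an object whose shard is missing on the new primary never enters the scan interval and is never flagged.

    #include <cstddef>
    #include <set>
    #include <string>
    #include <vector>

    using ObjectId = std::string;  // stand-in for hobject_t

    // Hypothetical scan helper: returns up to `max` locally present objects after `begin`.
    std::vector<ObjectId> scan_interval(const std::set<ObjectId>& locally_present,
                                        const ObjectId& begin, std::size_t max) {
      std::vector<ObjectId> out;
      for (auto it = locally_present.upper_bound(begin);
           it != locally_present.end() && out.size() < max; ++it)
        out.push_back(*it);   // the object with the missing shard simply never appears here
      return out;
    }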

@trociny
Contributor Author

trociny commented May 21, 2021

@trociny @athanatos I agree this patch is risky (especially at this stage in nautilus) and needs more testing than one rados suite run. It would really help to add a test to reproduce the issue and confirm that this patch fixes the problem with no side effects.

I have started working on the test. My first thought was to use as an example the tests test_backfill.sh, test_lost.sh, test_missing_unfound.sh, which I found in src/test. But they do not run on teuthology and seem not to be executed in any CI tests. So I think I have to add a new yaml to qa/suites/rados/singleton-nomsgr/all, using recovery-unfound-found.yaml as an example. Any suggestions are appreciated.

Also, is there a workaround that can be used for https://tracker.ceph.com/issues/50351 instead?

I am not sure I understand. The backport for 50351 in the nautilus branch is needed only if we introduce the post-nautilus behavior, i.e. as an addition to my first "risky" commit. Without the first commit it will not crash there, it will just enter a clean state.

@trociny
Contributor Author

trociny commented May 21, 2021

Also, is there a workaround that can be used for https://tracker.ceph.com/issues/50351 instead?

I am not sure I understand. The backport for 50351 in the nautilus branch is needed only if we introduce the post-nautilus behavior, i.e. as an addition to my first "risky" commit. Without the first commit it will not crash there, it will just enter a clean state.

Probably you meant a workaround for the backfill_unfound state reset issue (50747), not for the crash? Then a workaround is to run scrub, which will detect the unfound objects and the pg will enter the inconsistent state. But the user will not be aware of the issue (they will think it has magically resolved itself) until the scrub is run.

@athanatos
Contributor

With this patch, does restarting the primary still clear the unfound state?

Yes. The same as on master currently. When the primary is temporarily down, another osd becomes primary. The last_backfill position is still (properly) set to an object before the "unfound" one, but as the unfound object's shard is missing on this new primary osd, the object is not returned by collection_list here and is skipped from backfilling. So the issue is not detected, the backfill finishes successfully and the pg enters the clean state. It becomes similar to what was recently discussed in this thread [1].

[1] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/GYZAXV2JO5JK4J3L2G4NCYVFFWRIX5VD/

Ok. Once we've got this backport sorted, we probably want to revisit the fix on master. Rather than blocking backfill at the unfound object, we probably want to add the offending object to the durably recorded missing sets where it's missing and proceed with backfill. This probably means adding a force_missing message of some form.

@athanatos
Contributor

I have started working on the test. My first thought was to use as an example the tests test_backfill.sh, test_lost.sh, test_missing_unfound.sh, which I found in src/test. But they do not run on teuthology and seem not to be executed in any CI tests. So I think I have to add a new yaml to qa/suites/rados/singleton-nomsgr/all, using recovery-unfound-found.yaml as an example. Any suggestions are appreciated.

Ah, I have found useful examples in qa/tasks, like ec_lost_unfound.py.

Yeah, that's probably a good place to start.

The osd may be 0.

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 93139db)
Fixes: https://tracker.ceph.com/issues/50925
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 0a9f7e1)

Conflicts:
    qa/suites/rados/singleton/all/ec-backfill-unfound.yaml:
        log-whitelist instead of log-ignorelist
@trociny
Contributor Author

trociny commented May 26, 2021

I added the teuthology test, backported from #41532. Note the master PR is not merged yet, so the test might need re-backporting later.

Below are results when scheduling with --filter ec-backfill-unfound and --subset 0/5 [1] and --subset 1/5 [2].

[1] https://pulpito.ceph.com/trociny-2021-05-26_06:18:46-rados-wip-mgolub-testing-nautilus-distro-basic-smithi/
[2] https://pulpito.ceph.com/trociny-2021-05-26_06:17:04-rados-wip-mgolub-testing-nautilus-distro-basic-smithi/

@smithfarm smithfarm added nautilus-batch-1 nautilus point releases needs-qa labels Jun 4, 2021
@smithfarm
Contributor

@neha-ojha @jdurgin @athanatos Is there still hope to get this into 14.2.22?

@athanatos
Contributor

@neha-ojha Where did we end up on this one?

@neha-ojha
Member

@neha-ojha @jdurgin @athanatos Is there still hope to get this into 14.2.22?

@athanatos @smithfarm This is waiting on @trociny to compare test runs from nautilus and master to figure out why we are not seeing #41532 (comment) in nautilus. Once we have a better understanding of the root cause and whether it poses a threat to nautilus, we can make a call on this PR.

@trociny any updates? Let's also run this through the upgrade suite.

@trociny
Contributor Author

trociny commented Jun 5, 2021

@trociny any updates?

I am still in the process of investigating it. The current findings:

The crash is not filestore specific. The important condition seems to be that after a non-primary osd is stopped, we are in the "backfilling" state with that osd missing from the acting set, and then the osd boots and the peering event comes. Filestore osds seem to start faster, which is why we were able to see the crash here, while with bluestore the backfill completed before the osd booted. I was able to reproduce the issue with bluestore too by increasing the backfill time (increasing the number of objects in the pg) [1].

Looking at the crash log [2] one can see that just before the crash a PGPeeringEvent comes while the pg is in the active+undersized+degraded+remapped+backfilling state with the [1,NONE,2] acting set:

   -54> 2021-06-02T16:46:55.167+0000 7fd172a4d700 20 osd.1 op_wq(0) _process OpSchedulerItem(1.0s0 PGPeeringEvent(epoch_sent: 43 epoch_requested: 43 MLogRec from 3(1) log log((16'102016,16'102021], crt=16'102021) pi ([0,0] all_participants= intervals=) +create_info) prio 255 cost 10 e43) pg 0x56079df93000
   -53> 2021-06-02T16:46:55.167+0000 7fd172a4d700 10 osd.1 pg_epoch: 43 pg[1.0s0( v 16'102021 (16'101986,16'102021] local-lis/les=40/41 n=102021 ec=13/13 lis/c=40/35 les/c/f=41/36/0 sis=40) [1,0,2]/[1,NONE,2]p1(0) backfill=[0(1)] r=0 lpr=40 pi=[35,40)/1 crt=16'102021 lcod 16'102020 mlcod 0'0 active+undersized+degraded+remapped+backfilling rops=1 mbc={0={(1+0)=1},1={(0+0)=1},2={(0+0)=1}}] do_peering_event: epoch_sent: 43 epoch_requested: 43 MLogRec from 3(1) log log((16'102016,16'102021], crt=16'102021) pi ([0,0] all_participants= intervals=) +create_info

And when PrimaryLogPG::recover_replicas is called after this, it bypasses the if (recovery_state.get_missing_loc().is_unfound(soid)) condition [3] for the test object and aborts with the "object added to missing set for backfill, but is not in recovering" error. In the logs you can find that when PrimaryLogPG::recover_replicas was called before the PGPeeringEvent came, the is_unfound condition returned true ("still unfound" messages were logged).

So my current assumption is that when the PGPeeringEvent comes it updates missing_loc in such a way that is_unfound no longer returns true, but I have still failed to figure out where exactly this happens. It is not clear to me from the log, and it is problematic to reproduce locally. Maybe you have some ideas?

In nautilus we have a similar condition in PrimaryLogPG::recover_replicas [4], but looking at the osd log here for the filestore case, it looks like the test reproduces the conditions and the PGPeeringEvent comes while we are backfilling, yet when PrimaryLogPG::recover_replicas is called later the object is still "unfound":

2021-05-26 06:37:53.101 7f9a90c96700 20 osd.1 op_wq(0) _process 1.0s0 to_process <OpQueueItem(1.0s0 PGPeeringEvent(epoch_sent: 36 epoch_requested: 36 MLogRec from 3(1) +create_info) prio 255 cost 10 e36),OpQueueItem(1.0s0 PGOpItem(op=pg_scan(digest 1.0s0 1:f8d56238:::benchmark_data_smithi116_19237_object11416:head-MAX e 36/34) v2) prio 127 cost 33150 e36)> waiting <> waiting_peering {}
2021-05-26 06:37:53.101 7f9a90c96700 20 osd.1 op_wq(0) _process OpQueueItem(1.0s0 PGPeeringEvent(epoch_sent: 36 epoch_requested: 36 MLogRec from 3(1) +create_info) prio 255 cost 10 e36) pg 0xbfcf400
2021-05-26 06:37:53.101 7f9a90c96700 10 osd.1 pg_epoch: 36 pg[1.0s0( v 15'12248 (15'12177,15'12248] local-lis/les=34/35 n=12248 ec=12/12 lis/c 34/29 les/c/f 35/30/0 31/34/12) [1,0,2]/[1,2147483647,2]p1(0) backfill=[0(1)] r=0 lpr=34 pi=[29,34)/1 rops=1 crt=15'12248 lcod 15'12247 mlcod 0'0 active+undersized+remapped+backfilling mbc={}] do_peering_event: epoch_sent: 36 epoch_requested: 36 MLogRec from 3(1) +create_info
...
2021-05-26 06:37:53.105 7f9a94c9e700 10 osd.1 pg_epoch: 36 pg[1.0s0( v 15'12248 (15'12177,15'12248] local-lis/les=34/35 n=12248 ec=12/12 lis/c 34/29 les/c/f 35/30/0 31/34/12) [1,0,2]/[1,2147483647,2]p1(0) backfill=[0(1)] r=0 lpr=34 pi=[29,34)/1 crt=15'12248 lcod 15'12247 mlcod 0'0 active+undersized+remapped+backfilling mbc={0={(0+0)=1},1={(0+1)=1},2={(0+0)=1}}] recover_replicas: 1:ad551702:::test:head still unfound

My current assumption is that missing_loc is handled differently in nautilus on a PGPeeringEvent.

[1] https://pulpito.ceph.com/trociny-2021-06-02_13:57:51-rados-master-distro-basic-smithi/
[2] http://qa-proxy.ceph.com/teuthology/trociny-2021-06-02_13:57:51-rados-master-distro-basic-smithi/6145804/remote/smithi193/crash/posted/2021-06-02T16%3A46%3A55.174344Z_240586b1-fca5-4c32-98a6-8c5b01a87c55/
[3]

if (recovery_state.get_missing_loc().is_unfound(soid)) {

[4]
if (missing_loc.is_unfound(soid)) {
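
For reference, a toy model of the recover_replicas decision described above; only the branch structure follows the snippets at [3]/[4], while the helper names and types are hypothetical:

    #include <cassert>
    #include <set>
    #include <string>

    using ObjectId = std::string;  // stand-in for hobject_t

    enum class Action { Skip, Recover, Abort };

    Action recover_replicas_step(const ObjectId& soid, bool unfound,
                                 const std::set<ObjectId>& recovering) {
      if (unfound)
        return Action::Skip;       // logged as "still unfound" in the nautilus run
      if (!recovering.count(soid))
        return Action::Abort;      // the "object added to missing set for backfill,
                                   // but is not in recovering" abort from the master crash
      return Action::Recover;
    }

    int main() {
      std::set<ObjectId> recovering;  // the test object was never queued for recovery
      assert(recover_replicas_step("test:head", /*unfound=*/true,  recovering) == Action::Skip);
      assert(recover_replicas_step("test:head", /*unfound=*/false, recovering) == Action::Abort);
    }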

@trociny
Contributor Author

trociny commented Jun 5, 2021

So my current assumption is that when the PGPeeringEvent comes it updates missing_loc in such a way that is_unfound no longer returns true, but I have still failed to figure out where exactly this happens.

After adding some prints, I now see why is_unfound starts returning false.

PeeringState::Active::react calls ps->search_for_missing, which calls MissingLoc::add_source_info, and according to this message in the log:

   -50> 2021-06-05T11:30:25.832+0000 7f448dfab700 10 osd.1 pg_epoch: 42 pg[1.0s0( v 16'64935 (16'64850,16'64935] local-lis/les=40/41 n=64935 ec=13/13 lis/c=40/35 les/c/f=41/36/0 sis=40) [1,0,2]/[1,NONE,2]p1(0) backfill=[0(1)] r=0 lpr=40 pi=[35,40)/1 crt=16'64935 lcod 16'64934 mlcod 0'0 active+undersized+degraded+remapped+backfilling rops=1 mbc={0={(1+0)=1},1={(0+0)=1},2={(0+0)=1}}] search_for_missing 1:ad551702:::test:head 16'2 is on osd.3(1)

It adds osd.3(1) to missing_loc (although osd.3 is actually missing the object too). So when is_unfound [1] is called later, missing_loc contains two peers (`mit->second = 1(0),3(1)`) and `(*is_recoverable)(mit->second)` returns true (and `is_unfound` returns false).

I still can't understand why in the nautilus case is_unfound returns true here. In the log I see that add_source_info is called in this case too, adding osd.3(1) to missing_loc, but is_unfound still returns true:

2021-05-26 06:37:53.101 7f9a90c96700 10 osd.1 pg_epoch: 36 pg[1.0s0( v 15'12248 (15'12177,15'12248] local-lis/les=34/35 n=12248 ec=12/12 lis/c 34/29 les/c/f 35/30/0 31/34/12) [1,0,2]/[1,2147483647,2]p1(0) backfill=[0(1)] r=0 lpr=34 pi=[29,34)/1 rops=1 crt=15'12248 lcod 15'12247 mlcod 0'0 active+undersized+remapped+backfilling mbc={}] search_for_missing 1:ad551702:::test:head 15'2 is on osd.3(1)
...
2021-05-26 06:37:53.105 7f9a94c9e700 10 osd.1 pg_epoch: 36 pg[1.0s0( v 15'12248 (15'12177,15'12248] local-lis/les=34/35 n=12248 ec=12/12 lis/c 34/29 les/c/f 35/30/0 31/34/12) [1,0,2]/[1,2147483647,2]p1(0) backfill=[0(1)] r=0 lpr=34 pi=[29,34)/1 crt=15'12248 lcod 15'12247 mlcod 0'0 active+undersized+remapped+backfilling mbc={0={(0+0)=1},1={(0+1)=1},2={(0+0)=1}}] recover_replicas: 1:ad551702:::test:head still unfound

[1]

bool is_unfound(const hobject_t &hoid) const {

@trociny
Contributor Author

trociny commented Jun 6, 2021

This is waiting on @trociny to compare test runs from nautilus and master to figure out why we are not seeing #41532 (comment) in nautilus.

So the difference between master and nautilus is that when the object error (missing shards) is detected during backfill, master (properly) adds the primary osd to missing_loc (PrimaryLogPG::on_failed_pull calls recovery_state.force_object_missing, which calls missing_loc.rebuild), so after this missing_loc contains one peer (missing_loc.is_unfound properly returns true at this time). Then when the restarted osd boots and the PGPeeringEvent comes, search_for_missing -> add_source_info is called, which (improperly?) adds the booted osd to missing_loc, so it now contains 2 peers (the object is "recoverable") and missing_loc.is_unfound returns false, triggering the abort.

In nautilus, when the object error (missing shards) is detected during backfill, the primary osd is not added to missing_loc (missing_loc.is_unfound returns true because mit == missing_loc.end() is true). And when the restarted osd boots and the PGPeeringEvent comes and adds the booted osd to missing_loc, it contains only one (this) peer, so missing_loc.is_unfound still returns true.
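
To make the difference concrete, here is a toy model (assuming a k=2 EC pool; all names are hypothetical and this is not the Ceph implementation) that follows the is_unfound logic quoted earlier: an object is unfound when no locations are recorded at all, or when the recorded shards do not satisfy the recoverability predicate.

    #include <cassert>
    #include <set>
    #include <string>

    using Shard = std::string;

    // Assumption: for a k=2 EC pool, two located shards are enough to recover.
    bool is_recoverable_k2(const std::set<Shard>& locs) { return locs.size() >= 2; }

    bool is_unfound(const std::set<Shard>& locs) {
      if (locs.empty()) return true;        // the mit == missing_loc.end() case
      return !is_recoverable_k2(locs);
    }

    int main() {
      // master: force_object_missing records the primary shard 1(0),
      // then the peering event (add_source_info) adds the booted osd 3(1).
      std::set<Shard> master_locs = {"1(0)"};
      assert(is_unfound(master_locs));       // unfound before the peering event
      master_locs.insert("3(1)");
      assert(!is_unfound(master_locs));      // "recoverable" after it -> triggers the abort

      // nautilus: nothing is recorded on the failed pull; peering adds only 3(1).
      std::set<Shard> nautilus_locs;
      assert(is_unfound(nautilus_locs));
      nautilus_locs.insert("3(1)");
      assert(is_unfound(nautilus_locs));     // still unfound -> no abort, matching the logs
    }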

@yuriw
Contributor

yuriw commented Jun 7, 2021

tests passed https://trello.com/c/EPPqwRyb

@yuriw yuriw added the TESTED label Jun 7, 2021
@neha-ojha
Member

Thanks for your analysis @trociny! The problem is that though osd.3(1) is missing the object, it advertises 0 missing to the primary. This ties back to https://trello.com/c/swaxVPq8/722-osd-bluestore-persist-flags-on-objects-to-generate-an-eio-add-tests-to-exercise-it

2021-05-26T06:09:06.882+0000 7fd7f73a7700 10 osd.1 pg_epoch: 41 pg[1.0s0( v 16'34826 (16'34797,16'34826] local-lis/les=39/40 n=34826 ec=13/13 lis/c=39/34 les/c/f=40/35/0 sis=39) [1,0,2]/[1,NONE,2]p1(0) backfill=[0(1)] r=0 lpr=39 pi=[34,39)/1 crt=16'34826 lcod 16'34825 mlcod 0'0 active+undersized+degraded+remapped+backfilling rops=1 mbc={0={(1+0)=1},1={(0+0)=1},2={(0+0)=1}}] proc_replica_log for osd.3(1): 1.0s1( v 16'34826 (16'34821,16'34826] local-lis/les=37/38 n=34826 ec=13/13 lis/c=39/34 les/c/f=40/35/0 sis=39) log((16'34821,16'34826], crt=16'34826) missing(0 may_include_deletes = 1)
...
2021-05-26T06:09:06.882+0000 7fd7f73a7700 10 osd.1 pg_epoch: 41 pg[1.0s0( v 16'34826 (16'34797,16'34826] local-lis/les=39/40 n=34826 ec=13/13 lis/c=39/34 les/c/f=40/35/0 sis=39) [1,0,2]/[1,NONE,2]p1(0) backfill=[0(1)] r=0 lpr=39 pi=[34,39)/1 crt=16'34826 lcod 16'34825 mlcod 0'0 active+undersized+degraded+remapped+backfilling rops=1 mbc={0={(1+0)=1},1={(0+0)=1},2={(0+0)=1}}] search_for_missing 1:ad551702:::test:head 16'2 is on osd.3(1)

above is from /a/trociny-2021-05-26_04:56:24-rados-master-distro-basic-smithi/6135980

I appreciate your efforts in finding the root cause of the problem. This PR may work around the backfill_unfound edge case if an osd restarts, but I do not feel confident that this patch will not introduce any other problems. Since this is the last nautilus point release, we won't be able to patch any future bugs in this area. I would really like for us to invest in fixing this issue in master with a more robust solution and provide #41293 (comment) as a workaround for nautilus.

@trociny
Contributor Author

trociny commented Jun 9, 2021

@neha-ojha Thanks! I totally agree with you that trying to fix this in nautilus at this stage is high risk.

@neha-ojha neha-ojha added DNM and removed nautilus-batch-1 nautilus point releases labels Jun 9, 2021
@yuriw yuriw removed the needs-qa label Jun 9, 2021
@smithfarm smithfarm changed the title nautilus: osd/ECBackend: don't remove in-flight backfill when missing is not primary [DNM: TOO RISKY] nautilus: osd/ECBackend: don't remove in-flight backfill when missing is not primary Jul 20, 2021
@trociny trociny closed this Jul 27, 2021