osd: refcounting chunks for snapshotted manifest object #29283

myoungwon · 2019-07-24T16:43:05Z

https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/7G222W5F7MD7X5MJS3BF6YDJ76RKJGN5/

To prevent a manifest object that is still in use from being removed,
chunk maps are introduced in clone objects.

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug

myoungwon · 2019-07-29T09:13:49Z

@liewegas Before starting the scrub work for snapshotted manifest objects, I'd like some feedback to make sure I haven't misunderstood the concept.

These commits implement reference tracking based on clone info, as @gregsfortytwo suggested.
The refcount is incremented when a clone or head object is created and decremented when a clone or head object is deleted, tracked through each object's object_info_t.

Note that the key idea behind this reference counting is to tolerate false positives: the state (manifest object has no reference, chunk object still has a reference) is allowed, whereas (manifest object has a reference, chunk object has no reference) must never occur.
If such an inconsistency occurs, it will be fixed by scrub processing (ceph-dedup-tool).
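The ordering that makes "false positives only" hold can be sketched with a toy in-memory model. This is not Ceph code: Cluster, add_ref, drop_ref, safe and scrub are hypothetical names, and a chunk map is reduced to a set of chunk ids. The idea is that the chunk reference is taken before the manifest records it, and released only after the manifest forgets it. An interruption between the two steps can then only leave an extra reference behind, which a scrub-like pass can reclaim.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model: each chunk object keeps a refcount and
// each manifest (head or clone) records which chunks it references.
struct Cluster {
  std::map<std::string, int> chunk_refcount;              // chunk oid -> refs
  std::map<std::string, std::set<std::string>> manifests;  // object -> chunks

  // Take the chunk reference *before* the manifest records it. Stopping
  // midway leaves an extra ref (false positive), never a missing one.
  void add_ref(const std::string& obj, const std::string& chunk,
               bool crash_midway = false) {
    ++chunk_refcount[chunk];
    if (crash_midway) return;
    manifests[obj].insert(chunk);
  }

  // Forget the manifest entry *before* releasing the chunk reference.
  void drop_ref(const std::string& obj, const std::string& chunk,
                bool crash_midway = false) {
    manifests[obj].erase(chunk);
    if (crash_midway) return;
    --chunk_refcount[chunk];
  }

  // Safe state: every chunk named by a manifest has a refcount >= the number
  // of manifests naming it (over-counting is allowed).
  bool safe() const {
    std::map<std::string, int> needed;
    for (const auto& [obj, chunks] : manifests)
      for (const auto& c : chunks) ++needed[c];
    for (const auto& [c, n] : needed) {
      auto it = chunk_refcount.find(c);
      if (it == chunk_refcount.end() || it->second < n) return false;
    }
    return true;
  }

  // Scrub-style repair: recompute refcounts from the manifests, reclaiming
  // any leaked (false-positive) references.
  void scrub() {
    std::map<std::string, int> fixed;
    for (const auto& [obj, chunks] : manifests)
      for (const auto& c : chunks) ++fixed[c];
    for (auto& [c, n] : chunk_refcount) n = fixed.count(c) ? fixed[c] : 0;
  }
};
```

For example, adding "chunk1" to a head and a clone and then interrupting the clone's drop_ref leaves the refcount at 2 with only one user. That state is still safe, and scrub() brings the count back to 1.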

Am I missing something?

myoungwon · 2019-08-02T11:48:00Z

@liewegas Ping.

liewegas · 2019-08-06T17:26:37Z

src/osd/PrimaryLogPG.cc

+      }
+    }
+  }
+


This part of the design worries me. It means that the first time we touch a snapshotted object and need to make a clone, the op is blocked while we go bump a bunch of ref counts.

myoungwon · 2019-08-09T13:17:47Z

@liewegas Does this make sense?

myoungwon · 2019-09-04T13:21:24Z

@liewegas Ping.

liewegas · 2019-09-06T13:27:11Z

src/osd/PrimaryLogPG.cc

+	interval_set<uint64_t> clone_overlap = newest_overlap;
+	interval_set<uint64_t> chunk_range;
+	chunk_range.insert(p.first, p.second.length);
+	clone_overlap.intersection_of(chunk_range);


I think you can replace the last four lines here with just if (newest_overlap.intersects(p.first, p.second.length)) { ...
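The suggestion replaces "copy the overlap, intersect it with the chunk's range, test the result" with a single overlap query that builds no temporary sets. The sketch below uses a hypothetical ExtentSet as a stand-in for Ceph's interval_set<uint64_t>. It assumes disjoint extents stored as start -> length and shows only the query the review relies on. chunk_overlaps is also a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for interval_set<uint64_t>: disjoint extents
// stored as start -> length. Only what this example needs.
struct ExtentSet {
  std::map<uint64_t, uint64_t> m;

  void insert(uint64_t off, uint64_t len) { m[off] = len; }  // assumes disjoint

  // True if [off, off + len) overlaps any stored extent.
  bool intersects(uint64_t off, uint64_t len) const {
    if (len == 0) return false;
    auto it = m.lower_bound(off);  // first extent starting at or after off
    if (it != m.end() && it->first < off + len) return true;
    if (it == m.begin()) return false;
    --it;                          // last extent starting before off
    return it->first + it->second > off;
  }
};

// One query instead of copy + intersection_of + emptiness check.
bool chunk_overlaps(const ExtentSet& newest_overlap,
                    uint64_t chunk_off, uint64_t chunk_len) {
  return newest_overlap.intersects(chunk_off, chunk_len);
}
```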

myoungwon · 2019-09-09T12:40:17Z

@liewegas Fixed. Could you take a look?

liewegas · 2019-09-09T13:29:04Z

src/osd/PrimaryLogPG.cc

+  if (!has_reference) {
+    return false;
+  }
+  return true;


just return has_reference;?

liewegas · 2019-09-09T13:41:28Z

src/osd/PrimaryLogPG.cc

+	// check if the references is still used
+	object_info_t& oi = cobc->obs.oi;
+	if (oi.has_manifest()) {
+	  coi.manifest.build_non_intersection_set(oi.manifest.chunk_map, no_refs, NULL);


The way build_non_intersection_set is written, it only adds things that don't overlap. So calling it inside a loop won't find the things that intersect with none of the other clones. Instead, it'll find things that don't intersect with at least one other clone, which isn't very useful.

I think this only needs to check the before and after clones.

I think it needs to do the first pass, where it finds the intersection, on both of those clones, and then do the second pass.
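The two-pass structure suggested here can be sketched as follows. First collect everything the previous and next clones still share with the clone being removed, then subtract that union once, instead of calling a non-intersection helper per neighbour inside a loop. This is a simplified model, not the patch. ChunkMap (offset -> chunk oid), collect_shared and refs_to_drop are hypothetical. The sketch also assumes a chunk counts as "still used" when a neighbour has the same target at the same offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplification of a manifest chunk_map: offset -> chunk oid.
using ChunkMap = std::map<uint64_t, std::string>;

// Pass 1: record entries of `removed` that a neighbouring clone still holds
// (same offset, same target).
static void collect_shared(const ChunkMap& removed, const ChunkMap& neighbour,
                           ChunkMap& still_used) {
  for (const auto& [off, oid] : removed) {
    auto it = neighbour.find(off);
    if (it != neighbour.end() && it->second == oid) still_used[off] = oid;
  }
}

// Pass 2: only entries no neighbour shares are references to drop.
std::vector<std::string> refs_to_drop(const ChunkMap& removed,
                                      const std::optional<ChunkMap>& prev,
                                      const std::optional<ChunkMap>& next) {
  ChunkMap still_used;
  if (prev) collect_shared(removed, *prev, still_used);
  if (next) collect_shared(removed, *next, still_used);
  std::vector<std::string> drop;
  for (const auto& [off, oid] : removed)
    if (!still_used.count(off)) drop.push_back(oid);
  return drop;
}
```

Because the subtraction happens once, against the union of both neighbours, a chunk is dropped only if neither adjacent clone still uses it.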

myoungwon · 2019-09-09T16:36:12Z

@liewegas I addressed your comments. Please review.

liewegas · 2019-09-09T16:45:19Z

src/osd/PrimaryLogPG.cc

+    object_info_t& coi = cobc->obs.oi;
+    oi.manifest.build_intersection_set(coi.manifest.chunk_map, refs, &newest_overlap);
+
+    if (refs.size() > 0) {


I think this if statement should go.

liewegas · 2019-09-09T16:46:19Z

Other than that if statement, I think this looks right!

tchaikov · 2020-07-05T13:32:58Z

@athanatos @myoungwon I will include this PR in my next batch.

tchaikov · 2020-07-06T12:57:45Z

myoungwon · 2020-07-07T09:56:32Z

@athanatos The test results don't seem to show any failures related to this PR. What do you think?

athanatos · 2020-07-07T17:54:04Z

@myoungwon What caused the two rados/test.sh failures? They persisted into the second run and could be related.

tchaikov · 2020-07-08T03:44:28Z

2020-07-06T12:06:10.597 INFO:teuthology.orchestra.run.smithi192:workunit test rados/test.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=7b60e408aedc30fb1b71a2c6e541618527d6e6d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh
...
2020-07-06T18:06:10.641 DEBUG:teuthology.orchestra.run:got remote process result: 124

They timed out.

$ grep -A1 '\[==========\]' /a/kchai-2020-07-06_11:39:50-rados-wip-kefu-testing-2020-07-06-1016-distro-basic-smithi/5202896/teuthology.log
2020-07-06T12:06:10.708 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: [==========] Running 12 tests from 1 test suite.
2020-07-06T12:06:10.708 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: [----------] Global test environment set-up.
--
2020-07-06T12:06:36.503 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: 2020-07-06T12:06:35.935+0000 7fe5167fc700  3 client.4328.objecter handle_osd_map decoding i              api_service: [==========] Running 3 tests from 1 test suite.
2020-07-06T12:06:36.504 INFO:tasks.workunit.client.0.smithi192.stdout:              api_service: [----------] Global test environment set-up.
--
2020-07-06T12:06:36.506 INFO:tasks.workunit.client.0.smithi192.stdout:              api_service: [==========] 3 tests from 1 test suite ran. (25690 ms total)
2020-07-06T12:06:36.506 INFO:tasks.workunit.client.0.smithi192.stdout:              api_service: [  PASSED  ] 3 tests.
--
2020-07-06T12:06:47.337 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: 2020-07-06T12:06:46           api_service_pp: [==========] Running 4 tests from 1 test suite.
2020-07-06T12:06:47.338 INFO:tasks.workunit.client.0.smithi192.stdout:           api_service_pp: [----------] Global test environment set-up.
--
2020-07-06T12:06:47.353 INFO:tasks.workunit.client.0.smithi192.stdout:           api_service_pp: [==========] 4 tests from 1 test suite ran. (36510 ms total)
2020-07-06T12:06:47.363 INFO:tasks.workunit.client.0.smithi192.stdout:           api_service_pp: [  PASSED  ] 4 tests.
--
2020-07-06T12:06:53.049 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_pool: [==========] Running 7 tests from 1 test suite.
2020-07-06T12:06:53.049 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_pool: [----------] Global test environment set-up.
--
2020-07-06T12:06:53.053 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_pool: [==========] 7 tests from 1 test suite ran. (42335 ms total)
2020-07-06T12:06:53.053 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_pool: [  PASSED  ] 7 tests.
--
2020-07-06T12:07:15.841 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: 2020-07-06T12:07:14.743+0000                 api_misc: [==========] Running 11 tests from 4 test suites.
2020-07-06T12:07:15.842 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_misc: [----------] Global test environment set-up.
--
2020-07-06T12:07:43.729 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: [==========] 12 tests from 1 test suite ran. (93020 ms total)
2020-07-06T12:07:43.729 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_asio: [  PASSED  ] 12 tests.
--
2020-07-06T12:08:35.928 INFO:tasks.workunit.client.0.smithi192.stdout:    api_c_read_operations: [==========] Running 16 tests from 1 test suite.
2020-07-06T12:08:35.929 INFO:tasks.workunit.client.0.smithi192.stdout:    api_c_read_operations: [----------] Global test environment set-up.
--
2020-07-06T12:08:35.936 INFO:tasks.workunit.client.0.smithi192.stdout:    api_c_read_operations: [==========] 16 tests from 1 test suite ran. (145085 ms total)
2020-07-06T12:08:35.936 INFO:tasks.workunit.client.0.smithi192.stdout:    api_c_read_operations: [  PASSED  ] 16 tests.
--
2020-07-06T12:08:52.886 INFO:tasks.workunit.client.0.smithi192.stdout:               api_cmd_pp: [==========] Running 3 tests from 1 test suite.
2020-07-06T12:08:52.886 INFO:tasks.workunit.client.0.smithi192.stdout:               api_cmd_pp: [----------] Global test environment set-up.
--
2020-07-06T12:08:52.888 INFO:tasks.workunit.client.0.smithi192.stdout:               api_cmd_pp: [==========] 3 tests from 1 test suite ran. (162085 ms total)
2020-07-06T12:08:52.888 INFO:tasks.workunit.client.0.smithi192.stdout:               api_cmd_pp: [  PASSED  ] 3 tests.
--
2020-07-06T12:08:53.969 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_cmd: [==========] Running 4 tests from 1 test suite.
2020-07-06T12:08:53.969 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_cmd: [----------] Global test environment set-up.
--
2020-07-06T12:08:56.082 INFO:tasks.workunit.client.0.smithi192.stdout:      api_watch_notify_pp: [==========] Running 16 tests from 2 test suites.
2020-07-06T12:08:56.083 INFO:tasks.workunit.client.0.smithi192.stdout:      api_watch_notify_pp: [----------] Global test environment set-up.
--
2020-07-06T12:09:23.024 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_cmd: got: 2020-07-06T12:09:14.073735+0000 mon.c [INF] from='client.? 172.21.15.192:0/1203117768' entity='client.admin' cmd=[{"prefix": "osd erasure-code-profile rm", "name": "testprofile-test-rados-api-smithi192-26074-12"}]: d                 api_list: [==========] Running 10 tests from 3 test suites.
2020-07-06T12:09:23.024 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_list: [----------] Global test environment set-up.
--
2020-07-06T12:09:28.884 INFO:tasks.workunit.client.0.smithi192.stdout:      api_watch_notify_pp: [==========] 16 tests from 2 test suites ran. (198104 ms total)
2020-07-06T12:09:28.884 INFO:tasks.workunit.client.0.smithi192.stdout:      api_watch_notify_pp: [  PASSED  ] 16 tests.
--
2020-07-06T12:09:29.907 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_misc: [==========] 11 tests from 4 test suites ran. (199185 ms total)
2020-07-06T12:09:29.907 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_misc: [  PASSED  ] 11 tests.
--
2020-07-06T12:09:29.908 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_stat: [==========] Running 8 tests from 2 test suites.
2020-07-06T12:09:29.908 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_stat: [----------] Global test environment set-up.
--
2020-07-06T12:09:29.922 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_stat: [==========] 8 tests from 2 test suites ran. (199160 ms total)
2020-07-06T12:09:29.922 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_stat: [  PASSED  ] 8 tests.
--
2020-07-06T12:09:29.958 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_lock: [==========] Running 16 tests from 2 test suites.
2020-07-06T12:09:29.958 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_lock: [----------] Global test environment set-up.
--
2020-07-06T12:09:29.967 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_lock: [==========] 16 tests from 2 test suites ran. (199292 ms total)
2020-07-06T12:09:29.967 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_lock: [  PASSED  ] 16 tests.
--
2020-07-06T12:09:30.410 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_cmd: [==========] 4 tests from 1 test suite ran. (199629 ms total)
2020-07-06T12:09:30.411 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_cmd: [  PASSED  ] 4 tests.
--
2020-07-06T12:09:34.151 INFO:tasks.workunit.client.0.smithi192.stdout:   api_c_write_operations: [==========] Running 8 tests from 2 test suites.
2020-07-06T12:09:34.152 INFO:tasks.workunit.client.0.smithi192.stdout:   api_c_write_operations: [----------] Global test environment set-up.
--
2020-07-06T12:09:34.156 INFO:tasks.workunit.client.0.smithi192.stdout:   api_c_write_operations: [==========] 8 tests from 2 test suites ran. (203323 ms total)
2020-07-06T12:09:34.156 INFO:tasks.workunit.client.0.smithi192.stdout:   api_c_write_operations: [  PASSED  ] 8 tests.
--
2020-07-06T12:09:41.134 INFO:tasks.workunit.client.0.smithi192.stdout:              api_stat_pp: [==========] Running 9 tests from 2 test suites.
2020-07-06T12:09:41.134 INFO:tasks.workunit.client.0.smithi192.stdout:              api_stat_pp: [----------] Global test environment set-up.
--
2020-07-06T12:09:41.141 INFO:tasks.workunit.client.0.smithi192.stdout:              api_stat_pp: [==========] 9 tests from 2 test suites ran. (210368 ms total)
2020-07-06T12:09:41.142 INFO:tasks.workunit.client.0.smithi192.stdout:              api_stat_pp: [  PASSED  ] 9 tests.
--
2020-07-06T12:09:47.473 INFO:tasks.workunit.client.0.smithi192.stdout:         api_watch_notify: [==========] Running 11 tests from 2 test suites.
2020-07-06T12:09:47.473 INFO:tasks.workunit.client.0.smithi192.stdout:         api_watch_notify: [----------] Global test environment set-up.
--
2020-07-06T12:09:47.483 INFO:tasks.workunit.client.0.smithi192.stdout:         api_watch_notify: [==========] 11 tests from 2 test suites ran. (216704 ms total)
2020-07-06T12:09:47.483 INFO:tasks.workunit.client.0.smithi192.stdout:         api_watch_notify: [  PASSED  ] 11 tests.
--
2020-07-06T12:09:58.884 INFO:tasks.workunit.client.0.smithi192.stdout:                api_io_pp: [==========] Running 37 tests from 2 test suites.
2020-07-06T12:09:58.884 INFO:tasks.workunit.client.0.smithi192.stdout:                api_io_pp: [----------] Global test environment set-up.
--
2020-07-06T12:10:11.568 INFO:tasks.workunit.client.0.smithi192.stdout:              api_lock_pp: [==========] Running 16 tests from 2 test suites.
2020-07-06T12:10:11.568 INFO:tasks.workunit.client.0.smithi192.stdout:              api_lock_pp: [----------] Global test environment set-up.
--
2020-07-06T12:10:11.579 INFO:tasks.workunit.client.0.smithi192.stdout:              api_lock_pp: [==========] 16 tests from 2 test suites ran. (240895 ms total)
2020-07-06T12:10:11.579 INFO:tasks.workunit.client.0.smithi192.stdout:              api_lock_pp: [  PASSED  ] 16 tests.
--
2020-07-06T12:10:24.253 INFO:tasks.workunit.client.0.smithi192.stdout:                   api_io: [==========] Running 23 tests from 2 test suites.
2020-07-06T12:10:24.253 INFO:tasks.workunit.client.0.smithi192.stdout:                   api_io: [----------] Global test environment set-up.
--
2020-07-06T12:10:24.265 INFO:tasks.workunit.client.0.smithi192.stdout:                   api_io: [==========] 23 tests from 2 test suites ran. (253612 ms total)
2020-07-06T12:10:24.265 INFO:tasks.workunit.client.0.smithi192.stdout:                   api_io: [  PASSED  ] 23 tests.
--
2020-07-06T12:10:38.618 INFO:tasks.workunit.client.0.smithi192.stdout:                api_io_pp: [==========] 37 tests from 2 test suites ran. (267971 ms total)
2020-07-06T12:10:38.618 INFO:tasks.workunit.client.0.smithi192.stdout:                api_io_pp: [  PASSED  ] 37 tests.
--
2020-07-06T12:10:51.373 INFO:tasks.workunit.client.0.smithi192.stdout:            api_snapshots: [==========] Running 12 tests from 4 test suites.
2020-07-06T12:10:51.374 INFO:tasks.workunit.client.0.smithi192.stdout:            api_snapshots: [----------] Global test environment set-up.
--
2020-07-06T12:10:51.389 INFO:tasks.workunit.client.0.smithi192.stdout:            api_snapshots: [==========] 12 tests from 4 test suites ran. (280643 ms total)
2020-07-06T12:10:51.389 INFO:tasks.workunit.client.0.smithi192.stdout:            api_snapshots: [  PASSED  ] 12 tests.
--
2020-07-06T12:11:41.979 INFO:tasks.workunit.client.0.smithi192.stdout:         api_snapshots_pp: [==========] Running 18 tests from 4 test suites.
2020-07-06T12:11:41.979 INFO:tasks.workunit.client.0.smithi192.stdout:         api_snapshots_pp: [----------] Global test environment set-up.
--
2020-07-06T12:11:41.990 INFO:tasks.workunit.client.0.smithi192.stdout:         api_snapshots_pp: [==========] 18 tests from 4 test suites ran. (331247 ms total)
2020-07-06T12:11:41.990 INFO:tasks.workunit.client.0.smithi192.stdout:         api_snapshots_pp: [  PASSED  ] 18 tests.
--
2020-07-06T12:11:49.271 INFO:tasks.workunit.client.0.smithi192.stdout:               api_aio_pp: [==========] Running 53 tests from 4 test suites.
2020-07-06T12:11:49.271 INFO:tasks.workunit.client.0.smithi192.stdout:               api_aio_pp: [----------] Global test environment set-up.
--
2020-07-06T12:11:53.908 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_aio: [==========] Running 40 tests from 2 test suites.
2020-07-06T12:11:53.908 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_aio: [----------] Global test environment set-up.
--
2020-07-06T12:13:24.642 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_list: [==========] 10 tests from 3 test suites ran. (433968 ms total)
2020-07-06T12:13:24.642 INFO:tasks.workunit.client.0.smithi192.stdout:                 api_list: [  PASSED  ] 10 tests.
--
2020-07-06T12:13:52.873 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_aio: [==========] 40 tests from 2 test suites ran. (462232 ms total)
2020-07-06T12:13:52.874 INFO:tasks.workunit.client.0.smithi192.stdout:                  api_aio: [  PASSED  ] 40 tests.
--
2020-07-06T12:15:11.053 INFO:tasks.workunit.client.0.smithi192.stdout:               api_aio_pp: [==========] 53 tests from 4 test suites ran. (540413 ms total)
2020-07-06T12:15:11.053 INFO:tasks.workunit.client.0.smithi192.stdout:               api_aio_pp: [  PASSED  ] 53 tests.
--
2020-07-06T12:15:27.648 INFO:tasks.workunit.client.0.smithi192.stdout:              api_misc_pp: [==========] Running 31 tests from 7 test suites.
2020-07-06T12:15:27.649 INFO:tasks.workunit.client.0.smithi192.stdout:              api_misc_pp: [----------] Global test environment set-up.
--
2020-07-06T12:16:01.177 INFO:tasks.workunit.client.0.smithi192.stdout:              api_misc_pp: [==========] 31 tests from 7 test suites ran. (590461 ms total)
2020-07-06T12:16:01.177 INFO:tasks.workunit.client.0.smithi192.stdout:              api_misc_pp: [  PASSED  ] 31 tests.
--
2020-07-06T12:17:59.992 INFO:tasks.workunit.client.0.smithi192.stdout:              api_tier_pp: [==========] Running 62 tests from 4 test suites.
2020-07-06T12:17:59.992 INFO:tasks.workunit.client.0.smithi192.stdout:              api_tier_pp: [----------] Global test environment set-up.

in which api_tier_pp never finishes.

tchaikov · 2020-07-08T03:51:33Z

rerunning the failed tests in the second run at https://pulpito.ceph.com/kchai-2020-07-08_03:34:45-rados-wip-kefu-testing-2020-07-06-1016-distro-basic-smithi/

myoungwon · 2020-07-08T05:51:23Z

This seems like a bug in dec_all_refcount_manifest().

api_tier_pp never finishes because valgrind reports an error:

2020-07-06T12:23:32.593 INFO:tasks.workunit.client.0.smithi023.stdout:              api_tier_pp: [       OK ] LibRadosTwoPoolsPP.ProxyRead (23522 ms)
2020-07-06T12:23:32.593 INFO:tasks.workunit.client.0.smithi023.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.CachePin
2020-07-06T12:23:36.447 INFO:tasks.ceph.mon.a.smithi023.stderr:==00:00:17:31.848 21858== Warning: unimplemented fcntl command: 1036
2020-07-06T12:23:36.489 INFO:tasks.ceph.mon.a.smithi023.stderr:==00:00:17:31.890 21858== Warning: unimplemented fcntl command: 1036
.......................
2020-07-06T12:24:28.771  INFO:tasks.ceph.osd.3.smithi023.stderr:==00:00:18:02.099 22484== Exit program on first error (--exit-on-first-error=yes)

This is caused by:

<kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x8A25F8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>__shared_ptr</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>1165</line>
    </frame>
    <frame>
      <ip>0x8A25F8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>shared_ptr</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr.h</file>
      <line>129</line>
    </frame>
    <frame>
      <ip>0x8A25F8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-16.0.0-3235.g7b60e408aed.el8.x86_64/src/osd</dir>
      <file>PrimaryLogPG.cc</file>
      <line>3520</line>
    </frame>
    <frame>
      <ip>0x8A25F8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>std::_Function_handler&lt;void (), PrimaryLogPG::dec_all_refcount_manifest(object_info_t const&amp;, PrimaryLogPG::OpContext*)::{lambda()#2}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0x864703</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>operator()</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>687</line>
    </frame>
    <frame>
      <ip>0x864703</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PrimaryLogPG::eval_repop(PrimaryLogPG::RepGather*)</fn>
      <dir>/usr/src/debug/ceph-16.0.0-3235.g7b60e408aed.el8.x86_64/src/osd</dir>
      <file>PrimaryLogPG.cc</file>
      <line>10589</line>
    </frame>
    <frame>

Based on valgrind's report, in dec_all_refcount_manifest(),

    if (!refs.is_empty()) {
      ctx->register_on_commit(
        [ctx, this, refs](){
          dec_refcount(ctx->obc, refs);
        });
    }

dec_refcount() is called with a ctx that is no longer valid, because ctx is moved in AwaitAsyncWork::react() (AwaitAsyncWork::react() -> trim_object() -> move(ctx)) before the on-commit callback runs.

So, to fix this, I'd replace the code above with:

    if (!refs.is_empty()) {
      // capture the object id by value: ctx may already be gone
      // by the time this on-commit callback runs
      hobject_t soid = ctx->obc->obs.oi.soid;
      ctx->register_on_commit(
        [soid, this, refs](){
          ObjectContextRef obc = get_object_context(soid, false, NULL);
          ceph_assert(obc);
          dec_refcount(obc, refs);
        });
    }

Does that make sense?
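The pattern behind this fix can be reproduced outside the OSD. A deferred callback must not capture a pointer to a context whose lifetime ends before the callback fires. It should capture a stable key by value and look the object up again when it runs. The model below is hypothetical: PG, OpContext, ObjectContext, obc_cache and register_dec_refcount are simplified stand-ins, and incrementing refs_dropped stands in for dec_refcount(obc, refs).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins: an op context that may be destroyed before its
// on-commit callbacks run, and a PG-level cache of object contexts.
struct ObjectContext { std::string soid; int refs_dropped = 0; };
using ObjectContextRef = std::shared_ptr<ObjectContext>;

struct OpContext { ObjectContextRef obc; };

struct PG {
  std::map<std::string, ObjectContextRef> obc_cache;
  std::vector<std::function<void()>> on_commit;

  ObjectContextRef get_object_context(const std::string& soid) {
    auto it = obc_cache.find(soid);
    return it == obc_cache.end() ? nullptr : it->second;
  }

  // Capture the object id by value (not the OpContext*) and resolve the
  // object context again when the callback actually fires.
  void register_dec_refcount(OpContext* ctx) {
    std::string soid = ctx->obc->soid;
    on_commit.push_back([this, soid]() {
      ObjectContextRef obc = get_object_context(soid);
      assert(obc);
      ++obc->refs_dropped;  // stands in for dec_refcount(obc, refs)
    });
  }

  void commit() {
    for (auto& cb : on_commit) cb();
    on_commit.clear();
  }
};
```

Capturing ctx itself in that lambda would read freed memory once the OpContext is destroyed before commit(). That matches the invalid read valgrind reports above.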

tchaikov · 2020-07-08T07:44:49Z

@myoungwon probably you could rerun the failed tests with your latest fix to verify it?

myoungwon · 2020-07-08T08:07:44Z

@tchaikov I can't access Sepia yet because my request to update my login credentials is still in progress. Once that request is completed, I can rerun this.

myoungwon · 2020-07-08T09:57:50Z

retest this please

gregsfortytwo · 2020-07-08T23:04:55Z

src/osd/osd_types.cc

+    if (i == manifest.chunk_map.end() || current != i->first) {
+      return nullptr;
+    } else {
+      // We advance the iterator iff we consider the chunk_map on this iteration


This comment was not actually helpful in understanding what's going on here. The way you're advancing through the loop is elegant, but "hiding" the iterator advancement inside a lambda function is messy, so you should explain the lambda in light of the overall algorithm, especially since its very clear name only describes a portion of its purpose.
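The loop shape under discussion can be sketched as a lockstep merge walk over several offset-ordered chunk maps. A small lambda returns the chunk at the current offset and advances its iterator only if it matched, so every map moves past an offset together and no offset is visited twice. The reviewed function itself (calc_refs_to_drop_on_removal in osd_types.cc, per the follow-up commit) is not shown here. ChunkMap, drop_by_merge_walk, get_chunk and kEnd are hypothetical, and the drop rule assumes "same target at the same offset in a neighbour means still referenced".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

using ChunkMap = std::map<uint64_t, std::string>;  // hypothetical: offset -> chunk oid
constexpr uint64_t kEnd = std::numeric_limits<uint64_t>::max();  // "no more entries"

// Walk the previous, removed and next clones' maps in offset order.
std::vector<std::string> drop_by_merge_walk(const ChunkMap& prev,
                                            const ChunkMap& removed,
                                            const ChunkMap& next) {
  auto pi = prev.begin(), ri = removed.begin(), ni = next.begin();

  // Returns the chunk at `current` from one map, or nullptr. Side effect:
  // the iterator advances iff it pointed at `current`, which is what moves
  // the whole walk forward.
  auto get_chunk = [](const ChunkMap& m, ChunkMap::const_iterator& it,
                      uint64_t current) -> const std::string* {
    if (it == m.end() || it->first != current) return nullptr;
    const std::string* oid = &it->second;
    ++it;
    return oid;
  };
  auto key = [](const ChunkMap& m, ChunkMap::const_iterator it) {
    return it == m.end() ? kEnd : it->first;
  };

  std::vector<std::string> drop;
  while (true) {
    // Next offset: the smallest one any iterator currently points at.
    uint64_t current = std::min({key(prev, pi), key(removed, ri), key(next, ni)});
    if (current == kEnd) break;
    const std::string* p = get_chunk(prev, pi, current);
    const std::string* r = get_chunk(removed, ri, current);
    const std::string* n = get_chunk(next, ni, current);
    // Drop the removed clone's ref unless a neighbour holds the same chunk.
    if (r && !(p && *p == *r) && !(n && *n == *r)) drop.push_back(*r);
  }
  return drop;
}
```

Spelling out the "advance iff matched" side effect next to the lambda is the kind of explanation the review asks for. The lambda's name only conveys the lookup half of what it does.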

gregsfortytwo · 2020-07-08T23:23:40Z

@athanatos Those functions look fine, other than the one bit that needs a better comment.

athanatos · 2020-07-09T01:41:54Z

@myoungwon Top commit on https://github.com/athanatos/ceph/commits/sjust/wip-fix-comment (athanatos@1c65b0a) has an improvement to address @gregsfortytwo's comment.

athanatos · 2020-07-09T17:30:47Z

retest this please

tchaikov · 2020-07-11T11:29:43Z

failures tracked by

https://tracker.ceph.com/issues/46442

athanatos · 2020-07-13T17:50:53Z

retest this please

myoungwon · 2020-07-14T02:17:08Z

jenkins retest this please

myoungwon · 2020-07-14T02:30:19Z

@athanatos Can you look over this PR? The QA results look good. Do I need to rerun the tests?

myoungwon added the core label Jul 24, 2019

myoungwon force-pushed the wip-refcount-snap branch 5 times, most recently from e57060f to b496bf3 July 29, 2019 08:37

liewegas reviewed Aug 6, 2019

myoungwon force-pushed the wip-refcount-snap branch 3 times, most recently from f23ca9c to 54bf4b2 August 9, 2019 10:43

myoungwon requested a review from liewegas August 9, 2019 13:18

myoungwon mentioned this pull request Sep 3, 2019

osd/PrimaryLogPG: cancel in-flight manifest ops on interval changing; fix race with scrub #29985 (merged)

liewegas reviewed Sep 6, 2019

src/osd/PrimaryLogPG.cc (outdated, resolved)

liewegas reviewed Sep 6, 2019

src/osd/PrimaryLogPG.cc (outdated, resolved)

myoungwon force-pushed the wip-refcount-snap branch 2 times, most recently from 290d83e to 95c0df2 September 8, 2019 11:32

myoungwon changed the title from "WIP: osd: refcounting chunks for snapshotted manifest object" to "osd: refcounting chunks for snapshotted manifest object" Sep 9, 2019

liewegas reviewed Sep 9, 2019

myoungwon force-pushed the wip-refcount-snap branch from 95c0df2 to 64c0709 September 9, 2019 16:26

liewegas reviewed Sep 9, 2019

myoungwon force-pushed the wip-refcount-snap branch from 64c0709 to f11fa97 September 9, 2019 16:56

tchaikov added the wip-kefu-testing label Jul 5, 2020

tchaikov removed the wip-kefu-testing label Jul 6, 2020

osd: fix reference leak

65c99a4

ctx will be moved, so replace it with the value.

Signed-off-by: Myoungwon Oh <ohmyoungwon@gmail.com>

gregsfortytwo reviewed Jul 8, 2020

osd_types: clarify comments in calc_refs_to_drop_on_removal

94b57f0

Signed-off-by: Samuel Just <sjust@redhat.com>

tchaikov added the wip-kefu-testing label Jul 10, 2020

tchaikov added needs-review and removed needs-qa wip-kefu-testing labels Jul 11, 2020

athanatos self-requested a review July 14, 2020 22:47

athanatos approved these changes Jul 14, 2020

athanatos merged commit f88211b into ceph:master Jul 14, 2020

myoungwon mentioned this pull request Jul 19, 2020

osd: fix incrementing refcount snap before flush #36105 (closed)
