Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: Fix local rbd mirror journals growing forever #49614

Merged
merged 3 commits into from Feb 17, 2023

Conversation

isodude
Copy link
Contributor

@isodude isodude commented Jan 2, 2023

This commit fixes commit 7ca1bab by pushing properly aligned discards back to m_image_extents, if corrected.

If discards are misaligned (off 0, len 4608, gran=4096), they are corrected properly, but only in object_extents and not in m_image_extents.

When journal_append_event is triggered it will only append from m_image_extents and does not now about the alignment fixes. In commit_io_events_extent it will throw an error that there are pending io since the larger misaligned area was sent to the journal. This will in turn break rbd journal mirroring since the local client will wait indefinately on the commit to be completed, which it never does.

Fixes: 7ca1bab
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson josef@oderland.se

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • -New feature (ticket optional)-
    • -Doc update (no ticket needed)-
    • -Code cleanup (no ticket needed)-
  • Component impact
    • -Affects Dashboard, opened tracker ticket-
    • -Affects Orchestrator, opened tracker ticket-
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • -Updates relevant documentation-
    • No doc update is appropriate
  • Tests (select at least one)

@isodude isodude requested a review from a team as a code owner January 2, 2023 13:33
@github-actions github-actions bot added the rbd label Jan 2, 2023
@isodude isodude force-pushed the wip-librbd-misalign-discard branch from 98282b9 to f67ebe1 Compare January 3, 2023 08:10
@idryomov
Copy link
Contributor

idryomov commented Jan 3, 2023

jenkins test make check

@isodude
Copy link
Contributor Author

isodude commented Jan 3, 2023

This should be backported to nautilus+.

I think I have an idea for tests.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When journal_append_event is triggered it will only append from m_image_extents and does not now about the alignment fixes. In commit_io_events_extent it will throw an error that there are pending io since the larger misaligned area was sent to the journal.

Hi Josef,

What do you mean by "it will throw an error"? Doesn't commit_io_event_extent just bail?

  if (!event.pending_extents.empty()) {
    ldout(cct, 20) << this << " " << __func__ << ": "
                   << "pending extents: " << event.pending_extents << dendl;
    return;
  }
  complete_event(it, event.ret_val);

This will in turn break rbd journal mirroring since the local client will wait
indefinately on the commit to be completed, which it never does.

Am I understanding correctly that the root cause has nothing to do with the consumer/replayer side (rbd-mirror)? The issue is that the journal entry doesn't get "fulfilled" and therefore marked as committed on the producer side?

src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
@idryomov
Copy link
Contributor

idryomov commented Jan 3, 2023

This should be backported to nautilus+.

Nautilus is EOL. I'll adjust the tracker ticket to make sure the fix gets backported to supported releases.

I think I have an idea for tests.

I'd look in the direction of replicating TestMockIoImageRequest.DiscardGranularity mock test in the journaling setting. Adding a similar end-to-end integration test would be nice too.

@isodude
Copy link
Contributor Author

isodude commented Jan 3, 2023

Hi Josef,

What do you mean by "it will throw an error"? Doesn't commit_io_event_extent just bail?

True, bad wording on my part. I regard the ldout as an error in this case, it should be worded as "logs a warning and returns without completing the commit".

  if (!event.pending_extents.empty()) {
    ldout(cct, 20) << this << " " << __func__ << ": "
                   << "pending extents: " << event.pending_extents << dendl;
    return;
  }
  complete_event(it, event.ret_val);

This will in turn break rbd journal mirroring since the local client will wait
indefinately on the commit to be completed, which it never does.

Am I understanding correctly that the root cause has nothing to do with the consumer/replayer side (rbd-mirror)?

Yes.

The issue is that the journal entry doesn't get "fulfilled" and therefore marked as committed on the producer side?

complete_event is not runned on the event, I read that as that the journal entry is not fulfilled and not marked as committed on the producer side.

@isodude
Copy link
Contributor Author

isodude commented Jan 3, 2023

This should be backported to nautilus+.

Nautilus is EOL. I'll adjust the tracker ticket to make sure the fix gets backported to supported releases.

Ok. No problem in supplying a patch for others to implement then I guess? (for those running nautilus). This is a pretty major problem for those running journaling. Since in the end you might loose data on this bug. Case in point:

My test-vm that I ran the compilation on had the bug, and I did not want to wait 24h+ for the journaling to complete so I kill -9 the KVM process. The resulting filesystem was not possible to rescue and I had to wipe and disk and restart.
A kill on the process should be enough and JournalPlayer should stop replaying and end the process in a safe place.
So right now this is a potential data-destroyer bug. I get that nautilus is not supported, but it's never good to have that kind of bugs laying around.

I think I have an idea for tests.

I'd look in the direction of replicating TestMockIoImageRequest.DiscardGranularity mock test in the journaling setting. Adding a similar end-to-end integration test would be nice too.

Thanks, I'll take a look.

@isodude isodude force-pushed the wip-librbd-misalign-discard branch 2 times, most recently from 7fe3246 to e464a96 Compare January 3, 2023 14:59
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
@idryomov
Copy link
Contributor

idryomov commented Jan 3, 2023

it should be worded as "logs a warning and returns without completing the commit".

It's not even a warning -- just an informational debug message. It's expected when an image extent spans an object boundary.

@idryomov
Copy link
Contributor

idryomov commented Jan 3, 2023

This is a pretty major problem for those running journaling. Since in the end you might loose data on this bug.

So far I don't see how one might lose data due to this bug. It only affects discards and the failure mode should be basically just a hang. AFAIU nothing bad is written to the image. Am I mistaken?

My test-vm that I ran the compilation on had the bug, and I did not want to wait 24h+ for the journaling to complete so I kill -9 the KVM process. The resulting filesystem was not possible to rescue and I had to wipe and disk and restart.
A kill on the process should be enough and JournalPlayer should stop replaying and end the process in a safe place. So right now this is a potential data-destroyer bug.

Did you encounter the bad image on the producer side or on the consumer side? You seemed to confirm above that rbd-mirror isn't in the picture but now mention JournalPlayer.

@idryomov
Copy link
Contributor

idryomov commented Jan 3, 2023

I get that nautilus is not supported, but it's never good to have that kind of bugs laying around.

I don't think it's possible to produce a new nautilus release upstream at this point -- all the infrastructure has moved on. Given that there appears to be a straightforward workaround (setting rbd_skip_partial_discard to false), realistically the best we can do is try to raise awareness among journal-based mirroring users.

@idryomov idryomov requested a review from trociny January 3, 2023 17:57
@isodude
Copy link
Contributor Author

isodude commented Jan 3, 2023

This is a pretty major problem for those running journaling. Since in the end you might loose data on this bug.

So far I don't see how one might lose data due to this bug. It only affects discards and the failure mode should be basically just a hang. AFAIU nothing bad is written to the image. Am I mistaken?

You are not mistaken. Nothing bad is written.

My test-vm that I ran the compilation on had the bug, and I did not want to wait 24h+ for the journaling to complete so I kill -9 the KVM process. The resulting filesystem was not possible to rescue and I had to wipe and disk and restart.

A kill on the process should be enough and JournalPlayer should stop replaying and end the process in a safe place. So right now this is a potential data-destroyer bug.

Did you encounter the bad image on the producer side or on the consumer side?

On the producer side, the consumer side is fine.

You seemed to confirm above that rbd-mirror isn't in the picture but now mention JournalPlayer.

Initially when a client opens an image it will run JournalPlayer that will check for journal entries that are not commited to the rbd image. I will try my best do describe the scenario again.

  • I enable journaling feature on the rbd images of a VM.
  • Everything starts to mirror, as expected.
  • The bug occurs and local replay stops but remote replay works as expected.
  • Weeks pass and since ceph health is green no one digs and comitted status posted by 'rbd journal status' that the minimum set is way lower than active set. Here it's also visible that the remote site is fine but the local side lags.
  • Someone does a poweroff of the VM
  • Upon start the VM will go through all journal entries that has not yet been committed to the rbd image.
  • In the case of Proxmox, execution of the VM will time out since the QEMU process just sits there and replays journal entries. To make mattera worse there are actually two QEMU processes which leads the operator to believe that the VM is running twice and that data corruption is upon them.
  • Since neither the customer nor operator scheduled a longer downtime than 1 minute and they are now up to 30min and the VM still does not respond a kill order is issued.
  • The operator now starts the VM again and the same pattern repeats as above.
  • Hopefully the journal is replayed and the VM starts up, but by now the VM will not start since the filesystems that was replaying while kill -9 was issued is broken, they run repair tools but in the end a backup is needed because the damage is too much.
  • In another scenario some wise person disables the journaling feature on the rbd images and the journal entries are replayed in full speed compared to crawling speed QEMU manage, the VM starts but they still need to do a backup restore in the end.

I actually waited out a dev VM that had this bug, just to see how long it would take, 24h later it was booted :)

I realize that Proxmox should detect this scenario and opt for fixing the journal prior to run the QEMU process, I'm working on a patch for that. Optimally this is a CEPH Health error.

@isodude
Copy link
Contributor Author

isodude commented Jan 5, 2023

I'd look in the direction of replicating TestMockIoImageRequest.DiscardGranularity mock test in the journaling setting. Adding a similar end-to-end integration test would be nice too.

Thanks, I'll take a look.

I'm trying out the below, but I always fail to access the protected m_image_extents, not sure how to do that properly?

diff --git a/src/test/librbd/io/test_mock_ImageRequest.cc b/src/test/librbd/io/test_mock_ImageRequest.cc
index e2ac825630c..23cf0e49a80 100644
--- a/src/test/librbd/io/test_mock_ImageRequest.cc
+++ b/src/test/librbd/io/test_mock_ImageRequest.cc
@@ -57,7 +57,6 @@ inline ImageCtx *get_image_ctx(MockTestImageCtx *image_ctx) {
 
 namespace librbd {
 namespace io {
-
 namespace util {
 
 template <>
@@ -83,6 +82,14 @@ std::pair<Extents, ImageArea> object_to_area_extents(
 
 } // namespace util
 
+template <>
+class ImageDiscardRequest<librbd::MockTestImageCtx> {
+  public:
+  Extents get_image_extents() {
+    return this->m_image_extents;
+  }
+};
+
 using ::testing::_;
 using ::testing::InSequence;
 using ::testing::Invoke;
@@ -96,6 +103,7 @@ struct TestMockIoImageRequest : public TestMockFixture {
   typedef ImageReadRequest<librbd::MockTestImageCtx> MockImageReadRequest;
   typedef ImageWriteRequest<librbd::MockTestImageCtx> MockImageWriteRequest;
   typedef ImageDiscardRequest<librbd::MockTestImageCtx> MockImageDiscardRequest;
+
   typedef ImageFlushRequest<librbd::MockTestImageCtx> MockImageFlushRequest;
   typedef ImageWriteSameRequest<librbd::MockTestImageCtx> MockImageWriteSameRequest;
   typedef ImageCompareAndWriteRequest<librbd::MockTestImageCtx> MockImageCompareAndWriteRequest;
@@ -367,6 +375,44 @@ TEST_F(TestMockIoImageRequest, DiscardGranularity) {
   ASSERT_EQ(0, aio_comp_ctx.wait());
 }
 
+TEST_F(TestMockIoImageRequest, DiscardGranularityJournalAppendDisabled) {
+  REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);
+
+  librbd::ImageCtx *ictx;
+  ASSERT_EQ(0, open_image(m_image_name, &ictx));
+  ASSERT_EQ(0, resize(ictx, ictx->layout.object_size));
+  ictx->discard_granularity_bytes = 32;
+
+  MockTestImageCtx mock_image_ctx(*ictx);
+  MockTestJournal mock_journal;
+  mock_image_ctx.journal = &mock_journal;
+
+  InSequence seq;
+  expect_get_modify_timestamp(mock_image_ctx, false);
+  expect_is_journal_appending(mock_journal, false);
+  Extents extents = {{32,32}, {96,64}, {ictx->layout.object_size - 32, 32}};
+  for (auto extent : extents) {
+    expect_object_discard_request(mock_image_ctx, 0, extent.first, extent.second, 0);
+  }
+
+  C_SaferCond aio_comp_ctx;
+  AioCompletion *aio_comp = AioCompletion::create_and_start(
+    &aio_comp_ctx, ictx, AIO_TYPE_DISCARD);
+  MockImageDiscardRequest mock_aio_image_discard(
+    mock_image_ctx, aio_comp,
+    {{16, 63}, {96, 31}, {84, 100}, {ictx->layout.object_size - 33, 33}},
+    ImageArea::DATA, ictx->discard_granularity_bytes,
+    mock_image_ctx.get_data_io_context(), {});
+  {
+    std::shared_lock owner_locker{mock_image_ctx.owner_lock};
+    mock_aio_image_discard.send();
+  }
+  ASSERT_EQ(0, aio_comp_ctx.wait());
+  ASSERT_EQ(extents, mock_aio_image_discard.get_image_extents());
+
+  ASSERT_EQ(0, mock_journal.is_journal_ready());
+}
+
 TEST_F(TestMockIoImageRequest, AioWriteJournalAppendDisabled) {
   REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);

@trociny
Copy link
Contributor

trociny commented Jan 6, 2023

I don't think it's possible to produce a new nautilus release upstream at this point -- all the infrastructure has moved on. Given that there appears to be a straightforward workaround (setting rbd_skip_partial_discard to false), realistically the best we can do is try to raise awareness among journal-based mirroring users.

Another workaround coud be to set rbd_discard_granularity_bytes to 0 to restore the pre 7ca1bab behaviour.

@trociny
Copy link
Contributor

trociny commented Jan 6, 2023

I'm trying out the below, but I always fail to access the protected m_image_extents, not sure how to do that properly?

I think you could add ImageRequest::get_image_extents to librbd/io/ImageRequest.h, something like:

// for tests
inline Extents &get_image_extents()  const{
    return this->m_image_extents;
}

On the other hand I am not sure we have to check the image extents in the test. What we want to test that the commiting the journal works as expected with a specific request, right?

@isodude
Copy link
Contributor Author

isodude commented Jan 6, 2023

On the other hand I am not sure we have to check the image extents in the test. What we want to test that the commiting the journal works as expected with a specific request, right?

I know there is mocks for this in mock_test_journal, maybe have it there as well. I wanted a check directly in ImageRequest that does not use journal. Maybe even just extend the granulaity checks only. Since the file does not have any good journal support I think it would make sense to avoid it.

Also, mock_test_journal does not seem to have a way to look at each extent in the journal, only length. I'm looking at support structure there as well.

@isodude
Copy link
Contributor Author

isodude commented Jan 6, 2023

I don't think it's possible to produce a new nautilus release upstream at this point -- all the infrastructure has moved on. Given that there appears to be a straightforward workaround (setting rbd_skip_partial_discard to false), realistically the best we can do is try to raise awareness among journal-based mirroring users.

Another workaround coud be to set rbd_discard_granularity_bytes to 0 to restore the pre 7ca1bab behaviour.

This is in escense what rbd_skip_partial_discard does. Deprecating that one would make sense though, maybe add an info saying to set rbd_discard_granularity instead.

@trociny
Copy link
Contributor

trociny commented Jan 6, 2023

I know there is mocks for this in mock_test_journal, maybe have it there as well. ...

I think @idryomov's idea was just to take the TestMockIoImageRequest.DiscardGranularity as the base. In that test the journal is set to null. So you may either extend that test by setting the journal to the mock, or create a new one with a name like DiscardGranularityJournallingEnabled where you set the journal mock. And then you will need to add expect_is_journal_appending(mock_journal, true). Note, you should make the is_journal_appending mock return true, so the discard request will call ImageWriteRequest::append_journal_event, and consequently image_ctx.journal->append_write_event. Thus you will need to implement expect_journal_append_write_event where you may check what extents are passed to the journal.

@isodude
Copy link
Contributor Author

isodude commented Jan 9, 2023

I know there is mocks for this in mock_test_journal, maybe have it there as well. ...

I think @idryomov's idea was just to take the TestMockIoImageRequest.DiscardGranularity as the base. In that test the journal is set to null. So you may either extend that test by setting the journal to the mock, or create a new one with a name like DiscardGranularityJournallingEnabled where you set the journal mock. And then you will need to add expect_is_journal_appending(mock_journal, true). Note, you should make the is_journal_appending mock return true, so the discard request will call ImageWriteRequest::append_journal_event, and consequently image_ctx.journal->append_write_event. Thus you will need to implement expect_journal_append_write_event where you may check what extents are passed to the journal.

The code seems to be working, however, when running unittest_librbd with RBD_FEATURES=125 (to enable journaling in features), tests with journal stops working.

How should I run the unittests from commandline successfully? I tried to understand how the unittests are runned, but it's not entirely clear.

diff --git a/src/test/librbd/io/test_mock_ImageRequest.cc b/src/test/librbd/io/test_mock_ImageRequest.cc
index e2ac825630c..2764c379994 100644
--- a/src/test/librbd/io/test_mock_ImageRequest.cc
+++ b/src/test/librbd/io/test_mock_ImageRequest.cc
@@ -18,6 +18,7 @@ struct MockTestImageCtx;
 struct MockTestJournal : public MockJournal {
   MOCK_METHOD4(append_write_event, uint64_t(uint64_t, size_t,
                                             const bufferlist &, bool));
+
   MOCK_METHOD5(append_compare_and_write_event, uint64_t(uint64_t, size_t,
                                                         const bufferlist &,
                                                         const bufferlist &,
@@ -32,6 +33,12 @@ struct MockTestJournal : public MockJournal {
                                 filter_ret_val);
   }
 
+  bool appending = false;
+
+  bool is_journal_appending() const {
+    return appending;
+  }
+
   MOCK_METHOD2(commit_io_event, void(uint64_t, int));
 };
 
@@ -57,7 +64,6 @@ inline ImageCtx *get_image_ctx(MockTestImageCtx *image_ctx) {
 
 namespace librbd {
 namespace io {
 namespace util {
 
 template <>
@@ -84,10 +90,12 @@ std::pair<Extents, ImageArea> object_to_area_extents(
 } // namespace util
 
 using ::testing::_;
+using ::testing::DoAll;
 using ::testing::InSequence;
 using ::testing::Invoke;
 using ::testing::Return;
 using ::testing::WithArg;
+using ::testing::WithArgs;
 using ::testing::WithoutArgs;
 using ::testing::Exactly;
 
@@ -119,6 +127,19 @@ struct TestMockIoImageRequest : public TestMockFixture {
     }
   }
 
+  void expect_append_io_event(MockTestJournal &mock_journal, uint64_t journal_tid,
+                                     uint64_t offset,
+                                     size_t length) {
+    EXPECT_CALL(mock_journal, append_io_event_mock(_, _, _, _, _)).WillOnce(DoAll(
+                     WithArgs<1, 2>(Invoke([offset, length] (long unsigned int entry_offset, long unsigned int entry_length) {
+
+                           ASSERT_EQ(offset, entry_offset);
+                           ASSERT_EQ(length, entry_length);
+                       })),
+                     Return(journal_tid)
+                     ));
+  }
+
   void expect_object_discard_request(MockTestImageCtx &mock_image_ctx,
                                      uint64_t object_no, uint64_t offset,
                                      uint32_t length, int r) {
@@ -367,6 +388,46 @@ TEST_F(TestMockIoImageRequest, DiscardGranularity) {
   ASSERT_EQ(0, aio_comp_ctx.wait());
 }
 
+TEST_F(TestMockIoImageRequest, DiscardGranularityJournalAppendEnabled) {
+  REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);
+
+  librbd::ImageCtx *ictx;
+  ASSERT_EQ(0, open_image(m_image_name, &ictx));
+  ASSERT_EQ(0, resize(ictx, ictx->layout.object_size));
+  ictx->discard_granularity_bytes = 32;
+
+  MockTestImageCtx mock_image_ctx(*ictx);
+  MockTestJournal mock_journal;
+  mock_image_ctx.journal = &mock_journal;
+  mock_journal.appending = true;
+
+  InSequence seq;
+  expect_get_modify_timestamp(mock_image_ctx, false);
+  expect_is_journal_appending(mock_journal, true);
+  expect_object_discard_request(mock_image_ctx, 0, 32, 32, 0);
+  expect_append_io_event(mock_journal, 0, 32, 32);
+  expect_object_discard_request(mock_image_ctx, 0, 96, 64, 0);
+  expect_append_io_event(mock_journal, 1, 96, 64);
+  expect_object_discard_request(mock_image_ctx, 0, ictx->layout.object_size - 32, 32, 0);
+  expect_append_io_event(mock_journal, 2, ictx->layout.object_size - 32, 32);
+
+  C_SaferCond aio_comp_ctx;
+  AioCompletion *aio_comp = AioCompletion::create_and_start(
+    &aio_comp_ctx, ictx, AIO_TYPE_DISCARD);
+  MockImageDiscardRequest mock_aio_image_discard(
+    mock_image_ctx, aio_comp,
+    {{16, 63}, {96, 31}, {84, 100}, {ictx->layout.object_size - 33, 33}},
+    ImageArea::DATA, ictx->discard_granularity_bytes,
+    mock_image_ctx.get_data_io_context(), {});
+  {
+    std::shared_lock owner_locker{mock_image_ctx.owner_lock};
+    mock_aio_image_discard.send();
+  }
+  ASSERT_EQ(0, aio_comp_ctx.wait());
+}
+
 TEST_F(TestMockIoImageRequest, AioWriteJournalAppendDisabled) {
   REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);

@trociny
Copy link
Contributor

trociny commented Jan 9, 2023

I recommend just to add your test to your commit. It will be easier to comment on it.

+  void expect_append_io_event(MockTestJournal &mock_journal, uint64_t journal_tid,
+                                     uint64_t offset,
+                                     size_t length) {
+    EXPECT_CALL(mock_journal, append_io_event_mock(_, _, _, _, _)).WillOnce(DoAll(
+                     WithArgs<1, 2>(Invoke([offset, length] (long unsigned int entry_offset, long unsigned int entry_length) {
+
+                           ASSERT_EQ(offset, entry_offset);
+                           ASSERT_EQ(length, entry_length);
+                       })),
+                     Return(journal_tid)
+                     ));
+  }
+

Why do you need this? Doesn't expect_is_journal_appending(mock_journal, true) in your test do the same?

+TEST_F(TestMockIoImageRequest, DiscardGranularityJournalAppendEnabled) {
+  REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);

With this the test is run only when RBD_FEATURE_JOURNALING is set in RBD_FEATURES environment variable. So no wonder the test does not fail when the test is run without it, as it is just skipped.

How should I run the unittests from commandline successfully? I tried to understand how the unittests are runned, but it's not entirely clear.

Not sure I understand the question. Do you mean how to run a particular unit test with particular RBD_FEATURES from a command line? I would use something like below (from the build directory):

CEPH_ARGS='--debug-rbd=30' RBD_FEATURES=125 ./bin/unittest_librbd --gtest_filter=TestMockIoImageRequest.DiscardGranularityJournalAppendEnabled

Note, I added debug-rbd so you could find the test log in out/client.admin.*.log file in case it fails and you need to troubleshoot.

@isodude
Copy link
Contributor Author

isodude commented Jan 9, 2023

I recommend just to add your test to your commit. It will be easier to comment on it.

+  void expect_append_io_event(MockTestJournal &mock_journal, uint64_t journal_tid,
+                                     uint64_t offset,
+                                     size_t length) {
+    EXPECT_CALL(mock_journal, append_io_event_mock(_, _, _, _, _)).WillOnce(DoAll(
+                     WithArgs<1, 2>(Invoke([offset, length] (long unsigned int entry_offset, long unsigned int entry_length) {
+
+                           ASSERT_EQ(offset, entry_offset);
+                           ASSERT_EQ(length, entry_length);
+                       })),
+                     Return(journal_tid)
+                     ));
+  }
+

Why do you need this? Doesn't expect_is_journal_appending(mock_journal, true) in your test do the same?

+TEST_F(TestMockIoImageRequest, DiscardGranularityJournalAppendEnabled) {
+  REQUIRE_FEATURE(RBD_FEATURE_JOURNALING);

With this the test is run only when RBD_FEATURE_JOURNALING is set in RBD_FEATURES environment variable. So no wonder the test does not fail when the test is run without it, as it is just skipped.

How should I run the unittests from commandline successfully? I tried to understand how the unittests are runned, but it's not entirely clear.

Not sure I understand the question. Do you mean how to run a particular unit test with particular RBD_FEATURES from a command line? I would use something like below (from the build directory):

CEPH_ARGS='--debug-rbd=30' RBD_FEATURES=125 ./bin/unittest_librbd --gtest_filter=TestMockIoImageRequest.DiscardGranularityJournalAppendEnabled

Note, I added debug-rbd so you could find the test log in out/client.admin.*.log file in case it fails and you need to troubleshoot.

Thanks!
I've been hitting the bricks on understanding Google Test here, but it seems to be working more in my favor now. Your feedback has been really helpful.

I have successfully compiled and tested the new commit. Ready for review.

One thing that I could not properly manage to get rid of is that the test stalls if it's not getting the expected calls, maybe this is expected?

There's a way to end-to-end test this by simply running blkdiscard on a nbd-device. Not sure if there's support for that in the infrastructure though?

To enable appending I mocked journal_policy inside ImageCtx, this way it would return true, but also other tests would not fail.

@isodude
Copy link
Contributor Author

isodude commented Jan 9, 2023

Before I modify the Title, I'm suggesting to change it to something along the line of
librbd: Fix stalled local rbd journaling replay or
librbd: Fix local rbd mirror journals growing forever

To make it easier for folks reading the changelogs to see that the issue is solved.

@isodude isodude changed the title librbd: Update m_image_extents after prune_object_extents librbd: Fix local rbd mirror journals growing forever Jan 10, 2023
@trociny
Copy link
Contributor

trociny commented Jan 10, 2023

I think you could add your test to qa/workunits/rbd/journal.sh or (less preferable) qa/workunits/rbd/rbd-nbd.sh though I would leave it to @idryomov to decide if we want this.

@idryomov
Copy link
Contributor

I think you could add your test to qa/workunits/rbd/journal.sh or (less preferable) qa/workunits/rbd/rbd-nbd.sh though I would leave it to @idryomov to decide if we want this.

I agree, a test based on #49614 (comment) should go to qa/workunits/rbd/rbd-nbd.sh but note that it would need to be considerably massaged first: get rid of nested functions, integrate trap-based cleanup into the global cleanup, etc. The whole "make sure that there is no race with rbd-nbd mounting" business could probably be simplified or even removed too since we assume fairly modern distros in the test lab environment.

@isodude
Copy link
Contributor Author

isodude commented Jan 11, 2023

I think you could add your test to qa/workunits/rbd/journal.sh or (less preferable) qa/workunits/rbd/rbd-nbd.sh though I would leave it to @idryomov to decide if we want this.

I agree, a test based on #49614 (comment) should go to qa/workunits/rbd/rbd-nbd.sh but note that it would need to be considerably massaged first: get rid of nested functions, integrate trap-based cleanup into the global cleanup, etc. The whole "make sure that there is no race with rbd-nbd mounting" business could probably be simplified or even removed too since we assume fairly modern distros in the test lab environment.

Done. With the rbd-nbd.sh support structure it was quite easy to adopt the test.

I run my tests inside a podman running fedora 37 with debian 11 as a host os, rbd-nbd list-mapped does not work here since pid inside /sys/.../nbd0/pid shows the pid in the default pid namespace (the host). I worked around that with the below, but it only got my test working. Not sure if it's a bug or not supported case by the linux kernel (the linux kernel does a task_pid_nr(current) for pid and does not care about namespaces at all).

@@ -105,8 +106,19 @@ function get_pid()
     local pool=$1
     local ns=$2
 
-    PID=$(rbd device --device-type nbd --format xml list | $XMLSTARLET sel -t -v \
-      "//devices/device[pool='${pool}'][namespace='${ns}'][image='${IMAGE}'][device='${DEV}']/id")
+
+
+    for pid in `pgrep -f rbd-nbd`; do
+      lsof -p ${pid} | grep -q ${DEV} || continue
+      arg=`< /proc/${pid}/cmdline tr '\0' '\n' | tail -n 1`
+      [ "${arg%%/*}" == "${pool}" ] || continue
+      [ "${arg##*/}" == "${IMAGE}" ] || continue
+      if [ -n "${ns}" ]; then
+       arg=${arg#*/}
+        [ "${arg%%/*}" == "${ns}" ] || continue
+      fi
+      PID=${pid}
+    done

qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Show resolved Hide resolved
@isodude
Copy link
Contributor Author

isodude commented Jan 11, 2023

Is this a valid failure? Maybe racey since it succeded earlier and failed even though I did not touch that code in the last commit.

[ RUN      ] TestMigration.StressLive
resize: 83886080
write: A 51957577~2427788
discard: 18556109~3632061
write: A 60731795~6824484
discard: 6002006~7511550
write: A 29891569~2151167
discard: 78556247~5329833
write: A 38631908~4093481
discard: 31550~3001716
snap_create: snap0
snap_protect: snap0
clone image1320 -> image1323
resize: 104501499
write: B 99247878~5253621
discard: 85139685~5322170
write: B 63812109~6274495
discard: 89641894~1403317
write: B 29923981~1567749
discard: 55845811~4627598
write: B 9814537~7463138
discard: 61710286~5585214
snap_create: snap1
resize: 173011133
write: C 156564189~2302802
discard: 159687876~4702111
write: C 95862563~5431995
discard: 160488953~4321503
write: C 50273036~438535
discard: 102850696~2296335
write: C 8007823~7284347
discard: 96887524~1210301
snap_create: snap2
snap_protect: snap2
clone image1323 -> image1325
resize: 106679449
write: D 130628~5897018
discard: 25860558~1602598
write: D 51769926~2423844
discard: 5296541~928
write: D 23840003~1530552
discard: 62429599~7217204
write: D 73142895~6993134
discard: 94868151~6209030
snap_create: snap3
resize: 90801047
migration_prepare
migration_execute
write: a 80964454~5665008
discard: 5353373~4200783
write: a 15792126~2942680
discard: 48476138~4399668
write: a 64812341~6649061
discard: 24867686~3055923
write: a 8400774~4620898
discard: 1768819~4818907
snap_create: user0
resize: 149576013
write: b 68547957~6757835
discard: 110817510~7026588
write: b 24141748~4399504
discard: 48074362~1664085
write: b 48550333~6353930
discard: 94268119~1818709
write: b 120917393~1682695
discard: 137513300~3247279
snap_create: user1
write: c 120994745~6189959
discard: 95897908~3067514
write: c 35470804~7240454
discard: 13294626~5010256
write: c 50479909~6412293
discard: 100018120~6887063
write: c 71228333~6098719
discard: 45474015~3149819
snap_create: user2
write: d 74789387~6209378
discard: 27613519~2777371
write: d 76365143~3879328
discard: 140040743~7386101
write: d 141992572~5324030
discard: 46507512~4793147
write: d 93147775~6563667
discard: 47929125~6372661
snap_create: user3
write: 1 14957601~3739400
discard: 15892451~1869700
write: 2 29915202~3739400
discard: 30850052~1869700
write: 3 44872803~3739400
discard: 45807653~1869700
write: 4 59830405~3739400
discard: 60765255~1869700
write: 5 74788006~3739400
discard: 75722856~1869700
migration_commit
after commit snap: snap3, block 96468992~4194304 differs
src block: 51854dc1f5f: 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
003ffff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00400000
dst block: 518bbfc5fd9: 
00000000  43 43 43 43 43 43 43 43  43 43 43 43 43 43 43 43  |CCCCCCCCCCCCCCCC|
*
000662e0  43 43 43 43 00 00 00 00  00 00 00 00 00 00 00 00  |CCCC............|
000662f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0018daa0  00 43 43 43 43 43 43 43  43 43 43 43 43 43 43 43  |.CCCCCCCCCCCCCCC|
0018dab0  43 43 43 43 43 43 43 43  43 43 43 43 43 43 43 43  |CCCCCCCCCCCCCCCC|
*
003ffff0  43 43 43 43 43 43 43 43  43 43 43 43 43 43 43 43  |CCCCCCCCCCCCCCCC|
00400000
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/test_Migration.cc:150: Failure
Value of: src_bl.contents_equal(dst_bl)
  Actual: false
Expected: true

@isodude
Copy link
Contributor Author

isodude commented Jan 11, 2023

I was wrong in the commit message.

rbd_discard_granularity_bytes cannot be 0. (https://github.com/ceph/ceph/blob/main/src/common/options/rbd.yaml.in#L311)

skip_partial_discard = false will set it to 0 internally (https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L867)

@isodude
Copy link
Contributor Author

isodude commented Jan 12, 2023

I created new trackers for related problems
https://tracker.ceph.com/issues/58423 https://tracker.ceph.com/issues/58422

src/librbd/io/ImageRequest.cc Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
This commit fixes commit 7ca1bab by pushing properly aligned
discards back to m_image_extents, if corrected.

If discards are misaligned (off 0, len 4608, gran=4096), they are
corrected properly, but only in object_extents and not in
m_image_extents.

When journal_append_event is triggered it will only append from
m_image_extents and does not now about the alignment fixes. In
commit_io_events_extent it will log a message and return without
completing the io since the larger misaligned area was sent to the journal.
This will in turn break rbd journal mirroring since the local client will wait
indefinately on the commit to be completed, which it never does.

This does not effect rbd-mirror in any way, which may be confusing and
dangerous since it's only rbd-mirror that updates ceph health, and not
the local client.

Setting `rbd_skip_partial_discard = false` under client will restore the
pre 7ca1bab behaviour and thus not trigger the bug with journals growing.
This will set `rbd_discard_granularity_bytes = 0` internally. This
setting is only changed during startup of a client.

Fixes: 7ca1bab
Fixes: https://tracker.ceph.com/issues/57396
Signed-off-by: Josef Johansson <josef@oderland.se>
@isodude
Copy link
Contributor Author

isodude commented Jan 20, 2023

Rebase against main also.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, Josef! I hit the hang (rbd feature disable issue) but attributed it to the on-going lab breakage initially...

A few things need to be addressed. If you no longer have the development setup, let me know -- I'd be happy to incorporate the changes and move this PR forward on your behalf. Thanks!

src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
src/test/librbd/io/test_mock_ImageRequest.cc Outdated Show resolved Hide resolved
qa/workunits/rbd/rbd-nbd.sh Outdated Show resolved Hide resolved
@isodude
Copy link
Contributor Author

isodude commented Feb 16, 2023

Apologies for the delay, Josef! I hit the hang (rbd feature disable issue) but attributed it to the on-going lab breakage initially...

A few things need to be addressed. If you no longer have the development setup, let me know -- I'd be happy to incorporate the changes and move this PR forward on your behalf. Thanks!

LGTM

Do go ahead. Let's close this, I might find more bugs if I recommit :)

Thanks for reviewing!

I'll hand you my bugswatter!

Currently nothing triggers the length_modified case in
ImageDiscardRequest::prune_object_extents() in isolation. It's only
triggered in DiscardGranularityJournalAppendEnabled test together with
the prune_required case and a bad refactoring could easily break the
length_modified logic again.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd feature disable" appears to reliably hang if the corresponding
remote request is proxied to rbd-nbd (because rbd-nbd happens to own
the exclusive lock after a series of blkdiscard calls) [1].  Work
around it here by enabling journaling before the image is mapped
and disabling it after the image is unmapped.

Also, don't assert on the output of "rbd journal inspect --verbose"
having a certain number of entries.  This is racy: if the script gets
delayed after the last blkdiscard call for some reason, there may be
fewer entries present in the journal or none at all.

[1] https://tracker.ceph.com/issues/58740

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor

jenkins test make check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants