librbd/cache/pwl/ssd: fix first_free_entry and m_first_free_entry corruption #41490

idryomov · 2021-05-23T19:13:46Z

This manifested as sporadic segfaults during a 4K randwrite workload with QD=2 and higher, but turned out to be much more serious. By the time it segfaulted, the cache's root was already corrupted on media, resulting in data loss.

idryomov · 2021-05-23T20:09:08Z

cc @MahatiC

idryomov · 2021-05-23T20:09:20Z

cc @CongMinYin

idryomov · 2021-05-24T17:12:49Z

jenkins test api

MahatiC · 2021-05-25T02:07:35Z

@idryomov Hi Ilya, I'm trying to reproduce this error locally and apply this PR on it. Will leave a review after that. Thanks.

MahatiC · 2021-05-26T13:20:30Z

lgtm, thanks!

In append_ops(), new_first_free_entry is assigned to after aio_submit() is called. This can result in accessing uninitialized or freed memory because all I/Os may complete and append_ctx callback may run before the assignment is executed. Garbage value gets written to first_free_entry and we eventually crash, most likely in bufferlist manipulation code. But worse, the corrupted first_free_entry makes it to media almost all the time. The result is a corrupted cache -- dirty user data is lost.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
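
The ordering problem described in this commit message can be illustrated with a minimal sketch. This is not the actual patch: `FakeAio`, `Root` and `append_ops_sketch` are hypothetical names introduced only for illustration, and the fake engine completes I/O inline to model the worst case where the completion callback runs before the submit call returns. The point is that the new offset is computed before submission and captured by value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical stand-in for an async I/O engine. It completes the I/O
// inline, modelling the worst case where the completion callback runs
// before submit() returns to the caller.
struct FakeAio {
  void submit(const std::function<void(int)>& on_complete) {
    on_complete(0);  // 0 == success
  }
};

// Hypothetical, heavily simplified in-memory copy of the cache root.
struct Root {
  uint64_t first_free_entry = 0;
};

// Safe ordering: the new offset is fully computed before the I/O is
// submitted, and the callback captures it by value. The callback
// therefore never reads a variable that is still unset or whose
// storage has already gone away.
inline void append_ops_sketch(FakeAio& aio,
                              const std::shared_ptr<Root>& root,
                              uint64_t current_free,
                              uint64_t bytes_to_append) {
  const uint64_t new_first_free_entry = current_free + bytes_to_append;
  aio.submit([root, new_first_free_entry](int r) {
    if (r == 0) {
      root->first_free_entry = new_first_free_entry;
    }
  });
}
```

Capturing the value (and a `shared_ptr` to the root) means the callback stays correct whether it runs inline, as here, or much later on another thread.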

In append_op_log_entries(), new_first_free_entry is read after append_ops() returns. This can result in accessing freed memory because all I/Os may complete and append_ctx callback may run by the time new_first_free_entry is read. Garbage value gets written to m_first_free_entry and depending on the circumstances it may allow AbstractWriteLog code to accept more dirty user data than we have space for. Luckily we usually crash before then.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
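
One way to avoid reading a value after the call that started the I/O has returned is to let the completion itself carry the value and update the member under a lock. The sketch below assumes that approach and is not the code from this PR: `DeferredAio` and `LogSketch` are hypothetical, and the sketch assumes the log object outlives all in-flight I/O, since the callback captures `this`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical I/O engine that queues completions and runs them later,
// modelling callbacks that fire at an arbitrary time relative to the caller.
class DeferredAio {
public:
  void submit(std::function<void()> on_complete) {
    m_pending.push_back(std::move(on_complete));
  }
  void run_completions() {
    std::vector<std::function<void()>> pending;
    pending.swap(m_pending);
    for (auto& cb : pending) {
      cb();
    }
  }
private:
  std::vector<std::function<void()>> m_pending;
};

// Hypothetical, simplified log. Assumes the log outlives in-flight I/O.
class LogSketch {
public:
  explicit LogSketch(DeferredAio& aio) : m_aio(aio) {}

  void append_op_log_entries_sketch(uint64_t bytes) {
    uint64_t new_first_free_entry;
    {
      std::lock_guard<std::mutex> locker(m_lock);
      m_next_offset += bytes;
      new_first_free_entry = m_next_offset;
    }
    // The completion owns its own copy of the value. Nothing is read
    // back from I/O-related state after submit() returns.
    m_aio.submit([this, new_first_free_entry] {
      std::lock_guard<std::mutex> locker(m_lock);
      m_first_free_entry = new_first_free_entry;
    });
  }

  uint64_t first_free_entry() {
    std::lock_guard<std::mutex> locker(m_lock);
    return m_first_free_entry;
  }

private:
  DeferredAio& m_aio;
  std::mutex m_lock;
  uint64_t m_next_offset = 0;       // where the next append would land
  uint64_t m_first_free_entry = 0;  // published only once the I/O completes
};
```

In this arrangement the published offset only advances when the I/O has actually completed, regardless of how the completion is scheduled.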

Ensure first_{valid,free}_entry are inside the expected range when scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
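
A range check of this kind might look roughly like the following. The bounds used here are assumptions made for illustration only: `kDataStart`, `RootSketch`, `pos_in_range` and `root_is_sane` are hypothetical, and the real code may use different limits or additional alignment checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical offset at which log data begins (the space before it is
// assumed to be reserved for the root itself).
constexpr uint64_t kDataStart = 4096;

// Hypothetical, simplified root layout.
struct RootSketch {
  uint64_t pool_size;
  uint64_t first_valid_entry;
  uint64_t first_free_entry;
};

// A position is acceptable only if it lies inside the data area.
inline bool pos_in_range(uint64_t pos, uint64_t pool_size) {
  return pos >= kDataStart && pos < pool_size;
}

// Intended to be called both before scheduling a root update and after
// decoding a root on recovery, so a bogus offset is caught early.
inline bool root_is_sane(const RootSketch& root) {
  return pos_in_range(root.first_valid_entry, root.pool_size) &&
         pos_in_range(root.first_free_entry, root.pool_size);
}
```

Checking on both the write path and the recovery path means a corrupted offset is rejected before it reaches media and, failing that, before it is trusted after a restart.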

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

idryomov · 2021-05-29T16:54:37Z

Sorry, there was a bug in write_log_entries() -- I tried to keep the change as small as possible and confused myself. The first patch is now larger, but it also eradicates some of the pool_root usage (which should go away anyway).

MahatiC · 2021-05-31T11:45:11Z

Thanks Ilya for the changes. Could you let us know the specific test that triggered this bug? (with the first revision of this PR)

idryomov · 2021-05-31T13:45:45Z

There wasn't any specific test failure. I'm working through ssd mode issues and was looking into https://tracker.ceph.com/issues/50670 when I realized that my fix for the tail pointer was incomplete.

MahatiC · 2021-06-07T08:23:33Z

Thanks for the change, looks good to me.

idryomov · 2021-06-23T07:52:39Z

@trociny This isn't ready for full qa because of another issue. I'm holding off on it for now.

idryomov · 2021-07-14T09:52:33Z

The crashes that appeared to be triggered by this PR turned out to be an existing use-after-free bug, fixed in #42145.

idryomov · 2021-07-14T09:52:58Z

https://pulpito.ceph.com/dis-2021-07-13_14:24:34-rbd-wip-dis-testing-distro-basic-smithi/

idryomov added bug-fix rbd labels May 23, 2021

idryomov added 4 commits May 29, 2021 18:44

librbd/cache/pwl/ssd: ensure first_{valid,free}_entry aren't bogus

b46f81f

Ensure first_{valid,free}_entry are inside the expected range when scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

librbd/cache/pwl/ssd: flushed_sync_gen capture is unused

f8fb760

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

idryomov force-pushed the wip-rbd-pwl-ssd-tailp branch from e1e6f46 to f8fb760 Compare May 29, 2021 16:50

trociny approved these changes Jun 23, 2021


trociny added needs-qa wip-mgolub-testing labels Jun 23, 2021

trociny removed needs-qa wip-mgolub-testing labels Jun 23, 2021

CongMinYin mentioned this pull request Jun 24, 2021

librbd/cache/pwl/ssd: fix m_bytes_allocated exceeding m_bytes_allocated_cap #41968

Merged

3 tasks

idryomov merged commit 89caa62 into ceph:master Jul 14, 2021

idryomov deleted the wip-rbd-pwl-ssd-tailp branch July 14, 2021 09:54

ideepika mentioned this pull request Nov 2, 2021

pacific: librbd/cache/pwl: persistant cache backports #43772

Merged
