Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd/cache/pwl/ssd: fix first_free_entry and m_first_free_entry corruption #41490

Merged
merged 4 commits into from Jul 14, 2021

Conversation

idryomov
Copy link
Contributor

This manifested as sporadic segfaults during 4K randwrite workload with QD=2 and higher but turned out to be much more serious. When it segfaulted the cache's root was already corrupted on media, resulting in data loss.

@idryomov
Copy link
Contributor Author

cc @MahatiC

@idryomov
Copy link
Contributor Author

cc @CongMinYin

@idryomov
Copy link
Contributor Author

jenkins test api

@MahatiC
Copy link
Contributor

MahatiC commented May 25, 2021

@idryomov Hi Ilya, I'm trying to reproduce this error locally and apply this PR on it. Will leave a review after that. Thanks.

@MahatiC
Copy link
Contributor

MahatiC commented May 26, 2021

lgtm, thanks!

In append_ops(), new_first_free_entry is assigned to after aio_submit()
is called.  This can result in accessing uninitialized or freed memory
because all I/Os may complete and append_ctx callback may run before the
assignment is executed.  Garbage value gets written to first_free_entry
and we eventually crash, most likely in bufferlist manipulation code.

But worse, the corrupted first_free_entry makes it to media almost all
the time.   The result is a corrupted cache -- dirty user data is lost.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In append_op_log_entries(), new_first_free_entry is read after
append_ops() returns.  This can result in accessing freed memory
because all I/Os may complete and append_ctx callback may run
by the time new_first_free_entry is read.  Garbage value gets
written to m_first_free_entry and depending on the circumstances
it may allow AbstractWriteLog code to accept more dirty user data
than we have space for.  Luckily we usually crash before then.

Fixes: https://tracker.ceph.com/issues/50832
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Ensure first_{valid,free}_entry are inside the expected range when
scheduling root updates and decoding the root on recovery.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

idryomov commented May 29, 2021

Sorry, there was a bug in write_log_entries() -- I tried to keep the change as small as possible and confused myself. The first patch is now larger but it also eradicates some of pool_root usage (which should go away anyway).

@MahatiC
Copy link
Contributor

MahatiC commented May 31, 2021

Thanks Ilya for the changes. Could you let us know the specific test that triggered this bug? (with the first revision of this PR)

@idryomov
Copy link
Contributor Author

There wasn't any specific test failure. I'm working through ssd mode issues and was looking into https://tracker.ceph.com/issues/50670 when I realized that my fix for the tail pointer was incomplete.

@MahatiC
Copy link
Contributor

MahatiC commented Jun 7, 2021

Thanks for the change, looks good to me.

@idryomov
Copy link
Contributor Author

@trociny This isn't ready for full qa because of another issue. I'm holding off on it for now.

@idryomov
Copy link
Contributor Author

The crashes that appeared to be triggered by this PR turned out to be an existing use-after-free bug, fixed in #42145.

@idryomov
Copy link
Contributor Author

@idryomov idryomov merged commit 89caa62 into ceph:master Jul 14, 2021
@idryomov idryomov deleted the wip-rbd-pwl-ssd-tailp branch July 14, 2021 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants