librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries() #42843

Merged
merged 1 commit into from Aug 26, 2021

majianpeng
Member

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@majianpeng
Member Author

@MahatiC please review.

@idryomov changed the title from "can't restart program w/ pwl/ssd cache." to "librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries()" on Aug 19, 2021
Contributor

@MahatiC MahatiC left a comment

I don't understand what this is fixing or correcting. Could you clarify? Is there a bug specific to this? If so can you add the relevant tracker ticket?

@majianpeng
Member Author

@idryomov, please review. Thanks!

…ries()

Consider one control_block that contains multiple encoded
WriteLogCacheEntry records:
Log1: WriteLogEntry
Log2: WriteLogEntry
Log3: Non-WriteLogEntry
For this case, first_valid_entry is currently calculated as:
control_block_pos + sizeof(control_block).
But in fact, it should be: control_block_pos + sizeof(control_block) +
data_length(Log1 + Log2).

The wrong first_valid_entry is persisted to the superblock and read
back on restart. The read then starts at the wrong position, and
decode(WriteLogCacheEntry) reports an error.

Fixes: https://tracker.ceph.com/issues/52323
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
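
The calculation described in the commit message can be sketched as follows. This is a minimal illustration, not the code from WriteLog.cc: `RetiredEntry`, `kControlBlockSize`, and both function names are hypothetical, the control block size is an arbitrary value, and any alignment of data lengths that the real implementation may apply is not modeled.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one retired log entry.
struct RetiredEntry {
  bool is_write_entry;   // true for a WriteLogEntry, false otherwise
  uint64_t data_length;  // bytes of data stored after the control block
};

// Assumed size of one control block; the value is illustrative only.
constexpr uint64_t kControlBlockSize = 4096;

// Old behaviour as described in the commit message: only the control
// block is skipped, so the result points into the retired data.
uint64_t first_valid_entry_old(uint64_t control_block_pos,
                               const std::vector<RetiredEntry>&) {
  return control_block_pos + kControlBlockSize;
}

// Fixed behaviour: skip the control block and the data of every
// write entry that the control block describes.
uint64_t first_valid_entry_fixed(uint64_t control_block_pos,
                                 const std::vector<RetiredEntry>& entries) {
  uint64_t pos = control_block_pos + kControlBlockSize;
  for (const auto& e : entries) {
    if (e.is_write_entry) {
      pos += e.data_length;
    }
  }
  return pos;
}
```

With Log1 and Log2 as write entries and Log3 as a non-write entry, the two functions differ by exactly data_length(Log1 + Log2), which is the offset error that would be persisted to the superblock.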
@idryomov
Contributor

idryomov commented Aug 25, 2021

It looked good to me but I decided to tweak it a bit:

  • use control_block_pos instead of retiring_subentries.back()->log_entry_index

    All entries in retiring_subentries are guaranteed to have the same log_entry_index, equal to control_block_pos.

  • apply modulo logic after each advancement of first_valid_entry so that it is always correct

    A bit more CPU cycles but less confusion in case first_valid_entry needs to be printed, etc.

  • a new log message and a formatting fixup

Let me know if you see any issues with that.
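
For illustration, applying the modulo logic after each advancement could look like the sketch below. The names `kPoolSize`, `kRingStart`, and `advance` are hypothetical, the sizes are arbitrary, and the layout (a reserved region at the start of the pool followed by the ring) is an assumption made for this example rather than something taken from the merged commit.

```cpp
#include <cstdint>

// Illustrative values only.
constexpr uint64_t kPoolSize = 1ULL << 20;  // assumed total log pool size
constexpr uint64_t kRingStart = 8192;       // assumed offset where the ring begins

// Advance a ring position by len bytes and wrap immediately, so the
// returned value is always a valid offset inside the pool.
uint64_t advance(uint64_t pos, uint64_t len) {
  pos += len;
  if (pos >= kPoolSize) {
    // Wrap past the end of the pool back to the start of the ring.
    pos = pos % kPoolSize + kRingStart;
  }
  return pos;
}
```

Because the wrap happens inside every step, an intermediate value that gets logged between two advancements is already a real offset instead of one that still needs a final modulo.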

@idryomov idryomov merged commit 989e8aa into ceph:master Aug 26, 2021
@majianpeng majianpeng deleted the pwl-ssd-restart-failed branch August 28, 2021 00:29
3 participants