luminous: core: by pass cache if performing deep scrub #24802

Merged
merged 1 commit into from Nov 5, 2018

5 participants
@tchaikov
Contributor

tchaikov commented Oct 29, 2018

@smithfarm

Contributor

smithfarm commented Oct 29, 2018

@tchaikov FTBFS

@smithfarm smithfarm changed the title from luminous: by pass cache if performing deep scrub to luminous: core: by pass cache if performing deep scrub Oct 29, 2018

os/bluestore: fix deep-scrub operation against silent disk errors
Suppose an object has cached data and, some time later, the physical device
underlying the cache develops silent disk errors, so that the cached data and
the on-disk data no longer match. In that case the deep-scrub operation still
reads from the cache first and does not verify the crc checksum, so it fails
to detect the data corruption in a timely manner.

Introduce a new flag 'CEPH_OSD_OP_FLAG_BYPASS_CLEAN_CACHE' which tells
deep-scrub to bypass the object cache. Note that we only bypass cache buffers
that are in STATE_CLEAN state. Buffers in STATE_WRITING have not yet been
written to the physical device, so deep-scrub cannot read them from the device
and can safely read these dirty buffers from the cache. Once they are in
STATE_CLEAN state (or are not added to the bluestore cache), the next round of
deep-scrub can check them correctly.

Following the discussion above, I refactor BlueStore::BufferSpace::read
slightly, adding a new 'flags' argument whose value is either 0 or:
     enum {
       BYPASS_CLEAN_CACHE = 0x1,     // bypass clean cache
     };

flags 0: normal read, do not bypass clean or dirty cache
flags BYPASS_CLEAN_CACHE: bypass clean cache, currently only for the
                        deep-scrub operation
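
The flag semantics described above can be illustrated with a minimal sketch. This is not the actual BlueStore code: the `Buffer` struct, the `BufferState` enum, the offset-keyed map and the simplified `read` signature are all assumptions made for illustration. Only the `BYPASS_CLEAN_CACHE = 0x1` value and the clean/writing distinction come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for the cache buffer states named in the commit
// message (STATE_CLEAN, STATE_WRITING).
enum class BufferState { Clean, Writing };

struct Buffer {
  BufferState state;
  std::string data;
};

// Simplified model of a buffer cache keyed by offset (an assumption; the
// real BufferSpace is more involved).
struct BufferSpace {
  enum {
    BYPASS_CLEAN_CACHE = 0x1,  // bypass clean cache, as in the commit message
  };

  std::map<uint64_t, Buffer> buffers;  // offset -> cached buffer

  // Returns true and fills *out on a cache hit. Returns false when the
  // caller has to read from the physical device (where checksums can be
  // verified).
  bool read(uint64_t offset, uint32_t flags, std::string* out) const {
    assert(out != nullptr);
    auto it = buffers.find(offset);
    if (it == buffers.end()) {
      return false;  // nothing cached at this offset
    }
    const Buffer& b = it->second;
    if (b.state == BufferState::Clean && (flags & BYPASS_CLEAN_CACHE)) {
      return false;  // clean data also exists on disk: force a device read
    }
    *out = b.data;  // dirty (writing) buffers are always served from cache
    return true;
  }
};
```

In this sketch a normal read passes `flags = 0` and is served from any cached buffer, while a deep-scrub-style read passes `BufferSpace::BYPASS_CLEAN_CACHE` and only gets cache hits for buffers that are still being written.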

Test:
   I deliberately corrupted an object that had cached data; with this patch,
   deep-scrub detects the data error promptly.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@easystack.cn>
(cherry picked from commit a7f1af2)

Conflicts:
	src/include/rados.h
	src/os/bluestore/BlueStore.cc: trivial resolution

@tchaikov tchaikov force-pushed the tchaikov:wip-luminous-35067 branch from 3c1f297 to a7bcb26 Oct 29, 2018

@tchaikov

Contributor

tchaikov commented Oct 29, 2018

@smithfarm fixed and repushed.

@smithfarm smithfarm requested review from neha-ojha and jdurgin and removed request for smithfarm Oct 29, 2018

@yuriw

Contributor

yuriw commented Oct 31, 2018

@yuriw

Contributor

yuriw commented Nov 2, 2018

@yuriw yuriw merged commit 0bb5efe into ceph:luminous Nov 5, 2018

4 checks passed

Docs: build check OK - docs built
Signed-off-by all commits in this PR are signed
Unmodified Submodules submodules for project are unmodified
make check make check succeeded
@smithfarm

Contributor

smithfarm commented Nov 5, 2018

@yuriw I suggest we stop merging PRs into luminous until v12.2.10 is out. What do you think?

@tchaikov tchaikov deleted the tchaikov:wip-luminous-35067 branch Nov 6, 2018
