os/bluestore/BlueFS: clear current log entrys before dump all fnode, #15973

Merged
merged 1 commit into ceph:master on Jul 9, 2017

3 participants
@majianpeng
Member

majianpeng commented Jun 28, 2017

While doing async log compaction, I hit this bug:
2017-06-28 11:51:42.747315 7f193dd70bc0 -1 /root/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f193dd70bc0 time 2017-06-28 11:51:42.741868
/root/ceph/src/os/bluestore/BlueFS.cc: 714: FAILED assert(r == q->second->file_map.end())

ceph version 12.0.3-2327-gc74625e (c74625e) luminous (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x5628ee1f8a0e]
2: (BlueFS::_replay(bool)+0x3bc3) [0x5628ee18cb13]
3: (BlueFS::mount()+0x1cf) [0x5628ee18cf0f]
4: (BlueStore::_open_db(bool)+0xd99) [0x5628ee0af7f9]
5: (BlueStore::_mount(bool)+0x3da) [0x5628ee0e056a]
6: (OSD::init()+0x28f) [0x5628edce10bf]
7: (main()+0x29ca) [0x5628edbf116a]
8: (__libc_start_main()+0xf5) [0x7f193b2c1f45]
9: (()+0x493306) [0x5628edc8b306]
NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.

Assume this case:
  Thread1                        Thread2
_compact_log_async
 _flush_and_sync_log
  lock.unlock()
                                 open_for_write(A)
                                 op_file_update
                                 op_dir_link
  lock.lock()
 _compact_log_dump_metadata
    (contains file A)
  flush
  lock.unlock()
                                 op_file_update (alloc new extent)
                                 _flush_and_sync_log

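So two log entries carry the same information (op_dir_link for file A): one
in the compacted metadata dump and one in the current log that Thread2
flushes afterwards. When _replay runs, the bug above occurs.

Before reflecting everything into the compacted log, we should clear the
current log entries to avoid this. The compacted log already contains all
of the information, so nothing is lost.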

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
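
The race described above can be illustrated with a toy model. This is only a hedged sketch under stated assumptions: the types `LogOp`, `Log` and `Metadata` and the functions `replay` and `build_log` are hypothetical names invented for illustration and are not the BlueFS API. It assumes that replay rejects a directory link for a name that is already linked (which is what the failed assert suggests), and that the in-memory metadata already reflects Thread2's operations when the dump is taken.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified log operations (not the real BlueFS types).
enum class Op { FileUpdate, DirLink };

struct LogOp {
  Op op;
  std::string dir;
  std::string name;
};

using Log = std::vector<LogOp>;
// In-memory metadata, modelled as a set of (directory, file name) links.
using Metadata = std::set<std::pair<std::string, std::string>>;

// Replays a log. Returns false where the real code would hit the assert:
// a DirLink for a name that is already present in the directory.
bool replay(const Log& log) {
  std::map<std::string, std::set<std::string>> dir_map;
  for (const LogOp& e : log) {
    if (e.op != Op::DirLink) continue;  // FileUpdate is harmless in this model
    if (!dir_map[e.dir].insert(e.name).second) return false;
  }
  return true;
}

// Models what ends up on disk after an async compaction.
// `pending` holds ops queued by Thread2 while the lock was dropped;
// `live` is the in-memory metadata, which already includes those changes.
Log build_log(Log pending, const Metadata& live, bool clear_pending_first) {
  if (clear_pending_first) pending.clear();  // the idea proposed in this PR
  Log out;
  // Dump all metadata into the compacted log.
  for (const auto& entry : live) {
    out.push_back({Op::DirLink, entry.first, entry.second});
  }
  // Whatever is still pending gets flushed after the dump.
  out.insert(out.end(), pending.begin(), pending.end());
  return out;
}

// Usage sketch: without clearing, replay sees file A linked twice.
void demo() {
  Log pending = {{Op::FileUpdate, "db", "A"}, {Op::DirLink, "db", "A"}};
  Metadata live = {{"db", "A"}};
  assert(!replay(build_log(pending, live, false)));
  assert(replay(build_log(pending, live, true)));
}
```

In this model, dropping the pending entries loses nothing because the dump is generated from metadata that already includes file A.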

Member

majianpeng commented Jun 28, 2017

@liewegas, I'm not 100% sure about this. Can you review it? Thanks!

Member

majianpeng commented Jul 3, 2017

@liewegas, ping. This bug occurs 100% of the time in our all-flash cluster.

@tchaikov tchaikov requested a review from liewegas Jul 3, 2017

@liewegas liewegas added the needs-qa label Jul 3, 2017

Contributor

tchaikov commented Jul 6, 2017

retest this please

Contributor

tchaikov commented Jul 7, 2017

retest this please.

@liewegas liewegas merged commit 91e5d5b into ceph:master Jul 9, 2017

5 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.
make check: make check succeeded
make check (arm64): make check succeeded