reef: os/bluestore/bluefs: fix dir_link might add link that already exists in compact log #51001

Merged
merged 2 commits into ceph:reef on May 25, 2023

@ifed01 ifed01 commented Apr 11, 2023

backport of #50185

backport tracker: https://tracker.ceph.com/issues/59391
parent tracker: https://tracker.ceph.com/issues/56210

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

…in compact log

After commit eac1807 (os/bluestore/bluefs: Weaken locks in open_for_write),
there is a race window between open_for_write and log compaction:

Process A                                  Process B
open_for_write                             _compact_log_async_LD_LNF_D
                                              log.lock
   node.lock                                    ...
     update nodes.dir_map(add dirlink A)        node.lock(wait for process A)
   node.unlock                                    ...
   log.lock(wait for Process B)                   <get lock>
     ...                                          compact log(create log based on nodes.dir_map which has dirlink A)
     ...                                          ...
     ...                                          ...
     ...                                         node.unlock()
     ...                                      log.unlock
     <get lock>
     log file create event(dirlink A)
   log.unlock
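
The interleaving above can be replayed deterministically in a few lines. This is a minimal single-threaded sketch, not BlueFS code: `ToyFs`, `ToyLogOp`, and the helper functions are hypothetical stand-ins, and each call represents one critical section from the diagram, executed in the order the diagram shows.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for BlueFS state (not the real classes).
struct ToyLogOp {
    std::string name;   // file name being linked
    unsigned long ino;  // inode the name points to
};

struct ToyFs {
    std::map<std::string, unsigned long> dir_map;  // in-memory links (node lock in BlueFS)
    std::vector<ToyLogOp> log;                     // dir_link records (log lock in BlueFS)
};

// Process A, first half: add the link in memory while holding only the node lock.
void add_link_in_memory(ToyFs& fs, const std::string& name, unsigned long ino) {
    fs.dir_map[name] = ino;
}

// Process B: rebuild the log from whatever dir_map contains right now.
void compact_log(ToyFs& fs) {
    fs.log.clear();
    for (const auto& kv : fs.dir_map) {
        fs.log.push_back(ToyLogOp{kv.first, kv.second});
    }
}

// Process A, second half: append the dir_link event once the log lock is free.
void append_link_event(ToyFs& fs, const std::string& name, unsigned long ino) {
    fs.log.push_back(ToyLogOp{name, ino});
}

// Count how many dir_link records in the log refer to the given name.
int count_links(const ToyFs& fs, const std::string& name) {
    int n = 0;
    for (std::size_t i = 0; i < fs.log.size(); ++i) {
        if (fs.log[i].name == name) ++n;
    }
    return n;
}

// Runs A(first half) -> B(compaction) -> A(second half), as in the diagram.
int racy_interleaving() {
    ToyFs fs;
    const std::string name = "db/2524750.sst";
    add_link_in_memory(fs, name, 2524752);
    compact_log(fs);                       // compacted log already contains the link
    append_link_event(fs, name, 2524752);  // the same link is logged a second time
    return count_links(fs, name);
}
```

Calling `racy_interleaving()` returns the number of dir_link records for the same name, which is 2 for this ordering: one written by compaction and one appended afterwards.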

After the sequence above, the bluefs log contains something like this:

0x0: txn(seq 1 len 0x141ee crc 0x3e1c626f)
 0x0:  op_init
 0x0:  op_file_update  file(ino 2524749 size 0x246b6 mtime 2023-02-08T03:07:19.950963+0800 allocated 30000 alloc_commit 30000 extents [1:0xa135e0000~30000])
 0x0:  op_file_update  file(ino 2524746 size 0x175af mtime 2023-02-08T03:07:19.771584+0800 allocated 20000 alloc_commit 20000 extents [1:0xa13530000~20000])
 ...
 0x0:  op_dir_link  db/2524749.sst to 2524751
 0x0:  op_dir_link  db/2524750.sst to 2524752
 0x0:  op_dir_link  db/CURRENT to 2491157
 ...
 0x0:  op_jump seq 18414993 offset 0x20000
 0x20000: txn(seq 18414994 len 0x65 crc 0xc1f9ec5f)
 0x20000:  op_file_update  file(ino 2524752 size 0x0 mtime 2023-02-08T03:07:20.205074+0800 allocated 0 alloc_commit 0 extents [])
 0x20000:  op_dir_link  db/2524750.sst to 2524752

The dir_link db/2524750.sst to 2524752 exists both in the compacted log (txn seq 1) and in log txn seq 18414994.
If no further log compaction happens, or an abnormal shutdown occurs first,
the next bluefs mount will fail during replay at the following assert:

2023-02-10T11:05:09.826+0800 7f1f97b71280 10 bluefs _replay 0x20000: txn(seq 18414994 len 0x65 crc 0xc1f9ec5f)
2023-02-10T11:05:09.826+0800 7f1f97b71280 20 bluefs _replay 0x20000:  op_file_update  file(ino 2524752 size 0x0 mtime 2023-02-08T03:07:20.205074+0800 allocated 0 alloc_commit 0 extents [])
2023-02-10T11:05:09.826+0800 7f1f97b71280 20 bluefs _replay 0x20000:  op_dir_link  db/2524750.sst to 2524752
2023-02-10T11:05:09.832+0800 7f1f97b71280 -1 //source/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool, bool)' thread 7f1f97b71280 time 2023-02-10T11:05:09.827662+0800
//source/ceph/src/os/bluestore/BlueFS.cc: 1419: FAILED ceph_assert(r == q->second->file_map.end())
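
To illustrate why replay rejects the second record, here is a rough sketch of a replay loop over dir_link operations. It is an assumption-laden model, not the code in BlueFS.cc: `ToyDirLink` and `replay_dir_links` are hypothetical names, directories are created on demand for brevity, and the function reports the offending index instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one op_dir_link entry: "<dir>/<file> to <ino>".
struct ToyDirLink {
    std::string dir;
    std::string file;
    unsigned long ino;
};

// Replays the links in log order. Returns the index of the first record whose
// file name is already present in its directory (the condition the assert in
// the log above checks for), or -1 if every link is new.
int replay_dir_links(const std::vector<ToyDirLink>& ops) {
    // dir name -> (file name -> ino); an assumed shape for the in-memory maps.
    std::map<std::string, std::map<std::string, unsigned long> > dirs;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        std::map<std::string, unsigned long>& file_map = dirs[ops[i].dir];
        if (file_map.find(ops[i].file) != file_map.end()) {
            return static_cast<int>(i);  // duplicate link: replay cannot continue
        }
        file_map[ops[i].file] = ops[i].ino;
    }
    return -1;
}
```

Feeding it the three links from txn seq 1 followed by the link from txn seq 18414994 stops at the fourth record, because db/2524750.sst is already linked by the compacted log.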

Following other operations that update the node and add a log entry at the
same time, such as rename, fix this by taking the log lock and the node lock
at the beginning of the function (respecting the lock ordering, so the log
lock first), i.e. N_LD -> LND
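
The locking pattern described here can be sketched with standard C++ mutexes. This is only an illustration of the idea under simplifying assumptions: `LockedFs`, `open_for_write_fixed`, `compact_log_fixed`, and `max_links_per_name` are hypothetical names, the log stores bare file names, and the toy compaction holds both locks for its whole duration, which is simpler than what BlueFS does.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical toy state; the lock order is log lock first, then node lock.
struct LockedFs {
    std::mutex log_lock;
    std::mutex node_lock;
    std::map<std::string, unsigned long> dir_map;  // guarded by node_lock
    std::vector<std::string> log;                  // guarded by log_lock
};

// Take both locks up front so the in-memory update and the log append form
// one critical section that compaction cannot split.
void open_for_write_fixed(LockedFs& fs, const std::string& name, unsigned long ino) {
    std::lock_guard<std::mutex> ll(fs.log_lock);   // log lock first
    std::lock_guard<std::mutex> nl(fs.node_lock);  // then node lock
    fs.dir_map[name] = ino;
    fs.log.push_back(name);
}

// Compaction acquires the locks in the same order, so it sees either both
// effects of open_for_write_fixed or neither.
void compact_log_fixed(LockedFs& fs) {
    std::lock_guard<std::mutex> ll(fs.log_lock);
    std::lock_guard<std::mutex> nl(fs.node_lock);
    fs.log.clear();
    for (const auto& kv : fs.dir_map) {
        fs.log.push_back(kv.first);
    }
}

// Largest number of log records that refer to any single name.
int max_links_per_name(LockedFs& fs) {
    std::lock_guard<std::mutex> ll(fs.log_lock);
    std::map<std::string, int> counts;
    int worst = 0;
    for (const auto& name : fs.log) {
        int c = ++counts[name];
        if (c > worst) worst = c;
    }
    return worst;
}
```

Because both functions acquire the locks in the same order, they cannot deadlock against each other, and no ordering of calls leaves a name in the log more than once.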

Fixes: https://tracker.ceph.com/issues/56210
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit c55f737)
… compaction

Test for https://tracker.ceph.com/issues/56210

Signed-off-by: ethanwu <ethanwu@synology.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit a74d02b)

ifed01 commented May 23, 2023

jenkins test api

@ljflores ljflores added this to the reef milestone May 23, 2023
@yuriw yuriw merged commit a68f5be into ceph:reef May 25, 2023
10 of 11 checks passed
@ifed01 ifed01 deleted the wip-ifed-bluefs-duplicate-dir-link-ree branch May 25, 2023 16:19