mds: fix crash when exporting unlinked dir #44335
Conversation
I think your analysis is correct.
After a directory is removed, its dentry is unlinked from the parent and a new stray dentry is created under ~/mdsX/stray/. But since the stray dentry still holds an extra caps=1 pin ref, the MDS skips purging it. This is why the MDS can always fetch it from disk and the stray dentry never disappears.
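To make that pin/purge gate concrete, here is a tiny self-contained C++ model (my own illustration; StrayInode/try_purge are made-up names, not Ceph's actual StrayManager code):

```cpp
// Toy model of the stray-purge gate described above (illustrative only;
// these types are NOT Ceph's actual StrayManager/CInode classes).
#include <iostream>

struct StrayInode {
    int caps = 0;        // client capabilities still pinning the inode
    bool purged = false;
};

// The MDS only purges a stray once nothing pins it anymore.
bool try_purge(StrayInode& in) {
    if (in.caps > 0)
        return false;    // caps pin -> skip purge; stray stays on disk
    in.purged = true;
    return true;
}

int main() {
    StrayInode stray;
    stray.caps = 1;                          // the extra caps=1 pin ref
    std::cout << try_purge(stray) << '\n';   // 0: skipped, still fetchable
    stray.caps = 0;                          // client drops its cap
    std::cout << try_purge(stray) << '\n';   // 1: purged
}
```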
Force-pushed from 9085cd8 to f916f39.
LGTM.
Makes sense to me!
For anyone who would like to reproduce this on master:

Console 1:
MON=1 OSD=1 MDS=2 MGR=1 ../src/vstart.sh --new --debug
./bin/ceph fs set a max_mds 2
mkdir mnt
sudo mount -t ceph <mon addr from ceph.conf>:/ mnt -o name=admin,secret=<from keyring>

Console 2:
cd mnt
mkdir pin
setfattr -n ceph.dir.pin -v 1 pin
mkdir pin/to-be-deleted
cd pin/to-be-deleted
rmdir ../to-be-deleted

Keep console 2 in this state to hold the cap on the deleted dir. Back to console 1:

./bin/ceph daemon mds.a flush journal  # mds.a is rank 1 in my case
./bin/ceph mds fail a:1                # just to make sure the cache is cleared
./bin/ceph fs set a max_mds 1

On the current master (c1c7d3b), this sequence triggers another crash on rank 0. So, to reproduce the bug fixed by this PR, I need to manually restart rank 0 to let rank 1 proceed. More about the rank 0 crash: (gdb session)
Maybe we need another PR to fix this one.
Perhaps it's the combination of a client holding a cap ref on a deleted inode and reducing max_mds (to 1) that makes this uncommon. Anyhow, thank you for the detailed explanation in the tracker. Do you think a test case can be written to verify this?
cephadm now reduces max_mds on every upgrade, so I would expect this to become a lot more common.
I'm not familiar with the testing infrastructure. But I think so, given that the steps in my comment above reproduce the crash reliably (I have tried about 5 times). We may need to fix the rank 0 segfault first, though, before we can verify this fix automatically.
You could start by taking a look at the tests under qa/tasks/cephfs/.
+1 for fixing the other crash, which is a necessity for verifying this fix.
I made PR #44477 to fix the rank 0 crash. But the relevant code has not changed for years. Can you reproduce the same segfault as mine? Or is it just because I use gcc from Ubuntu 20.04?
@huww98 could you update this PR with a test case? I'll plan to include this and #44477 in my test branch.
I will try it today. But I don't have the resources to run the tests to ensure they reproduce the crash.
Sure. I'll include this PR (and the supporting PR #44477) in my test branch.
self.mount_a.run_shell(["mkdir", "pin"])
...
self._force_migrate("pin")
Test fails with:
2022-02-05T21:48:59.852 INFO:tasks.cephfs_test_runner:test_migrate_unlinked_dir (tasks.cephfs.test_strays.TestStrays) ... ERROR
2022-02-05T21:48:59.853 INFO:tasks.cephfs_test_runner:
2022-02-05T21:48:59.853 INFO:tasks.cephfs_test_runner:======================================================================
2022-02-05T21:48:59.854 INFO:tasks.cephfs_test_runner:ERROR: test_migrate_unlinked_dir (tasks.cephfs.test_strays.TestStrays)
2022-02-05T21:48:59.854 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-02-05T21:48:59.855 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-02-05T21:48:59.855 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_09fc2e9e470435f1e6dbf462a51853391f4a4cc0/qa/tasks/cephfs/cephfs_test_case.py", line 365, in _wait_subtrees
2022-02-05T21:48:59.855 INFO:tasks.cephfs_test_runner: while proceed():
2022-02-05T21:48:59.856 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_git_teuthology_3094160cc590b786a43d55faaf7c99d5de71ce56/teuthology/contextutil.py", line 133, in __call__
2022-02-05T21:48:59.856 INFO:tasks.cephfs_test_runner: raise MaxWhileTries(error_msg)
2022-02-05T21:48:59.856 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: reached maximum tries (15) after waiting for 30 seconds
2022-02-05T21:48:59.856 INFO:tasks.cephfs_test_runner:
2022-02-05T21:48:59.857 INFO:tasks.cephfs_test_runner:The above exception was the direct cause of the following exception:
2022-02-05T21:48:59.857 INFO:tasks.cephfs_test_runner:
2022-02-05T21:48:59.857 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-02-05T21:48:59.858 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_09fc2e9e470435f1e6dbf462a51853391f4a4cc0/qa/tasks/cephfs/test_strays.py", line 710, in test_migrate_unlinked_dir
2022-02-05T21:48:59.858 INFO:tasks.cephfs_test_runner: self._force_migrate("pin")
2022-02-05T21:48:59.858 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_09fc2e9e470435f1e6dbf462a51853391f4a4cc0/qa/tasks/cephfs/test_strays.py", line 609, in _force_migrate
2022-02-05T21:48:59.859 INFO:tasks.cephfs_test_runner: self._wait_subtrees([(rpath, rank)], rank=rank, path=rpath)
2022-02-05T21:48:59.859 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_09fc2e9e470435f1e6dbf462a51853391f4a4cc0/qa/tasks/cephfs/cephfs_test_case.py", line 378, in _wait_subtrees
2022-02-05T21:48:59.859 INFO:tasks.cephfs_test_runner: raise RuntimeError("rank {0} failed to reach desired subtree state".format(rank)) from e
2022-02-05T21:48:59.859 INFO:tasks.cephfs_test_runner:RuntimeError: rank 1 failed to reach desired subtree state
A directory only gets migrated to the pinned MDS rank if it's not empty.
So I just created an empty file in the pin directory. Hope it works.
PTAL at the failed test. It's a straightforward fix.
@huww98 ping?
Force-pushed from 9b64106 to 8e8fbcc.
looks good.
The log said ... Does this mean I managed to reproduce the bug but failed to fix it? I may not have time to dig into the logs soon. Please help me if possible.
OK, no problem - I'll take a look.
We are close to the next release. I think we need a warning in the release notes that administrators should clean up any unlinked stray dirs before reducing max_mds (or initiating an automatic upgrade with cephadm).
From dev@ceph.io - https://lists.ceph.io/hyperkitty/list/dev@ceph.io/message/CKYDFTLXUJ5ZBY7PXSLPJMZALYDWNWXE/
@huww98 BTW, I found another issue when migrating strays. At times, when (say) the rank1 MDS migrates a stray dentry to rank0 (by sending a rename operation), rank1 might, at a later point, purge the stray dentry itself. rank0 sends discover messages to rank1 to load dirs/dentries/inodes into its cache (as replicas); however, when it tries to discover the stray entry itself (the dir that was deleted), rank1 does not send back the inode of the deleted dir, and rank0 ends up sending discover messages to rank1 endlessly.
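To illustrate why that exchange never terminates, here is a toy C++ model (made-up types; not the actual Migrator/MDCache code):

```cpp
// Toy model of the endless discover exchange (illustrative only; these
// are NOT Ceph's actual Migrator/MDCache classes).
#include <iostream>

struct Rank1 {
    bool stray_purged = true;   // rank1 purged the stray after migrating it
    // Discover reply: can the inode be sent back to the requester?
    bool reply_with_inode() const { return !stray_purged; }
};

int main() {
    Rank1 rank1;
    int discovers = 0;
    // rank0 keeps re-sending discovers until its replica is filled in,
    // which never happens for the purged stray (loop capped for the demo).
    while (!rank1.reply_with_inode() && ++discovers < 5) {}
    std::cout << "gave up after " << discovers << " discovers\n";
}
```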
@huww98 This PR is blocked for testing/merging due to the issue mentioned above. To unblock this, I think it's fine to remove the check here - https://github.com/ceph/ceph/pull/44335/files#diff-550a84c6df2db675f1eee5834738001a189b4f0939b7013868c9eed284a79417R737 We could add it back once the stray migration issue is fully fixed. Does that sound reasonable? This way tests would validate that the MDS does not crash. Could you update this PR so that I can run it through tests?
When fetch()ing an unlinked dir, we set fnode->version = 1, but leave projected_version = 0. This will trigger the assert `dir->get_projected_version() == dir->get_version()` in Migrator::encode_export_dir(). projected_version should equal `fnode->version` unless this dir is projected.

Fix this by introducing a new helper, CDir::set_fresh_fnode(), which ensures the versions are set consistently.

Fixes: https://tracker.ceph.com/issues/53597
Signed-off-by: 胡玮文 <huww98@outlook.com>
Force-pushed from 8e8fbcc to 9558a6a.
@vshankar Done
When fetch()ing an unlinked dir, we set fnode->version = 1, but leave projected_version = 0. This will trigger the assert `dir->get_projected_version() == dir->get_version()` in Migrator::encode_export_dir(). projected_version should equal `fnode->version` unless this dir is projected.

Fix this by introducing a new helper, CDir::set_fresh_fnode(), which ensures the versions are set consistently.

Fixes: https://tracker.ceph.com/issues/53597
Signed-off-by: 胡玮文 huww98@outlook.com
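To illustrate the invariant this fix restores, here is a small self-contained C++ model of the version bookkeeping (my own sketch; the field and assert names only mirror the description above, the types are not the real CDir/Migrator):

```cpp
// Toy model of the fnode version bookkeeping (illustrative only; these
// are NOT the real CDir/Migrator classes).
#include <cassert>
#include <cstdint>
#include <memory>

struct FNode { uint64_t version = 0; };

struct Dir {
    std::shared_ptr<FNode> fnode = std::make_shared<FNode>();
    uint64_t projected_version = 0;

    uint64_t get_version() const { return fnode->version; }
    uint64_t get_projected_version() const { return projected_version; }

    // Buggy fetch() path: bumps the version of a freshly fetched
    // (unlinked) dir but forgets projected_version.
    void fetch_unlinked_buggy() { fnode->version = 1; }

    // Sketch of what a set_fresh_fnode()-style helper does: install the
    // fresh fnode and keep projected_version in sync with it.
    void set_fresh_fnode(std::shared_ptr<FNode> f) {
        fnode = std::move(f);
        projected_version = fnode->version;
    }
};

int main() {
    Dir dir;
    auto fresh = std::make_shared<FNode>();
    fresh->version = 1;                    // unlinked dir fetched from disk
    dir.set_fresh_fnode(std::move(fresh)); // keeps both versions in sync

    // The assert from Migrator::encode_export_dir(): for an unprojected
    // dir both versions must agree. The buggy path (version=1,
    // projected_version=0) would trip it.
    assert(dir.get_projected_version() == dir.get_version());
}
```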
I'm the reporter of the above tracker ticket. I have only read the code and logs, and have not tested this patch. Please help me test it if possible.
It is also a bit strange, if I'm not missing something: this bug seems pretty easy to reproduce. We have over 100 such directories with fewer than 30 clients, so why has nobody reported it before?
Example crash logs when reducing max_mds from 2 to 1: