mds: remove boost::pool usage and use tcmalloc directly #12792
The switch to mempool will probably mean dropping boost::pool here anyway (unless we want to add special boost::pool support to the mempool infrastructure). The only reason not to do that is concern about performance and memory utilization. It wouldn't surprise me if tcmalloc does just as well on the performance side. I would do some testing to make sure the overall memory footprint doesn't expand, though--that's the original problem we were trying to fix, and ~7 years ago or whatever it was significant. The allocator has probably come a long way since then, but we need to check.
My test case is:
```sh
for ((i=1; i<=1000; i++)); do
  mdtest -d /mnt/ceph -I 10000 -z 3 -C
  rm -rf /mnt/ceph/\#test-dir.0/
done
```
With gperftools-2.4 (tcmalloc-4.2.6), the test case does not pass, but with gperftools-2.3 (tcmalloc-4.2.4) it does. So I suspect that tcmalloc causes the crash.
@LiuHongdong thanks for testing this change on different tcmalloc versions. I had only tested it on gperftools-2.1 before.
Currently I haven't figured out how this crash relates to tcmalloc or boost::pool; it looks more like a logic bug to me. Anyway, I will try to reproduce the crash using gperftools-2.4.
BTW, can you give me more information?
@LiuHongdong This crash doesn't seem to be caused by tcmalloc itself. I think the newer version of tcmalloc may have a new memory-reclaim mechanism (or something similar) that exposes this crash more easily and reliably.
The root cause is that in StrayManager::_purge_stray_logged, the inode deletes itself first, before it is done being used.
I have created a new PR to track and fix this crash; please see #13347.
I also tested the new fix with some test cases, and they all passed.
@jcsp The changes have been made, and tests have been running for a few days on our real environment. So far so good. Please help review the change.
BTW, I see the Jenkins build failed, but the failure is not related to this change. Could you please help check? Thanks.