mds: remove boost::pool usage and use tcmalloc directly #12792

Merged

1 commit merged into ceph:master on Apr 24, 2017

david-z (Member) commented Jan 5, 2017

@jcsp Could you please take a look? Thanks.

jcsp (Contributor) commented Jan 5, 2017

Can you talk about why this is useful and what performance differences you saw?

jcsp (Contributor) commented Jan 5, 2017

Oh, never mind. I see there is an explanation in the tracker ticket.

LiuHongdong commented:

Without boost::pool, removing a directory makes the MDS crash:

    assert((o->lru_list == &lru_pintail) || (o->lru_list == &lru_top) || (o->lru_list == &lru_b))
jcsp (Contributor) commented Jan 24, 2017

Hmm, so I don't want to merge this without working out how it intersects with the possible switch to using mempools (src/common/mempool.cc) in the MDS (cc @liewegas)

liewegas (Member) commented Jan 24, 2017

The switch to mempool will probably mean dropping boost::pool here anyway (unless we want to add special boost::pool support to the mempool infrastructure). The only reason not to do that is concern about performance and memory utilization. It wouldn't surprise me if tcmalloc does just as well on the performance side. I would do some testing to make sure the overall memory footprint doesn't expand, though; that's the original problem we were trying to fix, and ~7 years ago, or whenever it was, it was significant. The allocator has probably come a long way since then, but we need to check.
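
For reference, a minimal sketch of the two allocation strategies being compared; the class name and payload size are invented for illustration and this is not the actual Ceph code:

    #include <boost/pool/pool.hpp>
    #include <cstddef>
    #include <new>

    // Pool-backed pattern (what the PR removes): every instance comes from
    // a fixed-size boost::pool via per-class operator new/delete.
    struct PooledNode {
      char payload[312];                        // stand-in for inode state
      static boost::pool<> pool;
      static void *operator new(std::size_t) {
        void *p = pool.malloc();                // grab a chunk from the pool
        if (!p)
          throw std::bad_alloc();
        return p;
      }
      static void operator delete(void *p) {
        pool.free(p);                           // chunk returns to the pool, stays mapped
      }
    };
    boost::pool<> PooledNode::pool(sizeof(PooledNode));

    // Post-change pattern: no overloads, so plain new/delete go to the
    // default allocator, which tcmalloc serves when the binary links it.
    struct PlainNode {
      char payload[312];
    };

The footprint concern is that a general-purpose allocator can carry per-size-class overhead that a dedicated fixed-size pool avoids; that is what the suggested testing would measure.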

liewegas (Member) commented Jan 24, 2017

And assuming it's fine with tcmalloc, I would rip out boost::pool entirely; don't bother with an #ifdef.
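
For contrast, a hedged sketch of what keeping both paths behind a build flag would look like; CEPH_USE_BOOST_POOL is a hypothetical flag invented here, not a real Ceph build option, and the suggestion above is to delete the pool branch outright rather than keep anything like this:

    #include <boost/pool/pool.hpp>
    #include <cstddef>
    #include <new>

    static boost::pool<> node_pool(312);        // hypothetical fixed-size pool

    void *allocate_node(std::size_t bytes) {
    #ifdef CEPH_USE_BOOST_POOL                  // hypothetical flag, illustration only
      (void)bytes;                              // pool chunks have one fixed size
      void *p = node_pool.malloc();
      if (!p)
        throw std::bad_alloc();
      return p;
    #else
      return ::operator new(bytes);             // default allocator path (tcmalloc)
    #endif
    }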

LiuHongdong commented:

My test case is:

    for ((i = 1; i <= 1000; i++))
    do
        mdtest -d /mnt/ceph -I 10000 -z 3 -C
        rm -rf /mnt/ceph/\#test-dir.0/
    done

With gperftools-2.4 (tcmalloc 4.2.6), the test case does not pass, but with gperftools-2.3 (tcmalloc 4.2.4) it does pass. So I suspect that tcmalloc causes the crash.

david-z (Member) commented Feb 4, 2017

@LiuHongdong thanks for testing this change on different tcmalloc versions. I had only tested it on gperftools-2.1 before.

So far I haven't figured out how this crash is related to tcmalloc or boost::pool; it looks more like a logic bug to me. Anyway, I will try to reproduce this crash using gperftools-2.4.

BTW, can you give me more information?

  1. Does this crash reproduce consistently with gperftools-2.4?
  2. Which Ceph version are you using?

LiuHongdong commented:

Yes, this crash reproduces consistently with gperftools-2.4 and 2.5. And I tested Ceph from 9.0.0 to 10.2.3; all the versions crashed.

david-z (Member) commented Feb 10, 2017

@LiuHongdong This crash doesn't seem to be caused by tcmalloc itself. I think the newer tcmalloc versions may have a new memory reclaim mechanism, or something similar, that exposes this crash more easily and more reliably.

The root cause is that in StrayManager::_purge_stray_logged, the inode first removes itself from the cache (which deletes it):

    in->mdcache->remove_inode(in);

Then it tries to drop the dentry through the already-deleted inode:

    in->mdcache->touch_dentry_bottom(dn);
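
A self-contained sketch of that ordering bug, with stand-in types invented for illustration (the real logic lives in StrayManager and MDCache). It also hints at why boost::pool could mask the problem: a pooled chunk stays mapped and often intact after being freed, while tcmalloc may recycle it right away. The reordered variant below is just one plausible fix, not necessarily what the follow-up PR does:

    #include <cstdio>

    struct CDentry {};
    struct MDCache;

    struct CInode {
      MDCache *mdcache;                 // back-pointer that gets read after deletion
    };

    struct MDCache {
      void remove_inode(CInode *in) { delete in; }             // inode is gone here
      void touch_dentry_bottom(CDentry *) { std::puts("ok"); } // stand-in body
    };

    // Buggy ordering: 'in' is dereferenced again after remove_inode() freed it.
    void purge_stray_logged_buggy(CDentry *dn, CInode *in) {
      in->mdcache->remove_inode(in);
      in->mdcache->touch_dentry_bottom(dn);     // use-after-free read of in->mdcache
    }

    // Safe ordering: capture the cache pointer before the inode is destroyed.
    void purge_stray_logged_fixed(CDentry *dn, CInode *in) {
      MDCache *mdcache = in->mdcache;
      mdcache->remove_inode(in);
      mdcache->touch_dentry_bottom(dn);
    }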

I have created a new PR to track and fix this crash; please check it here: #13347

I also tested the new fix with the following test cases, and they all passed:

  1. boost::pool with TCMALLOC 4.2.6
  2. only TCMALLOC 4.2.6
  3. boost::pool with TCMALLOC 4.1.2
  4. only TCMALLOC 4.1.2

jcsp (Contributor) commented Apr 15, 2017

@david-z if you would still like to work on this, could you please rebase this code on master and update it to simply remove the boost::pool usage instead of ifdeffing it?

david-z (Member) commented Apr 20, 2017

@jcsp Sorry for the late reply. I will work on this, but let me test it more in our production environment first. I will update the change soon.

david-z closed this Apr 24, 2017

Zhi Zhang
mds: remove boost::pool usage and use tcmalloc directly
Signed-off-by: Zhi Zhang <zhangz.david@outlook.com>

david-z reopened this Apr 24, 2017

david-z changed the title from "mds: add the option to use tcmalloc directly" to "mds: remove boost::pool usage and use tcmalloc directly" on Apr 24, 2017

david-z (Member) commented Apr 24, 2017

@jcsp The changes have been made and the tests have been running for a few days in our production environment. So far so good. Please help review the change.

BTW, I see the Jenkins build failed, but it is not related to this change. Could you please help check? Thanks.

    /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:2214: corrupt_scrub_erasure:  test no = yes
    /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:2214: corrupt_scrub_erasure:  return 1
    /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/osd/osd-scrub-repair.sh:45: run:  return 1
    ...
    166/167 Test #146: osd-scrub-repair.sh .....................***Failed  906.08 sec

jcsp (Contributor) commented Apr 24, 2017

retest this please (jenkins)

jcsp approved these changes Apr 24, 2017

jcsp merged commit 9ca080d into ceph:master Apr 24, 2017

3 checks passed:

  Signed-off-by: all commits in this PR are signed
  Unmodified Submodules: submodules for the project are unmodified
  default: Build finished.