
Segfaults with tcmalloc #1023

Closed
jlippuner opened this issue Nov 23, 2013 · 7 comments

@jlippuner

I've been seeing some segfaults in my HPX application (right now only using futures and async). Sometimes the program would just run as expected, but more often than not it would terminate in a segfault without an HPX error message. Investigating the issue revealed that the segfaults came from tcmalloc. So I recompiled HPX with the system allocator and now I am not seeing those segfaults any longer.

I was wondering whether this is a known issue with tcmalloc. I am using tcmalloc that comes with gperftools 2.0 on a 64-bit Linux machine.

@hkaiser
Member

hkaiser commented Nov 23, 2013

Can you show us the code which is segfaulting? Could it be that you're seeing stack overflows?

@jlippuner
Author

Unfortunately, my code is fairly big and I don't have a small code example that triggers the segfaults at this point. Is there an easy way I could check for stack overflows (e.g. increase the stack size)? If necessary, I will try to boil down my code to a small example that reproduces the problem.

@hkaiser
Member

hkaiser commented Nov 23, 2013

You can increase the default stack size by adding --hpx:ini=hpx.stacks.small_size=0x10000 (or larger) to the command line (0x8000 is the default, see here: http://stellar-group.github.io/hpx/docs/html/hpx/manual/init/configuration/config_defaults.html).
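For illustration, the full invocation might look like the following. This is a sketch: the binary name `my_hpx_app` is a placeholder, and only the `--hpx:ini` flag is taken from the comment above.

```shell
# Raise the default small-stack size from 0x8000 (32 KiB) to 0x10000 (64 KiB)
# for all HPX threads. "my_hpx_app" stands in for your application binary.
./my_hpx_app --hpx:ini=hpx.stacks.small_size=0x10000
```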

@jlippuner
Author

Increasing the stack size helps, but it does not get rid of the segfaults entirely. I'm still seeing occasional segfaults (mostly at the end when all the data objects are destroyed), but they show up much less frequently than when I use the default stack size.

Does the fact that I need to specify a larger stack size to reduce the likelihood of a segfault indicate that I'm doing something wrong in my code?

@hkaiser
Member

hkaiser commented Nov 27, 2013

> Increasing the stack size helps, but it does not get rid of the segfaults entirely. I'm still seeing occasional segfaults (mostly at the end when all the data objects are destroyed), but they show up much less frequently than when I use the default stack size.

That's a strong indication that you're fighting stack overflows. Unfortunately, HPX currently has no means of either detecting that a segfault was caused by a stack overflow or of using stacks that grow on demand. That's something which needs to be implemented at some point...

The other problems you're seeing could be caused by the 'after 588' problem (as described in #987). We're working on this feverishly, but it is turning out to be a hard nut to crack. However, I expect to make some progress soon.

> Does the fact that I need to specify a larger stack size to reduce the likelihood of a segfault indicate that I'm doing something wrong in my code?

It just means that your code uses a significant amount of stack space. HPX is meant to be used with threads that are as short as possible (optimally ~50-100 microseconds), which normally implies modest stack requirements. If you know which actions cause the most problems, you can tell HPX to run only those on larger stacks. Keeping the default stack size as small as possible is certainly a good thing, as it minimizes overall system requirements (just think of having millions of active threads, each with its own stack). Please see the macros at https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/runtime/actions/action_support.hpp#L1124. Sorry, no documentation yet...
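A minimal sketch of what using those macros might look like, assuming a plain action; the function name `deep_recursion` and the action name are placeholders, and `HPX_ACTION_USES_LARGE_STACK` is one of the macros defined in the header linked above:

```cpp
#include <hpx/hpx.hpp>

// Hypothetical worker that recurses deeply and therefore needs more
// stack space than the small default.
void deep_recursion(int depth);
HPX_PLAIN_ACTION(deep_recursion, deep_recursion_action);

// Tell HPX to run this particular action on a large stack, while every
// other action keeps the small default stack.
HPX_ACTION_USES_LARGE_STACK(deep_recursion_action);
```

This keeps the per-thread memory footprint low for the common case while giving the one problematic action the room it needs.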

@hkaiser
Member

hkaiser commented Nov 28, 2013

Do you think we can close this now?

@jlippuner
Author

Thanks for the explanation. I'm closing this for now, although I have not yet been able to run my code successfully with tcmalloc. Hopefully it will work once the other issues you mentioned are fixed.
