Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce MAX_TERMINATED_THREADS default, improve memory use on manycore… #2656

Merged
merged 4 commits into from Jul 27, 2017

Conversation

biddisco
Copy link
Contributor

… cpus

The default value of 1000 is too high and causes problems on many core machines. there does not seem to be any great advantage to having a high number of terminated tasks waiting for cleanup. Stack usage explodes when this number is large.

@hkaiser
Copy link
Member

hkaiser commented May 29, 2017

@biddisco What new default value for this configuration constant did you have in mind?

@hkaiser
Copy link
Member

hkaiser commented May 29, 2017

What new default value for this configuration constant did you have in mind?

@biddisco Sorry, I should have asked: how did you determine the new value you propose. reducing it down to 25 seems to be a radical change...

@biddisco
Copy link
Contributor Author

On our 32 core nodes, the previous value of 1000 could leave 32,000 uncleaned up tasks (and their stacks).
Including High priority queues as we do, pushes that up to 64k tasks unclaimed.

Moving to KNL, we will be using upwards of 300K tasks in the cleanup zone, and with large stack sizes, this is a big problem.

All tasks must be cleaned up sooner or later and the previous limit of 1000 was pulled from a hat anyway. We tested with 100 and found memory issues went away, but a value of 25 seems more realistic. I would prefer lower than that since larger values provide no gain other than a small saving of detour into the cleanup code. Smaller values are more likely to allow cache coherent reuse of memory than larger values.

@hkaiser
Copy link
Member

hkaiser commented May 29, 2017

I would prefer lower than that since larger values provide no gain other than a small saving of detour into the cleanup code.

Thanks for this explanation of your rationale.

The main reason why this delayed cleanup was put in place to begin with was to avoid acquiring a kernel lock in the scheduler. In the end the selected default value should give a reasonable compromise between having to acquire a kernel lock in the scheduler and the memory overheads eventually imposed by not-cleaned-up threads.

Are we sure that having to acquire a kernel lock every 25 executed tasks wouldn't cause too much overhead? May be we should change the cleanup logic in a way that it would be executed whenever we have to acquire that lock for other reasons anyways?

@sithhell
Copy link
Member

sithhell commented Jul 3, 2017

Any plans on how to move forward with this?

@biddisco
Copy link
Contributor Author

biddisco commented Jul 7, 2017

  • Add a CMake option to set the number, and make the default 100 (say)
    or
  • Add a command line option to set it at runtime with a default of 100 (say)

The first is easier, the second gives more flexibility. I will do the first unless several of you prefer the second

@hkaiser
Copy link
Member

hkaiser commented Jul 7, 2017

  • Add a CMake option to set the number, and make the default 100 (say)
    or
  • Add a command line option to set it at runtime with a default of 100 (say)

The first is easier, the second gives more flexibility. I will do the first unless several of you prefer the second

Why not do the second with a cmake option to set the default?

@biddisco biddisco closed this Jul 11, 2017
@biddisco biddisco deleted the terminated_threads branch July 11, 2017 20:31
@hkaiser
Copy link
Member

hkaiser commented Jul 11, 2017

@biddisco Why did you close this? Doesn't this give you any performance benefit anymore?

@biddisco
Copy link
Contributor Author

This is absolutely critical. Without this patch we cannot run our code. However, I made a reasonable request for a change and it was not accepted, and I must work on other deadlines, so when I get time, I will revisit this, but for now the PR is going nowhere and can be deleted.

@hkaiser hkaiser restored the terminated_threads branch July 12, 2017 11:32
@hkaiser
Copy link
Member

hkaiser commented Jul 12, 2017

I'll reopen this and might be able to find some time to work on it.

@hkaiser hkaiser reopened this Jul 12, 2017
@hkaiser
Copy link
Member

hkaiser commented Jul 15, 2017

@biddisco: please review

…ad value

- flyby: fixed stupid macro expansion bug in runtime configuration initialization
@biddisco
Copy link
Contributor Author

Nitpicking: I have added one commit because a boolean option like HPX_WITH_CUDA might not be available and so HPX_HAVE_CUDA will not be set. With options that are not optional, such as MAX__TERMINATED_THREADS - there is not need for WITH/HAVE - this breaks convention in naming.
There is a define and it must be set to something, so simply

HPX_SCHEDULER_MAX_TERMINATED_THREADS=blah

is enough

@biddisco
Copy link
Contributor Author

There were some changes to the STRINGIZE function that caused a conflict when I merged master, so I have rebased onto master and cleaned the problem. Please check that the vars I've used are correct. I might have broken something

@biddisco
Copy link
Contributor Author

I'm not sure if the final commit was lost yesterday, or maybe I pushed it to the wrong repo. Please check the use of BOOST_STRINGIZE that I have changed to HPX_PP_STRINGIZE which I believe is the correct new usage.

@hkaiser
Copy link
Member

hkaiser commented Jul 25, 2017

@biddisco: BOOST_STRINGIZE(x) must be replaced by HPX_PP_STRINGIZE(HPX_PP_EXPAND(x)) - I thought I made this change, did it get lost as well?

@biddisco
Copy link
Contributor Author

Fixed.

@@ -239,6 +239,8 @@ namespace hpx { namespace util
"min_add_new_count = ${HPX_THREAD_QUEUE_MIN_ADD_NEW_COUNT:10}",
"max_add_new_count = ${HPX_THREAD_QUEUE_MAX_ADD_NEW_COUNT:10}",
"max_delete_count = ${HPX_THREAD_QUEUE_MAX_DELETE_COUNT:1000}",
"max_terminated_threads = ${HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS:"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question - where does this var come from "HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS"

@hkaiser
Copy link
Member

hkaiser commented Jul 25, 2017

@biddisco: this is used to check the execution environment. The syntax:

${HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS:1000}

means: 'check the env for HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS and use the corresponding value if found, otherwise use 1000 (see here for more details).

@biddisco
Copy link
Contributor Author

Anyone want to review/merge this. the review button is greyed out for me since I started it.

@hkaiser hkaiser merged commit 62b1224 into master Jul 27, 2017
@hkaiser hkaiser deleted the terminated_threads branch July 27, 2017 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants