New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spline threading #1082

Merged
merged 20 commits into from Oct 1, 2018

Conversation

Projects
None yet
4 participants
@ye-luo
Copy link
Contributor

ye-luo commented Sep 21, 2018

This PR enables nested threading mostly in the Spline SPO.

  1. Remove unused USE_VECTOR_ML and PRECOMPUTE_L code path.
  2. Enable threading in C2C and C2R adoptors.
  3. checkAffinity binary is added for checking affinity.

No performance impact if nested threading is not turned on.

@wafflebot wafflebot bot added the in progress label Sep 21, 2018

{
typedef std::complex<TT> ComplexT;
// protect last
last = last<0 ? kPoints.size() : (last>kPoints.size() ? kPoints.size() : last);

This comment has been minimized.

@markdewing

markdewing Sep 21, 2018

Contributor

Would it be clearer to avoid the nested ternary operators by replacing the second one with std::min?

last = last < 0 ? kPoints.size() : std::min(last, kPoints.size());

This comment has been minimized.

@prckent

prckent Sep 21, 2018

Contributor

I would prefer this was written explicitly with no ternary operators.

This comment has been minimized.

@ye-luo

ye-luo Sep 21, 2018

Contributor

I tried that. min doesn't support mixing int,size_t

This comment has been minimized.

@markdewing

markdewing Sep 21, 2018

Contributor

Is there some other way to signal that 'last' is unset and should be set to kPoints.size()?

This comment has been minimized.

@ye-luo

ye-luo Sep 21, 2018

Contributor

This is temporal. I have not settled R2R and hybrid. Once these are done. first and last becomes required and there is no need to provide the default value. C++ only allows default argument value as constexpr. I tried kPoints.size() and got compiler error.

<< std::endl;
#pragma omp parallel

This comment has been minimized.

@markdewing

markdewing Sep 21, 2018

Contributor

This output might be better enclosed in a test for nested parallelism (omp_get_nested), and explicitly say something like 'Nested parallelism enabled' (or disabled). And then only print the number of threads in each level if nesting is enabled.

This comment has been minimized.

@ye-luo

ye-luo Sep 21, 2018

Contributor

There are two cases
OMP_NESTED=TRUE
OMP_NUM_THREADS=16,1
and
OMP_NESTED=FALSE
OMP_NUM_THREADS=16
Do you consider the first case nested parallelism enabled?
Another tricky thing is OMP_NESTED will be deprecated in OpenMP 5 as well as omp_get_nested.
Using parallel regions is the only solid way to test threading at every level.

For a more elaborated output, I can do the change.

This comment has been minimized.

@markdewing

markdewing Sep 21, 2018

Contributor

From the standpoint of settings, I would still consider nested parallelism active even if there were only 1 thread in the inner region. My concern is being able to discern the two cases from the output - if the user is wondering why there is only 1 thread in the inner region - is it because OMP_NUM_THREADS was set that way, or because OMP_NESTED was set that way.
I didn't know about the nested functions and variables being deprecated - that does complicate the decision. Do you know what is supposed to replace it ?

This comment has been minimized.

@markdewing

markdewing Sep 21, 2018

Contributor

I found the openmp forum thread where you asked about this topic
http://forum.openmp.org/forum/viewtopic.php?f=23&t=2028

One option would be to test the number of threads at the inner level and if it is 1, print that nesting is not enabled.
Alternately, using omp_get_nested could be used for now, and changed once it's clear where the next version of openmp is heading.

This comment has been minimized.

@ye-luo

ye-luo Sep 21, 2018

Contributor

In 5.0, you will still need OMP_NESTED to control the nested threading. It is deprecated but not removed.
Nothing to replace it because OMP_NUM_THREADS=list OMP_MAX_ACTIVE_LEVELS should be sufficient.
It is not clear to me how to control exactly in the future. From the code, using two parallel region to feel the environment is solid.

This comment has been minimized.

@ye-luo

ye-luo Sep 21, 2018

Contributor

Now print "OMP nested threading disabled or only 1 thread on the 2nd level" if the second level size is one.

@prckent
Copy link
Contributor

prckent left a comment

Standard output should not change unless nested threading is utilized.

@prckent

This comment has been minimized.

Copy link
Contributor

prckent commented Sep 24, 2018

If this transitions from experimental/development work to something more concrete it will need docs, tests. For the time being, the standard output should not change visibly to users unless activated.

@ye-luo

This comment has been minimized.

Copy link
Contributor

ye-luo commented Sep 27, 2018

I agree with docs and tests but I insist changing

  OMP_NUM_THREADS      = 16

to

  OMP 1st level threads = 16
  Nested OpenMP threading disabled

The old output is actually misleading because it looks like printing environment variable but actually it is not. It is printing the number of available threads from the OpenMP runtime.

@PDoakORNL
Copy link
Contributor

PDoakORNL left a comment

I wouldn't insist on my changes but consider them.

I think this would be fine to go in now as an iteration of threaded spline support. Its helpful to look at how spline threading, batching, and perhaps small problems where we don't need this sort of threading can be accommodated. If there is even a performance hit on CPU evaluation of smaller problems.

Show resolved Hide resolved src/Utilities/UtilityFunctions.h
Show resolved Hide resolved src/QMCApp/QMCMain.cpp
@prckent

This comment has been minimized.

Copy link
Contributor

prckent commented Sep 28, 2018

General comment motivated by a discussion with @jtkrogel - I think we need to watch out more carefully than we have been for performance regressions on small problems.

@ye-luo

This comment has been minimized.

Copy link
Contributor

ye-luo commented Sep 29, 2018

In general, small problem should NOT use nested threading.
There should be no performance regression if nested threading is not turned on.
My scheme only computes a few more integers in the inter level.

@prckent

prckent approved these changes Oct 1, 2018

@prckent prckent merged commit f55c2d5 into QMCPACK:develop Oct 1, 2018

2 checks passed

qmcpack rhea Build finished.
Details
qmcpack rhea (gpu) Build finished.
Details

@wafflebot wafflebot bot removed the in progress label Oct 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment