
Parameter Filtration in LM Optimizer #4126

Merged 36 commits into QMCPACK:develop on Aug 25, 2022

Conversation

@LeonOtis (Contributor)

Proposed changes

This is the remainder of the changes originally proposed in #4061. This PR allows the wave function parameters optimized by the linear method (LM) to be filtered based on the amount of noise in the parameter gradients. The LM then optimizes only a subset of the parameters, while the others can be handled by accelerated descent in a hybrid optimization.

The main changes are in the LM engine and the QMCFixedSampleLinearOptimizeBatched driver. There are a few additional user inputs for turning on parameter filtration, which are currently being handled similarly to the many other driver inputs. The parameter filtration should now be compatible with both MPI and OMP parallelism. Thanks again for all the feedback so far and I am always happy to make further improvements.
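[Editor's note] To make the filtering idea concrete, here is a minimal sketch of the kind of noise-based selection the PR describes: keep a parameter for the LM only if its gradient estimate is statistically distinguishable from zero. All names and the exact criterion are hypothetical illustrations, not QMCPACK's actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of noise-based parameter filtering: a parameter is
// kept for the linear method only when the magnitude of its mean gradient
// exceeds `significance` standard errors of that gradient estimate.
std::vector<bool> selectParameters(const std::vector<double>& grad_mean,
                                   const std::vector<double>& grad_err,
                                   double significance)
{
  std::vector<bool> keep(grad_mean.size(), false);
  for (std::size_t i = 0; i < grad_mean.size(); ++i)
    // Noisy gradients (|mean| comparable to the error bar) are filtered out.
    keep[i] = std::abs(grad_mean[i]) > significance * grad_err[i];
  return keep;
}
```

Parameters filtered out here would then be left to the descent section of a hybrid optimization, as described above.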

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Lawrencium computing cluster

Checklist

  • Yes. This PR is up to date with the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes. Documentation has been added (if appropriate)

@markdewing (Contributor)

Test this please

@PDoakORNL (Contributor) left a comment

Unless I'm reading this very wrong, there are serious fundamental problems with the concurrency logic of the batched linear optimization driver and this PR. Please mark this as [WIP] until the concurrency issues are fixed.

One of the maintainers should help you work through this in detail.

Have you run this with opt_num_crowds > 1? I would expect something interesting to happen.

Review comments left on:

  • src/QMCDrivers/WFOpt/EngineHandle.h
  • src/QMCDrivers/WFOpt/QMCCostFunctionBatched.cpp
  • src/formic/utils/lmyengine/engine.cpp
  • src/QMCDrivers/tests/test_EngineHandle.cpp
@LeonOtis (Contributor, Author)

LeonOtis commented Aug 1, 2022

Thanks for the feedback. The new set of changes has the history arrays in the LM engine sized according to the number of samples on an MPI rank, and different threads store data in them using index offsets. Tests I've done with multiple threads have worked, but there could be cases I've overlooked. I'm happy to do further rounds of fixes as needed.
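[Editor's note] The offset scheme described here can be sketched as follows: one rank-local array sized for all samples on the MPI rank, with each concurrent worker writing only into its own contiguous slice via a precomputed offset, so writers never overlap. Names are hypothetical, not the actual engine code.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of offset-indexed storage: the history array covers
// every sample on this MPI rank, and each crowd/thread owns a disjoint
// slice starting at its offset, so concurrent stores cannot collide.
struct RankLocalHistory
{
  std::vector<double> values; // one slot per sample on this rank

  explicit RankLocalHistory(int rank_local_num_samples)
      : values(rank_local_num_samples) {}

  // crowd_offset: first slot owned by this worker (from a fair division of
  // samples); local_index: sample index within that worker's slice.
  void store(int crowd_offset, int local_index, double v)
  {
    values[crowd_offset + local_index] = v;
  }
};
```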

@markdewing (Contributor)

markdewing commented Aug 2, 2022

Do you have an example input file for this feature? Ideally there should be a test added that also serves as an example. However, parameter filtering gets used when scaling up the number of parameters that can be handled, and a test small enough to be a good test might not be a good example. It also sounds like parameter filtering gets used in combination with hybrid optimization. It would be good to have a more complete example to see how the feature works in use. Attaching files to the PR is good enough for now.

@LeonOtis (Contributor, Author)

LeonOtis commented Aug 2, 2022

example_input.zip

Sure, here are input files for a quick example with the carbon trimer. The actual optimization results won't be very meaningful with low sample numbers and filtering an already small number of parameters, but the example_filter_hybrid.xml file shows the new inputs in the context of a hybrid optimization. Parameter filtration can be used for the LM on its own just by using the LM section of the hybrid input as a standalone, but I would generally expect the results to be inferior to allowing descent to optimize any parameters the LM leaves alone. I think this PR should already have some changes to the documentation to explain the new inputs, but I can add more if needed.

@LeonOtis (Contributor, Author)

LeonOtis commented Aug 8, 2022

Added a small fix to the legacy driver to correct a problem @ye-luo noticed with PR #4075. At this point, what else is needed before this PR can be merged? I'm happy to revise the handling of threading further if desired. I'm also willing to come up with tests if needed, though I wanted to be sure the threading issues were settled before trying to design any.

@ye-luo (Contributor)

ye-luo commented Aug 8, 2022

@LeonOtis could you make a new branch from develop, cherry-pick the last fix commit you pushed, and then make a separate PR?

@LeonOtis (Contributor, Author)

LeonOtis commented Aug 9, 2022

@ye-luo OK, I've opened PR #4161 for that small change.

@LeonOtis (Contributor, Author)

@ye-luo The hybrid method should work with the batched driver. I've now added a test for it on H4, similar to what I previously added for the legacy driver.

@ye-luo (Contributor)

ye-luo commented Aug 12, 2022

> @ye-luo The hybrid method should work with the batched driver. I've now added a test for it on H4, similar to what I previously added for the legacy driver.

Thank you. Could you make a separate PR with the new commits to develop, since it already works with develop?

@LeonOtis (Contributor, Author)

> @ye-luo The hybrid method should work with the batched driver. I've now added a test for it on H4, similar to what I previously added for the legacy driver.
>
> Thank you. Could you make a separate PR with the new commits to develop, since it already works with develop?

OK, I've opened a new PR #4172 that just adds the test.

@ye-luo (Contributor) left a comment

@LeonOtis I did some cleaning; please pull before continuing to edit. Once my requests are addressed, I think this PR can get in. I'll turn a blind eye to run_three_shift and everything inside formic.

Review comments left on:

  • src/QMCDrivers/WFOpt/QMCFixedSampleLinearOptimizeBatched.h
  • src/QMCDrivers/WFOpt/QMCCostFunctionBatched.cpp
ye-luo previously approved these changes Aug 19, 2022
@ye-luo (Contributor) left a comment

There is still an issue of explicitly using threads. That is not a good pattern. However, I'd like that addressed in a later PR.

On src/QMCDrivers/WFOpt/EngineHandle.h:
le_der_samp[j + 1] = static_cast<FullPrecValue>(dhpsioverpsi_array.getValue(j, local_index)) +
le_der_samp[0] * static_cast<FullPrecValue>(dlogpsi_array.getValue(j, local_index));
}
int ip = omp_get_thread_num();

Would like to move away from thread_num.

@ye-luo (Contributor)

ye-luo commented Aug 19, 2022

Ping @PDoakORNL @markdewing, since you gave review comments as well.

@PDoakORNL (Contributor) left a comment

I think some of the thread-safety issues I was previously worried about have been cleaned up, but the subtypes of EngineHandle seem to be trouble, and perhaps only one is really relevant to this PR.

DescentEngineHandle/DescentEngine seems incoherent. If this is just here to get things compiling, keep the unfinished code in your private branch and remove it from the PR, or throw a runtime exception if anyone tries to use it.

That would get us to smaller issues like the anonymous struct.

@@ -280,6 +278,7 @@ void QMCCostFunctionBatched::checkConfigurations(EngineHandle& handle)
std::vector<int> samples_per_crowd_offsets(opt_num_crowds + 1);
FairDivide(rank_local_num_samples_, opt_num_crowds, samples_per_crowd_offsets);

handle.prepareSampling(NumOptimizables, rank_local_num_samples_);

From what I can see, this method should not take the EngineHandle reference handle as an argument, because most of the classes using that base class are incompatible with the batched concurrency model and don't in fact have the same semantics for the prepareSampling and takeSample methods.


I'm not getting this part: "don't in fact have the same semantics for the prepareSampling and takeSample methods."


It doesn't make any sense if the engine handle is of subtype NullEngineHandle or DescentEngineHandle, in which case omp_max_num_threads is still used to prepareStorage and the num_samples parameter to prepareStorage is simply ignored. And in the case of the DescentEngineHandle, it ignores the base_sample_index input. I would call this having different semantics.

@@ -92,20 +93,24 @@ class DescentEngineHandle : public EngineHandle
void takeSample(const std::vector<FullPrecReal>& energy_list,
const RecordArray<Value>& dlogpsi_array,
const RecordArray<Value>& dhpsioverpsi_array,
int ib) override
int base_sample_index) override

So this isn't used?

OptimizerType current_optimizer_type_;
//String inputs are listed outside the struct
///name of the current optimization method, updated by processOptXML before run
std::string MinMethod;

The naming of all these private variables does not follow the coding standards of QMCPACK.

On src/QMCDrivers/WFOpt/QMCFixedSampleLinearOptimizeBatched.h:
le_der_samp[0] * static_cast<FullPrecValue>(dlogpsi_array.getValue(j, local_index));
}
int ip = omp_get_thread_num();
engine_.takeSample(ip, der_rat_samp, le_der_samp, le_der_samp, 1.0, 1.0);

How this relates to all the replica-based accesses in the DescentEngine itself is really cryptic. This is the OpenMP thread id, which isn't guaranteed to be any particular number, especially in the batched context, yet it is used as the replica id in the engine, which seems very important.
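[Editor's note] One way to address this concern, sketched with hypothetical names, is to thread an explicit crowd/replica index from the driver's crowd loop through the call chain instead of querying omp_get_thread_num(), so each crowd deterministically owns one accumulation buffer regardless of how the OpenMP runtime assigns threads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: the replica id is the crowd index supplied by the
// driver, not the OpenMP thread number, so ownership of each accumulator
// is deterministic even if thread-to-crowd assignment varies.
class EngineSketch
{
public:
  explicit EngineSketch(int num_crowds) : sums_(num_crowds, 0.0) {}

  // crowd_id comes from the driver's crowd loop, not the OpenMP runtime.
  void takeSample(int crowd_id, double value) { sums_[crowd_id] += value; }

  double total() const
  {
    double t = 0.0;
    for (double s : sums_)
      t += s;
    return t;
  }

private:
  std::vector<double> sums_; // one accumulator per crowd, never shared
};
```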

On src/QMCDrivers/WFOpt/EngineHandle.h:
@PDoakORNL (Contributor) left a comment

OK, another iteration to go here. Some of these broken Engines/EngineHandles and elements of formic could just be documented, including TODOs, but by the end of this series of PRs the illogical and/or wrong code needs to be gone, especially since sizable amounts of it appear to be new.


Review comments left on:

  • src/QMCDrivers/WFOpt/QMCFixedSampleLinearOptimizeBatched.h
  • src/QMCDrivers/tests/test_EngineHandle.cpp
  • src/formic/utils/lmyengine/engine.cpp
void resetParamNumber(int new_num)
{
_num_params = new_num;
int NumThreads = omp_get_max_threads();

There is an abstraction to get the number of threads (see src/Concurrency/Info.hpp), but the optimization has no guarantee it will run on that many threads, so this just isn't an effective way to allocate concurrent workspace. It's possible this will work for a while, but it is at least a memory-use bomb.
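[Editor's note] The concern can be illustrated with a hedged sketch (hypothetical names): scratch storage sized by omp_get_max_threads() allocates for threads that may never participate, whereas sizing it by a worker count the driver explicitly passes keeps memory proportional to the concurrency actually requested.

```cpp
#include <cassert>
#include <vector>

// Sketch of driver-controlled workspace sizing: num_workers is supplied
// explicitly (e.g. the number of crowds) rather than queried from the
// OpenMP runtime, so memory use matches the requested concurrency.
struct Workspace
{
  std::vector<std::vector<double>> per_worker;

  Workspace(int num_workers, int num_params)
      : per_worker(num_workers, std::vector<double>(num_params, 0.0)) {}
};
```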


#4209 tracks the issue

@ye-luo (Contributor)

ye-luo commented Aug 25, 2022

Test this please

@ye-luo ye-luo merged commit 4b57d30 into QMCPACK:develop Aug 25, 2022