Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ratio evaluation in 1RDM estimator #1672

Merged
merged 8 commits into from
Jun 28, 2019

Conversation

jtkrogel
Copy link
Contributor

This PR addresses #975.

I did a series of performance profiling runs by rejigging the NiO performance tests to include the 1RDM. These tests showed that a ~4x performance penalty is found for production system sizes when using 1 non-local sample per atom. Higher sampling densities would result in proportional cost increases (2x more samples leads to 2x slowdown). Below is a plot showing the BlockCPU time in DMC vs. number of atoms/samples relative to a run of the same size containing no estimator.

baseline_reltime_vs_atoms_samples

By inserting timers into the 1RDM eval, I was able to see the fraction of time spent on each sub-component. Components include ratio eval, spo eval (labeled "basis" below), and matrix products. As a function of system size, the ratio eval clearly dominates (approaches 100% of the overall time):

baseline_1rdm_fractions

With the performance improvements in this PR (solely due to Ye's evaluateRatiosAlltoOne function), the added cost over and estimator-free run drops to about 30% regardless of system size:

optimized_reltime_vs_atoms_samples

The balance between different evaluation components is now much closer with matrix products and spo evals becoming competitive with ratio evals as the leading source of efficiency loss:

optimized_1rdm_fractions

In production runs, the spo eval and matrix products are likely to become even more significant since the performance tests are limited to an artificially small basis that comprises just the occupied states. In a production setting, spo evals are likely to grow by 2-3x and matrix products possibly by 4-10x. The performance analysis will be redone in a more production setting once support has been added for independent spin up/down bases (which will further strain the spo eval). In any event, the ratio contribution seems to now be handled.

@qmc-robot
Copy link

Can one of the maintainers verify this patch?

@jtkrogel
Copy link
Contributor Author

Should have mentioned: the new code passes the existing 1RDM VMC tests for diamond.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 27, 2019

Could you include your timers as well?
You can find examples in #1664.

{
PosType& Rp = Pq.R[p];
for (int m = 0; m < samples; ++m, ++nm)
Matrix_t& P_nm = *Psi_nm[s];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary to fix now but keep in mind of replacing pointers with containers when you are refactoring old codes.

@jtkrogel
Copy link
Contributor Author

Sure, I will add the timers.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

I wrote a wiki page for adding timers https://github.com/QMCPACK/qmcpack/wiki/How-to-add-timers
Please update accordingly.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

use timer_level_fine only. Timers can nest even within the same level. medium is reserved for sections inside drivers.

@prckent
Copy link
Contributor

prckent commented Jun 28, 2019

@ye-luo Please can your move the timers info to the manual and remove from the wiki. The how to add tests section can also be moved/merged. Reason: Trying to have a single place for development info, and this is currently the manual.

@jtkrogel
Copy link
Contributor Author

Is the use of enum + initializer function strictly necessary, or just use of timer_level_fine? I'm missing the usefulness of the former here.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

@prckent
@markdewing and I are still evolving the wiki page and then will move to the manual.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

@jtkrogel There is no initializer function. It is a definition of static const class member data.
Having an array of timers is better than a set of individual timers. Having enum is better than directly using integer indices.

Please do both the enum and timer_level_fine change. Also use scoped timer to replace start and stop embracing the whole function body.

@jtkrogel
Copy link
Contributor Author

@ye-luo The setup_timers function is what I meant by initializer function. IMHO, this (enum, name map, timer list) just adds extra code and layers of naming (which reduces readability) over individual timers. From this point of view, I see no advantage, but if it is the required pattern then I will follow it.

Your documentation does not specify the type definition of myTimers. What is it?

@markdewing
Copy link
Contributor

For the timers, it feels slightly neater to have a single member variable storing all the timers, rather than a number of member variables. This does come at the cost of a little bit more code and an extra enum.

@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

@jtkrogel I updated the wiki. Defining TimerNames to the class is only necessary when multiple constructors exist. In your case, it can be defined local to the constructor.

It is cleaner to have enum and a vector of timers.

@jtkrogel
Copy link
Contributor Author

@ye-luo Done.

src/QMCHamiltonians/DensityMatrices1B.cpp Outdated Show resolved Hide resolved
src/QMCHamiltonians/DensityMatrices1B.cpp Outdated Show resolved Hide resolved
src/QMCHamiltonians/DensityMatrices1B.cpp Outdated Show resolved Hide resolved
src/QMCHamiltonians/DensityMatrices1B.cpp Outdated Show resolved Hide resolved
src/QMCHamiltonians/DensityMatrices1B.cpp Outdated Show resolved Hide resolved
src/QMCHamiltonians/DensityMatrices1B.h Outdated Show resolved Hide resolved
@ye-luo
Copy link
Contributor

ye-luo commented Jun 28, 2019

Okay to test

@prckent
Copy link
Contributor

prckent commented Jun 28, 2019

Really good to see. Dramatic speedup.

@ye-luo ye-luo merged commit 1055f9a into QMCPACK:develop Jun 28, 2019
@jtkrogel jtkrogel added the ECP label Aug 14, 2019
@jtkrogel jtkrogel deleted the ecp_dm_fast branch September 16, 2019 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants