Optimize ratio evaluation in 1RDM estimator #1672

jtkrogel · 2019-06-27T20:57:30Z

This PR addresses #975.

I did a series of performance profiling runs by rejigging the NiO performance tests to include the 1RDM. These tests showed a ~4x performance penalty at production system sizes when using 1 non-local sample per atom. Higher sampling densities would increase the cost proportionally (2x more samples leads to a 2x slowdown). Below is a plot showing the BlockCPU time in DMC vs. number of atoms/samples, relative to a run of the same size containing no estimator.

By inserting timers into the 1RDM eval, I was able to see the fraction of time spent on each sub-component. Components include ratio eval, spo eval (labeled "basis" below), and matrix products. As a function of system size, the ratio eval clearly dominates (approaches 100% of the overall time):

$baseline_1rdm_fractions$

With the performance improvements in this PR (due solely to Ye's evaluateRatiosAlltoOne function), the added cost over an estimator-free run drops to about 30%, regardless of system size.

The balance between the evaluation components is now much closer, with matrix products and spo evals becoming competitive with ratio evals as the leading sources of efficiency loss:

$optimized_1rdm_fractions$

In production runs, the spo eval and matrix products are likely to become even more significant, since the performance tests are limited to an artificially small basis comprising just the occupied states. In a production setting, spo evals are likely to grow by 2-3x and matrix products possibly by 4-10x. The performance analysis will be redone in a more production-like setting once support has been added for independent spin up/down bases (which will further strain the spo eval). In any event, the ratio contribution now seems to be handled.

qmc-robot · 2019-06-27T20:59:04Z

Can one of the maintainers verify this patch?

jtkrogel · 2019-06-27T21:01:41Z

Should have mentioned: the new code passes the existing 1RDM VMC tests for diamond.

ye-luo · 2019-06-27T21:08:25Z

Could you include your timers as well?
You can find examples in #1664.

ye-luo · 2019-06-27T21:09:35Z

src/QMCHamiltonians/DensityMatrices1B.cpp

    {
-      PosType& Rp = Pq.R[p];
-      for (int m = 0; m < samples; ++m, ++nm)
+      Matrix_t& P_nm = *Psi_nm[s];


Not necessary to fix now, but keep in mind replacing pointers with containers when you are refactoring old code.

jtkrogel · 2019-06-28T10:28:21Z

Sure, I will add the timers.

ye-luo · 2019-06-28T14:06:22Z

I wrote a wiki page for adding timers https://github.com/QMCPACK/qmcpack/wiki/How-to-add-timers
Please update accordingly.

ye-luo · 2019-06-28T14:07:56Z

Use timer_level_fine only. Timers can nest even within the same level; medium is reserved for sections inside drivers.

prckent · 2019-06-28T14:11:36Z

@ye-luo Please can you move the timers info to the manual and remove it from the wiki? The "how to add tests" section can also be moved/merged. Reason: we are trying to have a single place for development info, and this is currently the manual.

jtkrogel · 2019-06-28T14:16:15Z

Is the use of enum + initializer function strictly necessary, or just use of timer_level_fine? I'm missing the usefulness of the former here.

ye-luo · 2019-06-28T14:19:15Z

@prckent
@markdewing and I are still evolving the wiki page and then will move to the manual.

ye-luo · 2019-06-28T14:22:45Z

@jtkrogel There is no initializer function. It is a definition of static const class member data.
Having an array of timers is better than a set of individual timers. Having enum is better than directly using integer indices.

Please make both the enum and the timer_level_fine changes. Also use a scoped timer in place of the start and stop calls that enclose the whole function body.

jtkrogel · 2019-06-28T14:31:00Z

@ye-luo The setup_timers function is what I meant by initializer function. IMHO, this (enum, name map, timer list) just adds extra code and layers of naming (which reduces readability) over individual timers. From this point of view, I see no advantage, but if it is the required pattern then I will follow it.

Your documentation does not specify the type definition of myTimers. What is it?

markdewing · 2019-06-28T14:48:16Z

For the timers, it feels slightly neater to have a single member variable storing all the timers, rather than a number of member variables. This does come at the cost of a little bit more code and an extra enum.

ye-luo · 2019-06-28T14:52:30Z

@jtkrogel I updated the wiki. Defining TimerNames as a class member is only necessary when multiple constructors exist. In your case, it can be defined locally in the constructor.

It is cleaner to have enum and a vector of timers.

jtkrogel · 2019-06-28T15:58:54Z

@ye-luo Done.

src/QMCHamiltonians/DensityMatrices1B.cpp

src/QMCHamiltonians/DensityMatrices1B.h

ye-luo · 2019-06-28T16:38:36Z

Okay to test

prckent · 2019-06-28T18:56:26Z

Really good to see. Dramatic speedup.

jtkrogel added 2 commits June 27, 2019 08:48

add timers to 1rdm

7fff4b5

optimize ratio eval in 1rdm estimator

e5d6b42

ye-luo reviewed Jun 27, 2019


jtkrogel added 2 commits June 28, 2019 06:37

merge timers

a0238cf

cleanup timers

84087f7

meet timer requirements

9f976ab

ye-luo requested changes Jun 28, 2019


jtkrogel added 2 commits June 28, 2019 12:22

completely remove old timers

8741f81

meet even pickier timer requirements

c713ab8

ye-luo approved these changes Jun 28, 2019


Merge branch 'develop' into ecp_dm_fast

5b266b3

ye-luo merged commit 1055f9a into QMCPACK:develop Jun 28, 2019

jtkrogel added the ECP label Aug 14, 2019

jtkrogel mentioned this pull request Aug 14, 2019

Efficiency of 1RDM needs to be improved #975

Closed

jtkrogel deleted the ecp_dm_fast branch September 16, 2019 12:09
