Optimize ratio evaluation in 1RDM estimator #1672
Can one of the maintainers verify this patch?
Should have mentioned: the new code passes the existing 1RDM VMC tests for diamond.
Could you include your timers as well?
{
PosType& Rp = Pq.R[p];
for (int m = 0; m < samples; ++m, ++nm)
Matrix_t& P_nm = *Psi_nm[s];
Not necessary to fix now, but keep in mind replacing pointers with containers when you refactor old code.
Sure, I will add the timers.
I wrote a wiki page for adding timers: https://github.com/QMCPACK/qmcpack/wiki/How-to-add-timers
Use timer_level_fine only. Timers can nest even within the same level. The medium level is reserved for sections inside drivers.
@ye-luo Could you please move the timers info to the manual and remove it from the wiki? The "how to add tests" section can also be moved/merged. Reason: we are trying to have a single place for development info, and this is currently the manual.
Is the use of enum + initializer function strictly necessary, or just the use of timer_level_fine? I'm missing the usefulness of the former here.
@prckent
@jtkrogel There is no initializer function. It is a definition of static const class member data. Please make both the enum and the timer_level_fine changes. Also use a scoped timer in place of the start and stop calls that enclose the whole function body.
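For reference, the scoped-timer idea can be sketched with plain RAII. This is a minimal illustration using only the standard library; `SimpleTimer`, `ScopedSimpleTimer`, and `evaluate_with_timer` are hypothetical names, not the QMCPACK timer classes or their API.

```cpp
#include <chrono>
#include <string>
#include <utility>

// Hypothetical stand-in for a named timer (not the QMCPACK timer class).
class SimpleTimer
{
public:
  using Clock = std::chrono::steady_clock;

  explicit SimpleTimer(std::string name) : name_(std::move(name)) {}

  void start() { t0_ = Clock::now(); }
  void stop()
  {
    total_ += std::chrono::duration<double>(Clock::now() - t0_).count();
    ++calls_;
  }

  const std::string& name() const { return name_; }
  double total_seconds() const { return total_; }
  long calls() const { return calls_; }

private:
  std::string name_;
  Clock::time_point t0_{};
  double total_ = 0.0;
  long calls_   = 0;
};

// RAII guard: starts the timer on construction and stops it on destruction.
class ScopedSimpleTimer
{
public:
  explicit ScopedSimpleTimer(SimpleTimer& t) : timer_(t) { timer_.start(); }
  ~ScopedSimpleTimer() { timer_.stop(); }
  ScopedSimpleTimer(const ScopedSimpleTimer&)            = delete;
  ScopedSimpleTimer& operator=(const ScopedSimpleTimer&) = delete;

private:
  SimpleTimer& timer_;
};

// Hypothetical function whose whole body is timed by one guard object,
// replacing a start() at the top and a stop() before every return.
double evaluate_with_timer(SimpleTimer& timer, int n)
{
  ScopedSimpleTimer guard(timer);
  double sum = 0.0;
  for (int i = 1; i <= n; ++i)
    sum += 1.0 / i;
  return sum; // the guard's destructor stops the timer here
}
```

The guard also stops the timer on early returns and exceptions, which is the main practical gain over paired start/stop calls.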
@ye-luo The setup_timers function is what I meant by initializer function. IMHO, this (enum, name map, timer list) just adds extra code and layers of naming (which reduces readability) over individual timers. From this point of view, I see no advantage, but if it is the required pattern then I will follow it. Your documentation does not specify the type definition of myTimers. What is it?
For the timers, it feels slightly neater to have a single member variable storing all the timers, rather than a number of member variables. This does come at the cost of a little bit more code and an extra enum.
@jtkrogel I updated the wiki. Defining TimerNames in the class is only necessary when multiple constructors exist. In your case, it can be defined local to the constructor. It is cleaner to have an enum and a vector of timers.
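A rough sketch of the enum plus vector-of-timers pattern under discussion, with the name map local to the constructor. Everything here (`TimerId`, `NamedTimer`, `DensityMatrixSketch`, `record`, `fraction`) is a hypothetical illustration and not the QMCPACK setup_timers interface; the three component names follow the ratio, basis (spo), and matrix-product breakdown used in the PR description.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical timer record: accumulated time and call count per section.
struct NamedTimer
{
  std::string name;
  double total_seconds = 0.0;
  long calls           = 0;
};

// One enum value per timed section; the value doubles as the vector index.
enum TimerId
{
  RATIO_EVAL,
  BASIS_EVAL,
  MATRIX_PRODUCT
};

class DensityMatrixSketch
{
public:
  DensityMatrixSketch()
  {
    // The enum-to-name map lives in the constructor because there is
    // only one constructor in this sketch.
    const std::vector<std::pair<TimerId, std::string>> timer_names = {
        {RATIO_EVAL, "ratio_eval"},
        {BASIS_EVAL, "basis_eval"},
        {MATRIX_PRODUCT, "matrix_product"}};
    timers_.resize(timer_names.size());
    for (const auto& entry : timer_names)
      timers_[entry.first].name = entry.second;
  }

  // Add a measured interval to one section.
  void record(TimerId id, double seconds)
  {
    timers_[id].total_seconds += seconds;
    ++timers_[id].calls;
  }

  // Share of the total recorded time spent in one section.
  double fraction(TimerId id) const
  {
    double total = 0.0;
    for (const auto& t : timers_)
      total += t.total_seconds;
    return total > 0.0 ? timers_[id].total_seconds / total : 0.0;
  }

  const NamedTimer& timer(TimerId id) const { return timers_[id]; }
  std::size_t num_timers() const { return timers_.size(); }

private:
  std::vector<NamedTimer> timers_; // single member holding all timers
};
```

The trade-off matches the comments above: one extra enum and a name list, in exchange for a single member variable and easy iteration over all timers.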
@ye-luo Done.
Okay to test
Really good to see. Dramatic speedup.
This PR addresses #975.
I did a series of performance profiling runs by rejigging the NiO performance tests to include the 1RDM. These tests showed that a ~4x performance penalty is found for production system sizes when using 1 non-local sample per atom. Higher sampling densities would result in proportional cost increases (2x more samples leads to 2x slowdown). Below is a plot showing the BlockCPU time in DMC vs. number of atoms/samples relative to a run of the same size containing no estimator.
By inserting timers into the 1RDM eval, I was able to see the fraction of time spent on each sub-component. Components include ratio eval, spo eval (labeled "basis" below), and matrix products. As a function of system size, the ratio eval clearly dominates (approaches 100% of the overall time):
With the performance improvements in this PR (solely due to Ye's evaluateRatiosAlltoOne function), the added cost over an estimator-free run drops to about 30% regardless of system size:
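To illustrate why an all-to-one ratio evaluation can be cheap, here is a sketch of the underlying linear algebra, assuming a single Slater determinant whose inverse matrix is already available. It is not the code of evaluateRatiosAlltoOne; `ratios_all_to_one`, `Matrix`, and the dense row-major storage are assumptions made for the example. With A(i,j) = phi_j(r_i) and v(j) = phi_j(r'), replacing row i of A by v changes the determinant by the factor sum_j v(j) * Ainv(j,i), so the ratios for moving every particle to the same point r' follow from one orbital evaluation and one vector-matrix product.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// ratios[i] = det(A with row i replaced by v) / det(A)
//           = sum_j v[j] * Ainv[j][i]
// Ainv is the inverse of the Slater matrix A; v holds the orbital values
// at the single target position. Cost is O(N^2) for all N ratios.
std::vector<double> ratios_all_to_one(const Matrix& Ainv, const std::vector<double>& v)
{
  const std::size_t n = v.size();
  std::vector<double> ratios(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < n; ++i)
      ratios[i] += v[j] * Ainv[j][i];
  return ratios;
}
```

For example, with A = [[1, 2], [3, 4]] (inverse [[-2, 1], [1.5, -0.5]]) and v = [1, 1], the function returns {-0.5, 0.5}, which agrees with computing the two replaced-row determinants directly (1 and -1) and dividing by det(A) = -2.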
The balance between different evaluation components is now much closer with matrix products and spo evals becoming competitive with ratio evals as the leading source of efficiency loss:
In production runs, the spo eval and matrix products are likely to become even more significant, since the performance tests are limited to an artificially small basis that comprises just the occupied states. In a production setting, spo evals are likely to grow by 2-3x and matrix products possibly by 4-10x. The performance analysis will be redone in a more production-like setting once support has been added for independent spin up/down bases (which will further strain the spo eval). In any event, the ratio contribution now seems to be handled.