Delayed update on CPU #1170

ye-luo · 2018-11-25T21:29:27Z

Closes #1152
This feature can be enabled with all the QMC methods implemented in QMCPACK
via <slaterdeterminant delay_rank="32">

Manual, unit test and integration tests are added. The manual needs further update to add references.

Performance tests are set with delay_rank="32". Currently the input files are generated on the fly.
If they needed to be stored in the repo, I will add them after the review in order to reduce the amount of files.

Fix VMC and DMC TODO fix RMC and CorrelatedSampling.

Options: -d -n num_els -k delay -i iteration

prckent

Thank you.

Some quick and incomplete feedback:

Structs with functionality should be classes
Code disabled behind preprocessor macros should be either unified behind an DELAYED_DEBUG setting or removed. ( We should have a note in the coding standards about this. )
~~Tests should include delays that are not a factor of the electron count.~~ EDIT: I see this is done. Perhaps a delay of 3 so that there are 2 updates and one partial in the diamond 2x2x1 case?
The documentation should reference the delayed update and Fahy papers, where mentioned. There is also a statement that the delay should be a multiple of the SIMD length; I think this is only a performance recommendation.
I am not keen on increasing the number of codes in Sandbox. These are undocumented and a bit of a wasteland. We might discuss a better location. Is this a utility that could be run to assess the delay?

ye-luo · 2018-11-26T14:40:51Z

Will do. I will be more strict on public/private.
Could you comment in-source? I'm confused about which macro you refer to.
The diamond test has 8 electrons in each spin and the delay was made 5.
Will do.
Yes. It can be used to access the delay. However, it is not recommended for users because it cannot accurately reflect the code performance due to the lack of SPO evaluation. When it is moved to miniQMC, it will be a useful tool. Its setting is consistent with the other code in the Sandbox. I prefer leaving it in Sandbox.

markdewing · 2018-11-26T17:16:54Z

src/QMCWaveFunctions/Fermion/SlaterDetBuilder.cpp

-  adet->set(firstIndex,lastIndex-firstIndex);
+  if( delay_rank<=0 || delay_rank>lastIndex-firstIndex )
+  {
+    APP_ABORT("SlaterDetBuilder::putDeterminant delay_rank must be positive and no larger than the electron count within a determinant!\n");


The error message should print the values of delay_rank and lastIndex-firstIndex.

PDoakORNL · 2018-11-27T15:26:32Z

I do not want approve this until your delayed update work is hidden by an abstraction that preserves the previous functionality. Am I missing that, at first read I just see Dirac determinant that now works differently it doesn't look like I can retrieve row update behavior. I can not A/B the two different methods for the same commit. A feature completely replacing a previous feature should demonstrate identical or better test compliance on the same commit. The old feature can then be removed if necessary.

Work like this should be behind a feature switch preserving exisiting implementation and merged several times a week, this will allow other work to be done and force the new feature to sync with other work done in the codebase instead of motivating developers to block other work.

ye-luo · 2018-11-27T15:33:12Z

The previous rank-1 is within this implementation. it is IDENTICAL to the old code if rank_delay=1. There is no need to have two implementation if one can do both work. All the existing unit tests are still valid and rank_delay>1 unit test is also added.

prckent · 2018-11-27T16:38:16Z

Comments focus on making DelayedUpdate more accessible to our future selves:

Doxygen comments needed for all functions in DelayedUpdate.h
Describe all variables in the state machine in DelayedUpdate.h
Describe how the state machine works in DelayedUpdate.h (in the description of one of the functions would be fine)
Is there a tested reason for the large functions with inline hints in DelayedUpdate.h? The moment these are non-trivial in size or have function calls such as BLAS, they should probably be in a .cpp. We do this a lot across the application.
Add comments about the logic in DelayedUpdate.h. e.g. What does delay_count=0 mean/do?
Please can you comment the unit test. How are delays checked? (The obvious thing to do here is update one set of matrices rank 1 and another with a delay, and check the ratios, gradients etc.)
set should be called something more descriptive. e.g. resetFirstParticleAndDelay
This looks to be direct translation of Tyler's algorithm+implementation (great). Any sneaky extra tricks for efficiency not in that paper should be highlighted.

More general Q:

Do we ever resize a determinant in the entire application? (Not relevant for this PR, but it is extra complexity that might not be used.)

ye-luo · 2018-11-27T20:27:02Z

Hopefully I put sufficient documentation.

inline is only a hint to the compiler, it is the compiler's decision to inline a function or not. inline a function with some BLAS call in it doesn't mean inline the BLAS functions but only put the line of BLAS call in the caller routine.
DelayedUpdate is a template function. Putting some of it in .cpp is messy. All the functions here are quite light weight.
What is the set() function you refer to? The one in the DiracDeterminant class? That is a virtual function and I'm expecting to change the class spaghetti in a separate PR.
The math may look familiar to you but this is NOT Tyler's algorithm+implementation at all. I started with the Fahy's variant and solve all the drawback in Tyler's algorithm+implementation. There is no look-ahead needed which is impractical to use in DMC. There is no factorization of any matrix to compute the determinant. There is no fake acceptance needed. My algorithm has zero lines of code dealing with rejecting move. You will see more detail in the upcoming paper. This is first time having the full algorithm working for VMC/DMC/RMC. Although I also start originally from SMW identity like Tyler did, 'direct translation' is a wrong quote.

prckent · 2018-11-27T21:12:51Z

Thanks Ye. Yes set() is part of a larger mess.

I think short-diamondC_2x1x1_pp-delayed_update-vmc_sdj-1-16* needs test labels added. They report as "quality unknown". Happily this version already runs a smidge faster than the non-delayed version, at least on oxygen.

prckent · 2018-11-27T21:14:02Z

What are the rules around calling completeUpdates ?

ye-luo · 2018-11-27T21:47:17Z

We should not reference performance on small tests unless it is 2x or more. Once the PR is merged, the performance tests will reflect the performance gain.
tags are not controlled by me and I think we will visit later when categorizing more tests.
I added comment in WaveFunctionComponent.h about completeUpdates. It must be called after each substep or step during pbyp move. For each step, we will evaluate hamiltonian and the inverse matrix must be updated. The algorithm allows moving electron in any order but doesn't allow delay the same electron twice. For this reason, it must be called also after each substep.
To the previous question: we never resize a determinant in the entire application.

ye-luo · 2018-11-28T15:00:20Z

Can we merge this? The change in this PR is quite local and clean.

prckent · 2018-11-28T15:19:06Z

Will discuss with @PDoakORNL this afternoon. Comment to follow.

ye-luo and others added 30 commits October 30, 2017 00:08

Implement delayed update.

3a763cc

Use FP for matrix inversion + a bit cleaning.

dc376ea

Engage timer in restore for a reject move.

2b435f3

Merge branch 'remove-ParticleBase' into delayed-update

1912b88

Merge branch 'remove-ParticleBase' into delayed-update

573d513

add delay switch in acceptMove for Tmove.

8eaa60f

Add completeUpdates to deplete delayed updates.

2c52eb3

Fix VMC and DMC TODO fix RMC and CorrelatedSampling.

Merge branch 'remove-ParticleBase' into delayed-update

ebc4df7

Merge branch 'rebuild-master' into delayed-update

896dad4

Complete adding completeUpdates.

b5b9da4

Old way is better for recomputing.

ce052c0

Merge branch 'rebuild-master' into delayed-update

93ca05f

Merge branch 'rebuild-master' into delayed-update

13ee23d

Timer not needed for reject.

fb2265e

Need to call completeUpdates at every substep.

fcdf409

Merge branch 'rebuild-master' into delayed-update

dc8e596

Merge branch 'rebuild-master' into delayed-update

44fb110

Change acceptMove calls and add PS's name.

a4288bd

Revert adding bool to acceptMove.

c388e6e

Minor changes.

79aed0b

curRatio cannot be used affter acceptMove.

78bbb9f

Make one electron update brief.

605f4c4

Add a unit test for delayed update.

a5869b5

Fix cplx MP build.

af40516

Add an miniapp to test delayed update using DiracMatrix.

1ee1047

Options: -d -n num_els -k delay -i iteration

Correct typo.

1e99df2

Add the last accept call.

b244754

Printout more for post processing.

1aa3dba

Add timer on the last accept call.

e4c5666

Use psiM0 for debugging.

3233af2

ye-luo added 2 commits November 25, 2018 14:27

Merge remote-tracking branch 'github/develop' into delayed-update-merge

e318bc4

Remove unused argument in completeUpdates.

a1f1760

ghost assigned ye-luo Nov 25, 2018

ghost added the in progress label Nov 25, 2018

prckent requested changes Nov 26, 2018

View reviewed changes

ye-luo requested review from atillack and PDoakORNL November 26, 2018 14:49

ye-luo added 2 commits November 26, 2018 09:48

Change struct to class.

19289dd

Add references and tweak words in manual.

2415dac

markdewing reviewed Nov 26, 2018

View reviewed changes

More elaborate error message.

2ecd856

prckent mentioned this pull request Nov 27, 2018

Remove and cleanup OMP naming #1178

Merged

ye-luo added 2 commits November 27, 2018 14:06

Update documentation in DelayedUpdate and its test.

7dc2cc6

Merge remote-tracking branch 'github/develop' into delayed-update-merge

7b8c6cc

ye-luo added 2 commits November 27, 2018 15:38

Update comment.

3b5b236

Merge remote-tracking branch 'github/develop' into delayed-update-merge

95ec0ee

prckent approved these changes Nov 28, 2018

View reviewed changes

PDoakORNL merged commit 90b9f83 into QMCPACK:develop Nov 28, 2018

ghost removed the in progress label Nov 28, 2018

ye-luo deleted the delayed-update-merge branch November 29, 2018 16:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delayed update on CPU #1170

Delayed update on CPU #1170

ye-luo commented Nov 25, 2018 •

edited

Loading

prckent left a comment •

edited

Loading

ye-luo commented Nov 26, 2018 •

edited

Loading

markdewing Nov 26, 2018

PDoakORNL commented Nov 27, 2018 •

edited

Loading

ye-luo commented Nov 27, 2018 •

edited

Loading

prckent commented Nov 27, 2018

ye-luo commented Nov 27, 2018 •

edited

Loading

prckent commented Nov 27, 2018 •

edited

Loading

prckent commented Nov 27, 2018

ye-luo commented Nov 27, 2018 •

edited

Loading

ye-luo commented Nov 28, 2018

prckent commented Nov 28, 2018 •

edited

Loading

Delayed update on CPU #1170

Delayed update on CPU #1170

Conversation

ye-luo commented Nov 25, 2018 • edited Loading

prckent left a comment • edited Loading

Choose a reason for hiding this comment

ye-luo commented Nov 26, 2018 • edited Loading

markdewing Nov 26, 2018

Choose a reason for hiding this comment

PDoakORNL commented Nov 27, 2018 • edited Loading

ye-luo commented Nov 27, 2018 • edited Loading

prckent commented Nov 27, 2018

ye-luo commented Nov 27, 2018 • edited Loading

prckent commented Nov 27, 2018 • edited Loading

prckent commented Nov 27, 2018

ye-luo commented Nov 27, 2018 • edited Loading

ye-luo commented Nov 28, 2018

prckent commented Nov 28, 2018 • edited Loading

ye-luo commented Nov 25, 2018 •

edited

Loading

prckent left a comment •

edited

Loading

ye-luo commented Nov 26, 2018 •

edited

Loading

PDoakORNL commented Nov 27, 2018 •

edited

Loading

ye-luo commented Nov 27, 2018 •

edited

Loading

ye-luo commented Nov 27, 2018 •

edited

Loading

prckent commented Nov 27, 2018 •

edited

Loading

ye-luo commented Nov 27, 2018 •

edited

Loading

prckent commented Nov 28, 2018 •

edited

Loading