Initial LCAO vgl batched implementation using GEMM #4407

amandadumi · 2023-01-20T21:10:49Z

Proposed changes

This pull request introduces progress towards a batched implementation for the LCAO code. A CPU-only GEMM is introduced for the AO->MO transformation for phi, dphi and d2phi. The API is left unchanged and instead a conversion between old data types and the OffloadVGLArray is done to allow the SPOSet base class to maintain compatibility with the spline code. A separate PR could work towards unifying the API for LCAO and splines.
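As a rough illustration of the AO->MO step described above, the sketch below applies one matrix product to the stacked value, gradient and Laplacian rows for all walkers. It is not the code from this PR: the helper names (gemm_nt, ao_to_mo_vgl), the row-major layout and the row ordering are assumptions made for the example, and a real implementation would call an optimized BLAS GEMM rather than the plain loop used here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not QMCPACK code): row-major out(M x N) = A(M x K) * B(N x K)^T.
// An optimized BLAS GEMM would replace this triple loop in practice.
std::vector<double> gemm_nt(const std::vector<double>& A,
                            const std::vector<double>& B,
                            std::size_t M,
                            std::size_t N,
                            std::size_t K)
{
  assert(A.size() == M * K && B.size() == N * K);
  std::vector<double> out(M * N, 0.0);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
    {
      double sum = 0.0;
      for (std::size_t k = 0; k < K; ++k)
        sum += A[i * K + k] * B[j * K + k];
      out[i * N + j] = sum;
    }
  return out;
}

// Assumed layout: ao_vgl has 5 * num_walkers rows (value, three gradient
// components, Laplacian) and num_ao columns; mo_coeff is num_mo x num_ao.
// The result has 5 * num_walkers rows and num_mo columns.
std::vector<double> ao_to_mo_vgl(const std::vector<double>& ao_vgl,
                                 const std::vector<double>& mo_coeff,
                                 std::size_t num_walkers,
                                 std::size_t num_ao,
                                 std::size_t num_mo)
{
  return gemm_nt(ao_vgl, mo_coeff, 5 * num_walkers, num_mo, num_ao);
}
```

Because the transformation is linear, phi, dphi and d2phi can share a single product; stacking the walkers along the row dimension is what makes the call batched.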

What type(s) of changes does this code introduce?

New feature
Testing changes (e.g. new unit/integration/performance tests)

Does this introduce a breaking change?

No

What systems has this change been tested on?

He and EtOH, which are incorporated as unit tests.

Checklist

Yes. This PR is up to date with the current state of 'develop'
Yes. Code added or changed in the PR has been clang-formatted
Yes. This PR adds tests to cover any new code, or to catch a bug that is being fixed
Yes. Documentation has been added (if appropriate)

Lcao vgl offload merge

ye-luo · 2023-01-22T03:03:10Z

Test this please

ye-luo

Please take a look at my changes and let me know if you have questions.

ye-luo · 2023-01-22T03:05:25Z

src/QMCWaveFunctions/tests/test_MO.cpp

+  //elec.update();
+  elec.R[0][0] = 0.0001;
+  elec.R[0][1] = 0.0;
+  elec.R[0][2] = 0.0;


elec.R[0] = {0.0001, 0.0, 0.0};

I have a better idea of how to do this throughout the code base.

ye-luo · 2023-01-22T03:06:37Z

src/QMCWaveFunctions/tests/test_MO.cpp

+  // auto elec2 = elec.makeClone();
+  sposet->evaluateVGL(elec, 0, psiref_0, dpsiref_0, d2psiref_0);
+
+  REQUIRE(std::real(psiref_0[0]) == Approx(-0.001664403313));


Please change all the value checks from REQUIRE to CHECK.

I have a better idea of how to do this throughout the code base.

ye-luo · 2023-01-22T04:28:31Z

Test this please

ye-luo

LGTM. I changed a lot in the code. So need non-ANL approval.

amandadumi · 2023-01-24T23:41:35Z

Hello Ye, thank you for the review and changes. After taking a look, your changes make sense in terms of removing some of the intermediate objects and more readable/compact initialization of other objects.

PDoakORNL

There is a design issue: OffloadMWVGLArrays need to be multiwalker resources owned at around the DiracDeterminantBatched level, but they are being defined in method scope on every call.
See: phi_v_g_l in LCAOrbitalSet::mw_evaluateVGL and basis_mw in LCAOrbitalSet::mw_evaluateVGLImplGEMM
There are also a number of small things here that make this hard to read and add to tech debt.

Requiring manipulation of an XMLTree to write a test is awful but beyond the scope of this PR.
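To make the ownership concern in this review concrete, here is a minimal sketch of a scratch buffer that outlives individual calls, so repeated evaluations reuse storage instead of allocating it in method scope. MWScratch and its members are hypothetical names for illustration only; they are not QMCPACK types, and the 5 x walkers x orbitals sizing is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a multi-walker resource; not a QMCPACK type.
struct MWScratch
{
  std::vector<double> vgl;         // reused storage for 5 x walkers x orbitals values
  std::size_t num_allocations = 0; // counts how often the buffer had to grow

  // Returns storage large enough for the request, growing only when the
  // current buffer is too small.
  double* acquire(std::size_t num_walkers, std::size_t num_orbitals)
  {
    const std::size_t needed = 5 * num_walkers * num_orbitals;
    if (needed > vgl.size())
    {
      vgl.resize(needed);
      ++num_allocations;
    }
    return vgl.data();
  }
};
```

An object like this would be owned by whatever drives the multi-walker calls and passed down, so a second call with the same walker and orbital counts touches no allocator.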

PDoakORNL · 2023-01-20T21:39:19Z

src/QMCWaveFunctions/BasisSetBase.h

-  using vgl_type   = VectorSoaContainer<T, OHMMS_DIM + 2>;
-  using vgh_type   = VectorSoaContainer<T, 10>;
-  using vghgh_type = VectorSoaContainer<T, 20>;
+  using value_type        = T;


Type and _type are redundant here. The naming convention is that types use leading-capital mixed case, so
Value already implies a type.
The other types here would properly be named Value, Vgl, Vgh, Vghgh.

I guess SoaBasisSetBase::value_type is being used somewhere. The renaming needs to be done outside this PR.

PDoakORNL · 2023-01-25T21:57:22Z

src/QMCWaveFunctions/LCAO/LCAOrbitalSet.cpp

+                                                   std::vector<ValueType>& ratios,
+                                                   std::vector<GradType>& grads) const
+{
+  assert(this == &spo_list.getLeader());


I think this needs test coverage.

Will add once I have the shared resource set up.


PDoakORNL · 2023-01-25T22:45:48Z

src/QMCWaveFunctions/LCAO/LCAOrbitalSet.h

@@ -30,10 +30,11 @@ namespace qmcplusplus
 struct LCAOrbitalSet : public SPOSet


This is a class


prckent · 2023-01-26T16:18:18Z

What are the thoughts on getting useful performance either with or without updating the old data structures? If the old structures have the wrong layout, any (re)mapping is an undesired cost that will presumably have to be dealt with "sometime".

prckent · 2023-01-26T16:34:15Z

Happy to merge but definitely interested in the answer to my above question.

I would not be surprised if changing the data layouts in the old code improved its performance. If changes like this are needed, please keep an eye on the carbon molecule performance tests.

ye-luo · 2023-01-26T16:36:59Z

Test this please

ye-luo · 2023-01-26T16:42:32Z

What are the thoughts on getting useful performance either with or without updating the old data structures? If the old structures have the wrong layout, any (re)mapping is an undesired cost that will presumably have to be dealt with "sometime".

We are still exploring right now. I assume the new layout makes sense only when walkers are batched. Before this PR, we had only one type of implementation (splines) in the new layout. Now we have two. There will be refactoring on the user side as well. Until it becomes clear which APIs are most suitable, I'd like to hold off on massive updates to the rest of the code base.

Addresses QMCPACK#4666. mw_evaluate_notranspose and evaluate_notranspose don't produce bitwise identical results. The problem started with QMCPACK#4407, which introduced a batched evaluation and caused the bitwise matrix value check to fail. I printed the diff and found that the difference is at the epsilon of float or double, so it is of no concern. The intention of the assertion was to catch any missing D2H transfer early, and there is no need to recompute. So the solution is to remove the re-computation.
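The epsilon-level mismatch described above is what one would expect when the same sum is accumulated in a different order, as a batched GEMM path may do relative to a per-walker loop. The sketch below is illustrative only: nearly_equal, sum_forward and sum_backward are hypothetical helpers, and the 16-epsilon tolerance is an arbitrary assumption, not a value taken from QMCPACK.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: true when a and b agree to within a few machine
// epsilons, scaled by their magnitude (with a floor of 1.0).
bool nearly_equal(double a, double b, double num_eps = 16.0)
{
  const double scale = std::max({1.0, std::abs(a), std::abs(b)});
  return std::abs(a - b) <= num_eps * std::numeric_limits<double>::epsilon() * scale;
}

// Accumulate from the first element to the last.
double sum_forward(const std::vector<double>& x)
{
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    s += x[i];
  return s;
}

// Accumulate from the last element to the first; mathematically the same
// sum, but the rounding can differ in the last bits.
double sum_backward(const std::vector<double>& x)
{
  double s = 0.0;
  for (std::size_t i = x.size(); i > 0; --i)
    s += x[i - 1];
  return s;
}
```

Comparing sum_forward and sum_backward with == may fail for inputs such as {0.1, 0.2, 0.3}, while nearly_equal accepts them; a bitwise check therefore cannot distinguish a reordered accumulation from a genuinely stale buffer.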

amandadumi and others added 26 commits January 13, 2023 12:40

addining initial lcao mw function from spo

7aa3e32

comment framework, build vgl temp matrix

2d53994

Add blas call and block diagonal MO coeffs

04b0615

adding vglanddetratiosgrads function and notes

1fb2dc9

Add impl2 and move transpose of data there

b6078dd

Format and delete comments

069bdb1

cleaning functions of extra comments and declarations

8cd6b4c

Adding mw_evaluateVGL to SoaLocalizedBasis

a2e10c3

resolving compilation errors

9439898

rough gemm information

1fa8711

Fix compilation

e5a8413

test mw start

3ec7f15

lcao_vgl_mw from Amanda

523c670

resized some vectors and added a few reference values

9fca749

added second walker to test

a0e58cf

added larger test and fixed gemm call

e5b485b

fixed array size (AO/MO problem)

1a51180

fixed grad copy

2737c07

minor cleaning

5ea35e7

renaming

d25acfb

clang-format

0e0fa6b

add GTO test

32758f7

Merge pull request #13 from kgasperich/lcao_vgl_offload_3

bebebcf

Lcao vgl offload merge

comment removal and impl2 -> mw_impl function rename

ddb042f

Merge branch 'develop' into lcao_vgl_offload

2b170bf

Remove temp_vgl

838af8d

ye-luo force-pushed the lcao_vgl_offload branch from 8b08510 to e9231a4 Compare January 22, 2023 03:01

ye-luo reviewed Jan 22, 2023


ye-luo changed the title ~~Lcao vgl offload~~ Intial LCAO vgl batched implementation using GEMM Jan 22, 2023

Simplify LCAOrbitalSet::mw_evaluateVGLImplGEMM

602599f

ye-luo force-pushed the lcao_vgl_offload branch from e9231a4 to 602599f Compare January 22, 2023 04:27

ye-luo reviewed Jan 24, 2023


PDoakORNL requested changes Jan 25, 2023


ye-luo added 3 commits January 25, 2023 23:12

Replace xmlChar with castCharToXMLChar

e35086d

Remove typedef MWVGLArray in SPOSet.

c4b54d6

Flip if case and reduce lines.

c43f8f3

PDoakORNL approved these changes Jan 26, 2023


Merge branch 'develop' into lcao_vgl_offload

7f96fa1

prckent enabled auto-merge January 26, 2023 16:34

prckent merged commit d324656 into QMCPACK:develop Jan 26, 2023

ye-luo changed the title ~~Intial LCAO vgl batched implementation using GEMM~~ Initial LCAO vgl batched implementation using GEMM Sep 29, 2023

ye-luo mentioned this pull request Sep 29, 2023

Fix assertion failure #4752

Merged
