DMC densities are incorrect in GPU code (all blocks) #925

Closed
kayahans opened this issue Jul 12, 2018 · 5 comments

@kayahans
Contributor

kayahans commented Jul 12, 2018

I have tested the DMC spindensity estimator with the GPU and CPU codes in version 3.1.1 and in the current development version, using this simple script on a single-twist calculation:
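#! /usr/bin/env python
# Read the DMC stat file and print the per-block sum of the up-spin density.
from hdfreader import read_hdf
filepath = 'dmc.g000.s002.stat.h5'
h = read_hdf(filepath)
# Sum over axis 1 to get one total per block; each total should equal 16.
print(h.SpinDensity.u.value.sum(1))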

Here is the output for each case (it should be 16 for each block):
cpu-311 [ 16.00550901 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. ]
cpu-develop [ 15.99373525 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. ]
gpu-311 [ 0.12613781 0.12376447 0.1187906 0.11754321 0.11837651 0.11945592
0.11947338 0.11771658 0.1161087 0.11632172 0.11595293 0.11496973
0.11204708 0.11384222 0.11595724 0.11784532 0.11707219 0.11672886
0.11673123 0.11725266 0.11504593 0.11623489 0.11404658 0.11113407
0.11399625 0.11532021 0.11689632 0.11657726 0.11866984 0.11862309
0.12080899 0.12248093 0.12308827 0.12301397 0.12414156 0.12455991
0.1260973 0.12549918 0.12576123 0.12229608 0.12230717 0.1243257
0.12264589 0.11970655 0.1169964 0.1155186 0.11759816 0.11708864
0.11888322 0.11944484 0.11959139 0.11958027 0.11924531 0.11960212
0.12210179 0.12327472 0.12791366 0.12837054 0.12975742 0.12908994
0.12739059 0.12762298 0.12740625 0.12845524 0.12944754 0.12874439
0.12956315 0.12529014 0.1173576 0.11685809 0.11804023 0.11790164
0.11870594 0.1219199 0.12362629 0.12639787 0.12991182 0.13421265
0.1366212 0.13808321 0.14078649 0.14502023 0.14926596 0.14701942
0.14722245 0.14588587 0.14709172 0.14586687 0.14376388 0.14476116
0.14450989 0.14293135 0.14153992 0.13755695 0.13658413 0.13216134
0.13168911 0.13209738 0.1257418 0.12365869]
gpu-develop [ 0.12589822 0.12634761 0.12618541 0.12901541 0.1265045 0.12512411
0.12428065 0.12446284 0.12658088 0.12669741 0.12754324 0.12602593
0.12457375 0.12424426 0.12469649 0.12487053 0.12459129 0.12548022
0.12348875 0.12313658 0.12467747 0.12322966 0.12358252 0.12349504
0.12455383 0.12961697 0.1279566 0.12621988 0.12669181 0.12627727
0.12721579 0.12955098 0.13028077 0.1308403 0.12846138 0.12648746
0.12946213 0.1345646 0.13568029 0.13684765 0.13983087 0.13783411
0.13441611 0.13930966 0.13973249 0.1409633 0.14030851 0.13808229
0.13573585 0.13413562 0.13208948 0.13133561 0.12956655 0.13034794
0.1299758 0.12805457 0.12738646 0.12788442 0.12909631 0.13078968
0.12964241 0.12868171 0.12743678 0.12507229 0.12360047 0.12153591
0.11878197 0.11774096 0.12004414 0.12102948 0.12462876 0.12414033
0.12482961 0.12432123 0.12538308 0.1271849 0.1286157 0.13050765
0.13043534 0.13311337 0.13414315 0.13229672 0.13075239 0.1317358
0.13016314 0.13125571 0.13140014 0.13193945 0.13322854 0.13198081
0.12813557 0.13073914 0.12879043 0.12825164 0.13052043 0.13012468
0.1271668 0.12530829 0.12205993 0.1209168 ]

In contrast, the spin density estimator in VMC was fine in all cases and produced no such errors. Please let me know if you would like access to the test files.
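
The per-block check described in this report can be sketched as a small helper. This is a minimal sketch, not part of QMCPACK or its tools: the function name `check_block_normalization`, the tolerance, and the assumed array layout (blocks along axis 0, grid points along axis 1, as implied by `sum(1)` in the script) are assumptions for illustration, and the data here is synthetic.

```python
import numpy as np


def check_block_normalization(density, n_expected, rtol=1e-6):
    """Return the per-block sums and the indices of blocks whose sum
    deviates from n_expected by more than rtol (relative)."""
    # Assumed layout: axis 0 = blocks, axis 1 = grid points.
    block_sums = np.asarray(density, dtype=float).sum(axis=1)
    bad = np.flatnonzero(np.abs(block_sums - n_expected) > rtol * n_expected)
    return block_sums, bad


# Synthetic stand-in for h.SpinDensity.u.value: 4 blocks on an 8-point grid,
# each block normalized to 16 electrons.
density = np.full((4, 8), 16.0 / 8)
# Mimic the first-block irregularity seen in the cpu-311 output.
density[0] *= 16.00550901 / 16.0

block_sums, bad_blocks = check_block_normalization(density, n_expected=16)
print(block_sums)
print(bad_blocks)
```

With real data, `density` would be replaced by the array read from the stat file, and any block index returned in `bad_blocks` would point to a normalization problem like the ones listed above.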

@jtkrogel jtkrogel added the bug label Jul 12, 2018
@jtkrogel
Contributor

Also of note: on the CPU side there is some irregularity (bug) in the weights for the first block.

@PDoakORNL PDoakORNL self-assigned this Jul 12, 2018
@prckent prckent changed the title from "Broken DMC density with GPU code" to "DMC densities are incorrect in CPU code (first block) and GPU code (all blocks)" Jul 18, 2018
@prckent
Contributor

prckent commented Jul 18, 2018

Updated the title to note that there are at least two bugs here:

  1. The normalization of the density in the first block of the DMC code (CPU implementation) is incorrect
  2. The DMC density normalization is incorrect for all blocks in the GPU code. (The normalization appears to be off by a factor of ~128, which, depending on the run, might be the total walker weight.)
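
To make the factor concrete, the ratio of the expected sum (16) to the reported GPU sums can be computed directly. The sketch below uses the first six gpu-311 values quoted in the report; the variable names are illustrative, and reading the factor as a missing walker-weight normalization remains a hypothesis, not a confirmed diagnosis.

```python
import numpy as np

n_expected = 16.0
# First six per-block sums from the gpu-311 output in the report.
gpu_sums = np.array([0.12613781, 0.12376447, 0.1187906,
                     0.11754321, 0.11837651, 0.11945592])

# Factor by which each block would need to be scaled to recover 16.
factors = n_expected / gpu_sums
print(factors)
print(factors.mean())
```

The ratio differs from block to block rather than being one fixed constant. That would be consistent with a fluctuating quantity such as the walker population, but on its own it does not establish the cause.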

These have been broken since at least 3.1.1. Since these parts of the code have been very static, the bugs have likely existed since before 2017, i.e. pre-GitHub. It remains to be determined whether the problem is only normalization related, particularly on the GPU side.

@PDoakORNL
Contributor

I'm hoping to fix this as I port the GPU code to the SoA runtime branch of qmcpack. As of right now, I have no special insight into its presence in the AoS-based GPU implementation. If it needs to be fixed in the AoS implementation, we should probably reassign it, as I'm less familiar with that pathway through the code.

@prckent
Contributor

prckent commented Jul 18, 2018

I created a separate issue for the CPU aspects.

@prckent prckent changed the title from "DMC densities are incorrect in CPU code (first block) and GPU code (all blocks)" to "DMC densities are incorrect in GPU code (all blocks)" Jul 18, 2018
@prckent prckent added this to the V3.5.1 Release milestone Aug 1, 2018
@PDoakORNL PDoakORNL added the to do label Aug 9, 2018
@jtkrogel jtkrogel mentioned this issue Sep 11, 2018
3 tasks
@prckent prckent modified the milestones: V3.5.1 Release, V3.6.0 Release Nov 7, 2018
@prckent prckent added the gpu label Nov 8, 2018
@prckent prckent modified the milestones: V3.7.0 Release, V3.8.0 Release Mar 15, 2019
@PDoakORNL PDoakORNL removed this from the V3.8.0 Release milestone Jun 27, 2019
@prckent prckent added wontfix and removed to do labels Oct 11, 2019
@prckent
Contributor

prckent commented Oct 11, 2019

Currently there is no plan to fix this since the new GPU capabilities that are being built should not have the same problem. If truly critical it could be fixed, but the lifetime of this code and its unique bugs is now short.

The plan is to delete the legacy CUDA implementation, delete the legacy AoS CPU implementation, and concentrate on adding GPU capabilities to the "SoA" CPU code in a fully compatible manner.

@prckent prckent closed this as completed Oct 11, 2019