DMC densities are incorrect in GPU code (all blocks) #925

Open
kayahans opened this Issue Jul 12, 2018 · 4 comments

@kayahans
Contributor

kayahans commented Jul 12, 2018

I have tested the DMC spin density estimator with the GPU and CPU codes in version 3.1.1 and the current development version, using this simple script on a single-twist calculation:
#!/usr/bin/env python
# hdfreader is the HDF5 reader module that ships with Nexus
from hdfreader import read_hdf

filepath = 'dmc.g000.s002.stat.h5'
h = read_hdf(filepath)
# Sum the up-spin density over all grid points, giving one total per block
print(h.SpinDensity.u.value.sum(1))

Here is the output for each case (it should be 16 for each block):
cpu-311 [ 16.00550901 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. ]
cpu-develop [ 15.99373525 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. 16. 16. 16. 16. 16.
16. 16. ]
gpu-311 [ 0.12613781 0.12376447 0.1187906 0.11754321 0.11837651 0.11945592
0.11947338 0.11771658 0.1161087 0.11632172 0.11595293 0.11496973
0.11204708 0.11384222 0.11595724 0.11784532 0.11707219 0.11672886
0.11673123 0.11725266 0.11504593 0.11623489 0.11404658 0.11113407
0.11399625 0.11532021 0.11689632 0.11657726 0.11866984 0.11862309
0.12080899 0.12248093 0.12308827 0.12301397 0.12414156 0.12455991
0.1260973 0.12549918 0.12576123 0.12229608 0.12230717 0.1243257
0.12264589 0.11970655 0.1169964 0.1155186 0.11759816 0.11708864
0.11888322 0.11944484 0.11959139 0.11958027 0.11924531 0.11960212
0.12210179 0.12327472 0.12791366 0.12837054 0.12975742 0.12908994
0.12739059 0.12762298 0.12740625 0.12845524 0.12944754 0.12874439
0.12956315 0.12529014 0.1173576 0.11685809 0.11804023 0.11790164
0.11870594 0.1219199 0.12362629 0.12639787 0.12991182 0.13421265
0.1366212 0.13808321 0.14078649 0.14502023 0.14926596 0.14701942
0.14722245 0.14588587 0.14709172 0.14586687 0.14376388 0.14476116
0.14450989 0.14293135 0.14153992 0.13755695 0.13658413 0.13216134
0.13168911 0.13209738 0.1257418 0.12365869]
gpu-develop [ 0.12589822 0.12634761 0.12618541 0.12901541 0.1265045 0.12512411
0.12428065 0.12446284 0.12658088 0.12669741 0.12754324 0.12602593
0.12457375 0.12424426 0.12469649 0.12487053 0.12459129 0.12548022
0.12348875 0.12313658 0.12467747 0.12322966 0.12358252 0.12349504
0.12455383 0.12961697 0.1279566 0.12621988 0.12669181 0.12627727
0.12721579 0.12955098 0.13028077 0.1308403 0.12846138 0.12648746
0.12946213 0.1345646 0.13568029 0.13684765 0.13983087 0.13783411
0.13441611 0.13930966 0.13973249 0.1409633 0.14030851 0.13808229
0.13573585 0.13413562 0.13208948 0.13133561 0.12956655 0.13034794
0.1299758 0.12805457 0.12738646 0.12788442 0.12909631 0.13078968
0.12964241 0.12868171 0.12743678 0.12507229 0.12360047 0.12153591
0.11878197 0.11774096 0.12004414 0.12102948 0.12462876 0.12414033
0.12482961 0.12432123 0.12538308 0.1271849 0.1286157 0.13050765
0.13043534 0.13311337 0.13414315 0.13229672 0.13075239 0.1317358
0.13016314 0.13125571 0.13140014 0.13193945 0.13322854 0.13198081
0.12813557 0.13073914 0.12879043 0.12825164 0.13052043 0.13012468
0.1271668 0.12530829 0.12205993 0.1209168 ]

In contrast, the spin density estimator in VMC was fine in all cases and showed no such errors. Please let me know if you would like access to the test files.
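The per-block check described above can be automated. The following is only a minimal sketch: the function names `block_sums` and `find_bad_blocks` are hypothetical, and a small synthetic list of lists stands in for the real `h.SpinDensity.u.value` array so that the example runs without the HDF5 file. The first synthetic block is deliberately set slightly off, mimicking the cpu-311 output.

```python
def block_sums(density_blocks):
    """Sum each block's density over all of its grid points."""
    return [sum(block) for block in density_blocks]


def find_bad_blocks(sums, expected, rel_tol=1e-6):
    """Return (block index, sum) pairs whose sum deviates from `expected`."""
    return [
        (i, s)
        for i, s in enumerate(sums)
        if abs(s - expected) > rel_tol * abs(expected)
    ]


# Synthetic stand-in data (an assumption, not taken from the real file):
# three blocks on a four-point grid, with the first block slightly off.
expected = 16.0
blocks = [
    [4.0, 4.0, 4.0, 4.00550901],
    [4.0, 4.0, 4.0, 4.0],
    [4.0, 4.0, 4.0, 4.0],
]

sums = block_sums(blocks)
bad = find_bad_blocks(sums, expected)
for index, value in bad:
    print("block %d sums to %.8f, expected %.1f" % (index, value, expected))
```

With the real data, the list passed to `find_bad_blocks` would be the per-block sums printed by the script, and `expected` would be the number of up-spin electrons in the cell.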

@jtkrogel jtkrogel added the bug label Jul 12, 2018

@jtkrogel

Contributor

jtkrogel commented Jul 12, 2018

Also of note: on the CPU side there is some irregularity (bug) in the weights for the first block.

@PDoakORNL PDoakORNL self-assigned this Jul 12, 2018

@prckent prckent changed the title from Broken DMC density with GPU code to DMC densities are incorrect in CPU code (first block) and GPU code (all blocks) Jul 18, 2018

@prckent

Contributor

prckent commented Jul 18, 2018

Updated the title to note that there are at least two bugs here:

  1. The normalization of the density in the first block of the DMC code (CPU implementation) is incorrect
  2. The DMC density normalization is incorrect for all blocks in the GPU code. (The normalization looks to be off by a factor of ~128, which, depending on the run, might be the total walker weight.)

These have been broken since at least 3.1.1. Since these parts of the code have been very static, the bugs have likely existed since before 2017, i.e. pre-GitHub. It remains to be determined whether the problem is only normalization related, particularly on the GPU side.
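The factor of roughly 128 can be read directly off the reported numbers. As a rough sketch, the snippet below takes the first six gpu-develop block sums from the original report and divides the expected value of 16 by each; the variable names are illustrative only, and nothing here confirms that the missing factor is actually the walker weight.

```python
# Expected per-block sum, as stated in the original report.
expected_sum = 16.0

# First six gpu-develop block sums, copied from the reported output.
gpu_sums = [
    0.12589822, 0.12634761, 0.12618541,
    0.12901541, 0.1265045, 0.12512411,
]

# Implied normalization factor for each block.
ratios = [expected_sum / s for s in gpu_sums]
mean_ratio = sum(ratios) / len(ratios)
spread = max(ratios) - min(ratios)

print("per-block factors:", ["%.1f" % r for r in ratios])
print("mean factor: %.1f, spread: %.1f" % (mean_ratio, spread))
```

The implied factor is not constant from block to block, which would be consistent with a fluctuating quantity such as a total walker weight, but it does not rule out other causes.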

@PDoakORNL

Contributor

PDoakORNL commented Jul 18, 2018

I'm hoping to fix this as I port the GPU code to the SoA runtime branch of QMCPACK. As of right now, I have no special insight into its presence in the AoS-based GPU implementation. If it needs to be fixed in the AoS implementation, we should probably reassign, as I'm less familiar with that pathway through the code.

@prckent

Contributor

prckent commented Jul 18, 2018

I created a separate issue for the CPU aspects.

@prckent prckent changed the title from DMC densities are incorrect in CPU code (first block) and GPU code (all blocks) to DMC densities are incorrect in GPU code (all blocks) Jul 18, 2018

@prckent prckent added this to the V3.5.1 Release milestone Aug 1, 2018

@PDoakORNL PDoakORNL added the to do label Aug 9, 2018

@jtkrogel jtkrogel referenced this issue Sep 11, 2018: CUDA incompatibility #1059 (Open, 0 of 3 tasks complete)

@prckent prckent modified the milestones: V3.5.1 Release, V3.6.0 Release Nov 7, 2018

@prckent prckent added the gpu label Nov 8, 2018
