DMC LocalECP incorrect in GPU code on titan #1440

Open

jtkrogel opened this Issue Mar 13, 2019 · 38 comments

jtkrogel (Contributor) commented Mar 13, 2019

Disagreement between CPU and GPU DMC total energies was observed for a water molecule in periodic boundary conditions (8 Å cubic cell, CASINO pseudopotentials, Titan at OLCF, QMCPACK 3.6.0). Issue originally reported by Andrea Zen. Original inputs and outputs: TEST_DMC.zip

From the attached outputs, the VMC energies agree, while the DMC energies differ by about 0.3 Ha:

#VMC
>qmca -q e *s001*scalar*
dmc_cpu  series 1  LocalEnergy           =  -17.183577 +/- 0.007486 
dmc_gpu  series 1  LocalEnergy           =  -17.152789 +/- 0.018592 

#DMC
>qmca -q e *s002*scalar*
dmc_cpu  series 2  LocalEnergy           =  -17.220971 +/- 0.000968 
dmc_gpu  series 2  LocalEnergy           =  -16.869061 +/- 0.003256 

The difference is entirely attributable to the local part of the ECP:

#DMC
>qmca -q l *s002*scalar*
dmc_cpu  series 2  LocalECP              =  -41.436580 +/- 0.021199 
dmc_gpu  series 2  LocalECP              =  -41.026695 +/- 0.028982 

Note: the DMC error bars are not statistically meaningful here (10 blocks), but the difference is large enough to support this conclusion.

The oddity here is that the error is only seen in DMC and is limited to a single potential energy term. This may indicate a bug in LocalECP that surfaces with increased walker count on the GPU (1 walker/GPU in VMC, 320 walkers/GPU in DMC). A series of VMC runs with an increasing number of walkers would likely expose this.
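
Such a scan could be set up as consecutive <qmc> sections in the input file, each becoming its own series in the output. A minimal sketch; the blocks/steps/timestep values are illustrative, not taken from the attached inputs:

  <!-- sketch of a walker-count scan: identical VMC runs, walkers varied -->
  <qmc method="vmc" move="pbyp" gpu="yes">
    <parameter name="blocks">    40 </parameter>
    <parameter name="steps">    100 </parameter>
    <parameter name="timestep"> 0.3 </parameter>
    <parameter name="walkers">    1 </parameter>
  </qmc>
  <qmc method="vmc" move="pbyp" gpu="yes">
    <parameter name="blocks">    40 </parameter>
    <parameter name="steps">    100 </parameter>
    <parameter name="timestep"> 0.3 </parameter>
    <parameter name="walkers">  320 </parameter>
  </qmc>

Comparing LocalECP across the resulting series with qmca would show whether the term drifts with walker count.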

jtkrogel added the bug label Mar 13, 2019

prckent (Contributor) commented Mar 13, 2019

The local ECP kernel is one that is known not to be reproducible between runs, i.e. it is buggy; something to do with walker and GPU thread/block counts. Previously the differences have been small enough to ignore, but this problem indicates the kernel must be fixed. There are a couple of open issues on this.

You don't state it explicitly, but is the non-local ECP term correct?

jtkrogel (Contributor, Author) commented Mar 13, 2019

The non-local ECP term appears to be correct.

prckent added this to the V3.7.0 Release milestone Mar 15, 2019

prckent (Contributor) commented Mar 19, 2019

To save time debugging this, for the next 3 weeks the necessary pwscf file is at
https://ftp.ornl.gov/filedownload?ftp=e;dir=WATER
Replace WATER with uP24qpBh6M3N

prckent (Contributor) commented Mar 19, 2019

I did some VMC experimentation. On a single Kepler GPU with a fixed seed and either 1 or 320 walkers, I was able to reproduce the previously noticed non-determinism with just a few moves, i.e. multiple runs of the executable generate slightly different results. From this short run and my current inputs we can't say whether the energies are "bad", but the local electron-ion and electron-electron terms are not repeatable. The much harder to compute kinetic energy and non-local electron-ion terms are repeatable (?!).
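
A minimal sketch of that determinism check, assuming an input file named vmc.xml with a fixed <random seed="..."/> element and a project id of vmc (both names hypothetical); a deterministic code would produce bit-identical scalar files:

./bin/qmcpack vmc.xml && mv vmc.s001.scalar.dat run1.scalar.dat
./bin/qmcpack vmc.xml && mv vmc.s001.scalar.dat run2.scalar.dat
diff run1.scalar.dat run2.scalar.dat   # any output here exposes the non-determinism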

prckent (Contributor) commented Mar 20, 2019

VMC runs with 320 walkers are essentially the same, i.e. no 0.3 Ha shift.

All inputs and outputs from test including wavefunction: https://ftp.ornl.gov/filedownload?ftp=e;dir=ICE
Replace ICE with uP21fJWh6csV

  <qmc method="vmc" move="pbyp" gpu="yes">
    <parameter name="blocks">      40 </parameter>
    <parameter name="substeps">    1 </parameter>
    <parameter name="steps">       100 </parameter>
    <parameter name="warmupSteps">  500 </parameter>
    <parameter name="usedrift">     no </parameter>
    <parameter name="timestep">    0.3 </parameter>
    <parameter name="walkers">    320 </parameter>
  </qmc>
qmca -e 0 vmc*.dat

vmc_cuda  series 1
  LocalEnergy           =          -17.1638 +/-           0.0011
  Variance              =            0.4991 +/-           0.0063
  Kinetic               =            13.508 +/-            0.018
  LocalPotential        =           -30.672 +/-            0.018
  ElecElec              =           11.1265 +/-           0.0097
  LocalECP              =           -41.409 +/-            0.019
  NonLocalECP           =           -1.3970 +/-           0.0095
  IonIon                =              1.01 +/-             0.00
  LocalEnergy_sq        =           295.097 +/-            0.036
  BlockWeight           =          32000.00 +/-             0.00
  BlockCPU              =             1.248 +/-            0.018
  AcceptRatio           =           0.47567 +/-          0.00017
  Efficiency            =           1908.34 +/-             0.00
  TotalTime             =             49.91 +/-             0.00
  TotalSamples          =           1280000 +/-                0

vmc_omp  series 1
  LocalEnergy           =          -17.1718 +/-           0.0012
  Variance              =            0.5031 +/-           0.0092
  Kinetic               =            13.510 +/-            0.016
  LocalPotential        =           -30.682 +/-            0.016
  ElecElec              =           11.1155 +/-           0.0087
  LocalECP              =           -41.408 +/-            0.017
  NonLocalECP           =           -1.3964 +/-           0.0094
  IonIon                =              1.01 +/-             0.00
  LocalEnergy_sq        =           295.375 +/-            0.039
  BlockWeight           =          32000.00 +/-             0.00
  BlockCPU              =            1.0728 +/-           0.0024
  AcceptRatio           =           0.47613 +/-          0.00015
  Efficiency            =           1885.79 +/-             0.00
  TotalTime             =             42.91 +/-             0.00
  TotalSamples          =           1280000 +/-                0

prckent (Contributor) commented Mar 20, 2019

@jtkrogel where and how were you able to produce the cpu-gpu energy shift? machine, qmcpack version, software versions, node/mpi/thread counts etc.

In my DMC tests so far I have not found such a sizable shift.

jtkrogel (Contributor, Author) commented Mar 20, 2019

The results are from runs performed by Andrea Zen (@zenandrea) on Titan with QMCPACK 3.6.0 on 4 nodes, 1 MPI task per node, 1 thread per MPI task (see files job_qmcpack_gpu-titan, input_dmcgpu.xml, and out_dmcgpu in TEST_DMC.zip).

The build details, as far as I know, follow our build_olcf_titan.sh script, but with the boost and fftw libraries changed to boost/1.62.0 and fftw/3.3.4.11. Presumably with the real AoS code.

@zenandrea, please check if I have missed something.

zenandrea commented Mar 21, 2019

Dear @jtkrogel and @prckent,
almost everything is as you said, but I used fftw/3.3.4.8, which is loaded by default.
I confirm that I compiled the real AoS code.

In particular, this is my compilation script:

export CRAYPE_LINK_TYPE=dynamic
module swap PrgEnv-pgi PrgEnv-gnu
module load cudatoolkit/9.1.85_3.10-1.0502.df1cc54.3.1
module load cray-hdf5-parallel
module load cmake3
module load fftw
export FFTW_HOME=$FFTW_DIR/..
module load boost/1.67.0
# use the Cray compiler wrappers
export CC=cc
export CXX=CC
mkdir build_titan_gpu
cd build_titan_gpu
# cmake is intentionally run twice (a common QMCPACK convention on Cray systems)
cmake -DQMC_CUDA=1 ..
cmake -DQMC_CUDA=1 ..
make -j 8
ls -l bin/qmcpack

prckent (Contributor) commented Mar 21, 2019

Thanks. Nothing unreasonable in the above. It should work without problems.

FFTW would not cause the failures. If FFTW were wrong - and I don't recall a single case ever where it has been - the kinetic energy and Monte Carlo walk in general would also be wrong.

prckent (Contributor) commented Mar 22, 2019

I have reproduced this problem using the current develop version, with builds that pass the unit tests and the diamond and LiH integration tests. I used the updated build script (#1472), i.e. nothing out of the ordinary.

Using 1 MPI task, 16 OMP threads, and 0/1 GPUs, I have a 0.6 Hartree (!) difference in the DMC energies (series 2 & 3 below), while the VMC energies agree. The difference is in the local part of the pseudopotential. The analysis below is not done carefully, but it is interesting that the kinetic energy and acceptance ratio appear to match between CPU and GPU.

A 4 node run shows a slightly smaller disagreement between the codes.

qmca -q ev ../titan_orig*/*.scalar.dat

                            LocalEnergy               Variance           ratio
../titan_orig_1mpi/qmc_cpu  series 1  -17.176063 +/- 0.016221   0.595062 +/- 0.154097   0.0346
../titan_orig_1mpi/qmc_cpu  series 2  -17.219573 +/- 0.002273   0.461457 +/- 0.003292   0.0268
../titan_orig_1mpi/qmc_cpu  series 3  -17.220429 +/- 0.001601   0.490561 +/- 0.007181   0.0285

../titan_orig_1mpi/qmc_gpu  series 1  -17.155363 +/- 0.025336   0.467373 +/- 0.056839   0.0272
../titan_orig_1mpi/qmc_gpu  series 2  -16.647208 +/- 0.000720   1.010610 +/- 0.005110   0.0607
../titan_orig_1mpi/qmc_gpu  series 3  -16.639882 +/- 0.001205   1.026227 +/- 0.007102   0.0617

pk7@titan-ext4:/lustre/atlas/ ... /Zen_water_problem/titan_orig_1mpi> qmca ../titan_orig_1mpi/qmc_cpu.s003.scalar.dat

../titan_orig_1mpi/qmc_cpu  series 3
  LocalEnergy           =          -17.2187 +/-           0.0020
  Variance              =            0.4878 +/-           0.0063
  Kinetic               =            13.587 +/-            0.024
  LocalPotential        =           -30.805 +/-            0.025
  ElecElec              =            11.115 +/-            0.015
  LocalECP              =           -41.502 +/-            0.031
  NonLocalECP           =            -1.425 +/-            0.016
  IonIon                =              1.01 +/-             0.00
  LocalEnergy_sq        =           296.972 +/-            0.073
  BlockWeight           =         634774.40 +/-          1923.92
  BlockCPU              =            302.38 +/-             1.12
  AcceptRatio           =          0.993562 +/-         0.000029
  Efficiency            =              0.93 +/-             0.00
  TotalTime             =           1511.88 +/-             0.00
  TotalSamples          =           3173872 +/-                0
pk7@titan-ext4:/lustre/atlas/ ... /Zen_water_problem/titan_orig_1mpi> qmca ../titan_orig_1mpi/qmc_gpu.s003.scalar.dat

../titan_orig_1mpi/qmc_gpu  series 3
  LocalEnergy           =          -16.6399 +/-           0.0012
  Variance              =            1.0262 +/-           0.0071
  Kinetic               =            13.533 +/-            0.019
  LocalPotential        =           -30.173 +/-            0.019
  ElecElec              =            11.032 +/-            0.012
  LocalECP              =           -40.787 +/-            0.025
  NonLocalECP           =           -1.4246 +/-           0.0066
  IonIon                =              1.01 +/-             0.00
  LocalEnergy_sq        =           277.912 +/-            0.042
  BlockWeight           =         638124.30 +/-          1124.31
  BlockCPU              =            26.026 +/-            0.039
  AcceptRatio           =          0.993609 +/-         0.000016
  Efficiency            =             14.94 +/-             0.00
  TotalTime             =            260.26 +/-             0.00
  TotalSamples          =           6381243 +/-                0

prckent (Contributor) commented Mar 22, 2019

Also worth noting that the DMC energy is above the VMC one, which should not happen: fixed-node DMC is variational and should lower the energy relative to VMC for the same trial wavefunction.

prckent (Contributor) commented Mar 25, 2019

Attempting to bracket the problem:

  1. QMCPACK v3.1.1 (August 2017) also has the error, i.e. it is not a recently introduced bug in our source code.
  2. Using the latest develop version but with no Jastrow in the wavefunction, the bug persists.

Still puzzling is why our existing carbon diamond or LiH tests don't trigger this bug.

prckent (Contributor) commented Mar 25, 2019

  3. Using the BFD potentials from examples/molecules/H2O, the problem persists. This rules out the handling of CASINO-format potentials. Again the DMC energy is above the VMC energy on the GPU, while the CPU result appears OK.
                           LocalEnergy               Variance           ratio
../titan_orig_1mpi_noj_bfd/qmc_cpu  series 1  -17.017532 +/- 0.053990   3.475117 +/- 0.377453   0.2042
../titan_orig_1mpi_noj_bfd/qmc_cpu  series 2  -17.257461 +/- 0.003199   3.439663 +/- 0.020524   0.1993
../titan_orig_1mpi_noj_bfd/qmc_cpu  series 3  -17.271529 +/- 0.003633   3.671973 +/- 0.031433   0.2126

../titan_orig_1mpi_noj_bfd/qmc_gpu  series 1  -16.898081 +/- 0.064148   3.766366 +/- 0.306030   0.2229
../titan_orig_1mpi_noj_bfd/qmc_gpu  series 2  -16.694704 +/- 0.005017   4.001500 +/- 0.038960   0.2397
../titan_orig_1mpi_noj_bfd/qmc_gpu  series 3  -16.687953 +/- 0.002943   4.170178 +/- 0.020878   0.2499

prckent (Contributor) commented Mar 25, 2019

  4. Persists with no MPI, -DQMC_MPI=0

prckent (Contributor) commented Mar 25, 2019

By varying the number of walkers I was able to break VMC (good suggestion by @jtkrogel). The bug is back to looking like a bad kernel.

prckent (Contributor) commented Mar 26, 2019

The linked VMC test gives incorrect results on titan.
titan_vmc_only.zip 146.46 MB https://ftp.ornl.gov/filedownload?ftp=e;dir=FRUIT
Replace FRUIT with uP10HwMh8qGU

Puzzlingly, these same files give correct results on oxygen (currently Intel Xeon + Kepler + clang6 + CUDA 10.0). A naively incorrect kernel would give reproducible errors.

atillack (Contributor) commented Mar 26, 2019

@prckent I can reproduce your numbers on Titan.

atillack (Contributor) commented Mar 26, 2019

@prckent When I go back to Cuda 7.5 (using GCC 4.9.3 and an older version of QMCPACK) I get the correct results:

qmc_gpu series 1
LocalEnergy = -17.1716 +/- 0.0021
Variance = 0.490 +/- 0.017
Kinetic = 13.481 +/- 0.025
LocalPotential = -30.652 +/- 0.025
ElecElec = 11.129 +/- 0.013
LocalECP = -41.424 +/- 0.029
NonLocalECP = -1.364 +/- 0.014
IonIon = 1.01 +/- 0.00
LocalEnergy_sq = 295.354 +/- 0.074
BlockWeight = 2560.00 +/- 0.00
BlockCPU = 0.310562 +/- 0.000093
AcceptRatio = 0.47525 +/- 0.00029
Efficiency = 16660.91 +/- 0.00
TotalTime = 19.57 +/- 0.00
TotalSamples = 161280 +/- 0

So this could be an issue with the Cuda installation on Titan...

prckent (Contributor) commented Mar 26, 2019

@atillack Interesting. If you are using a standalone workstation with CUDA 7.5 (!), the question is whether you can break VMC by e.g. varying the number of walkers, or if running Andrea's original DMC case still breaks.

jtkrogel (Contributor, Author) commented Mar 27, 2019

@atillack Is there a specific build config + QMCPACK version you can recommend that does not display the problem on Titan? This may represent a practical way @zenandrea can get correct production runs sooner.

atillack (Contributor) commented Mar 27, 2019

@jtkrogel QMCPACK 3.5.0.

Here are the modules I have loaded (for gcc/4.9.3, "module unload gcc; module load gcc/4.9.3" after "module swap PrgEnv-pgi PrgEnv-gnu" works):

Currently Loaded Modulefiles:

  1. eswrap/1.3.3-1.020200.1280.0
  2. craype-network-gemini
  3. craype/2.5.13
  4. cray-mpich/7.6.3
  5. craype-interlagos
  6. lustredu/1.4
  7. xalt/0.7.5
  8. git/2.13.0
  9. module_msg/0.1
  10. modulator/1.2.0
  11. hsi/5.0.2.p1
  12. DefApps
  13. cray-libsci/16.11.1
  14. udreg/2.3.2-1.0502.10518.2.17.gem
  15. ugni/6.0-1.0502.10863.8.28.gem
  16. pmi/5.0.12
  17. dmapp/7.0.1-1.0502.11080.8.74.gem
  18. gni-headers/4.0-1.0502.10859.7.8.gem
  19. xpmem/0.1-2.0502.64982.5.3.gem
  20. dvs/2.5_0.9.0-1.0502.2188.1.113.gem
  21. alps/5.2.4-2.0502.9774.31.12.gem
  22. rca/1.0.0-2.0502.60530.1.63.gem
  23. atp/2.1.1
  24. PrgEnv-gnu/5.2.82
  25. cray-hdf5/1.10.0.3
  26. cmake3/3.9.0
  27. fftw/3.3.4.8
  28. boost/1.62.0
  29. subversion/1.9.3
  30. cudatoolkit/7.5.18-1.0502.10743.2.1
  31. gcc/4.9.3

atillack (Contributor) commented Mar 27, 2019

@prckent @jtkrogel I just looked into the Cuda 9 changelog and found this wonderful snippet:

The compiler has transitioned to a new code-generation back end for Kepler GPUs.
PTXAS now includes a new option --new-sm3x-opt=false that allows developers to continue using the legacy back end. Use ptxas --help to get more information about these command-line options.

This at least may explain what is going on. I am not sure how to pass down this parameter to ptxas though ...

Edit: Testing now.

atillack (Contributor) commented Mar 27, 2019

@prckent @jtkrogel Cuda 7.5 is still the temporary solution. The ptxas flag (-Xptxas --new-sm3x-opt=false, which can be put in CUDA_NVCC_FLAGS) only gets results halfway to the correct number with Cuda 9.1 on Titan:

qmc_gpu series 1
LocalEnergy = -16.9815 +/- 0.0021
Variance = 0.797 +/- 0.015
Kinetic = 13.483 +/- 0.022
LocalPotential = -30.465 +/- 0.022
ElecElec = 11.125 +/- 0.012
LocalECP = -41.235 +/- 0.025
NonLocalECP = -1.362 +/- 0.013
IonIon = 1.01 +/- 0.00
LocalEnergy_sq = 289.167 +/- 0.073
BlockWeight = 2560.00 +/- 0.00
BlockCPU = 0.302379 +/- 0.000059
AcceptRatio = 0.47550 +/- 0.00025
Efficiency = 12570.17 +/- 0.00
TotalTime = 24.49 +/- 0.00
TotalSamples = 207360 +/- 0
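
For reference, a sketch of passing that flag at configure time through the legacy FindCUDA variable CUDA_NVCC_FLAGS (a semicolon-separated list; exact handling may vary with the CMake version):

# append the ptxas option to nvcc's flags when configuring the build
cmake -DQMC_CUDA=1 -DCUDA_NVCC_FLAGS="-Xptxas;--new-sm3x-opt=false" ..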

atillack (Contributor) commented Mar 27, 2019

@prckent @jtkrogel After talking with our Nvidia representatives: there is a code generation regression in 9.1 which is fixed in 9.2. So on Titan, it seems the only workaround is to use 7.5 for the time being.

If a version newer than QMCPACK 3.5.0 is needed, some (minor) code changes are required in order to compile with Cuda 7.5 (a sketch follows the list):

  • lines containing cudaMemAdvise need to be commented out in QMCWaveFunctions/EinsplineSetCuda.cpp
  • "#include <nvml.h>" needs to be commented out in Platforms/devices.h
  • CMake/GNUCompilers.cmake needs to be changed to accept compilers after 4.8 (the 5.0 on the second line needs changing to 4.8, as in older versions of QMCPACK)
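
For illustration, a sketch of the first two edits; the code lines shown are assumed forms, not copied from the actual source, and exact locations vary by QMCPACK version:

// QMCWaveFunctions/EinsplineSetCuda.cpp: memory-advise hints do not exist
// in the CUDA 7.5 toolkit, so lines of this form get commented out:
// cudaMemAdvise(ptr, bytes, cudaMemAdviseSetReadMostly, device);

// Platforms/devices.h: NVML is likewise unavailable with 7.5:
// #include <nvml.h>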

atillack (Contributor) commented Mar 27, 2019

@prckent @jtkrogel Another data point: I also get correct results if the Cuda 9.1 toolkit is loaded when executing a QMCPACK binary that was compiled with Cuda 7.5. This does seem to point to code generation being the issue.

atillack (Contributor) commented Mar 28, 2019

@prckent @jtkrogel On Summit, using Cuda 9.2, the correct results are also obtained:

qmc_gpu series 1
LocalEnergy = -17.1707 +/- 0.0020
Variance = 0.489 +/- 0.016
Kinetic = 13.480 +/- 0.025
LocalPotential = -30.651 +/- 0.025
ElecElec = 11.128 +/- 0.013
LocalECP = -41.421 +/- 0.029
NonLocalECP = -1.364 +/- 0.014
IonIon = 1.01 +/- 0.00
LocalEnergy_sq = 295.321 +/- 0.073
BlockWeight = 2560.00 +/- 0.00
BlockCPU = 0.179291 +/- 0.000020
AcceptRatio = 0.47529 +/- 0.00029
Efficiency = 29197.52 +/- 0.00
TotalTime = 11.47 +/- 0.00
TotalSamples = 163840 +/- 0

prckent changed the title from "DMC LocalECP incorrect in GPU code" to "DMC LocalECP incorrect in GPU code on titan" Mar 28, 2019

zenandrea commented Mar 28, 2019

Dear @atillack @prckent @jtkrogel,
it seems very likely that the source of the issue is cudatoolkit version 9.1.
Shall we ask the OLCF system administrators if they can install the 9.2 version?

Packages other than qmcpack may also be affected by this kind of problem!

prckent (Contributor) commented Mar 28, 2019

@zenandrea Please ask - I am not sure that 9.2 will be installed given that Titan has only a few more months of accessibility, but other packages are certainly at risk. Are you able to move to Summit or is your time only on Titan?

This is a scary problem and I am not keen on recommending use of older software.

zenandrea commented Mar 28, 2019

@prckent I have half the resources on titan and half on summit.
I'm going to ask straight away.

atillack (Contributor) commented Mar 28, 2019

@prckent @zenandrea Since Cuda 9.1's behavior was seen as mostly a performance regression, the Nvidia folks are looking at our kernel giving bad numbers under 9.1 to see if there is a possible workaround.

@zenandrea It's a good idea to ask, but like Paul I am uncertain this will happen in time to be useful. In the interim, with small code changes (see the post above) it is possible to compile a current version of QMCPACK on Titan with Cuda 7.5, but this only works with GCC 4.9.3, as otherwise modules are missing.
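
A sketch of the corresponding module sequence, using the versions quoted earlier in this thread (the unload/load order is an assumption; adjust to what module avail reports):

module swap PrgEnv-pgi PrgEnv-gnu
module unload gcc
module load gcc/4.9.3
module load cudatoolkit/7.5.18-1.0502.10743.2.1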

prckent (Contributor) commented Mar 28, 2019

I am still open to the idea that we have illegal/buggy code, and that different CUDA versions, GPUs, etc. expose the problem in different ways. However "bad generated code" is the best explanation given the established facts. What is so strange still is that all the difficult and costly parts of the calculation involving the wavefunction are correct.

ye-luo (Contributor) commented Mar 28, 2019

I have a solution to use 7.5 with the current QMCPACK. Will PR soon.

atillack (Contributor) commented Mar 28, 2019

@ye-luo Thanks!

ye-luo (Contributor) commented Mar 28, 2019

I failed to find a clean solution through the source because I would need to hack CMake.
To cover our production needs, I'm making all the build variants and will put them in a place anyone can access.

prckent (Contributor) commented Mar 29, 2019

I'll note that an initialization bug similar to #1518 could explain these problems.

ye-luo (Contributor) commented Mar 29, 2019

I checked. Unfortunately #1518 is not related to this bug.

atillack (Contributor) commented Mar 29, 2019

@prckent The problem seems confined to Titan. Cuda 9.1 on Summit also gives the correct results:

qmc_gpu series 1
LocalEnergy = -17.1703 +/- 0.0020
Variance = 0.496 +/- 0.017
Kinetic = 13.479 +/- 0.024
LocalPotential = -30.650 +/- 0.024
ElecElec = 11.128 +/- 0.012
LocalECP = -41.420 +/- 0.028
NonLocalECP = -1.365 +/- 0.013
IonIon = 1.01 +/- 0.00
LocalEnergy_sq = 295.316 +/- 0.071
BlockWeight = 2560.00 +/- 0.00
BlockCPU = 0.182407 +/- 0.000021
AcceptRatio = 0.47538 +/- 0.00028
Efficiency = 27168.89 +/- 0.00
TotalTime = 12.40 +/- 0.00
TotalSamples = 174080 +/- 0

ye-luo (Contributor) commented Mar 29, 2019

I put both v3.6 and v3.7 binaries at
/lustre/atlas/world-shared/mat189/qmcpack_binaries_titan
They should last until the retirement of Titan.

To work around the bug in CUDA 9.1 that gives wrong results, the following steps compile CudaCoulomb.cu with CUDA 7.5 (a scripted sketch follows the list). After building the QMCPACK CUDA version:

  1. From the build folder, cd src/QMCHamiltonians
  2. find -name qmcham_generated_CudaCoulomb.cu.o.RELEASE.cmake and open it with an editor.
  3. touch ./CMakeFiles/qmcham.dir/qmcham_generated_CudaCoulomb.cu.o.RELEASE.cmake
  4. Modify CUDA_HOST_COMPILER from /opt/cray/craype/2.5.13/bin/cc to /opt/gcc/4.9.3/bin/gcc
  5. Replace all instances of cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1 with cudatoolkit7.5/7.5.18-1.0502.10743.2.1
  6. Type make -j32 and you should see "Built target qmcham". If CMake is triggered, repeat steps 2-4 because CMake overwrites qmcham_generated_CudaCoulomb.cu.o.RELEASE.cmake.
  7. cd ../QMCApp ; sh CMakeFiles/qmcpack.dir/link.txt
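
A minimal shell sketch of the steps above, assuming the Titan paths and toolkit versions quoted in this thread:

# run from the build folder after building the CUDA version
cd src/QMCHamiltonians
f=CMakeFiles/qmcham.dir/qmcham_generated_CudaCoulomb.cu.o.RELEASE.cmake
touch "$f"
# point the host compiler at gcc 4.9.3 instead of the Cray wrapper
sed -i 's|/opt/cray/craype/2.5.13/bin/cc|/opt/gcc/4.9.3/bin/gcc|g' "$f"
# switch the toolkit paths from 9.1 to 7.5
sed -i 's|cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1|cudatoolkit7.5/7.5.18-1.0502.10743.2.1|g' "$f"
make -j32    # expect "Built target qmcham"; redo the sed edits if CMake regenerates the file
cd ../QMCApp
sh CMakeFiles/qmcpack.dir/link.txt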