Bad GPU results #17

Closed
adam-m-jcbs opened this issue Oct 14, 2016 · 7 comments

@adam-m-jcbs
Member

Many of the results from GPU-accelerated unit-test code appear to be wrong. As a concrete example, I've built accelerated and CPU-only executables of the test_react unit test.

Build and execute an accelerated binary, then move the output aside for later comparison (note that I've suppressed the output of commands):

cd $MICROPHYSICS_HOME/unit_test/test_react
make COMP=PGI NETWORK_DIR=ignition_simple ACC=t -j6
./main.Linux.PGI.acc.exe inputs_ignition.BS
mv react_ignition_test_react.BS react_ignition_test_react.BS.ACC

Build and execute a CPU-only binary:

make COMP=PGI NETWORK_DIR=ignition_simple -j6
./main.Linux.PGI.exe inputs_ignition.BS

Comparing the two output files shows that they're very different:

fcompare.Linux.gfortran.exe --infile1 react_ignition_test_react.BS --infile2 react_ignition_test_react.BS.ACC

            variable name            absolute error            relative error
                                        (||A - B||)         (||A - B||/||A||)
 ----------------------------------------------------------------------
 level =  1
 density                           0.2384185791E-06          0.1192092896E-15
 temperature                       0.6854534149E-06          0.9792191642E-15
 Xnew_carbon-12                    0.9999999997              0.9999999999    
 Xnew_oxygen-16                    0.7999999999              0.9999999999    
 Xnew_magnesium-24                 0.9999999997               9.999436761    
 Xold_carbon-12                    0.9999999997              0.9999999999    
 Xold_oxygen-16                    0.7999999999              0.9999999999    
 Xold_magnesium-24                 0.9999999997               9.999999997    
 wdot_carbon-12                    0.2812178371E-03           1.000000000    
 wdot_oxygen-16                    0.1110223025E-14           1.000000000    
 wdot_magnesium-24                 0.2812178371E-03           1.000000000    
 rho_Hnuc                          0.3150192097E+24           1.000000000 

So while many networks and integrators seem to be able to compile and run without crashing, it's not clear how many are generating correct physical results. I've seen a similar issue with the VBDF integrator, so it doesn't appear to be specific to an integrator or network. These results are from bender, which has PGI 16.9 and a GeForce GTX 960 GPU (with CUDA 8.0 drivers and CUDA 7.5 compilers).
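For readers unfamiliar with the two columns, here is a minimal Python sketch of how an absolute error ||A - B|| and a relative error ||A - B||/||A|| can be computed for one variable. The function name `error_norms` is hypothetical, and the choice of the max norm is an assumption: the table header does not say which norm fcompare uses.

```python
def error_norms(a, b):
    """Return (absolute, relative) error between two equal-length sequences.

    Assumption: the max (infinity) norm is used. The table header only
    shows ||A - B|| and ||A - B||/||A|| without naming the norm.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    abs_err = max(abs(x - y) for x, y in zip(a, b))
    norm_a = max(abs(x) for x in a)
    # Guard against division by zero when A is identically zero.
    rel_err = abs_err / norm_a if norm_a > 0.0 else 0.0
    return abs_err, rel_err


# Made-up values: if B is identically zero, the relative error is exactly 1.
reference = [0.5, 0.5, 0.0]
zeros = [0.0, 0.0, 0.0]
abs_err, rel_err = error_norms(reference, zeros)
```

Under this definition a relative error near 1 means the two results share essentially no significant digits, which is one way to read the `wdot_*` and `rho_Hnuc` rows above.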

@zingale
Member

zingale commented Oct 14, 2016

the fact that the density is different is telling -- nothing should be changing the density in this unit test.

although it is roundoff-level different

@dwillcox
Member

dwillcox commented Oct 15, 2016

I've compared temp_zone and dens_zone, as calculated inside the loop, between PGI serial (debug) and gfortran serial. (This was with aprox13's input; I haven't tried it for ignition_simple.)

Printing those variables with 15 significant figures shows they can differ by about 1E-7, which is the absolute difference in density you see above. My guess is that the roundoff error differs between the log10 or power operations implemented for PGI vs GNU.

Doing the same for PGI-serial vs PGI-acc, I see smaller differences, but at least one difference nonetheless, e.g.

dens_zone = 3414548.873833601

vs.

dens_zone = 3414548.873833600

That suggests the difference you see in density, as Mike said, is roundoff, and that the integrator may not necessarily be doing anything to the density.
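The roundoff reading can be checked against the density row of the first comparison table. The sketch below, in Python, assumes the values are IEEE 754 double precision and uses a hypothetical helper `ulp` to compute the spacing between adjacent doubles. The norm ||A|| is not printed in the table, so it is inferred here by dividing the absolute error by the relative error.

```python
import math


def ulp(x):
    """Spacing between adjacent doubles at |x| (normal, finite, nonzero x).

    Assumes IEEE 754 double precision with a 53-bit significand.
    """
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 53)


# Density row from the first fcompare table in this thread.
abs_err = 0.2384185791e-06
rel_err = 0.1192092896e-15

# ||A|| is not printed; infer it from the two columns (about 2e9).
implied_norm = abs_err / rel_err

# Express the absolute difference in units of the double spacing at ||A||.
ulps = abs_err / ulp(implied_norm)
print(f"implied ||A|| = {implied_norm:.6e}, difference = {ulps:.3f} ulp")
```

A difference of about one unit in the last place is the smallest nonzero difference two doubles of that magnitude can have, which is consistent with the density discrepancy being pure roundoff.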

@zingale
Member

zingale commented Oct 15, 2016

good -- forgot that we are doing the exponentiation there.

@adam-m-jcbs
Member Author

Some sleuthing indicates that xn_zone contains junk on the GPU. With a print statement added to main.f90, running

make COMP=PGI  NDEBUG=t   -j6
./main.Linux.PGI.exe inputs_3alpha.BS

indicates that all xn_zone values are bounded by 0 <= xn_zone <= 1.0, as they should be. However,

make COMP=PGI ACC=t NDEBUG=t   -j6
./main.Linux.PGI.acc.exe inputs_3alpha.BS.ACC

indicates bad output, such as

...
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             4           13    1584893192.466650     
 j, kk, xn_zone:             1           15   1.1651353297957938E-004
 j, kk, xn_zone:             1           15   1.1651353297957938E-004
...

I'll continue investigating the origin of this, but wanted to note it in the issue thread.
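The bound 0 <= xn_zone <= 1.0 lends itself to an automated check rather than reading printed output by eye. A minimal Python sketch follows; the function name `out_of_range` is hypothetical, and the two sample values are copied from the output above.

```python
def out_of_range(values, lo=0.0, hi=1.0):
    """Return (index, value) pairs for entries outside [lo, hi].

    NaN fails every comparison, so NaN entries are flagged as well.
    """
    return [(i, v) for i, v in enumerate(values) if not (lo <= v <= hi)]


# Two xn_zone values taken from the GPU output above.
samples = [1.1651353297957938e-04, 1584893192.466650]
bad = out_of_range(samples)
for index, value in bad:
    print(f"entry {index} is not a valid mass fraction: {value!r}")
```

A check like this flags the 1584893192.466650 entries immediately while leaving the small but valid values alone.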

@adam-m-jcbs
Member Author

A quick note for the record: Max and I looked into this in depth, and the issue appears to come from two things: 1) we're using pf on the GPU without ever having it in a data statement (it seems PGI should have complained), and 2) pf has a Fortran character array (not supported by PGI on the GPU) and a bound procedure (also not something I would expect to work on the GPU, though we don't actually try to use the procedure or the character array). It's not clear how, but using this type on the GPU seems to corrupt memory, which may be why xn_zone contains garbage. Will look into this more tomorrow.

@zingale
Member

zingale commented Oct 19, 2016

oh fun.

@adam-m-jcbs
Member Author

adam-m-jcbs commented Oct 28, 2016

After the code was changed to not use pf, the error appears to have gone away. Comparing the GPU and CPU output now yields

fcompare react_ignition_test_react.VBDF react_ignition_test_react.VBDF.ACC/

            variable name            absolute error            relative error
                                        (||A - B||)         (||A - B||/||A||)
 ----------------------------------------------------------------------
 level =  1
 density                            0.000000000               0.000000000    
 temperature                        0.000000000               0.000000000    
 Xnew_carbon-12                    0.1465605415E-10          0.4396816244E-10
 Xnew_oxygen-16                     0.000000000               0.000000000    
 Xnew_magnesium-24                 0.1465605415E-10          0.4396775531E-10
 Xold_carbon-12                     0.000000000               0.000000000    
 Xold_oxygen-16                     0.000000000               0.000000000    
 Xold_magnesium-24                  0.000000000               0.000000000    
 wdot_carbon-12                    0.1465605415E-09          0.4748322189E-05
 wdot_oxygen-16                     0.000000000               0.000000000    
 wdot_magnesium-24                 0.1465605415E-09          0.4748322189E-05
 rho_Hnuc                          0.1581619200E+18          0.4574365704E-05

Relative errors are at most about 5e-6 between CPU and GPU.
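The "at most about 5e-6" figure can be reproduced from the table text itself. The Python sketch below parses the rows shown above and finds the largest relative error. The function name `parse_rows` and the 1e-5 tolerance are hypothetical, chosen only for illustration, and the parser assumes each data row is exactly a name followed by two numbers.

```python
TABLE = """\
 level =  1
 density                            0.000000000               0.000000000
 temperature                        0.000000000               0.000000000
 Xnew_carbon-12                    0.1465605415E-10          0.4396816244E-10
 Xnew_oxygen-16                     0.000000000               0.000000000
 Xnew_magnesium-24                 0.1465605415E-10          0.4396775531E-10
 Xold_carbon-12                     0.000000000               0.000000000
 Xold_oxygen-16                     0.000000000               0.000000000
 Xold_magnesium-24                  0.000000000               0.000000000
 wdot_carbon-12                    0.1465605415E-09          0.4748322189E-05
 wdot_oxygen-16                     0.000000000               0.000000000
 wdot_magnesium-24                 0.1465605415E-09          0.4748322189E-05
 rho_Hnuc                          0.1581619200E+18          0.4574365704E-05
"""


def parse_rows(text):
    """Map variable name -> (absolute error, relative error).

    Lines that are not 'name number number' (headers, 'level = 1') are skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, abs_err, rel_err = parts
        try:
            rows[name] = (float(abs_err), float(rel_err))
        except ValueError:
            continue  # e.g. the "level = 1" line
    return rows


rows = parse_rows(TABLE)
worst = max(rows, key=lambda name: rows[name][1])
TOLERANCE = 1.0e-5  # hypothetical acceptance threshold, not from the thread
print(worst, rows[worst][1], rows[worst][1] <= TOLERANCE)
```

Note that rho_Hnuc has a very large absolute error but a relative error in line with the wdot rows, so a threshold on the relative column is the more meaningful test here.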
