Substitution of NAG libraries #1

lazersos · 2017-07-20T14:22:25Z

The NAG libraries often cause problems with portability of code. NAG is not freely available. Simple changes in release number (R23 vs. R24) can break codes. It is proposed that we fork a branch of SPEC where the NAG libraries will be replaced one by one. Once this code has been sufficiently tested it can be merged with the master branch. Where possible routines will be replaced by BLAS or LAPACK libraries as highly optimized versions of these libraries are available and we can free include unoptimized versions with SPEC so that at least basic functionality is always possible.

lazersos · 2017-07-20T17:56:02Z

I have had a brief discussion with Josh Breslau, and he and I might be able to eliminate the NAG routines in SPEC.
Let us construct a list of routines which need to be replaced as soon as possible, and those which can wait. For example, the routine
http://www.nag.co.uk/numeric/FL/manual19/pdf/F02/f02ebf_fl19.pdf
is only used to compute the eigenvalues of the hessian matrix as a diagnostic, and it is hardly ever used. So this can just be commented out for now.
Also, I believe that Joaquim and I both have access to some NAG library. So, it is only the routines that both of us do not have access to that are the immediate problem.
Joaquim: can you identify which NAG routines are causing the problem?
Josh and I will replace routines, temporarily comment out routines that are not urgent and so on until we get rid of NAG completely.
Regards, Stuart

@jloizu can you identify which NAG routines are causing the problem?

lazersos · 2017-07-20T18:24:27Z

Here is an un-authoritative list of subroutines and the NAG routines inside them. Clearly we have our work cut out for us. I'm pushing a branch to the repo today for replacement of the NAG routines.
casing.h: D01EAF (replaced with DCUHRE, what I use for virtual casing in the absence of NAG)
dforce.h: F07ADF (replaced: This is just Lapack DGETRF)
dforce.h: F07AJF (replaced: This is just Lapack DGETRI)
dforce.h: F01ADF (replaced: This is just Lapack DPOTRI with 'L' option)
hesian.h: F02EBF (replaced: DGEEV from Lapack)
hesian.h: F04ATF
hesian.h: F03AAF
jo00aa.h: D01BCF (replaced: CDGQF from IQPACK)
ma02aa.h: C05PBF
ma02aa.h: C05PCF
ma02aa.h: E04UEF
ma02aa.h: E04UFF
ma02aa.h: C05PBF
mp00ac.h: F04AEF
mp00ac.h: F04ABF
newton.h: C05NDF (replaced with hybrd from MINPACK)
newton.h: C05PDF (replaced with hybrj from MINPACK)
numrec.h: C06FUF
numrec.h: C06FAF
numrec.h: C06GBF
numrec.h: C06EBF
pc00aa.h: E04DGF
pc00aa.h: E04DKF
pc00aa.h: E04DGF
pp00ab.h: D02BJF
preset.h: C06FUF
preset.h: D01BCF
tr00ab.h: F04ATF
tr00ab.h: F04AEF
tr00ab.h: F04JGF
wa00aa.h: C05NBF
wa00aa.h: D03EAF

zhucaoxiang · 2017-07-21T02:08:20Z

I created a branch no_nag and replaced C05NDF and C05PDF in newton.h with hybrd and hybrj from MINPACK.

I can compile it without any errors. Further debugs need to compare the two versions. I will talk with Stuart tomorrow.

There are a lot of Fortran source code here. We can replace NAG subroutines with corresponding source codes.

lazersos · 2017-07-21T07:16:09Z

@StellaratorCoils I've also done similar on a branch called NAG_REPLACE. Which I've pushed. Specifically the routines in casing and dforce. We should merge and choose one branch name to develop under.

lazersos · 2017-07-24T18:59:25Z

The routine C06FUF is doing an FFT. Usually if you want to do FFT's with free software, I'd suggest going with FFTW which is available on the PPPL cluster. I would suspect IPP has this as well since the GENE code also takes advantage of this library. FFTW is VERY fast compared to other libraries so I don't really see any downside to using it. Thoughts?

jbreslau · 2017-07-24T19:09:55Z

I'd be happy to use FFTW for this. Would you recommend I start a new branch to make the change, or work within an existing one?

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Mon, Jul 24, 2017 at 2:59 PM, Samuel Lazerson ***@***.***> wrote: The routine C06FUF is doing an FFT. Usually if you want to do FFT's with free software, I'd suggest going with FFTW <http://www.fftw.org> which is available on the PPPL cluster. I would suspect IPP has this as well since the GENE code also takes advantage of this library. FFTW is VERY fast compared to other libraries so I don't really see any downside to using it. Thoughts? — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620BFk2feeNrU_SFLFDWS742P97NSlks5sROmNgaJpZM4OeLaa> .

lazersos · 2017-07-24T19:23:17Z

@jbreslau Go ahead and work in the NAG_REPLACE branch. I think preset.h is the first place to attempt to implement it. For sanity sake I'd probably use the legacy interface.

jbreslau · 2017-07-24T19:47:00Z

I'm getting build errors for this branch because mpif90 can't find the hdf5 module (I loaded the default hdf5-serial in the intel environment). What module environment do you use to build the code? Thanks, -Josh

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Mon, Jul 24, 2017 at 3:23 PM, Samuel Lazerson ***@***.***> wrote: @jbreslau <https://github.com/jbreslau> Go ahead and work in the NAG_REPLACE branch. I think preset.h is the first place to attempt to implement it. For sanity sake I'd probably use the legacy interface. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620AD7vxxEVGScaKg1KD9tss3QsfQtks5sRO8lgaJpZM4OeLaa> .

jbreslau · 2017-07-25T21:01:19Z

I've committed a revision to the NAG_REPLACE branch that replaces the C06FUF calls in numrec.h with the fftw/3.3.4 equivalents. I haven't completely eliminated the NAG call (it's still in preset) because I first need to make sure the trig coefficients computed there aren't used anywhere else. The revised version runs slightly faster and produces what appear to me to be very slightly different results on the test case l2stell.sp. Someone with more experience with SPEC might want to take a look at the output and verify that the difference is acceptable...

jloizu · 2017-07-25T21:32:00Z

I am currently writing routines to make precise (quantitative) tests to, for example, check that two outputs are exactly the same in the case they should be the same, or to give a degree of reproducibility in the case where small differences are expected to arise. This should take me a few more days. I will keep you up to date.

jbreslau · 2017-07-26T14:21:56Z

I have finished removing all references to C06FUF from my working copy of NAG_REPLACE, but the system won't let me commit the changes; I get this message:

% svn commit -m '...'
...
Sending bnorml.h
Sending coords.h
Sending curent.h
Sending dforce.h
svn: E160024: Commit failed (details follow):
svn: E160024: File 'dforce.h' is out of date; try updating
svn: E160024: resource out of date; try updating

I have tried updating, but the message persists. Does anyone know how to get around this issue?

zhucaoxiang · 2017-07-26T14:25:28Z

@jbreslau You may need to use git rather than svn. Something like

git status
git commit -am "some update notes"
git pull origin NAG_REPLACE
git push origin NAG_REPLACE

jbreslau · 2017-07-26T14:28:51Z

I can't seem to use git commands for my working version; it isn't recognized as a git repository.

lazersos · 2017-07-26T14:37:46Z

@jbreslau You need to module load git for it to work properly. How did you get the code in the first place?

jbreslau · 2017-07-26T14:45:37Z

I believe I followed the instructions on Github for accessing repositories using svn.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Wed, Jul 26, 2017 at 10:37 AM, Samuel Lazerson ***@***.***> wrote: @jbreslau <https://github.com/jbreslau> You need to module load git for it to work properly. How did you get the code in the first place? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620BEGcVPbU7eO3pKkl98G5R1OZvA2ks5sR086gaJpZM4OeLaa> .

jbreslau · 2017-07-26T15:08:50Z

I've found a workaround and commit the changes to the NAG_REPLACE branch.

It looks like there's just a handful of C06*** NAG routines remaining, all in numrec.h subroutines (slowft and invft) that never get called. Do these NAG calls need to be replaced, or can we just delete (or at least comment out) the subroutines?

lazersos · 2017-07-28T07:42:43Z

@jbreslau For the intel compiler I did have to module load fftw/3.3.4 as the fftw module was not .f03 compatible.

The code runs for the l2stel example but for some reason the iteration numbers and timing isn't being displayed correctly.

ellis003:5084 mpirun -np 2 ~/bin/xspec l2stell
xspech :            : 
       :  compiled  : date    = Fri Jul 28 03:34:36 EDT 2017 ; 
       :            : dir     = /u/slazerso/src/SPEC ; 
       :            : macros  = macros ; 
       :            : f90     = /usr/pppl/intel/11.0/081/bin/intel64/ifort ; 
       :            : flags   =  -r8 -mcmodel=large -O3 -m64 -unroll0 -fno-alias
  -ip -traceback  ; 
xspech :            : 
xspech :       0.00 : begin execution ; ncpu=  2 ; calling global:readin ;
readin :            : 
readin :       0.00 : date=2017/07/28 , 03:38:46 ; machine precision= 1.11E-16 ; vsmall= 1.11E-14 ; small= 1.11E-12 ;
readin :            : 
readin :       0.00 : ext = l2stell                                                                                             
readin :            : 
readin :            : 
readin :       0.00 : Igeometry=  3 ; Istellsym=  1 ;
readin :            : Lfreebound=  0 ; phiedge=  2.000000000000000E+00 ; curtor=  1.038123580200000E-09 ; curpol=  0.000000000000000E+00 ;
readin :            : gamma=  0.000000000000000E+00 ;
readin :            : Nfp=  5 ; Nvol=  2 ; Mvol=  2 ; Mpol=  4 ; Ntor=  4 ;
readin :            : pscale=  1.00000E-03 ; Ladiabatic= 0 ; Lconstraint=  1 ; mupf: tol,its=  1.00E-12 , 128 ;
readin :            : Lrad =   4,  4,
readin :            : 
readin :       0.00 : Linitialize=  0 ; Lzerovac= 0 ; Ndiscrete= 2 ;
readin :            : Nquad=  -1 ; iMpol=  -4 ; iNtor=  -4 ;
readin :            : Lsparse= 0 ; Lsvdiota= 0 ; imethod= 3 ; iorder= 2 ; iprecon= 1 ; iotatol= -1.00000E+00 ;
readin :            : Lextrap= 0 ; Mregular= -1 ;
readin :            : 
readin :       0.00 : LBeltrami= 4 ; Linitgues= 1 ; Lposdef= 0 ;
readin :            : 
readin :       0.00 : Lfindzero= 2 ;
readin :            : escale=  0.00000E+00 ; opsilon=  1.00000E+00 ; pcondense=  4.000 ; epsilon=  1.00000E+00 ; wpoloidal= 1.0000 ; upsilon=  1.00000E+00 ;
readin :            : forcetol=  1.00000E-12 ; c05xmax=  1.00000E-06 ; c05xtol=  1.00000E-12 ; c05factor=  1.00000E-04 ; LreadGF= F ; 
readin :            : mfreeits=   0 ; gBntol=  1.00000E-06 ; gBnbld=  6.66000E-01 ;
readin :            : vcasingeps=  1.00000E-12 ; vcasingtol=  1.00000E-08 ; vcasingits=  8 ; vcasingper=  1 ;
readin :            : 
readin :       0.00 : odetol=  1.00E-07 ; nPpts=     0 ;
readin :            : LHevalues= F ; LHevectors= F ; Lperturbed= 0 ; dpp= -1 ; dqq= -1 ; Lcheck=  0 ; Ltiming= F ;
readin :            : 
readin :            : myid=  0 ; Rscale= 1.000000000000000E+01 ;
preset :            : myid=  0 ; Mrad=  4 : Lrad=  4  4
preset :       0.03 : LBsequad= F , LBnewton= F , LBlinear= T ;
preset :            : 
preset :       0.03 : Nquad=  -1 ; mn=   41 ; NGdof=    81 ; NAdof=   497,   534,
preset :            : 
preset :       0.03 : Nt=    32 ; Nz=    32 ; Ntz=     1024 ;
newton :       1.87 :         0  0 ; |f|= 9.00612E-03 ; time=      1.78s ; log|BB|e= -4.66
newton :            :              ;                                     ; log|II|o= -2.63
newton :      50.14 :         0  0 ; |f|= 9.00612E-03 ; time=**********s ; log|BB|e= -4.66
newton :            :              ;                                     ; log|II|o= -2.63
newton :      51.95 :         0  0 ; |f|= 8.81971E-03 ; time=**********s ; log|BB|e= -4.74
newton :            :              ;                                     ; log|II|o= -2.64
newton :      53.80 :         0  0 ; |f|= 8.44867E-03 ; time=**********s ; log|BB|e= -4.97
newton :            :              ;                                     ; log|II|o= -2.67
newton :      55.88 :         0  0 ; |f|= 7.71444E-03 ; time=**********s ; log|BB|e= -4.64
newton :            :              ;                                     ; log|II|o= -2.72
newton :      57.54 :         0  0 ; |f|= 6.28513E-03 ; time=**********s ; log|BB|e= -4.16
newton :            :              ;                                     ; log|II|o= -2.82
newton :      59.08 :         0  0 ; |f|= 3.70632E-03 ; time=**********s ; log|BB|e= -3.86
newton :            :              ;                                     ; log|II|o= -2.95
newton :      60.63 :         0  0 ; |f|= 1.61237E-03 ; time=**********s ; log|BB|e= -3.97
newton :            :              ;                                     ; log|II|o= -3.30
newton :      62.73 :         0  0 ; |f|= 9.16191E-04 ; time=**********s ; log|BB|e= -4.61
newton :            :              ;                                     ; log|II|o= -3.34
newton :      65.37 :         0  0 ; |f|= 2.18780E-03 ; time=**********s ; log|BB|e= -4.31
newton :            :              ;                                     ; log|II|o= -2.90
newton :      67.38 :         0  0 ; |f|= 5.01981E-04 ; time=**********s ; log|BB|e= -5.02
newton :            :              ;                                     ; log|II|o= -3.49
newton :      69.62 :         0  0 ; |f|= 4.87692E-04 ; time=**********s ; log|BB|e= -4.77
newton :            :              ;                                     ; log|II|o= -3.56
newton :      71.14 :         0  0 ; |f|= 2.18769E-04 ; time=**********s ; log|BB|e= -4.98
newton :            :              ;                                     ; log|II|o= -3.90
newton :      72.88 :         0  0 ; |f|= 1.03312E-04 ; time=**********s ; log|BB|e= -5.27
newton :            :              ;                                     ; log|II|o= -4.20
newton :      74.66 :         0  0 ; |f|= 1.54223E-05 ; time=**********s ; log|BB|e= -5.96
newton :            :              ;                                     ; log|II|o= -4.94
newton :      76.87 :         0  0 ; |f|= 1.53018E-05 ; time=**********s ; log|BB|e= -6.06
newton :            :              ;                                     ; log|II|o= -5.06
newton :      79.03 :         0  0 ; |f|= 6.43484E-06 ; time=**********s ; log|BB|e= -6.28
newton :            :              ;                                     ; log|II|o= -5.31
newton :      80.89 :         0  0 ; |f|= 1.05709E-06 ; time=**********s ; log|BB|e= -7.06
newton :            :              ;                                     ; log|II|o= -6.13
newton :      82.87 :         0  0 ; |f|= 1.02794E-07 ; time=**********s ; log|BB|e= -8.34
newton :            :              ;                                     ; log|II|o= -7.05
newton :      84.81 :         0  0 ; |f|= 2.57378E-08 ; time=**********s ; log|BB|e= -9.03
newton :            :              ;                                     ; log|II|o= -7.65
newton :      86.73 :         0  0 ; |f|= 3.96645E-09 ; time=**********s ; log|BB|e=-10.15
newton :            :              ;                                     ; log|II|o= -8.46
newton :      88.25 :         0  0 ; |f|= 3.35880E-10 ; time=**********s ; log|BB|e=-10.73
newton :            :              ;                                     ; log|II|o= -9.52
newton :      89.93 :         0  0 ; |f|= 1.49578E-11 ; time=**********s ; log|BB|e=-11.78
newton :            :              ;                                     ; log|II|o=-10.87
newton :      91.38 :         0  0 ; |f|= 1.74249E-12 ; time=**********s ; log|BB|e=-12.95
newton :            :              ;                                     ; log|II|o=-11.80
newton :      92.64 :         0  0 ; |f|= 2.49069E-13 ; time=**********s ; log|BB|e=-13.99
newton :            :              ;                                     ; log|II|o=-12.63
newton :      93.92 :         0  0 ; |f|= 4.02880E-14 ; time=**********s ; log|BB|e=-14.53
newton :            :              ;                                     ; log|II|o=-13.38
newton :      95.11 :         0  0 ; |f|= 6.12860E-15 ; time=**********s ; log|BB|e=-15.32
newton :            :              ;                                     ; log|II|o=-14.25
newton :            :
newton :      95.19 : finished ; input error    ; ic05p*f= 1 ; its=      0 ,   0 ;
xspech :      96.33 : #freeits=  0 ; |f|= 6.12860E-15 ; time=      1.08s ; log|BB|e=-15.32
xspech :            :              ;                                     ; log|II|o=-14.25
xspech :            :
xspech :      96.62 : myid=  0 : time=    1.61m =   0.03h =  0.00d ;
ending :            : 
ending :      96.62 : myid=  0 ; completion ; time=     96.62s =     1.61m =   0.03h =  0.00d ; date= 2017/07/28 ; time= 03:40:22 ; ext = l2stell                                                     
ending :            :

jbreslau · 2017-07-28T12:35:36Z

Yes, I debugged the changes using fftw/3.3.4 under intel. I'm aware of the formatting bug; I'll fix it today if I get a chance.

jbreslau · 2017-07-28T13:35:31Z

I've looked into the formatting issue. The offending lines are being written by subroutine fcn2() in newton.h. The way that subroutine is written, the values of nFcalls, nDcalls, and lastcpu will always be undefined (and apparently initialized to zero) when the write statement is reached. I think this should be fixed by someone who knows what this subroutine is supposed to be doing.

SRHudson · 2017-08-01T16:53:09Z

Is this formatting issue still a problem? Please let me know what branch.

SRHudson · 2017-08-01T16:58:37Z

Can anyone find a routine for solving a system of real symmetric linear equations? The routine mp00ac calls the NAG routine F04AEF, which does not exploit symmetry. Note that the matrix is NOT positive definite. Otherwise I could use NAG F04ABF.

At present, most of the computation time (according to my way of computing the timing) is taken by mp00ac. I would think that a routine that exploits symmetry would be faster.

Josh: could you, when you have time, please make it a priority to replace F04AEF with a routine for solving real, symmetric linear equations with multiple right hand sides?

jbreslau · 2017-08-01T16:59:43Z

It's not a formatting issue. It's a bug in fcn2. The time elapsed is not computed correctly because the variable storing the starting time is never initialized.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Tue, Aug 1, 2017 at 12:53 PM, Dr. S.R. Hudson ***@***.***> wrote: Is this formatting issue still a problem? Please let me know what branch. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620GMd-8LVo54nP1KnfxOHWFpoP4Foks5sT1f2gaJpZM4OeLaa> .

jbreslau · 2017-08-01T17:02:15Z

There should be something in LAPACK. I'll check it out.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Tue, Aug 1, 2017 at 12:58 PM, Dr. S.R. Hudson ***@***.***> wrote: Can anyone find a routine for solving a system of real symmetric linear equations? The routine mp00ac calls the NAG routine F04AEF, which does not exploit symmetry. Note that the matrix is NOT positive definite. Otherwise I could use NAG F04ABF. At present, most of the computation time (according to my way of computing the timing) is taken by mp00ac. I would think that a routine that exploits symmetry would be faster. Josh: could you, when you have time, please make it a priority to replace F04AEF with a routine for solving real, symmetric linear equations with multiple right hand sides? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620IXD2kOYd-Qkwazx0ZwM_WwNErNoks5sT1k9gaJpZM4OeLaa> .

jbreslau · 2017-08-01T20:23:25Z

I just commited version 109, which replaces both calls to F04AEF from mp00ac.h in the NAG_REPLACE branch with calls to LAPACK routine DSYSVX, the expert driver for iterative linear solves of systems with general real symmetric matrices. For the l2stell test case, this speeds up execution by over 30%.

There is an equivalent LAPACK routine for symmetric positive definite matrices, which should probably be substituted for F04ABF in the same routine in the interest of de-NAG-ification, but I wouldn't expect much speedup from that.

SRHudson · 2017-08-01T21:30:19Z

I am having trouble checking out NAG_REPLACE. I get the message
"Your branch is ahead of 'origin/NAG_REPLACE' by 15 commits.
(use "git push" to publish your local commits)"
and I don't see any of the changes that have been made. I have made no changes, at least not intentionally. Perhaps I need to delete my local branch and re-checkout NAG_REPLACE?

zhucaoxiang · 2017-08-01T22:10:51Z

I think it's because that you made a lot changes recently and some of them were committed to your local branch NAG_REPLACE. You can try this to compare your local branch with the public branch. git diff NAG_REPLACE origin/NAG_REPLACE Then you will see the differences and decide whether you want to delete your local stuffs. If you want to, you can try git pull --hard origin/NAG_REPLACE

…

On 1 Aug 2017, at 5:30 PM, Dr. S.R. Hudson ***@***.***> wrote: I am having trouble checking out NAG_REPLACE. I get the message "Your branch is ahead of 'origin/NAG_REPLACE' by 15 commits. (use "git push" to publish your local commits)" and I don't see any of the changes that have been made. I have made no changes, at least not intentionally. Perhaps I need to delete my local branch and re-checkout NAG_REPLACE? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

SRHudson · 2017-08-02T18:24:33Z

I THINK I HAVE MESSED UP NAG_REPLACE!!
I tried
git pull origin NAG_REPLACE
and it seems to have performed a "merge" and I can't find any of the work that Caoxiang and Josh have been doing. I hope I didn't delete anything.

zhucaoxiang · 2017-08-02T18:26:11Z

I can have a look. And even it's messed, we can simply revert. Don't worry!

SRHudson · 2017-08-02T19:18:14Z

Problem solved (thanks to Caoxiang and Josh).

SRHudson · 2017-08-02T21:02:55Z

I just tried to compile xspec on NAG_REPLACE, and I get the following error:
global.F90(6665): error #5102: Cannot open include file 'fftw3.f03'
Please advise . . .

zhucaoxiang · 2017-08-02T21:04:05Z

you could type

module load fftw/3.3.4

SRHudson · 2017-08-02T21:15:24Z

Ok, module load fftw/3.3.4 works. What about
undefined reference to fftw_execute_dft' undefined reference to fftw_plan_dft_2d'
Note that I only get these errors when compiling ">make dspec".

SRHudson · 2017-08-02T22:20:40Z

Josh: I fixed the counters and timing variables in newton, fcn1 and fcn2. Please pull NAG_REPLACE.

zhucaoxiang · 2017-08-03T00:13:33Z

Everything works fine for me when I typed make dspec

jbreslau · 2017-08-03T13:40:02Z

Thanks, that fixed the problem. I pushed a fix to the Makefile to link the fftw library when building dspec; it should work now.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Wed, Aug 2, 2017 at 6:20 PM, Dr. S.R. Hudson ***@***.***> wrote: Josh: I fixed the counters and timing variables in newton, fcn1 and fcn2. Please pull NAG_REPLACE. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620G-fB_wDk1f6jL4fbDvGOCNbj0Lrks5sUPY4gaJpZM4OeLaa> .

jbreslau · 2017-08-03T14:14:11Z

I notice that there are some more f04aef calls in tr00ab.h. Can any assumptions be made about the symmetry or lack thereof of the matrix they are factoring? The linear solver could be replaced with DGESVX, DSYSVX, or DPOSVX depending on its properties...

jbreslau · 2017-08-03T15:45:46Z

Also, do we have a test case where Lposdef.eq.1, so I can test the replacement of F04ABF in mp00ac.h with the equivalent LAPACK call?

SRHudson · 2017-08-03T17:03:38Z

It seems that I can construct xspec and dspec on NAG_REPLACE.
I will investigate tr00ab but it seems that there is no obvious reason why the relevant matrix will be symmetric.
Perhaps we can ignore Lposdef=1. Originally, I enforced the boundary conditions that the normal magnetic field be zero on the interfaces directly, and in this case the matrix were very frequently (but not always) positive definite. However, the source was horrendous. Later, I introduced the constraints using Lagrange multipliers, see Eqn(14) in
http://w3.pppl.gov/~shudson/Spec/matrix.pdf
The source is a lot lot cleaner, and this is valuable. As often happens when using Lagrange multipliers, the positive-definiteness was destroyed, and more degrees of freedom are introduced (namely the Lagrange multipliers), so the code is actually a little slower as a result. For now, I suggest eliminating the Lposdef=1 option to keep the code short. I don't this this option will be used in the foreseeable future.

SRHudson · 2017-08-03T21:12:51Z

Josh: I pushed my change to tr00ab that comments out (using ifdef LSPARSE) all the code related to the real-space calculation of the transformation to straight fieldline coordinates (which provides the rotational-transform). Please pull . . .
Also, I am getting lots of errors with the gaussian abscissae construction . . . Please check and/or let's discuss tomorrow.

jbreslau · 2017-08-03T21:20:52Z

Please point me to the input file that's giving you problems with the gaussian abscissae, and I'll try to debug.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Thu, Aug 3, 2017 at 5:12 PM, Dr. S.R. Hudson ***@***.***> wrote: Josh: I pushed my change to tr00ab that comments out (using ifdef LSPARSE) all the code related to the real-space calculation of the transformation to straight fieldline coordinates (which provides the rotational-transform). Please pull . . . Also, I am getting lots of errors with the gaussian abscissae construction . . . Please check and/or let's discuss tomorrow. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620MgWDxIgrmN_xgh_VUw7-SrCANkCks5sUjfTgaJpZM4OeLaa> .

SRHudson · 2017-08-10T12:56:58Z

Josh: please replace the eigenvalue / eigenvector routine in hesian, F02EBF, at your earliest convenience. After doing so, Joaquim should merge the NAG_REPLACE branch with master so that we can use the same code to examine the eigenvalues of the "hessian".

jbreslau · 2017-08-10T13:35:38Z

I had already started looking into this, but found that the routine was not called for the test case. Am I right in thinking I just need to set Lcheck to 5 in the input file to activate it?

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Thu, Aug 10, 2017 at 8:56 AM, Dr. S.R. Hudson ***@***.***> wrote: Josh: please replace the eigenvalue / eigenvector routine in hesian, F02EBF, at your earliest convenience. After doing so, Joaquim should merge the NAG_REPLACE branch with master so that we can use the same code to examine the eigenvalues of the "hessian". — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620JNrg-haKwuVtTy5oTvLENZXQixSks5sWv4agaJpZM4OeLaa> .

SRHudson · 2017-08-10T13:39:54Z

No, it is not to do with Lcheck. Lcheck=5 will test the analytic construction of the hessian with a finite difference estimate.

To call F02EBF, choose either LHevalues = T or LHevectors = T.

jbreslau · 2017-08-10T13:43:30Z

It looks like Sam already replaced F02EBF in the NAG_REPLACE branch with the LAPACK routine DGEEV on July 24th.

…

______________________________________________________________________ Joshua Breslau Princeton Plasma Physics Laboratory Tel.: (609) 243-2677 Office: A-129 E-mail: jbreslau@pppl.gov Princeton, NJ 08543 Fax: (609) 243-2662

On Thu, Aug 10, 2017 at 9:39 AM, Dr. S.R. Hudson ***@***.***> wrote: No, it is not to do with Lcheck. Lcheck=5 will test the analytic construction of the hessian with a finite difference estimate. To call F02EBF, choose either LHevalues = T or LHevectors = T. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/Ac620GmvntWvrAR72gEN-6FkxE_ggC7jks5sWwgqgaJpZM4OeLaa> .

SRHudson · 2017-08-10T13:45:55Z

That was fast. Well done Josh! I knew I could depend on you to get this done quickly!

Joaquim: would it be appropriate to merge NAG_REPLACE with master?

jloizu · 2017-08-10T14:17:50Z

I have not worked on that branch (NAG_REPLACE). This means, if you guys think that it is a stable branch, and that it is a good moment to merge it into master, then make a "pull and merge request". Then, I will pull that branch, and make some tests to be sure that it compiles, runs testcases, and produces "the same" results as the current master branch. Only then I would merge it.

I can do this today, once you open the pull request (which is essentially a sign that you are waiting for people to pull, test, and merge).

jloizu · 2017-08-10T17:17:42Z

OK, I had to modify the Makefile to be able to compile in IPP (so far I could only make it work on the local workstation, somehow the fftw gives me problems on the cluster - I need to explore this further).

I tested the l2stell.sp case with both branches (master and NAG_REPLACE). Applying my comparison routine to both outputs, I get

Dabs : 9.5923e-14
Drel : 4.2528e-09
Estimate for df : 2.1636e-18
The two outputs can be considered the same

and thus I consider that in principle the two branches can be merged after I solve the compilation issue in the IPP cluster and the conflicts between the two branches are resolved (has to be done by hand, but that should be quite easy).

I am going tomorrow morning on holiday, so I am not sure I can solve the compilation issue. Since the master versus NAG_REPLACE comparison was positive, I think you can go on if you whish to merge and not wait for 3 weeks.

Just one comment: the l2stell.sp run becomes slower with the NAG_REPLACE branch. I used the timing option of @SRHudson and got total of 23sec (master) versus 35 (nag), the difference being in ~10 extra seconds spent in ma00aa.

SRHudson · 2017-08-10T17:26:03Z

I made what i thought were several speed improvements to ma00aa, which later was merged with master. So it makes sense that master is faster than the NAG_REPLACE branch.
This might make it tricky to merge . . . ? Can git determine which files and edits are most recent?

jloizu · 2017-08-10T17:31:55Z

I see what you mean. It may be more tricky than I thought, indeed. Unfortunately in this case GIT says that it cannot do it automatically. We have to do it by hand, I think. @zhucaoxiang , @jbreslau , @lazersos, what do you think?

jloizu · 2017-08-10T17:39:40Z

I just saw (in the pull-request analysis) that ma00aa is NOT among the "conflicting files". So we only need to take care of the other files.

lazersos added enhancement help wanted labels Jul 20, 2017

lazersos assigned lazersos and jloizu Jul 20, 2017

lazersos assigned cses and jbreslau Jul 21, 2017

jloizu mentioned this issue Jul 24, 2017

Compiling and running SPEC in IPP #2

Closed

jloizu mentioned this issue Aug 10, 2017

Nag replace #21

Merged

jloizu closed this as completed Nov 29, 2017

jloizu mentioned this issue Jan 18, 2018

Execution error with dspec #29

Closed

Substitution of NAG libraries #1

Substitution of NAG libraries #1

Comments

lazersos commented Jul 20, 2017

lazersos commented Jul 20, 2017 • edited Loading

lazersos commented Jul 20, 2017 • edited Loading

zhucaoxiang commented Jul 21, 2017 • edited Loading

lazersos commented Jul 21, 2017

lazersos commented Jul 24, 2017

jbreslau commented Jul 24, 2017 via email

lazersos commented Jul 24, 2017

jbreslau commented Jul 24, 2017 via email

jbreslau commented Jul 25, 2017 • edited Loading

jloizu commented Jul 25, 2017

jbreslau commented Jul 26, 2017

zhucaoxiang commented Jul 26, 2017

jbreslau commented Jul 26, 2017

lazersos commented Jul 26, 2017

jbreslau commented Jul 26, 2017 via email

jbreslau commented Jul 26, 2017 • edited Loading

lazersos commented Jul 28, 2017

jbreslau commented Jul 28, 2017

jbreslau commented Jul 28, 2017

SRHudson commented Aug 1, 2017

SRHudson commented Aug 1, 2017

jbreslau commented Aug 1, 2017 via email

jbreslau commented Aug 1, 2017 via email

jbreslau commented Aug 1, 2017

SRHudson commented Aug 1, 2017

zhucaoxiang commented Aug 1, 2017 via email

SRHudson commented Aug 2, 2017

zhucaoxiang commented Aug 2, 2017

SRHudson commented Aug 2, 2017

SRHudson commented Aug 2, 2017

zhucaoxiang commented Aug 2, 2017

SRHudson commented Aug 2, 2017

SRHudson commented Aug 2, 2017

zhucaoxiang commented Aug 3, 2017

jbreslau commented Aug 3, 2017 via email

jbreslau commented Aug 3, 2017

jbreslau commented Aug 3, 2017

SRHudson commented Aug 3, 2017

SRHudson commented Aug 3, 2017

jbreslau commented Aug 3, 2017 via email

SRHudson commented Aug 10, 2017

jbreslau commented Aug 10, 2017 via email

SRHudson commented Aug 10, 2017

jbreslau commented Aug 10, 2017 via email

SRHudson commented Aug 10, 2017

jloizu commented Aug 10, 2017

jloizu commented Aug 10, 2017

SRHudson commented Aug 10, 2017

jloizu commented Aug 10, 2017

jloizu commented Aug 10, 2017

lazersos commented Jul 20, 2017 •

edited

Loading

lazersos commented Jul 20, 2017 •

edited

Loading

zhucaoxiang commented Jul 21, 2017 •

edited

Loading

jbreslau commented Jul 25, 2017 •

edited

Loading

jbreslau commented Jul 26, 2017 •

edited

Loading