Verify numerical primitives #129

dev-zero · 2018-12-18T13:28:28Z

Out of interest I was looking into the code of CP2K to check how the L2-norm of vectors is calculated.
I got the 3 equivalent forms sqrt(x(1)*x(1) + ...), sqrt(x(1)**2 + ... and sqrt(dot_product(x,x)) but not many usages of the F2008 built-in norm2.
Being curious I looked at the machine code a typical compiler generates (see https://godbolt.org/z/c5H1Ih for a comparison of the different variants) and noticed that the built-in norm2 is much more elaborate.

Not being an expert on assembler my guess is that this is the implementation of the standard-optional It is recommended that the processor compute the result without undue overflow or underflow. (see 13.7.123 of the Fortran standard).

A quick search in the DBCSR source code led me to the following (and it is likely not the only place):

dbcsr/src/ops/dbcsr_operations.F

Lines 2125 to 2130 in d03f22f

    
           norm_vector%d%r_sp(col_offset + j - 1) & 
        
              = norm_vector%d%r_sp(col_offset + j - 1) & 
        
                + data_a%d%r2_sp(i, j)**2 
        
           norm_vector%d%r_sp(row_offset + i - 1) & 
        
              = norm_vector%d%r_sp(row_offset + i - 1) & 
        
                + data_a%d%r2_sp(i, j)**2

which seems to me a rather careless way on how to calculate the norm, given that compilers take more care when calculating the norm via an intrinsic and after a quick look into https://www.springer.com/us/book/9780817647056

Am I missing something here?
@hfp maybe you could share some insight?

The text was updated successfully, but these errors were encountered:

hfp · 2018-12-19T09:06:00Z

@dev-zero Fortran built-ins are preferable. I can only find two reason for not using the intrinsics:

Insufficient functionality (e.g., Kahan compensation needed, or low-accuracy approximation sufficient)
Limited portability (e.g., not available in F2008 standard)

For the latter point, we can use __F2008 preprocessor definition (at least somewhat introduced in CP2K's sources). However, this may spread compile-time toggles all over the code if the code-snippet is not wrapped into a (utility-)function. Btw, there is more legacy code for functions such as ERF (at least CP2K implements this among some others) which are meanwhile covered by F-intrinsics. Regarding code style (as pointed out in your example code above), even array expressions may be preferred sometimes over element-wise loop enumeration.

For the norm2 alike, we did some pass over CP2K's source some (longer) time ago and got rid of things like IF (norm2(x) .LE. norm2(y)) THEN (an expression where sqrt is superfluous).

dev-zero · 2018-12-19T09:52:49Z

@hfp thank you very much for the details. So it is indeed something which should be done at some point.
For providing wrapper functions: built-ins seem to get inlined anyway, while functions from different compilation units don't unless you use LTO (or the forceinline directives for Intel or Cray). Any guidelines on how to solve this in a portable way?

hfp · 2018-12-19T10:29:14Z

@dev-zero I believe it is all about picking a solution that is not awkward or bulky. Let's see:

( Only solve the inline-issue on a case-by-case basis i.e., if it is a performance problem )
Use Fypp to decide about USE or INCLUDE (the latter may duplicate symbols but worth an experiment)
Develop a collection of wrapped compiler-specific directives and/or attributes e.g., create something that expands to forceinline. May expand to "nothing" if the compiler does not support a suitable directive/attribute.

The latter point likely needs Fypp since directives may not be expanded from FPP-macros (FPP-capabilities seem to be rather limited when compared to C). As a side-note, C99 supports _Pragma (or prior to C99 most compilers provide an intrinsic to achieve the same) which allows to place a pragma (directive in F-speak) using this built-in function (takes a string representing the directive). The latter in turn allows to write preprocessor macros that expand to a directive.

Regarding forceinline, there are two flavours at least with Intel Compiler: (1) decorate a function to be (strongly/forceably) marked as an inline-candidate, (2) decorate the the call-site to inline the (conceptually) called function. The latter is more sophisticated, but the former may be more practical/portable and even sufficient.

Developing a collection of portable i.e., wrapped (compiler-specific) directives or attributes may be the most suitable and handy solution. It can go beyond the scope of inlining functions and also provide other primitives e.g., a directive to unroll a loop a certain number of times.

oschuett · 2018-12-19T10:30:42Z

The NORM2 intrinsic was added in gfortran 4.6. We know from the user survey that basically everybody uses gfortran 4.8 or newer. For ifort most people use the 17. version. So, it seems we can simple start using NORM2.

dev-zero · 2018-12-19T12:23:11Z

@oschuett norm2 was just an example, but for CP2K I just updated the coding conventions with this example and made some changes to match reality (like F2008 compatibility, MPI-3 standard).

oschuett · 2018-12-19T13:19:50Z

Yes, most of Fortran 2008 was already supported in gfortran 4.8. So, thanks for updating the conventions. Should we then also get ride of -D__F2008 ?

alazzaro · 2020-04-29T21:17:41Z

I went to this issue because I'm trying to improve how we evaluate the norms.
It turns out that the case you are mention in

dbcsr/src/ops/dbcsr_operations.F

Lines 2125 to 2130 in d03f22f

    
           norm_vector%d%r_sp(col_offset + j - 1) & 
        
              = norm_vector%d%r_sp(col_offset + j - 1) & 
        
                + data_a%d%r2_sp(i, j)**2 
        
           norm_vector%d%r_sp(row_offset + i - 1) & 
        
              = norm_vector%d%r_sp(row_offset + i - 1) & 
        
                + data_a%d%r2_sp(i, j)**2

cannot be replaced by norm2 since this is a "distribute" norm of the entire matrix (see the mp_sum). As far as I can see, the only place where we do a local norm is here

dbcsr/src/mm/dbcsr_mm_common.F

Line 603 in 34d5165

vals(blk) = REAL(SUM(DATA(bp:bpe)**2), KIND=sp)

However, I did a test and found that the norm is slower than the current optimized code, so I would prefer to leave it as it is.
@dev-zero Giving my conclusion, can we close this issue then?

dev-zero · 2020-04-30T06:51:26Z

@alazzaro thanks for looking into this. If it is ok at this point to not check for under- or overflow, then yes.

alazzaro · 2020-04-30T06:55:46Z

I'm doing some other replacements, see 021482d. Another one is almost done...

alazzaro · 2020-04-30T09:16:49Z

OK, after more performance tests, I found that the best performance comes with the DOT function. So, we use norm2 where performance is not critical (dbcsr_filter) and DOT in the other place. Closing via 021482d and 44032cf

dev-zero added the question Further information is requested label Dec 18, 2018

dev-zero added enhancement New feature or request and removed question Further information is requested labels Dec 19, 2018

oschuett mentioned this issue Dec 21, 2018

Move to Fortran 2008 intrinsic and remove -D__F2008 flag cp2k/cp2k#142

Closed

dev-zero mentioned this issue Jan 16, 2019

switch to F2008 #136

Merged

alazzaro added this to the v3.0 milestone Apr 15, 2020

alazzaro closed this as completed Apr 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verify numerical primitives #129

Verify numerical primitives #129

dev-zero commented Dec 18, 2018

hfp commented Dec 19, 2018

dev-zero commented Dec 19, 2018 •

edited

hfp commented Dec 19, 2018

oschuett commented Dec 19, 2018

dev-zero commented Dec 19, 2018

oschuett commented Dec 19, 2018

alazzaro commented Apr 29, 2020

dev-zero commented Apr 30, 2020

alazzaro commented Apr 30, 2020

alazzaro commented Apr 30, 2020

Verify numerical primitives #129

Verify numerical primitives #129

Comments

dev-zero commented Dec 18, 2018

hfp commented Dec 19, 2018

dev-zero commented Dec 19, 2018 • edited

hfp commented Dec 19, 2018

oschuett commented Dec 19, 2018

dev-zero commented Dec 19, 2018

oschuett commented Dec 19, 2018

alazzaro commented Apr 29, 2020

dev-zero commented Apr 30, 2020

alazzaro commented Apr 30, 2020

alazzaro commented Apr 30, 2020

dev-zero commented Dec 19, 2018 •

edited