Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify numerical primitives #129

Closed
dev-zero opened this issue Dec 18, 2018 · 10 comments
Closed

Verify numerical primitives #129

dev-zero opened this issue Dec 18, 2018 · 10 comments
Labels
enhancement New feature or request
Milestone

Comments

@dev-zero
Copy link
Contributor

Out of interest I was looking into the code of CP2K to check how the L2-norm of vectors is calculated.
I got the 3 equivalent forms sqrt(x(1)*x(1) + ...), sqrt(x(1)**2 + ... and sqrt(dot_product(x,x)) but not many usages of the F2008 built-in norm2.
Being curious I looked at the machine code a typical compiler generates (see https://godbolt.org/z/c5H1Ih for a comparison of the different variants) and noticed that the built-in norm2 is much more elaborate.

Not being an expert on assembler my guess is that this is the implementation of the standard-optional It is recommended that the processor compute the result without undue overflow or underflow. (see 13.7.123 of the Fortran standard).

A quick search in the DBCSR source code led me to the following (and it is likely not the only place):

norm_vector%d%r_sp(col_offset + j - 1) &
= norm_vector%d%r_sp(col_offset + j - 1) &
+ data_a%d%r2_sp(i, j)**2
norm_vector%d%r_sp(row_offset + i - 1) &
= norm_vector%d%r_sp(row_offset + i - 1) &
+ data_a%d%r2_sp(i, j)**2

which seems to me a rather careless way on how to calculate the norm, given that compilers take more care when calculating the norm via an intrinsic and after a quick look into https://www.springer.com/us/book/9780817647056

Am I missing something here?
@hfp maybe you could share some insight?

@dev-zero dev-zero added the question Further information is requested label Dec 18, 2018
@hfp
Copy link
Member

hfp commented Dec 19, 2018

@dev-zero Fortran built-ins are preferable. I can only find two reason for not using the intrinsics:

  • Insufficient functionality (e.g., Kahan compensation needed, or low-accuracy approximation sufficient)
  • Limited portability (e.g., not available in F2008 standard)

For the latter point, we can use __F2008 preprocessor definition (at least somewhat introduced in CP2K's sources). However, this may spread compile-time toggles all over the code if the code-snippet is not wrapped into a (utility-)function. Btw, there is more legacy code for functions such as ERF (at least CP2K implements this among some others) which are meanwhile covered by F-intrinsics. Regarding code style (as pointed out in your example code above), even array expressions may be preferred sometimes over element-wise loop enumeration.

For the norm2 alike, we did some pass over CP2K's source some (longer) time ago and got rid of things like IF (norm2(x) .LE. norm2(y)) THEN (an expression where sqrt is superfluous).

@dev-zero dev-zero added enhancement New feature or request and removed question Further information is requested labels Dec 19, 2018
@dev-zero
Copy link
Contributor Author

dev-zero commented Dec 19, 2018

@hfp thank you very much for the details. So it is indeed something which should be done at some point.
For providing wrapper functions: built-ins seem to get inlined anyway, while functions from different compilation units don't unless you use LTO (or the forceinline directives for Intel or Cray). Any guidelines on how to solve this in a portable way?

@hfp
Copy link
Member

hfp commented Dec 19, 2018

@dev-zero I believe it is all about picking a solution that is not awkward or bulky. Let's see:

  • ( Only solve the inline-issue on a case-by-case basis i.e., if it is a performance problem )
  • Use Fypp to decide about USE or INCLUDE (the latter may duplicate symbols but worth an experiment)
  • Develop a collection of wrapped compiler-specific directives and/or attributes e.g., create something that expands to forceinline. May expand to "nothing" if the compiler does not support a suitable directive/attribute.

The latter point likely needs Fypp since directives may not be expanded from FPP-macros (FPP-capabilities seem to be rather limited when compared to C). As a side-note, C99 supports _Pragma (or prior to C99 most compilers provide an intrinsic to achieve the same) which allows to place a pragma (directive in F-speak) using this built-in function (takes a string representing the directive). The latter in turn allows to write preprocessor macros that expand to a directive.

Regarding forceinline, there are two flavours at least with Intel Compiler: (1) decorate a function to be (strongly/forceably) marked as an inline-candidate, (2) decorate the the call-site to inline the (conceptually) called function. The latter is more sophisticated, but the former may be more practical/portable and even sufficient.

Developing a collection of portable i.e., wrapped (compiler-specific) directives or attributes may be the most suitable and handy solution. It can go beyond the scope of inlining functions and also provide other primitives e.g., a directive to unroll a loop a certain number of times.

@oschuett
Copy link
Member

The NORM2 intrinsic was added in gfortran 4.6. We know from the user survey that basically everybody uses gfortran 4.8 or newer. For ifort most people use the 17. version. So, it seems we can simple start using NORM2.

@dev-zero
Copy link
Contributor Author

@oschuett norm2 was just an example, but for CP2K I just updated the coding conventions with this example and made some changes to match reality (like F2008 compatibility, MPI-3 standard).

@oschuett
Copy link
Member

Yes, most of Fortran 2008 was already supported in gfortran 4.8. So, thanks for updating the conventions. Should we then also get ride of -D__F2008 ?

@alazzaro
Copy link
Member

I went to this issue because I'm trying to improve how we evaluate the norms.
It turns out that the case you are mention in

norm_vector%d%r_sp(col_offset + j - 1) &
= norm_vector%d%r_sp(col_offset + j - 1) &
+ data_a%d%r2_sp(i, j)**2
norm_vector%d%r_sp(row_offset + i - 1) &
= norm_vector%d%r_sp(row_offset + i - 1) &
+ data_a%d%r2_sp(i, j)**2
cannot be replaced by norm2 since this is a "distribute" norm of the entire matrix (see the mp_sum). As far as I can see, the only place where we do a local norm is here
vals(blk) = REAL(SUM(DATA(bp:bpe)**2), KIND=sp)

However, I did a test and found that the norm is slower than the current optimized code, so I would prefer to leave it as it is.
@dev-zero Giving my conclusion, can we close this issue then?

@dev-zero
Copy link
Contributor Author

@alazzaro thanks for looking into this. If it is ok at this point to not check for under- or overflow, then yes.

@alazzaro
Copy link
Member

I'm doing some other replacements, see 021482d. Another one is almost done...

@alazzaro
Copy link
Member

OK, after more performance tests, I found that the best performance comes with the DOT function. So, we use norm2 where performance is not critical (dbcsr_filter) and DOT in the other place. Closing via 021482d and 44032cf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants