Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

Merged
merged 4 commits into from
Aug 30, 2021

Conversation

weslleyspereira
Copy link
Collaborator

@weslleyspereira weslleyspereira commented Jul 26, 2021

[edited]

Description

There is a known bug in LAPACK related to workspaces smaller than the value computed by the routine. This is because the routines use a floating-point variable to return the integer workspace size. See #600. The intent here is to mitigate the problem of overestimating the workspace when doing the conversion INTEGER -> REAL (or DOUBLE PRECISION) highlighted in #600. I do not solve the problem of a possible overflow in LWORK.

This PR applies the same strategy from the MAGMA package to solve the issue (See #600 (comment) from @mgates3).

Checklist

  • The documentation has been updated.
  • Fix all *GESDD routines.
  • Fix all the routines in LAPACK.

@codecov
Copy link

codecov bot commented Jul 26, 2021

Codecov Report

Merging #605 (fe1e922) into master (aa631b4) will not change coverage.
The diff coverage is 0.00%.

❗ Current head fe1e922 differs from pull request most recent head c7400e5. Consider uploading reports for the commit c7400e5 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master     #605   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1896    +2     
  Lines      199133   199143   +10     
=======================================
- Misses     199133   199143   +10     
Impacted Files Coverage Δ
INSTALL/droundup_lwork.f 0.00% <0.00%> (ø)
INSTALL/sroundup_lwork.f 0.00% <0.00%> (ø)
SRC/cgesdd.f 0.00% <0.00%> (ø)
SRC/dgesdd.f 0.00% <0.00%> (ø)
SRC/sgesdd.f 0.00% <0.00%> (ø)
SRC/zgesdd.f 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa631b4...c7400e5. Read the comment docs.

@luszczek
Copy link

Please see my comment to #600 If integer LWORK overflow then adding EPS won't help.

@weslleyspereira
Copy link
Collaborator Author

weslleyspereira commented Jul 27, 2021

Thanks for the comments @luszczek. The intent here is to mitigate the problem of overestimating the workspace when doing the conversion INTEGER -> REAL (or DOUBLE PRECISION) highlighted in #600. I do not solve the problem of a possible overflow in LWORK. I will update the description. [updated]

…PS))

Test results:

 SROUNDUP_LWORK(0)        =    0.00000000
 SROUNDUP_LWORK(1)        =    1.00000000
 SROUNDUP_LWORK(317)      =    317.000000
 DROUNDUP_LWORK(0)        =    0.0000000000000000
 DROUNDUP_LWORK(1)        =    1.0000000000000000
 DROUNDUP_LWORK(317)      =    317.00000000000000

 Tests with X = 16777216 = 2**24
X-2 = 16777214; SROUNDUP_LWORK(X-2) = 16777214.0
X-1 = 16777215; SROUNDUP_LWORK(X-1) = 16777215.0
X   = 16777216; SROUNDUP_LWORK(X  ) = 16777218.0
X+1 = 16777217; SROUNDUP_LWORK(X+1) = 16777218.0
X+2 = 16777218; SROUNDUP_LWORK(X+2) = 16777220.0

 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; SROUNDUP_LWORK(X-2) = 9007200328482816.0
X-1 = 9007199254740991; SROUNDUP_LWORK(X-1) = 9007200328482816.0
X   = 9007199254740992; SROUNDUP_LWORK(X  ) = 9007200328482816.0
X+1 = 9007199254740993; SROUNDUP_LWORK(X+1) = 9007200328482816.0
X+2 = 9007199254740994; SROUNDUP_LWORK(X+2) = 9007200328482816.0

 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; DROUNDUP_LWORK(X-2) = 9007199254740990.0
X-1 = 9007199254740991; DROUNDUP_LWORK(X-1) = 9007199254740991.0
X   = 9007199254740992; DROUNDUP_LWORK(X  ) = 9007199254740994.0
X+1 = 9007199254740993; DROUNDUP_LWORK(X+1) = 9007199254740994.0
X+2 = 9007199254740994; DROUNDUP_LWORK(X+2) = 9007199254740996.0
@weslleyspereira
Copy link
Collaborator Author

weslleyspereira commented Jul 27, 2021

Two commits with updates in the ROUNDUP_LWORK functions:

  1. ROUNDUP_LWORK functions now always return an integer
  2. Fix bug: ROUNDUP_LWORK must use (1+EPSILON(0)) instead of (1+LAMCH(EPS))

With (1), the roundup routines always return a floating-point number that is an integer. I do that by checking whether

SROUNDUP_LWORK .GE. REAL(RADIX(0.0E+0))**DIGITS(0.0E+0)  

The latter is valid for the single-precision case. I used a similar test for the double-precision case.

The code in MAGMA uses DLAMCH('E'). I am not sure this is the correct option. I ran a few tests (below) and saw that one must use EPSILON(0) (or 2*LAMCH(EPS)) to get the desired rounding up.

Some tests:

I compiled the LAPACK with -fdefault-integer-8 and used this gist to obtain:

 SROUNDUP_LWORK(0)        =    0.00000000    
 SROUNDUP_LWORK(1)        =    1.00000000    
 SROUNDUP_LWORK(317)      =    317.000000    
 DROUNDUP_LWORK(0)        =    0.0000000000000000     
 DROUNDUP_LWORK(1)        =    1.0000000000000000     
 DROUNDUP_LWORK(317)      =    317.00000000000000     
 
 Tests with X = 16777216 = 2**24
X-2 = 16777214; SROUNDUP_LWORK(X-2) = 16777214.0
X-1 = 16777215; SROUNDUP_LWORK(X-1) = 16777215.0
X   = 16777216; SROUNDUP_LWORK(X  ) = 16777218.0
X+1 = 16777217; SROUNDUP_LWORK(X+1) = 16777218.0
X+2 = 16777218; SROUNDUP_LWORK(X+2) = 16777220.0
 
 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; SROUNDUP_LWORK(X-2) = 9007200328482816.0
X-1 = 9007199254740991; SROUNDUP_LWORK(X-1) = 9007200328482816.0
X   = 9007199254740992; SROUNDUP_LWORK(X  ) = 9007200328482816.0
X+1 = 9007199254740993; SROUNDUP_LWORK(X+1) = 9007200328482816.0
X+2 = 9007199254740994; SROUNDUP_LWORK(X+2) = 9007200328482816.0
 
 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; DROUNDUP_LWORK(X-2) = 9007199254740990.0
X-1 = 9007199254740991; DROUNDUP_LWORK(X-1) = 9007199254740991.0
X   = 9007199254740992; DROUNDUP_LWORK(X  ) = 9007199254740994.0
X+1 = 9007199254740993; DROUNDUP_LWORK(X+1) = 9007199254740994.0
X+2 = 9007199254740994; DROUNDUP_LWORK(X+2) = 9007199254740996.0

just to be sure the output has only 0 decimal parts. The gist also applies the CEILING and FLOOR to the result t guarantee there is no decimal part.

@weslleyspereira
Copy link
Collaborator Author

Verifying the round down/up in the conversion from INTEGER(8) to REAL and DOUBLE PRECISION:

  1. Tests with X = 16777216 = 2**24
I X + I REAL(X+1) DBLE(X+1)
-2 16777214 16777214.0 16777214.0
-1 16777215 16777215.0 16777215.0
0 16777216 16777216.0 16777216.0
1 16777217 16777216.0 16777217.0
2 16777218 16777218.0 16777218.0
3 16777219 16777220.0 16777219.0
4 16777220 16777220.0 16777220.0
5 16777221 16777220.0 16777221.0
6 16777222 16777222.0 16777222.0
7 16777223 16777224.0 16777223.0
8 16777224 16777224.0 16777224.0
9 16777225 16777224.0 16777225.0
10 16777226 16777226.0 16777226.0
11 16777227 16777228.0 16777227.0
12 16777228 16777228.0 16777228.0
13 16777229 16777228.0 16777229.0
14 16777230 16777230.0 16777230.0
15 16777231 16777232.0 16777231.0
16 16777232 16777232.0 16777232.0
17 16777233 16777232.0 16777233.0
18 16777234 16777234.0 16777234.0
19 16777235 16777236.0 16777235.0
20 16777236 16777236.0 16777236.0
  1. Tests with X = 9007199254740992 = 2**53
I X + I REAL(X+1) DBLE(X+1)
-2 9007199254740990 9007199254740992.0 9007199254740990.0
-1 9007199254740991 9007199254740992.0 9007199254740991.0
0 9007199254740992 9007199254740992.0 9007199254740992.0
1 9007199254740993 9007199254740992.0 9007199254740992.0
2 9007199254740994 9007199254740992.0 9007199254740994.0
3 9007199254740995 9007199254740992.0 9007199254740996.0
4 9007199254740996 9007199254740992.0 9007199254740996.0
5 9007199254740997 9007199254740992.0 9007199254740996.0
6 9007199254740998 9007199254740992.0 9007199254740998.0
7 9007199254740999 9007199254740992.0 9007199254741000.0
8 9007199254741000 9007199254740992.0 9007199254741000.0
9 9007199254741001 9007199254740992.0 9007199254741000.0
10 9007199254741002 9007199254740992.0 9007199254741002.0
11 9007199254741003 9007199254740992.0 9007199254741004.0
12 9007199254741004 9007199254740992.0 9007199254741004.0
13 9007199254741005 9007199254740992.0 9007199254741004.0
14 9007199254741006 9007199254740992.0 9007199254741006.0
15 9007199254741007 9007199254740992.0 9007199254741008.0
16 9007199254741008 9007199254740992.0 9007199254741008.0
17 9007199254741009 9007199254740992.0 9007199254741008.0
18 9007199254741010 9007199254740992.0 9007199254741010.0
19 9007199254741011 9007199254740992.0 9007199254741012.0
20 9007199254741012 9007199254740992.0 9007199254741012.0

@weslleyspereira
Copy link
Collaborator Author

Another related test shows how REAL(INT( ... )) and DBLE(INT( ... )) behave for huge floating-point numbers. Note in the results that INT(REAL(HUGE( X ))) and INT(REAL(HUGE( X )) * 2) both return the lowest negative integer of the type of X.

      PROGRAM test_convert
      IMPLICIT NONE

      INTEGER(4)           intX
      INTEGER(8)           X
      REAL                 realX
      DOUBLE PRECISION     dbleX

      INTRINSIC            REAL, DBLE, HUGE
      realX = REAL(HUGE(intX))
      dbleX = DBLE(HUGE(intX))
      WRITE(*,992) HUGE(intX), realX, INT(realX,4), REAL(INT(realX,4))
      WRITE(*,992) HUGE(intX), dbleX, INT(dbleX,4), DBLE(INT(dbleX,4))
      
      realX = REAL(HUGE(intX)) * 2
      dbleX = DBLE(HUGE(intX)) * 2
      WRITE(*,992) HUGE(intX), realX, INT(realX,4), REAL(INT(realX,4))
      WRITE(*,992) HUGE(intX), dbleX, INT(dbleX,4), DBLE(INT(dbleX,4))

      realX = REAL(HUGE(X))
      dbleX = DBLE(HUGE(X))
      WRITE(*,993) HUGE(X), realX, INT(realX,8), REAL(INT(realX,8))
      WRITE(*,993) HUGE(X), dbleX, INT(dbleX,8), DBLE(INT(dbleX,8))

      realX = REAL(HUGE(X)) * 2
      dbleX = DBLE(HUGE(X)) * 2
      WRITE(*,993) HUGE(X), realX, INT(realX,8), REAL(INT(realX,8))
      WRITE(*,993) HUGE(X), dbleX, INT(dbleX,8), DBLE(INT(dbleX,8))

  992 FORMAT( I11, "; ", F13.1, "; ", I11, "; ", F13.1 )
  993 FORMAT( I20, "; ", F22.1, "; ", I20, "; ", F22.1 )

      END PROGRAM test_convert

Output (compiled with gfortran -fdefault-integer-8 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)):

 2147483647;  2147483648.0; -2147483648; -2147483648.0
 2147483647;  2147483647.0;  2147483647;  2147483647.0
 2147483647;  4294967296.0; -2147483648; -2147483648.0
 2147483647;  4294967294.0; -2147483648; -2147483648.0
 9223372036854775807;  9223372036854775808.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807;  9223372036854775808.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807; 18446744073709551616.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807; 18446744073709551616.0; -9223372036854775808; -9223372036854775808.0

@weslleyspereira
Copy link
Collaborator Author

What about forcing round up only if INT( REAL( LWORK ) ) .LT. LWORK? See 51eee01. This new implementation should be optimal: It guarantees that *ROUNDUP_LWORK returns the smallest floating-point number that:

  1. has zero decimal part.
  2. is greater than or equal to LWORK.

This test can be less performant than testing REAL( LWORK ) .GE. REAL(RADIX(0.0E+0))**DIGITS(0.0E+0). I say that because we have one extra operation INT( ... ) when compared to cea9568. But I prefer 51eee01 since the output should be optimal.

@mgates3
Copy link
Contributor

mgates3 commented Aug 9, 2021

Regarding MAGMA, after precision-generation, magma_sauxiliary.cpp uses lapackf77_slamch("eps"), but it does the computation in double. If it computed 1.0f + slamch("eps") in single, it would get 1.0f, which isn't useful. Note real_Double_t is double.

> less magma/control/magma_sauxiliary.cpp 
...
float magma_smake_lwork( magma_int_t lwork )
{
    ...
    real_Double_t one_eps = 1. + lapackf77_slamch("Epsilon");
    return MAGMA_S_MAKE( float(lwork*one_eps), 0 );
    ...
}

This works up to lwork around 2^53 (33,554,432 GiB in single). Using 1 + epsilon, for epsilon = 1.19e-07 as defined by Fortran, C, C++, Matlab, etc. works for even larger lwork (I checked up to 2^61). Note slamch("eps") returns unit roundoff, u = 5.96e-08, which is epsilon/2.

@weslleyspereira
Copy link
Collaborator Author

Thanks for the clarification, @mgates3 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants