Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

weslleyspereira · 2021-07-26T22:41:32Z

[edited]

Description

There is a known bug in LAPACK related to workspaces smaller than the value computed by the routine. This is because the routines use a floating-point variable to return the integer workspace size. See #600. The intent here is to mitigate the problem of overestimating the workspace when doing the conversion INTEGER -> REAL (or DOUBLE PRECISION) highlighted in #600. I do not solve the problem of a possible overflow in LWORK.

This PR applies the same strategy from the MAGMA package to solve the issue (See #600 (comment) from @mgates3).

Checklist

The documentation has been updated.
Fix all *GESDD routines.
Fix all the routines in LAPACK.

codecov · 2021-07-26T22:50:27Z

Codecov Report

Merging #605 (fe1e922) into master (aa631b4) will not change coverage.
The diff coverage is 0.00%.

❗ Current head fe1e922 differs from pull request most recent head c7400e5. Consider uploading reports for the commit c7400e5 to get more accurate results

@@           Coverage Diff           @@
##           master     #605   +/-   ##
=======================================
  Coverage    0.00%    0.00%           
=======================================
  Files        1894     1896    +2     
  Lines      199133   199143   +10     
=======================================
- Misses     199133   199143   +10

Impacted Files	Coverage Δ
INSTALL/droundup_lwork.f	`0.00% <0.00%> (ø)`
INSTALL/sroundup_lwork.f	`0.00% <0.00%> (ø)`
SRC/cgesdd.f	`0.00% <0.00%> (ø)`
SRC/dgesdd.f	`0.00% <0.00%> (ø)`
SRC/sgesdd.f	`0.00% <0.00%> (ø)`
SRC/zgesdd.f	`0.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa631b4...c7400e5. Read the comment docs.

luszczek · 2021-07-27T05:30:04Z

Please see my comment to #600 If integer LWORK overflow then adding EPS won't help.

weslleyspereira · 2021-07-27T11:50:25Z

Thanks for the comments @luszczek. The intent here is to mitigate the problem of overestimating the workspace when doing the conversion INTEGER -> REAL (or DOUBLE PRECISION) highlighted in #600. I do not solve the problem of a possible overflow in LWORK. I will update the description. [updated]

…PS)) Test results: SROUNDUP_LWORK(0) = 0.00000000 SROUNDUP_LWORK(1) = 1.00000000 SROUNDUP_LWORK(317) = 317.000000 DROUNDUP_LWORK(0) = 0.0000000000000000 DROUNDUP_LWORK(1) = 1.0000000000000000 DROUNDUP_LWORK(317) = 317.00000000000000 Tests with X = 16777216 = 2**24 X-2 = 16777214; SROUNDUP_LWORK(X-2) = 16777214.0 X-1 = 16777215; SROUNDUP_LWORK(X-1) = 16777215.0 X = 16777216; SROUNDUP_LWORK(X ) = 16777218.0 X+1 = 16777217; SROUNDUP_LWORK(X+1) = 16777218.0 X+2 = 16777218; SROUNDUP_LWORK(X+2) = 16777220.0 Tests with X = 9007199254740992 = 2**53 X-2 = 9007199254740990; SROUNDUP_LWORK(X-2) = 9007200328482816.0 X-1 = 9007199254740991; SROUNDUP_LWORK(X-1) = 9007200328482816.0 X = 9007199254740992; SROUNDUP_LWORK(X ) = 9007200328482816.0 X+1 = 9007199254740993; SROUNDUP_LWORK(X+1) = 9007200328482816.0 X+2 = 9007199254740994; SROUNDUP_LWORK(X+2) = 9007200328482816.0 Tests with X = 9007199254740992 = 2**53 X-2 = 9007199254740990; DROUNDUP_LWORK(X-2) = 9007199254740990.0 X-1 = 9007199254740991; DROUNDUP_LWORK(X-1) = 9007199254740991.0 X = 9007199254740992; DROUNDUP_LWORK(X ) = 9007199254740994.0 X+1 = 9007199254740993; DROUNDUP_LWORK(X+1) = 9007199254740994.0 X+2 = 9007199254740994; DROUNDUP_LWORK(X+2) = 9007199254740996.0

weslleyspereira · 2021-07-27T23:29:02Z

Two commits with updates in the ROUNDUP_LWORK functions:

ROUNDUP_LWORK functions now always return an integer
Fix bug: ROUNDUP_LWORK must use (1+EPSILON(0)) instead of (1+LAMCH(EPS))

With (1), the roundup routines always return a floating-point number that is an integer. I do that by checking whether

SROUNDUP_LWORK .GE. REAL(RADIX(0.0E+0))**DIGITS(0.0E+0)

The latter is valid for the single-precision case. I used a similar test for the double-precision case.

The code in MAGMA uses DLAMCH('E'). I am not sure this is the correct option. I ran a few tests (below) and saw that one must use EPSILON(0) (or 2*LAMCH(EPS)) to get the desired rounding up.

Some tests:

I compiled the LAPACK with -fdefault-integer-8 and used this gist to obtain:

 SROUNDUP_LWORK(0)        =    0.00000000    
 SROUNDUP_LWORK(1)        =    1.00000000    
 SROUNDUP_LWORK(317)      =    317.000000    
 DROUNDUP_LWORK(0)        =    0.0000000000000000     
 DROUNDUP_LWORK(1)        =    1.0000000000000000     
 DROUNDUP_LWORK(317)      =    317.00000000000000     
 
 Tests with X = 16777216 = 2**24
X-2 = 16777214; SROUNDUP_LWORK(X-2) = 16777214.0
X-1 = 16777215; SROUNDUP_LWORK(X-1) = 16777215.0
X   = 16777216; SROUNDUP_LWORK(X  ) = 16777218.0
X+1 = 16777217; SROUNDUP_LWORK(X+1) = 16777218.0
X+2 = 16777218; SROUNDUP_LWORK(X+2) = 16777220.0
 
 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; SROUNDUP_LWORK(X-2) = 9007200328482816.0
X-1 = 9007199254740991; SROUNDUP_LWORK(X-1) = 9007200328482816.0
X   = 9007199254740992; SROUNDUP_LWORK(X  ) = 9007200328482816.0
X+1 = 9007199254740993; SROUNDUP_LWORK(X+1) = 9007200328482816.0
X+2 = 9007199254740994; SROUNDUP_LWORK(X+2) = 9007200328482816.0
 
 Tests with X = 9007199254740992 = 2**53
X-2 = 9007199254740990; DROUNDUP_LWORK(X-2) = 9007199254740990.0
X-1 = 9007199254740991; DROUNDUP_LWORK(X-1) = 9007199254740991.0
X   = 9007199254740992; DROUNDUP_LWORK(X  ) = 9007199254740994.0
X+1 = 9007199254740993; DROUNDUP_LWORK(X+1) = 9007199254740994.0
X+2 = 9007199254740994; DROUNDUP_LWORK(X+2) = 9007199254740996.0

just to be sure the output has only 0 decimal parts. The gist also applies the CEILING and FLOOR to the result t guarantee there is no decimal part.

weslleyspereira · 2021-08-03T01:54:22Z

Verifying the round down/up in the conversion from INTEGER(8) to REAL and DOUBLE PRECISION:

Tests with X = 16777216 = 2**24

I	X + I	REAL(X+1)	DBLE(X+1)
-2	16777214	16777214.0	16777214.0
-1	16777215	16777215.0	16777215.0
0	16777216	16777216.0	16777216.0
1	16777217	16777216.0	16777217.0
2	16777218	16777218.0	16777218.0
3	16777219	16777220.0	16777219.0
4	16777220	16777220.0	16777220.0
5	16777221	16777220.0	16777221.0
6	16777222	16777222.0	16777222.0
7	16777223	16777224.0	16777223.0
8	16777224	16777224.0	16777224.0
9	16777225	16777224.0	16777225.0
10	16777226	16777226.0	16777226.0
11	16777227	16777228.0	16777227.0
12	16777228	16777228.0	16777228.0
13	16777229	16777228.0	16777229.0
14	16777230	16777230.0	16777230.0
15	16777231	16777232.0	16777231.0
16	16777232	16777232.0	16777232.0
17	16777233	16777232.0	16777233.0
18	16777234	16777234.0	16777234.0
19	16777235	16777236.0	16777235.0
20	16777236	16777236.0	16777236.0

Tests with X = 9007199254740992 = 2**53

I	X + I	REAL(X+1)	DBLE(X+1)
-2	9007199254740990	9007199254740992.0	9007199254740990.0
-1	9007199254740991	9007199254740992.0	9007199254740991.0
0	9007199254740992	9007199254740992.0	9007199254740992.0
1	9007199254740993	9007199254740992.0	9007199254740992.0
2	9007199254740994	9007199254740992.0	9007199254740994.0
3	9007199254740995	9007199254740992.0	9007199254740996.0
4	9007199254740996	9007199254740992.0	9007199254740996.0
5	9007199254740997	9007199254740992.0	9007199254740996.0
6	9007199254740998	9007199254740992.0	9007199254740998.0
7	9007199254740999	9007199254740992.0	9007199254741000.0
8	9007199254741000	9007199254740992.0	9007199254741000.0
9	9007199254741001	9007199254740992.0	9007199254741000.0
10	9007199254741002	9007199254740992.0	9007199254741002.0
11	9007199254741003	9007199254740992.0	9007199254741004.0
12	9007199254741004	9007199254740992.0	9007199254741004.0
13	9007199254741005	9007199254740992.0	9007199254741004.0
14	9007199254741006	9007199254740992.0	9007199254741006.0
15	9007199254741007	9007199254740992.0	9007199254741008.0
16	9007199254741008	9007199254740992.0	9007199254741008.0
17	9007199254741009	9007199254740992.0	9007199254741008.0
18	9007199254741010	9007199254740992.0	9007199254741010.0
19	9007199254741011	9007199254740992.0	9007199254741012.0
20	9007199254741012	9007199254740992.0	9007199254741012.0

weslleyspereira · 2021-08-03T02:54:53Z

Another related test shows how REAL(INT( ... )) and DBLE(INT( ... )) behave for huge floating-point numbers. Note in the results that INT(REAL(HUGE( X ))) and INT(REAL(HUGE( X )) * 2) both return the lowest negative integer of the type of X.

      PROGRAM test_convert
      IMPLICIT NONE

      INTEGER(4)           intX
      INTEGER(8)           X
      REAL                 realX
      DOUBLE PRECISION     dbleX

      INTRINSIC            REAL, DBLE, HUGE
      realX = REAL(HUGE(intX))
      dbleX = DBLE(HUGE(intX))
      WRITE(*,992) HUGE(intX), realX, INT(realX,4), REAL(INT(realX,4))
      WRITE(*,992) HUGE(intX), dbleX, INT(dbleX,4), DBLE(INT(dbleX,4))
      
      realX = REAL(HUGE(intX)) * 2
      dbleX = DBLE(HUGE(intX)) * 2
      WRITE(*,992) HUGE(intX), realX, INT(realX,4), REAL(INT(realX,4))
      WRITE(*,992) HUGE(intX), dbleX, INT(dbleX,4), DBLE(INT(dbleX,4))

      realX = REAL(HUGE(X))
      dbleX = DBLE(HUGE(X))
      WRITE(*,993) HUGE(X), realX, INT(realX,8), REAL(INT(realX,8))
      WRITE(*,993) HUGE(X), dbleX, INT(dbleX,8), DBLE(INT(dbleX,8))

      realX = REAL(HUGE(X)) * 2
      dbleX = DBLE(HUGE(X)) * 2
      WRITE(*,993) HUGE(X), realX, INT(realX,8), REAL(INT(realX,8))
      WRITE(*,993) HUGE(X), dbleX, INT(dbleX,8), DBLE(INT(dbleX,8))

  992 FORMAT( I11, "; ", F13.1, "; ", I11, "; ", F13.1 )
  993 FORMAT( I20, "; ", F22.1, "; ", I20, "; ", F22.1 )

      END PROGRAM test_convert

Output (compiled with gfortran -fdefault-integer-8 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)):

 2147483647;  2147483648.0; -2147483648; -2147483648.0
 2147483647;  2147483647.0;  2147483647;  2147483647.0
 2147483647;  4294967296.0; -2147483648; -2147483648.0
 2147483647;  4294967294.0; -2147483648; -2147483648.0
 9223372036854775807;  9223372036854775808.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807;  9223372036854775808.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807; 18446744073709551616.0; -9223372036854775808; -9223372036854775808.0
 9223372036854775807; 18446744073709551616.0; -9223372036854775808; -9223372036854775808.0

weslleyspereira · 2021-08-04T21:36:56Z

What about forcing round up only if INT( REAL( LWORK ) ) .LT. LWORK? See 51eee01. This new implementation should be optimal: It guarantees that *ROUNDUP_LWORK returns the smallest floating-point number that:

has zero decimal part.
is greater than or equal to LWORK.

This test can be less performant than testing REAL( LWORK ) .GE. REAL(RADIX(0.0E+0))**DIGITS(0.0E+0). I say that because we have one extra operation INT( ... ) when compared to cea9568. But I prefer 51eee01 since the output should be optimal.

…s necessary

mgates3 · 2021-08-09T16:38:59Z

Regarding MAGMA, after precision-generation, magma_sauxiliary.cpp uses lapackf77_slamch("eps"), but it does the computation in double. If it computed 1.0f + slamch("eps") in single, it would get 1.0f, which isn't useful. Note real_Double_t is double.

> less magma/control/magma_sauxiliary.cpp 
...
float magma_smake_lwork( magma_int_t lwork )
{
    ...
    real_Double_t one_eps = 1. + lapackf77_slamch("Epsilon");
    return MAGMA_S_MAKE( float(lwork*one_eps), 0 );
    ...
}

This works up to lwork around 2^53 (33,554,432 GiB in single). Using 1 + epsilon, for epsilon = 1.19e-07 as defined by Fortran, C, C++, Matlab, etc. works for even larger lwork (I checked up to 2^61). Note slamch("eps") returns unit roundoff, u = 5.96e-08, which is epsilon/2.

weslleyspereira · 2021-08-12T03:18:21Z

Thanks for the clarification, @mgates3 !

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation

a620672

weslleyspereira mentioned this pull request Jul 27, 2021

Loss of precision in all single precision workspace queries, implicit and explicit #600

Open

weslleyspereira added 2 commits July 27, 2021 16:56

RoundUp functions now always returns an integer

f12515a

weslleyspereira force-pushed the try-workspace-query branch from 419261a to cea9568 Compare July 27, 2021 23:06

weslleyspereira force-pushed the try-workspace-query branch from 51eee01 to 913ca15 Compare August 4, 2021 21:37

Improve documentation of the new routines. Force roundup only if it i…

c7400e5

…s necessary

weslleyspereira force-pushed the try-workspace-query branch from 913ca15 to c7400e5 Compare August 4, 2021 21:43

langou approved these changes Aug 30, 2021

View reviewed changes

langou merged commit 7a4c63b into Reference-LAPACK:master Aug 30, 2021

WarrenWeckesser mentioned this pull request Sep 9, 2022

BUG: linalg: Fix the XSLOW test test_sgesdd_lwork_bug_workaround() scipy/scipy#16965

Merged

julielangou added this to the LAPACK 3.10.1 milestone Nov 12, 2022

kleineLi mentioned this pull request Aug 31, 2023

Apply ROUNDUP_LWORK function in lapack #904

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

weslleyspereira commented Jul 26, 2021 •

edited

Loading

codecov bot commented Jul 26, 2021 •

edited

Loading

luszczek commented Jul 27, 2021

weslleyspereira commented Jul 27, 2021 •

edited

Loading

weslleyspereira commented Jul 27, 2021 •

edited

Loading

weslleyspereira commented Aug 3, 2021

weslleyspereira commented Aug 3, 2021

weslleyspereira commented Aug 4, 2021

mgates3 commented Aug 9, 2021

weslleyspereira commented Aug 12, 2021

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

Add (s,d)ROUNDUP_LWORK functions to fix the workspace computation #605

Conversation

weslleyspereira commented Jul 26, 2021 • edited Loading

codecov bot commented Jul 26, 2021 • edited Loading

Codecov Report

luszczek commented Jul 27, 2021

weslleyspereira commented Jul 27, 2021 • edited Loading

weslleyspereira commented Jul 27, 2021 • edited Loading

weslleyspereira commented Aug 3, 2021

weslleyspereira commented Aug 3, 2021

weslleyspereira commented Aug 4, 2021

mgates3 commented Aug 9, 2021

weslleyspereira commented Aug 12, 2021

weslleyspereira commented Jul 26, 2021 •

edited

Loading

codecov bot commented Jul 26, 2021 •

edited

Loading

weslleyspereira commented Jul 27, 2021 •

edited

Loading

weslleyspereira commented Jul 27, 2021 •

edited

Loading