Loss of precision in all single precision workspace queries, implicit and explicit #600
I guess it must have made sense at the time (to return the size via the work array pointer), but I wonder what keeps us from turning the size specifier into an in/out variable and passing back the exact value there as well? A "modern" caller would then check that first and resort to the work array member only if LWORK was still -1; an "old" caller would notice no change.
Also, back then an LWORK large enough to hit this problem was probably physically too big to allocate anyway, and ILP64 made it obvious. I'm dreaming of the day when the NB value is derived at runtime instead of via the two-call scheme.
Here are two discussion threads related to this. This is especially an issue for algorithms that require an O(n^2) workspace; for algorithms that require an O(n*nb) workspace, it is less of an issue. Yes, this is a design flaw. @martin-frbg: how would your proposed change work for the C _work interface? We have LWORK as INPUT only there, and changing LWORK to INPUT/OUTPUT there is a major change. Do you have an idea to solve this issue? See: lapack/LAPACKE/src/lapacke_dgeqrf_work.c Line 35 in aa631b4
I was thinking that we could also create some workspace-query subroutines such as LAPACK_dgeqrf__workspace_query( ), which would return the workspace needed.
Welp, my cunning plan does not really work when sober...
I can see two different flaws here:

@martin-frbg's idea is a good solution to (1). New Fortran code could use the return value of LWORK instead of WORK(1). We can try modifying the code with some (semi-)automatic replacement procedure, substituting the new pattern for the old one.
Changing lwork to be IN/OUT would have been a nice solution originally, but it is not backwards compatible. The application would then have to know whether the LAPACK version was <= 3.10 (say) or > 3.10 to know where to get the lwork. Worse, there are instances where applications pass in a const value, expecting it to remain const, so LAPACK changing its behavior to overwrite that value would be very detrimental (UB). For instance, in MAGMA:
The solution I proposed some years ago and implemented in MAGMA is simply, in sgesdd etc., to round the lwork returned in work[1] up a little bit, so the returned value is always >= the intended value. See https://bitbucket.org/icl/magma/src/master/control/magma_zauxiliary.cpp and its use in https://bitbucket.org/icl/magma/src/master/src/zgesdd.cpp. (See a release for the generated single-precision version.) Basically, replace the plain store of lwork into work[1] with a call to a helper function that rounds it up. After more testing, I found that for lwork > 2^54 the helper needs to use the C/C++/Fortran definition of epsilon = 1.19e-07 (aka ulp), rather than the LAPACK definition slamch("eps") = 5.96e-08 (aka unit roundoff, u). If using ulp, it looks like the calculation can be done in single.
I think this issue confuses 32-bit overflow with float-integer conversion. The conversion from a 32-bit float to a 32-bit integer can be off by at most 256 = 2^8, because a 32-bit float can only be off by 8 bits (23-bit mantissa vs 31 bits in the integer). In practice, I couldn't get the difference to be larger than 63. This can become an issue in ILP64 mode, where a 23-bit mantissa is not enough to store 63 bits of integer. Adding epsilon could potentially overestimate the allocation by over a terabyte.
I think there is no confusion, but two separate (but related) issues: one brought up in the original message, where an ILP64 value gets truncated to float (and rounding up the result, still as a float, should get around that truncation). I am not sure if I read the …
Yes, I agree there are multiple problems. #605 tries to solve the underestimation of LWORK in the conversion INTEGER -> FLOAT (DOUBLE PRECISION).
Just to make it clearer for the discussion: if we use a float (23-bit mantissa) to store a big int64 N = 2^60-1 and round up, we obtain 2^60*(1+2^-23) = (2^60-1) + (2^37+1), where (2^37+1) is the overestimation. I don't see a big problem here because, although (2^37+1) extra floating-point positions occupy a large memory space, this is nothing compared to the total amount, i.e., 2^60*(1+2^-23). The increment in the workspace is eps*N at most, which means there is a relative overestimation of eps/(1+eps). We could try to avoid this overestimation in the case where N can be represented exactly; we would need a specific check for that.
I updated #605.
This was using LAPACK 3.10. Mind that DLAMCH('E') can be either EPSILON(1.0D0) or EPSILON(1.0D0) * 0.5.
We just merged #605 from Weslley; this is a start. It fixes the rounding during the INTEGER to SINGLE/DOUBLE PRECISION cast. The workspace computation itself still needs more work: since the computation is intrinsically done in INTEGER, it can overflow independently of the rounding during the cast. So the next step is to compute the workspace directly in SINGLE/DOUBLE PRECISION.
Hello everybody,
recently I stumbled across an error when using ssyevd via the LAPACKE interface. It points to a problem in the LAPACK interface in general. It goes like this: according to the standard LAPACK interface, you have to call many routines like ssyevd twice: once to ask the routine how much scratch memory it needs for a certain matrix size, and only then call the routine in earnest, with the required scratch memory passed as a parameter.
If you look closely at this first call, which should return the required memory size, e.g. for a routine like ssyevd, you see that, even according to the LAPACK documentation, the memory requirement is passed back via a pointer to a float value. So when calculating the memory requirement, the value goes through a series of representations: it starts out as an integer (int64 in case of an ILP64 interface, otherwise int32), is returned through a 32-bit float in WORK(1), and is cast back to an integer by the caller. So in essence, we have an intermediary shortening of the memory value from 63 bits to 24 bits!! (Or, more precisely, to the 24 bits between the outermost set bits in the IEEE 754 float representation.) Even in the case of 32-bit integers, you have a shortening of 31 bits to 24 bits.
So, if you call the memory requirement calculation 'by hand' as in the past, you might have a chance to see where it goes wrong, but you still cannot prevent the shortening to a float value. If you use the automatic memory allocation via the modern LAPACKE interface, you don't even have an idea what could be wrong, as the routine advertises taking care of all memory management by itself!
This happens in all single precision routines (s/c) in LAPACK which calculate the memory requirement as an intermediary step. Switching to double precision uses a double precision reference instead, increasing the intermediary value to 53 bits, which is still short of the 64 bits one would assume with a 64-bit interface.
Workarounds (four possible ways):
Other people have stumbled upon this but didn't follow it through to the real cause, e.g.: openBLAS build with int64 support fails on valid input for ssyevd. One has to stress that, at least with the two-call method, this is not a bug but a design flaw. In the case of the LAPACKE automatic memory allocation, it has to be considered a fairly severe bug.
Regards,
oxydicer