
Invalid constant-folding when converting from floating-point to integer #456

Closed
peddie opened this issue Feb 22, 2019 · 84 comments


peddie commented Feb 22, 2019

I'm using the latest commit 2066aba0074a11f7409007b0417f33e40efb753b.

If I write a program that converts from floating-point to integer type:

toWord8 :: SDouble -> SWord8
toWord8 = fromSDouble sRNE

and apply it directly in GHCi to a negative constant whose rounded value is outside the range of the integral type, I see (for example):

>>> toWord8 (-257)
255 :: SWord8

The generated C code disagrees with the Haskell interpreter if the constant is avoided:

compileToC Nothing "conversion" $ do
  doubleIn <- cgInput "input" :: SBVCodeGen SDouble
  cgReturn $ toWord8 doubleIn

results in

SWord8 conversion(const SDouble input)
{
  const SDouble s0 = input;
  const SBool   s1 = isnan(s0);
  const SBool   s2 = isinf(s0);
  const SBool   s3 = s1 || s2;
  const SBool   s4 = !s3;
  const SWord8  s6 = (!isnan(s0) && signbit(s0)) ? (- ((SWord8) fabs(s0))) : ((SWord8) s0);
  const SWord8  s8 = s4 ? s6 : 0;

  return s8;
}

which when called thus

const double conversion_in = -257;
printf("conversion(%lf) = %"PRIu8"\n", conversion_in, conversion(conversion_in));

yields

conversion(-257.000000) = 1

However, if we let constant-folding see what we're about, we will get a different answer:

compileToC Nothing "conversion" . cgReturn $ toWord8 (-257)

results in

SWord8 conversion()
{
  return 255;
}

If I ask Z3 to check the interpreter's evaluation, I seem to get disagreement:

prove $ do
  doubleIn <- sDouble "input"
  constrain $ doubleIn .== (-257)
  let conv = observe "conv" $ fromSDouble sRNE doubleIn :: SWord8
  pure $ conv .== 255

Falsifiable. Counter-example:
  conv  =      0 :: Word8
  input = -257.0 :: Double

But I can also constrain it to be equal to 255, and Z3 is just as happy.

sat $ do
  doubleIn <- sDouble "input"
  constrain $ doubleIn .== (-257)
  let conv = observe "conv" $ fromSDouble sRNE doubleIn :: SWord8
  pure $ conv .== 255

Satisfiable. Model:
  conv  =    255 :: Word8
  input = -257.0 :: Double

What's going on here?

LeventErkok added a commit that referenced this issue Feb 23, 2019

LeventErkok commented Feb 23, 2019

There are really two bugs here. One of them is conversion to C; and I believe I just fixed that. Please give it a shot and let me know if it's still not working.

The second is trickier, unfortunately. The SMTLib conversion functions are essentially uninterpreted when the value is out of range. This is directly from the spec:

All fp.to_* functions are unspecified for NaN and infinity input values.
In addition, fp.to_ubv and fp.to_sbv are unspecified for finite number inputs
that are out of range (which includes all negative numbers for fp.to_ubv).

But that's not what Haskell (or C) does. So, when you use the prover to compute the values, you're essentially getting a junk value when the input is out-of-range.

I need to think a bit about how to address that; it's not quite trivial, because checking "out-of-range" is a tricky thing to do, but it should be possible. I'll deal with it sometime this weekend.

Thanks for the report!

@LeventErkok LeventErkok self-assigned this Feb 23, 2019
@LeventErkok LeventErkok added this to the 8.1 milestone Feb 23, 2019

peddie commented Feb 27, 2019

Thanks for the fast turnaround as always. I've got your fix for the function described above, which works. Your explanation of SMTLIB's behavior makes sense.

Unfortunately, I seem to have another, similar situation where the result I get in GHCi doesn't match the result from a generated C program. What happens in C in this particular situation is undefined behavior, but SBV doesn't match any version of GCC or Clang I could test, and those compilers all agree with each other.

Define a conversion function:

> let toInt16 = fromSDouble sRNE . fpRoundToIntegral sRNE :: SDouble -> SInt16

Try converting an out-of-range value:

>  toInt16 4294967295
-1 :: SInt16

Now generate the corresponding C code for toInt16:

> compileToC Nothing "test" $ do { doubleIn <- cgInput "input"; cgReturn $ toInt16 doubleIn }
...
SInt16 test(const SDouble input)
{
  const SDouble s0 = input;
  const SDouble s2 = rint(s0);
  const SBool   s3 = isnan(s2);
  const SBool   s4 = isinf(s2);
  const SBool   s5 = s3 || s4;
  const SBool   s6 = !s5;
  const SInt16  s7 = (SInt16) s2;
  const SInt16  s9 = s6 ? s7 : 0x0000;

  return s9;
}

When I call this function from main()

  const double test_in = 4294967295;
  printf("test(%.16e) : %"PRIi16"\n", test_in, test(test_in));

I see

test(4.2949672950000000e+09) : 0

Is this a bug, or is this kind of thing undefined in too many places (C99, SMTLib), such that I shouldn't expect things to line up?

LeventErkok (Owner) commented:

I'm stumped on this too. Let's forget about SBV for a second, and compare Haskell to C:

Prelude Data.Int> round  (4294967295::Double) :: Int16
-1

But:

#include <stdio.h>
#include <inttypes.h>

int main(void) {

    double d = 4294967295;
    int16_t r = (int16_t) d;

    printf("%"PRId16"\n", r);
    return 0;
}

which produces:

$ make a
cc     a.c   -o a
$ ./a
0

We need to understand why Haskell and C are disagreeing here first. Do you know why?

LeventErkok (Owner) commented:

In Section 6.3.1.4 of the C99 spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), it says:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

I couldn't find anything clear on the Haskell side on how this should behave.

I think there's a general consensus that for "undefined" things, C and Haskell are undefined in the same way. (Whatever that means!) But I think you found a case where their undefined behavior diverges. Perhaps ask about this in the Haskell mailing list?


peddie commented Feb 27, 2019

Sure, I'll ask on the Haskell-cafe list whether anyone knows why this behavior is different.

By the way, if I enable optimizations with -O1 and use gcc version 7 or 8, I get 32767 out of that rounding operation; clang's answer doesn't change.

LeventErkok (Owner) commented:

Yeah, I think this is one of those cases that's impossible to align; we obviously don't want to generate different code for different optimization levels.

Does GHC's answer change with different optimization levels?


peddie commented Feb 27, 2019

I haven't found any optimization flag settings that can change GHC's output in this case. I agree that it's not really possible to get everything to line up using the built-in operations.

What I've already begun doing for my program is to ensure that rounding will always result in the nearest valid value (defining 32767 as the "right" answer in this case) and generate C code that explicitly checks the bounds and matches the Haskell behavior. Maybe it would also suit SBV to have some kind of matching behavior like this across SMTLIB, Haskell and C available out of the box or by default?

LeventErkok (Owner) commented:

I've been thinking about it, but haven't found a simple enough solution to align Haskell, SMTLib, C yet. (I'm most worried about the Haskell<->SMTLib correspondence, but if we get the C equivalence, that'd be nice as well.)

There's some discussion here on how to do this: https://stackoverflow.com/questions/526070/handling-overflow-when-casting-doubles-to-integers-in-c

But it's not obvious how to solve the problem "correctly," as the integer bounds are not always exactly representable in the floating-point type. If you've found a good way to solve the problem that works for all conversions Float/Double/Int8-16-32-64/Word8-16-32-64, I'd love to hear it!

LeventErkok (Owner) commented:

More info here: https://wiki.sei.cmu.edu/confluence/display/c/FLP34-C.+Ensure+that+floating-point+conversions+are+within+range+of+the+new+type

I'm also quite hesitant to spit out a lot of code that complicates verification and performs a ton of unnecessary checks on the "happy path" execution.


LeventErkok commented Feb 28, 2019

Thought some more about this, and it is really complicated!

The semantics is really the following:

  • Take the floating point number
  • Round it to the next integral value in that floating-point number's original domain, given the rounding-mode. (That is, find the closest "integral" value that is representable in that domain.)
  • If the result is NaN or +/-Inf, then the result is 0
  • Otherwise, do a rem operation

What this means is that depending on the rounding mode, you can get a different result for the same conversion. (There's, of course, no way to see this in Haskell, since there's only one rounding mode.)

So, even if we're willing to check for "out-of-bounds," that means we first have to round to an infinitely precise real number using the rounding mode. And that's where it gets complicated: there's no function to do that in SMTLib, nor in Haskell or C, that I can rely on.

The way hardware implements this sort of stuff is that they have an internal notion of "working-precision" number, which is conceptually infinitely precise. (In reality, of course, they only need to keep track of what's known as guard/sticky/round bits; I found some notes here if you are interested: http://pages.cs.wisc.edu/~markhill/cs354/Fall2008/notes/flpt.apprec.html)

Long story short, this looks extremely tricky to do correctly across Haskell/SMTLib/C. I'm open to ideas, but I'm inclined towards documenting this a bit more clearly in the code and saying "undefined": that is, neither the SMTLib conversion nor the C translation is guaranteed to match what Haskell does if the inputs are out-of-bounds for the target type and the given rounding mode.

What do you think?


peddie commented Feb 28, 2019

Summary: I understand the price of blowing out the size of SMTLIB expressions just for this hopefully-unlikely case, and I think it makes sense for you to just say "no two parts of this system agree, so don't do this."

Long version:

Thanks for the pointers. I have come across some of this before, but I didn't even know about isgreater() and friends.

I think what I am planning to do is like this:

  • If the input value is NaN or +/-Inf, handle the edge case as desired (I might prefer to round Inf to the appropriate bound of the integer type)
  • Find the largest (smallest) value in the output domain.
  • Given the rounding mode, find the largest (smallest) number in the input domain that rounds to the next largest (smallest) value in the output domain (e.g. what is the largest float that rounds to INT64_MAX - 1?). This need only be done once offline for any given tuple of (input type, output type, rounding mode), I think.
  • If the input value is greater (less) than that largest (smallest) floating-point equivalent of the largest(smallest)-within-bounds output domain value, return the largest (smallest) output domain value.
  • Otherwise do a normal cast which is well-defined.

It's not yet clear to me that for that limited set of cases (Int|Word)(8|16|32|64), one can't find the correct bounding values, although I can see why care is required. I'm still hoping to implement this shortly and will let you know (unless you can spot a problem already). I wonder whether I'll be able to prove my solution correct with Z3 or MathSAT.

It sounds like our goals are diverging here, because my main application at this point is the C-code generation (most of our terms are far too large and contain too many 64-bit values for any SMT solver I've found to return results in reasonable time), and I'm happy to modify my Haskell DSL's behavior to get whatever behavior I want from the final C program; meanwhile SBV is of course focused on SMT interaction. In other words, I don't think you should worry too much about solving the problems I'm encountering.


LeventErkok commented Feb 28, 2019

Ah, you make an excellent point. We only need to compute the "boundaries" once. Too bad we cannot do this in Haskell, but that doesn't mean these cannot be computed elsewhere.

I'm perfectly happy to adopt this solution directly in SBV! To summarize, I need the following table:

import Data.SBV

-- For each kind/rounding-mode combination, return the min/max convertible
-- values for both floats and doubles. For each combo we have 3 values: the minimum
-- representable, the maximum representable, and the default value to return when
-- the input is out-of-range. This should follow what "Haskell" does.
ranges :: [((Kind, RoundingMode) , ((Float, Float, Float) , (Double, Double, Double)))]
ranges = [ ((KBounded False  8, RoundNearestTiesToEven), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False  8, RoundNearestTiesToAway), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False  8, RoundTowardPositive   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False  8, RoundTowardNegative   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False  8, RoundTowardZero       ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 16, RoundNearestTiesToEven), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 16, RoundNearestTiesToAway), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 16, RoundTowardPositive   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 16, RoundTowardNegative   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 16, RoundTowardZero       ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 32, RoundNearestTiesToEven), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 32, RoundNearestTiesToAway), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 32, RoundTowardPositive   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 32, RoundTowardNegative   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 32, RoundTowardZero       ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 64, RoundNearestTiesToEven), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 64, RoundNearestTiesToAway), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 64, RoundTowardPositive   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 64, RoundTowardNegative   ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded False 64, RoundTowardZero       ), ((0,   tbd, tbd), (0,   tbd, tbd)))
         , ((KBounded True   8, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         ]

Once you fill in all the tbd entries, I think we can code this up, and it wouldn't be too terrible in practice. I already filled in the easy ones for you :-)

I'm not sure what it would take to "validate" these numbers. But let's see how you compute them first (because I'm not sure how to go about it); we can think about validation later.


peddie commented Feb 28, 2019

It's by no means clever, but my plan is to write a loop in C that starts by casting the extreme-minus-one value to double, then calls nextafter() in a loop, checking that the reverse cast still yields the extreme-minus-one value and returning the last value that does and the first value that does not.

LeventErkok (Owner) commented:

And do this for each rounding mode?

LeventErkok (Owner) commented:

I guess you also have to make sure there are no optimizations turned on? You reported that different compilers return different values with different flags, so I'm a little concerned about that.


peddie commented Feb 28, 2019

Each rounding mode, each integer type, each floating-point type and each boundary.


peddie commented Feb 28, 2019

I don't believe my plan will involve any undefined behavior, since I'm only ever rounding from the floating-point domain to the value next closest to the maximum or minimum representable by the integer domain, which is a defined cast. Did I miss something?

LeventErkok (Owner) commented:

I guess there's an argument there about continuity; but these things are tricky to think about.

If you end up writing a C program that generates that table automatically (which I suspect you will!), please consider contributing it to the SBV repo. There's a buildUtils directory we can put it in and refer to later should something be in dispute. (And regenerate the table if necessary.)


peddie commented Feb 28, 2019

I'm just testing that it works in a single case right now. I will update with the whole program if it's working.

I'm not sure I understand the continuity issue you are referring to. My naive hope is that nextafter() will be sufficient to make this work.

LeventErkok (Owner) commented:

Oh, I meant your nextafter idea will work because of continuity. But I've been bitten by bizarre floating-point behavior before, so I wanted to hedge my bets.

LeventErkok (Owner) commented:

One final point. When you fill in the entries of the table, please use hexadecimal floats: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#hexadecimal-floating-point-literals

This way we don't have to worry about precision loss from/to strings. I think the corresponding printf modifier is %a in C.


peddie commented Feb 28, 2019

Re-reading your comment above the table, I want to clarify that I'm not (currently) computing the min/max roundable value. I'm computing the min (max) floating value for which rounding from floating to integer yields a value above (below) the min (max) bound of the integer type. This is because I set out with the goal of rounding out-of-bounds values to the appropriate bound, so I know the answer once my input value is beyond this.

I don't think the same approach will work to determine the min and max values for which casting is defined in the general case, because of the optimization issue, as you mentioned (gcc returned INT16_MAX when I gave it a high out-of-bounds value to round to int16_t, so my loop would not terminate in the right place if I enabled optimizations). But hopefully between two compilers and an explanatory note in the docs, we could reasonably expect to get the right result again in the future somehow.

LeventErkok (Owner) commented:

It'd be great if we can figure out a way to compute this table as I described. Then I can easily use it to both correctly generate SMTLib and C that match the Haskell behavior. (The whole goal of SBV is to match Haskell behavior in SMTLib and C; not the other way around.)

I'll think about this some more; but please explore and let me know what you find out!


peddie commented Feb 28, 2019

I will try my best to compute both! Thanks for the tip about %a, I had been memcpying.

LeventErkok (Owner) commented:

Hmm, looks like maybe we can just use z3 to compute these for us!

Prelude Data.SBV> optimizeWith z3{printBase=16} Lexicographic $ \x z -> do { minimize "z" z; return (sFromIntegral (z::SInt16) .== fpRoundToIntegral sRTZ (x :: SDouble)) }
Optimal model:
  s0 = -32768.004396084696 :: Double
                  6    5          4         3         2         1         0
                  3 21098765432 1098765432109876543210987654321098765432109876543210
                  S ----E11---- ------------------------F52-------------------------
          Binary: 1 10000001110 0000000000000000000000100100000000110100001000000000
             Hex: C0E0 0000 2403 4200
       Precision: DP
            Sign: Negative
        Exponent: 15 (Stored: 1038, Bias: 1023)
       Hex-float: -0x1.00000240342p15
           Value: -32768.004396084696 (NORMAL)
  s1 =             -0x8000 :: Int16
  z  =             -0x8000 :: Int16
Prelude Data.SBV> optimizeWith z3{printBase=16} Lexicographic $ \x z -> do { maximize "z" z; return (sFromIntegral (z::SInt16) .== fpRoundToIntegral sRTZ (x :: SDouble)) }
Optimal model:
  s0 = 32767.00781250052 :: Double
                  6    5          4         3         2         1         0
                  3 21098765432 1098765432109876543210987654321098765432109876543210
                  S ----E11---- ------------------------F52-------------------------
          Binary: 0 10000001101 1111111111111100000010000000000000000000000010001111
             Hex: 40DF FFC0 8000 008F
       Precision: DP
            Sign: Positive
        Exponent: 14 (Stored: 1037, Bias: 1023)
       Hex-float: +0x1.fffc08000008fp14
           Value: +32767.00781250052 (NORMAL)
  s1 =            0x7fff :: Int16
  z  =            0x7fff :: Int16

I believe this is exactly what we're looking for!


peddie commented Feb 28, 2019

You are the man! That's exactly it! Time to throw away my hacky binary search code.

What else do you need from me to implement this?

LeventErkok (Owner) commented:

Thanks Matt. I think there's enough evidence here that Z3 values are good. And if you actually confirm there are MPFR bugs, I think z3 folks would love to hear that their tool was instrumental in that; if you do pursue that, please do keep me and z3 folks in the loop.

Regarding SBV: Yes; it'll be great to get some confidence that the C translation does the right thing. I trust you noticed the caveat about the out-of-bounds Int64 range, right? I had to make that underspecified as I couldn't get any proofs going with z3. (Note that Word64 is fine; the unspecified case is when the value is out-of-range for the target and out of range of Int64. If you've got either, the translation should work fine.)


peddie commented Mar 4, 2019

The changes you've made look great to me. I think I'll have a much easier time working with the generated code now, as the scope of undefined behavior is greatly reduced. I will let you know in the next few days whether I find any problems on the C side.

> Unfortunately, I had to leave the one case unspecified: If the float doesn't fit into the target type, and it also doesn't fit into the Int64 range (regardless of the signedness of the target), then the result is undefined. The issue is that there's no way to get the Z3 or C translation to work correctly in this case. The C case follows from undefinedness. The SMTLib case follows because the conversion is unspecified; the correct way to do this would be to convert the float to a Real, then to an Int, and then to a bit-vector, and I know for a fact that float-to-real conversion (while defined in SMTLib) isn't really supported by z3 or mathsat. It's just too difficult for solvers to deal with this case.

This makes sense; I'm not sure what else you can do (at least on the C side) when faced with this issue.

> The nice thing that I like about the new design is that it gives the user a choice: The two new predicates sFloatInRange and sDoubleInRange (together with fpIsNaN, fpIsInfinite) give you the control to check and do the conversions as you wish in user code. I understand this is important to you.

This is indeed pretty convenient for my use case (and it matches mpfr_fits_*_p(), which turned out to be handy).

LeventErkok (Owner) commented:

@peddie

Hi Matt. I'll put out an SBV release this weekend, and I'm afraid the next release after that won't be for quite a while. It'd be good to close this ticket before the release goes out. For that purpose, perhaps we can just assume the constants as computed now are good, and just make sure that the C code that gets generated is good? If you can at least run some cases and convince yourself of the C output, that'd be great. (They should all generate very similar code anyhow, aside from the particular bounds used.)


peddie commented Mar 6, 2019

I'm assuming the constants are OK for now; I simply haven't had time to move our internal system to the latest SBV and run it through the gauntlet yet due to other responsibilities. It shouldn't take too long once I can get to it, and I will do my best to get this done before your weekend begins. Thanks for the heads-up.


LeventErkok commented Mar 6, 2019

Great!

Since SBV only generates C code when the rounding mode is RoundNearestTiesToEven, the number of cases you need to test is rather small, actually. And it always generates the "same" kind of code. I think the following combos would do:

    Float -> SWord8, SInt8, SWord64, SInt64
    Double -> SWord8, SInt8, SWord64, SInt64

So, essentially, conversions to signed/unsigned 8- and 64-bit would cover it all, I think. Of course, if you automate it, you can go for all 16 combinations by adding SInt/Word16 and SInt/Word32.


peddie commented Mar 7, 2019

I'm starting on this work now, and I'm finding sFloatInRange and sDoubleInRange not quite sufficient for my use case. There are two reasons for this:

  • Each identifies whether the input floating value is within the bounds from the constant table, but this does not match the logic in genericFPConverter's resultFromIntermediate; some values for which sFloatInRange returns False will be rounded (safely, with defined behavior) if you actually call fromSFloat on them.

  • Neither lets me know in which direction I'm out of the range of the result type.

In my ideal world, I'd be able to get all the information at play here through the public interface:

  1. Is the value in range for the target type (sFloatInRange)?
  2. Is rounding the value to the target type well-defined (maybe sFloatSafelyRoundable or something?)
  3. Which way does the input value violate either of these bounds (maybe data OutOfBounds = BelowMinBound | AboveMaxBound and sFloatOutOfRange :: SFloat -> Maybe OutOfBounds, sFloatNotSafelyRoundable :: SFloat -> Maybe OutOfBounds)?

But I understand that as described, this may be a bit more complex than most users want. The function I have implemented on my end is equivalent to sFloatNotSafelyRoundable, so I can find out whether I'm in danger of undefined behavior and know what to do to avoid it all at once.

LeventErkok added a commit that referenced this issue Mar 7, 2019

LeventErkok commented Mar 7, 2019

Good points all around!

I've added new predicates:

sFloatOverflow, sFloatUnderflow, sDoubleOverflow, sDoubleUnderflow

These should tell you if you can underflow/overflow in each case. sFloatInRange and sDoubleInRange still exist; but are now defined in terms of the above in the obvious sense.

(Note that it's not possible to return Maybe results or things like data OutOfBounds = BelowMin | AboveMax in a symbolic context. These functions will work properly when the rounding mode and the inputs are symbolic.)

Regarding the conversion via an intermediate: that one is on purpose, to match the Haskell semantics. I'm not saying what Haskell does is reasonable, but SBV has a goal of matching what Haskell does as closely as possible. (This isn't always possible, as we found out when the float is out-of-range for Int64, but I don't know how to overcome that.)

Will these additions be sufficient for your purposes? Also, it would be awesome to have some confirmation that the C output preserves the claimed semantics.

LeventErkok added a commit that referenced this issue Mar 7, 2019
LeventErkok (Owner) commented:

Also, just exported conversionBounds from Data.SBV.Internals: In case you need the bounds for some future purpose.


peddie commented Mar 7, 2019

Thanks very much for those changes. Especially with conversionBounds in hand, I shouldn't have any further issues.

The C code looks pretty good so far (I've run a few hundred random cases). I will do (heaps) more tests tomorrow and report back.

LeventErkok (Owner) commented:

Wonderful! Looking forward to hearing about those test results, hopefully, it'll all go well.

I do need to make an SBV release by this Saturday sometime, so would be great to at least know that the C translation is not "obviously broken."


peddie commented Mar 8, 2019

I wasn't able to do quite as many tests today as I'd planned, for various reasons, but I still haven't found any issues with the generated C by testing (and I do know that plenty of cases involving rounding behavior occurred). As with any randomized testing system, I can never be totally sure. I think there's "nothing obviously wrong."

I began preparing my MPFR bug report and discovered that in fact some of our Z3-derived constants are incorrect. I'll give one case as an example: going from Float to Int32 using RoundNearestTiesToEven. Here is a table of the relevant values (great job on crackNum, by the way; it's really great!):

Value     | Hex            | Decimal       | Binary
-------------------------------------------------------------------------------
Z3        | 0x1p+31        | 2147483648.00 | 0 10011110 00000000000000000000000
INT32_MAX |                | 2147483647    |
MPFR      | 0x1.fffffep+30 | 2147483520.00 | 0 10011101 11111111111111111111111

If we try to round 0x1p+31 using rintf(), we'll of course get back the same decimal value as above. It's clearly greater than INT32_MAX. I'm not sure any rounding (in the IEEE 754 sense of having to choose which 32-bit Float to use to represent a more precise result) is actually invoked here; this jibes with the Z3 and MPFR tables having identical values for all rounding modes. If I cast rintf(0x1p+31) to int32_t and run the program with the undefined behavior sanitizer enabled, I do get a warning (and a negative result from the cast). If I ask GHC about this situation, it agrees (as does SBV, in spite of the table):

*Data.SBV> let x = 0x1.fffffep+30 :: SFloat 
*Data.SBV> let xi = fromSFloat sRNE $ fpRoundToIntegral sRNE x :: SInt32 
*Data.SBV> x
2.1474835e9 :: SFloat
*Data.SBV> xi
2147483520 :: SInt32
*Data.SBV> sFromIntegral xi :: SFloat
2.1474835e9 :: SFloat
*Data.SBV> let y = 0x1p31 :: SFloat                                                  
*Data.SBV> let yi = fromSFloat sRNE $ fpRoundToIntegral sRNE y :: SInt32              
*Data.SBV> y 
2.1474836e9 :: SFloat
*Data.SBV> yi
-2147483648 :: SInt32
*Data.SBV> sFromIntegral yi :: SFloat
-2.1474836e9 :: SFloat

Are we missing another constraint from our optimization problem? Maybe we need to add a constraint about round-tripping or something? The floating-point value after fpRoundToIntegral should be required to match the floating-point value converted from the rounded integer value, perhaps? I'll try it.

I'm terribly sorry to give this news at the eleventh hour. I should be able to give this my full attention tomorrow (my Saturday).

@LeventErkok
Owner

Hi Matt! Great analysis!

I agree that it's bizarre to map to a value that's in bounds, but not what you'd expect. So, perhaps MPFR has a point on this.

I'm not sure how to get z3 to compute the MPFR bounds. Will need to do some thinking, but please charge ahead. You've a much better understanding of this than I do.

@peddie
Collaborator Author

peddie commented Mar 9, 2019

I have identified a (very minor) issue with the C code:

> fmap (flip showHFloat "") $ unliteral (sFromIntegral (-(1 :: SWord64)) :: SDouble)
Just "0x1.fffffffffffffp63"

Then compile it:

> compileToC Nothing "test" $ do { input <- cgInput "input" :: SBVCodeGen SWord64; let { out = sFromIntegral (-input) :: SDouble }; cgReturn out }
...
SDouble test(const SWord64 input)
{
  const SWord64 s0 = input;
  const SWord64 s1 = (- s0);
  const SDouble s2 = (SDouble) s1;

  return s2;
}

When I run this program:

test(1) : 0x1p+64 (1.8446744073709552e+19)

So it's a one-bit mismatch.

@LeventErkok
Owner

LeventErkok commented Mar 9, 2019

This one doesn't even use toSFloat/toSDouble. Sigh. But it's actually a pure Haskell problem, no?

Prelude Data.Word Numeric> flip showHFloat "" $ (fromIntegral (-(1 :: Word64)) :: Double)
"0x1.fffffffffffffp63"

Versus:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
  uint64_t inp  = 1;
  uint64_t ninp = - inp;
  double   out  = (double) ninp;

  printf("%a\n", out);
}

which prints:

0x1p+64

Are these "corresponding" programs in Haskell/C in your mind? If not, what should be the correct translation? (Note that there's no SBV involved in this one at all!)

@peddie
Collaborator Author

peddie commented Mar 9, 2019

Yeah, I agree that this is just a difference in how C and Haskell do their rounding. I think it will only matter in this edge case, but I suppose since the value 0x1p64 is inside the bounds of the "defined rounding behavior" table, I had hoped that the generated C might match Haskell/SBV in this case. I will work around it myself.

@LeventErkok
Owner

What would be the correct translation that avoids this?

@peddie
Collaborator Author

peddie commented Mar 9, 2019

I think it's likely to be more complex than is worth it in the general case. Haskell and C agree on what -1 :: Word64 ought to be; it's just to do with conversion to double. I'm trying to dig into how this happens (I believe in GMP, which doesn't sound promising).

Some additional food for thought: in a number of these mismatches I've come across, the mismatch stems from SBV doing constant folding in Haskell and generating a reduced C program; if the full program is generated for some input variable (rather than an inlined constant) and applied to the original constant value in C, I get a different result. My understanding is that it would be quite difficult to avoid this behavior in SBV, since it's so efficient about recovering sharing and the like immediately (rather than in a later pass before a compiler backend or something). But it's a bit of a complicating factor even if you're OK with the program as run in Haskell or in Z3 not matching the program as run in C.

@LeventErkok
Owner

That's absolutely correct. Constant folding aggressively is essential for getting any sort of performance out of SBV. And these mismatches always come about when Haskell/SMTLib/C don't agree. The trick then becomes to carefully avoid those. The Haskell/SMTLib mismatches are the ones I'm most worried about, since the whole goal of SBV is to reason about Haskell; I'd like C to have the same meaning, but it has received less attention. And floating-point by itself is a beast to start with.

@peddie
Collaborator Author

peddie commented Mar 9, 2019

I'm having some trouble looking into the table calculation. It seems like Z3 is contradicting itself. Could this have to do with the new blast-based Metric for floating-point types?

> optResult <- optimizeWith z3 Lexicographic $ do { x <- free "x" :: Symbolic SDouble; maximize "x" x; let { rounded = observe "rounded" $ fpRoundToIntegral sRNE x; converted = observe "converted" $ fromSDouble sRNE rounded :: SInt64; roundTripped = observe "roundTripped" $ sFromIntegral converted :: SDouble; }; constrain $ roundTripped .== rounded }
> let xout = case optResult of { LexicographicResult  m -> maybe (error "disaster!") id $ getModelValue "x" m; _ -> error "oh no!" } :: Double
> showHFloat xout ""
"0x1p63"
> prove $ do { x <- free "x" :: Symbolic SDouble; constrain $ x .== literal xout; let { rounded = observe "rounded" $ fpRoundToIntegral sRNE x; converted = observe "converted" $ fromSDouble sRNE rounded :: SInt64; roundTripped = observe "roundTripped" $ sFromIntegral converted :: SDouble; }; pure $ roundTripped .== rounded }
Falsifiable. Counter-example:
  converted    =  -9223372036854775808 :: Int64
  roundTripped = -9.223372036854776e18 :: Double
  rounded      =  9.223372036854776e18 :: Double
  x            =  9.223372036854776e18 :: Double
> satResult <- sat $ do { x <- free "x" :: Symbolic SDouble; constrain $ x .== literal xout; let { rounded = observe "rounded" $ fpRoundToIntegral sRNE x; converted = observe "converted" $ fromSDouble sRNE rounded :: SInt64; roundTripped = observe "roundTripped" $ sFromIntegral converted :: SDouble; }; pure $ roundTripped .== rounded }
> flip showHFloat "" <$> (getModelValue "x" satResult  :: Maybe Double)
Just "0x1p63"

The theorem is the same as the optimization problem, with the maximize "x" x replaced by constrain $ x .== literal xout and the constrain $ roundTripped .== rounded replaced by pure $ roundTripped .== rounded. Have I done something underspecified?

@LeventErkok
Owner

LeventErkok commented Mar 9, 2019

Well, look at the last 4 lines of the conversions table. It says everything in the range -1p63 to 1p63 is safe. Even for float. That's just wrong, isn't it?

Here're the lines for reference: https://github.com/LeventErkok/sbv/blob/master/Data/SBV/Utils/FloatConversionBounds.hs#L63-L66

I don't think this has to do with the Metric expansion, but something else is going on. I don't trust these numbers anymore. Hmm, back to square one.

@LeventErkok
Owner

One thing I'm noticing is that we can't really use SBV to compute these numbers, because the construction of the optimization problem itself relies on these numbers! Kind of a chicken-and-egg problem. Does that make sense?

We need to directly use z3 or figure out some other way.

@LeventErkok
Owner

There's also this: https://gforge.inria.fr/tracker/index.php?func=detail&aid=6212&group_id=136&atid=619

I'm now really hesitant to include these bounds-checking predicates in SBV, and the conversion table altogether.

Instead, I'm inclined to go back to the original version of these definitions, but put a comment saying:

  1. If the input float/double is 0, +oo, -oo; then conversion produces 0
  2. Otherwise, we correctly round the float to the next integral value representable.
  3. If this rounding produces a value that fits in the target type, then the result is that value. If not, then the result is unspecified. In this last case, Haskell, SMTLib, and C-translations might all disagree; no guarantees are given.

This is less functionality, but it is honest and clear. (And implementable in the obvious way!)

All the "inRange"/"overflow"/"underflow" predicates would go away. Note that those can later be defined outside of SBV, if we can eventually figure out a "correct" bounds table somehow.

Unless I hear otherwise from you, I'll implement this and put out release 8.1. What do you think?

@peddie
Collaborator Author

peddie commented Mar 9, 2019

I think if you're in a hurry to get the release out, definitely go ahead with the 8.0 behavior and thorough docs. It seems like we know there should be a better way, but we clearly don't know exactly how to do it yet, so it doesn't make sense to throw it out there.

I can ask on the MPFR list whether there are still concerns related to that ticket. I keep testing edge values hoping to catch them out on something I know for sure is wrong, but so far it's been right in every case. I notice now, though, that the MPFR table also features -0x1p63 for Float -> Int64 conversions, which seems pretty unlikely. I'll look at that next.

I've found no other issues with the C generation so far either (not that it matters if you are planning to roll back).

@LeventErkok
Owner

I'd trust MPFR for float/double. (They might have issues at other precisions, but that's not really our concern.)

I'd like the predicates to be present in SBV, but I agree that it won't happen until we can get those bound numbers computed somehow. I'll put a release out by rolling back the definitions to 8.0 essentially, and perhaps predicates can make it to 8.2; though I'm not sure when that'll be a possibility.

The reason I'm rushing is that I'm switching jobs and starting a new one on Monday, and unfortunately my new employer isn't quite as lenient about open source. So, it isn't clear to me when there'll be an 8.2.

But if you ever construct that table and create the correct overflow/underflow predicates, I'd strongly encourage you to put that on to hackage as a separate package built on top of SBV; and users can be very precise about conversions that way. And perhaps one day that table can be integrated into SBV proper.

@LeventErkok
Owner

@peddie

Hi Matt.. I've gone back on the semantics completely, making the result unspecified if the input is out of bounds for the target. I've also changed it so that even -oo, oo, and NaN are unspecified now. It really doesn't make sense to map them to 0. (I think my original motivation was that that's what Haskell did, but in hindsight, it's just as meaningless.)

Furthermore, SBV performs no constant folding when given a float/double: they all go symbolic. This was a necessary change to make sure people don't trip over the meaning differing depending on whether constant folding kicks in, since constant folding is really not something users have any control over. I think this'll simplify your life to a certain extent as well, since you don't have to worry about SBV constant-folding things depending on how you code your test programs.

The nice effect of this is that the generated C is absolutely dead simple now: simply an rint or rintf, followed by a type-cast. The SBV code doing all this is simpler as well.

Please give it a shot and let me know what you think. However, as far as 8.1 is concerned, I think this is how it'll go out, unless you point out some obvious bug. (Hopefully, you'll see this in time to respond!)

I'll open a new ticket regarding missing predicates. I want them, but not sure if we can fit it into the current release.

@LeventErkok
Owner

SBV 8.1 is now out on hackage; closing the door on anything new for a while. But let's continue the discussion on #462; I'd like to at least understand why z3 is producing what it's producing. I have an inkling that we should be able to "fix" it by avoiding the undefined behavior somehow, but so far it has eluded me.
