Invalid constant-folding when converting from floating-point to integer #456
There are really two bugs here. One of them is conversion to C; and I believe I just fixed that. Please give it a shot and let me know if it's still not working. The second is trickier, unfortunately. The SMTLib conversion functions are essentially uninterpreted when the value is out of range. This is directly from the spec:
But that's not what Haskell (or C) does. So, when you use the prover to compute the values, you're essentially getting a junk value when the input is out-of-range. I need to think a bit about how to address that; it's not quite trivial because checking "out-of-range" is a tricky thing to do; but should be possible. I'll deal with it sometime this weekend. Thanks for the report!
Thanks for the fast turnaround as always. I've got your fix for the function described above, which works. Your explanation of SMTLIB's behavior makes sense. Unfortunately I seem to have another, similar situation where the result I get in GHCi doesn't match the result I get in a generated C program. It's undefined behavior what happens in C in this particular situation, but SBV doesn't match any version of GCC or clang I could test, and these compilers all agree. Define a conversion function:
Try converting an out-of-range value:
Now generate the corresponding C code for
When I call this function from
I see
Is this a bug, or is this kind of thing undefined in too many places (C99, SMTLIB), such that I shouldn't expect things to line up?
I'm stumped on this too. Let's forget about SBV for a second, and compare Haskell to C:

```
Prelude Data.Int> round (4294967295::Double) :: Int16
-1
```

But:

```c
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    double d = 4294967295;
    int16_t r = (int16_t) d;
    printf("%"PRId16"\n", r);
    return 0;
}
```

which produces:
We need to understand why Haskell and C are disagreeing here first. Do you know why?
In Section 6.3.1.4 of the C99 spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), it says:

> When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

I couldn't find anything clear on the Haskell side on how this should behave. I think there's a general consensus that for "undefined" things, C and Haskell are undefined in the same way. (Whatever that means!) But I think you found a case where their undefined behavior diverges. Perhaps ask about this on the Haskell mailing list?
Sure, I'll ask on the Haskell-cafe list whether anyone knows why this behavior is different. By the way, if I enable optimizations with
Yeah, I think this is one of those cases that's impossible to align; we obviously don't want to generate different code for different optimization levels. Does GHC's answer change with different optimization levels?
I haven't found any optimization flag settings that can change GHC's output in this case. I agree that it's not really possible to get everything to line up using the built-in operations. What I've already begun doing for my program is to ensure that rounding will always result in the nearest valid value (defining 32767 as the "right" answer in this case) and generate C code that explicitly checks the bounds and matches the Haskell behavior. Maybe it would also suit SBV to have some kind of matching behavior like this across SMTLIB, Haskell and C available out of the box or by default?
I've been thinking about it, but haven't found a simple enough solution to align Haskell, SMTLib, C yet. (I'm most worried about the Haskell<->SMTLib correspondence, but if we get the C equivalence, that'd be nice as well.) There's some discussion here on how to do this: https://stackoverflow.com/questions/526070/handling-overflow-when-casting-doubles-to-integers-in-c But it's not obvious how to solve the problem "correctly" as the integer bounds are not always exactly representable. If you've found a good way to solve the problem that works for all conversions Float/Double/Int8-16-32-64/Word8-16-32-64, I'd love to hear it!
More info here: https://wiki.sei.cmu.edu/confluence/display/c/FLP34-C.+Ensure+that+floating-point+conversions+are+within+range+of+the+new+type I'm also quite hesitant to spit out a lot of code that both complicates verification and performs a ton of unnecessary checks for the "happy path" execution.
Thought some more about this, and it is really complicated! The semantics is really the following:
What this means is that depending on the rounding mode, you can get a different result for the same conversion. (There's, of course, no way to see this in Haskell, since there's only one rounding mode.) So, even if we're willing to check the "out-of-bounds," that means we have to first do a rounding to an infinitely precise real number using the rounding mode. And that's where it gets complicated: there's no function in SMTLib to do that, nor in Haskell or C, that I can rely on. The way hardware implements this sort of stuff is that it keeps an internal notion of a "working-precision" number, which is conceptually infinitely precise. (In reality, of course, they only need to keep track of what's known as guard/sticky/round bits; I found some notes here if you are interested: http://pages.cs.wisc.edu/~markhill/cs354/Fall2008/notes/flpt.apprec.html) Long story short, this looks extremely tricky to do correctly across Haskell/SMTLib/C. I'm open to ideas, but I'm inclined towards documenting this a bit more clearly in the code and saying "undefined": that is, neither the SMTLib conversion nor the C translation is guaranteed to match what Haskell does if the inputs are out-of-bounds for the target type and the given rounding mode. What do you think?
Summary: I understand the price of blowing out the size of SMTLIB expressions just for this hopefully-unlikely case, and I think it makes sense for you to just say "no two parts of this system agree, so don't do this." Long version: Thanks for the pointers. I have come across some of this before, but I didn't even know about I think what I am planning to do is like this:
It's not yet clear to me that for that limited set of cases ( It sounds like our goals are diverging here, because my main application at this point is the C-code generation (most of our terms are far too large and contain too many 64-bit values for any SMT solver I've found to return results in reasonable time), and I'm happy to modify my Haskell DSL's behavior to get whatever behavior I want from the final C program; meanwhile SBV is of course focused on SMT interaction. In other words, I don't think you should worry too much about solving the problems I'm encountering.
Ah, you make an excellent point. We only need to compute the "boundaries" once. Too bad we cannot do this in Haskell, but that doesn't mean these cannot be computed elsewhere. I'm perfectly happy to adopt this solution directly in SBV! To summarize, I need the following table:

```haskell
import Data.SBV

-- For each kind/rounding-mode combination, return the min/max convertible
-- values for both floats and doubles. For each combo we have 3 values: the minimum
-- representable, the maximum representable, and the default value to return when
-- the input is out-of-range. This should follow what "Haskell" does.
ranges :: [((Kind, RoundingMode) , ((Float, Float, Float) , (Double, Double, Double)))]
ranges = [ ((KBounded False  8, RoundNearestTiesToEven), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False  8, RoundNearestTiesToAway), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False  8, RoundTowardPositive   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False  8, RoundTowardNegative   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False  8, RoundTowardZero       ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 16, RoundNearestTiesToEven), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 16, RoundNearestTiesToAway), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 16, RoundTowardPositive   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 16, RoundTowardNegative   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 16, RoundTowardZero       ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 32, RoundNearestTiesToEven), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 32, RoundNearestTiesToAway), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 32, RoundTowardPositive   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 32, RoundTowardNegative   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 32, RoundTowardZero       ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 64, RoundNearestTiesToEven), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 64, RoundNearestTiesToAway), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 64, RoundTowardPositive   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 64, RoundTowardNegative   ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded False 64, RoundTowardZero       ), ((0, tbd, tbd), (0, tbd, tbd)))
         , ((KBounded True   8, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True   8, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  16, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  32, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundNearestTiesToEven), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundNearestTiesToAway), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardPositive   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardNegative   ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         , ((KBounded True  64, RoundTowardZero       ), ((tbd, tbd, tbd), (tbd, tbd, tbd)))
         ]
```

Once you fill in all the `tbd`s: I'm not sure what it would take to "validate" these numbers. But let's see how you compute them first (because I'm not sure how to go about it); we can think about that later.
It's by no means clever, but my plan is to write a loop in C that starts by casting the extreme-minus-one value to double, then calls
And do this for each rounding mode?
I guess you also have to make sure there are no optimizations turned on? You reported different compilers return different values with differing flags; so I'm a little concerned with that.
Each rounding mode, each integer type, each floating-point type and each boundary.
I don't believe my plan will involve any undefined behavior, since I'm only ever rounding from the floating-point domain to the value next closest to the maximum or minimum representable by the integer domain, which is a defined cast. Did I miss something?
I guess there's an argument there about continuity; but these things are tricky to think about. If you end up writing a C program that generates that table automatically (which I suspect you will!), please consider contributing it to the SBV repo. There's a
I'm just testing that it works in a single case right now. I will update with the whole program if it's working. I'm not sure I understand the continuity issue you are referring to. My naive hope is that
Oh, I meant your
One final point. When you fill in the entries of the table, please use hexadecimal floats: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#hexadecimal-floating-point-literals This way we don't have to worry about precision loss from/to strings. I think the corresponding
Re-reading your comment above the table, I want to clarify that I'm not (currently) computing the min/max roundable value. I'm computing the min (max) floating value for which rounding from floating to integer yields a value above (below) the min (max) bound of the integer type. This is because I set out with the goal of rounding out-of-bounds values to the appropriate bound, so I know the answer once my input value is beyond this. I don't think the same approach will work to determine the min and max values for which casting is defined in the general case, because of the optimization issue, as you mentioned (gcc returned
It'd be great if we can figure out a way to compute this table as I described. Then I can easily use it to correctly generate both SMTLib and C that match the Haskell behavior. (The whole goal of SBV is to match the Haskell behavior in SMTLib and C; not the other way around.) I'll think about this some more; but please explore and let me know what you find out!
I will try my best to compute both! Thanks for the tip about
Hmm, looks like maybe we can just use z3 to compute these for us!

```
Prelude Data.SBV> optimizeWith z3{printBase=16} Lexicographic $ \x z -> do { minimize "z" z; return (sFromIntegral (z::SInt16) .== fpRoundToIntegral sRTZ (x :: SDouble)) }
Optimal model:
  s0 = -32768.004396084696 :: Double
                  6    5    4    3    2    1    0
                  3 21098765432 1098765432109876543210987654321098765432109876543210
                  S ----E11---- ------------------------F52-------------------------
          Binary: 1 10000001110 0000000000000000000000100100000000110100001000000000
             Hex: C0E0 0000 2403 4200
       Precision: DP
            Sign: Negative
        Exponent: 15 (Stored: 1038, Bias: 1023)
       Hex-float: -0x1.00000240342p15
           Value: -32768.004396084696 (NORMAL)
  s1 = -0x8000 :: Int16
   z = -0x8000 :: Int16
Prelude Data.SBV> optimizeWith z3{printBase=16} Lexicographic $ \x z -> do { maximize "z" z; return (sFromIntegral (z::SInt16) .== fpRoundToIntegral sRTZ (x :: SDouble)) }
Optimal model:
  s0 = 32767.00781250052 :: Double
                  6    5    4    3    2    1    0
                  3 21098765432 1098765432109876543210987654321098765432109876543210
                  S ----E11---- ------------------------F52-------------------------
          Binary: 0 10000001101 1111111111111100000010000000000000000000000010001111
             Hex: 40DF FFC0 8000 008F
       Precision: DP
            Sign: Positive
        Exponent: 14 (Stored: 1037, Bias: 1023)
       Hex-float: +0x1.fffc08000008fp14
           Value: +32767.00781250052 (NORMAL)
  s1 = 0x7fff :: Int16
   z = 0x7fff :: Int16
```

I believe this is exactly what we're looking for!
You are the man! That's exactly it! Time to throw away my hacky binary search code. What else do you need from me to implement this?
Thanks Matt. I think there's enough evidence here that the Z3 values are good. And if you actually confirm there are MPFR bugs, I think the z3 folks would love to hear that their tool was instrumental in that; if you do pursue that, please keep me and the z3 folks in the loop. Regarding SBV: yes, it'll be great to get some confidence that the C translation does the right thing. I trust you noticed the caveat about the out-of-bounds
The changes you've made look great to me. I think I'll have a much easier time working with the generated code now, as the scope of undefined behavior is greatly reduced. I will let you know in the next few days whether I find any problems on the C side.
This makes sense; not sure what else you can do (at least on the C side) when faced with this issue.
This is indeed pretty convenient for my use case (and it matches
Hi Matt. I'll put out an SBV release this weekend, and I'm afraid the next release after that won't be for quite a while. It'd be good to close this ticket before the release goes out. For the purposes of this, perhaps we can just assume the constants as computed now are good, and just make sure that the C code that gets generated is good? If you can at least run some cases and convince yourself of the C output, it'd be great. (They should all generate very similar code anyhow, aside from the particular bounds used.)
I'm assuming the constants are OK for now; I simply haven't had time to move our internal system to the latest SBV and run it through the gauntlet yet due to other responsibilities. It shouldn't take too long once I can get to it, and I will do my best to get this done before your weekend begins. Thanks for the heads-up.
Great! Since SBV only generates C code when rounding mode is
So, essentially conversions to signed/unsigned 8 and 64 would cover it all, I think. Of course, if you automate, then you can go for all 16 combinations adding
I'm starting on this work now, and I'm finding
In my ideal world, I'd be able to get all the information at play here through the public interface:
But I understand that as described, this may be a bit more complex than most users want. The function I have implemented on my end is equivalent to
Good points all around! I've added new predicates:
These should tell you if you can underflow/overflow in each case. (Note that it's not possible to give Regarding the conversion via intermediate: That one is on purpose to match the Haskell semantics. I'm not saying what Haskell is doing is reasonable, but SBV has a goal of matching what Haskell does as closely as possible. (This isn't always possible, as we found out when the float is out-of-range for Int64, but I don't know how to overcome that.) Will these additions be sufficient for your purposes? Also, it would be awesome to have some confirmation of the C output preserving the claimed semantics.
Also, just exported
Thanks very much for those changes. Especially with The C code looks pretty good so far (I've run a few hundred random cases). I will do (heaps) more tests tomorrow and report back.
Wonderful! Looking forward to hearing about those test results; hopefully it'll all go well. I do need to make an SBV release by this Saturday sometime, so it would be great to at least know that the C translation is not "obviously broken."
I wasn't able to do quite as many tests today as I'd planned, for various reasons, but I still haven't found any issues with the generated C by testing (and I do know that plenty of cases involving rounding behavior occurred). As with any randomized testing system, I can never be totally sure. I think there's "nothing obviously wrong." I began preparing my MPFR bug report and discovered that in fact some of our Z3-derived constants are incorrect. I'll give one case as an example: going from
If we try to round
Are we missing another constraint from our optimization problem? Maybe we need to add a constraint about round-tripping or something? The floating-point value after I'm terribly sorry to give this news at the eleventh hour. I should be able to give this my full attention tomorrow (my Saturday).
Hi Matt! Great analysis! I agree that it's bizarre to map to a value that's in bounds, but not what you'd expect. So, perhaps MPFR has a point on this. I'm not sure how to get z3 to compute the MPFR bounds. Will need to do some thinking, but please charge ahead. You've a much better understanding of this than I do.
I have identified a (very minor) issue with the C code:
Then compile it:
When I run this program:
So it's a one-bit mismatch.
This one doesn't even use

Versus:

```c
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t inp = 1;
    uint64_t ninp = - inp;
    double out = (double) ninp;
    printf("%a\n", out);
}
```

which prints:

Are these "corresponding" programs in Haskell/C in your mind? If not, what should be the correct translation? (Note that there's no SBV involved in this one at all!)
Yeah, I agree that this is just a difference in how C and Haskell do their rounding. I think it will only matter in this edge case, but I suppose since the value
What would be the correct translation that avoids this?
I think it's likely to be more complex than is worth it in the general case. Haskell and C agree on what Some additional food for thought: in a number of these mismatches I've come across, the mismatch stems from SBV doing constant folding in Haskell and generating a reduced C program; if the full program is generated for some input variable (rather than an inlined constant) and applied to the original constant value in C, I get a different result. My understanding is that it would be quite difficult to avoid this behavior in SBV, since it's so efficient about recovering sharing and the like immediately (rather than in a later pass before a compiler backend or something). But it's a bit of a complicating factor even if you're OK with the program as run in Haskell or in Z3 not matching the program as run in C.
That's absolutely correct. Constant folding aggressively is essential for getting any sort of performance out of SBV. And these mismatches always come about when Haskell/SMTLib/C don't agree. The trick then becomes to carefully avoid those. The Haskell/SMTLib mismatches worry me most, since the whole goal of SBV is to reason about Haskell; I'd like C to have the same meaning, but it has received less attention. And floating-point by itself is a beast to start with.
I'm having some trouble looking into the table calculation. It seems like Z3 is contradicting itself. Could this have to do with the new blast-based
The theorem is the same as the optimization problem, with the
Well, look at the last 4 lines of the conversions table. It says everything in the range Here're the lines for reference: https://github.com/LeventErkok/sbv/blob/master/Data/SBV/Utils/FloatConversionBounds.hs#L63-L66 I don't think this has to do with the
One thing I'm noticing is that we can't really use SBV to compute these numbers! Because the construction of the problem relies on these numbers! Kind of a chicken-and-egg problem. Does that make sense? We need to directly use z3 or figure out some other way.
There's also this: https://gforge.inria.fr/tracker/index.php?func=detail&aid=6212&group_id=136&atid=619 I'm now really hesitant to include these bounds-checking predicates in SBV, and the conversion table, all together. Instead, I'm inclined to go back to the original version of these definitions, but put a comment saying:
This is less functionality, but it is honest and clear. (And implementable in the obvious way!) All the "inRange"/"overflow"/"underflow" predicates would go away. Note that those can later be defined outside of SBV, if we can eventually figure out a "correct" bounds table somehow. Unless I hear otherwise from you, I'll implement this and put out release 8.1. What do you think?
I think if you're in a hurry to get the release out, definitely go ahead with the 8.0 behavior and thorough docs. It seems like we know there should be a better way, but we clearly don't know exactly how to do it yet, so it doesn't make sense to throw it out there. I can ask on the MPFR list whether there are still concerns related to that ticket. I keep testing edge values hoping to catch them out on something I know for sure is wrong, but so far it's been right in every case. I notice now, though, that the MPFR table also features I've found no other issues with the C generation so far either (not that it matters if you are planning to roll back).
I'd trust MPFR for float/double. (Though they might have issues for other precisions, but that's not really our concern.) I'd like the predicates to be present in SBV, but I agree that it won't happen until we can get those bound numbers computed somehow. I'll put a release out by rolling back the definitions to essentially 8.0, and perhaps the predicates can make it into 8.2; though I'm not sure when that'll be a possibility. The reason I'm rushing is that I'm switching jobs and starting a new one on Monday, and unfortunately my new employer isn't quite lenient about open-source. So, it isn't clear to me when there'll be an 8.2. But if you ever construct that table and create the correct overflow/underflow predicates, I'd strongly encourage you to put that onto Hackage as a separate package built on top of SBV; users can be very precise about conversions that way. And perhaps one day that table can be integrated into SBV proper.
Hi Matt.. I've completely gone back on the semantics, and made it completely unspecified if the input is out-of-bounds for the target. I've also changed so that even Furthermore, SBV performs no constant folding when given a float/double: they all go symbolic. This was a necessary change to make sure that people don't trip up over different meanings depending on whether constant folding kicks in, as constant folding is really not something users have any control over. I think this will simplify your life as well to a certain extent, since you don't have to worry about SBV constant-folding things depending on how you code your test programs. The nice effect of this is that the generated C is absolutely dead-simple now. Simply do an Please give it a shot and let me know what you think. However, as far as 8.1 is concerned, I think this is how it'll go out, unless you point out some obvious bug. (Hopefully, you'll see this in time to respond!) I'll open a new ticket regarding the missing predicates. I want them, but not sure if we can fit it into the current release.
SBV 8.1 is now out on hackage; closing the door on anything new for a while. But let's continue the discussion on #462; I'd like to at least understand why z3 is producing what it's producing. I have an inkling that we should be able to "fix" it by avoiding the undefined behavior somehow, but so far it has eluded me.
I'm using the latest commit `2066aba0074a11f7409007b0417f33e40efb753b`. If I write a program that converts from floating-point to integer type:
and apply it directly to a negative constant whose rounded value is outside the range of the integral type in GHCi, I see (for example):
The generated C code disagrees with the Haskell interpreter if the constant is avoided:
results in
which when called thus
yields
However, if we let constant-folding see what we're about, we will get a different answer:
results in
If I ask Z3 to check the interpreter's evaluation, I seem to get disagreement:
But I can also constrain it to be equal to `255`, and Z3 is just as happy. What's going on here?