EIDM has platform-dependent behavior #8921
I ran a simulation of mine on a Linux system (openSUSE) and on a Windows system and got the same fcd-output...
It must be something subtle that requires many RNG calls:
I took a deeper look into the issue with RNG calls and added a driverstate-device to Krauss vehicles. I also tested whether it has something to do with collisions, but even without collisions the values are not the same. Could you check this behavior on your machines? The scenario:
Update:
Since the "normal" floating point math governing vehicle positions should work the same on all machines (and seems to do so for other models), I suspect that it's tanh or some other math library function used so far only by EIDM.
This appears to be "normal": https://stackoverflow.com/questions/21183477/windows-vs-linux-math-result-difference though it wasn't a problem for us so far.
As mentioned above, the DriverState-Device suffers from a similar precision leak as the EIDM does. I tracked the issue down to the log-function call in randNorm. A summary, as I understand it:
This solution is not really elegant and may still drift away after some time, but it works as intended for the above examples (the circle example with the DriverState Krauss or the EIDM).
As far as my understanding of floating point rounding goes, there are many cases where the given approach fails and may actually increase the deviation between platforms.
On further thought, my example is maybe "proving too much" (namely, that rounding generally doesn't work). Every rounding algorithm has edge cases where nearby values that fall on different sides of a threshold are rounded away from each other. Nevertheless, there are more cases where rounding serves to reduce the difference (as you yourself have observed).
I compared the values from the randNorm-function and could see that "q" is always the same on each system and between 0 and 1. |
I agree that rounding does not solve it, but it may be good enough. I would just prefer to do the rounding on the final result instead of just on the log. This would also mask out deviations stemming from different sqrt behavior. And maybe there is a more efficient way of zeroing out the last bits of the mantissa (https://stackoverflow.com/a/5672983/5731587).
I'm pretty sure that sqrt behaves the same on all platforms or we'd have noticed (IEEE 754 in fact requires sqrt to be correctly rounded, unlike the transcendental functions such as log and tanh).
Indeed, recasting and changing bits instead of using a "round" function would work a lot better. But I also agree that sqrt should not be a problem.
But that thread suggests we could also get close with some compiler options and/or by using
I dug a bit deeper and it all comes down to the log function. I unfortunately could not find a good solution, and I hope I understood everything correctly. The algorithms used to compute the log are compiler- and hardware-dependent. Changing options via _controlfp and /fp:strict (for Windows) or fesetround and -frounding-math/-ffloat-store/-fexcess-precision=style (for Linux) did not change anything for me. The resulting difference in the log calculation happens approx. every 100th call and consists of a rounding error (+- 1 bit). To give this issue a bump: what do you prefer going forward?
I think most users do not need to replicate the same simulation on different machines. The annoyance comes mostly from the developer side when trying to reproduce user examples. I would just document the platform dependency of EIDM including the fact that it comes from the log function.
But if you have a patch ready which solves it on the platforms you tested, feel free to submit a PR.
The workaround I wrote about is here:
I also tried a bit-shift approach, but this did not work as intended. The log() function only varies by +- 1 bit (rounding of the internal log algorithm), so if I make sure to cut that bit out of the number representation, both results are the same. As stated above, it is not a great method, but I could not find a better one (except adding a software-based log() function written in plain C). So if someone absolutely needs platform independence, they can use this approach.
I just applied the patch with a small adaptation. Please recheck whether it still works with your setup.
First of all, I am sorry for the erroneous patch. I just realized that back then, after testing, I added an int32 instead of an int64... Secondly, I must admit that the workaround does not fully solve the problem; it just slows down the drift. But you probably already know that. Your patch works on my side. For information: I am now getting approx. 1 dissimilar log return value (between the platforms) per 50,000 calls. Previously it was approx. 1 dissimilar value per 100-1000 calls. The different return values every 50,000 calls stem from the rounding issue already posted by @namdre:
One example is:
So after some time the results of this solution will drift. That is why I would call it a "workaround", but not a solution. From my view, the only "real" solution would be to add a platform-independent math library.
Maybe https://www.swmath.org/software/12390 (though this is LGPL).
And it does not look very well maintained.
Test cfmodel/drive_in_circles_small/EIDM generates
On each machine the runs are stable (same result for 100 repeats) and also consistent across release/debug/clang builds.