
EIDM has platform-dependent behavior #8921

Open · namdre opened this issue Aug 6, 2021 · 23 comments

@namdre (Contributor) commented Aug 6, 2021

Test cfmodel/drive_in_circles_small/EIDM generates

  • 25 collisions on my local RHEL6 machine
  • 5 collisions on my Ubuntu 20.04 machine
  • 12 collisions on our Ubuntu test server
  • 6 collisions on my Windows machine

On each machine the runs are stable (same result for 100 repeats) and also consistent across release/debug/clang builds.

@Domsall (Contributor) commented Aug 6, 2021

I ran a simulation of mine on a Linux system (openSUSE) and on a Windows system and got the same fcd-output.
Do you know where this could come from? RNG-values? The tanh-function?

@namdre (Contributor, Author) commented Aug 6, 2021

It must be something subtle that requires many RNG-calls:

  • when disabling the 4 RandHelper calls, the differences disappear
  • when only enabling the random minGap, there is still no difference
  • when enabling minGap and myw_gap, the difference shows up after 2500 steps at precision 4 (210 steps at precision 20)
  • when enabling minGap, myw_gap and myw_speed, the difference shows up after 2100 steps at precision 4 (98 steps at precision 20)

@behrisch behrisch added this to the 1.11.0 milestone Aug 10, 2021
@Domsall (Contributor) commented Aug 30, 2021

I took a deeper look into the issue with RNG-calls and added a driverstate-device to Krauss-vehicles.
The fcd-outputs of the Linux simulation and the Windows simulation also differ.

I also tested if it has something to do with collisions, but even without collisions the values are not the same.

Could you check this behavior on your machines?

The scenario:
circles_collisions_EIDM_and_platform_dependancy.zip

@Domsall (Contributor) commented Sep 1, 2021

Update:

  • All RNG-calls are the same (values, call and rng-number).
  • Very small differences start showing at some point (position is different after 10-15th decimal)
  • At some point the position difference between Linux and Windows creates a step where one vehicle is on lane "X" in Linux and on lane "Y" in Windows. In that step the RNG-call of the Linux vehicle and that of the Windows vehicle are different, because the RNGs belong to a lane and not to a vehicle.
  • After this step, the simulations start drifting away from each other

@namdre (Contributor, Author) commented Sep 1, 2021

Since the "normal" floating-point math governing vehicle positions should work the same on all machines (and seems to do so for other models), I suspect that it's tanh or some other math library function used so far only by EIDM.

@namdre (Contributor, Author) commented Sep 1, 2021

This appears to be "normal": https://stackoverflow.com/questions/21183477/windows-vs-linux-math-result-difference though it wasn't a problem for us so far.

@Domsall (Contributor) commented Sep 6, 2021

As mentioned above, the DriverState-Device suffers from a precision leak similar to that of the EIDM.

I tracked the issue down to the log-function call in randNorm. Summary from my understanding:

  • the processors often use extended double values between calculations
  • it then sometimes happens that on different systems the intermediate values are not perfectly the same
  • the log-function then outputs slightly different values (after 16 decimals)
  • This is often not a problem, but for the random walks, each error influences all future values
  • To make sure both systems behave the same, I "forced" the log-function output to the double precision of 15 decimals and now get the same fcd-output (precision 6) for each system with the following small patch:
    log_patch.txt

This solution is not really elegant and may still drift away after some time, but works as intended for the above examples (circle example with the DriverState Krauss or the EIDM).
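(For illustration, a minimal sketch of the idea the patch describes; the 1e15 scale and the int64 cast here are assumptions, the actual log_patch.txt may differ:)

```cpp
#include <cmath>
#include <cstdint>

// Sketch of the workaround described above: discard the unreliable last
// bits of log() by keeping only ~15 decimals, so a +-1-ulp difference
// between platforms is cut off before it can enter the random walk.
double roundedLog(double q) {
    const double raw = std::log(q);
    // multiply-intcast-divide; int64 is needed so the scaled value fits
    return static_cast<double>(static_cast<int64_t>(raw * 1e15)) / 1e15;
}
```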

@namdre (Contributor, Author) commented Sep 6, 2021

As far as my understanding of floating point rounding goes, there are many cases where the given approach fails and may actually increase the deviation between platforms.
Consider the case where one implementation returns the binary equivalent of 1.0 and the other 0.999...9. If these numbers are rounded by discarding some digits (which is approximately what your multiply-intcast-divide does), then the difference between the values increases.
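(A small demonstration of this failure mode, with illustrative values; cut15 is a hypothetical stand-in for the multiply-intcast-divide:)

```cpp
#include <cstdint>
#include <cstdio>

// Truncating to 15 decimals can widen a 1-ulp gap to a full decimal digit.
static double cut15(double x) {
    return static_cast<double>(static_cast<int64_t>(x * 1e15)) / 1e15;
}

int main() {
    const double a = 1.0;                 // result on platform A
    const double b = 0.9999999999999999;  // platform B: one ulp below 1.0
    std::printf("before: %.17g\n", a - b);                // ~1.1e-16
    std::printf("after:  %.17g\n", cut15(a) - cut15(b));  // ~1.0e-15, larger
    return 0;
}
```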

@namdre (Contributor, Author) commented Sep 7, 2021

On further thought, my example is maybe "proving too much" (namely, that rounding generally doesn't work). Every rounding algorithm has edge cases where nearby values that fall on different sides of a threshold are rounded away from each other. Nevertheless, there are more cases where rounding serves to reduce the difference (as you yourself have observed).

@Domsall (Contributor) commented Sep 7, 2021

I compared the values from the randNorm-function and could see that "q" is always the same on each system and between 0 and 1.
But the output of the log-function is sometimes different (probably a "bit shift" to the next higher/lower double value). Nonetheless, the double precision float format guarantees 15 significant decimal digits of precision, so if I round to that digit, both values should still be the same. I couldn't find any other idea when searching the internet.

@behrisch (Contributor) commented Sep 7, 2021

I agree that rounding does not solve it, but it may be good enough. I would just prefer to do the rounding on the final result instead of just the log; this would also mask out deviations stemming from different sqrt behavior. And maybe there is a more efficient way of zeroing out the last bits of the mantissa (https://stackoverflow.com/a/5672983/5731587).
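(A sketch of the bit-masking idea from the linked answer, not the code that was committed; values one ulp apart collapse to the same double unless they straddle a mask boundary:)

```cpp
#include <cstdint>
#include <cstring>

// Zero the lowest 'bits' bits of the mantissa instead of decimal rounding.
double maskMantissa(double x, int bits) {
    uint64_t u;
    std::memcpy(&u, &x, sizeof u);  // bit-copy, avoids aliasing UB
    u &= ~((static_cast<uint64_t>(1) << bits) - 1);
    std::memcpy(&x, &u, sizeof u);
    return x;
}
```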

@namdre (Contributor, Author) commented Sep 7, 2021

I'm pretty sure that sqrt behaves the same on all platforms or we'd have noticed (though I don't have a hard source for this).
Curiously, lots of input on this topic comes from game developers that try to optimize their network code: https://gafferongames.com/post/floating_point_determinism/

@Domsall (Contributor) commented Sep 7, 2021

Indeed, recasting and changing bits instead of using a "round"-function would work a lot better. But I also agree that sqrt should not be a problem.

@behrisch (Contributor) commented Sep 8, 2021

> I'm pretty sure that sqrt behaves the same on all platforms or we'd have noticed (though I don't have a hard source for this).
> Curiously, lots of input on this topic comes from game developers that try to optimize their network code: https://gafferongames.com/post/floating_point_determinism/

But this thread looks like we could also get near it with some compiler options and/or using _controlfp?

@behrisch (Contributor):

_controlfp is probably not the way to go. If we want to use it, we need to enable /fp:strict, which changes several test results even if I only run the netgen tests. And even if I enable it, it does not solve #8973.
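(For reference, a sketch of that knob; precision control exists only for the x87 FPU, i.e. 32-bit x86 builds, while x64 uses SSE2 math where _MCW_PC is not supported:)

```cpp
#include <float.h>

// MSVC only: force the x87 FPU to round every intermediate result to
// 53 bits, suppressing the extended-precision effects described above.
void forceDoublePrecision() {
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int current;
    _controlfp_s(&current, _PC_53, _MCW_PC);
#endif
}
```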

@Domsall (Contributor) commented Nov 25, 2021

I dug a bit deeper and it all comes down to the log-function. Unfortunately, I could not find a good solution and hope I understood everything correctly.
Here are my findings:

The algorithms used for the log-calculation are compiler- and hardware-dependent. Changing options via _controlfp and /fp:strict (on Windows) or fesetround and -frounding-math/-ffloat-store/-fexcess-precision=style (on Linux) did not change anything for me.
As in your link above, there are many different ways to cope with this problem.
One solution is to use a platform-independent math library (see https://stackoverflow.com/questions/1129032/platform-independent-math-library), which you are already kind of doing in places (for example with your own RandomNumber-functions). But that slows things down.
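(To illustrate what such a software-based log could look like, a sketch that is not proposed for SUMO: it is deterministic under strict IEEE-754 double semantics, i.e. no fast-math or FMA contraction, at the cost of being a few ulps less accurate and slower than the platform libm:)

```cpp
#include <cmath>  // only for frexp, which is an exact bit operation

// Tiny deterministic log() sketch: reduce x to m * 2^e with m in
// [sqrt(1/2), sqrt(2)), then evaluate log(m) = 2*atanh(s) with
// s = (m-1)/(m+1) via its Taylor series. Only +,-,*,/ are used, which
// IEEE 754 requires to be correctly rounded, so the result is
// bit-identical on every conforming platform. Assumes x > 0, which
// suffices for log(q) with q in (0, 1) as in randNorm.
double portableLog(double x) {
    int e;
    double m = std::frexp(x, &e);       // x = m * 2^e, m in [0.5, 1)
    if (m < 0.70710678118654752) {      // shift m into [sqrt(.5), sqrt(2))
        m *= 2.0;
        --e;
    }
    const double s = (m - 1.0) / (m + 1.0);  // |s| < 0.1716
    const double s2 = s * s;
    double term = s;
    double sum = 0.0;
    for (int k = 1; k < 40; k += 2) {   // s^39 ~ 1e-30, well below 1 ulp
        sum += term / k;
        term *= s2;
    }
    const double LN2 = 0.69314718055994530942;
    return 2.0 * sum + e * LN2;
}
```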

The resulting difference in the log-calculation happens approx. every 100th call and consists of a rounding error (±1 bit).

To give this issue a bump: which approach would you prefer going forward?

@namdre (Contributor, Author) commented Nov 25, 2021

I think most users do not need to replicate the same simulation on different machines. The annoyance comes mostly from the developer side when trying to reproduce user examples. I would just document the platform-dependency of EIDM including the fact that it comes from log.

@behrisch (Contributor):

But if you have a patch ready which solves it on the platforms you tested, feel free to submit a PR.

@Domsall (Contributor) commented Nov 29, 2021

The workaround I wrote about is here:

> As mentioned above, the DriverState-Device suffers from a precision leak similar to that of the EIDM.
>
> I tracked the issue down to the log-function call in randNorm. Summary from my understanding:
>
>   • the processors often use extended double values between calculations
>   • it then sometimes happens that on different systems the intermediate values are not perfectly the same
>   • the log-function then outputs slightly different values (after 16 decimals)
>   • This is often not a problem, but for the random walks, each error influences all future values
>   • To make sure both systems behave the same, I "forced" the log-function output to the double precision of 15 decimals and now get the same fcd-output (precision 6) for each system with the following small patch:
>     log_patch.txt
>
> This solution is not really elegant and may still drift away after some time, but works as intended for the above examples (circle example with the DriverState Krauss or the EIDM).

I also tried a bit-shift approach, but this did not work as intended. The log()-function only varies by ±1 bit (rounding of the internal log algorithm), so if I make sure I cut off the number representation of this bit, both results are the same.

Like I stated above, it is not a great method, but I could not find any better one (except by adding a C-software-based log()-function). So if someone absolutely needs the platform independence, they can use this approach.

@behrisch (Contributor):

I just applied the patch with a small adaptation. Please recheck whether it still works with your setup.

namdre added a commit that referenced this issue Nov 30, 2021
namdre added a commit that referenced this issue Nov 30, 2021
namdre added a commit that referenced this issue Dec 1, 2021
@Domsall (Contributor) commented Dec 6, 2021

First of all, I am sorry for the erroneous patch. I just realized that back then, after testing, I added an int32 instead of an int64... Secondly, I must admit that the workaround does not fully solve the problem, it just slows down the drift. But you probably already know that.

Your patch works on my side. For information: I am now getting approx. 1 dissimilar log-return value (between the platforms) per 50,000 calls. Previously it was approx. 1 dissimilar value per 100-1000 calls.

The different return value every 50,000 calls stems in part from the rounding issue already posted by @namdre:

> As far as my understanding of floating point rounding goes, there are many cases where the given approach fails and may actually increase the deviation between platforms. Consider the case where one implementation returns the binary equivalent of 1.0 and the other 0.999...9. If these numbers are rounded by discarding some digits (which is approximately what your multiply-intcast-divide does), then the difference between values increases.

One example is:

  • One platform outputs "-0.136581060337999993237190210493"
  • The other platform outputs "-0.136581060337999965481614594864"
  • When we multiply those values by 1e12/1e13/etc., the last digits of the first value get rounded to 80 and the digits of the second value to 79.
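(Indeed, the two quoted outputs are exactly one ulp apart, which can be checked directly, assuming the printed digits round back to the original doubles:)

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double v1 = -0.136581060337999993237190210493;  // platform 1
    const double v2 = -0.136581060337999965481614594864;  // platform 2
    // v2 is the next representable double toward zero from v1:
    std::printf("%s\n", std::nextafter(v1, 0.0) == v2 ? "one ulp apart"
                                                      : "further apart");
    return 0;
}
```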

So after some time the results of this solution will drift. That is why I would call it a "workaround", but not a solution.

From my view, the only "real" solution would be to add a platform independent math library.

@namdre (Contributor, Author) commented Jan 6, 2022

Maybe https://www.swmath.org/software/12390 (though this is LGPL).
Anyway, I'd rate this as a low priority now.

@namdre namdre modified the milestones: 1.11.0, 2.0.0 Jan 6, 2022
@behrisch (Contributor) commented Jan 6, 2022

> Maybe https://www.swmath.org/software/12390 (though this is LGPL).

and it does not look very well maintained.
