SIMD is broken (Int32x4 and Float32x4) #53662
Comments
I notice that the first three rounds have a different checksum on JIT too, suggesting that the code is susceptible to differences in rounding of intermediate values. That's a little worrisome, since the code appears like it should be deterministic. Every operation is a SIMD 32-bit floating point addition or multiplication, or a comparison. Getting different results anyway suggests either not using the same SIMD operations before and after JIT optimization, having different rounding behavior, not using SIMD before optimizing at all, or just something being bugged. I get the same slowdown on Windows. For JIT run, I get:
It's stable from there. When compiled with
stable all the way through, compiled with
That's a nice 64 times slowdown, and I get the same sum as you.
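To compare JIT and AOT along the lines described above, a small harness like the following can print a checksum and the elapsed time per round. This is only a sketch, not the benchmark from the issue: `kernelChecksum`, the grid of 64 sample points and the iteration limit are all made up for illustration, and the kernel is a simplified Mandelbrot-style loop built on `Float32x4` and `Int32x4` from `dart:typed_data`.

```dart
import 'dart:typed_data';

/// Hypothetical kernel: counts, summed over all lanes, how many iterations
/// each sample point stays below the escape threshold.
int kernelChecksum(int maxIter) {
  final four = Float32x4.splat(4.0);
  final two = Float32x4.splat(2.0);
  final one = Int32x4(1, 1, 1, 1);
  var checksum = 0;
  for (var i = 0; i < 64; i++) {
    final step = i * 0.04;
    final cRe = Float32x4(-2.0 + step, -1.5 + step, -1.0 + step, -0.5 + step);
    final cIm = Float32x4.splat(0.3);
    var zRe = Float32x4.zero();
    var zIm = Float32x4.zero();
    var counts = Int32x4(0, 0, 0, 0);
    for (var n = 0; n < maxIter; n++) {
      final re2 = zRe * zRe;
      final im2 = zIm * zIm;
      // Lanes that already escaped keep iterating and may reach Infinity or
      // NaN; the comparison must then report "false" for those lanes.
      final active = (re2 + im2).lessThan(four);
      if (active.signMask == 0) break; // every lane has escaped
      counts += active & one; // add 1 only in lanes that are still active
      final nextRe = re2 - im2 + cRe;
      zIm = zRe * zIm * two + cIm;
      zRe = nextRe;
    }
    checksum += counts.x + counts.y + counts.z + counts.w;
  }
  return checksum;
}

void main() {
  // Several rounds, so that a checksum changing once the JIT optimizes the
  // loop would show up as a difference between early and late rounds.
  for (var round = 0; round < 10; round++) {
    final watch = Stopwatch()..start();
    final sum = kernelChecksum(10000);
    watch.stop();
    print('round $round: checksum=$sum, ${watch.elapsedMilliseconds} ms');
  }
}
```

Running the same file with `dart` and as an executable built with `dart compile exe` should print identical checksums in every round; only the timings are expected to differ.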
Here's an SO thread where a few people checked the assembly and said it is horrible (check the comments): https://stackoverflow.com/questions/77201568/dart-simd-extensions-int32x4-float32x4-going-crazy-slow-in-aot-different-re Regarding the correctness of the results: what is not OK is the result on Intel, which is way too far from what I would expect (10%). Just in case, I've tested the same problem with 8 different languages (22 configs); they all differ slightly in precision and check sum: https://github.com/maxim-saplin/mandelbrot/tree/main And performance in AoT is another problem, relevant for both Intel and Apple Silicon.
The difference in sum is around ~1M, which is on the order of one iteration per pixel. That is a very big difference, bigger than what (I think) can reasonably be explained by rounding errors. Since the result changes when the JIT optimizes, it seems the value problem could be the optimization.
And ... the bug seems to be in Int32x4 breakCondition = sum.lessThan(escapeThreshold); // (escapeThreshold).greaterThan(sum); then the result becomes 78513692 on JIT as well. It seems to be NaN-related. If I look at the produced values, it turns out that some of the iterations end up with a value of NaN in the comparison, also in the "working" versions. It seems The swapped (Our
…give the expected result for NaNs. TEST=lib/typed_data/float32x4_compare_test Bug: #53662 Change-Id: Ia5e1e5088fde84c60d30e0a1d50d1d2d3b50f2f0 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/328768 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Ryan Macnak <rmacnak@google.com>
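The NaN behaviour discussed above can be checked with a short sketch. It assumes nothing beyond `dart:typed_data`; the helper names `lessThanMask` and `swappedGreaterThanMask` are hypothetical and only mirror the two spellings of the comparison mentioned in the thread. Under IEEE 754 an ordered comparison with NaN is false, so both spellings should clear the NaN lanes.

```dart
import 'dart:typed_data';

/// Lane mask (bit 0 = x ... bit 3 = w) for `a < b`, written as a.lessThan(b).
int lessThanMask(Float32x4 a, Float32x4 b) => a.lessThan(b).signMask;

/// The same comparison with swapped operands: b.greaterThan(a).
int swappedGreaterThanMask(Float32x4 a, Float32x4 b) =>
    b.greaterThan(a).signMask;

void main() {
  // Lanes y and w hold NaN; lane x is below the threshold, lane z is above.
  final sum = Float32x4(1.0, double.nan, 5.0, double.nan);
  final escapeThreshold = Float32x4.splat(4.0);

  // Both spellings should select only lane x, i.e. the same mask.
  print('lessThan mask:    ${lessThanMask(sum, escapeThreshold)}');
  print('greaterThan mask: ${swappedGreaterThanMask(sum, escapeThreshold)}');
}
```

If the two masks differ, or differ between early and late calls in a hot loop, the comparison is not handling NaN consistently on that SDK.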
The correctness issue has been fixed; the performance issue is still open.
Hi everyone, are there any updates on this issue? I'm experiencing the same problem: my library, which heavily utilizes SIMD, is extremely slow in AoT mode.
STEPS:
1. Run dart mandelbrot.dart and remember the results (execution time and check sum)
2. Run dart compile exe mandelbrot.dart, run the resulting executable and remember the result
EXPECTED:
ACTUAL:
ENVIRONMENTS: