New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Three subtests of small_blas_test fail in C++17/20 mode in VS2022 #782
Comments
Played around with this a bit more. The same tests fail too in C++17 mode. And they pass in C++14 mode. |
This is weird and serious, I will take a look. |
Hmm odd: I just tried to run the tests in x64-Debug mode (C++17), and they pass 😕. Switching back to x64-Release (which is RelWithDebInfo), and they fail again. |
Here's the full compiler command-line calls for both configurations (from Release:
Debug:
I don't see anything out of the ordinary. However I don't see any /fp: mode set - I saw earlier today you asked this on the mailing list Sameer (just got back home today). Perhaps this could have something to do with it? |
I suspect this is something in Eigen, but I need to check to see if I can replicate this either on Linux-86 or arm macros. |
It is unlikely an Eigen issue because disabling |
Does it also fail in pure release mode?
…On Sun, Mar 27, 2022, 7:27 AM Sergiu Deitsch ***@***.***> wrote:
It is unlikely an Eigen issue because disabling CUSTOM_BLAS let's the
small_blas_test pass. I was not able to reproduce the problem on a Linux
system.
—
Reply to this email directly, view it on GitHub
<#782 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANCABINMDSEE3LD4DRAV4TVCB47JANCNFSM5RX5EXWQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Sorry, my previous observation was incorrect: CMake dropped my Pure release mode is also affected. |
The custom_blas setting does not affect the test. It only determines
whether the the code in smallblas is used or not in the library.
…On Sun, Mar 27, 2022, 7:49 AM Sergiu Deitsch ***@***.***> wrote:
Sorry, my previous observation was incorrect: CMake dropped my
CMAKE_CXX_STANDARD setting while switching between configurations.
small_blas_test does fail with CUSTOM_BLAS disabled as well.
Pure release mode is also affected.
—
Reply to this email directly, view it on GitHub
<#782 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANCABKGDIECZUBJYUGYNGLVCB7OXANCNFSM5RX5EXWQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
I see, thanks. After further investigation, the culprit seems to be diff --git a/internal/ceres/small_blas_generic.h b/internal/ceres/small_blas_generic.h
index 3f3ea424..e3e13fa6 100644
--- a/internal/ceres/small_blas_generic.h
+++ b/internal/ceres/small_blas_generic.h
@@ -167,10 +167,11 @@ static inline void MTM_mat1x4(const int col_a,
#define CERES_GEMM_OPT_MTM_MAT1X4_MUL \
av = pa[ai]; \
pb = b + bi; \
- c0 += av * *pb++; \
- c1 += av * *pb++; \
- c2 += av * *pb++; \
- c3 += av * *pb++; \
+ c0 += av * pb[0]; \
+ c1 += av * pb[1]; \
+ c2 += av * pb[2]; \
+ c3 += av * pb[3]; \
+ pb += 4; \
ai += col_stride_a; \
bi += col_stride_b;
allows the tests to pass. The optimizer seems to be doing some wild things (probably rearranging the expressions) which causes the partially incorrect results. I tested in |
wow, nice debugging @sergiud. |
The same pattern occurs in CERES_GEMM_OPT_MMM_MAT1X4_MUL, should we consider changing that too? |
I think we should consider changing it everywhere in the header. If I distilled the problem correctly, Compiler Explorer confirms the reordering. |
(The return sum should not matter.) |
okay please send me a CL. Thank you for looking into this. |
The patch fixes it for me! I've tried a few combinations among C++17, 20, RelWithDebInfo, Release, and with Very impressive debugging indeed. May I ask how you've managed to narrow it down to that macro? In RelWithDebInfo mode, the debugger would just jump over all the lines for me basically, and in Debug mode, the problem didn't occur. Out of curiosity - would this not be an optimiser bug, or is it allowed to do that reordering? |
The tests fail only in C++17 mode (and above) with optimizations enabled. Fixes #782 Change-Id: Ia3b7221efdd9091d252a7323613b7e54794470ee
This is definitely an optimizer bug. The reordering should not happen.
…On Sun, Mar 27, 2022, 10:28 AM Patrik Huber ***@***.***> wrote:
Closed #782 <#782>.
—
Reply to this email directly, view it on GitHub
<#782 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANCABLMMTGVXYPUGKMNOK3VCCSDFANCNFSM5RX5EXWQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Ok, thanks - I'll file a bug report in VS then! |
I agree that this is likely an optimizer bug. In general, the compiler is allowed to reorder operations as long there are no side effects. However, in this particular case reordering changes behavior.
I suspected vectorization as a possible cause since only certain parts of the matrix were affected. After seeing the unrolled loops being used, things went fast. After all, it is not the first time in my experience MSVC reorders code at will. |
I see! Thanks a lot for sharing! |
Hi,
With
CMAKE_CXX_STANDARD=20
on VS 2022, the three sub-testsBLAS.MatrixTransposeMatrixMultiply_9_9_9_Dynamic
,BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9
andBLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9_Dynamic
fail (all others pass). I'm using Eigen 3.4.0.Here's partial log output:
I was going to upload the full
LastTest.log
, but it's 1 GB (due to so much console output) - and I suspect it won't be needed./CC @sergiud
The text was updated successfully, but these errors were encountered: