Three subtests of small_blas_test fail in C++17/20 mode in VS2022 #782

Closed
patrikhuber opened this issue Mar 27, 2022 · 20 comments

@patrikhuber
Contributor

Hi,

With CMAKE_CXX_STANDARD=20 on VS 2022, the three sub-tests BLAS.MatrixTransposeMatrixMultiply_9_9_9_Dynamic, BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9 and BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9_Dynamic fail (all others pass). I'm using Eigen 3.4.0.

Here's partial log output:

D:\ceres\ceres-solver\out\build\x64-Release> ctest . --rerun-failed -V
UpdateCTestConfiguration  from :D:/ceres/ceres-solver/out/build/x64-Release/DartConfiguration.tcl
UpdateCTestConfiguration  from :D:/ceres/ceres-solver/out/build/x64-Release/DartConfiguration.tcl
Test project D:/ceres/ceres-solver/out/build/x64-Release
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 85
    Start 85: small_blas_test

85: Test command: D:\ceres\ceres-solver\out\build\x64-Release\bin\small_blas_test.exe "--test_srcdir" "D:/ceres/ceres-solver/data"
85: Test timeout computed to be: 10000000
85: Running main() from gmock_main.cc
85: [==========] Running 26 tests from 1 test suite.
85: [----------] Global test environment set-up.
85: [----------] 26 tests from BLAS
85: [ RUN      ] BLAS.MatrixMatrixMultiply_5_3_7
85: [       OK ] BLAS.MatrixMatrixMultiply_5_3_7 (2 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiply_5_3_7_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiply_5_3_7_Dynamic (2 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiply_1_1_1
85: [       OK ] BLAS.MatrixMatrixMultiply_1_1_1 (0 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiply_1_1_1_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiply_1_1_1_Dynamic (0 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiply_9_9_9
85: [       OK ] BLAS.MatrixMatrixMultiply_9_9_9 (43 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiply_9_9_9_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiply_9_9_9_Dynamic (45 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_5_3_7
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_5_3_7 (2 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_5_3_7_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_5_3_7_Dynamic (3 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_1_1_1
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_1_1_1 (0 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_1_1_1_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_1_1_1_Dynamic (0 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_9_9_9
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_9_9_9 (44 ms)
85: [ RUN      ] BLAS.MatrixMatrixMultiplyNaive_9_9_9_Dynamic
85: [       OK ] BLAS.MatrixMatrixMultiplyNaive_9_9_9_Dynamic (44 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_5_3_7
85: [       OK ] BLAS.MatrixTransposeMatrixMultiply_5_3_7 (0 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_5_3_7_Dynamic
85: [       OK ] BLAS.MatrixTransposeMatrixMultiply_5_3_7_Dynamic (0 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_1_1_1
85: [       OK ] BLAS.MatrixTransposeMatrixMultiply_1_1_1 (0 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_1_1_1_Dynamic
85: [       OK ] BLAS.MatrixTransposeMatrixMultiply_1_1_1_Dynamic (0 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_9_9_9
85: [       OK ] BLAS.MatrixTransposeMatrixMultiply_9_9_9 (40 ms)
85: [ RUN      ] BLAS.MatrixTransposeMatrixMultiply_9_9_9_Dynamic
85: D:\ceres\ceres-solver\internal\ceres\small_blas_test.cc(199): error: The difference between (C_plus_ref - C_plus).norm() and 0.0 is 1251.7156226555614, which exceeds kTolerance, where
85: (C_plus_ref - C_plus).norm() evaluates to 1251.7156226555614,
85: 0.0 evaluates to 0, and
85: kTolerance evaluates to 1.1102230246251565e-15.
85: C += A' * B
85: row_stride_c : 10
85: col_stride_c : 10
85: start_row_c  : 0
85: start_col_c  : 0
85: Cref :
85:  286  331  376  421  466  511  556  601  646    1
85:  331  385  439  493  547  601  655  709  763    1
85:  376  439  502  565  628  691  754  817  880    1
85:  421  493  565  637  709  781  853  925  997    1
85:  466  547  628  709  790  871  952 1033 1114    1
85:  511  601  691  781  871  961 1051 1141 1231    1
85:  556  655  754  853  952 1051 1150 1249 1348    1
85:  601  709  817  925 1033 1141 1249 1357 1465    1
85:  646  763  880  997 1114 1231 1348 1465 1582    1
85:    1    1    1    1    1    1    1    1    1    1
85: C:
85:  286  331  326  361  466  511  466  501  646    1
85:  331  385  371  411  547  601  531  571  763    1
85:  376  439  416  461  628  691  596  641  880    1
85:  421  493  461  511  709  781  661  711  997    1
85:  466  547  506  561  790  871  726  781 1114    1
85:  511  601  551  611  871  961  791  851 1231    1
85:  556  655  596  661  952 1051  856  921 1348    1
85:  601  709  641  711 1033 1141  921  991 1465    1
85:  646  763  686  761 1114 1231  986 1061 1582    1
85:    1    1    1    1    1    1    1    1    1    1
85: D:\ceres\ceres-solver\internal\ceres\small_blas_test.cc(217): error: The difference between (C_minus_ref - C_minus).norm() and 0.0 is 1251.7156226555614, which exceeds kTolerance, where
85: (C_minus_ref - C_minus).norm() evaluates to 1251.7156226555614,
85: 0.0 evaluates to 0, and
85: kTolerance evaluates to 1.1102230246251565e-15.
85: C -= A' * B
85: row_stride_c : 10
85: col_stride_c : 10
85: start_row_c  : 0
85: start_col_c  : 0
85: Cref :
85:  -284  -329  -374  -419  -464  -509  -554  -599  -644     1
85:  -329  -383  -437  -491  -545  -599  -653  -707  -761     1
85:  -374  -437  -500  -563  -626  -689  -752  -815  -878     1
85:  -419  -491  -563  -635  -707  -779  -851  -923  -995     1
85:  -464  -545  -626  -707  -788  -869  -950 -1031 -1112     1
85:  -509  -599  -689  -779  -869  -959 -1049 -1139 -1229     1
85:  -554  -653  -752  -851  -950 -1049 -1148 -1247 -1346     1
85:  -599  -707  -815  -923 -1031 -1139 -1247 -1355 -1463     1
85:  -644  -761  -878  -995 -1112 -1229 -1346 -1463 -1580     1
85:     1     1     1     1     1     1     1     1     1     1
85: C:
85:  -284  -329  -324  -359  -464  -509  -464  -499  -644     1
85:  -329  -383  -369  -409  -545  -599  -529  -569  -761     1
85:  -374  -437  -414  -459  -626  -689  -594  -639  -878     1
85:  -419  -491  -459  -509  -707  -779  -659  -709  -995     1
85:  -464  -545  -504  -559  -788  -869  -724  -779 -1112     1
85:  -509  -599  -549  -609  -869  -959  -789  -849 -1229     1
85:  -554  -653  -594  -659  -950 -1049  -854  -919 -1346     1
85:  -599  -707  -639  -709 -1031 -1139  -919  -989 -1463     1
85:  -644  -761  -684  -759 -1112 -1229  -984 -1059 -1580     1
85:     1     1     1     1     1     1     1     1     1     1
85: D:\ceres\ceres-solver\internal\ceres\small_blas_test.cc(235): error: The difference between (C_assign_ref - C_assign).norm() and 0.0 is 1251.7156226555614, which exceeds kTolerance, where
85: (C_assign_ref - C_assign).norm() evaluates to 1251.7156226555614,
85: 0.0 evaluates to 0, and
85: kTolerance evaluates to 1.1102230246251565e-15.
85: C = A' * B
85: row_stride_c : 10
85: col_stride_c : 10
85: start_row_c  : 0
85: start_col_c  : 0
85: Cref :
85:  285  330  375  420  465  510  555  600  645    1
85:  330  384  438  492  546  600  654  708  762    1
85:  375  438  501  564  627  690  753  816  879    1
85:  420  492  564  636  708  780  852  924  996    1
85:  465  546  627  708  789  870  951 1032 1113    1
85:  510  600  690  780  870  960 1050 1140 1230    1
85:  555  654  753  852  951 1050 1149 1248 1347    1
85:  600  708  816  924 1032 1140 1248 1356 1464    1
85:  645  762  879  996 1113 1230 1347 1464 1581    1
85:    1    1    1    1    1    1    1    1    1    1
85: C:
85:  285  330  325  360  465  510  465  500  645    1
85:  330  384  370  410  546  600  530  570  762    1
85:  375  438  415  460  627  690  595  640  879    1
85:  420  492  460  510  708  780  660  710  996    1
85:  465  546  505  560  789  870  725  780 1113    1
85:  510  600  550  610  870  960  790  850 1230    1
85:  555  654  595  660  951 1050  855  920 1347    1
85:  600  708  640  710 1032 1140  920  990 1464    1
85:  645  762  685  760 1113 1230  985 1060 1581    1
85:    1    1    1    1    1    1    1    1    1    1
85: D:\ceres\ceres-solver\internal\ceres\small_blas_test.cc(199): error: The difference between (C_plus_ref - C_plus).norm() and 0.0 is 1251.7156226555614, which exceeds kTolerance, where
85: (C_plus_ref - C_plus).norm() evaluates to 1251.7156226555614,
85: 0.0 evaluates to 0, and
85: kTolerance evaluates to 1.1102230246251565e-15.
85: C += A' * B
85: row_stride_c : 10
85: col_stride_c : 11
85: start_row_c  : 0
85: start_col_c  : 0
85: Cref :
85:  286  331  376  421  466  511  556  601  646    1    1
85:  331  385  439  493  547  601  655  709  763    1    1
85:  376  439  502  565  628  691  754  817  880    1    1
85:  421  493  565  637  709  781  853  925  997    1    1
85:  466  547  628  709  790  871  952 1033 1114    1    1
85:  511  601  691  781  871  961 1051 1141 1231    1    1
85:  556  655  754  853  952 1051 1150 1249 1348    1    1
85:  601  709  817  925 1033 1141 1249 1357 1465    1    1
85:  646  763  880  997 1114 1231 1348 1465 1582    1    1
85:    1    1    1    1    1    1    1    1    1    1    1
85: C:
85:  286  331  326  361  466  511  466  501  646    1    1
85:  331  385  371  411  547  601  531  571  763    1    1
85:  376  439  416  461  628  691  596  641  880    1    1
85:  421  493  461  511  709  781  661  711  997    1    1
85:  466  547  506  561  790  871  726  781 1114    1    1
85:  511  601  551  611  871  961  791  851 1231    1    1
85:  556  655  596  661  952 1051  856  921 1348    1    1
85:  601  709  641  711 1033 1141  921  991 1465    1    1
85:  646  763  686  761 1114 1231  986 1061 1582    1    1
85:    1    1    1    1    1    1    1    1    1    1    1
85: D:\ceres\ceres-solver\internal\ceres\small_blas_test.cc(217): error: The difference between (C_minus_ref - C_minus).norm() and 0.0 is 1251.7156226555614, which exceeds kTolerance, where
85: (C_minus_ref - C_minus).norm() evaluates to 1251.7156226555614,
85: 0.0 evaluates to 0, and
85: kTolerance evaluates to 1.1102230246251565e-15.
85: C -= A' * B
85: row_stride_c : 10
85: col_stride_c : 11
85: start_row_c  : 0
85: start_col_c  : 0
85: Cref :
85:  -284  -329  -374  -419  -464  -509  -554  -599  -644     1     1
85:  -329  -383  -437  -491  -545  -599  -653  -707  -761     1     1
85:  -374  -437  -500  -563  -626  -689  -752  -815  -878     1     1
85:  -419  -491  -563  -635  -707  -779  -851  -923  -995     1     1
85:  -464  -545  -626  -707  -788  -869  -950 -1031 -1112     1     1
85:  -509  -599  -689  -779  -869  -959 -1049 -1139 -1229     1     1
85:  -554  -653  -752  -851  -950 -1049 -1148 -1247 -1346     1     1
85:  -599  -707  -815  -923 -1031 -1139 -1247 -1355 -1463     1     1
85:  -644  -761  -878  -995 -1112 -1229 -1346 -1463 -1580     1     1
85:     1     1     1     1     1     1     1     1     1     1     1
85: C:
85:  -284  -329  -324  -359  -464  -509  -464  -499  -644     1     1
85:  -329  -383  -369  -409  -545  -599  -529  -569  -761     1     1
85:  -374  -437  -414  -459  -626  -689  -594  -639  -878     1     1
85:  -419  -491  -459  -509  -707  -779  -659  -709  -995     1     1
85:  -464  -545  -504  -559  -788  -869  -724  -779 -1112     1     1
85:  -509  -599  -549  -609  -869  -959  -789  -849 -1229     1     1
85:  -554  -653  -594  -659  -950 -1049  -854  -919 -1346     1     1
85:  -599  -707  -639  -709 -1031 -1139  -919  -989 -1463     1     1
85:  -644  -761  -684  -759 -1112 -1229  -984 -1059 -1580     1     1
85:     1     1     1     1     1     1     1     1     1     1     1

[..... much more of this .....]

85: [  FAILED  ] BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9_Dynamic (136791 ms)
85: [ RUN      ] BLAS.MatrixVectorMultiply
85: [       OK ] BLAS.MatrixVectorMultiply (0 ms)
85: [ RUN      ] BLAS.MatrixTransposeVectorMultiply
85: [       OK ] BLAS.MatrixTransposeVectorMultiply (0 ms)
85: [----------] 26 tests from BLAS (411930 ms total)
85:
85: [----------] Global test environment tear-down
85: [==========] 26 tests from 1 test suite ran. (411930 ms total)
85: [  PASSED  ] 23 tests.
85: [  FAILED  ] 3 tests, listed below:
85: [  FAILED  ] BLAS.MatrixTransposeMatrixMultiply_9_9_9_Dynamic
85: [  FAILED  ] BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9
85: [  FAILED  ] BLAS.MatrixTransposeMatrixMultiplyNaive_9_9_9_Dynamic
85:
85:  3 FAILED TESTS
1/1 Test #85: small_blas_test ..................***Failed  412.24 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 413.93 sec

The following tests FAILED:
         85 - small_blas_test (Failed)
Errors while running CTest
Output from these tests are in: D:/ceres/ceres-solver/out/build/x64-Release/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

I was going to upload the full LastTest.log, but it's 1 GB (due to all the console output), and I suspect it won't be needed.

/CC @sergiud
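The size of the mismatch can be cross-checked from the log alone. The sketch below copies the 9x9 part of Cref and C from the C = A' * B failure (the trailing row and column of 1s agree in both, so they are left out) and recomputes the Frobenius norm of their difference, which is what Eigen's .norm() returns for a matrix. It should come out at about 1251.7156, the value gtest prints for all three variants (+=, -= and =), since only the sign of the error changes between them. It also assumes, without checking the test source, that kTolerance is 5 times machine epsilon, which matches the logged 1.1102230246251565e-15. The array and function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// 9x9 blocks of Cref and C for the "C = A' * B" case (row_stride_c = 10,
// col_stride_c = 10), copied from the log. The trailing row and column of 1s
// are identical in both matrices and are omitted.
constexpr int kN = 9;
constexpr double kCref[kN][kN] = {
    {285, 330, 375, 420, 465, 510, 555, 600, 645},
    {330, 384, 438, 492, 546, 600, 654, 708, 762},
    {375, 438, 501, 564, 627, 690, 753, 816, 879},
    {420, 492, 564, 636, 708, 780, 852, 924, 996},
    {465, 546, 627, 708, 789, 870, 951, 1032, 1113},
    {510, 600, 690, 780, 870, 960, 1050, 1140, 1230},
    {555, 654, 753, 852, 951, 1050, 1149, 1248, 1347},
    {600, 708, 816, 924, 1032, 1140, 1248, 1356, 1464},
    {645, 762, 879, 996, 1113, 1230, 1347, 1464, 1581}};
constexpr double kC[kN][kN] = {
    {285, 330, 325, 360, 465, 510, 465, 500, 645},
    {330, 384, 370, 410, 546, 600, 530, 570, 762},
    {375, 438, 415, 460, 627, 690, 595, 640, 879},
    {420, 492, 460, 510, 708, 780, 660, 710, 996},
    {465, 546, 505, 560, 789, 870, 725, 780, 1113},
    {510, 600, 550, 610, 870, 960, 790, 850, 1230},
    {555, 654, 595, 660, 951, 1050, 855, 920, 1347},
    {600, 708, 640, 710, 1032, 1140, 920, 990, 1464},
    {645, 762, 685, 760, 1113, 1230, 985, 1060, 1581}};

// Frobenius norm of (Cref - C): square root of the sum of squared differences.
double DiffNorm() {
  double sum = 0.0;
  for (int r = 0; r < kN; ++r) {
    for (int c = 0; c < kN; ++c) {
      const double d = kCref[r][c] - kC[r][c];
      sum += d * d;
    }
  }
  return std::sqrt(sum);
}

// Assumption: the test defines kTolerance as 5 * machine epsilon; the log
// only shows its value.
double AssumedTolerance() {
  return 5.0 * std::numeric_limits<double>::epsilon();
}
```

All entries are integers far below 2^53, so the sum of squares (1566792) is exact and the difference is a genuinely wrong result, many orders of magnitude above any rounding noise the tolerance is meant to absorb.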

patrikhuber changed the title from "Three subtests of small_blas_test fails in C++20 mode in VS2022" to "Three subtests of small_blas_test fail in C++20 mode in VS2022" on Mar 27, 2022
@patrikhuber
Contributor Author

I played around with this a bit more. The same tests also fail in C++17 mode, but they pass in C++14 mode.

patrikhuber changed the title from "Three subtests of small_blas_test fail in C++20 mode in VS2022" to "Three subtests of small_blas_test fail in C++17/20 mode in VS2022" on Mar 27, 2022
@sandwichmaker
Contributor

This is weird and serious; I will take a look.

@patrikhuber
Contributor Author

Hmm, odd: I just ran the tests in x64-Debug mode (C++17), and they pass 😕. When I switch back to x64-Release (which is RelWithDebInfo), they fail again.

@patrikhuber
Contributor Author

Here are the full compiler command lines for both configurations (from compile_commands.json):

Release:

{
  "directory": "D:/ceres/ceres-solver/out/build/x64-Release",
  "command": "C:\\PROGRA~1\\MIB055~1\\2022\\COMMUN~1\\VC\\Tools\\MSVC\\1431~1.311\\bin\\Hostx64\\x64\\cl.exe  /nologo /TP -DGFLAGS_IS_A_DLL=1 -DGLOG_NO_ABBREVIATED_SEVERITIES -DGOOGLE_GLOG_DLL_DECL=__declspec(dllimport) -DGOOGLE_GLOG_DLL_DECL_FOR_UNITTESTS=__declspec(dllimport) -DNOMINMAX -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -ID:\\ceres\\ceres-solver\\internal\\ceres -I\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\include\" -ID:\\ceres\\ceres-solver\\internal -ID:\\ceres\\ceres-solver\\include -ID:\\ceres\\ceres-solver\\out\\build\\x64-Release\\include -external:I D:\\vcpkg\\installed\\x64-windows\\include -external:I D:\\vcpkg\\installed\\x64-windows\\include\\eigen3 -external:W0 /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /Zi /O2 /Ob1 /DNDEBUG /wd4018 /wd4267 /wd4099 /wd4996 /wd4800 /wd4244 /wd4251 /bigobj -std:c++17 /Fointernal\\ceres\\CMakeFiles\\small_blas_test.dir\\small_blas_test.cc.obj /FdTARGET_COMPILE_PDB /FS -c D:\\ceres\\ceres-solver\\internal\\ceres\\small_blas_test.cc",
  "file": "D:\\ceres\\ceres-solver\\internal\\ceres\\small_blas_test.cc"
},

Debug:

{
  "directory": "D:/ceres/ceres-solver/out/build/x64-Debug",
  "command": "C:\\PROGRA~1\\MIB055~1\\2022\\COMMUN~1\\VC\\Tools\\MSVC\\1431~1.311\\bin\\Hostx64\\x64\\cl.exe  /nologo /TP -DGFLAGS_IS_A_DLL=1 -DGLOG_NO_ABBREVIATED_SEVERITIES -DGOOGLE_GLOG_DLL_DECL=__declspec(dllimport) -DGOOGLE_GLOG_DLL_DECL_FOR_UNITTESTS=__declspec(dllimport) -DNOMINMAX -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -ID:\\ceres\\ceres-solver\\internal\\ceres -I\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\include\" -ID:\\ceres\\ceres-solver\\internal -ID:\\ceres\\ceres-solver\\include -ID:\\ceres\\ceres-solver\\out\\build\\x64-Debug\\include -external:I D:\\vcpkg\\installed\\x64-windows\\include -external:I D:\\vcpkg\\installed\\x64-windows\\include\\eigen3 -external:W0 /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MDd /Zi /Ob0 /Od /RTC1 /wd4018 /wd4267 /wd4099 /wd4996 /wd4800 /wd4244 /wd4251 /bigobj -std:c++17 /Fointernal\\ceres\\CMakeFiles\\small_blas_test.dir\\small_blas_test.cc.obj /FdTARGET_COMPILE_PDB /FS -c D:\\ceres\\ceres-solver\\internal\\ceres\\small_blas_test.cc",
  "file": "D:\\ceres\\ceres-solver\\internal\\ceres\\small_blas_test.cc"
},

I don't see anything out of the ordinary. However, I don't see any /fp: mode set. Sameer, I saw earlier today that you asked about this on the mailing list (I just got back home today). Perhaps this could have something to do with it?

@sandwichmaker
Contributor

I suspect this is something in Eigen, but I need to check whether I can replicate this on either x86 Linux or ARM macOS.

@sergiud
Contributor

sergiud commented Mar 27, 2022

It is unlikely to be an Eigen issue, because disabling CUSTOM_BLAS lets small_blas_test pass. I was not able to reproduce the problem on a Linux system.

@sandwichmaker
Contributor

sandwichmaker commented Mar 27, 2022 via email

@sergiud
Contributor

sergiud commented Mar 27, 2022

Sorry, my previous observation was incorrect: CMake dropped my CMAKE_CXX_STANDARD setting while switching between configurations. small_blas_test does fail with CUSTOM_BLAS disabled as well.

Pure Release mode is also affected.
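Since a configuration switch silently dropped CMAKE_CXX_STANDARD here, it can help to make the active language mode visible from inside a test binary. A minimal sketch with hypothetical helper names: on MSVC, __cplusplus stays at 199711L unless /Zc:__cplusplus is passed, whereas _MSVC_LANG reflects the selected /std: option, so the helper prefers _MSVC_LANG when it is defined.

```cpp
#include <cassert>
#include <string>

// Language-mode value the compiler is actually using for this translation
// unit. MSVC reports its /std: mode via _MSVC_LANG; other compilers use
// __cplusplus.
constexpr long ActiveCxxStandard() {
#if defined(_MSVC_LANG)
  return _MSVC_LANG;
#else
  return __cplusplus;
#endif
}

// Maps the macro value to an approximate human-readable label.
std::string StandardName(long value) {
  if (value > 202002L) return "newer than C++20";
  if (value >= 202002L) return "C++20";
  if (value >= 201703L) return "C++17";
  if (value >= 201402L) return "C++14";
  if (value >= 201103L) return "C++11";
  return "pre-C++11";
}
```

A static_assert on ActiveCxxStandard() >= 201703L in a test file would turn a silently ignored standard setting into a compile error instead of a misleading test run.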

@sandwichmaker
Contributor

sandwichmaker commented Mar 27, 2022 via email

@sergiud
Contributor

sergiud commented Mar 27, 2022

I see, thanks. After further investigation, the culprit seems to be CERES_GEMM_OPT_MTM_MAT1X4_MUL, the macro defined inside MTM_mat1x4. Redefining the macro as

diff --git a/internal/ceres/small_blas_generic.h b/internal/ceres/small_blas_generic.h
index 3f3ea424..e3e13fa6 100644
--- a/internal/ceres/small_blas_generic.h
+++ b/internal/ceres/small_blas_generic.h
@@ -167,10 +167,11 @@ static inline void MTM_mat1x4(const int col_a,
 #define CERES_GEMM_OPT_MTM_MAT1X4_MUL \
   av = pa[ai];                        \
   pb = b + bi;                        \
-  c0 += av * *pb++;                   \
-  c1 += av * *pb++;                   \
-  c2 += av * *pb++;                   \
-  c3 += av * *pb++;                   \
+  c0 += av * pb[0];                   \
+  c1 += av * pb[1];                   \
+  c2 += av * pb[2];                   \
+  c3 += av * pb[3];                   \
+  pb += 4; \
   ai += col_stride_a;                 \
   bi += col_stride_b;
 

allows the tests to pass. The optimizer seems to be doing some wild things (probably rearranging the expressions), which produces the partially incorrect results.

I tested in RelWithDebInfo mode only though. @patrikhuber could you apply the patch locally and see if this fixes the issues for you?
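To make the patched macro easier to follow, here is a simplified, non-authoritative sketch of how a 1x4 kernel of this kind can compute C += A' * B. Each call produces one row of C for four consecutive columns, leftover columns fall back to a scalar loop, and the result is checked against a naive triple loop. The names (SKETCH_MTM_MAT1X4_MUL, Mat1x4Sketch and the driver functions), the row-major layout, and the once-per-iteration use of the macro are assumptions for illustration. The real MTM_mat1x4 unrolls the macro and supports the +=, -= and = variants exercised by the test, which are omitted here.

```cpp
#include <cassert>

// Hypothetical macro modeled on the patched CERES_GEMM_OPT_MTM_MAT1X4_MUL:
// explicit indices instead of four consecutive *pb++ reads.
#define SKETCH_MTM_MAT1X4_MUL \
  av = pa[ai];                \
  pb = b + bi;                \
  c0 += av * pb[0];           \
  c1 += av * pb[1];           \
  c2 += av * pb[2];           \
  c3 += av * pb[3];           \
  pb += 4;                    \
  ai += col_stride_a;         \
  bi += col_stride_b;

// Adds row `row` of A' * B, restricted to four consecutive columns, into
// c[0..3]. A is k_count x m and B is k_count x n, both row-major, so
// A'(row, k) = a[k * col_stride_a + row]; b points at the first column.
static void Mat1x4Sketch(int k_count, const double* a, int col_stride_a,
                         const double* b, int col_stride_b, int row,
                         double* c) {
  const double* pa = a + row;
  const double* pb = b;
  double av = 0.0;
  double c0 = 0.0, c1 = 0.0, c2 = 0.0, c3 = 0.0;
  int ai = 0, bi = 0;
  for (int k = 0; k < k_count; ++k) {
    SKETCH_MTM_MAT1X4_MUL
  }
  c[0] += c0;
  c[1] += c1;
  c[2] += c2;
  c[3] += c3;
}

// C (m x n, row-major) += A' * B: blocks of four columns go through the 1x4
// kernel, leftover columns through a scalar loop.
void MatrixTransposeMatrixMultiplySketch(int k_count, int m, int n,
                                         const double* a, const double* b,
                                         double* c) {
  const int n4 = n - n % 4;
  for (int row = 0; row < m; ++row) {
    for (int col = 0; col < n4; col += 4) {
      Mat1x4Sketch(k_count, a, m, b + col, n, row, c + row * n + col);
    }
    for (int col = n4; col < n; ++col) {
      double sum = 0.0;
      for (int k = 0; k < k_count; ++k) sum += a[k * m + row] * b[k * n + col];
      c[row * n + col] += sum;
    }
  }
}

// Naive reference: plain triple loop, same accumulation order per entry.
void MatrixTransposeMatrixMultiplyReference(int k_count, int m, int n,
                                            const double* a, const double* b,
                                            double* c) {
  for (int row = 0; row < m; ++row) {
    for (int col = 0; col < n; ++col) {
      double sum = 0.0;
      for (int k = 0; k < k_count; ++k) sum += a[k * m + row] * b[k * n + col];
      c[row * n + col] += sum;
    }
  }
}

// Runs both on 9x9 integer-valued inputs with C pre-filled with 1s and
// reports whether every entry matches exactly.
bool SketchMatchesReference() {
  constexpr int kN = 9;
  double a[kN * kN], b[kN * kN], c_sketch[kN * kN], c_ref[kN * kN];
  for (int i = 0; i < kN * kN; ++i) {
    a[i] = i % 7;
    b[i] = i % 5 + 1;
    c_sketch[i] = 1.0;
    c_ref[i] = 1.0;
  }
  MatrixTransposeMatrixMultiplySketch(kN, kN, kN, a, b, c_sketch);
  MatrixTransposeMatrixMultiplyReference(kN, kN, kN, a, b, c_ref);
  for (int i = 0; i < kN * kN; ++i) {
    if (c_sketch[i] != c_ref[i]) return false;
  }
  return true;
}
```

With a correct compiler both paths agree exactly, because the inputs are integer-valued and each entry is accumulated in the same order.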

@sandwichmaker
Copy link
Contributor

Wow, nice debugging, @sergiud.

@sandwichmaker
Copy link
Contributor

The same pattern occurs in CERES_GEMM_OPT_MMM_MAT1X4_MUL; should we consider changing that too?

@sergiud
Copy link
Contributor

sergiud commented Mar 27, 2022

I think we should consider changing it everywhere in the header. If I distilled the problem correctly, Compiler Explorer confirms the reordering.

@sergiud
Copy link
Contributor

sergiud commented Mar 27, 2022

(The return sum should not matter.)

@sandwichmaker
Copy link
Contributor

Okay, please send me a CL. Thank you for looking into this.

@patrikhuber
Copy link
Contributor Author

> I tested in RelWithDebInfo mode only though. @patrikhuber could you apply the patch locally and see if this fixes the issues for you?

The patch fixes it for me! I've tried several combinations of C++17 and C++20, RelWithDebInfo and Release, and also /permissive-, and the tests pass.

Very impressive debugging indeed. May I ask how you've managed to narrow it down to that macro? In RelWithDebInfo mode, the debugger basically just jumped over all the lines for me, and in Debug mode, the problem didn't occur.

Out of curiosity: wouldn't this be an optimiser bug, or is the optimiser allowed to do that reordering?

keir pushed a commit that referenced this issue Mar 27, 2022
The tests fail only in C++17 mode (and above) with optimizations enabled.

Fixes #782

Change-Id: Ia3b7221efdd9091d252a7323613b7e54794470ee
@sandwichmaker
Copy link
Contributor

sandwichmaker commented Mar 27, 2022 via email

@patrikhuber
Copy link
Contributor Author

OK, thanks. I'll file a bug report with Visual Studio, then!

@sergiud
Copy link
Contributor

sergiud commented Mar 28, 2022

I agree that this is likely an optimizer bug. In general, the compiler is allowed to reorder operations as long as the observable behavior of the program stays the same. In this particular case, however, the reordering changes the result.
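The reordering point can be illustrated with the two forms of the macro body side by side. Each c_i += av * *pb++; line is its own full-expression, so the increment of pb is complete before the next line reads through it. The original form must therefore read pb[0] through pb[3] in order, exactly like the patched form, and a conforming compiler has to produce identical results for both functions below (hypothetical names). Different values at /O2 would point to a miscompilation rather than undefined behavior in the source. This small sketch is not a reproduction of the MSVC problem; the reduced Compiler Explorer case mentioned above is not included in the thread.

```cpp
#include <cassert>

// Original macro style: four separate statements, each a full-expression.
// The side effect of pb++ is sequenced before the next statement, so these
// lines read pb[0], pb[1], pb[2], pb[3] in that order.
void Accumulate1x4PostIncrement(double av, const double* pb, double* c) {
  double c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];
  c0 += av * *pb++;
  c1 += av * *pb++;
  c2 += av * *pb++;
  c3 += av * *pb++;
  c[0] = c0;
  c[1] = c1;
  c[2] = c2;
  c[3] = c3;
}

// Patched style: explicit indices (the patch then advances pb by 4 at once).
void Accumulate1x4Indexed(double av, const double* pb, double* c) {
  double c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];
  c0 += av * pb[0];
  c1 += av * pb[1];
  c2 += av * pb[2];
  c3 += av * pb[3];
  c[0] = c0;
  c[1] = c1;
  c[2] = c2;
  c[3] = c3;
}
```

The patched form is still preferable in practice: it expresses the same computation without relying on the optimizer to track four chained pointer increments.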

> May I ask how you've managed to narrow it down to that macro?

I suspected vectorization as a possible cause, since only certain parts of the matrix were affected. Once I saw that the unrolled loops were being used, things went fast. After all, in my experience this is not the first time MSVC has reordered code at will.
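The "only certain parts of the matrix" observation can be checked against the log. In the C = A' * B failure, every row of C differs from Cref only in columns 2, 3, 6 and 7 (0-based), that is, the third and fourth column of each four-column block, while column 8 is correct. Assuming the kernel covers columns 0 to 7 in blocks of four and handles column 8 separately (the thread does not spell this out), the pattern is consistent with the c2 and c3 accumulators of the 1x4 macro going wrong. The sketch below, with illustrative names, finds the mismatched columns in the first row.

```cpp
#include <cassert>
#include <vector>

// First row of Cref and C for the "C = A' * B" case, copied from the log
// (the trailing column of 1s is omitted).
constexpr int kCols = 9;
constexpr double kCrefRow0[kCols] = {285, 330, 375, 420, 465,
                                     510, 555, 600, 645};
constexpr double kCRow0[kCols] = {285, 330, 325, 360, 465,
                                  510, 465, 500, 645};

// Returns the indices of the columns where the two rows differ.
std::vector<int> MismatchedColumns(const double* expected,
                                   const double* actual, int n) {
  std::vector<int> cols;
  for (int j = 0; j < n; ++j) {
    if (expected[j] != actual[j]) cols.push_back(j);
  }
  return cols;
}

// Position of a column inside a four-wide block of columns (0..3).
int LaneInBlockOfFour(int col) { return col % 4; }
```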

@patrikhuber
Copy link
Contributor Author

I see! Thanks a lot for sharing!
