[Targeting Q4] test_triangular_solve_op fails on "Intel(R) Xeon(R) Silver 4314 CPU" #55707

Tom-Zheng · 2023-07-26T02:07:04Z

Describe the Bug

The CPU kernel of triangular_solve gives wrong results on the Intel(R) Xeon(R) Silver 4314 CPU, which causes test_triangular_solve_op to fail.

test_lu_op and test_qr_op are also affected because they rely on triangular_solve.

Paddle version: release/2.5

Error info:

test_triangular_solve_op failed
 ..FFFFF.F.FFF.F......
======================================================================
FAIL: test_check_grad_normal (test_triangular_solve_op.TestTriangularSolveOp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/paddle/paddle/build/test/legacy_test/test_triangular_solve_op.py", line 70, in test_check_grad_normal
    self.check_grad(['X', 'Y'], 'Out', check_cinn=True)
  File "/opt/paddle/paddle/build/test/legacy_test/eager_op_test.py", line 2416, in check_grad
    self.check_grad_with_place(
  File "/opt/paddle/paddle/build/test/legacy_test/eager_op_test.py", line 2617, in check_grad_with_place
    self._assert_is_close(
  File "/opt/paddle/paddle/build/test/legacy_test/eager_op_test.py", line 2376, in _assert_is_close
    self.assertLessEqual(max_diff, max_relative_error, err_msg())
AssertionError: 8.356400757533255 not less than or equal to 1e-07 : Operator triangular_solve error, Gradient Check On Place(cpu) variable X (shape: (12, 12), dtype: float64) max gradient diff 8.356401e+00 over limit 1.000000e-07, the first error element is 9, expected -3.602969e-01, but got 7.001953e-02.
......(left out)

CPU info:

# lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
    CPU family:          6
    Model:               106
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            6
    CPU max MHz:         3400.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4800.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear pconfig flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   768 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    20 MiB (16 instances)
  L3:                    24 MiB (1 instance)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-7,16-23
  NUMA node1 CPU(s):     8-15,24-31
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Additional Supplementary Information

No response


YanhuiDua · 2023-07-26T04:07:55Z

Hello, we have received your issue and are analyzing it.

zhwesky2010 · 2023-08-21T04:17:55Z

@Tom-Zheng Hello, are you running the CPU build or the GPU build of Paddle? This OP unit test passes in our internal runs: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/8988447/job/23602891

Tom-Zheng · 2023-08-21T04:56:06Z

> @Tom-Zheng Hello, are you running the CPU build or the GPU build of Paddle? This OP unit test passes in our internal runs: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/8988447/job/23602891

Please see the description: the problem only reproduces on an "Intel(R) Xeon(R) Silver 4314 CPU".

Tom-Zheng · 2023-08-21T04:56:44Z

We are running the GPU build of Paddle, but this UT fails on the CPU, so the CPU build should reproduce it as well.

zhwesky2010 · 2023-08-23T03:22:40Z

@Tom-Zheng The test passes on several different CPU models that we have internally.

Intel Core i9-9900 CPU:

Intel(R) Xeon(R) CPU:

On the CPU, triangular_solve is computed with the mklml library provided by Intel. Perhaps that library has a computation problem on this particular CPU?

So you could test the openblas build of Paddle to see whether it has the same problem:

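If you want to install the avx, openblas Paddle package, you can download the wheel to your machine with the following command and then install it locally with python -m pip install [name].whl ([name] is the wheel package name):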

python -m pip download paddlepaddle==2.5.1 -f https://www.paddlepaddle.org.cn/whl/windows/openblas/avx/stable.html --no-index --no-deps

Please also confirm whether the GPU version has the same problem. If both openblas and GPU pass, we can be fairly sure that the intel mklml library is the cause.
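A backend comparison like the one suggested above can be checked against a reference that does not depend on any BLAS library. The following is only a minimal sketch, not part of the Paddle test suite: the helper names (`solve_upper_triangular`, `max_residual`, `make_system`) are hypothetical, the 12x12 size is taken from the shape in the error message, and the Paddle comparison runs only if `paddle` is importable.

```python
import random


def solve_upper_triangular(a, b):
    """Solve a @ x = b by back substitution; a is an upper-triangular list of rows."""
    n = len(a)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def max_residual(a, x, b):
    """Largest |(a @ x - b)_i|, a correctness measure independent of any backend."""
    n = len(a)
    return max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))


def make_system(n=12, seed=0):
    """Build a well-conditioned upper-triangular system (hypothetical test data)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) if j > i else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        a[i][i] = n + rng.random()  # dominant diagonal keeps the solve stable
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return a, b


a, b = make_system()
x_ref = solve_upper_triangular(a, b)
ref_residual = max_residual(a, x_ref, b)
print("reference residual:", ref_residual)

try:
    import paddle
except ImportError:
    paddle = None  # the reference check above still runs without Paddle

if paddle is not None:
    paddle.set_device("cpu")  # the failure reported here is in the CPU kernel
    a_t = paddle.to_tensor(a, dtype="float64")
    b_t = paddle.to_tensor([[v] for v in b], dtype="float64")
    x_t = paddle.linalg.triangular_solve(a_t, b_t, upper=True)
    x_paddle = [row[0] for row in x_t.numpy().tolist()]
    print("max |paddle - reference|:", max(abs(p - r) for p, r in zip(x_paddle, x_ref)))
```

Running the same script against the mklml and openblas builds on the affected machine would show whether only one backend deviates from the reference.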

Tom-Zheng · 2023-08-31T03:03:36Z

Will come back to this issue in Q4.

Tom-Zheng · 2023-09-06T02:52:13Z

The problem is gone after updating CBLAS from v0.3.18 to v0.3.24.

Tom-Zheng added status/new-issue 新建 type/bug-report 报bug NVIDIA labels Jul 26, 2023

YanhuiDua assigned zhwesky2010 Jul 26, 2023

Tom-Zheng changed the title ~~test_triangular_solve_op fails on "Intel(R) Xeon(R) Silver 4314 CPU"~~ [Q4] test_triangular_solve_op fails on "Intel(R) Xeon(R) Silver 4314 CPU" Aug 31, 2023

Tom-Zheng changed the title ~~[Q4] test_triangular_solve_op fails on "Intel(R) Xeon(R) Silver 4314 CPU"~~ [Targeting Q4] test_triangular_solve_op fails on "Intel(R) Xeon(R) Silver 4314 CPU" Aug 31, 2023

Tom-Zheng closed this as completed Sep 7, 2023

paddle-bot bot added status/close 已关闭 and removed status/new-issue 新建 labels Sep 7, 2023

megemini mentioned this issue Dec 12, 2023

[Hackathon 5th No.38] Add FractionalMaxPool2d / FractionalMaxPool3d API to Paddle -kernel #59847

Merged
