[pooling] Fix FP16 precision issues (averaging mode) plus more #2118

atamazov · 2023-04-28T21:56:48Z

The main goal if resolving issue #2109. This is done by switching to mixed precision for FP16 average pooling. Other changes include adjustments of verification tolerance for FP32 inference and for FP16 training (to avoid false failures) and observing available amount of scratch memory in order to avoid building of kernels that would fail anyway. Plus lots of refactoring. Details:

[pooling kernels]
- [Forward] 🔴 Fix: Engage mixed-precision in FP16 average pooling
  - Implemented in e939b33
- [Backward] 🟡 Fix: Build only necessary kernels. This allows POOLING_OP_AVE to handle larger configs.
  - Implemented in 75ed210
[pooling solver] 🔴 Fix: Make PoolingForward2d not applicable when scratch limit is exceeded.
- Implemented in b451c21
[tests][pooling] 🟡 Added regression tests for issue [pooling] FP16 precision issue (averaging mode) #2109
[driver][pooling]
- [Backward][FP16] 🟡 Fix: Raised verification tolerance to avoid false failures (1e-6 -> 5e-3)
- [Forward][FP32] 🟡 Fix: Raised verification tolerance to avoid false failures (1e-6 -> 1e-5).
- [NFC][Debugging]
  - Improved verification and printing of its results.
  - Added different initialization modes on input for Forward FP16 pooling.
  - Added emulation of validation errors for Forward pooling.
Other by-products that may be useful in all solvers/kernels
- 🟢 Added MaxWaveScratchSize to TargetProperties
  - Implemented in 1527854
- [NFC][float_types.h] This can be used in any kernel:
  - Support for for building both mixed-precision and "pure" kernels from the single source.
  - Support for uniform initialization of BF16 variables.
  - Both features implemented in 643a8d6

ℹ️ Note for reviewers

All significant commits (which change the behavior of the library OR may be useful) are listed above. I recommend looking at them first. Other commits are just refactorings.

[Attribution] @junliume @johnny-keker

… available. Some refactoring of macros.

…s nominal for all ops except FP16 averaging. Some refactoring of macros.

…mixed-precision and "pure" kernels from the single source. Support uniform initialization of BF16 variables.

…n in FP16 average pooling

… collecting validation stats. Improve message printed when validation fails. Support emulation of validation errors.

…initialization modes on input for FP16 (for debugging). Improve message about verification result (add max error and num_flops).

…sary kernels. This allows POOLING_OP_AVE to handle larger configs.

… and printing of its results. Raised verification tolerance for FP16. Quality fixes. Refactoring.

…ification tolerance for FP32.

…or: Compute some kernel parameters in one place (to reuse in IsApplicable)

…e previous commit

…rties

… not applicable when scratch limit is exceeded.

… for issue ROCm#2109

junliume · 2023-05-01T05:05:03Z

@atamazov OCL tidy has failed at:

[2023-04-30T05:46:34.808Z] /var/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/include/miopen/pooling/solvers.hpp:49:10: warning: function 'miopen::solver::pooling::PoolingForward2d::IsApplicable' has a definition with different parameter names [readability-inconsistent-declaration-parameter-name]

[2023-04-30T05:46:34.808Z]     bool IsApplicable(const ExecutionContext& context,

[2023-04-30T05:46:34.808Z]          ^

[2023-04-30T05:46:34.808Z] /var/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/solver/pooling/forward2d.cpp:134:24: note: the definition seen here

[2023-04-30T05:46:34.808Z] bool PoolingForward2d::IsApplicable(const ExecutionContext& ec,

[2023-04-30T05:46:34.808Z]                        ^

[2023-04-30T05:46:34.808Z] /var/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/include/miopen/pooling/solvers.hpp:49:10: note: differing parameters are named here: ('context'), in definition: ('ec')

[2023-04-30T05:46:34.808Z]     bool IsApplicable(const ExecutionContext& context,

[2023-04-30T05:46:34.808Z]          ^                                    ~~~~~~~

[2023-04-30T05:46:34.808Z]                                               ec

OCL_tidy_log.log

atamazov · 2023-05-01T15:38:10Z

@junliume This should be fixed in dd522f0

junliume · 2023-05-01T17:07:31Z

src/solver/pooling/forward2d.cpp

+// Compute amount of private memory required for holding the arrays defined
+// in the "mloPoolingG" kernel:
+//
+// #define MLO_BOT_DATA_SZ0 \


@atamazov : there is another tidy issue

[2023-05-01T16:28:47.204Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/solver/pooling/forward2d.cpp:95:1: error: multi-line comment [-Werror=comment] [2023-05-01T16:28:47.204Z] 95 | // #define MLO_BOT_DATA_SZ0 \ [2023-05-01T16:28:47.204Z] | ^ [2023-05-01T16:28:47.204Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/solver/pooling/forward2d.cpp:97:1: error: multi-line comment [-Werror=comment] [2023-05-01T16:28:47.204Z] 97 | // #define MLO_BOT_DATA_SZ1 \ [2023-05-01T16:28:47.204Z] | ^

And another issue:

cc1plus: error: unrecognized command line option ‘-Wno-option-ignored’ [-Werror]

@junliume This unrecognized command line option ‘-Wno-option-ignored’ [-Werror] is from Jenkinsfile or from ./cmake/EnableCompilerWarnings.cmake, this PR has nothing to do with it.

Multi-line comment will be fixed soon.

…6-use-fp32-acc

atamazov · 2023-05-01T21:49:58Z

@junliume

This unrecognized command line option ‘-Wno-option-ignored’ [-Werror] is from Jenkinsfile or from ./cmake/EnableCompilerWarnings.cmake, this PR has nothing to do with it.

Merged from fresh develop. Hope this helps.

Multi-line comment will be fixed soon.

46cc255

Ready for the new CI testing round, thanks.

…ctness for FP16 (issue ROCm#2160). [tests] Added regression test for issue ROCm#2160. [tests] Added regression test for issue ROCm#2110 (1). [tests] Fixed issues in test_miopendriver_regression_float and test_miopendriver_regression_half introduced in ROCm#2118. [tests] Introduced WORKAROUND_ISSUE_2110_2.

* pooling-fp16-use-fp32-acc(01) [NFC][pooling kernels] Make ACCUM types available. Some refactoring of macros. * pooling-fp16-use-fp32-acc(02) [NFC][pooling kernels] Make ACCUM macros nominal for all ops except FP16 averaging. Some refactoring of macros. * pooling-fp16-use-fp32-acc(03) [NFC][kernels] Allow for building both mixed-precision and "pure" kernels from the single source. Support uniform initialization of BF16 variables. * pooling-fp16-use-fp32-acc(04) [pooling kernels] Engage mixed-precision in FP16 average pooling * pooling-fp16-use-fp32-acc(05) [driver][pooling][inference] Support or collecting validation stats. Improve message printed when validation fails. Support emulation of validation errors. * pooling-fp16-use-fp32-acc(06) [driver][pooling][inference] Different initialization modes on input for FP16 (for debugging). Improve message about verification result (add max error and num_flops). * pooling-fp16-use-fp32-acc(07) [NFC][pooling kernels] Build only necessary kernels. This allows POOLING_OP_AVE to handle larger configs. * pooling-fp16-use-fp32-acc(08) [driver][pooling] Improved verification and printing of its results. Raised verification tolerance for FP16. Quality fixes. Refactoring. * pooling-fp16-use-fp32-acc(09) [driver][pooling][inference] Raised verification tolerance for FP32. * pooling-fp16-use-fp32-acc(10) [NFC][pooling] PoolingForward2d: Refactor: Compute some kernel parameters in one place (to reuse in IsApplicable) * pooling-fp16-use-fp32-acc(11) [NFC][pooling] PoolingForward2d: Improve previous commit * pooling-fp16-use-fp32-acc(12) Added MaxWaveScratchSize to TargetProperties * pooling-fp16-use-fp32-acc(13) [pooling] PoolingForward2d: Make solver not applicable when scratch limit is exceeded. * pooling-fp16-use-fp32-acc(14) [tests][pooling] Added regression tests for issue #2109

* pooling-naive(01) [NFC] Added MaxLocalMemorySize to target properties * pooling-naive(02) [NFC][pooling][backward] Refactor to compute some kernel params only once when LDS size check is implemented. * pooling-naive(03) [pooling][backward2d] Implement LDS (local memory) size check * pooling-naive(04) [pooling][forward_2d] Small fixes: alignment of private arrays and logging. * pooling-naive(05) [NFC] Reformat comments. * pooling-naive(06) [NFC][driver] Added const to argument of GetTensor... functions. * pooling-naive(07) [pooling][forward] Add naive solver with temporary host-side-only implementation. * pooling-naive(08) [pooling] FIX: Remove implementation specific details from NetworkConfig (a key of InvokerCache). These details lead correctness issues with other implementations due to usage of wrong invokers. * pooling-naive(09) [pooling][forward] Naive solver: move computations from kernel to solver. * pooling-naive(10) [pooling][forward] Naive solver: reorg host-only code to better match device execution model. * pooling-naive(11) [tests][pooling] Enable index verification in Full tests (--all). * pooling-naive(12) [pooling][forward] Naive solver: Mark it dynamic. * pooling-naive(13) [pooling][forward] Naive host-only solver: Tidy fixes. Developed and applied some strategy to prevent overflows. Unified names of kernel args. Partially implemented host code for running the kernel on GPU. * pooling-naive(14) [pooling][forward] Naive solver: Added OCL kernel. * pooling-naive(15) [pooling][forward] Naive solver: Convert pooling op from argument to compile-time parameter. * pooling-naive(16) [pooling][forward] Naive solver: Fixed kernel build errors and warnings. Finished code that enqueues GPU kernel. Many limitations due to grid size constraints! * pooling-naive(17) [pooling][forward] Naive solver: Switch back to trivial grid, {N,C,D} instead of {N,C,H}. Explain some grid-related design choices. Some refactoring. * pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(19) [pooling][forward] FIX: Remove useless info (dx, dy) from NetworkConfig for Forward * pooling-naive(20) [pooling][forward] Compute grid and workgroup. CPU emulation: Emulate grid. * pooling-naive(21) [pooling][forward] Kernel: Engage optimized grid and workgroup sizes. * pooling-naive(22) [pooling][forward] Reorder arguments and unify their names. * pooling-naive(23) [pooling][forward] Naive solver: Remove emulation on CPU. * pooling-naive(24) [pooling][forward] Take the new solver into use. * pooling-naive(25) [pooling][forward[3D] PoolingForwardNd: Fixed correctness for FP16 (issue #2160). [tests] Added regression test for issue #2160. [tests] Added regression test for issue #2110 (1). [tests] Fixed issues in test_miopendriver_regression_float and test_miopendriver_regression_half introduced in #2118. [tests] Introduced WORKAROUND_ISSUE_2110_2. * pooling-naive(26) [pooling][forward_3d] PoolingForwardNd: implement size checks. This enables PoolingForwardNaive for 3D when limits of PoolingForwardNd are hit. * pooling-naive(27) Fix tidy errors * pooling-naive(28) Added 2D/3D regression tests for issue #2110 (1). * pooling-naive(29) Partially revert: pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(30) Fix tidy error

atamazov added 14 commits April 23, 2023 00:09

pooling-fp16-use-fp32-acc(01) [NFC][pooling kernels] Make ACCUM types…

e65d290

… available. Some refactoring of macros.

pooling-fp16-use-fp32-acc(02) [NFC][pooling kernels] Make ACCUM macro…

479b28b

…s nominal for all ops except FP16 averaging. Some refactoring of macros.

pooling-fp16-use-fp32-acc(03) [NFC][kernels] Allow for building both …

643a8d6

…mixed-precision and "pure" kernels from the single source. Support uniform initialization of BF16 variables.

pooling-fp16-use-fp32-acc(04) [pooling kernels] Engage mixed-precisio…

e939b33

…n in FP16 average pooling

pooling-fp16-use-fp32-acc(05) [driver][pooling][inference] Support or…

bd6bd45

… collecting validation stats. Improve message printed when validation fails. Support emulation of validation errors.

pooling-fp16-use-fp32-acc(06) [driver][pooling][inference] Different …

89726f5

…initialization modes on input for FP16 (for debugging). Improve message about verification result (add max error and num_flops).

pooling-fp16-use-fp32-acc(07) [NFC][pooling kernels] Build only neces…

75ed210

…sary kernels. This allows POOLING_OP_AVE to handle larger configs.

pooling-fp16-use-fp32-acc(08) [driver][pooling] Improved verification…

4c66dfc

… and printing of its results. Raised verification tolerance for FP16. Quality fixes. Refactoring.

pooling-fp16-use-fp32-acc(09) [driver][pooling][inference] Raised ver…

2a80bcd

…ification tolerance for FP32.

pooling-fp16-use-fp32-acc(10) [NFC][pooling] PoolingForward2d: Refact…

e7823c9

…or: Compute some kernel parameters in one place (to reuse in IsApplicable)

pooling-fp16-use-fp32-acc(11) [NFC][pooling] PoolingForward2d: Improv…

4464003

…e previous commit

pooling-fp16-use-fp32-acc(12) Added MaxWaveScratchSize to TargetPrope…

1527854

…rties

pooling-fp16-use-fp32-acc(13) [pooling] PoolingForward2d: Make solver…

b451c21

… not applicable when scratch limit is exceeded.

pooling-fp16-use-fp32-acc(14) [tests][pooling] Added regression tests…

0db3437

… for issue ROCm#2109

atamazov mentioned this pull request Apr 28, 2023

[ROCm] Global (average) Pooling unusable. microsoft/onnxruntime#15482

Closed

junliume added bug urgency_blocker correctness labels Apr 29, 2023

junliume added this to the ROCm 5.6 milestone Apr 30, 2023

pooling-fp16-use-fp32-acc(15) Fix tidy errors

dd522f0

junliume reviewed May 1, 2023

View reviewed changes

atamazov added 2 commits May 2, 2023 00:42

pooling-fp16-use-fp32-acc(16) Fix tidy error

46cc255

pooling-fp16-use-fp32-acc(17) Merge branch 'develop' into pooling-fp1…

2015780

…6-use-fp32-acc

junliume approved these changes May 3, 2023

View reviewed changes

junliume merged commit f510b9a into ROCm:develop May 3, 2023

junliume added the TESTING_CI_PASSED label May 3, 2023

atamazov mentioned this pull request May 3, 2023

[pooling] FP16 precision issue (averaging mode) #2109

Closed

atamazov mentioned this pull request May 24, 2023

[pooling][forward][2D/3D] Naive implementation (#2110) plus more #2163

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pooling] Fix FP16 precision issues (averaging mode) plus more #2118

[pooling] Fix FP16 precision issues (averaging mode) plus more #2118

atamazov commented Apr 28, 2023 •

edited

Loading

junliume commented May 1, 2023

atamazov commented May 1, 2023

junliume May 1, 2023

junliume May 1, 2023

atamazov May 1, 2023 •

edited

Loading

atamazov commented May 1, 2023 •

edited

Loading

[pooling] Fix FP16 precision issues (averaging mode) plus more #2118

[pooling] Fix FP16 precision issues (averaging mode) plus more #2118

Conversation

atamazov commented Apr 28, 2023 • edited Loading

ℹ️ Note for reviewers

junliume commented May 1, 2023

atamazov commented May 1, 2023

junliume May 1, 2023

Choose a reason for hiding this comment

junliume May 1, 2023

Choose a reason for hiding this comment

atamazov May 1, 2023 • edited Loading

Choose a reason for hiding this comment

atamazov commented May 1, 2023 • edited Loading

atamazov commented Apr 28, 2023 •

edited

Loading

atamazov May 1, 2023 •

edited

Loading

atamazov commented May 1, 2023 •

edited

Loading