-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pooling] Fix FP16 precision issues (averaging mode) plus more #2118
Conversation
… available. Some refactoring of macros.
…s nominal for all ops except FP16 averaging. Some refactoring of macros.
…mixed-precision and "pure" kernels from the single source. Support uniform initialization of BF16 variables.
…n in FP16 average pooling
… collecting validation stats. Improve message printed when validation fails. Support emulation of validation errors.
…initialization modes on input for FP16 (for debugging). Improve message about verification result (add max error and num_flops).
…sary kernels. This allows POOLING_OP_AVE to handle larger configs.
… and printing of its results. Raised verification tolerance for FP16. Quality fixes. Refactoring.
…ification tolerance for FP32.
…or: Compute some kernel parameters in one place (to reuse in IsApplicable)
…e previous commit
… not applicable when scratch limit is exceeded.
@atamazov OCL tidy has failed at:
|
src/solver/pooling/forward2d.cpp
Outdated
// Compute amount of private memory required for holding the arrays defined | ||
// in the "mloPoolingG" kernel: | ||
// | ||
// #define MLO_BOT_DATA_SZ0 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atamazov : there is another tidy issue
[2023-05-01T16:28:47.204Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/solver/pooling/forward2d.cpp:95:1: error: multi-line comment [-Werror=comment]
[2023-05-01T16:28:47.204Z] 95 | // #define MLO_BOT_DATA_SZ0 \
[2023-05-01T16:28:47.204Z] | ^
[2023-05-01T16:28:47.204Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-2118/src/solver/pooling/forward2d.cpp:97:1: error: multi-line comment [-Werror=comment]
[2023-05-01T16:28:47.204Z] 97 | // #define MLO_BOT_DATA_SZ1 \
[2023-05-01T16:28:47.204Z] | ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another issue:
cc1plus: error: unrecognized command line option ‘-Wno-option-ignored’ [-Werror]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junliume This unrecognized command line option ‘-Wno-option-ignored’ [-Werror]
is from Jenkinsfile or from ./cmake/EnableCompilerWarnings.cmake, this PR has nothing to do with it.
Multi-line comment will be fixed soon.
Merged from fresh develop. Hope this helps.
Ready for the new CI testing round, thanks. |
…ctness for FP16 (issue ROCm#2160). [tests] Added regression test for issue ROCm#2160. [tests] Added regression test for issue ROCm#2110 (1). [tests] Fixed issues in test_miopendriver_regression_float and test_miopendriver_regression_half introduced in ROCm#2118. [tests] Introduced WORKAROUND_ISSUE_2110_2.
* pooling-fp16-use-fp32-acc(01) [NFC][pooling kernels] Make ACCUM types available. Some refactoring of macros. * pooling-fp16-use-fp32-acc(02) [NFC][pooling kernels] Make ACCUM macros nominal for all ops except FP16 averaging. Some refactoring of macros. * pooling-fp16-use-fp32-acc(03) [NFC][kernels] Allow for building both mixed-precision and "pure" kernels from the single source. Support uniform initialization of BF16 variables. * pooling-fp16-use-fp32-acc(04) [pooling kernels] Engage mixed-precision in FP16 average pooling * pooling-fp16-use-fp32-acc(05) [driver][pooling][inference] Support or collecting validation stats. Improve message printed when validation fails. Support emulation of validation errors. * pooling-fp16-use-fp32-acc(06) [driver][pooling][inference] Different initialization modes on input for FP16 (for debugging). Improve message about verification result (add max error and num_flops). * pooling-fp16-use-fp32-acc(07) [NFC][pooling kernels] Build only necessary kernels. This allows POOLING_OP_AVE to handle larger configs. * pooling-fp16-use-fp32-acc(08) [driver][pooling] Improved verification and printing of its results. Raised verification tolerance for FP16. Quality fixes. Refactoring. * pooling-fp16-use-fp32-acc(09) [driver][pooling][inference] Raised verification tolerance for FP32. * pooling-fp16-use-fp32-acc(10) [NFC][pooling] PoolingForward2d: Refactor: Compute some kernel parameters in one place (to reuse in IsApplicable) * pooling-fp16-use-fp32-acc(11) [NFC][pooling] PoolingForward2d: Improve previous commit * pooling-fp16-use-fp32-acc(12) Added MaxWaveScratchSize to TargetProperties * pooling-fp16-use-fp32-acc(13) [pooling] PoolingForward2d: Make solver not applicable when scratch limit is exceeded. * pooling-fp16-use-fp32-acc(14) [tests][pooling] Added regression tests for issue #2109
* pooling-naive(01) [NFC] Added MaxLocalMemorySize to target properties * pooling-naive(02) [NFC][pooling][backward] Refactor to compute some kernel params only once when LDS size check is implemented. * pooling-naive(03) [pooling][backward2d] Implement LDS (local memory) size check * pooling-naive(04) [pooling][forward_2d] Small fixes: alignment of private arrays and logging. * pooling-naive(05) [NFC] Reformat comments. * pooling-naive(06) [NFC][driver] Added const to argument of GetTensor... functions. * pooling-naive(07) [pooling][forward] Add naive solver with temporary host-side-only implementation. * pooling-naive(08) [pooling] FIX: Remove implementation specific details from NetworkConfig (a key of InvokerCache). These details lead correctness issues with other implementations due to usage of wrong invokers. * pooling-naive(09) [pooling][forward] Naive solver: move computations from kernel to solver. * pooling-naive(10) [pooling][forward] Naive solver: reorg host-only code to better match device execution model. * pooling-naive(11) [tests][pooling] Enable index verification in Full tests (--all). * pooling-naive(12) [pooling][forward] Naive solver: Mark it dynamic. * pooling-naive(13) [pooling][forward] Naive host-only solver: Tidy fixes. Developed and applied some strategy to prevent overflows. Unified names of kernel args. Partially implemented host code for running the kernel on GPU. * pooling-naive(14) [pooling][forward] Naive solver: Added OCL kernel. * pooling-naive(15) [pooling][forward] Naive solver: Convert pooling op from argument to compile-time parameter. * pooling-naive(16) [pooling][forward] Naive solver: Fixed kernel build errors and warnings. Finished code that enqueues GPU kernel. Many limitations due to grid size constraints! * pooling-naive(17) [pooling][forward] Naive solver: Switch back to trivial grid, {N,C,D} instead of {N,C,H}. Explain some grid-related design choices. Some refactoring. * pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(19) [pooling][forward] FIX: Remove useless info (dx, dy) from NetworkConfig for Forward * pooling-naive(20) [pooling][forward] Compute grid and workgroup. CPU emulation: Emulate grid. * pooling-naive(21) [pooling][forward] Kernel: Engage optimized grid and workgroup sizes. * pooling-naive(22) [pooling][forward] Reorder arguments and unify their names. * pooling-naive(23) [pooling][forward] Naive solver: Remove emulation on CPU. * pooling-naive(24) [pooling][forward] Take the new solver into use. * pooling-naive(25) [pooling][forward[3D] PoolingForwardNd: Fixed correctness for FP16 (issue #2160). [tests] Added regression test for issue #2160. [tests] Added regression test for issue #2110 (1). [tests] Fixed issues in test_miopendriver_regression_float and test_miopendriver_regression_half introduced in #2118. [tests] Introduced WORKAROUND_ISSUE_2110_2. * pooling-naive(26) [pooling][forward_3d] PoolingForwardNd: implement size checks. This enables PoolingForwardNaive for 3D when limits of PoolingForwardNd are hit. * pooling-naive(27) Fix tidy errors * pooling-naive(28) Added 2D/3D regression tests for issue #2110 (1). * pooling-naive(29) Partially revert: pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(30) Fix tidy error
* pooling-naive(01) [NFC] Added MaxLocalMemorySize to target properties * pooling-naive(02) [NFC][pooling][backward] Refactor to compute some kernel params only once when LDS size check is implemented. * pooling-naive(03) [pooling][backward2d] Implement LDS (local memory) size check * pooling-naive(04) [pooling][forward_2d] Small fixes: alignment of private arrays and logging. * pooling-naive(05) [NFC] Reformat comments. * pooling-naive(06) [NFC][driver] Added const to argument of GetTensor... functions. * pooling-naive(07) [pooling][forward] Add naive solver with temporary host-side-only implementation. * pooling-naive(08) [pooling] FIX: Remove implementation specific details from NetworkConfig (a key of InvokerCache). These details lead correctness issues with other implementations due to usage of wrong invokers. * pooling-naive(09) [pooling][forward] Naive solver: move computations from kernel to solver. * pooling-naive(10) [pooling][forward] Naive solver: reorg host-only code to better match device execution model. * pooling-naive(11) [tests][pooling] Enable index verification in Full tests (--all). * pooling-naive(12) [pooling][forward] Naive solver: Mark it dynamic. * pooling-naive(13) [pooling][forward] Naive host-only solver: Tidy fixes. Developed and applied some strategy to prevent overflows. Unified names of kernel args. Partially implemented host code for running the kernel on GPU. * pooling-naive(14) [pooling][forward] Naive solver: Added OCL kernel. * pooling-naive(15) [pooling][forward] Naive solver: Convert pooling op from argument to compile-time parameter. * pooling-naive(16) [pooling][forward] Naive solver: Fixed kernel build errors and warnings. Finished code that enqueues GPU kernel. Many limitations due to grid size constraints! * pooling-naive(17) [pooling][forward] Naive solver: Switch back to trivial grid, {N,C,D} instead of {N,C,H}. Explain some grid-related design choices. Some refactoring. * pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(19) [pooling][forward] FIX: Remove useless info (dx, dy) from NetworkConfig for Forward * pooling-naive(20) [pooling][forward] Compute grid and workgroup. CPU emulation: Emulate grid. * pooling-naive(21) [pooling][forward] Kernel: Engage optimized grid and workgroup sizes. * pooling-naive(22) [pooling][forward] Reorder arguments and unify their names. * pooling-naive(23) [pooling][forward] Naive solver: Remove emulation on CPU. * pooling-naive(24) [pooling][forward] Take the new solver into use. * pooling-naive(25) [pooling][forward[3D] PoolingForwardNd: Fixed correctness for FP16 (issue #2160). [tests] Added regression test for issue #2160. [tests] Added regression test for issue #2110 (1). [tests] Fixed issues in test_miopendriver_regression_float and test_miopendriver_regression_half introduced in #2118. [tests] Introduced WORKAROUND_ISSUE_2110_2. * pooling-naive(26) [pooling][forward_3d] PoolingForwardNd: implement size checks. This enables PoolingForwardNaive for 3D when limits of PoolingForwardNd are hit. * pooling-naive(27) Fix tidy errors * pooling-naive(28) Added 2D/3D regression tests for issue #2110 (1). * pooling-naive(29) Partially revert: pooling-naive(18) [pooling][forward] Naive solver: Prevent UB when grid exceeds OCL limits * pooling-naive(30) Fix tidy error
The main goal if resolving issue #2109. This is done by switching to mixed precision for FP16 average pooling. Other changes include adjustments of verification tolerance for FP32 inference and for FP16 training (to avoid false failures) and observing available amount of scratch memory in order to avoid building of kernels that would fail anyway. Plus lots of refactoring. Details:
ℹ️ Note for reviewers
All significant commits (which change the behavior of the library OR may be useful) are listed above. I recommend looking at them first. Other commits are just refactorings.
[Attribution] @junliume @johnny-keker