Fix for distinct compile/evaluate #533
@alexandraBara @pfultz2 Do we understand the relationship between the multi-threading from "Find" and "Run-and-Measure"?
Tuning with MIOPEN_FIND_ENFORCE=4 MIOPEN_COMPILE_AND_RUN=0 MIOPEN_CUSTOM_CACHE_DIR=/root/test MIOPEN_COMPILE_PARALLEL_LEVEL=32
I thought the upper bound for MIOPEN_COMPILE_PARALLEL_LEVEL is 20, so I am not sure that setting it above 20 will have the effect you expect. Either way, this would not influence the outcome of the generic search.
This behavior is correct the first time around, when it only compiles: we let generic_search fail so that the rest of the code doesn't try to load the binaries into memory. However, you should not see it fail the second time around (when you execute), after you unset MIOPEN_COMPILE_AND_RUN=0. At that point the kernels are compiled and written to the cache, so execution should continue and generic search will try to benchmark these kernels.
@zjing14 With the latest update to this PR you should be able to compile and run in the same execution. The only env var you need is the one that sets the compile threading level: MIOPEN_COMPILE_PARALLEL_LEVEL=20.
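The two-pass behavior described above (a compile-only pass that reports failure, then a run pass that finds everything in the cache) can be illustrated with a minimal sketch. Everything here is hypothetical: `GenericSearchSketch`, `BinaryCacheSketch`, and the string "binaries" are stand-ins for illustration, not MIOpen's actual types or control flow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the on-disk binary cache: kernel name -> compiled blob.
using BinaryCacheSketch = std::map<std::string, std::string>;

// Returns true only when kernels were benchmarked; a compile-only pass
// deliberately reports failure so callers do not try to load binaries.
bool GenericSearchSketch(const std::vector<std::string>& kernels,
                         BinaryCacheSketch& cache,
                         bool compile_and_run)
{
    // Compile step: build whatever is missing and store it in the cache.
    for(const auto& name : kernels)
    {
        if(cache.find(name) == cache.end())
            cache[name] = "binary:" + name; // placeholder for a real compile
    }

    if(!compile_and_run)
        return false; // first pass: cache is populated, nothing is executed

    // Run step: every kernel is expected to come from the cache now.
    for(const auto& name : kernels)
        assert(cache.count(name) == 1);
    return !kernels.empty();
}
```

Calling it once with `compile_and_run = false` and then again with `true` on the same cache mirrors the "compile first, then unset the variable and execute" sequence from the comment.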
Codecov Report
@@ Coverage Diff @@
## develop #533 +/- ##
========================================
Coverage 52.22% 52.22%
========================================
Files 297 297
Lines 46020 46018 -2
========================================
+ Hits 24032 24033 +1
+ Misses 21988 21985 -3
All, please re-review this PR. The update is that end users no longer need to explicitly set a flag to compile and evaluate all at once.
What bug does this PR fix?
atamazov
left a comment
Before continuing with this PR, I highly recommend resolving the review comments in the baseline: #307 (review)
continue;
kernels.push_back(kernel);
}
}
We definitely need signs of life in this loop. Please add a new monitoring method to Heartbeat and use it here.
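A minimal sketch of what such a "signs of life" monitor could look like follows. The class name `CompileHeartbeatSketch` and its `Tick` method are hypothetical and are not the existing Heartbeat API; the sketch only shows the idea of emitting a rate-limited progress message from inside a long loop.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>

// Hypothetical progress monitor: prints at most one message per interval.
class CompileHeartbeatSketch
{
    using Clock = std::chrono::steady_clock;

    Clock::duration interval_;
    Clock::time_point last_;
    std::size_t beats_ = 0;

public:
    explicit CompileHeartbeatSketch(std::chrono::milliseconds interval)
        : interval_(interval), last_(Clock::now())
    {
    }

    // Call once per loop iteration; returns true when a message was printed.
    bool Tick(std::size_t done, std::size_t total)
    {
        const auto now = Clock::now();
        if(now - last_ < interval_)
            return false; // too soon since the last message
        last_ = now;
        ++beats_;
        std::cerr << "Precompile progress: " << done << '/' << total << '\n';
        return true;
    }

    std::size_t Beats() const { return beats_; }
};
```

In a loop like the one under review, a call such as `beat.Tick(kernels.size(), all_configs.size())` per iteration would be enough to show the process is alive without flooding the log.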
I added this: #if MIOPEN_ENABLE_SQLITE_KERN_CACHE around the first loop. I can add a comment.
kernels.push_back(kernel);
}
}
std::vector<Program> programs = PrecompileKernels(profile_h, kernels);
If BUILD_DEV=On, then this is a waste of time, because the binary cache is switched off.
Is this a problem? We are aware that for this to work we need bin_cache turned on.
Yes, because it affects all developers that work with auto-tuning. For example, recently I tried to test auto-tuning locally and found that it takes too long, so I just stopped trying.
(I thought it was a compiler issue, but the actual reason is #307)
We may want to add the runtime program handles to the KernelCache (at least for development builds). That would make the loop not redundant. Alternatively, we may disable the loop when the binary cache is disabled and compile_and_run != "0".
I would prefer the simplest variant plus a comment about the alternative solution.
I added this: #if MIOPEN_ENABLE_SQLITE_KERN_CACHE around the first loop. I can add a comment.
@alexandraBara Please do this:
- Find IsCacheDisabled() in binary_cache.cpp
- Rename it to IsBinaryCacheDisabled()
- Make it global (remove static and add its declaration to binary_cache.hpp)
- Use it here instead of MIOPEN_ENABLE_SQLITE_KERN_CACHE
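The steps above amount to replacing a compile-time guard with a runtime predicate that skips the precompile loop when nothing would be cached. The following is only a sketch under assumptions: the body of `IsBinaryCacheDisabled()` is a placeholder backed by a settable flag (the real function lives in binary_cache.cpp and is not reproduced here), and `PrecompileIfUseful` is a hypothetical name for the guarded loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Placeholder state so the sketch can be exercised; not how the library decides.
inline bool& BinaryCacheDisabledFlag()
{
    static bool flag = false;
    return flag;
}

// Stand-in for the renamed, globally visible predicate suggested in the review.
inline bool IsBinaryCacheDisabled() { return BinaryCacheDisabledFlag(); }

// Returns how many kernels were handed to the (placeholder) precompile step.
inline std::size_t PrecompileIfUseful(const std::vector<std::string>& kernels)
{
    if(IsBinaryCacheDisabled())
        return 0; // binaries would not be reused, so skip the whole loop

    std::size_t compiled = 0;
    for(const auto& kernel : kernels)
    {
        if(kernel.empty())
            continue; // mirrors the "continue" for unusable entries
        ++compiled;   // placeholder for collecting the kernel for compilation
    }
    return compiled;
}

} // namespace sketch
```

A runtime check has the advantage that development builds with the cache switched off skip the redundant work without needing a separate preprocessor configuration.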
BTW I have nothing against resolving #307 (review) right in this PR.
atamazov
left a comment
I am approving this as I feel this is an urgent PR. Please fix the following ASAP (possibly in some other PR):
- #533 (comment)
- #533 (comment)
- leftovers of #307 (review)
#endif

for(const auto& current_config : all_configs)
if(IsEnabled(MIOPEN_DEBUG_COMPILE_ONLY{}))
🐛 Auto-tuning is not functional after this change. I am curious how this PR was tested.
* kernel parallel compile fix
* added PrecompileKernels function call as default behavior in generic search
…ated to: SWDEV-304151) (#1361) * Also resolves #533 (comment)
Compiling all kernels at once with PrecompileKernels. This enables us to compile in parallel for Tuna.
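To illustrate the idea of compiling a whole batch with a bounded number of workers, here is a minimal sketch. `PrecompileSketch` is a hypothetical function, the "compile" is a string placeholder, and the worker-pool layout is an assumption for illustration; it is not the PrecompileKernels implementation.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Compiles every source using at most parallel_level worker threads.
std::vector<std::string> PrecompileSketch(const std::vector<std::string>& sources,
                                          std::size_t parallel_level)
{
    std::vector<std::string> programs(sources.size());
    if(sources.empty())
        return programs;

    // Use at least one worker and never more workers than sources.
    const std::size_t workers =
        std::max<std::size_t>(1, std::min(parallel_level, sources.size()));

    std::atomic<std::size_t> next{0};
    auto work = [&] {
        // Each worker claims the next unprocessed index until none remain.
        for(std::size_t i = next.fetch_add(1); i < sources.size(); i = next.fetch_add(1))
            programs[i] = "compiled:" + sources[i]; // placeholder for a real compile
    };

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for(std::size_t w = 0; w < workers; ++w)
        pool.emplace_back(work);
    for(auto& t : pool)
        t.join();
    return programs;
}
```

Because each index is claimed by exactly one worker, the output vector can be written without a lock, and the results keep the same order as the inputs.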
Ran local tests on MI100 to measure the speed of compiling and evaluating separately.
./build/bin/MIOpenDriver conv -F 2 -n 128 -g 1 -k 1024 -c 1024 -H 35 -W 35 -y 3 -x 3 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -V 0 -w 1 -t 1 -i 1
common env vars:
export MIOPEN_FIND_ENFORCE=4
export MIOPEN_LOG_LEVEL=6
export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_DIRECT=0
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1
Regular compile+evaluate at the same time for igemm:
-- 5 iterations: avg 282 sec
-- 10 iterations: avg 145 sec
Separate compile, evaluate using extra env vars for compile:
export MIOPEN_COMPILE_AND_RUN=0
export MIOPEN_CUSTOM_CACHE_DIR=/root/test
First run with MIOPEN_COMPILE_PARALLEL_LEVEL=20
-- compile avg: 12 sec
-- run avg: 8 sec
-- total avg: 20 sec
Second run with MIOPEN_COMPILE_PARALLEL_LEVEL=10
-- compile avg: 19 sec
-- run avg: 8 sec
-- total avg: 27 sec
**UPDATE:** Parallel compilation of kernels is now the default behavior in generic_search. We can now compile in parallel and execute in the same run. No env vars are needed other than the one that sets the compile level, for example MIOPEN_COMPILE_PARALLEL_LEVEL=20.
We can still compile and execute separately using:
export MIOPEN_COMPILE_AND_RUN=0
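For readers wondering what happens with a value such as 32, the sketch below shows one plausible way a parallel level could be read and clamped. It is an assumption-laden illustration: the upper bound of 20 comes only from the remark earlier in this thread and is not verified here, and `ParseParallelLevel` and `ParallelLevelFromEnv` are hypothetical helpers, not the library's parsing code.

```cpp
#include <cstdlib>

// Parses a textual level; falls back on missing or invalid input and
// clamps to assumed_max (20 is the bound mentioned in the thread, unverified).
int ParseParallelLevel(const char* raw, int fallback, int assumed_max)
{
    if(raw == nullptr || *raw == '\0')
        return fallback;

    char* end    = nullptr;
    const long v = std::strtol(raw, &end, 10);
    if(end == raw || *end != '\0' || v < 1)
        return fallback; // not a positive integer

    return v > assumed_max ? assumed_max : static_cast<int>(v);
}

// Convenience wrapper that reads the variable named in this PR.
int ParallelLevelFromEnv(int fallback = 1, int assumed_max = 20)
{
    return ParseParallelLevel(
        std::getenv("MIOPEN_COMPILE_PARALLEL_LEVEL"), fallback, assumed_max);
}
```

Under these assumptions a requested level of 32 would behave the same as 20, which matches the doubt raised about setting the variable above the bound.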