[BugFix] Fix the error of reloading the model library on the ROCm platform: "MIOpen Error: No invoker was registered for convolution forward." #16190

Merged
merged 1 commit into from Dec 5, 2023

Conversation

Liangxijun-1001
Contributor

My env: AMD RX 7600
I debugged the MIOpen source code to find the root cause. During the model compilation phase, the Python side calls the conv2d.setup interface, which invokes MIOpen's findConvForwardAlgorithm interface to find a suitable algorithm. The corresponding <layer-Problem, Algorithm> pair is then registered in the current process's invokers. As long as that process keeps running, the invoker can look up the Algorithm for each layer-Problem during the inference stage and run inference directly.

However, if that process exits and the pre-compiled model is then executed in a new process, findConvForwardAlgorithm is never invoked, so the corresponding <layer-Problem, Algorithm> pair is not registered in the invokers. As a result, the inference stage reports the error "MIOpen Error: No invoker was registered for convolution forward."
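
To illustrate the failure mode described above, here is a minimal toy model in C++. It is only a sketch: `ToyInvokerRegistry`, `Find`, `RunForward`, and `CanRun` are hypothetical names made up for this example and are not MIOpen's real data structures or API. The only assumption taken from the description is that the problem-to-algorithm table lives in process memory and is filled by the find step.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model only: NOT MIOpen's real implementation or API.
// It mimics a per-process table that maps a convolution problem to the
// algorithm chosen by a "find" step.
class ToyInvokerRegistry {
 public:
  // Stands in for the find step: picks an algorithm and registers it.
  void Find(const std::string& problem_key) {
    assert(!problem_key.empty());
    invokers_[problem_key] = "algo_for_" + problem_key;
  }

  // Stands in for the convolution call: throws if nothing was registered.
  std::string RunForward(const std::string& problem_key) const {
    auto it = invokers_.find(problem_key);
    if (it == invokers_.end()) {
      throw std::runtime_error("No invoker was registered for convolution forward.");
    }
    return it->second;
  }

 private:
  // Lives only in the memory of the current process.
  std::map<std::string, std::string> invokers_;
};

// Returns true if the forward call succeeds without throwing.
inline bool CanRun(const ToyInvokerRegistry& registry, const std::string& key) {
  try {
    registry.RunForward(key);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

In this model, a registry that ran `Find("conv1")` can serve `RunForward("conv1")`, while a freshly constructed registry (standing in for a new process that loads the pre-compiled model) cannot until `Find` is called again.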

According to the official MIOpen documentation on the differences between MIOpen and cuDNN, the typical sequence for calling the Convolution APIs in MIOpen is as follows:

miopenConvolution*GetWorkSpaceSize(): This function returns the workspace size required by the Find() operation.

miopenFindConvolution*Algorithm(): This function returns performance information about various algorithms.

miopenConvolution*(): Actual convolution operation.

The official documentation emphasizes that calling miopenFindConvolution*Algorithm() is mandatory before using any Convolution API.

Additionally, according to the documentation found at https://rocm.docs.amd.com/projects/MIOpen/en/latest/convolution.html#miopenfindconvolutionforwardalgorithm, the last parameter of the miopenFindConvolutionForwardAlgorithm interface, exhaustiveSearch, should be set to 1 (true):

If exhaustiveSearch == 0, MIOpen will seek the first kernel with a configuration match. If no configuration match is found, a default configuration will be returned.

If exhaustiveSearch == 1, MIOpen will search for the best kernel for the provided configuration. If a match is not found, an exhaustive search is performed by running individual algorithms.

For further details refer to this link: https://rocmdocs.amd.com/projects/MIOpen/en/latest/MIOpen_Porting_Guide.html

cc: @tqchen @masahi @Lunderberg

@Liangxijun-1001
Contributor Author

Liangxijun-1001 commented Dec 2, 2023

cc: @jinhongyii @junrushao

@tqchen
Member

tqchen commented Dec 2, 2023

Thanks @Liangxijun-1001. I think in this case, maybe we should ignore the fwd_algo in entry (decided at compile time) and then use the algorithm returned by miopenFindConvolutionForwardAlgorithm.

entry_ptr->handle, entry_ptr->conv_entry.input_desc, x->data,
entry_ptr->conv_entry.filter_desc, w->data, entry_ptr->conv_entry.conv_desc,
entry_ptr->conv_entry.output_desc, y->data, request_algo_count, &returned_algo_count,
perfs, workspace, workspace_size, exhaustive_search));
Member

Consider simply resetting entry_ptr->fwd_algo to the best returned algo.
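
A rough sketch of what picking the best returned algorithm could look like, written against a stand-in type so it compiles without MIOpen. `ToyAlgoPerf` and `SelectFastestAlgo` are hypothetical names for illustration; the real perf entries come from the find call and their exact fields should be checked against the MIOpen headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of the perf array filled in by the
// find call; the field names are illustrative, not MIOpen's.
struct ToyAlgoPerf {
  int algo;            // algorithm identifier
  float time_ms;       // measured execution time
  std::size_t memory;  // workspace bytes required
};

// Returns the algorithm with the lowest measured time among the first
// `returned_count` entries of `perfs`.
inline int SelectFastestAlgo(const std::vector<ToyAlgoPerf>& perfs, int returned_count) {
  assert(returned_count > 0 &&
         static_cast<std::size_t>(returned_count) <= perfs.size());
  int best = 0;
  for (int i = 1; i < returned_count; ++i) {
    if (perfs[i].time_ms < perfs[best].time_ms) best = i;
  }
  return perfs[best].algo;
}
```

The selected value would then overwrite the compile-time choice stored in the entry before the forward convolution is launched.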

@Liangxijun-1001
Copy link
Contributor Author

Liangxijun-1001 commented Dec 4, 2023

> Thanks @Liangxijun-1001. I think in this case, maybe we should ignore the fwd_algo in entry (decided at compile time) and then use the algorithm returned by miopenFindConvolutionForwardAlgorithm.

My MIOpen tag: v2.18
I am currently following the official invocation steps. In the inference stage, the 'find' function must be called when executing each convolution operation (see https://rocm.docs.amd.com/projects/MIOpen/en/latest/find_and_immediate.html and https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/docs/MIOpen_Porting_Guide.md). This 'find' function invokes the Find-DB operation to search the database. Below is the log after setting the relevant environment variables:
export MIOPEN_ENABLE_LOGGING=0
export MIOPEN_LOG_LEVEL=7
Related logs:

input_shape: (1, 3, 224, 224)
precessing image: ../../dataset/images/test/kitten.jpg
origina image shape: (720, 720, 3)
resize image shape: (1, 3, 224, 224)
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1102
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1102
MIOpen(HIP): Info [SetStream] stream: 0, device_id: 0
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 5.7.23365, MIOpen version 2.20.0.f185a6464-dirty
MIOpen(HIP): Info2 [GetWorkSpaceSize]
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [IsNetworkedFilesystem] Filesystem type at '/home/liangnus/.config/miopen/' is: 0xef53 'EXT2/3/4_SUPER_MAGIC'
MIOpen(HIP): Info2 [GetLibPath] Lib Path: /opt/rocm-5.7.1/lib/libMIOpen.so.1.0.50701
MIOpen(HIP): Info2 [GetInstalledPathFile] inexact find database search
MIOpen(HIP): Info2 [GetInstalledPathFile] Iterating over find db directory /opt/rocm-5.7.1/share/miopen/db/
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx900_64.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx803_36.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx90a68.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx90a6e.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx803_36.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx906_64.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx906_60.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx906_60.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx803_64.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx1030_36.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx900_64.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx900_56.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx906_64.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx90878.HIP.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx900_56.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx803_64.OpenCL.fdb
MIOpen(HIP): Info [GetInstalledPathFile] Checking find db file: gfx90878.OpenCL.fdb
MIOpen(HIP): Info [Measure] ReadonlyRamDb::Prefetch time: 5e-05 ms
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.206036 ms
MIOpen(HIP): Info2 [ValidateUnsafe] DB file is newer than cache: 3876535210630, 3116245083235
MIOpen(HIP): Info2 [FindRecord] RamDb file is newer than cache, prefetching
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.211256 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 3-224-224-7x7-64-112-112-1-3x3-2x2-1x1-0-NCHW-FP32-F in cache for file /home/liangnus/.config/miopen//gfx1102_16.HIP.2_20_0_f185a6464-dirty.ufdb.txt
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.286266 ms
MIOpen(HIP): Info [GetWorkSpaceSize] 7375872
MIOpen(HIP): Command [LogCmdFindConvolution] ./bin/MIOpenDriver conv -n 1 -c 3 -H 224 -W 224 -k 64 -y 7 -x 7 -p 3 -q 3 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 7375872
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info2 [ValidateUnsafe] DB file is newer than cache: 3876535210630, 3116245344375
MIOpen(HIP): Info2 [FindRecord] RamDb file is newer than cache, prefetching
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.214552 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 3-224-224-7x7-64-112-112-1-3x3-2x2-1x1-0-NCHW-FP32-F in cache for file /home/liangnus/.config/miopen//gfx1102_16.HIP.2_20_0_f185a6464-dirty.ufdb.txt
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.262162 ms
@masahi can you review this PR?

…m platform: MIOpen Error: No invoker was registered for convolution forward.

Signed-off-by: Liangxijun-1001 <lxjqq365@126.com>
@masahi masahi merged commit 37329bf into apache:main Dec 5, 2023
18 checks passed