Update versions for sqlite and protobuf #2347

Merged 4 commits into develop from update-python-packages on Oct 30, 2023

ahsan-ca
Contributor

Updated versions for sqlite and protobuf are needed to fix
security vulnerabilities.

@ahsan-ca ahsan-ca self-assigned this Oct 18, 2023
@ahsan-ca ahsan-ca linked an issue Oct 18, 2023 that may be closed by this pull request
requirements.txt Outdated
@@ -21,12 +21,12 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#####################################################################################
-google/protobuf@v3.11.0 -DCMAKE_POSITION_INDEPENDENT_CODE=On -X subdir -Dprotobuf_BUILD_TESTS=Off
+google/protobuf@v3.20.2 -DCMAKE_POSITION_INDEPENDENT_CODE=On -X subdir -Dprotobuf_BUILD_TESTS=Off
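
A minimal sketch of how a line in this requirements.txt could be split into its source, pinned version, and build flags, for example to script a check of the pinned Protobuf version. The `Requirement` type and `parse_requirement` helper are hypothetical names introduced here; the `owner/repo@tag flags...` layout is inferred only from the two lines in this diff, not from the cget or rbuild documentation.

```python
from typing import List, NamedTuple


class Requirement(NamedTuple):
    source: str       # e.g. "google/protobuf"
    version: str      # tag without the leading "v", e.g. "3.20.2"
    flags: List[str]  # remaining whitespace-separated tokens, kept verbatim


def parse_requirement(line: str) -> Requirement:
    """Split one requirement line (hypothetical helper, format assumed)."""
    spec, *flags = line.split()
    source, _, tag = spec.partition("@")
    version = tag[1:] if tag.startswith("v") else tag
    return Requirement(source, version, flags)


old = parse_requirement(
    "google/protobuf@v3.11.0 -DCMAKE_POSITION_INDEPENDENT_CODE=On "
    "-X subdir -Dprotobuf_BUILD_TESTS=Off"
)
new = parse_requirement(
    "google/protobuf@v3.20.2 -DCMAKE_POSITION_INDEPENDENT_CODE=On "
    "-X subdir -Dprotobuf_BUILD_TESTS=Off"
)
print(old.version, "->", new.version)  # 3.11.0 -> 3.20.2
```
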
Member

I wonder if the .pb or .onnx files need to be regenerated.
Did all tests pass?

Contributor Author

I think they might need to be regenerated. There seems to be an issue on the develop branch preventing it from building:

/code/AMDMIGraphX/AMDMIGraphX/src/targets/gpu/include/migraphx/gpu/ck.hpp:34:10:
fatal error: 'ck/host/device_batched_gemm_softmax_gemm.hpp' file not found
#include "ck/host/device_batched_gemm_softmax_gemm.hpp"

So I checked out an earlier version of develop, made the version changes for protobuf and sqlite, and was able to get the build and tests to pass. I put that up as a PR but realized there were conflicts with the requirements.txt on develop. I rebased to resolve the conflicts, but I was not sure whether dependency commits such as ROCmSoftwarePlatform/composable_kernel@70eefcf4f263aa5c25f3c9ff0db8f6f199ef0fb9 need to be updated. I was planning to ask about this in the afternoon meeting, along with some other questions. In the meantime, I think it's better to convert this to a draft.

Member

That error is from CK.
You need to rebuild your dependencies using rbuild develop or rbuild prepare.

Contributor Author

@ahsan-ca ahsan-ca Oct 18, 2023

I tried rbuild develop but I still saw the same error. Then I tried rbuild prepare with the command rbuild prepare --deps-dir ../AMDMIGraphX/, but I still see the same error. I am doing this in the build directory in a Docker container. Do you know what I might be missing? :)
Edit: Maybe the deps-dir is incorrect?

Contributor Author

@ahsan-ca ahsan-ca Oct 18, 2023

Never mind, I think I got it fixed. Doing rbuild prepare -d depend and using -DCMAKE_PREFIX_PATH=depend seems to have done the trick.
Edit: No, it did not fix it :). Lakhinder mentioned that spinning up a new Docker container is another way. I'll try that.
Second Edit: I was able to get this resolved.

Contributor Author

@umangyadav All tests except the test_verify ones I mentioned in the other message pass.

@ahsan-ca ahsan-ca marked this pull request as draft October 18, 2023 15:47
@codecov

codecov bot commented Oct 18, 2023

Codecov Report

Merging #2347 (1d8e0bb) into develop (cbee4b7) will not change coverage.
Report is 6 commits behind head on develop.
The diff coverage is n/a.

❗ Current head 1d8e0bb differs from pull request most recent head 8f1dd36. Consider uploading reports for the commit 8f1dd36 to get more accurate results

@@           Coverage Diff            @@
##           develop    #2347   +/-   ##
========================================
  Coverage    91.36%   91.36%           
========================================
  Files          440      440           
  Lines        16530    16530           
========================================
  Hits         15101    15101           
  Misses        1429     1429           

@ahsan-ca
Contributor Author

ahsan-ca commented Oct 18, 2023

Seeing this error now:

RuntimeError: /code/AMDMIGraphX/AMDMIGraphX/src/targets/gpu/hip.cpp:67: get_available_gpu_memory: Failed getting available memory: invalid argument

Not related to this PR.

rocm versions:
Inside docker: 5.7.0-63
On bare metal it is 5.7.0-59

Machine running this on:
Linux ixt-rack-186 5.15.0-86-generic #96~20.04.1-Ubuntu SMP Thu Sep 21 13:23:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@migraphx-bot
Collaborator

migraphx-bot commented Oct 18, 2023

| Test | Batch | Rate new (5929fe) | Rate old (0a4254) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 2,848.23 | 2,848.84 | -0.02% | |
| torchvision-resnet50_fp16 | 64 | 6,483.65 | 6,474.01 | 0.15% | |
| torchvision-densenet121 | 32 | 2,098.55 | 2,098.18 | 0.02% | |
| torchvision-densenet121_fp16 | 32 | 3,678.99 | 3,681.38 | -0.06% | |
| torchvision-inceptionv3 | 32 | 1,593.17 | 1,593.19 | -0.00% | |
| torchvision-inceptionv3_fp16 | 32 | 2,589.35 | 2,592.14 | -0.11% | |
| cadene-inceptionv4 | 16 | 706.73 | 706.91 | -0.03% | |
| cadene-resnext64x4 | 16 | 696.86 | 697.53 | -0.10% | |
| slim-mobilenet | 64 | 8,348.55 | 8,361.19 | -0.15% | |
| slim-nasnetalarge | 64 | 226.65 | 226.45 | 0.09% | |
| slim-resnet50v2 | 64 | 2,674.17 | 2,674.22 | -0.00% | |
| bert-mrpc-onnx | 8 | 824.15 | 824.56 | -0.05% | |
| bert-mrpc-tf | 1 | 388.89 | 388.58 | 0.08% | |
| pytorch-examples-wlang-gru | 1 | 302.36 | 299.93 | 0.81% | |
| pytorch-examples-wlang-lstm | 1 | 314.91 | 306.64 | 2.70% | |
| torchvision-resnet50_1 | 1 | 599.08 | 601.13 | -0.34% | |
| torchvision-inceptionv3_1 | 1 | 337.16 | 336.20 | 0.29% | |
| cadene-dpn92_1 | 1 | 396.51 | 395.46 | 0.27% | |
| cadene-resnext101_1 | 1 | 329.63 | 329.48 | 0.05% | |
| slim-vgg16_1 | 1 | 465.09 | 464.93 | 0.03% | |
| slim-mobilenet_1 | 1 | 2,016.76 | 2,054.41 | -1.83% | |
| slim-inceptionv4_1 | 1 | 216.94 | 216.37 | 0.27% | |
| onnx-taau-downsample | 1 | 306.46 | 305.32 | 0.37% | |
| dlrm-criteoterabyte | 1 | 21.65 | 21.68 | -0.11% | |
| dlrm-criteoterabyte_fp16 | 1 | 40.71 | 40.71 | -0.01% | |
| agentmodel | 1 | 5,708.02 | 5,769.94 | -1.07% | |
| unet_fp16 | 2 | 55.95 | 55.92 | 0.04% | |
| resnet50v1_fp16 | 1 | 945.49 | 947.58 | -0.22% | |
| bert_base_cased_fp16 | 64 | 970.04 | 970.06 | -0.00% | |
| bert_large_uncased_fp16 | 32 | 304.71 | 304.67 | 0.01% | |
| bert_large_fp16 | 1 | 166.91 | 167.45 | -0.32% | |
| distilgpt2_fp16 | 16 | 1,277.75 | 1,277.04 | 0.06% | |

This build is OK for merge ✅
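
The Diff column in the report above appears to be the relative change of the new rate against the old one. A minimal sketch that reproduces a few rows under that assumption; the formula is inferred from the reported numbers rather than taken from the bot's source, and `percent_diff` is a hypothetical name.

```python
def percent_diff(rate_new: float, rate_old: float) -> float:
    """Relative change of the new rate versus the old rate, in percent."""
    return (rate_new - rate_old) / rate_old * 100.0


# (Rate new, Rate old) pairs copied from the report above.
rows = {
    "torchvision-resnet50": (2848.23, 2848.84),
    "pytorch-examples-wlang-lstm": (314.91, 306.64),
    "slim-mobilenet_1": (2016.76, 2054.41),
}

for name, (rate_new, rate_old) in rows.items():
    print(f"{name}: {percent_diff(rate_new, rate_old):+.2f}%")
```

The largest swings in the report (2.70% and -1.83%) are on batch-size-1 models, where run-to-run noise tends to be more visible.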

@migraphx-bot
Collaborator


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
🔴 torchvision-inceptionv3_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
🔴 slim-inceptionv4_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@ahsan-ca
Contributor Author

ahsan-ca commented Oct 20, 2023

The get_available_gpu_memory error seems to be machine dependent: I tried this on another machine and do not see it there.
However, I see the following errors. They also appear on the develop branch, so it does not seem that this PR caused them.

[   RUN    ] gemm_2args_mm_5
Benchmarking gpu::mlir_op: 21 configs
Fastest solution: 64,64,32,4,4,2
FAILED: gpu
RMS Error: 0.504296
Max diff: 1.61328
Mismatch at 24: 0.121094 != -0.171875

module: "main"
2 = @param:2 -> float_type, {2, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@2 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
@3 = dot(@2,2) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0


ref:
module: "main"
2 = @param:2 -> float_type, {2, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@2 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
@3 = dot(@2,2) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0


gpu:
module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
output = @param:output -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0
2 = @param:2 -> float_type, {2, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@4 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
@5 = gpu::code_object[code_object=3752,symbol_name=mlir_dot,global=384,local=64,](@4,2,output) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0



void run_verify::verify(const std::string &, const migraphx::program &, const migraphx::compile_options &) const
/code/AMDMIGraphX/AMDMIGraphX/test/verify/run_verify.cpp:264:
    FAILED: passed [ 0 ]
[  FAILED  ] gemm_2args_mm_5: Test failure
[   RUN    ] gemm_2args_mm_6
Benchmarking gpu::mlir_op: 21 configs
Fastest solution: 64,64,32,8,4,2
FAILED: gpu
RMS Error: 0.292345
Max diff: 1.59375
Mismatch at 24: 0.582031 != 0

module: "main"
2 = @param:2 -> float_type, {1, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@2 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
@3 = multibroadcast[out_lens={2, 3, 3, 4},out_dyn_dims={}](2) -> float_type, {2, 3, 3, 4}, {0, 12, 4, 1}, target_id=0
@4 = dot(@2,@3) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0


ref:
module: "main"
2 = @param:2 -> float_type, {1, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@2 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
@3 = multibroadcast[out_lens={2, 3, 3, 4},out_dyn_dims={}](2) -> float_type, {2, 3, 3, 4}, {0, 12, 4, 1}, target_id=0
@4 = dot(@2,@3) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0


gpu:
module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
2 = @param:2 -> float_type, {1, 3, 3, 4}, {36, 12, 4, 1}, target_id=0
1 = @param:1 -> float_type, {2, 1, 2, 3}, {6, 6, 3, 1}, target_id=0
@3 = multibroadcast[out_lens={2, 3, 3, 4},out_dyn_dims={}](2) -> float_type, {2, 3, 3, 4}, {0, 12, 4, 1}, target_id=0
@4 = multibroadcast[out_lens={2, 3, 2, 3},out_dyn_dims={}](1) -> float_type, {2, 3, 2, 3}, {6, 0, 3, 1}, target_id=0
output = @param:output -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0
@6 = gpu::code_object[code_object=3880,symbol_name=mlir_dot,global=384,local=64,](@4,@3,output) -> float_type, {2, 3, 2, 4}, {24, 8, 4, 1}, target_id=0



void run_verify::verify(const std::string &, const migraphx::program &, const migraphx::compile_options &) const
/code/AMDMIGraphX/AMDMIGraphX/test/verify/run_verify.cpp:264:
    FAILED: passed [ 0 ]
[  FAILED  ] gemm_2args_mm_6: Test failure

test_verify: /usr/local/cget/build/tmp-82ea508a033d44d9b28ff38af027eb7d/rocMLIR-507bb94ce7873786486d296ec81d2eadaab49003/external/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:1394: static RankedTensorType mlir::tensor::CollapseShapeOp::inferCollapsedType(RankedTensorType, ArrayRef<AffineMap>): Assertion `isReassociationValid(reassociation) && "invalid reassociation"' failed.
CMake Error at gdb/test_test_verify_general/run.cmake:16 (message):
  Test failed
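
The `multibroadcast` lines in the dumps above show broadcast dimensions getting a stride of 0: lens {2, 1, 2, 3} with strides {6, 6, 3, 1} become lens {2, 3, 2, 3} with strides {6, 0, 3, 1}. A minimal sketch of that rule, assuming equal ranks and that only size-1 dimensions are broadcast; `broadcast_strides` is a hypothetical helper, not MIGraphX code.

```python
from typing import List, Sequence


def broadcast_strides(
    in_lens: Sequence[int], in_strides: Sequence[int], out_lens: Sequence[int]
) -> List[int]:
    """Strides of a broadcast view: stride 0 wherever a size-1 dim is expanded."""
    if not (len(in_lens) == len(in_strides) == len(out_lens)):
        raise ValueError("this sketch assumes equal ranks")
    out = []
    for n_in, stride, n_out in zip(in_lens, in_strides, out_lens):
        if n_in == n_out:
            out.append(stride)   # dimension unchanged, keep its stride
        elif n_in == 1:
            out.append(0)        # expanded dimension re-reads the same element
        else:
            raise ValueError(f"cannot broadcast {n_in} to {n_out}")
    return out


# Shapes and strides copied from the gemm_2args_mm_5 / gemm_2args_mm_6 dumps.
print(broadcast_strides([2, 1, 2, 3], [6, 6, 3, 1], [2, 3, 2, 3]))    # [6, 0, 3, 1]
print(broadcast_strides([1, 3, 3, 4], [36, 12, 4, 1], [2, 3, 3, 4]))  # [0, 12, 4, 1]
```
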

@ahsan-ca ahsan-ca marked this pull request as ready for review October 20, 2023 16:35
@umangyadav
Member

umangyadav commented Oct 20, 2023

Those failures don't seem to occur on CI with MLIR. Can you check whether your MLIR dependency is up to date?
It may actually have to do with the SQLite3 version.
CC: @manupak @krzysz00

@krzysz00
Contributor

(Ping noted, no idea what I can do to help here)

@manupak
Contributor

manupak commented Oct 23, 2023

We don't have a (meaningful) sqlite dependency (tagging @jerryyin to confirm, as Jerry worked on it last).
I checked on 7604ecf in develop and I don't see these issues either.

@manupak
Contributor

manupak commented Oct 23, 2023

I suspect it's a dev environment issue, as 1) I can't reproduce it and 2) CI can't reproduce it. So it might be worth checking the MLIR commit hash being used in the dev env.

@ahsan-ca
Contributor Author

Thanks for checking it out. I'll investigate it more at my end. I also suspect this to be an env issue at my end.

@ahsan-ca
Contributor Author

The gemm_2args_mm_5 and gemm_2args_mm_6 failures reported earlier seem to happen only on Navi32; the tests run successfully on an MI100 system. I have opened an issue for this with the details: #2365

FYI @manupak @umangyadav
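
For reference, the three numbers in the verify output (RMS error, max diff, index of the first mismatch) can be illustrated with a small sketch. It assumes a plain root-mean-square of element-wise differences and an absolute tolerance; the thread does not show how MIGraphX normalizes or thresholds these values, so `rms_error`, `max_diff`, `first_mismatch`, and the tolerance are hypothetical.

```python
import math
from typing import Optional, Sequence


def rms_error(ref: Sequence[float], out: Sequence[float]) -> float:
    """Root-mean-square of the element-wise differences."""
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref))


def max_diff(ref: Sequence[float], out: Sequence[float]) -> float:
    """Largest absolute element-wise difference."""
    return max(abs(r - o) for r, o in zip(ref, out))


def first_mismatch(
    ref: Sequence[float], out: Sequence[float], atol: float = 1e-3
) -> Optional[int]:
    """Index of the first element differing by more than atol (assumed tolerance)."""
    for i, (r, o) in enumerate(zip(ref, out)):
        if abs(r - o) > atol:
            return i
    return None


# Toy data, not taken from the failing tests.
ref = [1.0, 2.0, 3.0, 4.0]
gpu = [1.0, 2.0, 3.5, 4.0]
print(rms_error(ref, gpu))       # 0.25
print(max_diff(ref, gpu))        # 0.5
print(first_mismatch(ref, gpu))  # 2
```
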

@TedThemistokleous
Collaborator

@ahsan-ca once CI passes, set this to ready-to-merge unless anyone has other unresolved comments.

@ahsan-ca
Contributor Author

@TedThemistokleous There are a few failures, but I am unable to sign in to Jenkins to see them. How does one get access to Jenkins?

@causten
Collaborator

causten commented Oct 26, 2023

/opt/rocm/llvm/bin/clang++ -Werror -g -O2 -fno-omit-frame-pointer -fsanitize=undefined,address -fno-sanitize-recover=undefined,address CMakeFiles/test_tf.dir/tf/tf_test.cpp.o -o ../bin/test_tf -Wl,-rpath,/var/jenkins/workspace/AMDMIGraphX_PR-2347/build/lib ../lib/libmigraphx_tf.so.2008000.0 ../lib/libmigraphx.so.2008000.0 -lpthread
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: google::protobuf::internal::InternalMetadata::~InternalMetadata()

referenced by ../lib/libmigraphx_tf.so.2008000.0
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [test/CMakeFiles/test_tf.dir/build.make:99: bin/test_tf] Error 1
make[2]: Leaving directory '/var/jenkins/workspace/AMDMIGraphX_PR-2347/build'
make[1]: *** [CMakeFiles/Makefile2:20470: test/CMakeFiles/test_tf.dir/all] E

@ahsan-ca
Contributor Author

ahsan-ca commented Oct 26, 2023

I suspect this is caused by a known Protobuf issue, "ABI may depend on NDEBUG", which has been fixed in Protobuf v21.3. I have updated the Protobuf version to v21.3 to see whether it fixes the error and how that version interacts with the other packages. We need to wait for the Jenkins tests to see how this plays out.

@ahsan-ca
Contributor Author

Ah, this did not work out. Looking into it.

@ahsan-ca ahsan-ca force-pushed the update-python-packages branch 6 times, most recently from e544df2 to 59516c0 on October 26, 2023 23:02
@ahsan-ca
Contributor Author

The install dependencies step failed with Protobuf v21.3. I also tried Protobuf v24.3 but it still failed the install dependencies step.

@ahsan-ca
Contributor Author

ahsan-ca commented Oct 27, 2023

The original Protobuf version, 3.11.0, has the security vulnerability CVE-2021-22570. Protobuf 3.15.0 or later fixes it (Source).
I am now trying Protobuf 3.19.6 (Protobuf <= 3.19 most likely does not have the undefined reference issue) to see whether it passes our CI. If it does, we should be good, since it satisfies the requirement of fixing CVE-2021-22570.

@ahsan-ca
Contributor Author

All the CI tests pass so far, just waiting on Jenkins tests to pass before this is ready to be merged.

@causten
Collaborator

causten commented Oct 27, 2023

[ 22%] Linking CXX executable ../bin/test_tf
cd /var/jenkins/workspace/AMDMIGraphX_PR-2347/build/test && /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/test_tf.dir/link.txt --verbose=1
/opt/rocm/llvm/bin/clang++ -Werror -g -O2 -fsanitize=undefined -fno-sanitize=vptr,function -fno-sanitize-recover=undefined CMakeFiles/test_tf.dir/tf/tf_test.cpp.o -o ../bin/test_tf  -Wl,-rpath,/var/jenkins/workspace/AMDMIGraphX_PR-2347/build/lib ../lib/libmigraphx_tf.so.2008000.0 ../lib/libmigraphx.so.2008000.0 -lpthread 
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: google::protobuf::internal::InternalMetadata::~InternalMetadata()
>>> referenced by ../lib/libmigraphx_tf.so.2008000.0
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [test/CMakeFiles/test_tf.dir/build.make:99: bin/test_tf] Error 1
make[2]: Leaving directory '/var/jenkins/workspace/AMDMIGraphX_PR-2347/build'
make[1]: *** [CMakeFiles/Makefile2:25531: test/CMakeFiles/test_tf.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

@ahsan-ca
Contributor Author

Ah, the same error persists.

@ahsan-ca
Contributor Author

Seems like 3.18.0 worked. I am now trying 3.19.0, in a bid to get the most recent version of Protobuf that does not have the error we saw with later versions.

@causten causten merged commit d1305d0 into develop Oct 30, 2023
8 of 10 checks passed
@causten causten deleted the update-python-packages branch October 30, 2023 02:26
@ahsan-ca
Contributor Author

For posterity: 3.19.0 worked, so I updated Protobuf to it.
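
The selection logic used in this thread (newest Protobuf that both clears the CVE-2021-22570 floor of 3.15.0 and passes CI) can be summarized in a short sketch. The outcome labels are a reading of the comments above, not CI data, and the attribution of the second link error to 3.19.6 is inferred from the order of the comments; `parse_version` and `attempts` are hypothetical names.

```python
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "3.19.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


CVE_FIX_FLOOR = "3.15.0"  # per the thread: 3.15.0 or later fixes CVE-2021-22570

# Versions tried in this PR and the outcome as described in the comments
# (assumed reading of the thread).
attempts: Dict[str, str] = {
    "3.20.2": "link error",
    "21.3": "install dependencies failed",
    "24.3": "install dependencies failed",
    "3.19.6": "link error",
    "3.18.0": "ok",
    "3.19.0": "ok",
}

candidates = [
    version
    for version, outcome in attempts.items()
    if outcome == "ok" and parse_version(version) >= parse_version(CVE_FIX_FLOOR)
]
chosen = max(candidates, key=parse_version)
print(chosen)  # 3.19.0
```
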

Development

Successfully merging this pull request may close these issues.

Bump protobuf and sqlite3 to resolve CVE issues
8 participants