Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Tidy up and test with ubuntu 22.04 #1004

Merged
merged 6 commits into from
Apr 29, 2023
Merged

Conversation

nmnobre
Copy link
Member

@nmnobre nmnobre commented Apr 5, 2023

Hi @illuhad,

This was a bit harder than I imagined but I think it's looking good.
The last commit is there for you to see why specific configurations are being excluded from the job matrix.
Ideally I'd keep to a single ROCm version, but testing two different versions has its merits too I suppose.

Cheers,
-Nuno

@nmnobre nmnobre requested a review from illuhad April 5, 2023 20:11
Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nuno! It seems that it fails on more configurations than what you are excluding, is that correct? The Rocm __oclc_abi_version linker failure is a known issue that happens due to a bug either in ROCm or clang with certain ROCm and clang versions. One known fix is to compile using -Xclang -mlink-bitcode-file -Xclang /path/to/rocm/amdgcn/bitcode/oclc_abi_version_400.bc, i.e. manually the link the required bitcode file.

@nmnobre
Copy link
Member Author

nmnobre commented Apr 12, 2023

Thanks Nuno! It seems that it fails on more configurations than what you are excluding, is that correct?

Nope, the exclusions do cover all failing configurations. :-)

The Rocm __oclc_abi_version linker failure is a known issue that happens due to a bug either in ROCm or clang with certain ROCm and clang versions. One known fix is to compile using -Xclang -mlink-bitcode-file -Xclang /path/to/rocm/amdgcn/bitcode/oclc_abi_version_400.bc, i.e. manually the link the required bitcode file.

I see, I'll have a look, that's happening with clang 13 and 14, and ROCm 5.3.

@nmnobre nmnobre force-pushed the 22.04 branch 27 times, most recently from 9faccc7 to 12f9704 Compare April 12, 2023 23:35
@nmnobre
Copy link
Member Author

nmnobre commented Apr 12, 2023

One known fix is to compile using -Xclang -mlink-bitcode-file -Xclang /path/to/rocm/amdgcn/bitcode/oclc_abi_version_400.bc, i.e. manually the link the required bitcode file.

I was able to confirm that adding that line to the compiler flags does fix it.
I set OPENSYCL_ROCM_CXX_FLAGS by copying the default flags and appending said line... but the result is both long and ugly.
I was hoping I'd be able to append some extra flags to the default set of flags, I thought maybe OPENSYCL_SYCLCC_EXTRA_COMPILE_OPTIONS would be the way to go, but it doesn't seem to work. How do you suggest I add those flags?

@illuhad
Copy link
Collaborator

illuhad commented Apr 24, 2023

I was able to confirm that adding that line to the compiler flags does fix it.
I set OPENSYCL_ROCM_CXX_FLAGS by copying the default flags and appending said line... but the result is both long and ugly.
I was hoping I'd be able to append some extra flags to the default set of flags, I thought maybe OPENSYCL_SYCLCC_EXTRA_COMPILE_OPTIONS would be the way to go, but it doesn't seem to work. How do you suggest I add those flags?

My guess is that OPENSYCL_SYCLCC_EXTRA_COMPILE_OPTIONS only works for integrated multipass, but not explicit multipass.
[I think that option just appends additional flags to the syclcc invocation - but for explicit multipass, syclcc does an extra device compiler invocation with its own set of flags]

I'm not sure if there's a convenient way to circumvent the issue in CI. Would it make sense to revert to the latest ROCm version where this is not an issue?

@nmnobre
Copy link
Member Author

nmnobre commented Apr 26, 2023

I'm not sure if there's a convenient way to circumvent the issue in CI. Would it make sense to revert to the latest ROCm version where this is not an issue?

I've tried ROCm 5.2.5 and unfortunately apt reports unsatisfiable dependencies on 22.04.
Additionally, ROCm 5.4.3 still has the same linking problem with clang 13 and 14.

I understand ideally we'd drop 20.04 altogether, but seeing we can't easily do that, would you be happy with:

  • 🚂 20.04 w/ clang 11 to 14 and w/ ROCm 4.0.1 (this is a subset of the config we're already using)
  • 🚄 22.04 w/ clang 15 and 16 and w/ ROCm 5.3.3 (as a replacement for 20.04 w/ clang 15 and 16 and w/ ROCm 4.0.1)

This would keep the same number of jobs we're already using at the expense of also dropping the following, even though they are working:

  • 20.04 w/ clang 15 and 16 and w/ ROCm 4.0.1 and 5.3.3

@illuhad
Copy link
Collaborator

illuhad commented Apr 26, 2023

Yeah we should probably keep 20.04 for some more time.

I suppose there is not much that we can do otherwise apart from what you suggest :(

@nmnobre
Copy link
Member Author

nmnobre commented Apr 27, 2023

So, to recap and for the avoidance of doubt, we are excluding, since they are not working, the following 14 configs:

  • clang 11 to 14 w/ ROCm 5.4.3 since there are bitcode, header or linking incompatibilities (there are also no available packages for clang 11 and 12 on ubuntu 22.04, thus the exclusion for the CBS tests);
  • ubuntu 22.04 w/ ROCm 4.0.1 since when installing ROCm there are unsatisfiable dependencies.

In addition, I've added a new commit which excludes the following four, working configs:

  • clang 15 to 16 w/ ubuntu 20.04 since mostly redundant.

We are thus left with six, one for each clang version, of the total 24 possible job configs.

Let me know if you'd like to avoid the same kind of redundancy in the CBS tests as well, or if you'd rather keep all the working tests, including the working ones I excluded above, just so we exhaust the space as best as we can.

@illuhad
Copy link
Collaborator

illuhad commented Apr 27, 2023

That looks okay to me.

Let me know if you'd like to avoid the same kind of redundancy in the CBS tests as well, or if you'd rather keep all the working tests, including the working ones I excluded above, just so we exhaust the space as best as we can.

Maybe @fodinabor would like to comment on this?

@fodinabor
Copy link
Collaborator

IMO, a single Ubuntu version with the CBS tests should be fine since they are mostly concerned with testing the different LLVM_VERSION_MAJOR code paths... Could also do the 11, 12 on 20.04 and everything else on 22.04 to test both? 🤷🏼

@nmnobre
Copy link
Member Author

nmnobre commented Apr 28, 2023

IMO, a single Ubuntu version with the CBS tests should be fine since they are mostly concerned with testing the different LLVM_VERSION_MAJOR code paths... Could also do the 11, 12 on 20.04 and everything else on 22.04 to test both? 🤷🏼

I agree. Done.

Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@illuhad illuhad merged commit 57d752d into AdaptiveCpp:develop Apr 29, 2023
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants