Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph.spec.in: bump gcc-toolset to 13 and use it on rhel>=8 #55886

Merged
merged 5 commits into from Mar 21, 2024

Conversation

tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Mar 2, 2024

since gts 13 is out, and the newer the toolchain the merrier. and because clang prefers using a newer gts when its gcc is around. so let's bump up the gcc-toolset from 11 to 13.. both RHEL8 and RHEL9 have gcc-toolset 13, so let's enable it on
rhel>=8, more future-proof this way.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc-13 also seems to have sufficiently good coroutine support to be usable. This seems simpler than switching to clang-16/17.

@rzarzynski @cbodley Does this look ok to you?

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 2, 2024

changelog

  • do not install -annobin subpackage anymore, as it does not exist in gts-13.

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 2, 2024

@athanatos Hi Sam, FWIW, Scylladb didn't switch to GCC because of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056, but it was fixed one year ago. (wanted to add more proof to support your comment at #55825 (comment) )

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 2, 2024

-- Performing Test Cxx_Compiler_BZ107852_Free
CMake Error at /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/19.0.0-1808-g79ff8417/rpm/el8/BUILD/ceph-19.0.0-1808-g79ff8417/x86_64-redhat-linux-gnu/CMakeFiles/CMakeScratch/TryCompile-lZ77IE/CMakeLists.txt:26 (target_link_libraries):
  Target "cmTC_2649c" links to:

    fmt::fmt

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

see https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/77261//consoleFull.

strange enough: fmt::fmt was not available when Seastar was checking for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107852

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 2, 2024

see also ceph/seastar@d382f24

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 2, 2024

i am proposing a solution at ceph/seastar@d382f24#r139292281

@athanatos
Copy link
Contributor

Pushed to ci as wip-sjust-testing-2024-03-04

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 5, 2024

Pushed to ci as wip-sjust-testing-2024-03-04

@athanatos i was testing it in https://shaman.ceph.com/builds/ceph/wip-pr-55886-kefu/8352c8cc4786283da2db0240841e2e5500699e02/crimson/384525/ as well. but i hit a FTBFS addressed by #55905. probably you could cherrypick #55905 in your branch?

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 5, 2024

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 5, 2024

now blocked by #55910

@cbodley
Copy link
Contributor

cbodley commented Mar 6, 2024

@rzarzynski @cbodley Does this look ok to you?

👍👍 would this allow us to drop the 'crimson' build flavor in shaman?

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 6, 2024

this is a great question indeed. i think the answer is "YES! but ...".

so i think we cannot retire the crimson flavor yet.

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 6, 2024

just pushed the rebased change to https://shaman.ceph.com/builds/ceph/wip-pr-55886-kefu/22ef2c527ade01c8105677709484d3e2fe2b5bbf/

once it's green. i think it's good to go. or we need a better coverage by running the packages with teuthology ?

@cbodley
Copy link
Contributor

cbodley commented Mar 6, 2024

so i think we cannot retire the crimson flavor yet.

thanks @tchaikov. i've just been feeling the weight of our build matrix lately. the centos8 and ubuntu focal builds should get retired soonish. it would be great if we could find a way to prioritize the crimson part after squid

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 6, 2024

i see. yeah, we do need to find a better way to package crimson-osd. sooner or latter, user will need to migrate to crimson osd. and we will probably have a period of time allowing user to opt in crimson osd. so the co-existing of these two packages would be a must have by then.

@athanatos
Copy link
Contributor

athanatos commented Mar 6, 2024

I think the best way to unify the packages would be to have a single ceph-osd which execs classic-osd or crimson-osd based on a command line flag or config.

athanatos and others added 3 commits March 11, 2024 18:28
gcc-toolset-13 lto can trigger a linker bug resulting in a segfault in
SafeTimer (and perhaps elsewhere).  See
https://tracker.ceph.com/issues/63867 for details.  This patch disables
lto for now now so that we can switch to gcc-toolset-13.

Fixes: https://tracker.ceph.com/issues/63867
Signed-off-by: Samuel Just <sjust@redhat.com>
since gts 13 is out, and GCC-13 brings better support of C++20 coroutines,
and because clang prefers using a newer gts when its gcc is around. so
let's bump up the gcc-toolset from 11 to 13 when building crimson. because
gcc-toolset-13 LTO triggers a linker bug resulting in a segfault in SafeTimer,
see https://tracker.ceph.com/issues/63867 , we cannot switch the classic
build to gts 13 without proving that it does not incur performance
regressions.

since annobin plugin package was renamed to
gcc-toolset-13-gcc-plugin-annobin, let's update its name accordingly.

and use -runtime subpackage instead of -build, as
macros.gcc-toolset-13-enable is now located in -runtime subpackage
since devtoolset12

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
both RHEL8 and RHEL9 have gcc-toolset 13, and we need to use gts-13
for building crimson, so let's enable it when building crimson,
and we need to use gts-11 when building on RHEL7. hence this change.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@tchaikov
Copy link
Contributor Author

@tchaikov
Copy link
Contributor Author

@tchaikov
Copy link
Contributor Author

jenkins test make check

@tchaikov
Copy link
Contributor Author

@athanatos hi Sam, could you take another look? i think it's good to merge now. hopefully, all three tests will pass when you check them.

@athanatos
Copy link
Contributor

make check is failing, but for a reason present in other PRs that we're investigating. I'll schedule this one for a crimson run and another 10 run classic run (one of the three failed, though for a reason I've seen elsewhere).

@athanatos
Copy link
Contributor

athanatos commented Mar 12, 2024

I think if the above seem ok, it's probably good to merge. @rzarzynski @cbodley final look? The summary of where we are now is that moving to gcc-toolset-13 triggers a linker bug in SafeTimer, so we're now leaving classic builds at 11 by default for centos 8 and 9, using 13 for crimson, and disabling lto for gcc-toolset-13. This should take care of the issues users are seeing with classic on newer distros which default to gcc-toolset-13, allow usage of coroutines in crimson, and leave lto enabled for centos 8 and 9 classic builds for now until we have a gcc-toolset-13 which doesn't have that problem.

https://tracker.ceph.com/issues/63867 is the gcc-toolset-13 ceph linker crash.

@tchaikov
Copy link
Contributor Author

jenkins test make check

@athanatos
Copy link
Contributor

https://pulpito.ceph.com/sjust-2024-03-12_01:15:24-crimson-rados-wip-kefu-testing-2024-03-11-distro-default-smithi/ mostly failed. I'm traveling this week, so it may be a bit before I get a chance to look at the logs.

@Svelar
Copy link
Member

Svelar commented Mar 13, 2024

wondering if we can parameterize the ceph-osd systemd service unit, so that it can launch crimson or the classic osd based on user's preference. but we need to persist a mark so user cannot switch to another osd after the the store is initialized. as the disk layouts of crimson and bluestore are different.

Could it be picked up in crimson weekly?

@athanatos
Copy link
Contributor

athanatos commented Mar 13, 2024

Building wip-sjust-cb2b48d7-2024-03-12 to confirm that the above failures are due to this PR, haven't had a chance to look into the logs yet.

@rzarzynski rzarzynski added the DNM label Mar 13, 2024
@rzarzynski
Copy link
Contributor

Putting DNM due to the comment above.

@athanatos
Copy link
Contributor

I think the failures are unrelated to this PR -- main isn't particularly clean at the moment. I'd like to clean up main before merging this though.

@athanatos
Copy link
Contributor

athanatos commented Mar 19, 2024

https://tracker.ceph.com/issues/64996 may be triggered (caused?) by this switch to gcc-13 -- investigating. Nope, not sure why it's popping up now but I can reproduce it on gcc-11 as well.

@tchaikov
Copy link
Contributor Author

wondering if we can parameterize the ceph-osd systemd service unit, so that it can launch crimson or the classic osd based on user's preference. but we need to persist a mark so user cannot switch to another osd after the the store is initialized. as the disk layouts of crimson and bluestore are different.

Could it be picked up in crimson weekly?

i think it might be answered by the attendees =)

@tchaikov
Copy link
Contributor Author

tchaikov commented Mar 21, 2024

I think the failures are unrelated to this PR -- main isn't particularly clean at the moment. I'd like to clean up main before merging this though.

@athanatos hi Sam, have you started working on this? otherwise i'd like to take a look at main.cc this weekend. but i might be slow.

@athanatos
Copy link
Contributor

@tchaikov By main, I meant the crimson-rados suite on ceph/main, not main.cc. I have a set of fixes in a test branch (wip-sjust-crimson-testing-2024-03-20) along with this PR. It finally got through ci, so I just scheduled another suite run. http://pulpito.ceph.com/sjust-2024-03-21_02:35:36-crimson-rados-wip-sjust-crimson-testing-2024-03-20-distro-default-smithi/ Once we get a clean-ish suite run with this PR, we can merge.

@tchaikov
Copy link
Contributor Author

@athanatos ahh, thanks. i misread your comment.

@athanatos
Copy link
Contributor

https://pulpito.ceph.com/sjust-2024-03-21_02:35:36-crimson-rados-wip-sjust-crimson-testing-2024-03-20-distro-default-smithi/

Failures unrelated, caused by existing known bugs.

@athanatos athanatos removed the DNM label Mar 21, 2024
@athanatos athanatos merged commit dbfb661 into ceph:main Mar 21, 2024
11 checks passed
@tchaikov tchaikov deleted the wip-ceph.spec-gcc-13 branch March 22, 2024 00:31
@Svelar Svelar mentioned this pull request Apr 25, 2024
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants