-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
debian/rules: Fix build_type for massive performance gain #54891
Conversation
Signed-off-by: Mark Nelson <mark.nelson@clyso.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should respect the CXXFLAGS
set by user instead of overriding it with RelWithDebInfo
.
IMHO, #54918 is a better fix of this performance issue.
have you verified your fix? I've already verified this fix, and it's both more explicit and much clearer what we are doing in the debian rules file. It also matches what Canonical is doing. I don't think you should be blocking this fix. |
Does this warning need to be updated too? Are there two separate issues here? i.e. what does do_cmake.sh do by default, and which flags were enabled for the builds in download.ceph.com? |
@tchaikov i like your pr. extra cflags from debian should apply to our subproject builds too 👍 but that only applies to the rocksdb build. it sounds like ceph builds are now defaulting to Debug, so we're not adding optimization flags from we do need some strategy to guarantee that our releases get built with optimizations. do we know how rpm builds handle that? i don't see anything about CMAKE_BUILD_TYPE in ceph.spec.in, so it seems odd that debian/rules would need to override it cc @batrick who recently added support for debug builds in ceph/ceph-build#2167. it doesn't look like that script uses debian/rules, so @markhpc's override of CMAKE_BUILD_TYPE may not interfere with that |
i did check the compiling flags passed to compiler when compling rocksdb. please let me know if there's anything else i am supposed to do?
explicit ? no. i am not sure what
clearer? maybe. it depends on how you tell the impact and verify the change.
i did read the ubuntu bug and their fix. i am not convinced with the reasoning of "somebody is doing foo and bar, so we should follow their leads". i am more interested in the real reasonings.
to block this fix is not my intention. i want to improve this project and how it interacts with the rest of the world in general in a better way if i can. that's why i created a fix couple days ago. |
@athanatos hi Sam, it's not about "worthwhile". it's more about correctness. i wanted to fix it in the explicit and correct way. in debian and fedora's packagings, they intend to pass various cflags / cxxflags generated from their own toolings. for instance, debian provides also, for instance, various distro downstreams have their own optimization and hardening policies. but these policies are not necessarily encoded into the upstream softwares' building systems.
|
@cbodley hi Casey, could you share more details on "ceph builds are now defaulting to Debug" ? what is the context of this? like when, how and where? what's the packaging you are talking about?
this is exactly the point i try to explain above. the cflags/cxxflags is what the distros are using.
|
Dan, they are two different things. one is the debian packaging recipe, the other is |
sorry @tchaikov, i just meant to say that the ceph builds aren't setting a CMAKE_BUILD_TYPE so cmake itself wasn't adding optimization flags. that's why rocksdb wasn't adding them when cmake/modules/BuildRocksDB.cmake forwards the CMAKE_BUILD_TYPE i see that both the debian and rpm builds for shaman are exporting their own C/CXXFLAGS that include -O2, so your fix in #54918 covers both cases 👍 |
I see the other PR merged. Are we closing this? Edit: Alright, talked to Casey. He thinks the other PR is most likely to cover the cases we need to cover so I'm going to close this. I will note however that I'm pretty frustrated that I spent several hours building and meticulously testing this on a customer cluster to verify that the flags got properly set and that it fully fixed the performance issue. I figured that a one liner to the debian rules file wouldn't be a controversial change. I won't make the mistake of actually verifying fixes before they are approved again given how easy it is for them to be blocked. I hope the other PR actually fixes the issue since I don't think it was actually ever tested before being merged. And for posterity, I will note that the reason we have bugs like this in the first place is that we don't take the time to actually sit down, verify, and sanity check that the things we submit into the project behave the way we intend. I think it's incredibly disappointing that we value subjective correctness over actual verification and testing for even what should be a simple fix like this, but here we are. |
source: ceph/ceph#54891 build packages with 'RelWithDebInfo' to avoid to build rocksdb in debug This is already the default in ubuntu packages https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1894453
source: ceph/ceph#54891 build packages with 'RelWithDebInfo' to avoid to build rocksdb in debug This is already the default in ubuntu packages https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1894453
As Fabian correctly noticed, from the two PR's, namely PR #54918[0] and PR #54891[1], only the first one is necessary, that's why the second one was closed upstream, so drop it here too to avoid a unnecessary divergence from upstream. [0]: ceph/ceph#54918 [1]: ceph/ceph#54891 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Recently I observed very slow 4K random write performance on an NVMe cluster running our 17.2.7 builds for Ubuntu Focal. After significant diagnosis, I was able to observe that this was due to the fact that our Debian builds are compiling RocksDB with CMAKE_BUILD_TYPE=None. For instance, looking at the console logs for the 17.2.7 Focal build:
https://jenkins.ceph.com/job/ceph-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=focal,DIST=focal,MACHINE_SIZE=gigantic/569/consoleFull
One can see this directly in the console log:
DCMAKE_CXX_COMPILER=/usr/bin/c++ -DWITH_SNAPPY=TRUE -DWITH_LZ4=TRUE -Dlz4_INCLUDE_DIRS=/usr/include -Dlz4_LIBRARIES=/usr/lib/x86_64-linux-gnu/liblz4.so -DWITH_ZLIB=TRUE -DPORTABLE=ON -DCMAKE_AR=/usr/bin/ar -DCMAKE_BUILD_TYPE=None -DFAIL_ON_WARNINGS=OFF -DUSE_RTTI=1 -DCMAKE_C_FLAGS=-Wno-stringop-truncation "-DCMAKE_CXX_FLAGS='-Wno-deprecated-copy -Wno-pessimizing-move'" "-GUnix Makefiles" /build/ceph-17.2.7/src/rocksdb
This is the same issue that both Gentoo and Canonical ran into several years ago in their builds. i.e:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1894453
After retesting with CMAKE_BUILD_TYPE properly set to RelWithDebInfo, 4K random write performance on the cluster doubled and RocksDB compaction times were reduced by roughly 2-3X.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e