MDBF-710 - Update cmake flags for valgrind build#481
MDBF-710 - Update cmake flags for valgrind build#481RazvanLiviuVarzaru merged 4 commits intoMariaDB:devfrom
Conversation
grooverdan
left a comment
There was a problem hiding this comment.
compile-pentium64-valgrind-max shouldn't be used as a basis. As you can see, it has much obsolete junk it in and including it in the bb config just adds to our maintaince burden.
The valgrind test should test what is the default -DWITH_VALGRIND=1. The management of what that should exclude is a server developer responsibility to maintain.
As seen, some recommended change will incur extra coverage. These should be examined during a staging build to see if they are incurring additional faults.
Lastly - there should be a description in the commit message covering the rational for every change. This needs to be there for future reference because one day, potential yourself, will ask why was this changed. Developers who request changes should be able to say why.
| steps.Compile( | ||
| command=[ | ||
| "sh", | ||
| "bash", |
There was a problem hiding this comment.
unclear why this is changed.
There was a problem hiding this comment.
It was useful in other factories to make use of shell brace expansion so it seemed like a good idea to have a convention for using bash.
echo $0
sh
echo DPLUGIN_{ARCHIVE,TOKUDB}
DPLUGIN_{ARCHIVE,TOKUDB}
bash
DPLUGIN_ARCHIVE DPLUGIN_TOKUDB
master-docker-nonstandard/master.cfg
Outdated
| "cmake . -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DENABLE_ASSEMBLER=1 -DWITH_EXTRA_CHARSETS=complex -DENABLE_THREAD_SAFE_CLIENT=1 -DWITH_BIG_TABLES=1 -DWITH_PLUGIN_ARIA=1 -DWITH_ARIA_TMP_TABLES=1 -DWITH_JEMALLOC=NO -DCMAKE_BUILD_TYPE=Debug -DSECURITY_HARDENED=OFF -DWITH_VALGRIND=1 -DHAVE_LIBAIO_H=0 -DCMAKE_DISABLE_FIND_PACKAGE_URING=1 -DCMAKE_DISABLE_FIND_PACKAGE_LIBAIO=1 -DWITH_SSL=bundled -DWITH_MAX=AUTO -DWITH_EMBEDDED_SERVER=1 -DWITH_LIBEVENT=bundled -DPLUGIN_PLUGIN_FILE_KEY_MANAGEMENT=NO -DPLUGIN_TEST_SQL_DISCOVERY=DYNAMIC -DPLUGIN_TOKUDB=NO -DPLUGIN_ROCKSDB=NO -DPLUGIN_AUTH_GSSAPI=NO -DENABLE_LOCAL_INFILE=1 -DMYSQL_SERVER_SUFFIX=-valgrind-max -DWITH_DBUG_TRACE=OFF && make -j%(kw:jobs)s", | ||
| """cmake . \\ | ||
| -DCMAKE_C_COMPILER_LAUNCHER=ccache \\ | ||
| -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \\ |
There was a problem hiding this comment.
I like the reformatting for readability.
master-docker-nonstandard/master.cfg
Outdated
| -DCMAKE_C_COMPILER_LAUNCHER=ccache \\ | ||
| -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \\ | ||
| -DCMAKE_C_FLAGS="-Wall -Wextra -Wunused -Wwrite-strings -Wno-uninitialized -Wno-strict-aliasing -Wformat-security -Wvla -Wimplicit-fallthrough=2 -mtune=native -m64 -DEXTRA_DEBUG -DSAFE_MUTEX -DSAFEMALLOC -O0 -g3 -gdwarf-2 -DFORCE_INIT_OF_VARS -Wuninitialized -DHAVE_valgrind -USAFEMALLOC -DTRASH_FREE_MEMORY -UFORCE_INIT_OF_VARS -Wno-uninitialized -DMYSQL_SERVER_SUFFIX=-valgrind-max" \\ | ||
| -DCMAKE_CXX_FLAGS="-Wall -Wextra -Wunused -Wwrite-strings -Wno-uninitialized -Wno-strict-aliasing -Wformat-security -Wvla -Wno-unused-parameter -Wno-invalid-offsetof -Wnon-virtual-dtor -Wimplicit-fallthrough=2 -felide-constructors -fexceptions -mtune=native -m64 -DEXTRA_DEBUG -DSAFE_MUTEX -DSAFEMALLOC -O0 -g3 -gdwarf-2 -DFORCE_INIT_OF_VARS -Wuninitialized -DHAVE_valgrind -USAFEMALLOC -DTRASH_FREE_MEMORY -UFORCE_INIT_OF_VARS -Wno-uninitialized -DMYSQL_SERVER_SUFFIX=-valgrind-max" \\ |
There was a problem hiding this comment.
I'm objecting to this at a fundamental level.
Buildbots objective is (IMO), variation welcome:
- provide developers a test that impacts users that they many not run locally
- provide reviewers a validation that the above has occured without needing to perform the validation itself.
Generic {C,XX}FLAGS should be managed by developers. The warnings flags belong in cmake/maintainer.cmake such that every developer (and BB) gets the warnings, and developers get it at their local compile so they can correct warnings generated before it gets to buildbot. Most warnings are already in cmake/maintainers.cmake and there's no need to repeat them. If there's others, developers can add them there like MariaDB/server#3333. If there are really valgrind specific compile flags to add, developers can add them to configure.cmake under IF(WITH_VALGRIND).
On the defines:
- EXTRA_DEBUG - is populated as a define by the cmake EXTRA_DEBUG
- HAVE_valgrind - defined by -DWITH_VALGRIND=1
- -UUNSAFEMALLOC - by CMakeList.txt already gets disable when -DWITH_VALGRIND=1
- TRASH_FREE_MEMORY - same
- FORCE_INIT_OF_VARS - not currently default - but should be in configure.cmake and config.h.cmake so it applies to everyone doing a -DWITH_VALGRIND build
- MYSQL_SERVER_SUFFIX - already set as cmake option
There should be no BB config that ever sets CFLAGS. Even MSAN/ASAN builders that have compile flags should be set by CMake configuration options within the server.
master-docker-nonstandard/master.cfg
Outdated
| -DWITH_DBUG_TRACE=OFF \\ | ||
| -DPLUGIN_S3=STATIC \\ | ||
| -DPLUGIN_FILE_KEY_MANAGEMENT=DYNAMIC \\ | ||
| -DPLUGIN_HASHICORP_KEY_MANAGEMENT=DYNAMIC \\ |
There was a problem hiding this comment.
The only options that should be enable here are high level build options:
- CMAKE_BUILD_TYPE
- WITH_VALGRIND
- WITH_EMBEDDED
- PLUGIN_TOKUDB=NO
Things to keep because the server CMake things aren't up to date:
- SECURITY_HARDENING=OFF
- EXTRA_DEBUG=1 (missing CMake definition as an option to say what exactly it does)
Maybe:
- WITH_MAX=1 which enables all plugins - so there's no need to list them all.
Things to remove:
- DENABLE_ASSEMBLER- option just doesn't exist
- WITH_EXTRA_CHARSET - complex is a subset of the default "all" - we want to test valgrind on all charset - so remove
- DENABLE_THREAD_SAFE_CLIENT - no option exists
- DWITH_BIG_TABLES - is the default - remove
- DWITH_PLUGIN_ARIA - default - remove
- DWITH_ARIA_TMP_TABLES - not an option
- SECURITY_HARDENING - leave for now, but if its truely incompatible with VALGRIND - it should be disable in CMakeLists.txt (top level) in the server
- DHAVE_LIBAIO_H / DCMAKE_DISABLE_FIND_PACKAGE_URING / DCMAKE_DISABLE_FIND_PACKAGE_LIBAIO - omit - valgrind on areas of the code covered by these options are important to test. Both these packages should be on the valgrind builder.
- WITH_SSL - is default - assuming ssl-dev package exists on builder - omit
- WITH_MAX - leave - technically only tested if there - AUTO is misleelding - make it =ON for consistent
- WITH_LIBEVENT - ensure the libevent-dev package is part of the container and omit
- PLUGIN_PLUGIN_FILE_KEY_MANAGEMENT=NO - can't see a reason to omit from testing
- DPLUGIN_TEST_SQL_DISCOVERY - default - omit
- PLUGIN_TOKUDB=NO - keep for now
- DPLUGIN_ROCKSDB - default - omit
- DPLUGIN_AUTH_GSSAPI - default - omit
- DENABLE_LOCAL_INFILE - not an option - omit
- MYSQL_SERVER_SUFFIX - its junk that doesn't contribute to the build and mtr tests don't look at it
- WITH_DBUG_TRACE - leave for now, but the server CMakeList.txt should not enable under -DWITH_VALGRIND=1
- PLUGIN_S3=STATIC - unclear why this is set to static - being static of dynamic shouldn't affect the valgrind result - omit for testing
- DPLUGIN_FILE_KEY_MANAGEMENT - notice this different form earlier option - omit and test it
- DPLUGIN_HASHICORP_KEY_MANAGEMENT - is default omit
There was a problem hiding this comment.
Sorry for taking me so long to respond. I took some time to investigate and digest those written by you. Also thank you very much for such a detailed review which opened new territories for me to explore.
I want to confirm a few things with you, case by case.
[1] DWITH_BIG_TABLES
Variable doesn’t seem do to anything if provided to CMAKE. I don’t see it in any CMAKE invokation, and moreover, BIG_TABLES it’s defined under a block of macros which states that compilation will break without them.
https://github.com/MariaDB/server/blob/7c5fdc9b6a45f4e8bad1ae5c7c93b457be9891fa/config.h.cmake#L510
[2] DHAVE_LIBAIO_H / DCMAKE_DISABLE_FIND_PACKAGE_URING / DCMAKE_DISABLE_FIND_PACKAGE_LIBAIO
When running without setting these to OFF, I’d expect that LIBAIO and URING will be used, instead from the –trace-expand I get that they are optional by default.
• /buildbot/tpool/CMakeLists.txt(5): OPTION(WITH_URING Require that io_uring be used OFF )
• /buildbot/tpool/CMakeLists.txt(6): OPTION(WITH_LIBAIO Require that libaio is used, unless uring is there OFF ).
If I run with -DWITH_LIBAIO it won’t check the existence of the package because there is a typo in the code, and also if URING is present, the code block for checking and applying settings when LIBAIO is present, won’t run.
`_ELSEIF(WITH_LIBAIO)
SET(LIBAIO_REQIRED REQUIRED)
ELSE()
FIND_PACKAGE(LIBAIO QUIET ${LIBAIO_REQUIRED})_`
Anyway, since LIBURING is present on the image it will still apply the settings under IF(FOUND_URING).
[3] FORCE_INIT_OF_VARS
- Not sure if I understood correctly if this macro should be defined or not for a Valgrind build.
Looked in the server code and it seems that the default is undefined, the only exception being the connect storage engine.
[4] WITH_LIBEVENT
Yes, it is part of the container image.
libevent_pthreads-2.1.so.7 (libc6,x86-64) => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7
libevent_openssl-2.1.so.7 (libc6,x86-64) => /lib/x86_64-linux-gnu/libevent_openssl-2.1.so.7
libevent_extra-2.1.so.7 (libc6,x86-64) => /lib/x86_64-linux-gnu/libevent_extra-2.1.so.7
libevent_core-2.1.so.7 (libc6,x86-64) => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7
libevent-2.1.so.7 (libc6,x86-64) => /lib/x86_64-linux-gnu/libevent-2.1.so.7
[5] – MYSQL_SERVER_SUFFIX
I discovered some MTR tests that depends on the version if is set to Valgrind.
But I suppose that even if not setting the SERVER_SUFFIX, “Valigrind” will still be appended to the version (based on the line below) and tests will work just fine.
https://github.com/MariaDB/server/blob/7c5fdc9b6a45f4e8bad1ae5c7c93b457be9891fa/sql/mysqld.cc#L8953
[6]: PLUGIN_TOKUDB = NO
What is the rationale of this being disabled for Valgrind builds?
Also, it will override WITH_MAX=ON ?
[7] WITH_SSL
If I understand correctly the default is “system” so it will invoke Find_Package in CMAKE to satisfy this requirement.
On the Valgrind image found:
libssl-dev/jammy-updates,jammy-security 3.0.2-0ubuntu1.16 amd64 [upgradable from: 3.0.2-0ubuntu1.7]
There was a problem hiding this comment.
[1] correct
[2] well spotted - MariaDB/server#3375 , 10.5 still depends on libaio, 10.6+ will be uring.
[3] correct
[4] don't forget devel package needed for include headers - libevent shared libraries are insufficient
[5] $VALGRIND_TEST is set by the mtr option --valgrind of mtr
[6] turn out this is disabled by default. - effectively unmaintained - https://mariadb.com/kb/en/tokudb/
[7] yes
Valgrind image - consider a bump to nobel 3.22, or fedora40 (3.23). The 3.23 might be required for __builtin_strcmp (or -fno-builtin-strcmp, or the current msse hack, which is counter intuitive to testing as close to production.)
There was a problem hiding this comment.
For [4] - totally missed that line when looking through the package list. Yes, it is on the current image:
- libevent-dev/jammy,now 2.1.12-stable-1build3 amd64 [installed]
There was a problem hiding this comment.
for [6] - I guess I can for sure omit the option PLUGIN_TOKUDB = NO
as for every storage engine's CMakeLists.txt file, a call to MYSQL_ADD_PLUGIN is made. As TOKUDB is missing, we should be fine.
Even for 10.5 where tokudb is present I see it's disabled by MDEV-19780
- WITH_VALGRIND should handle most of the logic in CMAKE for configuring a Valgrind build - to increase the coverage WITH_MAX=1 enables all the plugins - SECURITY_HARDENED it's a candidate for removal if it's proven to be incompatbile with Valgrind - WITH_DBUG_TRACE - candidate for removal if the server CMAKE will handle this under Valgrind.
grooverdan
left a comment
There was a problem hiding this comment.
yep, tests so far are good. Admittedly with a higher valgrind version.
| -DPLUGIN_TOKUDB=NO \\ | ||
| -DEXTRA_DEBUG=1 \\ | ||
| -DSECURITY_HARDENED=OFF \\ | ||
| && make -j%(kw:jobs)s""", |
There was a problem hiding this comment.
Tested on Fedora 40, valgrind-3.23.0, with g{cc,++} = 14.1.1
Need one Archive storage engine patch to compile with gcc, and one innodb patch to stop a valgrind hit on a two assertions. Will submit Monday.
Tests that timed out for me and would need not-valgrind exclusions in the server codebase:
- perfschema.statement_program_non_nested
- perfschema.statement_program_nesting_event_check
WITH_MAX - not needed - the auto-detection is fine.
WITH_DBUG_TRACE_OFF - good.
EXTRA_DEBUG=1, probably ok.
Didn't need to run SECURITY_HARDENED=OFF.
As we don't run mtr --embedded, compiling the EMBEDDED_SERVER is a waste. Lets omit.
mtr run:
- remove
^perfschema\.short_option_1$- it was a quick and succesful test. - encryption test are slow which is probably why its here, most seem to pass, but 4 mins each (out of 116 tests).
- --mysqld="--loose-innodb-purge-threads=1" - not required. Passes without this and since 4 is default, detecting confict/race conditions is good.
There was a problem hiding this comment.
Just re-pushed with the latest info, thank you for the suggestions.
I was testing locally on the current Valgrid image,
and it seems like setting debug off it will skip a lot of tests with the message "Requires Debug build".
Is this ok, it won't decrease Valgrind's coverage or it's a better to run tests as in a server production like build configuration?
As for bumping to a new Valgrind version by using Noble/Fedora40, can we push these changes for the current image and then come up with a builder upgrade? The CI for the current Valgrind image is missing from the buildbot configuration and I should look closely how to reproduce the original Dockerfile in the new fashion of building the complete docker file from multiple files.
There was a problem hiding this comment.
Yes, there are more debug mode tests.
These are tests that start/stop threads at different points to get repeatable results. For the most part the same execution path occurs without a debug build and most debug tests. For the testing of a memory leak the execution. The absence of control points adds a bit more entropy so might catch new errors. There may be a couple of code paths less tested. By for the most, its testing something closer to server production builds.
An older valgrind might depend bit more on the original sse cflags for the strcmp. Older gcc versions are probably keeping of those feature being exposed. Sure bump for now. The valgrind should be a basic package add to the base image and reusing that. Then just the buildbot changes to to use that image.
|
Last thought, remove -DCMAKE_BUILD_TYPE=Debug. The valgrind is looking for memory leaks. Debug is a bunch of instrumenation around syncronising between points. Its hiding race conditions that could cause leaks in production systems. |
- Compiling with embedded server it's not useful if MTR is not configured to run embedded tests - WITH-MAX = 1 removal - use auto-detection instead - PLUGIN_TOKUDB removal - TOKUDB is disabled in 10.5 and remove from newer server versions - perfschema short option - tests run quickly enough
Hi Daniel, I do not agree with dropping Debug mode. We have more test coverage on Debug builds. On the race conditions side, perhaps, but it also changes what code gets run / not run (elided due to optimizations). Furthermore, the Valgrind "virtual one CPU" probably makes race conditions more unlikely in the first place. So I would look for having a Debug and a Release build for valgrind (but not as part of this task), if you want to see if we catch more race conditions. After some time we can evaluate this and drop Debug if it really doesn't provide more value. As for the general approach to reviewing this patch: Also, with all the back and forth, it seems we have lost track of the reason for the initial request for this change, which was:
Was that discussion considered and will this patch actually have a proper answer to Monty's complaint? |
I'd say its more important to test what optimizations are generated.
But, sure, can keep Debug build as if this is a the blocker in getting the merged.
The task, adopt Monty's cmake/cflags, was intimately flawed as shown that most had no effect, or were the default. If there's no rational of changes they shouldn't be there originally. Reducing down to nothing shows that its runnable by everyone.
Added one single test.
Its pretty much done now.
opensuse was updated #462. valgrind mentioned tangentially. And just showed how bloated the config became. And how much time playing around with things that should be in the server codebase change. Simplify and don't touch it again except for version bumps.
Opensuse bump to 15.5 in #462 and we have a test for it now. |
|
@RazvanLiviuVarzaru Please add the MDBF number as part of this PR title. Let's try to do that as general practice. It is not trivial to identify the task associated with the work otherwise. The MDBF number is MDBF-710. As for the details of this task, @grooverdan 's follow up review comments, I discussed this with @RazvanLiviuVarzaru over the phone. Making a summary here: First of all, we have identified a fundamental issue with this MDBF task.
|
I don't understand your point. I agree that the task doesn't actually cover the "WHY" part more than stating that Monty uses those flags, but the tasks doesn't imply anything else than checking if the flags specified by Monty are used or not and if are not, to be added. All other assumptions as in debuging some sort of failure are not in any way related to the missing "WHY" part or to any other aspect. TBH, independent of even it being merged, I considered this to be a valid task since we like it or not, we do have flags maintained in buildbot and making changing to the flags is one of the most frequent buildbot changes. So, I really don't see your point on why the benefits are minimal or only applicable to valgrind. Now, related to "WHY", Monty is the main advocate of running valgrind in buildbot. Since, apart from him, there are various requests to even turn of the builder entirely, I assumed that it's worth checking the flags and see what's missing from Monty's configuration. Since he is the main user, I assumed the configuration that he uses has the chances of catching up the most bugs. Moreover, I think Daniel's review brings up a lot of stuff that we may transform in new issues and improve things in the future.
I don't understand what are the warning flags and why this task can only be valid if Valgrind fails. Also, there is no mention of any failure in the task. As for the "WHY", please see above. Also, if you brought this up, I really don't understand why looking at the Suse failure is in any way related to this task and should even be mentioned here and not in a separate task that actually doesn't have anything to do with this one.
Again, I am not sure who doesn't consider that to be the task. The task itself doesn't mention anything related to any failures and states that the flags should be compared. As to "why", see above. Also, since Valgrind always fails, I find a bit of a stretch to say that there isn't a problem with the builder in the first place, but not the place to discuss this. |
grooverdan
left a comment
There was a problem hiding this comment.
Acceptable.
-DCMAKE_BUILD_TYPE=Debug also acceptable.
Increases the test coverage and race conditions are unlikely to occur due do Valgrin's virtual one CPU arhitecture
|
Re-applied BUILD_TYPE = DEBUG All changes resulting from the discussions are applied so we can merge when appropriate. |
No description provided.