ARROW-17436: [C++] Use -O2 instead of -O3 for RELEASE builds #13661
Conversation
@ursabot benchmark please
Supported benchmark command examples:
- To run all benchmarks: …
- To filter benchmarks by language: …
- To filter Python and R benchmarks by name: …
- To filter C++ benchmarks by archery --suite-filter and --benchmark-filter: …
- For other …
@ursabot please benchmark
Benchmark runs are scheduled for baseline = 1214083 and contender = 46e3195. Results will be available as each benchmark for each run completes.
cc @pitrou
Here's a dump of symbols that shrink the most in -O2: https://gist.github.com/wesm/4a2815077ed37b671d6160b8abec5e7c. I'd be interested to see if, e.g., unsafe numeric casts are significantly affected by this.
Looks great. I believe the big regressions from some tests are not real. One catch is that gcc -O2 disables vectorization, while clang -O2 keeps it. We may need additional -fxxxx flags if we want to keep some useful features.
Hmm, perhaps the bit-util micro-benchmarks are a bit pathological, but other regressions seem real and quite significant...
Not sure which gcc version is used in conbench.
Perhaps try …
There's …
Also, apparently with gcc 12 the following flags are enabled with …
if(NOT MSVC)
  string(REPLACE "-O3 -DNDEBUG" "" CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE}")
Why remove -DNDEBUG here?
Especially, RelWithDebInfo needs -DNDEBUG, otherwise runtime assertions are enabled.
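For illustration, a downstream build could make the same -O3 to -O2 change at configure time without editing CMakeLists.txt, while keeping -DNDEBUG so runtime assertions stay disabled. A hypothetical invocation (paths and flag sets are assumptions, not the PR's actual approach):

```shell
# Override the Release flags entirely at configure time:
# drop -O3 in favor of -O2, but keep -DNDEBUG.
cmake -S cpp -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS_RELEASE="-O2 -DNDEBUG"
```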
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for baseline = 1214083 and contender = e4e430b. Results will be available as each benchmark for each run completes.
OK, so I ran this locally with gcc 9.4.0 (Ubuntu 20.04) on an AMD Zen 2. Build times (with ccache disabled):
Lib sizes
Compute benchmarks
All in all, in this case:
Hmm, I notice that … Edit: re-ran benchmarks and updated the gists above.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for baseline = 1214083 and contender = 6dd7bab. Results will be available as each benchmark for each run completes.
cc @save-buffer @westonpace for further opinions
I would guess there are places that benefit from loop unswitching as well.
Hmm, I don't think it's our duty to micro-optimize compiler options, though. There are too many moving parts (compiler brand, compiler version, architecture, etc.).
I would say that we should just keep -O3 and keep an eye on symbol sizes in case we need to intervene occasionally. On the whole I think the symbol sizes we have are not too bad.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for baseline = 8a2acaa and contender = 7e5ca1a. Results will be available as each benchmark for each run completes.
I added …
You undid the changes I had already pushed :-(
I've been trying to get caught up on the context here - I took a look at #13654. My current understanding is:
So looking at the results, -O3 adds about 1 MB (to ~22 MB) of total binary size, so I don't think that's an issue in itself. However, there is something to be said about bloating individual kernels. Reading the other PR, it seems like one of the kernels was 40 KB? That's quite alarming, as chips these days have about 32 KB of icache; in the worst case, that's quite a bit of thrashing.

As for solutions: looking at the benchmarks, it seems like the current code is pretty unstable with regard to what the compiler generates from a given set of flags. I'm not sure messing with compiler flags will be one-size-fits-all, as each combination of flags causes large changes in the generated code. I did like the changes in #13654. I really liked this point, which very much aligns with my experience and intuition that abstract templates lead to unstable code generation:
So in my mind, two solutions we could have are:
@save-buffer Thanks for your comments.
I agree with this as well -- I know that some feel that manually generating code when you could have templates "do it for you" is an antipattern, but it seems, at least, that the code in compute/kernels/codegen_internal.h has gone a little too far in introducing abstractions where we put too much blind faith in the compiler (e.g. the "OutputAdapter"). Not a priority by any means among our myriad priorities, but perhaps something for us to occasionally hack at in our idle moments (I did #13654 when I was bored on an airplane).
In https://conbench.ursa.dev/compare/runs/e938638743e84794ad829524fae04fbd...20727b1b390e4b30be10f49db7f06f3f/ it seems that there are several hundred microbenchmarks with > 10% performance regressions but also over 100 microbenchmarks with > 10% performance improvement. I'd say it's a coin toss whether to move to -O2 (with -ftree-vectorize) versus -O3. |
I would very much like to run the TPC-H benchmarks on this change. They are failing in conbench at the moment. There is a fix for these benchmarks in PR right now (#13679) so maybe we can run it after. That will at least give us some sense of the impact at a macro-level. |
Makes sense. Let’s get more data and make a decision after 9.0.0 goes out. |
@westonpace Are you planning to get/report TPC-H benchmark numbers for this? |
@ursabot please benchmark lang=R
Benchmark runs are scheduled for baseline = 8a2acaa and contender = 47fcf77. Results will be available as each benchmark for each run completes.
Ah, I think a rebase is needed to get these passing. I'll do that real quick. |
@pitrou I got the results here: https://conbench.ursa.dev/compare/runs/b724609840e242afbf4e1e26682afbe3...b742cce58407420db4da8e461604a1db/ There were no significant changes (one of the queries was 15% faster and everything else was within +/- 5%) so I think I'm +1 for this change. |
@github-actions crossbow submit -g cpp -g python -g r |
Revision: de53440 Submitted crossbow builds: ursacomputing/crossbow @ actions-0c44f1532a
LGTM
Benchmark runs are scheduled for baseline = 682c63a and contender = 9d1bbaf. 9d1bbaf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
I don't know why, but it seems that the "AMD64 Windows R 3.6 RTools 35" CI job has been failing since this change was merged into master: https://github.com/apache/arrow/runs/7866259947?check_suite_focus=true#step:11:699
That's really strange. Where is the log for the job that builds the artifacts that … depends on? Is the tarball it is downloading stale, by chance?
I don't know. Perhaps @paleolimbot wants to take a look. But regardless, we're now having a discussion to drop RTools 3.5 on the ML, so I'm not sure that matters much. |
As you noted, we're discussing dropping support for the failing platform. I don't currently have a development environment for RTools 35; while I could set one up, I'm not keen to spend a bunch of time doing that if we're about to drop support. I'll open a discussion with the R developers as to how we'll solve the issue.
…13661)

Motivated by investigation in apache#13654. To be discussed.

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>