Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8227: [C++] Refine SIMD feature definitions #6794

Closed
wants to merge 4 commits into from

Conversation

cyb70289
Copy link
Contributor

@cyb70289 cyb70289 commented Apr 1, 2020

This patch moves SIMD feature definitions from source code to cmake,
and supports more flexible Arm64 CPU feature settings.

Binary building is controlled by two factors: compiler capability and
build requirement. Compiler capability is detected in cmake by trying
flags like "-mavx2". Build requirement is passed by cmake command line
such as "-DARROW_SIMD_LEVEL=AVX2". Combining these two factors, we can
define SIMD feature macros ARROW_HAVE_AVX2, which controls conditional
compiling of related SIMD implementations in source code.

Currently we set compiler options(e.g. -msse4.2) in cmake but define
SIMD features by checking compiler macros in source code like below:
#if defined(SSE4_2)
#define ARROW_HAVE_SSE4_2 1
#endif
Putting them together in cmake eases maintenance.

This patch moves SIMD feature definitions from source code to cmake,
and supports more flexible Arm64 CPU feature settings.

Binary building is controlled by two factors: compiler capability and
build requirement. Compiler capability is detected in cmake by trying
flags like "-mavx2". Build requirement is passed by cmake command line
such as "-DARROW_SIMD_LEVEL=AVX2". Combining these two factors, we can
define SIMD feature macros ARROW_HAVE_AVX2, which controls conditional
compiling of related SIMD implementations in source code.

Currently we set compiler options(e.g. -msse4.2) in cmake but define
SIMD features by checking compiler macros in source code like below:
  #if defined(__SSE4_2__)
  #define ARROW_HAVE_SSE4_2 1
  #endif
Putting them together in cmake eases maintenance.
@github-actions
Copy link

github-actions bot commented Apr 1, 2020

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 1, 2020

@kou @jianxind this patch changes some of your previous code, please review if it's okay, thanks.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

set(CXX_SUPPORTS_SSE4_2 TRUE)
else()
set(ARROW_SSE2_FLAG "-msse2")
set(ARROW_SSE42_FLAG "-msse4.2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ARROW_SSE4_2_FLAG name for consistency?
Other SSE4.2 variables use SSE4_2 instead of SSE42.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kou
Copy link
Member

kou commented Apr 1, 2020

@github-actions crossbow submit -g linux-arm

@github-actions
Copy link

github-actions bot commented Apr 1, 2020

Revision: 3b5ad09

Submitted crossbow builds: ursa-labs/crossbow @ actions-60

Task Status
centos-7-aarch64 Github Actions
centos-8-aarch64 Github Actions
debian-buster-arm64 Github Actions
debian-stretch-arm64 Github Actions
ubuntu-bionic-arm64 Github Actions
ubuntu-eoan-arm64 Github Actions
ubuntu-focal-arm64 Github Actions
ubuntu-xenial-arm64 Github Actions

@frankdjx
Copy link
Contributor

frankdjx commented Apr 1, 2020

ok for me, verified avx512 build and unittest locally.

@kou
Copy link
Member

kou commented Apr 1, 2020

Umm, it seems that we can't use arm_acle.h even when we can use -march=armv8-a+crc+crypto on QEMU:

https://github.com/ursa-labs/crossbow/runs/550852423

2020-04-01T07:26:57.1647002Z cd /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/src/arrow && /usr/bin/c++  -DARROW_EXPORTING -DARROW_HAVE_ARMV8_CRC -DARROW_HAVE_ARMV8_CRYPTO -DARROW_HAVE_NEON -DARROW_HDFS -DARROW_JEMALLOC -DARROW_JEMALLOC_INCLUDE_DIR="" -DARROW_WITH_BACKTRACE -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DURI_STATIC_BUILD -I/root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/src -I/root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src -I/root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/generated -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/thirdparty/flatbuffers/include -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/boost_ep-prefix/src/boost_ep -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/thrift_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/protobuf_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/jemalloc_ep-prefix/src -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/thirdparty/hadoop/include -isystem /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/build/orc_ep-install/include  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -O3 -DNDEBUG  -Wall -Wno-attributes -march=armv8-a+crc+crypto  -DNDEBUG -fPIC   -std=gnu++11 -o CMakeFiles/arrow_objlib.dir/ipc/json_integration.cc.o -c /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/arrow/ipc/json_integration.cc
2020-04-01T07:26:58.3357847Z In file included from /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/arrow/json/rapidjson_defs.h:35:0,
2020-04-01T07:26:58.3360518Z                  from /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/arrow/ipc/json_internal.h:24,
2020-04-01T07:26:58.3362499Z                  from /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/arrow/ipc/json_integration.cc:29:
2020-04-01T07:26:58.3363049Z /root/rpmbuild/BUILD/apache-arrow-0.16.1.dev402/cpp/src/arrow/util/neon_util.h:25:22:fatal error: arm_acle.h: No such file or directory
2020-04-01T07:26:58.3363498Z  #include <arm_acle.h>

Should we specify -DARROW_ARMV8_ARCH explicitly for Linux packages?
Or should we improve our feature detection?
(Or should we stop using QEMU?)

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. This is a very nice improvement.
Is it desirable to use CMAKE_SYSTEM_PROCESSOR at the toplevel to select some of the CMake behaviour? For example, it doesn't make sense to check Altivec support when compiling for x86 or ARM...

# power compiler flags
check_cxx_compiler_flag("-maltivec" CXX_SUPPORTS_ALTIVEC)
set(ARROW_ALTIVEC_FLAG "-maltivec")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should check any -m flag under MSVC. It will fail and takes a bit of time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(by a bit of time, I mean each of these checks seem to take 1 second on my Windows VM)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, better to avoid these unnecessary checking. Will do.
CMAKE_SYSTEM_PROCESSOR may break cross building, but I don't think it's reasonable to cross build a complex application like Arrow.

if(CXX_SUPPORTS_SSE4_2)
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} ${ARROW_SSE4_2_FLAG}")
add_definitions(-DARROW_HAVE_SSE4_2 -DARROW_HAVE_SSE2)
elseif(CXX_SUPPORTS_SSE2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should fall back here. Is it useful to support compilers with SSE2 but without SSE4.2 support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Arrow supported compilers(gcc4.8+, clang-7+, msvc2015+), SSE2(actually SSE4.2) must be available.
There are some source code with "ifdef SSE2". I will remove SSE2 definition here and refine the code using it.

else()
set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} ${ARROW_ARMV8_CRC_FLAG}")
if(CXX_SUPPORTS_ARMV8)
if(NOT CXX_SUPPORTS_ARMV8_ARCH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the difference between CXX_SUPPORTS_ARMV8 and CXX_SUPPORTS_ARMV8_ARCH. Can you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be clear after refined with CMAKE_SYSTEM_PROCESSOR.

CXX_SUPPORTS_ARMV8 detects if compiler supports arm64 (-march=armv8-a)
CXX_SUPPORTS_ARMV8_ARCH detects if compiler supports arch requirement from cmake command line (-DARROW_ARMV8_ARCH=armv8.1-a+crypto)

@@ -22,31 +22,20 @@

#include "arrow/util/macros.h"

#ifdef ARROW_USE_SIMD
Copy link
Member

@pitrou pitrou Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ARROW_USE_SIMD still referenced in the codebase? If not, can you remove it from CMake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used anymore in codebase.
It's used in cmake, acts as a central switch to turn on/off all simd flags such as ARROW_HAVE_SSE4_2.
Looks it's still useful? It's listed in benchmark doc. Code comment below:
#Disable this option to exercise non-SIMD fallbacks
define_option(ARROW_USE_SIMD "Build with SIMD optimizations" ON)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. Let's keep it then.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 2, 2020

@kou , about arm_acle.h error, it's because centos-7 is using gcc 4.8, which doesn't include this header file. See: https://stackoverflow.com/a/44681660/6837980

And from build log: https://github.com/ursa-labs/crossbow/runs/550852423
2020-04-01T04:39:57.5772349Z -- Building using CMake version: 3.14.6 2020-04-01T04:39:58.6977786Z -- The C compiler identification is GNU 4.8.5 2020-04-01T04:39:59.7948458Z -- The CXX compiler identification is GNU 4.8.5

I see we're testing arm64 build inside a container based on arm64v8/centos:7.
CentOS 8 has released, it includes gcc8.3. I would recommend upgrading the base image to arm64v8/centos:8.

[EDIT]
Looks centos:8 is already in test matrix and failed with other reasons, same for many other arm64 tests.
Are these tests important for arrow release? I can help improving if necessary.

@kou
Copy link
Member

kou commented Apr 2, 2020

@cyb70289 Thanks for the information.
We can drop support for CentOS 7 aarch64 package if it blocks our development.
Should we drop support for CentOS 7 aarch64? (I don't have opinion for this.)

Looks centos:8 is already in test matrix and failed with other reasons, same for many other arm64 tests.
Are these tests important for arrow release? I can help improving if necessary.

The test failure of them is Thrift download error. It'll be fixed another try. So we don't need to do anything for this.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 2, 2020

We can drop support for CentOS 7 aarch64 package if it blocks our development.
Should we drop support for CentOS 7 aarch64? (I don't have opinion for this.)

@kou CentOS7 eol is 2024, looks we should keep it. But gcc4.8 is too old with poor support for aarch64. I think we can install a newer gcc version on centos7 in our aarch64 CI job. Will do some tests to see if it's workable.

@kou
Copy link
Member

kou commented Apr 3, 2020

We can use g++ 8 on CentOS 7 aarch64 by devtoolset-8 package.
But I couldn't use binary build with g++ 8 on CentOS 7 (libstdc++-4.8.5-39.el7.x86_64, I tested on x86_64 not on aarch64, sorry). I used -D_GLIBCXX_USE_CXX11_ABI=0 for g++ 4 and g++ 5 or later compatibility but it doesn't help.

I used https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row-wise-conversion-example.cc for test:

$ c++ -o row-wise-conversion-example{,.cc} $(pkg-config --cflags --libs arrow) -std=gnu++11 -O0 -g3 -DNDEBUG
$ gdb --args ./row-wise-conversion-example
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /row-wise-conversion-example...done.
(gdb) r
Starting program: /./row-wise-conversion-example 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff3dff700 (LWP 12925)]

Program received signal SIGSEGV, Segmentation fault.
0x00000000004135bd in __gnu_cxx::new_allocator<arrow::ListType>::destroy<arrow::ListType> (
    this=0x62c770, __p=0x61e310 <vtable for arrow::ListType+16>)
    at /usr/include/c++/4.8.2/ext/new_allocator.h:124
124	        destroy(_Up* __p) { __p->~_Up(); }
Missing separate debuginfos, use: debuginfo-install brotli-1.0.7-5.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 glibc-2.17-292.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 libzstd-1.4.4-1.el7.x86_64 lz4-1.7.5-2.el7.x86_64 snappy-1.1.0-3.el7.x86_64 zlib-1.2.7-18.el7.x86_64

Here is the change to use g++ 8 on CentOS 7:

diff --git a/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in b/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
index fc6f5158f..95444b4be 100644
--- a/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
+++ b/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in
@@ -19,6 +19,8 @@
 
 %define _centos_ver %{?centos_ver:%{centos_ver}}%{!?centos_ver:8}
 
+%define use_cxx11_abi (%{_centos_ver} >= 8)
+
 %define boost_version %( \
   if [ "%{_centos_ver}" = 7 ]; then \
     echo 169; \
@@ -62,7 +64,6 @@ BuildRequires:	brotli-devel
 %endif
 BuildRequires:	bzip2-devel
 BuildRequires:	cmake%{cmake_version}
-BuildRequires:	gcc-c++
 BuildRequires:	gflags-devel
 BuildRequires:	git
 %if %{_centos_ver} >= 7
@@ -114,6 +115,9 @@ Apache Arrow is a data processing library for analysis.
 cpp_build_type=release
 mkdir cpp/build
 cd cpp/build
+%if !%{use_cxx11_abi}
+CXXFLAGS="%optflags -D_GLIBCXX_USE_CXX11_ABI=0"
+%endif
 %cmake3 .. \
 %if %{use_flight}
   -DARROW_FLIGHT=ON \
diff --git a/dev/tasks/linux-packages/apache-arrow/yum/centos-7/Dockerfile b/dev/tasks/linux-packages/apache-arrow/yum/centos-7/Dockerfile
index bfc34819a..e0f366ea0 100644
--- a/dev/tasks/linux-packages/apache-arrow/yum/centos-7/Dockerfile
+++ b/dev/tasks/linux-packages/apache-arrow/yum/centos-7/Dockerfile
@@ -22,11 +22,15 @@ COPY qemu-* /usr/bin/
 
 ARG DEBUG
 
+ENV \
+  DEVTOOLSET_VERSION=8
+
 RUN \
   quiet=$([ "${DEBUG}" = "yes" ] || echo "--quiet") && \
   yum update -y ${quiet} && \
-  yum install -y ${quiet} epel-release && \
-  yum groupinstall -y ${quiet} "Development Tools" && \
+  yum install -y ${quiet} \
+    centos-release-scl \
+    epel-release && \
   yum install -y ${quiet} \
     autoconf-archive \
     bison \
@@ -34,6 +38,7 @@ RUN \
     brotli-devel \
     bzip2-devel \
     cmake3 \
+    devtoolset-${DEVTOOLSET_VERSION} \
     flex \
     gflags-devel \
     git \

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 3, 2020

We can use g++ 8 on CentOS 7 aarch64 by devtoolset-8 package.
But I couldn't use binary build with g++ 8 on CentOS 7

Not sure of the reason. I did a quick test in centos:7 x86 container. I manually install required packages and build/test arrow. It works correctly, row-wise-conversion-example.cc is also ok.

For this aarch64 gcc4.8 arm_acel.h issue, I think we can workaround it by checking cpu arch and compiler version to disable ARROW_HAVE_ARMV8_CRC if necessary. It looks easier to handle and has few impacts.

@kou
Copy link
Member

kou commented Apr 3, 2020

I did a quick test in centos:7 x86 container. I manually install required packages and build/test arrow. It works correctly, row-wise-conversion-example.cc is also ok.

Ah, sorry. We need to use g++ 8 for build (we can use g++ 8 in scl enable devtoolset-8 sh environment) and use g++ 4 for test.

For this aarch64 gcc4.8 arm_acel.h issue, I think we can workaround it by checking cpu arch and compiler version to disable ARROW_HAVE_ARMV8_CRC if necessary. It looks easier to handle and has few impacts.

I think so too. Could you work on this? We already have some CMAKE_CXX_COMPILER_VERSION checks in SetupCxxFlags.cmake.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 4, 2020

For this aarch64 gcc4.8 arm_acel.h issue, I think we can workaround it by checking cpu arch and compiler version to disable ARROW_HAVE_ARMV8_CRC if necessary. It looks easier to handle and has few impacts.

I think so too. Could you work on this? We already have some CMAKE_CXX_COMPILER_VERSION checks in SetupCxxFlags.cmake.

Sure. I'll refine this patch to include the check.

@cyb70289
Copy link
Contributor Author

cyb70289 commented Apr 4, 2020

Main changes:

  • Check compiler options per cpu arch.
  • Remove SSE2. Replace ifdef SSE2 with SSE4_2. SSE2 is always supported by Arrow required compilers.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. Looks mostly good, just one question.

@@ -36,10 +36,6 @@
#include "arrow/util/sse_util.h"

// enable SIMD whitespace skipping, if available
#if defined(ARROW_HAVE_SSE2)
#define RAPIDJSON_SSE2 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep defining RAPIDJSON_SSE2, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From below rapidjson code, looks RAPIDJSON_SSE2 should be kept (though I suspect that code should check SSE42 first). Will update this patch to include RAPIDJSON_SSE2.
https://github.com/Tencent/rapidjson/blob/master/test/perftest/rapidjsontest.cpp#L30

Change-Id: I7cc37a36a87c70b4e91387bb7450547e771a1d67
@pitrou
Copy link
Member

pitrou commented Apr 6, 2020

CI failures are unrelated, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants