Skip to content

Commit

Permalink
ARROW-4372: [C++] Embed precompiled bitcode in the gandiva library
Browse files Browse the repository at this point in the history
We were not running the pyarrow tests after installing the manylinux wheels, which can lead to uncaught issues, like: https://travis-ci.org/kszucs/crossbow/builds/484284104

Author: Krisztián Szűcs <szucs.krisztian@gmail.com>

Closes #3484 from kszucs/manylinux_tests and squashes the following commits:

3b1da30 <Krisztián Szűcs> use sudo
c573a56 <Krisztián Szűcs> use env variables insude the container
fd5e3fe <Krisztián Szűcs> use latest docker image tag
d5531d9 <Krisztián Szűcs> test imports inside the wheel container
1aa19f1 <Krisztián Szűcs> reenable travis builds
b399496 <Krisztián Szűcs> test py27mu and py36m wheels
71233c7 <Krisztián Szűcs> test 2.7,16 wheel
2372f3d <Krisztián Szűcs> fix requirements path; disable other CI tests
3e4ec2a <Krisztián Szűcs> unterminated llvm:MemoryBuffer; fix check_import.py path
7c88d61 <Krisztián Szűcs> only build python 3.6 wheel
18c5488 <Krisztián Szűcs> install wheel from dist dir
0bb07a7 <Krisztián Szűcs> better bash split
54fc653 <Krisztián Szűcs> don't export
d3cb058 <Krisztián Szűcs> fix wheel building script
0d29b31 <Krisztián Szűcs> remove not existing resources from gandiva's pom
5d75adb <Krisztián Szűcs> initialize jni loader
09d829a <Krisztián Szűcs> build wheels for a single python distribution at a time; adjust travis and crossbow scripts
79abc0e <Krisztián Szűcs> mark .cc file as generated
af78be2 <Krisztián Szűcs> don't bundle irhelpers in the jar
a88cd37 <Krisztián Szűcs> cmake format
7deb359 <Krisztián Szűcs> fix REGEX; remove byteCodeFilePath from java configuration object
fa19529 <Krisztián Szűcs> properly construct llvm:StringRef
5841dcd <Krisztián Szűcs> remove commented code
42391b1 <Krisztián Szűcs> don't pass precompiled bitcode all around the constructors
d480c83 <Krisztián Szűcs> use const string ref for now
b0b1117 <Krisztián Szűcs> conda llvmdev
169f43a <Krisztián Szűcs> build gandiva in cpp docker image
cb69625 <Krisztián Szűcs> silent maven download msgs
19200c3 <Krisztián Szűcs> don't run wheel tests twice; cmake format
f2205d0 <Krisztián Szűcs> gandiva jni
dbf5b1c <Krisztián Szűcs> embed precompiled bitcode as char array; load precompiled IR from string
00d98e0 <Krisztián Szűcs> try to bundle bytecode files
97fe94b <Krisztián Szűcs> fix requirements-test.txt path
86e7e5b <Krisztián Szűcs> run pyarrow tests in manylinux CI build
  • Loading branch information
kszucs committed Feb 21, 2019
1 parent a977250 commit e8cc48b
Show file tree
Hide file tree
Showing 23 changed files with 207 additions and 235 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
Expand Up @@ -225,6 +225,8 @@ matrix:
- $TRAVIS_BUILD_DIR/ci/travis_script_python.sh 3.6
- name: "[manylinux1] Python"
language: cpp
env:
- PYTHON_VERSIONS="2.7,32 3.6,16"
before_script:
- if [ $ARROW_CI_PYTHON_AFFECTED == "1" ]; then docker pull quay.io/xhochy/arrow_manylinux1_x86_64_base:llvm-7-manylinux1; fi
script:
Expand Down
2 changes: 2 additions & 0 deletions ci/docker_build_cpp.sh
Expand Up @@ -34,6 +34,8 @@ cmake -GNinja \
-DARROW_PARQUET=${ARROW_PARQUET:-ON} \
-DARROW_HDFS=${ARROW_HDFS:-OFF} \
-DARROW_PYTHON=${ARROW_PYTHON:-OFF} \
-DARROW_GANDIVA=${ARROW_GANDIVA:-OFF} \
-DARROW_GANDIVA_JAVA=${ARROW_GANDIVA_JAVA:-OFF} \
-DARROW_BUILD_TESTS=${ARROW_BUILD_TESTS:-OFF} \
-DARROW_BUILD_UTILITIES=${ARROW_BUILD_UTILITIES:-ON} \
-DARROW_INSTALL_NAME_RPATH=${ARROW_INSTALL_NAME_RPATH:-ON} \
Expand Down
2 changes: 2 additions & 0 deletions ci/travis_script_gandiva_java.sh
Expand Up @@ -22,6 +22,8 @@ set -e
source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh
JAVA_DIR=${TRAVIS_BUILD_DIR}/java

export MAVEN_OPTS="-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn"

pushd $JAVA_DIR

# build with gandiva profile
Expand Down
57 changes: 46 additions & 11 deletions ci/travis_script_manylinux.sh
Expand Up @@ -20,23 +20,58 @@

set -ex

pushd python/manylinux1
docker run --shm-size=2g --rm -e PYARROW_PARALLEL=3 -v $PWD:/io -v $PWD/../../:/arrow quay.io/xhochy/arrow_manylinux1_x86_64_base:llvm-7-manylinux1 /io/build_arrow.sh

# Testing for https://issues.apache.org/jira/browse/ARROW-2657
# These tests cannot be run inside of the docker container, since TensorFlow
# does not run on manylinux1

source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh

source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh

PYTHON_VERSION=3.6
CONDA_ENV_DIR=$TRAVIS_BUILD_DIR/pyarrow-test-$PYTHON_VERSION
pushd python/manylinux1

cat << EOF > check_imports.py
import sys
import pyarrow
import pyarrow.orc
import pyarrow.parquet
import pyarrow.plasma
import tensorflow
if sys.version_info.major > 2:
import pyarrow.gandiva
EOF

for PYTHON_TUPLE in ${PYTHON_VERSIONS}; do
IFS="," read PYTHON_VERSION UNICODE_WIDTH <<< $PYTHON_TUPLE

# cleanup the artifact directory, docker writes it as root
sudo rm -rf dist

# build the wheels
docker run --shm-size=2g --rm \
-e PYARROW_PARALLEL=3 \
-e PYTHON_VERSION=$PYTHON_VERSION \
-e UNICODE_WIDTH=$UNICODE_WIDTH \
-v $PWD:/io \
-v $PWD/../../:/arrow \
quay.io/xhochy/arrow_manylinux1_x86_64_base:latest \
/io/build_arrow.sh

# create a testing conda environment
CONDA_ENV_DIR=$TRAVIS_BUILD_DIR/pyarrow-test-$PYTHON_VERSION
conda create -y -q -p $CONDA_ENV_DIR python=$PYTHON_VERSION
conda activate $CONDA_ENV_DIR

# install the produced wheels
pip install tensorflow
pip install dist/*.whl

# Test optional dependencies and the presence of tensorflow
python check_imports.py

conda create -y -q -p $CONDA_ENV_DIR python=$PYTHON_VERSION
conda activate $CONDA_ENV_DIR
# Install test dependencies and run pyarrow tests
pip install -r ../requirements-test.txt
pytest --pyargs pyarrow
done

pip install -q tensorflow
pip install "dist/`ls dist/ | grep cp36`"
python -c "import pyarrow; import tensorflow"
popd
1 change: 1 addition & 0 deletions cpp/Dockerfile
Expand Up @@ -48,6 +48,7 @@ RUN arrow/ci/docker_install_conda.sh && \

ENV CC=gcc \
CXX=g++ \
ARROW_GANDIVA=OFF \
ARROW_BUILD_TESTS=ON \
ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX \
ARROW_HOME=$CONDA_PREFIX \
Expand Down
19 changes: 8 additions & 11 deletions cpp/src/gandiva/CMakeLists.txt
Expand Up @@ -27,17 +27,14 @@ add_dependencies(gandiva-all gandiva gandiva-tests gandiva-benchmarks)

find_package(LLVM)

# Set the path where the byte-code files will be installed.
set(GANDIVA_BC_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}/gandiva)
# Set the path where the bitcode file generated, see precompiled/CMakeLists.txt
set(GANDIVA_PRECOMPILED_BC_PATH "${CMAKE_CURRENT_BINARY_DIR}/irhelpers.bc")
set(GANDIVA_PRECOMPILED_CC_PATH "${CMAKE_CURRENT_BINARY_DIR}/precompiled_bitcode.cc")
set(GANDIVA_PRECOMPILED_CC_IN_PATH
"${CMAKE_CURRENT_SOURCE_DIR}/precompiled_bitcode.cc.in")

set(GANDIVA_BC_FILE_NAME irhelpers.bc)
set(GANDIVA_BC_INSTALL_PATH "${GANDIVA_BC_INSTALL_DIR}/${GANDIVA_BC_FILE_NAME}")
set(GANDIVA_BC_OUTPUT_PATH "${CMAKE_CURRENT_BINARY_DIR}/${GANDIVA_BC_FILE_NAME}")
install(FILES ${GANDIVA_BC_OUTPUT_PATH} DESTINATION ${GANDIVA_BC_INSTALL_DIR})

set(BC_FILE_PATH_CC "${CMAKE_CURRENT_BINARY_DIR}/bc_file_path.cc")
configure_file(bc_file_path.cc.in ${BC_FILE_PATH_CC})
add_definitions(-DGANDIVA_BYTE_COMPILE_FILE_PATH="${GANDIVA_BC_OUTPUT_PATH}")
# add_arrow_lib will look for this not yet existing file, so flag as generated
set_source_files_properties(${GANDIVA_PRECOMPILED_CC_PATH} PROPERTIES GENERATED TRUE)

set(SRC_FILES
annotator.cc
Expand Down Expand Up @@ -73,7 +70,7 @@ set(SRC_FILES
selection_vector.cc
tree_expr_builder.cc
to_date_holder.cc
${BC_FILE_PATH_CC})
${GANDIVA_PRECOMPILED_CC_PATH})

set(GANDIVA_SHARED_PRIVATE_LINK_LIBS arrow_shared LLVM::LLVM_INTERFACE ${RE2_LIBRARY})

Expand Down
5 changes: 2 additions & 3 deletions cpp/src/gandiva/configuration.cc
Expand Up @@ -25,12 +25,11 @@ const std::shared_ptr<Configuration> ConfigurationBuilder::default_configuration
InitDefaultConfig();

std::size_t Configuration::Hash() const {
boost::hash<std::string> string_hash;
return string_hash(byte_code_file_path_);
return 0; // dummy for now, no configuration properties yet
}

bool Configuration::operator==(const Configuration& other) const {
return other.byte_code_file_path() == byte_code_file_path();
return true; // always true, no configuration properties yet
}

bool Configuration::operator!=(const Configuration& other) const {
Expand Down
24 changes: 2 additions & 22 deletions cpp/src/gandiva/configuration.h
Expand Up @@ -26,9 +26,6 @@

namespace gandiva {

GANDIVA_EXPORT
extern const char kByteCodeFilePath[];

class ConfigurationBuilder;
/// \brief runtime config for gandiva
///
Expand All @@ -38,17 +35,9 @@ class GANDIVA_EXPORT Configuration {
public:
friend class ConfigurationBuilder;

const std::string& byte_code_file_path() const { return byte_code_file_path_; }

std::size_t Hash() const;
bool operator==(const Configuration& other) const;
bool operator!=(const Configuration& other) const;

private:
explicit Configuration(const std::string& byte_code_file_path)
: byte_code_file_path_(byte_code_file_path) {}

const std::string byte_code_file_path_;
};

/// \brief configuration builder for gandiva
Expand All @@ -57,15 +46,8 @@ class GANDIVA_EXPORT Configuration {
/// to override specific values and build a custom instance
class GANDIVA_EXPORT ConfigurationBuilder {
public:
ConfigurationBuilder() : byte_code_file_path_(kByteCodeFilePath) {}

ConfigurationBuilder& set_byte_code_file_path(const std::string& byte_code_file_path) {
byte_code_file_path_ = byte_code_file_path;
return *this;
}

std::shared_ptr<Configuration> build() {
std::shared_ptr<Configuration> configuration(new Configuration(byte_code_file_path_));
std::shared_ptr<Configuration> configuration(new Configuration());
return configuration;
}

Expand All @@ -74,10 +56,8 @@ class GANDIVA_EXPORT ConfigurationBuilder {
}

private:
std::string byte_code_file_path_;

static std::shared_ptr<Configuration> InitDefaultConfig() {
std::shared_ptr<Configuration> configuration(new Configuration(kByteCodeFilePath));
std::shared_ptr<Configuration> configuration(new Configuration());
return configuration;
}

Expand Down
19 changes: 12 additions & 7 deletions cpp/src/gandiva/engine.cc
Expand Up @@ -66,6 +66,9 @@

namespace gandiva {

extern const char kPrecompiledBitcode[];
extern const size_t kPrecompiledBitcodeSize;

std::once_flag init_once_flag;

bool Engine::init_once_done_ = false;
Expand Down Expand Up @@ -114,7 +117,7 @@ Status Engine::Make(std::shared_ptr<Configuration> config,
// Add mappings for functions that can be accessed from LLVM/IR module.
engine_obj->AddGlobalMappings();

auto status = engine_obj->LoadPreCompiledIRFiles(config->byte_code_file_path());
auto status = engine_obj->LoadPreCompiledIR();
ARROW_RETURN_NOT_OK(status);

// Add decimal functions
Expand All @@ -126,14 +129,16 @@ Status Engine::Make(std::shared_ptr<Configuration> config,
}

// Handling for pre-compiled IR libraries.
Status Engine::LoadPreCompiledIRFiles(const std::string& byte_code_file_path) {
Status Engine::LoadPreCompiledIR() {
auto bitcode = llvm::StringRef(kPrecompiledBitcode, kPrecompiledBitcodeSize);

/// Read from file into memory buffer.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> buffer_or_error =
llvm::MemoryBuffer::getFile(byte_code_file_path);
ARROW_RETURN_IF(
!buffer_or_error,
Status::CodeGenError("Could not load module from IR ", byte_code_file_path, ": ",
buffer_or_error.getError().message()));
llvm::MemoryBuffer::getMemBuffer(bitcode, "precompiled", false);

ARROW_RETURN_IF(!buffer_or_error,
Status::CodeGenError("Could not load module from IR: ",
buffer_or_error.getError().message()));

std::unique_ptr<llvm::MemoryBuffer> buffer = move(buffer_or_error.get());

Expand Down
5 changes: 3 additions & 2 deletions cpp/src/gandiva/engine.h
Expand Up @@ -79,8 +79,9 @@ class GANDIVA_EXPORT Engine {

llvm::ExecutionEngine& execution_engine() { return *execution_engine_.get(); }

/// load pre-compiled IR modules and merge them into the main module.
Status LoadPreCompiledIRFiles(const std::string& byte_code_file_path);
/// load pre-compiled IR modules from precompiled_bitcode.cc and merge them into
/// the main module.
Status LoadPreCompiledIR();

// Create and add mappings for cpp functions that can be accessed from LLVM.
void AddGlobalMappings();
Expand Down
8 changes: 0 additions & 8 deletions cpp/src/gandiva/jni/config_builder.cc
Expand Up @@ -34,16 +34,8 @@ using gandiva::ConfigurationBuilder;
JNIEXPORT jlong JNICALL
Java_org_apache_arrow_gandiva_evaluator_ConfigurationBuilder_buildConfigInstance(
JNIEnv* env, jobject configuration) {
jstring byte_code_file_path =
(jstring)env->CallObjectMethod(configuration, byte_code_accessor_method_id_, 0);
ConfigurationBuilder configuration_builder;
if (byte_code_file_path != nullptr) {
const char* byte_code_file_path_cpp = env->GetStringUTFChars(byte_code_file_path, 0);
configuration_builder.set_byte_code_file_path(byte_code_file_path_cpp);
env->ReleaseStringUTFChars(byte_code_file_path, byte_code_file_path_cpp);
}
std::shared_ptr<Configuration> config = configuration_builder.build();
env->DeleteLocalRef(byte_code_file_path);
return ConfigHolder::MapInsert(config);
}

Expand Down
3 changes: 0 additions & 3 deletions cpp/src/gandiva/jni/env_helper.h
Expand Up @@ -23,7 +23,4 @@
// class references
extern jclass configuration_builder_class_;

// method references
extern jmethodID byte_code_accessor_method_id_;

#endif // ENV_HELPER_H
6 changes: 0 additions & 6 deletions cpp/src/gandiva/jni/jni_common.cc
Expand Up @@ -68,7 +68,6 @@ static jint JNI_VERSION = JNI_VERSION_1_6;

// extern refs - initialized for other modules.
jclass configuration_builder_class_;
jmethodID byte_code_accessor_method_id_;

// refs for self.
static jclass gandiva_exception_;
Expand All @@ -93,11 +92,6 @@ jint JNI_OnLoad(JavaVM* vm, void* reserved) {
gandiva_exception_ = (jclass)env->NewGlobalRef(localExceptionClass);
env->DeleteLocalRef(localExceptionClass);

const char method_name[] = "getByteCodeFilePath";
const char return_type[] = "()Ljava/lang/String;";
byte_code_accessor_method_id_ =
env->GetMethodID(configuration_builder_class_, method_name, return_type);

env->ExceptionDescribe();

return JNI_VERSION;
Expand Down
26 changes: 23 additions & 3 deletions cpp/src/gandiva/precompiled/CMakeLists.txt
Expand Up @@ -67,12 +67,32 @@ foreach(SRC_FILE ${PRECOMPILED_SRCS})
endforeach()

# link all of the bitcode files into a single bitcode file.
add_custom_command(OUTPUT ${GANDIVA_BC_OUTPUT_PATH}
COMMAND ${LLVM_LINK_EXECUTABLE} -o ${GANDIVA_BC_OUTPUT_PATH}
add_custom_command(OUTPUT ${GANDIVA_PRECOMPILED_BC_PATH}
COMMAND ${LLVM_LINK_EXECUTABLE} -o ${GANDIVA_PRECOMPILED_BC_PATH}
${BC_FILES}
DEPENDS ${BC_FILES})

add_custom_target(precompiled ALL DEPENDS ${GANDIVA_BC_OUTPUT_PATH})
# write a cmake script to replace precompiled bitcode file's content into a .cc file
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/WritePrecompiledCC.cmake" "\
file(READ \"${GANDIVA_PRECOMPILED_BC_PATH}\" HEXCONTENT HEX)
string(REGEX REPLACE
\"([0-9a-f][0-9a-f])\"
\"'\\\\\\\\x\\\\1',\"
GANDIVA_PRECOMPILED_BITCODE_CHARS
\"\${HEXCONTENT}\")
configure_file(${GANDIVA_PRECOMPILED_CC_IN_PATH}
${GANDIVA_PRECOMPILED_CC_PATH})
")

# add the previous command to the execution chain
add_custom_command(OUTPUT ${GANDIVA_PRECOMPILED_CC_PATH}
COMMAND ${CMAKE_COMMAND} -P
"${CMAKE_CURRENT_BINARY_DIR}/WritePrecompiledCC.cmake"
DEPENDS ${GANDIVA_PRECOMPILED_CC_IN_PATH}
${GANDIVA_PRECOMPILED_BC_PATH})

add_custom_target(precompiled ALL
DEPENDS ${GANDIVA_PRECOMPILED_BC_PATH} ${GANDIVA_PRECOMPILED_CC_PATH})

function(add_precompiled_unit_test REL_TEST_NAME)
get_filename_component(TEST_NAME ${REL_TEST_NAME} NAME_WE)
Expand Down
Expand Up @@ -15,9 +15,12 @@
// specific language governing permissions and limitations
// under the License.

#include <string>

namespace gandiva {

// Path to the byte-code file.
extern const char kByteCodeFilePath[] = "${GANDIVA_BC_INSTALL_PATH}";
// Content of precompiled bitcode file.
extern const char kPrecompiledBitcode[] = { ${GANDIVA_PRECOMPILED_BITCODE_CHARS} };
extern const size_t kPrecompiledBitcodeSize = sizeof(kPrecompiledBitcode);

} // namespace gandiva
15 changes: 0 additions & 15 deletions cpp/src/gandiva/tests/projector_test.cc
Expand Up @@ -84,21 +84,6 @@ TEST_F(TestProjector, TestProjectCache) {
&cached_projector);
ASSERT_OK(status);
EXPECT_EQ(cached_projector, projector);

// if configuration is different, should return a new projector.

// build a new path by replacing the first '/' with '//'
std::string alt_path(GANDIVA_BYTE_COMPILE_FILE_PATH);
auto pos = alt_path.find('/', 0);
EXPECT_NE(pos, std::string::npos);
alt_path.replace(pos, 1, "//");
auto other_configuration =
ConfigurationBuilder().set_byte_code_file_path(alt_path).build();
std::shared_ptr<Projector> should_be_new_projector2;
status = Projector::Make(schema, {sum_expr, sub_expr}, other_configuration,
&should_be_new_projector2);
ASSERT_OK(status);
EXPECT_NE(projector, should_be_new_projector2);
}

TEST_F(TestProjector, TestProjectCacheFieldNames) {
Expand Down
3 changes: 1 addition & 2 deletions cpp/src/gandiva/tests/test_util.h
Expand Up @@ -89,8 +89,7 @@ static ArrayPtr MakeArrowTypeArray(const std::shared_ptr<arrow::DataType>& type,

std::shared_ptr<Configuration> TestConfiguration() {
auto builder = ConfigurationBuilder();
builder.set_byte_code_file_path(GANDIVA_BYTE_COMPILE_FILE_PATH);
return builder.build();
return builder.DefaultConfiguration();
}

} // namespace gandiva
Expand Down

0 comments on commit e8cc48b

Please sign in to comment.