ARROW-3367: [INTEGRATION] Port Spark integration test to the docker-compose setup #3300

Status: Closed. Wants to merge 26 commits.

Commits (26, showing changes from all commits)
41df92d
Fix spark_integration/{Dockerfile,spark_integration.sh} for spark-int…
sarutak Dec 23, 2018
cdb4f41
spark integration test
kszucs Dec 31, 2018
7703b50
add nightly test entry
kszucs Dec 31, 2018
d654366
silent docker build outputs on nightly builds
kszucs Jan 3, 2019
341adb4
silent installation steps
kszucs Jan 3, 2019
6a6e1d6
Revert "silent docker build outputs on nightly builds"
kszucs Jan 3, 2019
b91eae0
remove commented lines
kszucs Jan 3, 2019
6017145
remove notes from docker-compose
kszucs Jan 3, 2019
927eac6
fix docker-compose syntax error
kszucs Jan 3, 2019
c0aa8eb
less verbose maven
kszucs Jan 4, 2019
c0b24e6
patch spark 2.4.0
kszucs Jan 7, 2019
3c088ed
compile only sql/core and assembly spark packages; don't parallelize …
kszucs Jan 7, 2019
20cf16c
cleanup spark dockerfile
kszucs Jan 7, 2019
f1b74e9
silencing
kszucs Jan 7, 2019
11b41ab
warn level
kszucs Jan 7, 2019
05a934f
remove spark_integration entry from the old docker-compose.yml
kszucs Jan 7, 2019
fd54e8a
avoid bind mounts
kszucs Jan 8, 2019
bb30fe1
[skip ci] download spark into the image
kszucs Jan 8, 2019
31ce522
[skip ci] custom build dir for java
kszucs Jan 8, 2019
cb8a438
[skip ci] use rsync...
kszucs Jan 8, 2019
2c8334d
setup.py fuuuuu
kszucs Jan 9, 2019
70028d2
[skip ci] java requires integration folder
kszucs Jan 9, 2019
68756b6
[skip ci] mvn fu
kszucs Jan 9, 2019
4de2c85
[skip ci] fix another pyspark test issue
kszucs Jan 9, 2019
e13d227
[skip ci] fix a pyspark test again
kszucs Jan 10, 2019
c594cd3
fix wrong patch; spark version tag pointed to a different commit...
kszucs Jan 25, 2019
1 change: 0 additions & 1 deletion ci/docker_build_cpp.sh
@@ -17,7 +17,6 @@
 # under the License.
 
 set -e
-set -o xtrace
 
 source_dir=${1:-/arrow/cpp}
 build_dir=${2:-/build/cpp}
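For context: set -o xtrace makes bash echo every command before running it, which is what made the nightly build logs noisy; with only set -e left, failures still abort the script. A minimal illustration, plain bash and independent of this script:

    set -o xtrace   # bash now prints '+ <command>' before each command
    echo hello      # logs: + echo hello
    set +o xtrace   # tracing off again; set -e (abort on error) is unaffected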
32 changes: 32 additions & 0 deletions ci/docker_build_java.sh
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -e

# /arrow/java is read-only
mkdir -p /build/java

arrow_src=/build/java/arrow

pushd /arrow
# the maven build also references the top-level header file and the
# format/ and integration/ directories, so sync them next to java/
rsync -a header java format integration $arrow_src
popd

pushd $arrow_src/java
mvn -DskipTests -Drat.skip=true install
popd
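As a usage sketch, this script is meant to run inside the java image defined by the root docker-compose.yml later in this diff; the exact service command is not shown here, so the invocation below is an assumption based on the compose services:

    # assumed manual invocation: /arrow mounted read-only, maven cache on a named volume
    docker-compose build java
    docker-compose run java /arrow/ci/docker_build_java.sh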
17 changes: 14 additions & 3 deletions ci/docker_build_python.sh
@@ -26,15 +26,26 @@ export PYARROW_CMAKE_GENERATOR=Ninja
 export PYARROW_BUILD_TYPE=${PYARROW_BUILD_TYPE:-debug}
 
 # Feature flags
+export SETUPTOOLS_SCM_VERSION_WRITE_TO_PREFIX=$build_dir
 export PYARROW_WITH_ORC=${PYARROW_WITH_ORC:-1}
 export PYARROW_WITH_PARQUET=${PYARROW_WITH_PARQUET:-1}
 export PYARROW_WITH_PLASMA=${PYARROW_WITH_PLASMA:-1}
 
 # Build pyarrow
 pushd ${source_dir}
+# hacky again: setuptools_scm writes _generated_version.py before the pyarrow
+# directory is created by setuptools
+mkdir -p $build_dir/pyarrow
 
-python setup.py build --build-temp=${build_dir} \
-    install --single-version-externally-managed \
-    --record=/build/python/record.txt
+relative_build_dir=$(realpath --relative-to=. $build_dir)
 
+# this is a nightmare, but it prevents mutating the source directory,
+# which is bind mounted read-only
+python setup.py build_ext --build-temp $relative_build_dir \
+    --build-lib $relative_build_dir \
+    build_py --build-lib $relative_build_dir \
+    egg_info --egg-base $relative_build_dir \
+    install_lib --build-dir $relative_build_dir \
+    install --single-version-externally-managed \
+    --record $relative_build_dir/record.txt
 popd

Review comment (Member), on the workaround above: oof, sorry. We should report this to setuptools, since it is really crappy.
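As a usage sketch (assuming the script takes source and build directories as positional arguments, mirroring docker_build_cpp.sh above, and that the python image is tagged arrow:python-3.6; both are assumptions, not shown in this diff):

    # hypothetical manual run: read-only source mount, writable build volume
    docker run --rm \
      -v "$PWD":/arrow:ro \
      -v arrow_ubuntu-cache:/build \
      arrow:python-3.6 \
      /arrow/ci/docker_build_python.sh /arrow/python /build/python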
2 changes: 1 addition & 1 deletion cpp/Dockerfile
@@ -40,7 +40,7 @@ ADD ci/docker_install_conda.sh \
     ci/conda_env_unix.yml \
     /arrow/ci/
 RUN arrow/ci/docker_install_conda.sh && \
-    conda install -c conda-forge \
+    conda install -q -c conda-forge \
     --file arrow/ci/conda_env_cpp.yml \
     --file arrow/ci/conda_env_unix.yml \
     $EXTRA_CONDA_PKGS && \
18 changes: 0 additions & 18 deletions dev/docker-compose.yml
@@ -31,31 +31,13 @@ services:
     volumes:
       - ../..:/apache-arrow
 
-  spark_integration:
-    build:
-      context: spark_integration
-    volumes:
-      - ../..:/apache-arrow
-
-  dask_integration:
-    build:
-      context: dask_integration
-    volumes:
-      - ../..:/apache-arrow
-
   gen_apidocs:
     build:
       context: ..
       dockerfile: dev/gen_apidocs/Dockerfile
     volumes:
       - ..:/arrow
 
-  iwyu:
-    build:
-      context: iwyu
-    volumes:
-      - ../..:/apache-arrow
-
   run_site:
     build:
       context: run_site
67 changes: 0 additions & 67 deletions dev/spark_integration/Dockerfile

This file was deleted.

91 changes: 0 additions & 91 deletions dev/spark_integration/spark_integration.sh

This file was deleted.

16 changes: 15 additions & 1 deletion dev/tasks/tests.yml
@@ -38,13 +38,15 @@ groups:
     - docker-iwyu
     - docker-clang-format
     - docker-pandas-master
-    - docker-hdfs-integration
     - docker-dask-integration
+    - docker-hdfs-integration
+    - docker-spark-integration
 
   integration:
     - docker-pandas-master
     - docker-dask-integration
     - docker-hdfs-integration
+    - docker-spark-integration
 
   cpp-python:
     - docker-cpp
@@ -275,6 +277,18 @@
       - docker-compose build hdfs-integration
       - docker-compose run hdfs-integration
 
+  docker-spark-integration:
+    platform: linux
+    template: docker-tests/travis.linux.yml
+    params:
+      environment:
+        PYTHON_VERSION: 3.6
+      commands:
+        - docker-compose build cpp
+        - docker-compose build python
+        - docker-compose build spark-integration
+        - docker-compose run spark-integration
+
   docker-pandas-master:
     platform: linux
     template: docker-tests/travis.linux.yml
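The nightly task is reproducible locally with exactly the commands it lists, plus the environment variable from its params:

    export PYTHON_VERSION=3.6
    docker-compose build cpp
    docker-compose build python
    docker-compose build spark-integration
    docker-compose run spark-integration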
43 changes: 35 additions & 8 deletions docker-compose.yml
@@ -21,13 +21,19 @@ version: '3.5'
 
 x-ubuntu-volumes:
   &ubuntu-volumes
-  - .:/arrow:delegated
-  - ${ARROW_DOCKER_CACHE_DIR:-./docker_cache}/ubuntu:/build:delegated
+  - .:/arrow:ro # ensures that docker won't contaminate the host directory
+  - ubuntu-cache:/build:delegated

Review thread on the read-only mount:

Member Author: This brought up new issues.

Member Author: Now the python build is failing, because _generated_version.py can't be written into the source directory (setting write_to to False resolves that), so python setup.py build works without contaminating the source directory. The issue is that python setup.py install still tries to write into the source directory, no matter which options I use.

Member: Hm, can you rsync the files into container space and build there?

Member Author: Yes; I don't like that solution, but I'll probably have to stick with it.

 
 x-alpine-volumes:
   &alpine-volumes
-  - .:/arrow:delegated
-  - ${ARROW_DOCKER_CACHE_DIR:-./docker_cache}/alpine:/build:delegated
+  - .:/arrow:ro # ensures that docker won't contaminate the host directory
+  - alpine-cache:/build:delegated
 
+volumes:
+  ubuntu-cache:
+  alpine-cache:
+  maven-cache:
+  spark-cache:
+
Review comment (Member Author): Switched to named volumes.

 services:

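To make the failure mode from the thread concrete (the error is standard read-only-filesystem behaviour, not specific to this PR):

    # inside a container started with the '- .:/arrow:ro' mount above
    touch /arrow/python/pyarrow/_generated_version.py
    # touch: cannot touch '/arrow/python/pyarrow/_generated_version.py': Read-only file system
    # which is why all build outputs are redirected to the /build named volume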
@@ -105,8 +111,8 @@
       context: .
       dockerfile: java/Dockerfile
     volumes:
-      - .:/arrow:delegated
-      - $HOME/.m2:/root/.m2:delegated
+      - .:/arrow:ro # ensures that docker won't contaminate the host directory
+      - maven-cache:/root/.m2:delegated
 
   js:
     image: arrow:js
@@ -267,14 +273,14 @@
     links:
       - hdfs-namenode:hdfs-namenode
 
-  # TODO(kszucs): pass hdfs client version explicitly as a build argument
   hdfs-integration:
     # Usage:
     #   export PYTHON_VERSION=3.6
     #   docker-compose build cpp
     #   docker-compose build python
     #   docker-compose build hdfs-integration
     #   docker-compose run hdfs-integration
+    image: arrow:hdfs-${HDFS_VERSION:-2.6.5}
     links:
       - hdfs-namenode:hdfs-namenode
       - hdfs-datanode-1:hdfs-datanode-1
@@ -286,6 +292,8 @@
     build:
       context: .
       dockerfile: integration/hdfs/Dockerfile
+      args:
+        HDFS_VERSION: ${HDFS_VERSION:-2.6.5}
     volumes: *ubuntu-volumes
 
   # TODO(kszucs): pass dask version explicitly as a build argument
@@ -296,10 +304,29 @@
     #   docker-compose build python
     #   docker-compose build dask-integration
     #   docker-compose run dask-integration
+    image: arrow:dask-integration
     build:
       context: .
       dockerfile: integration/dask/Dockerfile
     volumes: *ubuntu-volumes
 
+  spark-integration:
+    # Usage:
+    #   export PYTHON_VERSION=3.6
+    #   docker-compose build cpp
+    #   docker-compose build python
+    #   docker-compose build spark-integration
+    #   docker-compose run spark-integration
+    image: arrow:spark-${SPARK_VERSION:-2.4.0}
+    environment:
+      - SPARK_VERSION=${SPARK_VERSION:-2.4.0}
+    build:
+      context: .
+      dockerfile: integration/spark/Dockerfile
+    volumes:
+      - .:/arrow:ro # ensures that docker won't contaminate the host directory
+      - ubuntu-cache:/build:delegated
+      - maven-cache:/root/.m2:delegated
+      - spark-cache:/spark:delegated
+
   # TODO(kszucs): hive-integration
-  # TODO(kszucs): spark-integration
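A closing sketch of the knobs this adds (SPARK_VERSION and HDFS_VERSION come straight from the compose file above; the arrow_ prefix on volume names is the compose default project name and is an assumption here):

    # build/run against a pinned Spark release
    SPARK_VERSION=2.4.0 docker-compose build spark-integration
    SPARK_VERSION=2.4.0 docker-compose run spark-integration

    # the named caches persist across runs; remove one to force a clean rebuild
    docker volume ls | grep -E 'maven-cache|spark-cache'
    docker volume rm arrow_spark-cache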