[CI] Add nightly CI job to test against dev version of deps #10351

Merged on Jun 4, 2024 (12 commits)
2 changes: 1 addition & 1 deletion tests/buildkite/build-containers.sh
@@ -20,7 +20,7 @@ case "${container}" in
cpu)
;;

- gpu)
+ gpu | gpu_dev_ver)
BUILD_ARGS="$BUILD_ARGS --build-arg CUDA_VERSION_ARG=$CUDA_VERSION"
BUILD_ARGS="$BUILD_ARGS --build-arg NCCL_VERSION_ARG=$NCCL_VERSION"
BUILD_ARGS="$BUILD_ARGS --build-arg RAPIDS_VERSION_ARG=$RAPIDS_VERSION"
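
The `gpu | gpu_dev_ver)` pattern lets both container names share one `case` branch, so the dev-version container receives the same CUDA, NCCL, and RAPIDS build arguments as the regular GPU container. A minimal sketch of that matching behavior follows; the function name `uses_gpu_build_args` is hypothetical, and the real script's other branches are collapsed into `*` here:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: reports whether a container
# name falls into the shared gpu / gpu_dev_ver branch.
uses_gpu_build_args() {
  case "$1" in
    gpu | gpu_dev_ver)
      # Both names match this single branch
      echo "yes"
      ;;
    *)
      # Every other container name (simplified for this sketch)
      echo "no"
      ;;
  esac
}

for c in cpu gpu gpu_dev_ver; do
  echo "$c -> $(uses_gpu_build_args "$c")"
done
```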
37 changes: 37 additions & 0 deletions tests/buildkite/pipeline-nightly.yml
@@ -0,0 +1,37 @@
# Nightly CI pipeline, to test against dev versions of dependencies

env:
DOCKER_CACHE_ECR_ID: "492475357299"
DOCKER_CACHE_ECR_REGION: "us-west-2"
DISABLE_RELEASE: "1"
# Skip uploading artifacts to S3 bucket
# Also, don't build all CUDA archs; just build sm_75
USE_DEPS_DEV_VER: "1"
# Use dev versions of RAPIDS and other dependencies
steps:
#### -------- CONTAINER BUILD --------
- label: ":docker: Build containers"
commands:
- "tests/buildkite/build-containers.sh gpu_build_centos7"
- "tests/buildkite/build-containers.sh gpu_dev_ver"
key: build-containers
agents:
queue: linux-amd64-cpu
- wait

- label: ":console: Build CUDA"
command: "tests/buildkite/build-cuda.sh"
key: build-cuda
agents:
queue: linux-amd64-cpu
- wait
- label: ":console: Test Python package, single GPU"
command: "tests/buildkite/test-python-gpu.sh gpu"
key: test-python-gpu
agents:
queue: linux-amd64-gpu
- label: ":console: Test Python package, 4 GPUs"
command: "tests/buildkite/test-python-gpu.sh mgpu"
key: test-python-mgpu
agents:
queue: linux-amd64-mgpu
9 changes: 8 additions & 1 deletion tests/buildkite/test-python-gpu.sh
@@ -22,7 +22,14 @@ chmod +x build/testxgboost
# Allocate extra space in /dev/shm to enable NCCL
export CI_DOCKER_EXTRA_PARAMS_INIT='--shm-size=4g'

- command_wrapper="tests/ci_build/ci_build.sh gpu --use-gpus --build-arg "`
+ if [[ -z "${USE_DEPS_DEV_VER}" ]]
+ then
+ container_tag='gpu'
+ else
+ container_tag='gpu_dev_ver'
+ fi
+
+ command_wrapper="tests/ci_build/ci_build.sh ${container_tag} --use-gpus --build-arg "`
`"CUDA_VERSION_ARG=$CUDA_VERSION --build-arg "`
`"RAPIDS_VERSION_ARG=$RAPIDS_VERSION --build-arg "`
`"NCCL_VERSION_ARG=$NCCL_VERSION"
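
The container selection above hinges on `[[ -z ... ]]`, which is true when the variable is unset or empty. A minimal sketch of that behavior, wrapped in a hypothetical function `select_container_tag` (the PR itself uses an inline `if`):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same test used in test-python-gpu.sh.
select_container_tag() {
  # -z succeeds when USE_DEPS_DEV_VER is unset or the empty string
  if [[ -z "${USE_DEPS_DEV_VER:-}" ]]; then
    echo 'gpu'
  else
    echo 'gpu_dev_ver'
  fi
}

unset USE_DEPS_DEV_VER
echo "unset -> $(select_container_tag)"
echo "empty -> $(USE_DEPS_DEV_VER= select_container_tag)"
echo "1     -> $(USE_DEPS_DEV_VER=1 select_container_tag)"
echo "0     -> $(USE_DEPS_DEV_VER=0 select_container_tag)"
```

Note that any non-empty value selects `gpu_dev_ver`, including `0`, because the test checks only for emptiness and not for truthiness.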
51 changes: 51 additions & 0 deletions tests/ci_build/Dockerfile.gpu_dev_ver
@@ -0,0 +1,51 @@
# Container to test XGBoost against dev versions of dependencies

ARG CUDA_VERSION_ARG
FROM nvidia/cuda:$CUDA_VERSION_ARG-runtime-ubuntu22.04
ARG CUDA_VERSION_ARG
ARG RAPIDS_VERSION_ARG
ARG NCCL_VERSION_ARG

# Environment
ENV DEBIAN_FRONTEND noninteractive
SHELL ["/bin/bash", "-c"] # Use Bash as shell

# Install all basic requirements
RUN \
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub && \
apt-get update && \
apt-get install -y wget unzip bzip2 libgomp1 build-essential openjdk-8-jdk-headless && \
# Python
wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/22.11.1-2/Mambaforge-22.11.1-2-Linux-x86_64.sh && \
bash conda.sh -b -p /opt/mambaforge

ENV PATH=/opt/mambaforge/bin:$PATH

# Create new Conda environment with cuDF, Dask, and cuPy
RUN \
export NCCL_SHORT_VER=$(echo "$NCCL_VERSION_ARG" | cut -d "-" -f 1) && \
mamba create -y -n gpu_test -c rapidsai-nightly -c nvidia -c conda-forge \
python=3.10 "cudf>$RAPIDS_VERSION_ARG" "rmm>$RAPIDS_VERSION_ARG" cudatoolkit=$CUDA_VERSION_ARG \
"nccl>=${NCCL_SHORT_VER}" \
dask=2024.1.1 \
"dask-cuda>$RAPIDS_VERSION_ARG" "dask-cudf>$RAPIDS_VERSION_ARG" cupy \
numpy pytest pytest-timeout scipy scikit-learn pandas matplotlib wheel python-kubernetes urllib3 graphviz hypothesis \
"pyspark>=3.4.0" cloudpickle cuda-python && \
mamba clean --all && \
conda run --no-capture-output -n gpu_test pip install buildkite-test-collector

ENV GOSU_VERSION 1.10
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/

# Install lightweight sudo (not bound to TTY)
RUN set -ex; \
wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
chmod +x /usr/local/bin/gosu && \
gosu nobody true

# Default entry-point to use if running locally
# It will preserve attributes of created files
COPY entrypoint.sh /scripts/

WORKDIR /workspace
ENTRYPOINT ["/scripts/entrypoint.sh"]
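
The Dockerfile derives `NCCL_SHORT_VER` by keeping everything before the first `-` in `NCCL_VERSION_ARG`. The sketch below isolates that `cut` invocation; the function name `nccl_short_ver` and the sample value `2.16.5-1` are assumptions chosen for illustration, not values taken from the PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: same pipeline as the NCCL_SHORT_VER line in the Dockerfile.
nccl_short_ver() {
  # Split on "-" and keep the first field
  echo "$1" | cut -d "-" -f 1
}

# Sample input is an assumed version string, for illustration only
nccl_short_ver "2.16.5-1"   # prints 2.16.5
# A value without "-" passes through unchanged
nccl_short_ver "2.16.5"     # prints 2.16.5
```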