RMM integration plugin #5873

Merged: 48 commits, merged on Aug 12, 2020
Changes from 10 commits
Commits (48)
b7a322d
[CI] Add RMM as an optional dependency
hcho3 Jul 8, 2020
e15845d
Replace caching allocator with pool allocator from RMM
hcho3 Jul 8, 2020
812c209
Revert "Replace caching allocator with pool allocator from RMM"
hcho3 Jul 9, 2020
a891112
Use rmm::mr::get_default_resource()
hcho3 Jul 9, 2020
b5eb54d
Try setting default resource (doesn't work yet)
hcho3 Jul 9, 2020
6abd4c0
Allocate pool_mr in the heap
hcho3 Jul 9, 2020
2bdbc23
Prevent leaking pool_mr handle
hcho3 Jul 9, 2020
c723632
Separate EXPECT_DEATH() in separate test suite suffixed DeathTest
hcho3 Jul 9, 2020
78c2254
Turn off death tests for RMM
hcho3 Jul 9, 2020
a520fa1
Address reviewer's feedback
hcho3 Jul 9, 2020
a73391c
Prevent leaking of cuda_mr
hcho3 Jul 10, 2020
309efc0
Merge remote-tracking branch 'origin/master' into add_rmm
hcho3 Jul 22, 2020
fa4ec11
Fix Jenkinsfile syntax
hcho3 Jul 22, 2020
871fc29
Remove unnecessary function in Jenkinsfile
hcho3 Jul 22, 2020
48051df
[CI] Install NCCL into RMM container
hcho3 Jul 22, 2020
c0a05ce
Run Python tests
hcho3 Jul 22, 2020
c12e0a6
Try building with RMM, CUDA 10.0
hcho3 Jul 22, 2020
a3e0e2f
Do not use RMM for CUDA 10.0 target
hcho3 Jul 22, 2020
3aeab69
Actually test for test_rmm flag
hcho3 Jul 22, 2020
862d580
Fix TestPythonGPU
hcho3 Jul 22, 2020
2a064bf
Use CNMeM allocator, since pool allocator doesn't yet support multiGPU
hcho3 Jul 29, 2020
ab4e7b4
Merge branch 'master' into add_rmm
hcho3 Jul 29, 2020
dd05d7b
Merge remote-tracking branch 'origin/master' into add_rmm
hcho3 Jul 29, 2020
a4da8c5
Merge remote-tracking branch 'upstream/master' into add_rmm
hcho3 Jul 29, 2020
789021f
Use 10.0 container to build RMM-enabled XGBoost
hcho3 Jul 30, 2020
f27d836
Revert "Use 10.0 container to build RMM-enabled XGBoost"
hcho3 Jul 31, 2020
a4b86a9
Fix Jenkinsfile
hcho3 Jul 31, 2020
e5eb262
[CI] Assign larger /dev/shm to NCCL
hcho3 Jul 31, 2020
4cf7f00
Use 10.2 artifact to run multi-GPU Python tests
hcho3 Jul 31, 2020
d023a50
Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
hcho3 Jul 31, 2020
abc64a3
Rename Conda env rmm_test -> gpu_test
hcho3 Jul 31, 2020
1e7e42e
Use env var to opt into CNMeM pool for C++ tests
hcho3 Jul 31, 2020
f1eeaff
Merge branch 'master' into add_rmm
hcho3 Jul 31, 2020
1069ae0
Use identical CUDA version for RMM builds and tests
hcho3 Jul 31, 2020
99a7520
Use Pytest fixtures to enable RMM pool in Python tests
hcho3 Aug 6, 2020
ecc16ec
Merge remote-tracking branch 'upstream/master' into add_rmm
hcho3 Aug 7, 2020
92d1481
Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
hcho3 Aug 7, 2020
e74fd0d
Use per-device MR; use command arg in gtest
hcho3 Aug 8, 2020
2ee04b3
Set CMake prefix path to use Conda env
hcho3 Aug 8, 2020
87422a2
Use 0.15 nightly version of RMM
hcho3 Aug 8, 2020
9021a75
Remove unnecessary header
hcho3 Aug 8, 2020
377580a
Fix a unit test when cudf is missing
Aug 8, 2020
2f3c532
Merge remote-tracking branch 'upstream/master' into add_rmm
hcho3 Aug 9, 2020
3df7cc3
Add RMM demos
hcho3 Aug 10, 2020
567fb33
Remove print()
hcho3 Aug 10, 2020
1e63c46
Use HostDeviceVector in GPU predictor
hcho3 Aug 11, 2020
ad216c5
Simplify pytest setup; use LocalCUDACluster fixture
hcho3 Aug 11, 2020
b4195cd
Address reviewers' comments
hcho3 Aug 11, 2020
7 changes: 7 additions & 0 deletions CMakeLists.txt
@@ -44,6 +44,7 @@ option(USE_NCCL "Build with NCCL to enable distributed GPU support." OFF)
option(BUILD_WITH_SHARED_NCCL "Build with shared NCCL library." OFF)
set(GPU_COMPUTE_VER "" CACHE STRING
"Semicolon separated list of compute versions to be built against, e.g. '35;61'")
option(USE_RMM "Build with RAPIDS Memory Manager (RMM)" OFF)
## Copied From dmlc
option(USE_HDFS "Build with HDFS support" OFF)
option(USE_AZURE "Build with AZURE support" OFF)
@@ -79,6 +80,9 @@ endif (R_LIB AND GOOGLE_TEST)
if (USE_AVX)
message(SEND_ERROR "The option 'USE_AVX' is deprecated as experimental AVX features have been removed from XGBoost.")
endif (USE_AVX)
if (USE_RMM AND NOT (USE_CUDA))
message(SEND_ERROR "`USE_RMM` must be enabled with `USE_CUDA` flag.")
endif (USE_RMM AND NOT (USE_CUDA))

#-- Sanitizer
if (USE_SANITIZER)
@@ -170,6 +174,9 @@ endif (R_LIB)
# Plugin
add_subdirectory(${xgboost_SOURCE_DIR}/plugin)

# 3rd-party libs
include(cmake/ExternalLibs.cmake)

#-- library
if (BUILD_STATIC_LIB)
add_library(xgboost STATIC)
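The new USE_RMM switch is only valid together with USE_CUDA, as the guard added above enforces. As a rough sketch (the build directory layout and the lack of other flags are illustrative, not taken from this PR), a local RMM-enabled configure would look like:

    # assumes RMM is discoverable, e.g. through an activated Conda environment
    mkdir build && cd build
    cmake .. -DUSE_CUDA=ON -DUSE_RMM=ON
    make -j$(nproc)
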
34 changes: 34 additions & 0 deletions Jenkinsfile
@@ -66,6 +66,7 @@ pipeline {
'build-cpu-non-omp': { BuildCPUNonOmp() },
'build-gpu-cuda10.0': { BuildCUDA(cuda_version: '10.0') },
'build-gpu-cuda10.1': { BuildCUDA(cuda_version: '10.1') },
'build-gpu-rmm-cuda10.2': { BuildCUDAWithRMM(cuda_version: '10.2') },
'build-jvm-packages': { BuildJVMPackages(spark_version: '2.4.3') },
'build-jvm-doc': { BuildJVMDoc() }
])
@@ -84,6 +85,7 @@ pipeline {
'test-python-mgpu-cuda10.1': { TestPythonGPU(cuda_version: '10.1', multi_gpu: true) },
'test-cpp-gpu': { TestCppGPU(cuda_version: '10.1') },
'test-cpp-mgpu': { TestCppGPU(cuda_version: '10.1', multi_gpu: true) },
'test-rmm-cpp-gpu': { TestCppGPUWithRMM(cuda_version: '10.2') },
'test-jvm-jdk8': { CrossTestJVMwithJDK(jdk_version: '8', spark_version: '2.4.3') },
'test-jvm-jdk11': { CrossTestJVMwithJDK(jdk_version: '11') },
'test-jvm-jdk12': { CrossTestJVMwithJDK(jdk_version: '12') },
@@ -262,6 +264,22 @@ def BuildCUDA(args) {
}
}

def BuildCUDAWithRMM(args) {
node('linux && cpu_build') {
unstash name: 'srcs'
echo "Build with CUDA ${args.cuda_version} and RMM"
def container_type = "rmm"
def docker_binary = "docker"
def docker_args = "--build-arg CUDA_VERSION=${args.cuda_version}"
sh """
${dockerRun} ${container_type} ${docker_binary} ${docker_args} tests/ci_build/build_via_cmake.sh --conda-env=rmm_test -DUSE_CUDA=ON -DUSE_RMM=ON
"""
echo 'Stashing C++ test executable (testxgboost)...'
stash name: 'xgboost_rmm_cpp_tests', includes: 'build/testxgboost'
deleteDir()
}
}

def BuildJVMPackages(args) {
node('linux && cpu') {
unstash name: 'srcs'
@@ -368,6 +386,22 @@ def TestCppGPU(args) {
}
}

def TestCppGPUWithRMM(args) {
node('linux && gpu') {
unstash name: 'xgboost_rmm_cpp_tests'
unstash name: 'srcs'
echo "Test C++, CUDA ${args.cuda_version} with RMM"
def container_type = "rmm"
def docker_binary = "nvidia-docker"
def docker_args = "--build-arg CUDA_VERSION=${args.cuda_version}"
echo "Using a single GPU"
sh """
${dockerRun} ${container_type} ${docker_binary} ${docker_args} bash -c "source activate rmm_test && build/testxgboost --gtest_filter=-*.MGPU_*:*DeathTest.*"
"""
deleteDir()
}
}

def CrossTestJVMwithJDK(args) {
node('linux && cpu') {
unstash name: 'xgboost4j_jar'
27 changes: 27 additions & 0 deletions cmake/ExternalLibs.cmake
@@ -0,0 +1,27 @@
# RMM
if (USE_RMM)
# Use Conda env if available
if(DEFINED ENV{CONDA_PREFIX})
set(CMAKE_PREFIX_PATH "$ENV{CONDA_PREFIX};${CMAKE_PREFIX_PATH}")
message(STATUS "Detected Conda environment, CMAKE_PREFIX_PATH set to: ${CMAKE_PREFIX_PATH}")
else()
message(STATUS "No Conda environment detected")
endif()

find_path(RMM_INCLUDE "rmm"
HINTS "$ENV{RMM_ROOT}/include")

find_library(RMM_LIBRARY "rmm"
HINTS "$ENV{RMM_ROOT}/lib" "$ENV{RMM_ROOT}/build")

if ((NOT RMM_LIBRARY) OR (NOT RMM_INCLUDE))
message(FATAL_ERROR "Could not locate RMM library")
endif ()

message(STATUS "RMM: RMM_LIBRARY set to ${RMM_LIBRARY}")
message(STATUS "RMM: RMM_INCLUDE set to ${RMM_INCLUDE}")

target_include_directories(objxgboost PUBLIC ${RMM_INCLUDE})
target_link_libraries(objxgboost PUBLIC ${RMM_LIBRARY} cuda)
target_compile_definitions(objxgboost PUBLIC -DXGBOOST_USE_RMM=1)
endif ()
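ExternalLibs.cmake prefers an active Conda environment (via CONDA_PREFIX) and otherwise relies on the RMM_ROOT hint to locate both the rmm headers and the library. A hedged sketch for a non-Conda setup, where /opt/rmm is only a placeholder install prefix:

    # point the RMM_ROOT hint at a local RMM install (placeholder path)
    export RMM_ROOT=/opt/rmm
    cmake .. -DUSE_CUDA=ON -DUSE_RMM=ON
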
26 changes: 22 additions & 4 deletions src/common/device_helpers.cuh
@@ -36,7 +36,12 @@

#ifdef XGBOOST_USE_NCCL
#include "nccl.h"
#endif
#endif // XGBOOST_USE_NCCL

#if defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1
#include "rmm/mr/device/default_memory_resource.hpp"
#include "rmm/mr/device/thrust_allocator_adaptor.hpp"
#endif // defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600 || defined(__clang__)

@@ -370,12 +375,21 @@ inline void DebugSyncDevice(std::string file="", int32_t line = -1) {
}

namespace detail {

#if defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1
template <typename T>
using XGBBaseDeviceAllocator = rmm::mr::thrust_allocator<T>;
#else // defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1
template <typename T>
using XGBBaseDeviceAllocator = thrust::device_malloc_allocator<T>;
#endif // defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1

/**
* \brief Default memory allocator, uses cudaMalloc/Free and logs allocations if verbose.
*/
template <class T>
struct XGBDefaultDeviceAllocatorImpl : thrust::device_malloc_allocator<T> {
using SuperT = thrust::device_malloc_allocator<T>;
struct XGBDefaultDeviceAllocatorImpl : XGBBaseDeviceAllocator<T> {
using SuperT = XGBBaseDeviceAllocator<T>;
using pointer = thrust::device_ptr<T>; // NOLINT
template<typename U>
struct rebind // NOLINT
@@ -391,10 +405,14 @@ struct XGBDefaultDeviceAllocatorImpl : thrust::device_malloc_allocator<T> {
GlobalMemoryLogger().RegisterDeallocation(ptr.get(), n * sizeof(T));
return SuperT::deallocate(ptr, n);
}
#if defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1
XGBDefaultDeviceAllocatorImpl() : SuperT(rmm::mr::get_default_resource(), cudaStream_t{0}) {}
#endif // defined(XGBOOST_USE_RMM) && XGBOOST_USE_RMM == 1
};

/**
* \brief Caching memory allocator, uses cub::CachingDeviceAllocator as a back-end and logs allocations if verbose. Does not initialise memory on construction.
* \brief Caching memory allocator, uses cub::CachingDeviceAllocator as a back-end and logs
* allocations if verbose. Does not initialise memory on construction.
*/
template <class T>
struct XGBCachingDeviceAllocatorImpl : thrust::device_malloc_allocator<T> {
39 changes: 39 additions & 0 deletions tests/ci_build/Dockerfile.rmm
@@ -0,0 +1,39 @@
ARG CUDA_VERSION
FROM nvidia/cuda:$CUDA_VERSION-devel-ubuntu18.04

# Environment
ENV DEBIAN_FRONTEND noninteractive
SHELL ["/bin/bash", "-c"] # Use Bash as shell

# Install all basic requirements
RUN \
apt-get update && \
apt-get install -y wget unzip bzip2 libgomp1 build-essential ninja-build git && \
# Python
wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3.sh -b -p /opt/python && \
# CMake
wget -nv -nc https://cmake.org/files/v3.13/cmake-3.13.0-Linux-x86_64.sh --no-check-certificate && \
bash cmake-3.13.0-Linux-x86_64.sh --skip-license --prefix=/usr

ENV PATH=/opt/python/bin:$PATH

# Create new Conda environment with RMM
RUN \
conda create -n rmm_test -c nvidia -c rapidsai -c conda-forge -c defaults \
python=3.7 rmm=0.14 cudatoolkit=$CUDA_VERSION

ENV GOSU_VERSION 1.10

# Install lightweight sudo (not bound to TTY)
RUN set -ex; \
wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
chmod +x /usr/local/bin/gosu && \
gosu nobody true

# Default entry-point to use if running locally
# It will preserve attributes of created files
COPY entrypoint.sh /scripts/

WORKDIR /workspace
ENTRYPOINT ["/scripts/entrypoint.sh"]
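The Jenkins stages build this image through the dockerRun helper with --build-arg CUDA_VERSION; outside Jenkins an equivalent manual build might look like the sketch below (the image tag is illustrative, and the build context is assumed to be tests/ci_build so that entrypoint.sh can be copied):

    docker build -f tests/ci_build/Dockerfile.rmm \
        --build-arg CUDA_VERSION=10.2 \
        -t xgboost-ci-rmm:10.2 \
        tests/ci_build
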
13 changes: 12 additions & 1 deletion tests/ci_build/build_via_cmake.sh
@@ -1,10 +1,21 @@
#!/usr/bin/env bash
set -e

if [[ "$1" == --conda-env=* ]]
then
conda_env=$(echo "$1" | sed 's/^--conda-env=//g' -)
echo "Activating Conda environment ${conda_env}"
shift 1
cmake_args="$@"
source activate ${conda_env}
else
cmake_args="$@"
fi

rm -rf build
mkdir build
cd build
cmake .. "$@" -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON -DCMAKE_VERBOSE_MAKEFILE=ON
cmake .. ${cmake_args} -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON -DCMAKE_VERBOSE_MAKEFILE=ON
make clean
make -j$(nproc)
cd ..
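With the optional --conda-env flag, the script activates the named Conda environment before configuring and passes the remaining arguments through to CMake. This mirrors the invocation used by the BuildCUDAWithRMM stage above (run inside the rmm container):

    tests/ci_build/build_via_cmake.sh --conda-env=rmm_test -DUSE_CUDA=ON -DUSE_RMM=ON
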
2 changes: 2 additions & 0 deletions tests/cpp/CMakeLists.txt
@@ -29,6 +29,8 @@ if (USE_CUDA)
$<$<COMPILE_LANGUAGE:CUDA>:${GEN_CODE}>)
target_compile_definitions(testxgboost
PRIVATE -DXGBOOST_USE_CUDA=1)
find_package(CUDA)
target_include_directories(testxgboost PRIVATE ${CUDA_INCLUDE_DIRS})
set_target_properties(testxgboost PROPERTIES
CUDA_SEPARABLE_COMPILATION OFF)

82 changes: 65 additions & 17 deletions tests/cpp/common/test_span.cc
@@ -97,11 +97,6 @@ TEST(Span, FromPtrLen) {
}
}

{
auto lazy = [=]() {Span<float const, 16> tmp (arr, 5);};
EXPECT_DEATH(lazy(), "\\[xgboost\\] Condition .* failed.\n");
}

// dynamic extent
{
Span<float, 16> s (arr, 16);
@@ -122,6 +117,15 @@
}
}

TEST(SpanDeathTest, FromPtrLen) {
float arr[16];
InitializeRange(arr, arr+16);
{
auto lazy = [=]() {Span<float const, 16> tmp (arr, 5);};
EXPECT_DEATH(lazy(), "\\[xgboost\\] Condition .* failed.\n");
}
}

TEST(Span, FromFirstLast) {
float arr[16];
InitializeRange(arr, arr+16);
@@ -285,7 +289,13 @@ TEST(Span, ElementAccess) {
ASSERT_EQ(i, arr[j]);
++j;
}
}

TEST(SpanDeathTest, ElementAccess) {
float arr[16];
InitializeRange(arr, arr + 16);

Span<float> s (arr);
EXPECT_DEATH(s[16], "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s[-1], "\\[xgboost\\] Condition .* failed.\n");

@@ -312,7 +322,9 @@ TEST(Span, FrontBack) {
ASSERT_EQ(s.front(), 0);
ASSERT_EQ(s.back(), 3);
}
}

TEST(SpanDeathTest, FrontBack) {
{
Span<float, 0> s;
EXPECT_DEATH(s.front(), "\\[xgboost\\] Condition .* failed.\n");
@@ -340,10 +352,6 @@ TEST(Span, FirstLast) {
for (size_t i = 0; i < first.size(); ++i) {
ASSERT_EQ(first[i], arr[i]);
}
auto constexpr kOne = static_cast<Span<float, 4>::index_type>(-1);
EXPECT_DEATH(s.first<kOne>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first<17>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first<32>(), "\\[xgboost\\] Condition .* failed.\n");
}

{
Expand All @@ -359,10 +367,6 @@ TEST(Span, FirstLast) {
for (size_t i = 0; i < last.size(); ++i) {
ASSERT_EQ(last[i], arr[i+12]);
}
auto constexpr kOne = static_cast<Span<float, 4>::index_type>(-1);
EXPECT_DEATH(s.last<kOne>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last<17>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last<32>(), "\\[xgboost\\] Condition .* failed.\n");
}

// dynamic extent
@@ -379,10 +383,6 @@
ASSERT_EQ(first[i], s[i]);
}

EXPECT_DEATH(s.first(-1), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first(17), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first(32), "\\[xgboost\\] Condition .* failed.\n");

delete [] arr;
}

@@ -399,6 +399,50 @@ TEST(Span, FirstLast) {
ASSERT_EQ(s[12 + i], last[i]);
}

delete [] arr;
}
}

TEST(SpanDeathTest, FirstLast) {
// static extent
{
float arr[16];
InitializeRange(arr, arr + 16);

Span<float> s (arr);
auto constexpr kOne = static_cast<Span<float, 4>::index_type>(-1);
EXPECT_DEATH(s.first<kOne>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first<17>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first<32>(), "\\[xgboost\\] Condition .* failed.\n");
}

{
float arr[16];
InitializeRange(arr, arr + 16);

Span<float> s (arr);
auto constexpr kOne = static_cast<Span<float, 4>::index_type>(-1);
EXPECT_DEATH(s.last<kOne>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last<17>(), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last<32>(), "\\[xgboost\\] Condition .* failed.\n");
}

// dynamic extent
{
float *arr = new float[16];
InitializeRange(arr, arr + 16);
Span<float> s (arr, 16);
EXPECT_DEATH(s.first(-1), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first(17), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.first(32), "\\[xgboost\\] Condition .* failed.\n");

delete [] arr;
}

{
float *arr = new float[16];
InitializeRange(arr, arr + 16);
Span<float> s (arr, 16);
EXPECT_DEATH(s.last(-1), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last(17), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s.last(32), "\\[xgboost\\] Condition .* failed.\n");
@@ -420,7 +464,11 @@ TEST(Span, Subspan) {
auto s4 = s1.subspan(2, dynamic_extent);
ASSERT_EQ(s1.data() + 2, s4.data());
ASSERT_EQ(s4.size(), s1.size() - 2);
}

TEST(SpanDeathTest, Subspan) {
int arr[16] {0};
Span<int> s1 (arr);
EXPECT_DEATH(s1.subspan(-1, 0), "\\[xgboost\\] Condition .* failed.\n");
EXPECT_DEATH(s1.subspan(17, 0), "\\[xgboost\\] Condition .* failed.\n");

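Moving the EXPECT_DEATH() assertions into dedicated *DeathTest suites lets CI exclude them selectively; the RMM C++ test stage above skips both multi-GPU tests and all death tests with a gtest filter:

    # as run inside the rmm container by TestCppGPUWithRMM
    build/testxgboost --gtest_filter=-*.MGPU_*:*DeathTest.*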