Release xgboost 1.2 with GPU support #134

edwardjkim · 2020-09-10T21:11:52Z

Description of changes:

This CR upgrades XGBoost to 1.2 and enables GPU support.

When the image is built with xgboost 1.1, many integration tests fail with an xgboost error on feature mismatch, e.g.,

xgboost.core.XGBoostError: [16:47:33] /workspace/src/learner.cc:1062: Check failed: learner_model_param_.num_feature == p_fmat->Info().num_col_ (9 vs. 8) : Number of columns does not match number of features in booster.

This is due to a bug in 1.1 (GitHub issues: "xgboost 1.1.1 pred failed, while 0.90 pred success" dmlc/xgboost#5841 and "Regression demo is broken" dmlc/xgboost#5709). It has been fixed in 1.2 ("Fix prediction heuristic" dmlc/xgboost#5955). From the 1.2 release notes:

Restore capability to run prediction when the test input has fewer features than the training data (#5955). This capability is necessary to support predicting with LIBSVM inputs. The previous release (1.1) had broken this capability, so we restore it in this version with better tests.

Since it doesn't sound like upstream XGBoost will backport this fix to 1.1, we release 1.2 in this CR.
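
To illustrate why LIBSVM inputs can end up with fewer columns at prediction time, here is a minimal sketch. It is not XGBoost's actual parser: it assumes 0-based feature indices and that the column count is inferred as the largest index seen plus one. The helper `infer_num_columns` and the sample rows are hypothetical.

```python
def infer_num_columns(libsvm_lines):
    """Infer the column count as (largest feature index + 1).

    Assumption: indices are 0-based. LIBSVM rows list only non-zero
    features, so a file that never mentions the last feature looks narrower.
    """
    max_index = -1
    for line in libsvm_lines:
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the label
            index = int(token.split(":", 1)[0])
            max_index = max(max_index, index)
    return max_index + 1


# Hypothetical data: the training rows mention feature 8, the test rows do not.
train_rows = ["1 0:0.5 3:1.2 8:0.7", "0 1:0.1 8:2.0"]
test_rows = ["1 0:0.4 7:0.9", "0 2:1.5 5:0.3"]

print(infer_num_columns(train_rows), "vs.", infer_num_columns(test_rows))
```

With these rows the inferred widths differ by one, which is the kind of mismatch the error message above reports.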

New in XGBoost 1.1 & 1.2
- The silent parameter was removed in 1.1 in favor of verbosity.
- A new objective, survival:aft, has been added to support survival analysis: https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html
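
For callers that still pass `silent`, a small migration helper along these lines may be useful. This is a sketch only: `migrate_silent` is a hypothetical name, and the mapping (silent on gives `verbosity=0`, silent off gives `verbosity=1`) is an assumption about the intended behavior, not something this CR implements.

```python
def migrate_silent(params):
    """Return a copy of params with the removed 'silent' key translated.

    Assumed mapping: silent=1/true -> verbosity=0 (no messages),
    anything else -> verbosity=1 (warnings).
    An explicit 'verbosity' already present in params wins.
    """
    migrated = dict(params)
    if "silent" in migrated:
        silent = str(migrated.pop("silent")).lower()
        migrated.setdefault("verbosity", 0 if silent in ("1", "true") else 1)
    return migrated


old_params = {"max_depth": 3, "silent": 1}
print(migrate_silent(old_params))
```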

MLIO needs to be upgraded. The latest version of MLIO is v0.6. However, the conda packages for v0.5 and v0.6 add ~3 GB uncompressed (~1 GB compressed) to the Docker image, mainly because of a long list of image-reader dependencies newly added in v0.5 (e.g., ffmpeg, opencv), and this increases training time by ~1 minute. The Dockerfile is therefore rewritten to install mlio from source. The final image size is 1326.14 MB (compressed) with XGBoost 1.2, the MLIO upgrade, and GPU support, compared to 1225.65 MB (compressed) for 1.0-1-cpu-py3, an increase of about 100 MB.

GPU support
- We could install the full CUDA Toolkit, but that would increase the image size by around 700 MB (compressed). The proposed base image, nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION}, is a small image that contains a minimal set of CUDA runtime files.
- Customers will have to specify the hyperparameter tree_method: gpu_hist (and use a GPU instance type, e.g., p3.2xlarge) to enable GPU training.
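
On the customer side, opting in could look roughly like the sketch below. The helper `with_tree_method` and the list of GPU instance families are assumptions made for illustration; the only thing taken from this CR is that `tree_method` must be set to `gpu_hist` on a GPU instance.

```python
# Assumption: instance families treated as GPU-backed for this example.
GPU_FAMILIES = ("p2", "p3", "g4dn")


def with_tree_method(hyperparameters, instance_type):
    """Return a copy of the hyperparameters, adding gpu_hist on GPU instances.

    instance_type is expected to look like 'ml.p3.2xlarge' or 'p3.2xlarge';
    the family is the component just before the size.
    """
    parts = instance_type.split(".")
    family = parts[-2] if len(parts) >= 2 else ""
    result = dict(hyperparameters)
    if family in GPU_FAMILIES:
        result["tree_method"] = "gpu_hist"
    return result


print(with_tree_method({"max_depth": 5}, "ml.p3.2xlarge"))
print(with_tree_method({"max_depth": 5}, "ml.m5.xlarge"))
```

On a CPU instance the hyperparameters are returned unchanged, so the same training script can be used for both cases.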

With GPU support in the same image as the CPU image, it is no longer necessary to append the architecture to the image tag. Since we dropped Python 2 support, the -cpu-py3 in the framework version is also redundant, and this CR proposes to drop the -<architecture>-<python version> suffix. (However, we will keep the old tag format in the deployment pipelines for backwards compatibility. That is, we will tag the same image with two tags: 1.2-1 and 1.2-1-cpu-py3.)
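
The dual-tagging scheme can be sketched as follows. `image_tags` is a hypothetical helper, not part of the deployment pipeline, and only shows how the new short tag and the legacy tag relate.

```python
def image_tags(framework_version, legacy_suffix="cpu-py3"):
    """Return the new short tag followed by the legacy-format tag."""
    return [framework_version, f"{framework_version}-{legacy_suffix}"]


# Both tags are meant to point at the same image.
print(image_tags("1.2-1"))
```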

Testing: tox, integration tests

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

balajitummala · 2020-09-15T07:08:11Z

docker/1.2-1/base/Dockerfile.cpu

+# Python won’t try to write .pyc or .pyo files on the import of source modules
+# Force stdin, stdout and stderr to be totally unbuffered. Good for logging
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1


Will this have any impact on performance? Minor, maybe.

Interesting point. Those environment variables were kept from previous versions, e.g., https://github.com/aws/sagemaker-xgboost-container/blob/master/docker/1.0-1/base/Dockerfile.cpu#L32, and I didn't think to remove them.
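
For reference, a quick way to check what the two variables do is to start a child interpreter with and without them. With PYTHONDONTWRITEBYTECODE set, Python does not write .pyc caches, so modules without an existing cache are recompiled on each start; with PYTHONUNBUFFERED set, the standard streams are unbuffered. The sketch below only reports the resulting interpreter flags and does not measure any timing; `flags_in_child` is a hypothetical helper.

```python
import os
import subprocess
import sys

VARS = ("PYTHONDONTWRITEBYTECODE", "PYTHONUNBUFFERED")


def flags_in_child(extra_env):
    """Run a fresh interpreter and return [dont_write_bytecode, unbuffered]."""
    # Start from the current environment with both variables removed,
    # then apply only what the caller asks for.
    env = {k: v for k, v in os.environ.items() if k not in VARS}
    env.update(extra_env)
    code = "import sys; print(sys.dont_write_bytecode, sys.flags.unbuffered)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


print("with variables:   ", flags_in_child({name: "1" for name in VARS}))
print("without variables:", flags_in_child({}))
```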

balajitummala

Left a minor comment

Release xgboost 1.2 with GPU support

6294799

edwardjkim force-pushed the 1.2-1-development branch from a85c226 to 6294799 on September 10, 2020 21:12

edwardjkim added 3 commits September 10, 2020 21:49

Add back --universal in python setup.py bdist_wheel

e8694a8

Specify python version in tox.ini

d932c38

Revert README instructions

231ba87

edwardjkim closed this Sep 14, 2020

edwardjkim reopened this Sep 14, 2020

edwardjkim mentioned this pull request Sep 14, 2020

Release xgboost 1.2 with GPU support #133

Closed

edwardjkim requested review from ericangelokim, balajitummala and aws-patlin September 14, 2020 18:18

balajitummala reviewed Sep 15, 2020

balajitummala approved these changes Sep 15, 2020

edwardjkim merged commit aba19e9 into aws:master Sep 15, 2020

wiltonwu mentioned this pull request Oct 22, 2020

Install mlio dependency from source aws/sagemaker-scikit-learn-container#61

Merged
