[RFC] XGBoost 1.0.0 Release Candidate #5253

Open
hcho3 opened this issue Jan 31, 2020 · 24 comments

Collaborator

@hcho3 hcho3 commented Jan 31, 2020

The last release (0.90) came out on May 20, 2019, and after 8 months of effort, we proudly announce the 1.0.0 release. In the next two weeks, the community is invited to try out the release candidate (RC).

Feedback period: until the end of February 17, 2020 (extended from February 14, 2020). No new features will be added to the 1.0.0 release; only critical bug fixes will be added.

@dmlc/xgboost-committer

Now available

  • Python package. RC2 available from PyPI. Run
pip3 install xgboost==1.0.0rc2
  • R package. RC2 available from the Releases section. Download the tarball file xgboost_1.0.0.1.tar.gz and run
R CMD INSTALL xgboost_1.0.0.1.tar.gz
  • JVM packages. RC2 available from the Releases section. Download the JAR files xgboost4j_2.12-1.0.0-RC2.jar and xgboost4j-spark_2.12-1.0.0-RC2.jar and run
mvn install:install-file -Dfile=./xgboost4j_2.12-1.0.0-RC2.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j_2.12 -Dversion=1.0.0-RC2 -Dpackaging=jar
mvn install:install-file -Dfile=./xgboost4j-spark_2.12-1.0.0-RC2.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j-spark_2.12 -Dversion=1.0.0-RC2 -Dpackaging=jar

to install the JARs into your local Maven repository. Now you should be able to add XGBoost4J and XGBoost4J-Spark as Maven dependencies:

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>1.0.0-RC2</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>1.0.0-RC2</version>
</dependency>

TODOs

  • Create a new branch release_1.0.0.
  • Create Python wheels and upload to the Releases section and PyPI (use pre-release mechanism)
  • Create JAR files for the JVM packages and upload to the Releases section
  • Create a tarball for the R package and upload to the Releases section
  • Write a summary of 1.0.0 release

Outstanding patches that should make it into the 1.0.0 release:

Merged after RC1:

Merged after RC2

Known limitation

  • When training parameter reg_lambda is set to zero, some leaf nodes may be assigned a NaN value. (See discussion) For now, please set reg_lambda to a nonzero value.
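The workaround above can be sketched as a small guard that runs on a parameter dictionary before training. This is only an illustration: the helper name `check_reg_lambda` is hypothetical and not part of XGBoost, and it assumes the L2 regularization term is passed either as `reg_lambda` or under its alias `lambda`.

```python
def check_reg_lambda(params):
    """Return warning messages if L2 regularization is set to zero.

    Hypothetical helper, not part of XGBoost. It only inspects
    `reg_lambda` and its alias `lambda` in a plain parameter dict.
    """
    warnings = []
    for key in ("reg_lambda", "lambda"):
        if key in params and float(params[key]) == 0.0:
            warnings.append(
                f"{key}=0 may produce NaN leaf values; "
                "set it to a nonzero value"
            )
    return warnings


if __name__ == "__main__":
    # A zero value is flagged; a nonzero value passes silently.
    print(check_reg_lambda({"max_depth": 3, "reg_lambda": 0}))
    print(check_reg_lambda({"max_depth": 3, "reg_lambda": 1.0}))
```

An absent key is not flagged, since the library default for this parameter is nonzero.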
@hcho3 hcho3 pinned this issue Jan 31, 2020
Collaborator Author

@hcho3 hcho3 commented Jan 31, 2020

@RAMitchell @trivialfis For some reason, the Linux binary wheel is now 208 MB, exceeding the 200 MB size limit. Let us find ways to reduce it. The last stable release was only 142.8 MB.

Update. libxgboost.so is copied twice into the wheel:

$ zipinfo -1 xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl | grep libxgboost.so
xgboost/lib/libxgboost.so
xgboost-1.0.0rc1.data/data/xgboost/libxgboost.so

For now, I am removing one manually, but ideally CI should automatically de-duplicate the .so file.
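A check like the `zipinfo` command above could be automated in CI. The following is a minimal sketch, not the project's actual CI code: the function name `find_duplicate_shared_libs` and the list of suffixes are assumptions chosen for illustration. It uses only the standard-library `zipfile` module, since a wheel is a zip archive.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_shared_libs(wheel, suffixes=(".so", ".dll", ".dylib")):
    """Map each shared-library file name that occurs more than once to its paths.

    `wheel` may be a path or a binary file-like object. Only files whose
    names end in one of `suffixes` are considered, so repeated names such
    as __init__.py are ignored.
    """
    by_name = defaultdict(list)
    with zipfile.ZipFile(wheel) as archive:
        for member in archive.namelist():
            name = PurePosixPath(member).name
            if name.endswith(tuple(suffixes)):
                by_name[name].append(member)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

A CI step could fail the build when the returned dictionary is non-empty.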

Member

@trivialfis trivialfis commented Jan 31, 2020

We can remove the duplication. I consider it a bug.

Collaborator Author

@hcho3 hcho3 commented Jan 31, 2020

@trivialfis Should I try to fix CI so that libxgboost.so is not duplicated? I modified (by hand) the whl file and uploaded the modified version to PyPI: https://pypi.org/project/xgboost/1.0.0rc1/#files

Member

@trivialfis trivialfis commented Jan 31, 2020

For this release we might have to make do with manually removing it. But I will refactor the setup script so it can be more friendly.

Collaborator Author

@hcho3 hcho3 commented Jan 31, 2020

The R package is producing a warning about the use of std::cout in the codebase:

* checking compiled code ... NOTE
File ‘xgboost/libs/xgboost.so’:
  Found ‘_ZSt4cout’, possibly from ‘std::cout’ (C++)
    Object: ‘./amalgamation/xgboost-all0.o’

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

CRAN may not accept XGBoost if this warning persists.

There are two places where std::cout is used:

  • src/common/timer.cc: it should be sufficient to replace std::cout with LOG(CONSOLE).
  • src/common/observer.h: since the Observer is a debugging feature, I suggest we disable it for the R package.
Member

@trivialfis trivialfis commented Jan 31, 2020

Agreed! Just replace/disable it.

Collaborator Author

@hcho3 hcho3 commented Jan 31, 2020

@trams @CodingCat This is my first time creating JAR artifacts for XGBoost4J. I used a CentOS 6 Docker image to compile the native lib. I'd love to get your feedback and learn the best practices for making releases in the Java world.

Contributor

@ankane ankane commented Feb 1, 2020

This is great! Happy to report it works well with:

(both commits in branches)

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

@ankane That's good to know. Thanks! One tidbit: XGBoost now requires CMake 3.16+ for the Mac target, so the extra CMake flags are no longer needed. We should be able to remove the following lines:

      args << "-DOpenMP_C_FLAGS=\"-Xpreprocessor -fopenmp -I#{libomp.opt_include}\""
      args << "-DOpenMP_C_LIB_NAMES=omp"
      args << "-DOpenMP_CXX_FLAGS=\"-Xpreprocessor -fopenmp -I#{libomp.opt_include}\""
      args << "-DOpenMP_CXX_LIB_NAMES=omp"
      args << "-DOpenMP_omp_LIBRARY=#{libomp.opt_lib}/libomp.dylib"

https://github.com/ankane/homebrew-core/blob/1f39811c129a2a36d391368d5791fdff257b5f94/Formula/xgboost.rb#L22-L26

That is, cmake .. will just work, as long as CMake 3.16+ is installed.

Contributor

@ankane ankane commented Feb 1, 2020

It worked without the flags outside of Homebrew (https://github.com/ankane/ml-builds/compare/xgboost-1-0), but needed them inside the Homebrew environment for some reason. I believe Homebrew modifies some of the build flags in their compiler shim.

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

@ankane Ah I see. Glad to hear that we finally have OpenMP-enabled XGBoost in Homebrew!

Member

@trivialfis trivialfis commented Feb 1, 2020

@hcho3 Out of curiosity:

xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl

Does py2 in the file name imply we support Python2?

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

@trivialfis No, it does not, since we have this line

python_requires='>=3.5',

However, let me rename the wheel to use py3 exclusively.
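To make the distinction concrete: the `py2.py3` part is a compatibility tag encoded in the wheel file name, separate from the `python_requires` metadata. Below is a minimal sketch of reading that tag, assuming the standard wheel naming layout `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The function name `wheel_python_tags` is hypothetical.

```python
def wheel_python_tags(filename):
    """Return the Python tags encoded in a wheel file name.

    Assumes the standard layout
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    in which the Python tag is the third field from the end.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    fields = filename[: -len(suffix)].split("-")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return fields[-3].split(".")


if __name__ == "__main__":
    print(wheel_python_tags("xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl"))
```

A release script could use such a check to confirm that a renamed wheel carries only the `py3` tag.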

Contributor

@trams trams commented Feb 1, 2020

@akimboyko, do you guys want to take a look at this RC?

Contributor

@trams trams commented Feb 1, 2020

@hcho3 You did it right!
Building JARs is like building wheels. It is not the Java part that is problematic but rather the native libxgboost.so library. What you generally want is for this library to depend on the oldest possible version of libc. The idea is that libc is backward compatible at the binary level, so anything compiled against libc version X will work with any version Y as long as Y >= X.

What you do not want to do is build this native library on a new Ubuntu (let's say 19.10) and start distributed training on a cluster with an older Ubuntu (for example, 18.04 LTS). The dynamic linker won't find the newer libc symbols that your libxgboost.so references. I've done this at least once by accident :)

You can use the nm command to see all undefined symbols in a .so file.
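As a rough illustration of the "Y >= X" rule, the sketch below scans a symbol dump for `GLIBC_x.y` version suffixes and reports the highest one, which is the minimum libc version the library needs at run time. It assumes the text comes from a tool that prints symbol versions (for example `objdump -T`; not every `nm` build shows them). The function name `max_glibc_version` and the sample input are made up for this example.

```python
import re

# Matches versioned references such as "memcpy@GLIBC_2.14".
# "GLIBCXX_..." (libstdc++) and "GLIBC_PRIVATE" do not match.
_GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def max_glibc_version(symbol_dump):
    """Return the highest GLIBC version found in the text as a tuple, or None."""
    versions = {
        tuple(int(part) for part in match.split("."))
        for match in _GLIBC_RE.findall(symbol_dump)
    }
    return max(versions) if versions else None


if __name__ == "__main__":
    # Made-up sample lines in the style of a versioned symbol listing.
    sample = """
                     U memcpy@GLIBC_2.14
                     U pthread_create@GLIBC_2.2.5
    """
    print(max_glibc_version(sample))
```

Comparing the result with the libc version on the oldest target machine shows whether the dynamic linker would be able to resolve every symbol.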

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

Adding a known limitation about needing a nonzero reg_lambda parameter. cc @trivialfis

Member

@trivialfis trivialfis commented Feb 1, 2020

@hcho3 I don't think that's a very serious problem, at least not among machine learning libraries. Just state that if NaN is encountered, try adjusting lambda.

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

"Known limitations" will be part of the Release Note.

Member

@trivialfis trivialfis commented Feb 1, 2020

Got it. Thanks!

Collaborator Author

@hcho3 hcho3 commented Feb 1, 2020

@trams That's good to hear. In the future, I may want to spend more time on the distributed portion of XGBoost.

Member

@trivialfis trivialfis commented Feb 2, 2020

@hcho3 Can we add Python 3.8 support by adding this to setup.py?

'Programming Language :: Python :: 3.8'

Collaborator Author

@hcho3 hcho3 commented Feb 14, 2020

It looks like all blockers are resolved. I will push out the 1.0 release by the end of this week. I am putting up RC2 tonight to allow users to try it. #5281 made a non-trivial change to the serialization logic, so I am extending the feedback period to February 17, 2020 to ensure that nothing is broken.

Collaborator Author

@hcho3 hcho3 commented Feb 14, 2020

RC2 is now available.

Collaborator Author

@hcho3 hcho3 commented Feb 18, 2020

As promised, I will now commence my work on the 1.0.0 release. I am currently going through the 311 commits that have been made since 0.90 and summarizing what they are about.
