Commits on Jan 21, 2017
  1. Disallow multiple roots for tree_method=hist (#1979)

    As discussed in issue #1978, tree_method=hist ignores the parameter
    param.num_roots; it simply assumes that the tree has only one root. In
    particular, when InitData() method initializes row_set_collection_, it simply
    assigns all rows to node 0, the value that's hard-coded.
    
    For now, the updater will simply fail when num_roots exceeds 1. I will revise
    the updater soon to support multiple roots.
    hcho3 committed with tqchen Jan 21, 2017
  2. adding sample weights for XGBRegressor (was this forgotten?) (#1874)

    vatsan committed with tqchen Jan 21, 2017
  3. [R] various R code maintenance (#1964)

    * [R] xgb.save must work when handle is nil but raw exists
    
    * [R] print.xgb.Booster should still print other info when handle is nil
    
    * [R] rename internal function xgb.Booster to xgb.Booster.handle to make its intent clear
    
    * [R] rename xgb.Booster.check to xgb.Booster.complete and make it visible; more docs
    
    * [R] storing evaluation_log should depend only on watchlist, not on verbose
    
    * [R] reduce the excessive chattiness of unit tests
    
    * [R] only disable some tests in windows when it's not 64-bit
    
    * [R] clean-up xgb.DMatrix
    
    * [R] test xgb.DMatrix loading from libsvm text file
    
    * [R] store feature_names in xgb.Booster, use them from utility functions
    
    * [R] remove non-functional co-occurrence computation from xgb.importance
    
    * [R] verbose=0 is enough without a callback
    
    * [R] added forgotten xgb.Booster.complete.Rd; cran check fixes
    
    * [R] update installation instructions
    khotilov committed with tqchen Jan 21, 2017
Commits on Jan 18, 2017
  1. fix ylim with max_num_features in python plot_importance (#1974)

    wxchan committed with tqchen Jan 18, 2017
Commits on Jan 16, 2017
  1. added the max_features parameter to the plot_importance function. (#1963)

    * added the max_features parameter to the plot_importance function.
    
    * renamed max_features parameter to max_num_features for better understanding
    
    * removed unwanted character in docstring
    naileakim committed with tqchen Jan 16, 2017
Commits on Jan 13, 2017
  1. Rename parameter in fast_hist to disambiguate (#1962)

    hcho3 committed with tqchen Jan 13, 2017
  2. Histogram Optimized Tree Grower (#1940)

    * Support histogram-based algorithm + multiple tree growing strategy
    
    * Add a brand new updater to support histogram-based algorithm, which buckets
      continuous features into discrete bins to speed up training. To use it, set
      `tree_method = fast_hist` in the configuration.
    * Support multiple tree growing strategies. For now, two policies are supported:
      * `grow_policy=depthwise` (default):  favor splitting at nodes closest to the
        root, i.e. grow depth-wise.
      * `grow_policy=lossguide`: favor splitting at nodes with highest loss change
    * Improve single-threaded performance
      * Unroll critical loops
      * Introduce specialized code for dense data (i.e. no missing values)
    * Additional training parameters: `max_leaves`, `max_bin`, `grow_policy`, `verbose`
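
The bucketing idea described above can be sketched roughly as follows. This is a minimal illustration and not the updater's actual code: the function names, the half-open bin convention, and the sample numbers are all assumptions made for the example.

```python
from bisect import bisect_right

def bin_index(edges, value):
    """Map a value to a bin, where bin i covers [edges[i], edges[i + 1]).

    Assumption for this sketch: edges is ascending and includes both the
    lower and the upper bound; out-of-range values are clamped.
    """
    n_bins = len(edges) - 1
    i = bisect_right(edges, value) - 1
    return max(0, min(i, n_bins - 1))

def gradient_histogram(values, gradients, edges):
    """Accumulate one gradient sum per bin instead of keeping every row."""
    hist = [0.0] * (len(edges) - 1)
    for v, g in zip(values, gradients):
        hist[bin_index(edges, v)] += g
    return hist

# Hypothetical data: five rows of one continuous feature, three bins.
edges = [0.0, 0.25, 0.5, 1.0]
values = [0.1, 0.4, 0.35, 0.8, 0.95]
gradients = [1.0, -0.5, 0.25, 2.0, -1.0]
hist = gradient_histogram(values, gradients, edges)
```

Once the histogram is built, candidate splits only need to be evaluated at bin boundaries, which is where the speed-up would come from.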
    
    * Adding a small test for hist method
    
    * Fix memory error in row_set.h
    
    When std::vector is resized, a reference to one of its elements may become
    stale. Any such reference must be updated as well.
    
    * Resolve cross-platform compilation issues
    
    * Versions of g++ older than 4.8 lack support for a few C++11 features, e.g.
      alignas(*) and new initializer syntax. To support g++ 4.6, use pre-C++11
      initializer and remove alignas(*).
    * Versions of MSVC older than 2015 do not support alignas(*). To support
      MSVC 2012, remove alignas(*).
    * For g++ 4.8 and newer, alignas(*) is enabled for performance benefits.
    * Some old compilers (MSVC 2012, g++ 4.6) do not support template aliases
      (which use `using` to declare type aliases). So always use `typedef`.
    
    * Fix a host of CI issues
    
    * Remove dependency for libz on osx
    * Fix heading for hist_util
    * Fix minor style issues
    * Add missing #include
    * Remove extraneous logging
    
    * Enable tree_method=hist in R
    
    * Renaming HistMaker to GHistBuilder to avoid confusion
    
    * Fix R integration
    
    * Respond to style comments
    
    * Consistent tie-breaking for priority queue using timestamps
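
One way to get consistent tie-breaking in a priority queue is to store an insertion timestamp next to each priority. The sketch below is an assumption-laden illustration: the class name `ExpandQueue` is hypothetical, and the choice that the earlier entry wins a tie is a guess, since the commit message does not say which direction is used.

```python
import heapq
from itertools import count

class ExpandQueue:
    """Pops the node with the highest loss change; ties go to the earliest push."""

    def __init__(self):
        self._heap = []
        self._clock = count()  # monotonically increasing timestamp

    def push(self, node_id, loss_change):
        # heapq is a min-heap, so negate the loss change to pop the largest first.
        # The timestamp is the second key, so equal loss changes pop in push order.
        heapq.heappush(self._heap, (-loss_change, next(self._clock), node_id))

    def pop(self):
        _, _, node_id = heapq.heappop(self._heap)
        return node_id

    def __len__(self):
        return len(self._heap)

queue = ExpandQueue()
for node_id, loss_change in [(3, 0.5), (1, 0.9), (2, 0.5), (4, 0.9)]:
    queue.push(node_id, loss_change)
order = [queue.pop() for _ in range(len(queue))]
```

Without the timestamp, the order among equal loss changes would depend on how the heap happens to compare the remaining tuple fields, which can differ between runs or platforms in other implementations.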
    
    * Last-minute style fixes
    
    * Fix issuecomment-271977647
    
    The way we quantize data is broken. The agaricus data consists of all
    categorical values. When NAs are converted into 0's,
    `HistCutMatrix::Init` assigns both 0's and 1's to the same single bin.
    
    Why? gmat contains only the smallest value (0) and an upper bound (2), which is twice
    the maximum value (1). Add the maximum value itself to gmat to fix the issue.
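
The failure mode described above can be reproduced with a small sketch. The bin convention used here (bin i covers the half-open interval between consecutive entries) is an assumption chosen to match the description; it is not taken from the `HistCutMatrix` code.

```python
from bisect import bisect_right

def bin_index(edges, value):
    """Bin i covers [edges[i], edges[i + 1]); assumed convention for this sketch."""
    return bisect_right(edges, value) - 1

broken_edges = [0, 2]     # smallest value and an upper bound only
fixed_edges = [0, 1, 2]   # the maximum value (1) added as its own entry

broken_bins = [bin_index(broken_edges, v) for v in (0, 1)]
fixed_bins = [bin_index(fixed_edges, v) for v in (0, 1)]
```

With only two entries there is a single interval, so 0 and 1 cannot be told apart; adding the maximum value creates a second interval.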
    
    * Fix issuecomment-272266358
    
    * Remove padding from cut values for the continuous case
    * For categorical/ordinal values, use midpoints as bin boundaries to be safe
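
A rough sketch of the midpoint idea for categorical/ordinal values follows. The helper names are hypothetical and the code only illustrates the general technique, not the actual change.

```python
from bisect import bisect_right

def midpoint_cuts(values):
    """Place one cut halfway between each pair of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def bin_index(cuts, value):
    """Number of cuts lying at or below the value."""
    return bisect_right(cuts, value)

values = [0, 1, 1, 0, 2]
cuts = midpoint_cuts(values)
bins = [bin_index(cuts, v) for v in values]
```

Because no observed value ever coincides with a cut, the result does not depend on whether a boundary counts as belonging to the left or the right bin, which is one plausible reading of "to be safe".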
    
    * Fix CI issue -- do not use xrange(*)
    
    * Fix corner case in quantile sketch
    
    Signed-off-by: Philip Cho <chohyu01@cs.washington.edu>
    
    * Adding a test for an edge case in quantile sketcher
    
    max_bin=2 used to cause an exception.
    
    * Fix fast_hist test
    
    The test used to require a strictly increasing Test AUC for all examples.
    One of them exhibits a small blip in Test AUC before achieving a Test AUC
    of 1. (See bottom.)
    
    Solution: do not require monotonic increase for this particular example.
    
    [0] train-auc:0.99989 test-auc:0.999497
    [1] train-auc:1 test-auc:0.999749
    [2] train-auc:1 test-auc:0.999749
    [3] train-auc:1 test-auc:0.999749
    [4] train-auc:1 test-auc:0.999749
    [5] train-auc:1 test-auc:0.999497
    [6] train-auc:1 test-auc:1
    [7] train-auc:1 test-auc:1
    [8] train-auc:1 test-auc:1
    [9] train-auc:1 test-auc:1
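
The blip can be checked directly against the log above. The parsing helper and the relaxed condition (the final Test AUC reaches 1) are assumptions for illustration; the commit does not say what the relaxed test asserts.

```python
LOG = """\
[0] train-auc:0.99989 test-auc:0.999497
[1] train-auc:1 test-auc:0.999749
[2] train-auc:1 test-auc:0.999749
[3] train-auc:1 test-auc:0.999749
[4] train-auc:1 test-auc:0.999749
[5] train-auc:1 test-auc:0.999497
[6] train-auc:1 test-auc:1
[7] train-auc:1 test-auc:1
[8] train-auc:1 test-auc:1
[9] train-auc:1 test-auc:1
"""

def parse_test_auc(text):
    """Extract the test-auc value from each log line."""
    return [float(line.split("test-auc:")[1]) for line in text.strip().splitlines()]

def is_non_decreasing(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

test_auc = parse_test_auc(LOG)
monotonic = is_non_decreasing(test_auc)   # fails because of the dip at round 5
reaches_one = test_auc[-1] == 1.0         # one possible relaxed check
```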
    hcho3 committed with tqchen Jan 13, 2017
Commits on Jan 9, 2017
  1. Validation Typo (#1949)

    change valudation to validation
    Luckick committed with tqchen Jan 9, 2017
  2. Make lib path relative to fix setup error #1932 (#1947)

    diver-in-sky committed with tqchen Jan 9, 2017
Commits on Jan 6, 2017
  1. [R] fix #1903 (#1929)

    khotilov committed with tqchen Jan 6, 2017
  2. [R] xgb.plot.tree fixes (#1939)

    * [R] a few fixes and improvements to xgb.plot.tree
    
    * [R] deprecate n_first_tree, replace with trees; fix types in xgb.model.dt.tree
    khotilov committed with tqchen Jan 6, 2017
  3. An option for doing binomial+1 or epsilon-dropout from DART paper (#1922)

    * An option for doing binomial+1 or epsilon-dropout from DART paper
    
    * use callback-based discrete_distribution to make MSVC2013 happy
    khotilov committed with tqchen Jan 6, 2017
Commits on Jan 5, 2017
  1. 0.6-4 submission (#1935)

    hetong007 committed on GitHub Jan 5, 2017
Commits on Jan 2, 2017
  1. Fix comment in cross_validation.py (#1923)

    cv() doesn't output std_value because show_stdv is set to False.
    mnogu committed with terrytangyuan Jan 2, 2017
Commits on Dec 31, 2016
  1. Correcting small typos in documentation. (#1901)

    willettk committed with tqchen Dec 31, 2016
  2. [R] Increase the version number, date and required R version (#1920)

    * remove unnecessary line
    hetong007 committed on GitHub Dec 31, 2016
Commits on Dec 28, 2016
  1. disable openmp on solaris (#1912)

    thirdwing committed with hetong007 Dec 28, 2016
Commits on Dec 26, 2016
  1. cross_validation is included in model_selection module since sklearn 0.18 (#1908)

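
Code that must run against both older and newer scikit-learn releases typically tries the new module first and falls back to the old one. The helper below is a generic sketch; `import_first` is a hypothetical name and is not part of xgboost or scikit-learn.

```python
import importlib

def import_first(names):
    """Return the first module in names that can be imported, or None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# Prefer the module introduced in sklearn 0.18; fall back to the older location.
# The result is None when scikit-learn is not installed at all.
cv_module = import_first(["sklearn.model_selection", "sklearn.cross_validation"])
```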
    adamist521 committed with terrytangyuan Dec 26, 2016
Commits on Dec 23, 2016
  1. Fix cmake build for linux. Update GPU benchmarks. (#1904)

    RAMitchell committed with tqchen Dec 23, 2016
Commits on Dec 22, 2016
  1. option to shuffle data in mknfolds (#1459)

    * option to shuffle data in mknfolds
    
    * removed possibility to run as stand alone test
    
    * split function def in 2 lines for lint
    
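
An optional shuffle before splitting rows into folds can be sketched as below. This is an illustration only: `make_folds` and its parameters are hypothetical and do not reproduce the signature or behavior of `mknfolds`.

```python
import random

def make_folds(n_rows, nfold, shuffle=True, seed=0):
    """Split row indices 0..n_rows-1 into nfold disjoint folds."""
    idx = list(range(n_rows))
    if shuffle:
        # A seeded generator keeps the shuffled folds reproducible.
        random.Random(seed).shuffle(idx)
    # Deal indices round-robin so fold sizes differ by at most one.
    return [idx[i::nfold] for i in range(nfold)]

plain_folds = make_folds(10, 3, shuffle=False)
shuffled_folds = make_folds(10, 3, shuffle=True, seed=42)
```

Shuffling matters when the rows are ordered, for example sorted by label or by time, since unshuffled folds would then not be representative of the whole data set.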
    jokari69 committed with terrytangyuan Dec 22, 2016
  2. GPU Plugin: Add subsample, colsample_bytree, colsample_bylevel (#1895)

    RAMitchell committed with tqchen Dec 22, 2016
Commits on Dec 19, 2016
  1. fix dart bug (#1882)

    wxchan committed with tqchen Dec 19, 2016
Commits on Dec 18, 2016
  1. Bump up version number, add cleanup script (#1886)

    * fix cran check
    
    * change required R version because of utils::globalVariables
    
    * temporary commit, monotone not working
    
    * fix test
    
    * fix doc
    
    * fix doc
    
    * fix cran note and warning
    
    * improve checks
    
    * fix urls
    
    * fix cran check
    
    * add cleanup and bump up version number
    
    * use clean in build
    
    * Update Makefile
    hetong007 committed on GitHub Dec 18, 2016
Commits on Dec 16, 2016
  1. [R Package] Use the C++ 11 compiler to test OpenMP flags (#1881)

    * fix segfault when gctorture() is enabled
    
    * use the C++ 11 compiler to test OpenMP flags
    
    * auto-generated configure script
    yixuan committed with hetong007 Dec 16, 2016
  2. autoconf for solaris (#1880)

    thirdwing committed with tqchen Dec 16, 2016
Commits on Dec 15, 2016
  1. [R] Fix for cran submission of xgboost 0.6 (#1875)

    fix cran check
    hetong007 committed on GitHub Dec 15, 2016
  2. GPU Plugin: Add bosch demo, update build instructions (#1872)

    RAMitchell committed with tqchen Dec 15, 2016
  3. Add monotonic tutorial. (#1870)

    madrury committed with terrytangyuan Dec 15, 2016
Commits on Dec 13, 2016
  1. python package tree plotting support fmap (#1856)

    * to_graphviz and plot_tree support fmap
    
    * [python-package] add model_plot docstring
    Cynus committed with terrytangyuan Dec 13, 2016
Commits on Dec 11, 2016
  1. fix typo in comment. (#1850)

    Liam0205 committed with tqchen Dec 11, 2016
  2. [R-package] JSON dump format and a couple of bugfixes (#1855)

    * [R-package] JSON tree dump interface
    
    * [R-package] precision bugfix in xgb.attributes
    
    * [R-package] bugfix for cb.early.stop called from xgb.cv
    
    * [R-package] a bit more clarity on labels checking in xgb.cv
    
    * [R-package] test JSON dump for gblinear as well
    
    * whitespace lint
    khotilov committed with tqchen Dec 11, 2016
  3. config.mk: Set TEST_COVER to 0 by default (#1853)

    Set the TEST_COVER to 0 by default so it uses optimization
    -O3 when compiling.
    AbdealiJK committed with tqchen Dec 11, 2016
Commits on Dec 9, 2016
  1. refactor duplicate evaluation implementation (#1852)

    fromradio committed with CodingCat Dec 9, 2016
Commits on Dec 8, 2016
  1. Add benchmarks, fix GCC build (#1848)

    RAMitchell committed with tqchen Dec 8, 2016
Commits on Dec 7, 2016
  1. [jvm-packages] Scala implementation of the Rabit tracker. (#1612)

    * [jvm-packages] Scala implementation of the Rabit tracker.
    
    A Scala implementation of RabitTracker that is interface-interchangeable with the
    Java implementation, ported from `tracker.py` in the
    [dmlc-core project](https://github.com/dmlc/dmlc-core).
    
    * [jvm-packages] Updated Akka dependency in pom.xml.
    
    * Refactored the RabitTracker directory structure.
    
    * Fixed premature stopping of connection handler.
    
    Added a new finite state "AwaitingPortNumber" to explicitly wait for the
    worker to send the port, and close the connection. Stopping the actor
    prematurely sends a TCP RST to the worker, causing the worker to crash
    on AssertionError.
    
    * Added interface IRabitTracker so that user can switch implementations.
    
    * Default timeout duration changes.
    
    * Dependency for Akka tests.
    
    * Removed the main function of RabitTracker.
    
    * A skeleton for testing Akka-based Rabit tracker.
    
    * waitFor() in RabitTracker no longer throws exceptions.
    
    * Completed unit test for the 'start' command of Rabit tracker.
    
    * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.)
    
    * Fixed the default timeout duration.
    
    * Use Java container to avoid serialization issues due to intermediate wrappers.
    
    * Added tests for Allreduce/model training using Scala Rabit tracker.
    
    * Added spill-over unit test for the Scala Rabit tracker.
    
    * Fixed a typo.
    
    * Overhaul of RabitTracker interface per code review.
    
      - Removed methods start() and waitFor() (no arguments) from IRabitTracker.
      - The timeout in start(timeout) is now worker connection timeout, as tcp
        socket binding timeout is less intuitive.
      - Dropped time unit from start(...) and waitFor(...) methods; the default
        time unit is millisecond.
      - Moved random port number generation into the RabitTrackerHandler.
      - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit.
    
    * More code refactoring and comments.
    
    * Unified timeout constants. Readable tracker status code.
    
    * Add comments to indicate that allReduce is for tests only. Removed all other variants.
    
    * Removed unused imports.
    
    * Simplified signatures of training methods.
    
     - Moved TrackerConf into parameter map.
     - Changed GeneralParams so that TrackerConf becomes a standalone parameter.
     - Updated test cases accordingly.
    
    * Changed monitoring strategies.
    
    * Reverted monitoring changes.
    
    * Update test case for Rabit AllReduce.
    
    * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers.
    
    * More comprehensive test cases for exception handling and worker connection timeout.
    
    * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case.
    
    * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code.
    
    * Reverted scalastyle-config changes.
    
    * Visibility scope change. Interface tweaks.
    
    * Use match pattern to handle tracker_conf parameter.
    
    * Minor clarification in JNI code.
    
    * Clearer intent in match pattern to suppress warnings.
    
    * Removed Future from constructor. Block in start() and waitFor() instead.
    
    * Revert inadvertent comment changes.
    
    * Removed debugging information.
    
    * Updated test cases that are a bit finicky.
    
    * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.
    xydrolase committed with CodingCat Dec 7, 2016