# XGBoost.jl

https://github.com/dmlc/XGBoost.jl

# Summary

XGBoost.jl is a Julia interface to XGBoost (https://github.com/tqchen/xgboost), an efficient and scalable implementation of
gradient boosting.

The core functionality revolves around *xgboost*, which calls into a C
wrapper to train the boosted trees. K-fold cross validation is
available through *nfold_cv* for evaluating *xgboost* parameters. Julia
matrices/arrays, SparseMatrixCSC, libSVM files, and the XGBoost binary
file format are all accepted as input data. Custom loss metrics are supported.
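
A custom objective for gradient boosting generally has to supply, for each example, the gradient and Hessian of the loss with respect to the raw prediction. The following is a minimal, XGBoost-free sketch of that computation for the logistic loss. The name `logistic_grad_hess` is hypothetical, and the wiring into the `obj` argument (including how labels are read from the training data) is deliberately left out because it depends on the package version.

```julia
# Gradient and Hessian of the logistic loss with respect to raw margins.
# `logistic_grad_hess` is a hypothetical helper, not part of XGBoost.jl.
sigmoid(x) = 1 / (1 + exp(-x))

function logistic_grad_hess(margins::AbstractVector{<:Real},
                            labels::AbstractVector{<:Real})
    p = sigmoid.(margins)      # predicted probabilities
    grad = p .- labels         # first derivative of the loss
    hess = p .* (1 .- p)       # second derivative of the loss
    return grad, hess
end

# Made-up margins and labels, for illustration only.
grad, hess = logistic_grad_hess([0.0, 2.0, -1.0], [1, 0, 1])
```

A margin of 0 corresponds to a probability of 0.5, so for a positive example the gradient is -0.5 and the Hessian is 0.25.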

# Details

| Test                      | Result                                                         |
| :- | :- |
| Package works             | Yes                                                            |
| Deprecation warnings      | None                                                           |
| Compatible with JuliaDB   | Yes, if tables are converted to arrays                         |
| Contains documentation    | Points to the XGBoost documentation (i.e., not Julia-specific) |
| Simplicity                | Good                                                           |

# Functions

xgboost(data, args...)

    Arguments: nrounds::Integer; label = Union{}, param = [],
               watchlist = [], metrics = [], obj = Union{}, 
               feval = Union{}, group = [], kwargs...

predict(Booster, data, args...)

    Arguments: output_margin::Bool = false, 
               ntree_limit::Integer = 0

nfold_cv(data, args...)

    Arguments: num_boost_round::Integer = 10, 
               nfold::Integer = 3; label = Union{},
               param=[], metrics=[], obj = Union{}, 
               feval = Union{}, fpreproc = Union{},
               show_stdv = true, seed::Integer = 0, kwargs...

# Example Code

In [2]:

    using XGBoost
    include("resources/load_titanic.jl")

    load (generic function with 2 methods)

In [3]:

    train, train_targets, test, test_targets = load()

    ([1.0 3.0 … 7.25 0.0; 2.0 1.0 … 71.2833 1.0; … ; 887.0 2.0 … 13.0 0.0; 888.0 1.0 … 30.0 0.0], [0, 1, 1, 1, 0, 0, 1, 1, 1, 1  …  0, 1, 1, 0, 0, 0, 0, 0, 0, 1], [5.0 3.0 … 8.05 0.0; 14.0 3.0 … 31.275 0.0; … ; 890.0 1.0 … 30.0 1.0; 891.0 3.0 … 7.75 2.0], [0, 0, 0, 1, 0, 0, 0, 0, 1, 0  …  0, 0, 1, 1, 0, 0, 0, 0, 1, 0])

### K-fold cross validation for max_depth and eta (learning rate)

In [5]:

    nfold = 5
    param = ["max_depth" => 2,
             "eta" => 1,
             "objective" => "binary:logistic"]
    metrics = ["auc"]
    # Positional arguments are data, num_boost_round, nfold (in that order);
    # label, param, and metrics are keyword arguments.
    nfold_cv(train, 2, nfold, label = train_targets, param = param, metrics = metrics)

The log below is truncated and was captured from a run that used the default 3 folds (three pruning lines per round), so it does not match the 5-fold call above line for line:

    [11:21:39] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [11:21:39] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [11:21:39] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [1]	cv-test-auc:0.798790+0.036774	cv-train-auc:0.830975+0.016641
    [11:21:40] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [11:21:40] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [11:21:40] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [2]	cv-test-auc:0.847617+0.027814	cv-train-auc:0.865430+0.014302
    [11:21:40] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
    [11:21:40] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned n
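
To make the idea behind K-fold cross validation concrete, here is a small sketch of how row indices can be partitioned into folds. It is not a description of how `nfold_cv` splits the data internally. The helper `kfold_indices` and the toy size of 10 rows are assumptions made for illustration only.

```julia
using Random

# Split the indices 1:n into k shuffled, nearly equal folds.
# `kfold_indices` is a hypothetical helper, not part of XGBoost.jl.
function kfold_indices(n::Integer, k::Integer; seed::Integer = 0)
    perm = randperm(MersenneTwister(seed), n)
    # Fold i takes every k-th element of the permutation, starting at position i.
    return [perm[i:k:n] for i in 1:k]
end

n = 10
folds = kfold_indices(n, 5)
for (i, test_idx) in enumerate(folds)
    # Every row not held out in this fold is used for training.
    train_idx = setdiff(1:n, test_idx)
    println("fold ", i, ": train on ", length(train_idx),
            " rows, test on ", length(test_idx))
end
```

Each row is held out exactly once, so the per-round cross-validation metric is an average (with a standard deviation) over the k held-out folds.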

### Training and prediction

In [6]:

    num_round = 2
    bst = xgboost(train, num_round, label=train_targets, eta=1, max_depth=2)

    pred = predict(bst, test)
    println("test-error=", sum((pred .> 0.5) .!= test_targets) / length(pred))

    [1]	train-rmse:0.378630
    [2]	train-rmse:0.361474

    test-error=0.175
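
The test error above is the fraction of examples whose score, thresholded at 0.5, disagrees with the 0/1 target. Because no objective is passed in that call, the log reports RMSE, meaning the scores come from a regression fit; adding `objective = "binary:logistic"` as in the cross-validation parameters should yield probabilities instead. The metric itself can be sketched without XGBoost. `classification_error` is a hypothetical helper and the data are made up for illustration.

```julia
# Fraction of examples whose thresholded score disagrees with the 0/1 target.
# `classification_error` is a hypothetical helper, not part of XGBoost.jl.
function classification_error(scores::AbstractVector{<:Real},
                              targets::AbstractVector{<:Integer};
                              threshold::Real = 0.5)
    length(scores) == length(targets) ||
        throw(DimensionMismatch("scores and targets differ in length"))
    return sum((scores .> threshold) .!= targets) / length(scores)
end

# Made-up scores and targets, for illustration only.
err = classification_error([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
println("test-error=", err)
```

Here two of the four examples are misclassified, so the error is 0.5.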
