***Breast Cancer Diagnosis using XGBoost in Julia***

***Breast Cancer Wisconsin (Diagnostic) Data Set***

**Predict whether a breast mass is benign or malignant**

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server: connect with `ftp ftp.cs.wisc.edu`, then run `cd math-prog/cpo-dataset/machine-learn/WDBC/`.

The dataset can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Real-valued features (30 in total, described below)

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
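The column layout above (fields 3-12 hold the means, 13-22 the standard errors and 23-32 the "worst" values, each in the a-j order) can be captured in a small lookup. This is a hypothetical helper, not part of any package. The snake_case feature symbols are names chosen here for illustration.

```julia
# Column layout of wdbc.data as described above: 1 = ID, 2 = diagnosis,
# then 10 means (3-12), 10 standard errors (13-22) and 10 "worst" values (23-32).
# Feature symbols are illustrative names, listed in the a-j order of the description.
const FEATURES = [:radius, :texture, :perimeter, :area, :smoothness,
                  :compactness, :concavity, :concave_points, :symmetry,
                  :fractal_dimension]
const STAT_OFFSET = Dict(:mean => 3, :se => 13, :worst => 23)

"""
    field_index(feature, stat)

Return the 1-based column number of `feature` summarized by `stat`
(`:mean`, `:se` or `:worst`) in the raw data file.
"""
function field_index(feature::Symbol, stat::Symbol)
    k = findfirst(==(feature), FEATURES)
    k === nothing && throw(ArgumentError("unknown feature: $feature"))
    haskey(STAT_OFFSET, stat) || throw(ArgumentError("unknown statistic: $stat"))
    return STAT_OFFSET[stat] + k - 1
end

field_index(:radius, :mean)               # 3  (Mean Radius)
field_index(:radius, :se)                 # 13 (Radius SE)
field_index(:radius, :worst)              # 23 (Worst Radius)
field_index(:fractal_dimension, :worst)   # 32 (last column)
```

Because the notebook later builds `x` from `dataset[:, 3:32]`, column `j` of `x` corresponds to field `j + 2`.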

All feature values are recorded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
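Because the classes are imbalanced, a useful reference point for the error rates reported later is a trivial classifier that always predicts the majority class (benign). A minimal sketch of that arithmetic, using only the counts stated above:

```julia
# Class counts as stated in the dataset description.
n_benign, n_malignant = 357, 212
n_total = n_benign + n_malignant          # 569 samples

# Always predicting "benign" misclassifies every malignant sample.
baseline_error = n_malignant / n_total    # ≈ 0.3726
baseline_accuracy = 1 - baseline_error    # ≈ 0.6274
```

Any useful model should therefore have an error rate well below about 37%.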



# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.7.2" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools Plots"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.7.2 on the current Colab Runtime...
2022-05-17 18:12:42 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.7/julia-1.7.2-linux-x86_64.tar.gz [123295596/123295596] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...
Installing Julia package Plots...
Installing Julia package CUDA...
Installing IJulia kernel...
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mInstalling julia kernelspec in /root/.local/share/jupyter/kernels/julia-1.7

Successfully installed julia version 1.7.2!
Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then
jump to the 'Checking the Installation' section.




# Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system:

In [1]:
versioninfo()

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2


In [2]:
using BenchmarkTools

M = rand(2^11, 2^11)

@btime $M * $M;

  432.777 ms (2 allocations: 32.00 MiB)


In [3]:
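if get(ENV, "COLAB_GPU", "") == "1"
    using CUDA

    run(`nvidia-smi`)

    # Create a new random matrix directly on the GPU:
    M_on_gpu = CUDA.CURAND.rand(2^11, 2^11)
    # GPU calls are asynchronous; synchronize so @btime times the full multiplication
    @btime begin
        $M_on_gpu * $M_on_gpu
        CUDA.synchronize()
    end; nothing
else
    println("No GPU found.")
end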

Tue May 17 18:29:36 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

**Install XGBoost.jl in Google Colab**

In [4]:
]  add "https://github.com/dmlc/XGBoost.jl.git"

[32m[1m     Cloning[22m[39m git-repo `https://github.com/dmlc/XGBoost.jl.git`
[32m[1m    Updating[22m[39m git-repo `https://github.com/dmlc/XGBoost.jl.git`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m XGBoost_jll ─ v1.5.2+0
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [009559a3] [39m[92m+ XGBoost v1.5.2 `https://github.com/dmlc/XGBoost.jl.git#master`[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Manifest.toml`
 [90m [009559a3] [39m[92m+ XGBoost v1.5.2 `https://github.com/dmlc/XGBoost.jl.git#master`[39m
 [90m [a5c6f535] [39m[92m+ XGBoost_jll v1.5.2+0[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39m[90mXGBoost_jll[39m
[32m  ✓ [39mXGBoost
  2 dependencies successfully precompiled in 2 seconds (150 already precompiled, 3 skipped during auto due to previous errors)


**Set up XGBoost**

In [5]:
] build XGBoost

In [6]:
using XGBoost

In [8]:
import Pkg; Pkg.add("DataFrames")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m Crayons ───────── v4.1.1
[32m[1m   Installed[22m[39m InvertedIndices ─ v1.1.0
[32m[1m   Installed[22m[39m PooledArrays ──── v1.4.2
[32m[1m   Installed[22m[39m DataFrames ────── v1.3.4
[32m[1m   Installed[22m[39m PrettyTables ──── v1.3.1
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [a93c6f00] [39m[92m+ DataFrames v1.3.4[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Manifest.toml`
 [90m [a8cc5b0e] [39m[92m+ Crayons v4.1.1[39m
 [90m [a93c6f00] [39m[92m+ DataFrames v1.3.4[39m
 [90m [41ab1584] [39m[92m+ InvertedIndices v1.1.0[39m
 [90m [2dfb63ee] [39m[92m+ PooledArrays v1.4.2[39m
 [90m [08abe8d2] [39m[92m+ PrettyTables v1.3.1[39m
 [90m [9fa8497b] [39m[92m+ Future[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39m[90mInvertedIndices[39m
[32m  ✓ [39m[90mPooledArrays[39m
[32m  ✓ [39m[90mCrayons[

In [9]:
using DataFrames

In [11]:
import Pkg; Pkg.add("CSV")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m CodecZlib ────────── v0.7.0
[32m[1m   Installed[22m[39m SentinelArrays ───── v1.3.12
[32m[1m   Installed[22m[39m WeakRefStrings ───── v1.4.2
[32m[1m   Installed[22m[39m InlineStrings ────── v1.1.2
[32m[1m   Installed[22m[39m FilePathsBase ────── v0.9.18
[32m[1m   Installed[22m[39m TranscodingStreams ─ v0.9.6
[32m[1m   Installed[22m[39m CSV ──────────────── v0.10.4
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [336ed68f] [39m[92m+ CSV v0.10.4[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Manifest.toml`
 [90m [336ed68f] [39m[92m+ CSV v0.10.4[39m
 [90m [944b1d66] [39m[92m+ CodecZlib v0.7.0[39m
 [90m [48062228] [39m[92m+ FilePathsBase v0.9.18[39m
 [90m [842dd82b] [39m[92m+ InlineStrings v1.1.2[39m
 [90m [91c51154] [39m[92m+ SentinelArrays v1.3.12[39m
 [90m [3bb67fe8] [39m[92m+ TranscodingStreams v0.9.6[39

In [12]:
using CSV

In [19]:
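# Note: wdbc.data has no header row. CSV.read uses the first line as column names by default,
# so the first record (ID 842302) becomes the header; `header = false` would keep it as data.
dataset = CSV.read(
    download("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"),
    DataFrame)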

| Row | 842302 | M | 17.99 | 10.38 | 122.8 | 1001 | 0.1184 | 0.2776 | 0.3001 |
|----:|---|---|---|---|---|---|---|---|---|
| | Int64 | String1 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 |
| 5 | 843786 | M | 12.45 | 15.7 | 82.57 | 477.1 | 0.1278 | 0.17 | 0.1578 |
| 6 | 844359 | M | 18.25 | 19.98 | 119.6 | 1040.0 | 0.09463 | 0.109 | 0.1127 |
| 7 | 84458202 | M | 13.71 | 20.83 | 90.2 | 577.9 | 0.1189 | 0.1645 | 0.09366 |
| 8 | 844981 | M | 13.0 | 21.82 | 87.5 | 519.8 | 0.1273 | 0.1932 | 0.1859 |
| 9 | 84501001 | M | 12.46 | 24.04 | 83.97 | 475.9 | 0.1186 | 0.2396 | 0.2273 |
| 10 | 845636 | M | 16.02 | 23.24 | 102.7 | 797.8 | 0.08206 | 0.06669 | 0.03299 |


In [20]:
(numrows,numcolumns) = size(dataset)

(568, 32)

In [51]:
# Number of malignant samples
forCnt = dataset[:,[:M]]
size(filter(row -> row.M =="M", forCnt))

(211, 1)

In [52]:
# Number of benign samples
size(filter(row -> row.M =="B", forCnt))

(357, 1)
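The counts above (211 + 357 = 568 rows) are one malignant sample short of the class distribution stated in the dataset description (212 + 357 = 569). A likely explanation is that `wdbc.data` has no header row, while `CSV.read` by default uses the first line as column names. The first record (ID 842302, diagnosis M) therefore became the header, which is also why the diagnosis column is called `:M`. The stdlib-only sketch below reproduces the effect on a hypothetical three-line excerpt. With CSV.jl itself, passing `header = false` keeps the first record as data, and CSV.jl then names the columns `Column1`, `Column2`, and so on.

```julia
# Hypothetical excerpt in the wdbc.data layout (ID, diagnosis, features),
# truncated to four columns. The first two records match the first two rows of
# the file as shown above; the third (benign) record is made up.
raw = """
842302,M,17.99,10.38
842517,M,20.57,17.77
900000,B,12.00,15.00
"""

# Count diagnosis labels (column 2). If `header` is true, the first line is
# treated as column names and skipped, as CSV.read does by default.
function diagnosis_counts(text::AbstractString; header::Bool)
    lines = filter(!isempty, split(text, '\n'))
    records = header ? lines[2:end] : lines
    counts = Dict{String,Int}()
    for line in records
        label = String(split(line, ',')[2])
        counts[label] = get(counts, label, 0) + 1
    end
    return counts
end

diagnosis_counts(raw; header = true)    # M => 1, B => 1: first record lost
diagnosis_counts(raw; header = false)   # M => 2, B => 1
```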

In [53]:
describe(dataset)

| Row | variable | mean | min | median | max | nmissing | eltype |
|----:|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Int64 | DataType |
| 1 | 842302 | 3.04238e7 | 8670 | 906157.0 | 911320502 | 0 | Int64 |
| 2 | M | | B | | M | 0 | String1 |
| 3 | 17.99 | 14.1205 | 6.981 | 13.355 | 28.11 | 0 | Float64 |
| 4 | 10.38 | 19.3053 | 9.71 | 18.855 | 39.28 | 0 | Float64 |
| 5 | 122.8 | 91.9148 | 43.79 | 86.21 | 188.5 | 0 | Float64 |
| 6 | 1001 | 654.28 | 143.5 | 548.75 | 2501.0 | 0 | Float64 |
| 7 | 0.1184 | 0.0963215 | 0.05263 | 0.095865 | 0.1634 | 0 | Float64 |
| 8 | 0.2776 | 0.104036 | 0.01938 | 0.092525 | 0.3454 | 0 | Float64 |
| 9 | 0.3001 | 0.0884273 | 0.0 | 0.0614 | 0.4268 | 0 | Float64 |
| 10 | 0.1471 | 0.0487463 | 0.0 | 0.033455 | 0.2012 | 0 | Float64 |


**Data Preparation**

Convert the DataFrame into a feature matrix `x` (columns 3-32) and a label vector `y` holding the diagnosis. Encode the strings "B" (benign) and "M" (malignant) as the integers 0 and 1, respectively:

In [71]:
x = Matrix(dataset[:,3:32])

568×30 Matrix{Float64}:
 20.57   17.77  132.9   1326.0  0.08474  …  0.2416  0.186    0.275   0.08902
 19.69   21.25  130.0   1203.0  0.1096      0.4504  0.243    0.3613  0.08758
 11.42   20.38   77.58   386.1  0.1425      0.6869  0.2575   0.6638  0.173
 20.29   14.34  135.1   1297.0  0.1003      0.4     0.1625   0.2364  0.07678
 12.45   15.7    82.57   477.1  0.1278      0.5355  0.1741   0.3985  0.1244
 18.25   19.98  119.6   1040.0  0.09463  …  0.3784  0.1932   0.3063  0.08368
 13.71   20.83   90.2    577.9  0.1189      0.2678  0.1556   0.3196  0.1151
 13.0    21.82   87.5    519.8  0.1273      0.539   0.206    0.4378  0.1072
 12.46   24.04   83.97   475.9  0.1186      1.105   0.221    0.4366  0.2075
 16.02   23.24  102.7    797.8  0.08206     0.1459  0.09975  0.2948  0.08452
 15.78   17.89  103.6    781.0  0.0971   …  0.3965  0.181    0.3792  0.1048
 19.17   24.8   132.4   1123.0  0.0974      0.3639  0.1767   0.3176  0.1023
 15.85   23.95  103.7    782.7  0.08401     0.2322  0.1119  

In [72]:
typeof(x)

Matrix{Float64} (alias for Array{Float64, 2})

In [85]:

# Encode the diagnosis: "B" (benign) → 0, anything else ("M", malignant) → 1
y = Vector(map(element -> element == "B" ? 0 : 1, dataset[!,:M]))

568-element Vector{Int64}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 ⋮
 0
 0
 0
 0
 0
 1
 1
 1
 1
 1
 1
 0

In [80]:
typeof(y)

Vector{Int64} (alias for Array{Int64, 1})

In [86]:
# x is already a Matrix{Float64} and y a Vector{Int64} (see typeof above), so Matrix(x) only returns a copy
Matrix(x)

568×30 Matrix{Float64}:
 20.57   17.77  132.9   1326.0  0.08474  …  0.2416  0.186    0.275   0.08902
 19.69   21.25  130.0   1203.0  0.1096      0.4504  0.243    0.3613  0.08758
 11.42   20.38   77.58   386.1  0.1425      0.6869  0.2575   0.6638  0.173
 20.29   14.34  135.1   1297.0  0.1003      0.4     0.1625   0.2364  0.07678
 12.45   15.7    82.57   477.1  0.1278      0.5355  0.1741   0.3985  0.1244
 18.25   19.98  119.6   1040.0  0.09463  …  0.3784  0.1932   0.3063  0.08368
 13.71   20.83   90.2    577.9  0.1189      0.2678  0.1556   0.3196  0.1151
 13.0    21.82   87.5    519.8  0.1273      0.539   0.206    0.4378  0.1072
 12.46   24.04   83.97   475.9  0.1186      1.105   0.221    0.4366  0.2075
 16.02   23.24  102.7    797.8  0.08206     0.1459  0.09975  0.2948  0.08452
 15.78   17.89  103.6    781.0  0.0971   …  0.3965  0.181    0.3792  0.1048
 19.17   24.8   132.4   1123.0  0.0974      0.3639  0.1767   0.3176  0.1023
 15.85   23.95  103.7    782.7  0.08401     0.2322  0.1119  

In [82]:
import Pkg; Pkg.add("MLDataUtils")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m MappedArrays ── v0.4.1
[32m[1m   Installed[22m[39m LearnBase ───── v0.3.0
[32m[1m   Installed[22m[39m MLLabelUtils ── v0.5.7
[32m[1m   Installed[22m[39m MLDataPattern ─ v0.5.4
[32m[1m   Installed[22m[39m MLDataUtils ─── v0.5.4
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [cc2ba9b6] [39m[92m+ MLDataUtils v0.5.4[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Manifest.toml`
 [90m [7f8f8fb0] [39m[92m+ LearnBase v0.3.0[39m
 [90m [9920b226] [39m[92m+ MLDataPattern v0.5.4[39m
 [90m [cc2ba9b6] [39m[92m+ MLDataUtils v0.5.4[39m
 [90m [66a33bbf] [39m[92m+ MLLabelUtils v0.5.7[39m
 [90m [dbb5928d] [39m[92m+ MappedArrays v0.4.1[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39m[90mMappedArrays[39m
[32m  ✓ [39m[90mLearnBase[39m
[32m  ✓ [39m[90mMLLabelUtils[39m
[32m  ✓ [39m[90mMLDataPattern[39m
[32m  ✓ 

In [88]:
using MLDataUtils

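Shuffle the rows before splitting. The file is not in random order (its first rows are all malignant), so an unshuffled split could give the training and test sets very different class mixes.
MLDataUtils treats the last dimension of an array as the observation dimension, so `x` (one sample per row) is transposed to 30×568. This makes each column one sample, aligned with the matching entry of `y`.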
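Shuffling only works if the same permutation is applied to both the features and the labels. The minimal stdlib sketch below rests on two assumptions: samples are rows, so no transpose is needed, and an explicit random number generator is passed so the shuffle can be reproduced. The `shuffleobs` call in this notebook is not seeded, so rerunning it produces a different split and possibly slightly different results below. `shuffle_rows` is a hypothetical helper, not an MLDataUtils function.

```julia
using Random

# Shuffle samples (rows of x) and labels with one shared permutation,
# so that x[i, :] still belongs to y[i] afterwards.
function shuffle_rows(x::AbstractMatrix, y::AbstractVector; rng = Random.default_rng())
    size(x, 1) == length(y) || throw(DimensionMismatch("x and y disagree on sample count"))
    perm = randperm(rng, length(y))
    return x[perm, :], y[perm]
end

# Toy data whose first feature column equals the label, to make alignment visible.
x_toy = Float64[1 10; 0 20; 1 30; 0 40]
y_toy = [1, 0, 1, 0]
xs, ys = shuffle_rows(x_toy, y_toy; rng = MersenneTwister(42))   # seeded: reproducible
```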

In [89]:
Xs, Ys = shuffleobs((transpose(x), y))

([17.05 14.61 … 12.56 12.23; 19.08 15.69 … 19.07 19.56; … ; 0.3109 0.253 … 0.2121 0.2668; 0.09061 0.05695 … 0.07188 0.08174], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0  …  0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

In [None]:
# Split the data into training and test sets (67% training, 33% test)

In [90]:
(X_train1, y_train1), (X_test1, y_test1) = splitobs((Xs, Ys); at = 0.67)

(([17.05 14.61 … 13.96 13.0; 19.08 15.69 … 17.05 21.82; … ; 0.3109 0.253 … 0.3068 0.4378; 0.09061 0.05695 … 0.07957 0.1072], [1, 0, 0, 0, 0, 0, 0, 1, 0, 0  …  1, 0, 0, 1, 0, 0, 0, 1, 1, 1]), ([12.32 18.61 … 12.56 12.23; 12.39 20.25 … 19.07 19.56; … ; 0.2827 0.2341 … 0.2121 0.2668; 0.06771 0.07421 … 0.07188 0.08174], [0, 1, 0, 1, 1, 0, 1, 0, 1, 0  …  0, 0, 0, 1, 0, 0, 0, 0, 0, 0]))
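With `at = 0.67`, rounding 0.67 × 568 = 380.56 to the nearest integer gives 381 training rows and 568 - 381 = 187 test rows. This matches the 187-element `y_test` below. The sketch below performs the same kind of split on row-major, already shuffled data. The helper `split_rows` is hypothetical, and the rounding rule is an assumption chosen to match these sizes.

```julia
# Split pre-shuffled data (samples are rows of x) into training and test parts;
# `at` is the fraction of samples used for training.
function split_rows(x::AbstractMatrix, y::AbstractVector; at::Real = 0.67)
    size(x, 1) == length(y) || throw(DimensionMismatch("x and y disagree on sample count"))
    n_train = round(Int, at * length(y))
    train = 1:n_train
    test = (n_train + 1):length(y)
    return (x[train, :], y[train]), (x[test, :], y[test])
end

round(Int, 0.67 * 568)   # 381 training rows, so 187 remain for testing

# Usage on toy data: 10 samples with 2 features each.
x_small = reshape(collect(1.0:20.0), 10, 2)
y_small = collect(1:10)
(x_tr, y_tr), (x_te, y_te) = split_rows(x_small, y_small; at = 0.7)   # 7 train, 3 test
```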

In [91]:
# Transpose x back so that rows are samples again, and materialize the views as plain Arrays
x_train = Array(transpose(X_train1))
y_train = Array(y_train1)
x_test = Array(transpose(X_test1))
y_test = Array(y_test1)

187-element Vector{Int64}:
 0
 1
 0
 1
 1
 0
 1
 0
 1
 0
 0
 0
 1
 ⋮
 0
 0
 0
 0
 0
 1
 0
 0
 0
 0
 0
 0

**Hyperparameters for Tuning**

**eta** [default=0.3, alias: `learning_rate`]: Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative. Range: [0, 1].
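To make the effect of `eta` concrete, here is a toy sketch of Newton boosting for the `binary:logistic` objective in which every tree is a single leaf. This is a deliberate simplification. XGBoost grows trees up to `max_depth`, but it computes each leaf weight as w = -G / (H + λ) from the summed gradients G and Hessians H of the loss, with `lambda` (L2 regularization) defaulting to 1, and then scales the new tree by `eta`. The sketch assumes a starting prediction of 0.5 (XGBoost's default `base_score`). Note that the training cells below use `eta = 1`, which means no shrinkage.

```julia
# Logistic (sigmoid) link: margin → probability.
σ(z) = 1 / (1 + exp(-z))

# Toy boosting with depth-0 "trees": every sample lands in the same leaf.
function boost_constant(y::AbstractVector{<:Real}, rounds::Integer; eta = 0.3, λ = 1.0)
    margin = 0.0                          # base_score 0.5 corresponds to margin 0
    for _ in 1:rounds
        p = σ(margin)
        G = sum(p .- y)                   # sum of log-loss gradients
        H = length(y) * p * (1 - p)       # sum of log-loss Hessians
        w = -G / (H + λ)                  # optimal leaf weight
        margin += eta * w                 # eta shrinks each tree's contribution
    end
    return σ(margin)                      # predicted probability
end

y_demo = [1, 1, 1, 0]                     # target: mean(y_demo) = 0.75
boost_constant(y_demo, 5; eta = 1.0)      # ≈ 0.74 after 5 rounds
boost_constant(y_demo, 5; eta = 0.1)      # ≈ 0.56: smaller steps, slower progress
```

With a small `eta`, each tree moves the prediction only part of the way, so more rounds are needed. This trades training time for a more conservative fit.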

**max_depth** [default=6]: Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. 0 indicates no limit on depth. Beware that XGBoost aggressively consumes memory when training a deep tree. The exact tree method requires a non-zero value. Range: [0, ∞].
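One way to see why deeper trees add capacity: a binary tree of depth d has at most 2^d leaves and 2^d - 1 split nodes. The sketch below compares the default `max_depth = 6` with the `max_depth = 10` used in the second training run below. These are upper bounds only; the trees XGBoost actually grows are often smaller.

```julia
# Upper bounds on the size of one binary tree limited to depth d.
max_leaves(d::Integer) = 2^d
max_splits(d::Integer) = 2^d - 1

for d in (6, 10)   # XGBoost's default and the value tried in the second run
    println("max_depth = $d: at most $(max_leaves(d)) leaves and $(max_splits(d)) splits per tree")
end
```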






In [92]:
dtrain = DMatrix(x_train, label = y_train)

DMatrix(Ptr{Nothing} @0x0000000015a0d960, XGBoost.var"#_setinfo#8"())

In [118]:
# XGBoost.jl 1.x: the second positional argument is the number of boosting rounds (100 here)
boost = xgboost(dtrain, 100; eta = 1, objective = "binary:logistic")

[1]	train-logloss:0.186739
[2]	train-logloss:0.090753
[3]	train-logloss:0.053734
[4]	train-logloss:0.035809
[5]	train-logloss:0.027452
[6]	train-logloss:0.021554
[7]	train-logloss:0.017867
[8]	train-logloss:0.015599
[9]	train-logloss:0.012860
[10]	train-logloss:0.011259
[11]	train-logloss:0.009873
[12]	train-logloss:0.009150
[13]	train-logloss:0.008803
[14]	train-logloss:0.008533
[15]	train-logloss:0.008328
[16]	train-logloss:0.008059
[17]	train-logloss:0.007796
[18]	train-logloss:0.007525
[19]	train-logloss:0.007355
[20]	train-logloss:0.007127
[21]	train-logloss:0.006959
[22]	train-logloss:0.006815
[23]	train-logloss:0.006647
[24]	train-logloss:0.006532
[25]	train-logloss:0.006415
[26]	train-logloss:0.006326
[27]	train-logloss:0.006224
[28]	train-logloss:0.006145
[29]	train-logloss:0.006067
[30]	train-logloss:0.005999
[31]	train-logloss:0.005933
[32]	train-logloss:0.005890
[33]	train-logloss:0.005825
[34]	train-logloss:0.005786
[35]	train-logloss:0.005737
[36]	train-logloss:0.005695
[

Booster(Ptr{Nothing} @0x0000000014b5a9c0)

In [119]:
prediction = XGBoost.predict(boost, x_test)

187-element Vector{Float32}:
 0.000115474846
 0.99954456
 0.00017906365
 0.99991167
 0.9999027
 0.0003317018
 0.9996939
 6.926588f-5
 0.9999306
 0.9237183
 0.010344508
 0.00014393381
 0.9991441
 ⋮
 0.0005576648
 0.16607077
 0.0001469575
 7.55378f-5
 0.00010939565
 0.99980956
 0.0002482798
 0.70105135
 0.0016226117
 0.00013114279
 0.0003866863
 0.0046268953

In [120]:
# Round each predicted probability to 0 (benign) or 1 (malignant)
prediction_rounded = round.(Int, prediction)

187-element Vector{Int64}:
 0
 1
 0
 1
 1
 0
 1
 0
 1
 1
 0
 0
 1
 ⋮
 0
 0
 0
 0
 0
 1
 0
 1
 0
 0
 0
 0
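Rounding a predicted probability to 0 or 1 is the same as thresholding it at 0.5. In a diagnostic setting, a missed malignancy (false negative) is typically more costly than a false alarm, so it can be worth making the cutoff explicit. Below is a minimal sketch. `classify` is a hypothetical helper, and any threshold other than 0.5 would need to be chosen on validation data rather than on the test set. The probabilities are copied from the `prediction` output above.

```julia
# Label a sample malignant (1) when its predicted probability exceeds `threshold`.
# For probabilities in [0, 1], threshold = 0.5 gives the same labels as round().
classify(p::AbstractVector{<:Real}; threshold::Real = 0.5) = Int.(p .> threshold)

p_demo = Float32[0.99954456, 0.9237183, 0.16607077, 0.70105135, 0.0046268953]
classify(p_demo)                    # [1, 1, 0, 1, 0], same as rounding
classify(p_demo; threshold = 0.1)   # [1, 1, 1, 1, 0], flags one more sample as malignant
```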

In [101]:
import Pkg; Pkg.add("MLBase")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m MLBase ─ v0.9.0
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [f0e99cf1] [39m[92m+ MLBase v0.9.0[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.7/Manifest.toml`
 [90m [f0e99cf1] [39m[92m+ MLBase v0.9.0[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39mMLBase
  1 dependency successfully precompiled in 1 seconds (169 already precompiled, 3 skipped during auto due to previous errors)


In [121]:
using MLBase
errorrate(y_test, prediction_rounded)

0.026737967914438502

In [122]:
# confusmat expects labels 1..k, so shift the 0/1 labels to 1/2
MLBase.confusmat(2, y_test .+ 1, prediction_rounded .+ 1)

2×2 Matrix{Int64}:
 120   4
   1  62
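`MLBase.confusmat(k, gts, preds)` expects labels in 1..k (hence the `.+ 1` shift). Entry [i, j] holds the number of samples whose true class is i and whose predicted class is j. With 1 = benign and 2 = malignant, the matrix above reads as follows:

- 120 benign masses were classified correctly.
- 4 benign masses were flagged as malignant (false positives).
- 1 malignant mass was missed (false negative).
- 62 malignant masses were detected.

The stdlib sketch below recomputes the same quantities from 0/1 labels and adds sensitivity and specificity. The helper names are hypothetical.

```julia
# 2×2 confusion matrix from 0/1 labels (0 = benign, 1 = malignant):
# rows are the true class, columns the predicted class.
function confusion2(truth::AbstractVector{<:Integer}, pred::AbstractVector{<:Integer})
    length(truth) == length(pred) || throw(DimensionMismatch("length mismatch"))
    cm = zeros(Int, 2, 2)
    for (t, p) in zip(truth, pred)
        cm[t + 1, p + 1] += 1
    end
    return cm
end

error_rate(cm) = (cm[1, 2] + cm[2, 1]) / sum(cm)     # off-diagonal share
sensitivity(cm) = cm[2, 2] / (cm[2, 1] + cm[2, 2])   # malignant cases detected
specificity(cm) = cm[1, 1] / (cm[1, 1] + cm[1, 2])   # benign cases cleared

cm_reported = [120 4; 1 62]   # matrix reported above
error_rate(cm_reported)       # 5 / 187 ≈ 0.0267, matching errorrate
sensitivity(cm_reported)      # 62 / 63 ≈ 0.984
specificity(cm_reported)      # 120 / 124 ≈ 0.968
```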

In [124]:
# 1000 boosting rounds and deeper trees (max_depth = 10 instead of the default 6)
boost = xgboost(dtrain, 1000; eta = 1, max_depth = 10, objective = "binary:logistic")
prediction_rounded = round.(Int, XGBoost.predict(boost, x_test))

[1]	train-logloss:0.186739
[2]	train-logloss:0.090753
[3]	train-logloss:0.053734
[4]	train-logloss:0.035809
[5]	train-logloss:0.027452
[6]	train-logloss:0.021554
[7]	train-logloss:0.017867
[8]	train-logloss:0.015599
[9]	train-logloss:0.012860
[10]	train-logloss:0.011259
[11]	train-logloss:0.009873
[12]	train-logloss:0.009150
[13]	train-logloss:0.008803
[14]	train-logloss:0.008533
[15]	train-logloss:0.008328
[16]	train-logloss:0.008059
[17]	train-logloss:0.007796
[18]	train-logloss:0.007525
[19]	train-logloss:0.007355
[20]	train-logloss:0.007127
[21]	train-logloss:0.006959
[22]	train-logloss:0.006815
[23]	train-logloss:0.006647
[24]	train-logloss:0.006532
[25]	train-logloss:0.006415
[26]	train-logloss:0.006326
[27]	train-logloss:0.006224
[28]	train-logloss:0.006145
[29]	train-logloss:0.006067
[30]	train-logloss:0.005999
[31]	train-logloss:0.005933
[32]	train-logloss:0.005890
[33]	train-logloss:0.005825
[34]	train-logloss:0.005786
[35]	train-logloss:0.005737
[36]	train-logloss:0.005695
[

187-element Vector{Int64}:
 0
 1
 0
 1
 1
 0
 1
 0
 1
 1
 0
 0
 1
 ⋮
 0
 0
 0
 0
 0
 1
 0
 1
 0
 0
 0
 0

In [125]:
errorrate(y_test, prediction_rounded)

0.026737967914438502

In [126]:
# confusmat expects labels 1..k, so shift the 0/1 labels to 1/2
MLBase.confusmat(2, y_test .+ 1, prediction_rounded .+ 1)

2×2 Matrix{Int64}:
 120   4
   1  62

**Vary hyperparameters**
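The runs below try settings one at a time; a grid makes this systematic. The sketch below uses hypothetical candidate values and only enumerates settings and picks the one with the lowest error. Each error would come from training with the XGBoost.jl 1.x call used above, e.g. `xgboost(dtrain, s.num_round; eta = s.eta, max_depth = s.max_depth, objective = "binary:logistic")`. It would then be scored on a validation split held out from the training data, so the test set stays untouched for the final estimate. `argmin(f, itr)` requires Julia 1.7 or later.

```julia
# Hypothetical grid over eta, max_depth and the number of boosting rounds.
grid = vec([(eta = η, max_depth = d, num_round = n)
            for η in (0.1, 0.3, 1.0), d in (3, 6, 10), n in (100, 1000)])
length(grid)   # 18 candidate settings

# Given (setting, validation_error) pairs, return the setting with the lowest error.
best_setting(results) = first(argmin(last, results))
```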

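Run 1

`xgboost(dtrain, 100; eta = 1, objective = "binary:logistic")`: 100 boosting rounds, default `max_depth = 6`

Error rate: 0.026737967914438502

Confusion matrix (rows = actual, columns = predicted; benign first):

2×2 Matrix{Int64}:

 120   4

   1  62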

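Run 2

`xgboost(dtrain, 1000; eta = 1, max_depth = 10, objective = "binary:logistic")`: 1000 boosting rounds

Error rate: 0.026737967914438502

Confusion matrix (rows = actual, columns = predicted; benign first):

2×2 Matrix{Int64}:

 120   4

   1  62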





<img src="https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-logo-mask.png" height="100" />