R 4.0 (Fri, 20 Mar 2020) #1

Open
Laurae2 opened this issue Mar 28, 2020 · 2 comments

Comments

Laurae2 commented Mar 28, 2020

Latest news: https://developer.r-project.org/blosxom.cgi/R-devel/NEWS

To build R-devel 4.0 (snapshots from 20 March 2020 or later), first install libpcre2-dev:

sudo apt-get install libpcre2-dev

Otherwise the build stops with the error: pcre2 library and headers are required
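
A quick way to check for the PCRE2 development files before starting the build is sketched below. It assumes that libpcre2-dev provides the `pcre2-config` helper script. The function name `check_pcre2` is only illustrative.

```shell
# Report whether the PCRE2 development files appear to be installed.
# Assumption: libpcre2-dev ships the pcre2-config helper script.
check_pcre2() {
  if command -v pcre2-config >/dev/null 2>&1; then
    echo "pcre2 found: $(pcre2-config --version)"
  else
    echo "pcre2 missing: run sudo apt-get install libpcre2-dev"
  fi
}

check_pcre2
```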

Laurae2 commented Mar 28, 2020

Use the following to install xgboost (GPU build with CUDA and NCCL, compiled with gcc-7) with R 4.0 and Intel MKL:

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build
cd build
cmake .. -DR_LIB=ON -DUSE_CUDA=ON -DCMAKE_C_COMPILER=/usr/bin/gcc-7 -DCMAKE_CXX_COMPILER=/usr/bin/g++-7 -DUSE_NCCL=ON -DNCCL_ROOT=/usr/lib/x86_64-linux-gnu
make -j 36
make install -I/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/
sudo R CMD INSTALL ./R-package
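
The `-j 36` above is hard-coded. On another machine, the job count could be derived instead. A minimal sketch, assuming `nproc` from GNU coreutils is available (the helper name `job_count` is hypothetical):

```shell
# Print the number of parallel make jobs to use.
# Assumption: nproc (GNU coreutils) exists; fall back to a single job if it does not.
job_count() {
  nproc 2>/dev/null || echo 1
}

# Show the make invocation that would replace "make -j 36".
echo "make -j $(job_count)"
```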

Test case:

library(xgboost)

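set.seed(1)
N <- 500000  # number of rows
p <- 100     # number of features
pp <- 25     # number of features that actually drive the label
X <- matrix(runif(N * p), ncol = p)
betas <- 2 * runif(pp) - 1              # coefficients uniform in [-1, 1]
sel <- sort(sample(p, pp))              # indices of the informative features
m <- X[, sel] %*% betas - 1 + rnorm(N)  # noisy linear predictor
y <- rbinom(N, 1, plogis(m))            # binary labels drawn from the logistic model

tr <- sample.int(N, N * 0.90)           # 90% of the rows are used for training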

# Train once on CPU ("hist") or GPU ("gpu_hist") and return the time spent in xgb.train.
# Building the DMatrix objects is not included in the timing.
trainer <- function(n_cpus, n_gpus, n_iterations) {
  
  dtrain <- xgb.DMatrix(X[tr,], label = y[tr])
  dtest <- xgb.DMatrix(X[-tr,], label = y[-tr])
  wl <- list(test = dtest)
  
  if (n_gpus == 0) {
    
    # CPU path: histogram-based tree construction
    pt <- proc.time()
    model <- xgb.train(list(objective = "reg:logistic", eval_metric = "logloss", subsample = 0.8, nthread = n_cpus, eta = 0.10,
                            max_bin = 64, tree_method = "hist"),
                       dtrain, watchlist = wl, nrounds = n_iterations)
    my_time <- proc.time() - pt
    
  } else {
    
    # GPU path: this xgboost build reports n_gpus as deprecated (see the warning in the results)
    pt <- proc.time()
    model <- xgb.train(list(objective = "reg:logistic", eval_metric = "logloss", subsample = 0.8, nthread = n_cpus, eta = 0.10,
                            max_bin = 64, tree_method = "gpu_hist", n_gpus = n_gpus),
                       dtrain, watchlist = wl, nrounds = n_iterations)
    my_time <- proc.time() - pt
    
  }
  
  # Free the model and DMatrix objects before the next run
  rm(model, dtrain, dtest)
  gc(verbose = FALSE)
  
  return(my_time)
  
}

trainer(1, 1, 50)
trainer(1, 0, 50)

Results:

R Under development (unstable) (2020-03-27 r78091) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(xgboost)
> 
> set.seed(1)
> N <- 500000
> p <- 100
> pp <- 25
> X <- matrix(runif(N * p), ncol = p)
> betas <- 2 * runif(pp) - 1
> sel <- sort(sample(p, pp))
> m <- X[, sel] %*% betas - 1 + rnorm(N)
> y <- rbinom(N, 1, plogis(m))
> 
> tr <- sample.int(N, N * 0.90)
> 
> trainer <- function(n_cpus, n_gpus, n_iterations) {
+   
+   dtrain <- xgb.DMatrix(X[tr,], label = y[tr])
+   dtest <- xgb.DMatrix(X[-tr,], label = y[-tr])
+   wl <- list(test = dtest)
+   
+   if (n_gpus == 0) {
+     
+     pt <- proc.time()
+     model <- xgb.train(list(objective = "reg:logistic", eval_metric = "logloss", subsample = 0.8, nthread = n_cpus, eta = 0.10,
+                             max_bin = 64, tree_method = "hist"),
+                        dtrain, watchlist = wl, nrounds = n_iterations)
+     my_time <- proc.time() - pt
+     
+   } else {
+     
+     pt <- proc.time()
+     model <- xgb.train(list(objective = "reg:logistic", eval_metric = "logloss", subsample = 0.8, nthread = n_cpus, eta = 0.10,
+                             max_bin = 64, tree_method = "gpu_hist", n_gpus = n_gpus),
+                        dtrain, watchlist = wl, nrounds = n_iterations)
+     my_time <- proc.time() - pt
+     
+   }
+   
+   rm(model, dtrain, dtest)
+   gc(verbose = FALSE)
+   
+   return(my_time)
+   
+ }
> 
> trainer(1, 1, 50)
[18:44:17] WARNING: /home/laurae/Downloads/R/xgboost/include/xgboost/generic_parameters.h:36: 
n_gpus: 
	Deprecated. Single process multi-GPU training is no longer supported.
	Please switch to distributed training with one process per GPU.
	This can be done using Dask or Spark.  See documentation for details.
[1]	test-logloss:0.687465 
[2]	test-logloss:0.682720 
[3]	test-logloss:0.678590 
[4]	test-logloss:0.675024 
[5]	test-logloss:0.671988 
[6]	test-logloss:0.669297 
[7]	test-logloss:0.666885 
[8]	test-logloss:0.664741 
[9]	test-logloss:0.662744 
[10]	test-logloss:0.660938 
[11]	test-logloss:0.659223 
[12]	test-logloss:0.657735 
[13]	test-logloss:0.656362 
[14]	test-logloss:0.655067 
[15]	test-logloss:0.653892 
[16]	test-logloss:0.652753 
[17]	test-logloss:0.651746 
[18]	test-logloss:0.650809 
[19]	test-logloss:0.649926 
[20]	test-logloss:0.649096 
[21]	test-logloss:0.648273 
[22]	test-logloss:0.647593 
[23]	test-logloss:0.646822 
[24]	test-logloss:0.646082 
[25]	test-logloss:0.645458 
[26]	test-logloss:0.644867 
[27]	test-logloss:0.644329 
[28]	test-logloss:0.643732 
[29]	test-logloss:0.643188 
[30]	test-logloss:0.642666 
[31]	test-logloss:0.642239 
[32]	test-logloss:0.641737 
[33]	test-logloss:0.641328 
[34]	test-logloss:0.640816 
[35]	test-logloss:0.640435 
[36]	test-logloss:0.640006 
[37]	test-logloss:0.639663 
[38]	test-logloss:0.639272 
[39]	test-logloss:0.638916 
[40]	test-logloss:0.638568 
[41]	test-logloss:0.638268 
[42]	test-logloss:0.637945 
[43]	test-logloss:0.637617 
[44]	test-logloss:0.637404 
[45]	test-logloss:0.637157 
[46]	test-logloss:0.636902 
[47]	test-logloss:0.636716 
[48]	test-logloss:0.636500 
[49]	test-logloss:0.636262 
[50]	test-logloss:0.636045 
   user  system elapsed 
  3.687   0.852   4.666 
> trainer(1, 0, 50)
[1]	test-logloss:0.687480 
[2]	test-logloss:0.682716 
[3]	test-logloss:0.678664 
[4]	test-logloss:0.675134 
[5]	test-logloss:0.672144 
[6]	test-logloss:0.669425 
[7]	test-logloss:0.666906 
[8]	test-logloss:0.664728 
[9]	test-logloss:0.662718 
[10]	test-logloss:0.661065 
[11]	test-logloss:0.659484 
[12]	test-logloss:0.657958 
[13]	test-logloss:0.656504 
[14]	test-logloss:0.655166 
[15]	test-logloss:0.654018 
[16]	test-logloss:0.652931 
[17]	test-logloss:0.651872 
[18]	test-logloss:0.650910 
[19]	test-logloss:0.650024 
[20]	test-logloss:0.649156 
[21]	test-logloss:0.648393 
[22]	test-logloss:0.647713 
[23]	test-logloss:0.646957 
[24]	test-logloss:0.646182 
[25]	test-logloss:0.645562 
[26]	test-logloss:0.644967 
[27]	test-logloss:0.644405 
[28]	test-logloss:0.643904 
[29]	test-logloss:0.643418 
[30]	test-logloss:0.642868 
[31]	test-logloss:0.642303 
[32]	test-logloss:0.641841 
[33]	test-logloss:0.641390 
[34]	test-logloss:0.640920 
[35]	test-logloss:0.640521 
[36]	test-logloss:0.640097 
[37]	test-logloss:0.639677 
[38]	test-logloss:0.639321 
[39]	test-logloss:0.638976 
[40]	test-logloss:0.638593 
[41]	test-logloss:0.638342 
[42]	test-logloss:0.637964 
[43]	test-logloss:0.637667 
[44]	test-logloss:0.637394 
[45]	test-logloss:0.637112 
[46]	test-logloss:0.636879 
[47]	test-logloss:0.636631 
[48]	test-logloss:0.636388 
[49]	test-logloss:0.636186 
[50]	test-logloss:0.635976 
   user  system elapsed 
 19.236   0.167  19.412
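
For reference, the ratio of the two elapsed times above (19.412 s with tree_method = "hist" on one CPU thread, 4.666 s with "gpu_hist") can be worked out as follows. This is a single run on one machine, so treat it as an indication and not as a benchmark. The helper name `speedup` is only illustrative.

```shell
# Elapsed times copied from the two trainer() runs above.
gpu_elapsed=4.666
cpu_elapsed=19.412

# Print the ratio of the first argument to the second, rounded to two decimals.
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

speedup "$cpu_elapsed" "$gpu_elapsed"   # prints 4.16
```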

Laurae2 commented Mar 28, 2020

Use the following to install LightGBM with R 4.0:

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
nano ./R-package/src/install.libs.R
# In nano: change `use_GPU <- FALSE` to `use_GPU <- TRUE`, then save and exit with Ctrl+X, Y, Enter
sudo Rscript build_r.R
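
The manual edit in nano could also be done non-interactively. A minimal sketch, assuming the file contains the exact text `use_GPU <- FALSE` as described in the step above (the helper name `enable_gpu` is hypothetical); run it from the LightGBM directory before `sudo Rscript build_r.R`:

```shell
# Replace "use_GPU <- FALSE" with "use_GPU <- TRUE" in the given file.
# Assumption: the file contains that exact text; otherwise it is left unchanged.
enable_gpu() {
  file="$1"
  tmp="$(mktemp)"
  # Write through a temporary file, then copy back so the original permissions are kept.
  sed 's/use_GPU <- FALSE/use_GPU <- TRUE/' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Apply the change only if the file is present, then show the resulting line.
target=./R-package/src/install.libs.R
if [ -f "$target" ]; then
  enable_gpu "$target"
  grep -n 'use_GPU <-' "$target"
fi
```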

Test case:

library(lightgbm)
library(Matrix)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
train$data[, 1] <- 1:6513
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
valids <- list(test = dtest)

params <- list(objective = "regression",
               metric = "rmse",
               device = "gpu",
               gpu_platform_id = 0,
               gpu_device_id = 0,
               nthread = 1,
               boost_from_average = FALSE,
               max_bin = 32)
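# Train for 2 rounds with device = "gpu" and evaluate on the test set each round
model <- lgb.train(params = params,
                   data = dtrain,
                   nrounds = 2,
                   valids = valids,
                   min_data = 1,
                   learning_rate = 1,
                   early_stopping_rounds = 10)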
