Update build instructions, improve memory usage (#1811)

dmlc · Nov 25, 2016 · be2f28e
1 parent 80c8515

Showing 10 changed files with 653 additions and 573 deletions.
diff --git a/plugin/updater_gpu/README.md b/plugin/updater_gpu/README.md
@@ -3,29 +3,62 @@
 ## Usage
 Specify the updater parameter as 'grow_gpu'. 
 
+This plugin currently works with the CLI and Python versions of xgboost.
+
 Python example:
 ```python
 param['updater'] = 'grow_gpu'
 ```
 
+## Memory usage
+Device memory usage can be calculated as approximately:
+```
+bytes = (10 x n_rows) + (44 x n_rows x n_columns x column_density)
+```
+Data is stored in a sparse format. For example, missing values produced by one-hot encoding are not stored. If a one-hot encoding separates a categorical variable into 5 columns, the column_density of these columns is 1/5 = 0.2.
+
+A 4GB graphics card will process approximately 3.5 million rows of the well-known Kaggle Higgs dataset.
+
+The algorithm will automatically perform row subsampling if it detects there is not enough memory on the device.
+
 ## Dependencies
 A CUDA-capable GPU with compute capability 3.5 or higher (the algorithm depends on shuffle and vote instructions introduced in Kepler).
 
+Building the plugin requires CUDA Toolkit 7.5 or later.
+
 The plugin also depends on CUB 1.5.4 - http://nvlabs.github.io/cub/index.html.
 
 CUB is a header-only CUDA library which provides sort/reduce/scan primitives.
 
 
 ## Build
-The plugin can be built using cmake and specifying the option PLUGIN_UPDATER_GPU=ON.
+To use the plugin, xgboost must be built with cmake, specifying the option PLUGIN_UPDATER_GPU=ON. The location of the CUB library must also be specified with the cmake variable CUB_DIRECTORY. CMake will prepare a build system appropriate for your platform.
 
-Specify the location of the CUB library with the cmake variable CUB_DIRECTORY.
+From the command line on Windows or Linux, starting from the xgboost directory:
 
-It is recommended to build with Cuda Toolkit 7.5 or greater.
+```bash
+$ mkdir build
+$ cd build
+$ cmake .. -DPLUGIN_UPDATER_GPU=ON -DCUB_DIRECTORY=<MY_CUB_DIRECTORY>
+```
+
+On Windows you may also need to specify your generator as 64-bit, so the cmake command becomes:
+```bash
+$ cmake .. -G"Visual Studio 12 2013 Win64" -DPLUGIN_UPDATER_GPU=ON -DCUB_DIRECTORY=<MY_CUB_DIRECTORY>
+```
+You may also be able to use a later version of Visual Studio, depending on whether the CUDA toolkit supports it.
+
+On Linux, cmake will generate a Makefile in the build directory. Invoking the command 'make' from this directory will build the project. If the build fails, try invoking make again; the order in which items are built can sometimes cause problems.
+
+On Windows, cmake will generate an xgboost.sln solution file in the build directory. Build this solution in release mode. This is also a good time to check that it is being built as x64. If not, make sure the cmake generator is set correctly.
+
+The build process generates an xgboost library and executable as normal, but with the GPU tree construction algorithm included.
 
 ## Author
 Rory Mitchell 
 
 Report any bugs to r.a.mitchell.nz at google mail.
 
 
+
+
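A note on the memory estimate added to the README in this commit: the formula can be sketched in a few lines of Python to check whether a dataset is likely to fit on a device. The function name `estimate_device_bytes` and the example figures below are hypothetical and not part of the plugin; the constants 10 and 44 are taken directly from the README text.

```python
def estimate_device_bytes(n_rows, n_columns, column_density=1.0):
    """Approximate device memory in bytes, following the README formula.

    Hypothetical helper for illustration only; not part of xgboost.
    """
    # Fixed cost per row, independent of the number of columns.
    per_row = 10 * n_rows
    # Cost per stored (non-missing) entry of the sparse data.
    per_entry = 44 * n_rows * n_columns * column_density
    return int(round(per_row + per_entry))


# Made-up example: 1,000,000 rows and 10 columns produced by one-hot
# encoding two 5-level categorical variables, so column_density = 1/5.
example_bytes = estimate_device_bytes(1000000, 10, column_density=0.2)
print(example_bytes)
```

The README describes this figure as approximate. If the data does not fit, the algorithm is stated to fall back to row subsampling rather than fail.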
diff --git a/plugin/updater_gpu/speed_test.py b/plugin/updater_gpu/speed_test.py
@@ -4,7 +4,6 @@
 import numpy as np
 import xgboost as xgb
 import time
-test_size = 550000
 
 # path to where the data lies
 dpath = '../../demo/data'
@@ -13,6 +12,9 @@
 dtrain = np.loadtxt( dpath+'/training.csv', delimiter=',', skiprows=1, converters={32: lambda x:int(x=='s') } )
 dtrain = np.concatenate((dtrain, np.copy(dtrain)))
 dtrain = np.concatenate((dtrain, np.copy(dtrain)))
+dtrain = np.concatenate((dtrain, np.copy(dtrain)))
+test_size = len(dtrain)
+
 print(len(dtrain))
 print ('finish loading from csv ')
 
@@ -37,10 +39,9 @@
 # scale weight of positive examples
 param['scale_pos_weight'] = sum_wneg/sum_wpos
 param['bst:eta'] = 0.1
-param['max_depth'] = 16
+param['max_depth'] = 15
 param['eval_metric'] = 'auc'
-param['silent'] = 1
-param['nthread'] = 4
+param['nthread'] = 16
 
 plst = list(param.items())+[('eval_metric', 'ams@0.15')]