# Gradient Boosting: CPU vs GPU
This is a basic tuturoal which shows how to run gradient boosting on CPU and GPU on Google Colaboratory. It will give you an opportunity to see the speedup that you get from GPU training. The speedup is large even on Tesla K80 that is available in Colaboratory. On newer generations of GPU the speedup will be much bigger.

We will use CatBoost gradient boosting library, which is known for it's good GPU performance.
  
 You could try it out on Colaboratory, just pressing on the following badge:  
 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/catboost/tutorials/blob/master/tools/google_colaboratory_cpu_vs_gpu_tutorial.ipynb) 

## Set GPU as hardware accelerator
First of all, you need to select GPU as hardware accelerator. There are two simple steps to do so:  
Step 1. Navigate to 'Runtime' menu and select 'Change runtime type'  
Step 2. Choose GPU as hardware accelerator.  
That's all!

## Importing CatBoost

Next big thing is to import CatBoost inside environment. Colaboratory has built in libraries installed and most libraries can be installed quickly with a simple *!pip install* command.  
Please ignore the warning message about already imported enum package. Furthermore take note that you need to re-import the library every time you start a new session of Colab.

In [1]:
!pip install catboost

Collecting catboost
[?25l  Downloading https://files.pythonhosted.org/packages/98/03/777a0e1c12571a7f3320a4fa6d5f123dba2dd7c0bca34f4f698a6396eb48/catboost-0.12.2-cp36-none-manylinux1_x86_64.whl (55.5MB)
[K    100% |████████████████████████████████| 55.5MB 968kB/s 
Collecting enum34 (from catboost)
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Installing collected packages: enum34, catboost
Successfully installed catboost-0.12.2 enum34-1.1.6


##Download and prepare dataset
The next step is dataset downloading. GPU training is useful for large datsets. You will get a good speedup starting from 10k objects and the more objects you have, the more will be the speedup.
Because of that reason we have selected a large dataset - [Epsilon](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) (500.000 documents and 2.000 features) for this tutorial.
Firstly, we will get the data through catboost.datasets module. The code below does this. It will run for approximately 10-15 minutes. So please be patient :)

In [0]:
from catboost.datasets import epsilon

train, test = epsilon()

X_train, y_train = train.iloc[:,1:], train[0]
X_test, y_test = train.iloc[:,1:], train[0]

## Training on CPU
Now we will train the model on CPU and measure execution time.
We will use 100 iterations for our CPU training since otherwise it will take a long time.
It will take around 15 minutes.

In [4]:
from catboost import CatBoostClassifier
import timeit

def train_on_cpu():  
  model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.03,
    eval_metric='Accuracy',
  )
  
  model.fit(
      X_train, y_train,
      eval_set=(X_test, y_test),
      verbose=30
  );   
      
cpu_time = timeit.timeit('train_on_cpu()', setup="from __main__ import train_on_cpu", number=1)

print('Time to fit model on CPU: {} sec'.format(int(cpu_time)))

0:	learn: 0.6648425	test: 0.6648425	best: 0.6648425 (0)	total: 7.12s	remaining: 11m 45s
30:	learn: 0.7375125	test: 0.7375125	best: 0.7375125 (30)	total: 3m 34s	remaining: 7m 56s
60:	learn: 0.7686725	test: 0.7686725	best: 0.7686725 (60)	total: 6m 48s	remaining: 4m 21s
90:	learn: 0.7885500	test: 0.7885500	best: 0.7885500 (90)	total: 9m 58s	remaining: 59.2s
99:	learn: 0.7932025	test: 0.7932025	best: 0.7932025 (99)	total: 10m 55s	remaining: 0us

bestTest = 0.7932025
bestIteration = 99

Time to fit model on CPU: 862 sec


Take notice that learning time itself wothout data feeding is around 12 minutes. Whereas all the process consumes 14-15 min.

## Training on GPU
The previous code execution has been done on CPU. It's time to use GPU!  
We need to use '*task_type='GPU'*' parameter value to run GPU training. Now the execution time wouldn't be so big :)  
BTW if Colaboratory shows you a warning 'GPU memory usage is close to the limit', just press 'Ignore'.

In [5]:
def train_on_gpu():  
  model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.03,
    eval_metric='Accuracy',
    task_type='GPU'
  )
  
  model.fit(
      X_train, y_train,
      eval_set=(X_test, y_test),
      verbose=30
  );     
      
gpu_time = timeit.timeit('train_on_gpu()', setup="from __main__ import train_on_gpu", number=1)

print('Time to fit model on GPU: {} sec'.format(int(gpu_time)))
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

0:	learn: 0.6652575	test: 0.6652575	best: 0.6652575 (0)	total: 323ms	remaining: 32s
30:	learn: 0.7377725	test: 0.7377725	best: 0.7377725 (30)	total: 6.44s	remaining: 14.3s
60:	learn: 0.7700775	test: 0.7700775	best: 0.7700775 (60)	total: 12.2s	remaining: 7.81s
90:	learn: 0.7893275	test: 0.7893275	best: 0.7893275 (90)	total: 17.9s	remaining: 1.77s
99:	learn: 0.7937800	test: 0.7937800	best: 0.7937800 (99)	total: 19.5s	remaining: 0us
bestTest = 0.79378
bestIteration = 99
Time to fit model on GPU: 195 sec
GPU speedup over CPU: 4x


As you can see GPU is much faster than CPU on large datasets. It takes just 3-4 mins vs 14-15 mins to fit the model. Moreover learning process consumes just 30 seconds vs 12 minutes! This is a good reason to use GPU instead of CPU!
  
Thank you for attention! 