<a href="https://colab.research.google.com/github/MoronSlayer/Deep-Learning-Projects/blob/learner/rapids_colab_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.

In [1]:
!nvidia-smi

Thu Sep 29 22:45:10 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Setup #
The setup steps below:
1. Update gcc in Colab.
1. Install Conda.
1. Install the current stable release of the RAPIDS libraries, along with some external libraries, including:
   1. cuDF
   1. cuML
   1. cuGraph
   1. cuSpatial
   1. cuSignal
   1. BlazingSQL
   1. xgboost
1. Copy the RAPIDS .so files into the current working directory, a necessary workaround for RAPIDS+Colab integration.
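Once the install finishes, one quick way to confirm that the libraries listed above are visible to Python is to look them up with `importlib.util.find_spec`, which locates a module without importing it. This is a minimal sketch: the helper name is hypothetical and the module names (in particular `blazingsql` for BlazingSQL) are assumptions. A module that is found can still fail at import time if its CUDA dependencies are missing, so an actual `import` remains the final test.

```python
import importlib.util

# Assumed top-level module names for the libraries in the setup list.
RAPIDS_MODULES = ["cudf", "cuml", "cugraph", "cuspatial", "cusignal",
                  "blazingsql", "xgboost"]


def check_importable(modules):
    """Return {module_name: True/False} without importing the modules."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_importable(RAPIDS_MODULES).items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```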


In [3]:
!pip install pynvml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pynvml
  Downloading pynvml-11.4.1-py3-none-any.whl (46 kB)
     |████████████████████████████████| 46 kB 3.8 MB/s 
Installing collected packages: pynvml
Successfully installed pynvml-11.4.1
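`pynvml`, the Python binding for NVIDIA's NVML library installed above, can also check the allocated GPU from Python instead of reading the `nvidia-smi` table by eye. This is a minimal sketch: the helper names are hypothetical, and the supported-model list is taken from the sanity-check section above. RAPIDS' own `env-check.py` may apply different rules. If `pynvml` or the NVIDIA driver is unavailable, `list_gpus()` returns an empty list instead of raising.

```python
# Models this notebook expects (from the sanity-check section).
SUPPORTED_GPUS = ("T4", "P4", "P100")


def is_supported_gpu(name):
    """True if the driver-reported GPU name mentions a supported model."""
    # Compare whole tokens so that e.g. "P40" is not mistaken for "P4".
    tokens = name.replace("-", " ").split()
    return any(model in tokens for model in SUPPORTED_GPUS)


def list_gpus():
    """Return [(name, total_memory_MiB), ...], or [] if NVML is unavailable."""
    try:
        import pynvml
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no NVIDIA driver or no GPU attached
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml releases return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append((name, mem.total // 2**20))
        return gpus
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible; check the runtime type.")
    for name, mib in gpus:
        status = "supported" if is_supported_gpu(name) else "not in the expected list"
        print(f"{name} ({mib} MiB): {status}")
```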


In [4]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run only this cell and the next one.
# Please read the output of this cell.  If your Colab instance is not RAPIDS-compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
***********************************************************************



In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

In [1]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:38
🔁 Restarting kernel...


In [1]:
# you can now run the rest of the cells as normal
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

# RAPIDS is now installed on Colab.  You can copy your code into the cells below.  Enjoy!

# KNN using RAPIDS 
> Based on the cuML demo notebook: https://github.com/rapidsai/cuml/blob/branch-0.15/notebooks/nearest_neighbors_demo.ipynb

In [3]:
import cudf
import numpy as np
from cuml.datasets import make_blobs
from cuml.neighbors import NearestNeighbors as cuNearestNeighbors
from sklearn.neighbors import NearestNeighbors as skNearestNeighbors

### Defining parameters

In [4]:
n_samples = 2**17
n_features = 40

n_query = 2**13
n_neighbors = 4
random_state = 0
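To put these parameters in perspective, the arithmetic below estimates the size of the benchmark. It assumes float32 data (4 bytes per value; treat the dtype as an assumption) and an exact brute-force search, in which every one of the `n_query` rows is compared with every one of the `n_samples` rows. A full query-by-training distance matrix would then take 4 GiB, which is why brute-force implementations generally process it in tiles rather than materializing it at once.

```python
# Back-of-the-envelope size of the k-NN benchmark (float32 assumed).
n_samples = 2**17     # 131,072 training rows
n_features = 40
n_query = 2**13       # 8,192 query rows
bytes_per_value = 4   # float32 (assumption)

train_bytes = n_samples * n_features * bytes_per_value
pairwise_distances = n_query * n_samples            # one distance per pair
multiply_adds = pairwise_distances * n_features     # work per distance ~ n_features
dist_matrix_bytes = pairwise_distances * bytes_per_value

print(f"training matrix:        {train_bytes / 2**20:.0f} MiB")
print(f"query/train pairs:      {pairwise_distances:,}")
print(f"multiply-adds (approx): {multiply_adds:.2e}")
print(f"full distance matrix:   {dist_matrix_bytes / 2**30:.0f} GiB")
```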

### Generating data

In [5]:
%%time
device_data, _ = make_blobs(n_samples=n_samples,
                            n_features=n_features,
                            centers=5,
                            random_state=random_state)

device_data = cudf.DataFrame(device_data)

CPU times: user 5.03 s, sys: 979 ms, total: 6.01 s
Wall time: 7.99 s


### cuML model

In [7]:
%%time
knn_cuml = cuNearestNeighbors()
knn_cuml.fit(device_data)

CPU times: user 243 ms, sys: 1.91 ms, total: 245 ms
Wall time: 249 ms


NearestNeighbors()

In [8]:
%%time
D_cuml, I_cuml = knn_cuml.kneighbors(device_data[:n_query], n_neighbors)

CPU times: user 1.12 s, sys: 485 ms, total: 1.61 s
Wall time: 3.16 s


In [9]:
# Copy dataset from GPU memory to host memory.
# This is done to later compare CPU and GPU results.
host_data = device_data.to_pandas()

### Scikit-learn model

In [10]:
%%time
knn_sk = skNearestNeighbors(algorithm="brute",
                            n_jobs=-1)
knn_sk.fit(host_data)

CPU times: user 9.55 ms, sys: 866 µs, total: 10.4 ms
Wall time: 27.4 ms


NearestNeighbors(algorithm='brute', n_jobs=-1)
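`algorithm="brute"` makes scikit-learn skip its tree-based indexes (KD-tree, ball tree) and compute the distance from every query row to every training row. This is also why `fit()` above takes only milliseconds: with brute force there is no index to build. The NumPy sketch below illustrates the idea on the CPU. It is not the scikit-learn or cuML implementation, and the function name is hypothetical. It expands the squared Euclidean distance as `||q||^2 - 2 q.t + ||t||^2`, so most of the work is a single matrix multiplication, the kind of dense linear algebra GPUs handle well.

```python
import numpy as np


def brute_force_kneighbors(train, query, k):
    """Exact k-NN by brute force.

    Returns (distances, indices), each shaped (n_query, k) and sorted by
    increasing Euclidean distance. Requires k <= len(train).
    """
    train = np.asarray(train, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # ||q - t||^2 = ||q||^2 - 2 q.t + ||t||^2, computed for all pairs at once.
    sq_dist = (
        np.sum(query**2, axis=1)[:, None]
        - 2.0 * query @ train.T
        + np.sum(train**2, axis=1)[None, :]
    )
    np.maximum(sq_dist, 0.0, out=sq_dist)  # clip tiny negatives from rounding
    # argpartition finds the k smallest per row without a full sort...
    idx = np.argpartition(sq_dist, kth=k - 1, axis=1)[:, :k]
    # ...then only those k candidates are sorted.
    order = np.argsort(np.take_along_axis(sq_dist, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    dist = np.sqrt(np.take_along_axis(sq_dist, idx, axis=1))
    return dist, idx
```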

In [11]:
%%time
D_sk, I_sk = knn_sk.kneighbors(host_data[:n_query], n_neighbors)

CPU times: user 56.8 s, sys: 8.26 s, total: 1min 5s
Wall time: 47.2 s
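The comment before `to_pandas()` says the data is copied to host memory to compare CPU and GPU results, but the comparison itself is never run. Below is a minimal sketch of one. It assumes `D_cuml`/`I_cuml` may come back as cuDF objects (cuML typically mirrors the input type) and `D_sk`/`I_sk` are NumPy arrays. Distances are compared with a tolerance because different implementations round differently. Indices are reported as a match fraction because near-ties can legitimately reorder neighbors. The helper names and the `1e-3` tolerance are illustrative choices. If your cuML version reports squared Euclidean distances, compare against `D_sk**2` instead.

```python
import numpy as np


def to_numpy(arr):
    """Best-effort conversion of cuDF/pandas/CuPy/NumPy outputs to NumPy."""
    if hasattr(arr, "to_pandas"):   # cuDF DataFrame/Series -> pandas (host)
        arr = arr.to_pandas()
    if hasattr(arr, "to_numpy"):    # pandas DataFrame/Series
        return arr.to_numpy()
    if hasattr(arr, "get"):         # CuPy arrays copy to host with .get()
        return arr.get()
    return np.asarray(arr)


def compare_knn(D_ref, I_ref, D_test, I_test, atol=1e-3):
    """Return (distances_match, fraction_of_identical_indices)."""
    D_ref, D_test = to_numpy(D_ref), to_numpy(D_test)
    I_ref, I_test = to_numpy(I_ref), to_numpy(I_test)
    distances_match = bool(np.allclose(D_ref, D_test, atol=atol))
    index_agreement = float(np.mean(I_ref == I_test))
    return distances_match, index_agreement


# Usage in this notebook (after both kneighbors cells have run):
# ok, agree = compare_knn(D_sk, I_sk, D_cuml, I_cuml)
# print("distances match:", ok, "| identical indices:", f"{agree:.2%}")
```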




---

In [12]:
from sklearn import datasets
X, y = datasets.make_classification(n_samples=40000)

In [13]:
X = X.astype(np.float32)
y = y.astype(np.float32)
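Casting to float32 halves the memory of the arrays that get copied to the GPU, and GPUs such as the Tesla T4 have far higher float32 than float64 arithmetic throughput. The snippet below only measures the memory difference, for an array shaped like the one `make_classification(n_samples=40000)` returns with its default of 20 features. It uses zeros as stand-in data.

```python
import numpy as np

# Stand-in array with the shape of make_classification(n_samples=40000).
X64 = np.zeros((40000, 20), dtype=np.float64)
X32 = X64.astype(np.float32)

print(f"{X64.nbytes / 2**20:.2f} MiB as float64")
print(f"{X32.nbytes / 2**20:.2f} MiB as float32")
```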

In [14]:
def train_data(model, X=X, y=y):
    """Fit `model` on the benchmark data (the X/y defaults are bound when this function is defined)."""
    model.fit(X, y)

In [15]:
from sklearn.svm import SVC 
from cuml.svm import SVC as SVC_gpu

clf_svc = SVC(kernel='poly', degree=2, gamma='auto', C=1)
sklearn_time_svc = %timeit -o train_data(clf_svc)

clf_svc = SVC_gpu(kernel='poly', degree=2, gamma='auto', C=1)
cuml_time_svc = %timeit -o train_data(clf_svc)

# clf_svc now holds the cuML model, so take each class name from its own import.
print(f"Average time of sklearn's {SVC.__name__}", sklearn_time_svc.average, 's')
print(f"Average time of cuml's {SVC_gpu.__name__}", cuml_time_svc.average, 's')

print('Ratio between sklearn and cuml is', sklearn_time_svc.average/cuml_time_svc.average)

1min 18s ± 3.58 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.48 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Average time of sklearn's SVC 78.65281007214298 s
Average time of cuml's SVC 2.4763528024286643 s
Ratio between sklearn and cuml is 31.761552713734826
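`%timeit -o` is IPython-only. Outside a notebook, roughly the same measurement can be made with the standard library's `timeit.repeat`, as in the sketch below (helper names are hypothetical). The untimed warm-up call is an added assumption that keeps one-off costs such as GPU context creation out of the numbers. `%timeit` has no such step, but its seven repeated runs dilute them. The ratio printed above is simply the CPU average divided by the GPU average: 78.65 s / 2.48 s ≈ 31.8.

```python
import statistics
import timeit


def time_fit(model, X, y, repeat=7, warmup=True):
    """Time model.fit(X, y), one call per run, `repeat` runs (like `%timeit -r7 -n1`).

    Returns (mean_seconds, stdev_seconds) over the timed runs.
    """
    if warmup:
        model.fit(X, y)  # untimed: excludes one-off initialization costs
    runs = timeit.repeat(lambda: model.fit(X, y), repeat=repeat, number=1)
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs) if len(runs) > 1 else 0.0
    return mean, stdev


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU timing is than the CPU timing."""
    return cpu_seconds / gpu_seconds


# Recomputes the ratio from the two averages reported above.
print(speedup(78.65281007214298, 2.4763528024286643))
```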


In [16]:
!pip install cutecharts

import cutecharts.charts as ctc

def plot(sklearn_time, cuml_time):
    """Bar chart of the mean times from two `%timeit -o` results."""
    chart = ctc.Bar('Sklearn vs cuml')
    chart.set_options(
        labels=['sklearn', 'cuml'],
        x_label='library',
        y_label='time (s)',
    )
    # Round to 2 decimals so the bar labels stay readable.
    chart.add_series('time', data=[round(sklearn_time.average, 2), round(cuml_time.average, 2)])
    return chart

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cutecharts
  Downloading cutecharts-1.2.0-py3-none-any.whl (17 kB)
Installing collected packages: cutecharts
Successfully installed cutecharts-1.2.0

In [20]:
plot(sklearn_time_svc, cuml_time_svc).render_notebook()

In [21]:
from cuml.neighbors import (KNeighborsClassifier as KNeighborsClassifier_gpu,
                            KNeighborsRegressor as KNeighborsRegressor_gpu)
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor


In [25]:
clf_nn = KNeighborsClassifier(n_neighbors=10)
sklearn_time_nn = %timeit -o train_data(clf_nn)


3.22 ms ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [26]:
clf_nn = KNeighborsClassifier_gpu(n_neighbors=10)
cuml_time_nn = %timeit -o train_data(clf_nn)

2.47 ms ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [27]:
# clf_nn now holds the cuML model, so take each class name from its own import.
print(f"Average time of sklearn's {KNeighborsClassifier.__name__}", sklearn_time_nn.average, 's')
print(f"Average time of cuml's {KNeighborsClassifier_gpu.__name__}", cuml_time_nn.average, 's')

print('Ratio between sklearn and cuml is', sklearn_time_nn.average/cuml_time_nn.average)

Average time of sklearn's KNeighborsClassifier 0.003222532555714679 s
Average time of cuml's KNeighborsClassifier 0.002467951844285251 s
Ratio between sklearn and cuml is 1.3057517970525732
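For k-nearest neighbors, `fit()` does little more than store (or index) the training data. The neighbor search happens in `predict()`. The millisecond fit timings above, and their 1.3x ratio, therefore say little about search speed. Below is a sketch of also timing prediction. It assumes any estimator with scikit-learn-style `fit`/`predict` methods, an interface both `KNeighborsClassifier` classes used here follow. The helper name is hypothetical. It reports the best of several runs, which is less sensitive to background noise than the mean.

```python
import time


def time_fit_predict(model, X_train, y_train, X_query, repeat=3):
    """Return (best_fit_seconds, best_predict_seconds) over `repeat` runs."""
    best_fit = best_predict = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        model.fit(X_train, y_train)
        t1 = time.perf_counter()
        model.predict(X_query)  # the actual neighbor search
        t2 = time.perf_counter()
        best_fit = min(best_fit, t1 - t0)
        best_predict = min(best_predict, t2 - t1)
    return best_fit, best_predict


# Possible usage in this notebook (not run here):
# time_fit_predict(KNeighborsClassifier(n_neighbors=10), X, y, X[:5000])
# time_fit_predict(KNeighborsClassifier_gpu(n_neighbors=10), X, y, X[:5000])
```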


In [28]:
plot(sklearn_time_nn, cuml_time_nn).render_notebook()

# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-contrib