<a href="https://colab.research.google.com/github/PadmarajBhat/Rapids.AI/blob/master/rapids_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4.

In [1]:
!nvidia-smi

Fri Jul 19 15:18:27 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P8    16W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [2]:
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)
print(device_name)
if device_name != b'Tesla T4':
  raise Exception("""
    Unfortunately this instance does not have a T4 GPU.
    
    Please make sure you've configured Colab to request a GPU instance type.
    
    Sometimes Colab allocates a Tesla K80 instead of a T4.
    If you get a K80 GPU, reset the instance with Runtime -> Reset all runtimes...
  """)
else:
  print('Woo! You got the right kind of GPU!')

b'Tesla T4'
Woo! You got the right kind of GPU!
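
The check above compares the device name against the bytes literal `b'Tesla T4'`. Depending on the installed pynvml release, `nvmlDeviceGetName` may return a `str` instead of `bytes`, in which case that comparison would fail even on a T4. The following is a minimal sketch of a comparison that tolerates both; the helper names `normalize_gpu_name` and `is_tesla_t4` are hypothetical and not part of pynvml.

```python
def normalize_gpu_name(raw_name):
    """Return the device name as str, whether the caller passed bytes or str."""
    if isinstance(raw_name, bytes):
        return raw_name.decode("utf-8")
    return str(raw_name)


def is_tesla_t4(raw_name):
    """True when the (normalized) device name is exactly 'Tesla T4'."""
    return normalize_gpu_name(raw_name).strip() == "Tesla T4"


# The values below stand in for what pynvml.nvmlDeviceGetName(handle) returns.
print(is_tesla_t4(b"Tesla T4"))
print(is_tesla_t4("Tesla K80"))
```

In the cell above, `if not is_tesla_t4(device_name):` could then replace the direct bytes comparison.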


# Setup #

1. Install the most recent Miniconda release compatible with Google Colab's Python install (3.6.7)
2. Install the RAPIDS libraries
3. Set the necessary environment variables
4. Copy the RAPIDS .so files into the current working directory (a workaround for conda/Colab interactions)

In [0]:
# install miniconda
!wget -c https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

# install RAPIDS packages
!conda install -q -y --prefix /usr/local -c conda-forge \
  -c rapidsai-nightly/label/cuda10.0 -c nvidia/label/cuda10.0 \
  cudf cuml

!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
!conda install -q -y --prefix /usr/local -c nvidia -c rapidsai \
  -c numba -c conda-forge -c defaults nvstrings=0.8 python=3.6 cudatoolkit=10.0

# set environment vars
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# copy .so files to current working dir
for fn in ['libcudf.so', 'librmm.so']:
  shutil.copy('/usr/local/lib/'+fn, os.getcwd())

--2019-07-19 15:18:37--  https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.200.79, 104.18.201.79, 2606:4700::6812:c94f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.200.79|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58468498 (56M) [application/x-sh]
Saving to: ‘Miniconda3-4.5.4-Linux-x86_64.sh’


2019-07-19 15:18:38 (187 MB/s) - ‘Miniconda3-4.5.4-Linux-x86_64.sh’ saved [58468498/58468498]

PREFIX=/usr/local
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-hdf63c60_3 ...
installing: libstdcxx-ng-7.2.0-hdf63c60_3 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-hf484d3e_0 ...
installing: openssl-1.0.2o-h20670df_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7-
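
Since the setup cell runs several steps in sequence, it can help to confirm that steps 3 and 4 took effect before importing RAPIDS. The following is a minimal sketch of such a check, assuming the file names and environment variable names used in the setup cell; the helpers `missing_files` and `missing_env_vars` are hypothetical names introduced here for illustration.

```python
import os


def missing_files(directory, filenames):
    """Return the file names that do not exist as regular files in directory."""
    return [fn for fn in filenames
            if not os.path.isfile(os.path.join(directory, fn))]


def missing_env_vars(names, environ=None):
    """Return the variable names that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Names taken from the setup cell; both lists are empty when setup succeeded.
print("Missing .so files:", missing_files(os.getcwd(), ["libcudf.so", "librmm.so"]))
print("Missing env vars:", missing_env_vars(["NUMBAPRO_NVVM", "NUMBAPRO_LIBDEVICE"]))
```

An empty list from both calls suggests the workaround files and variables are in place; it does not prove that the libraries themselves load correctly.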

# cuDF and cuML Examples #

Now you can run code! 

What follows are basic examples where all processing takes place on the GPU.

# [cuDF](https://github.com/rapidsai/cudf) #

Load a dataset into a GPU memory resident DataFrame and perform a basic calculation.

Everything from CSV parsing to computing a grouped average is done on the GPU.

_Note_: You must import nvstrings and nvcategory before cudf, else you'll get errors.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [9]:
!ls -l "/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv"

-rw------- 1 root root 5493629 Jul 16 23:19 '/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv'


In [10]:
# nvstrings and nvcategory must be imported before cudf
import nvstrings, nvcategory, cudf
tips_df = cudf.read_csv("/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv")
tips_df.head()

<cudf.DataFrame ncols=40 nrows=5 >

In [11]:
print(tips_df.head(), tips_df.shape, tips_df.columns)

   county        city  zipcode                            address  state  rent            latitude ...  HomePrice
0    None     WEXFORD            266 Clematis Dr Allegheny County     PA  2400             40.6182 ...     158051
1    None   WHITEHALL                2310 N 1st Ave Lehigh County     PA   995           40.649906 ...     158051
2    None   WHITEHALL           3338 St Stephens Ln Lehigh County     PA  1740           40.646282 ...     158051
3    None  WAYNESBORO                97 W Main St Franklin County     PA   675  39.756992000000004 ...     158051
4    None  QUAKERTOWN                 200 E Broad St Bucks County     PA  1300  40.441176999999996 ...     158051
[32 more columns] (18203, 40) Index(['county', 'city', 'zipcode', 'address', 'state', 'rent', 'latitude',
       'longitude', 'cemetery_dist_miles', 'nationalhighway_miles',
       'railline_miles', 'starbucks_miles', 'walmart_miles', 'hospital_miles',
       'physician_dist_miles', 'dentist_dist_miles', 'opt_dist_

In [12]:
print(tips_df[tips_df.columns[:-1]])

    county         city  zipcode                            address  state  rent            latitude ...  Crime_Rate
0    None      WEXFORD            266 Clematis Dr Allegheny County     PA  2400             40.6182 ...         2.4
1    None    WHITEHALL                2310 N 1st Ave Lehigh County     PA   995           40.649906 ...         2.4
2    None    WHITEHALL           3338 St Stephens Ln Lehigh County     PA  1740           40.646282 ...         2.4
3    None   WAYNESBORO                97 W Main St Franklin County     PA   675  39.756992000000004 ...         2.4
4    None   QUAKERTOWN                 200 E Broad St Bucks County     PA  1300  40.441176999999996 ...         2.4
5    None   WAYNESBORO           407 Viewpoint Way Franklin County     PA  1025  39.766594000000005 ...         2.4
6    None   WAYNESBORO           403 Viewpoint Way Franklin County     PA  1025  39.766580000000005 ...         2.4
7    None   WAYNESBORO                240 Crown Ct Franklin County     

In [13]:
print(tips_df.groupby('city').HomePrice.mean().reset_index())

              city           HomePrice
0        ABINGTON   165602.2857142857
1        AIRVILLE            158051.0
2           AKRON  170204.57142857142
3  ALBRIGHTSVILLE  111965.84615384616
4        ALBURTIS  194317.18181818182
5       ALIQUIPPA   90321.06666666667
6       ALLENTOWN  144091.08250825084
7    ALLISON PARK  185304.86956521738
8         ALTOONA  141058.81818181818
9          AMBLER           259464.06
[664 more rows]
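
To make explicit what the grouped average computes, here is a plain-Python CPU sketch of the same operation: group rows by one column, then average another column within each group. The rows below are made-up illustrative values, not taken from the dataset, and `grouped_mean` is a hypothetical helper, not a cuDF function.

```python
from collections import defaultdict


def grouped_mean(rows, key, value):
    """Average row[value] per distinct row[key]; result keys are sorted."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in sorted(totals)}


# Hypothetical rows for illustration only.
rows = [
    {"city": "A", "HomePrice": 100.0},
    {"city": "A", "HomePrice": 200.0},
    {"city": "B", "HomePrice": 300.0},
]
print(grouped_mean(rows, "city", "HomePrice"))
```

cuDF performs the equivalent aggregation on the GPU, which is where the benefit appears for large inputs.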


In [14]:
from cuml import SGD

sgd = SGD(eta0=0.1)

result_sgd = sgd.fit(tips_df[tips_df.columns[:-1]], tips_df[tips_df.columns[-1]])

ValueError: ignored

In [15]:
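import numpy as np
#print(filter(tips_df.columns,(tips_df.dtypes == np.float64)))
#print(tips_df.select_dtypes(np.number).dtypes)

# Keep only the float64 columns and replace nulls with 0 so the estimator receives purely numeric input.
tips_numeric = tips_df.select_dtypes(include=np.float64).fillna(0)
print(tips_numeric.dtypes)
# The last remaining column (Crime_Rate) is used as the target; all other columns are features.
result_sgd = sgd.fit(tips_numeric[tips_numeric.columns[:-1]], tips_numeric[tips_numeric.columns[-1]])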

zipcode                  float64
latitude                 float64
longitude                float64
cemetery_dist_miles      float64
nationalhighway_miles    float64
railline_miles           float64
starbucks_miles          float64
walmart_miles            float64
hospital_miles           float64
physician_dist_miles     float64
dentist_dist_miles       float64
opt_dist_miles           float64
vet_dist_miles           float64
farmers_miles            float64
time                     float64
lotsize                  float64
Census_MedianIncome      float64
CollegeGrads             float64
WhiteCollar              float64
Schools                  float64
Unemployment             float64
EmploymentDiversity      float64
Census_Vacancy           float64
Crime_Rate               float64
dtype: object


Next, let us convert to and from pandas. It is not obvious whether the resulting pandas DataFrame lives in local memory or in distributed memory: what happens if the data is too large for a single cluster node?

In [0]:
pdf = tips_numeric.to_pandas()

In [23]:
cudf.from_pandas(pdf)

<cudf.DataFrame ncols=25 nrows=18203 >
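
`to_pandas()` copies the data from GPU memory into an ordinary pandas DataFrame in the host memory of the machine running the notebook, so the frame has to fit there. A back-of-the-envelope estimate for an all-numeric frame is rows × columns × bytes per value. The sketch below applies that to the shape printed above, assuming every column is 8-byte `float64` and ignoring index and object overhead; `estimate_host_bytes` is a hypothetical helper.

```python
def estimate_host_bytes(n_rows, n_cols, itemsize=8):
    """Rough size of the values only: rows x columns x bytes per value."""
    return n_rows * n_cols * itemsize


# Shape taken from the output above: 18203 rows, 25 columns.
size = estimate_host_bytes(18203, 25)
print(f"{size} bytes (~{size / 2**20:.1f} MiB)")
```

At roughly 3.5 MiB this frame is far below any memory limit, but the same arithmetic shows how quickly a wider or longer table would outgrow a single machine.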

Apparently there is one more DataFrame type, `dask_cudf`, intended for distributed DataFrame computing, which answers the previous question. But how do we configure it?

In [2]:
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
!conda install dask

bash: ./Miniconda3-4.5.4-Linux-x86_64.sh: No such file or directory
/bin/bash: conda: command not found


In [24]:
import dask_cudf

ModuleNotFoundError: ignored

See https://docs.dask.org/en/latest/install.html for Dask installation instructions and other documentation.
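
Both failures above happened in a fresh runtime where neither the Miniconda installer nor the previously installed packages were present. One way to avoid a hard `ModuleNotFoundError` is to check whether a module can be found before importing it. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["dask", "dask_cudf"])
if missing:
    print("Not installed in this runtime:", ", ".join(missing))
else:
    print("dask and dask_cudf are importable")
```

If anything is reported missing, the setup cell needs to be re-run in the current runtime before the distributed examples can work.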

# [cuML](https://github.com/rapidsai/cuml) #

This snippet loads a 

As above, all calculations are performed on the GPU.

# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-extended