<a href="https://colab.research.google.com/github/PadmarajBhat/Rapids.AI/blob/master/rapids_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4.

In [0]:
!nvidia-smi

Sun Jul 21 14:47:30 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P8    16W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [0]:
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)
print(device_name)
if device_name != b'Tesla T4':
  raise Exception("""
    Unfortunately this instance does not have a T4 GPU.
    
    Please make sure you've configured Colab to request a GPU instance type.
    
    Sometimes Colab allocates a Tesla K80 instead of a T4.
    If you get a K80 GPU, try Runtime -> Reset all runtimes...
  """)
else:
  print('Woo! You got the right kind of GPU!')

b'Tesla T4'
Woo! You got the right kind of GPU!
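
The check above compares the device name against the bytes literal `b'Tesla T4'`, which matches what `nvmlDeviceGetName` returned in this run. Some newer pynvml releases may return a `str` instead, in which case that comparison would never match. A minimal sketch of a comparison that accepts both types follows; `is_tesla_t4` is a hypothetical helper for illustration and is not part of pynvml.

```python
def is_tesla_t4(device_name):
    """Return True if an NVML device name refers to a Tesla T4.

    Accepts either bytes or str, since the return type of
    nvmlDeviceGetName can differ between pynvml versions.
    """
    if isinstance(device_name, bytes):
        device_name = device_name.decode("utf-8")
    return device_name.strip() == "Tesla T4"


# Both representations of the same device name are accepted.
print(is_tesla_t4(b"Tesla T4"))
print(is_tesla_t4("Tesla T4"))
print(is_tesla_t4(b"Tesla K80"))
```

In the cell above, this would replace the condition with `if not is_tesla_t4(device_name):`.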


# Setup #

1. Install the most recent Miniconda release compatible with Google Colab's Python install (3.6.7)
2. Install the RAPIDS libraries
3. Set the necessary environment variables
4. Copy the RAPIDS .so files into the current working directory, a workaround for conda/Colab interactions

In [0]:
# install miniconda
!wget -c https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

# install RAPIDS packages
!conda install -q -y --prefix /usr/local -c conda-forge \
  -c rapidsai-nightly/label/cuda10.0 -c nvidia/label/cuda10.0 \
  cudf cuml

!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
!conda install -q -y --prefix /usr/local -c nvidia -c rapidsai \
  -c numba -c conda-forge -c defaults nvstrings=0.8 python=3.6 cudatoolkit=10.0

!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local
!conda install  -q -y --prefix /usr/local dask

# set environment vars
import sys, os, shutil
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'

# copy .so files to current working dir
for fn in ['libcudf.so', 'librmm.so']:
  shutil.copy('/usr/local/lib/'+fn, os.getcwd())

--2019-07-21 14:47:39--  https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58468498 (56M) [application/x-sh]
Saving to: ‘Miniconda3-4.5.4-Linux-x86_64.sh’


2019-07-21 14:47:40 (95.8 MB/s) - ‘Miniconda3-4.5.4-Linux-x86_64.sh’ saved [58468498/58468498]

PREFIX=/usr/local
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-hdf63c60_3 ...
installing: libstdcxx-ng-7.2.0-hdf63c60_3 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-hf484d3e_0 ...
installing: openssl-1.0.2o-h20670df_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7
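
Step 4 of the setup copies `libcudf.so` and `librmm.so` with a bare `shutil.copy`, which raises as soon as one file is missing (for example, if the conda install failed part way). A minimal sketch of a more forgiving copy step follows. `copy_shared_libs` is a hypothetical helper, and the demo runs on temporary directories rather than `/usr/local/lib` so it can be tried anywhere.

```python
import os
import shutil
import tempfile


def copy_shared_libs(names, src_dir, dst_dir):
    """Copy each named file from src_dir to dst_dir.

    Returns the list of names that were not found in src_dir,
    instead of raising on the first missing file.
    """
    missing = []
    for name in names:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy(src, dst_dir)
        else:
            missing.append(name)
    return missing


# Demo on throwaway directories: only libcudf.so exists in the source.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    open(os.path.join(src, "libcudf.so"), "wb").close()
    missing = copy_shared_libs(["libcudf.so", "librmm.so"], src, dst)
    copied = sorted(os.listdir(dst))

print("copied:", copied)
print("missing:", missing)
```

In this notebook the equivalent call would be `copy_shared_libs(['libcudf.so', 'librmm.so'], '/usr/local/lib', os.getcwd())`.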

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
!ls -l "/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv"

-rw------- 1 root root 5493629 Jul 16 23:19 '/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv'


In [0]:
import dask.dataframe as dd
import numpy as np
tips_df = dd.read_csv("/content/drive/My Drive/Colab Notebooks/TrainData_PA.csv", assume_missing=True)
tips_df.head()

Unnamed: 0,county,city,zipcode,address,state,rent,latitude,longitude,cemetery_dist_miles,nationalhighway_miles,railline_miles,starbucks_miles,walmart_miles,hospital_miles,physician_dist_miles,dentist_dist_miles,opt_dist_miles,vet_dist_miles,farmers_miles,time,bed,bath,halfbath,sqft,property_type,garage,yearbuilt,pool,fireplace,patio,lotsize,Census_MedianIncome,CollegeGrads,WhiteCollar,Schools,Unemployment,EmploymentDiversity,Census_Vacancy,Crime_Rate,HomePrice
0,,WEXFORD,,266 Clematis Dr Allegheny County,PA,2400.0,40.6182,-80.0776,1.019586,0.206222,0.629888,1.348776,3.326397,1.584675,0.229126,0.472933,0.651244,7.323725,1.094678,2016.25,3.0,2.0,1.0,2000.0,Condo,1.0,2008.0,0.0,1.0,0.0,4086.388045,54476.09,21.0,66.57,48.3,5.1,3.48,3.42,2.4,158051.0
1,,WHITEHALL,,2310 N 1st Ave Lehigh County,PA,995.0,40.649906,-75.47894,1.019586,0.206222,0.629888,1.348776,3.326397,1.584675,0.229126,0.472933,0.651244,7.323725,1.094678,2016.25,2.0,1.0,1.0,1100.0,Condo,0.0,1935.0,0.0,0.0,0.0,2247.513425,54476.09,21.0,66.57,48.3,5.1,3.48,3.42,2.4,158051.0
2,,WHITEHALL,,3338 St Stephens Ln Lehigh County,PA,1740.0,40.646282,-75.510056,1.019586,0.206222,0.629888,1.348776,3.326397,1.584675,0.229126,0.472933,0.651244,7.323725,1.094678,2015.75,3.0,2.0,1.0,1522.0,Condo,0.0,2006.0,0.0,1.0,1.0,3109.741302,54476.09,21.0,66.57,48.3,5.1,3.48,3.42,2.4,158051.0
3,,WAYNESBORO,,97 W Main St Franklin County,PA,675.0,39.756992,-77.579704,1.019586,0.206222,0.629888,1.348776,3.326397,1.584675,0.229126,0.472933,0.651244,7.323725,1.094678,2016.25,3.0,1.0,1.0,1150.0,Condo,0.0,1960.0,0.0,0.0,0.0,2349.673126,54476.09,21.0,66.57,48.3,5.1,3.48,3.42,2.4,158051.0
4,,QUAKERTOWN,,200 E Broad St Bucks County,PA,1300.0,40.441177,-75.33254,1.019586,0.206222,0.629888,1.348776,3.326397,1.584675,0.229126,0.472933,0.651244,7.323725,1.094678,2016.25,3.0,2.0,1.0,1000.0,SFR,0.0,1960.0,0.0,0.0,0.0,2043.194023,54476.09,21.0,66.57,48.3,5.1,3.48,3.42,2.4,158051.0


See https://docs.dask.org/en/latest/install.html for Dask installation instructions and other documentation.

In [0]:
print(list(tips_df['zipcode'].compute()))

[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 17325.0, 17350.0, 17350.0, 17325.0, 17325.0, 17344.0, 17325.0, 17325.0, 17350.0, 17325.0, 17320.0, 17325.0, 17325.0, 17353.0, 17325.0, 17325.0, 17325.0, 17316.0, 17325.0, 17325.0, 17316.0, 17325.0, 17344.0, 17325.0, 17372.0, 17320.0, 17303.0, 17325.0, 17325.0, 17303.0, 17325.0, 17325.0, 17325.0, 17340.0, 17325.0, 17372.0, 17353.0, 17340.0, 17325.0, 17340.0, 17325.0, 17340.0, 17303.0, 17316.0, 17303.0, 17303.0, 17340.0, 17325.0, 17340.0, 15206.0, 15232.0, 15237.0, 15229.0, 15056.0, 15104.0, 15212.0, 15232.0, 15236.0, 15215.0, 15104.0, 15217.0, 15206.0, 15211.0, 15217.0, 15228.0, 15056.0, 15213.0, 15214.0, 15212.0, 15221.0, 15226.0, 15224.0, 15236.0, 15212.0, 15228.0, 15212.0, 15217.0, 15206.0, 15143.0, 15205.0, 15218.0, 15017.0, 15221.0, 15211.0, 15206.0, 15206.0, 15232.0, 15205.0, 15228.0, 15221.0, 15207.0, 15206.0, 15220.0, 15227.0, 15104.0, 15202.0, 15065

In [0]:
tips_df.npartitions

1

In [0]:
tips_df.divisions

(None, None)

In [0]:
tips_df.columns

Index(['county', 'city', 'zipcode', 'address', 'state', 'rent', 'latitude',
       'longitude', 'cemetery_dist_miles', 'nationalhighway_miles',
       'railline_miles', 'starbucks_miles', 'walmart_miles', 'hospital_miles',
       'physician_dist_miles', 'dentist_dist_miles', 'opt_dist_miles',
       'vet_dist_miles', 'farmers_miles', 'time', 'bed', 'bath', 'halfbath',
       'sqft', 'property_type', 'garage', 'yearbuilt', 'pool', 'fireplace',
       'patio', 'lotsize', 'Census_MedianIncome', 'CollegeGrads',
       'WhiteCollar', 'Schools', 'Unemployment', 'EmploymentDiversity',
       'Census_Vacancy', 'Crime_Rate', 'HomePrice'],
      dtype='object')

In [0]:
tips_df[tips_df.yearbuilt.isna()].compute().shape

(0, 40)

In [0]:
tips_df.set_index("yearbuilt")

Unnamed: 0_level_0,county,city,zipcode,address,state,rent,latitude,longitude,cemetery_dist_miles,nationalhighway_miles,railline_miles,starbucks_miles,walmart_miles,hospital_miles,physician_dist_miles,dentist_dist_miles,opt_dist_miles,vet_dist_miles,farmers_miles,time,bed,bath,halfbath,sqft,property_type,garage,pool,fireplace,patio,lotsize,Census_MedianIncome,CollegeGrads,WhiteCollar,Schools,Unemployment,EmploymentDiversity,Census_Vacancy,Crime_Rate,HomePrice
npartitions=1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1
1800.0,object,object,float64,object,object,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,object,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64
2016.0,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...


In [0]:
tips_df.npartitions

1

In [0]:
tips_df.divisions

(None, None)

In [0]:
tips_df= tips_df.repartition(npartitions=4)

In [0]:
tips_df.npartitions

4

Some Readings:
* https://docs.dask.org/en/latest/why.html
* https://hub.gke.mybinder.org/user/dask-dask-examples-vfc23dy2/lab
* https://www.youtube.com/watch?v=ods97a5Pzw0

Awesome Article: https://docs.dask.org/en/latest/spark.html


The articles above explain how Dask and Spark are related and how PySpark compares with Dask distributed.

The one below, on using Dask with GPUs, shows the speed advantage over Spark:

https://docs.dask.org/en/latest/gpu.html

# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-extended