# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.
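
The same check can be scripted. The block below is a minimal sketch, assuming `nvidia-smi` is on the PATH and accepts `--query-gpu=name --format=csv,noheader`. The names `SUPPORTED_GPUS`, `is_supported_gpu`, and `detect_gpu_names` are hypothetical helpers for illustration and are not part of the RAPIDS tooling.

```python
import shutil
import subprocess

# GPU models named in the text above; helper names here are illustrative only.
SUPPORTED_GPUS = ("T4", "P4", "P100")


def is_supported_gpu(name):
    """Return True if the GPU name contains one of the supported model tokens."""
    tokens = name.replace("-", " ").split()
    return any(model in tokens for model in SUPPORTED_GPUS)


def detect_gpu_names():
    """Return GPU names reported by nvidia-smi, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = detect_gpu_names()
if not names:
    print("No GPU detected; check the runtime type.")
for name in names:
    status = "supported" if is_supported_gpu(name) else "not in the list above"
    print(f"{name}: {status}")
```

Matching on whole tokens rather than substrings keeps a name such as "Tesla P40" from being mistaken for a P4.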

In [None]:
!nvidia-smi

Sun Feb  5 14:39:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P0    30W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Setup #
The setup script:
1. Updates gcc in Colab
1. Installs Conda
1. Installs the current stable version of the RAPIDS libraries, along with some external libraries, including:
  1. cuDF
  1. cuML
  1. cuGraph
  1. cuSpatial
  1. cuSignal
  1. BlazingSQL
  1. xgboost
1. Copies the RAPIDS .so files into the current working directory, a necessary workaround for RAPIDS+Colab integration.


In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 328, done.
remote: Counting objects: 100% (157/157), done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 328 (delta 92), reused 98 (delta 55), pack-reused 171
Receiving objects: 100% (328/328), 94.64 KiB | 13.52 MiB/s, done.
Resolving deltas: 100% (154/154), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pynvml
  Downloading pynvml-11.4.1-py3-none-any.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.0/47.0 KB 7.1 MB/s eta 0:00:00
Installing collected packages: pynvml
Successfully installed pynvml-11.4.1
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
We will now install RAPIDS via pip!  Please stand by, should be quick...
***********************************************************************



In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(0)  # exit the kernel process immediately so that it restarts

Updating your Colab environment.  This will restart your kernel.  Don't Panic!
Get:1 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3,622 B]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ Packages [71.1 kB]
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease [18.1 kB]
Ign:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  InRelease
Hit:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Hit:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release
Hit:11 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Get:12 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:13 http://ppa.launchp

In [None]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:16
🔁 Restarting kernel...


In [None]:
# You can now run the rest of the cells as normal.
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
!python --version

Python 3.8.15


In [None]:
import sys
sys.path

['/usr/local/lib/python3.8/site-packages',
 '/content',
 '/env/python',
 '/usr/lib/python38.zip',
 '/usr/lib/python3.8',
 '/usr/lib/python3.8/lib-dynload',
 '',
 '/usr/local/lib/python3.8/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.8/dist-packages/IPython/extensions',
 '/root/.ipython']

In [None]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os, sys, shutil
sys.path.append('/usr/local/lib/python3.8/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

Found existing installation: cffi 1.15.1
Uninstalling cffi-1.15.1:
  Successfully uninstalled cffi-1.15.1
Found existing installation: cryptography 38.0.4
Uninstalling cryptography-38.0.4:
  Successfully uninstalled cryptography-38.0.4
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cffi==1.15.0
  Downloading cffi-1.15.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (446 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 446.7/446.7 kB 12.2 MB/s eta 0:00:00
Installing collected packages: cffi
Successfully installed cffi-1.15.0
Installing RAPIDS Stable 22.12
Starting the RAPIDS install on Colab.  This will take about 15 minutes.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

In [None]:
import cudf
import cuml
import numpy as np
import pandas as pd
import pickle
from cuml.ensemble import RandomForestClassifier as curfc
from cuml.metrics import accuracy_score
import time

--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy, cupy-cuda11x

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------



In [None]:
train_fn = "/content/drive/MyDrive/Parallel Project/Data/Cancer/cancer_train.csv" #train data path
test_fn = "/content/drive/MyDrive/Parallel Project/Data/Cancer/cancer_test.csv" #test data path

In [None]:
train = cudf.read_csv(train_fn)
test = cudf.read_csv(test_fn)

Marketing Preprocessing (only run when using marketing dataset)

In [None]:
# Encode categorical variables as integer labels
from cuml.preprocessing import LabelEncoder
for p in ["job", "marital", "education", "default", "housing", "loan", "contact", "poutcome", "month"]:
    label_encoder = LabelEncoder()
    # Fit on the training categories only, then apply the same mapping to both sets
    label_encoder.fit(train[p].unique())
    train[p] = label_encoder.transform(train[p])
    test[p] = label_encoder.transform(test[p])
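
Label encoding itself is simple to picture. The block below is a plain-Python sketch of the idea, not cuML's implementation: the functions `fit_label_codes` and `transform_labels` are hypothetical, the toy column values are invented, and assigning codes in sorted order is an assumption borrowed from the usual convention.

```python
def fit_label_codes(values):
    """Map each distinct value to an integer code, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def transform_labels(values, codes):
    """Replace each value by its code; fail loudly on categories not seen in fit."""
    unseen = sorted(set(values) - set(codes))
    if unseen:
        raise ValueError(f"categories not seen during fit: {unseen}")
    return [codes[value] for value in values]


# Toy stand-ins for one categorical column of the train and test sets.
train_marital = ["married", "single", "divorced", "single"]
test_marital = ["single", "married"]

codes = fit_label_codes(train_marital)
train_encoded = transform_labels(train_marital, codes)
test_encoded = transform_labels(test_marital, codes)
print(codes)
print(train_encoded, test_encoded)
```

Because the mapping is fitted on the training column only, a category that appears solely in the test set has no code. It is worth checking for that case before transforming.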

Loan Preprocessing (only run when using loan dataset)

In [None]:
# Encode categorical variables as integer labels
from cuml.preprocessing import LabelEncoder
for p in ["grade", "sub_grade", "initial_list_status"]:
    label_encoder = LabelEncoder()
    # Fit on the training categories only, then apply the same mapping to both sets
    label_encoder.fit(train[p].unique())
    train[p] = label_encoder.transform(train[p])
    test[p] = label_encoder.transform(test[p])

In [None]:
# Cast to float32 to resolve the data type error raised when training on the marketing and cancer sets
t1 = train.astype('float32')

In [None]:
cols = train.columns.values 
predictors = [p for p in cols if p != "y"]
target = "y"

In [None]:
n = 200
split_n = 85
leaf_n = 75
seed = 1995
cuml_model = curfc(random_state = seed, n_estimators = n, min_samples_split = split_n, min_samples_leaf = leaf_n)
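
How the two sample-count limits interact is easy to miss. The block below is a sketch that assumes cuML follows the usual scikit-learn-style meaning of these parameters: a node needs at least `min_samples_split` rows to be considered for splitting, and each child must keep at least `min_samples_leaf` rows. The function `split_allowed` is hypothetical and only illustrates the rule.

```python
def split_allowed(n_node, n_left, min_samples_split, min_samples_leaf):
    """Check whether a node with n_node rows may be split with n_left rows going left."""
    n_right = n_node - n_left
    if n_node < min_samples_split:
        return False  # node too small to be considered for splitting
    # Both children must keep at least min_samples_leaf rows.
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


split_n, leaf_n = 85, 75  # values used in the cell above

print(split_allowed(100, 50, split_n, leaf_n))  # False: each child would have only 50 rows
print(split_allowed(150, 75, split_n, leaf_n))  # True: both children have 75 rows
```

Under that assumption, a node needs at least 2 × 75 = 150 rows before any split is possible, so with these values `min_samples_leaf` is the binding constraint rather than `min_samples_split`.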



In [None]:
start = time.time()
cuml_model.fit(t1[predictors], t1[target])
end = time.time()

In [None]:
print(end-start)

0.4738731384277344
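
When several steps need timing, a small helper avoids repeating the start/end bookkeeping. The block below is a sketch using only the standard library. The `timed` context manager and the placeholder workload are hypothetical, and in the notebook the body of the `with` block would be the `cuml_model.fit(...)` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("fit", timings):
    # Placeholder workload standing in for the model fit.
    total = sum(i * i for i in range(100_000))

print(f"fit took {timings['fit']:.4f} s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short durations.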


In [None]:
%%time
fil_preds_orig = cuml_model.predict(test[predictors])

CPU times: user 448 ms, sys: 230 ms, total: 677 ms
Wall time: 625 ms


In [None]:
print(accuracy_score(test[target], cuml_model.predict(test[predictors])))

0.899584174156189
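
For reference, accuracy is the fraction of predictions that match the true labels. The block below is a plain-Python sketch of that definition, not the `cuml.metrics` implementation. The function name `accuracy` and the toy labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels, invented for illustration only.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```

Accuracy alone can be misleading on imbalanced targets, so it is worth comparing the score against the share of the majority class in the test set.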
