# Install RAPIDS cuDF

**This will complete in about 3-4 minutes**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ritchieng/deep-learning-wizard/blob/master/docs/machine_learning/gpu/rapids_cudf.ipynb)

## Environment Setup

### Check Version

#### Python Version

In [5]:
# Check Python Version
!python --version

Python 3.10.12


#### Ubuntu Version

In [6]:
# Check Ubuntu Version
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.5 LTS
Release:	20.04
Codename:	focal


#### Check CUDA Version

In [7]:
# Check the CUDA toolkit version and the nvcc location
!nvcc -V && which nvcc

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
/usr/local/cuda/bin/nvcc
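
To use the toolkit version in code rather than reading it off the output, the release line can be parsed. The following is a minimal sketch using only the standard library; `parse_nvcc_release` and `installed_cuda_release` are hypothetical helper names, not part of any NVIDIA tooling.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the CUDA toolkit version as (major, minor), or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH; return None when it is not installed."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


# The release line from the output above parses to (11, 8).
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_release(sample))
```

On a machine without `nvcc`, `installed_cuda_release()` returns `None` instead of raising, which keeps the check safe to run outside Colab.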


#### Check GPU Model and Driver

In [8]:
# Check GPU
!nvidia-smi

Sun Jun 11 17:29:55 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
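
Note that `CUDA Version: 12.0` in the `nvidia-smi` header is the newest CUDA version the driver supports, while `nvcc` reports the installed toolkit (11.8 here), so the two numbers need not match. If only a few fields are needed, `nvidia-smi` can also emit them as CSV. The sketch below assumes the `--query-gpu` fields `name`, `driver_version`, and `memory.total`; the helper names are hypothetical, and the sample row is assembled from the values in the table above rather than copied from a real run.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one CSV row such as 'Tesla T4, 525.85.12, 15360 MiB' into a dict."""
    name, driver, memory = [field.strip() for field in line.split(",")]
    return {"name": name, "driver_version": driver, "memory_total": memory}


def query_gpus():
    """Return one dict per visible GPU, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # nvidia-smi is present but could not talk to a driver or GPU.
        return []
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]


# Illustrative row built from the values shown in the table above.
print(parse_gpu_line("Tesla T4, 525.85.12, 15360 MiB"))
```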

## Setup
This setup script:

1. Checks that the GPU is RAPIDS-compatible
1. Installs the **current stable version** of RAPIDS AI's core libraries using pip, which are:
   1. cuDF
   1. cuML
   1. cuGraph
   1. xgboost

Use the [RAPIDS Conda Colab Template notebook](https://colab.research.google.com/drive/1TAAi_szMfWqRfHVfjGSqnGVLr_ztzUM9) instead if you need either of the following:
- any of the RAPIDS extended libraries, such as cuSpatial, cuSignal, cuxFilter, or cuCIM
- nightly versions of any library
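
The GPU compatibility check in step 1 can be approximated by comparing the device's CUDA compute capability with a minimum. This is only a sketch and not the logic of the actual install script: the threshold `ASSUMED_MIN_COMPUTE_CAPABILITY` is an illustrative assumption (check the RAPIDS release notes for the version you install), and both function names are hypothetical. Reading the capability relies on `pynvml`, which the install script also pulls in.

```python
# ASSUMPTION: illustrative threshold only; the real minimum depends on the
# RAPIDS release being installed.
ASSUMED_MIN_COMPUTE_CAPABILITY = (7, 0)


def is_rapids_compatible(compute_capability, minimum=ASSUMED_MIN_COMPUTE_CAPABILITY):
    """Compare a (major, minor) compute capability against a minimum."""
    return tuple(compute_capability) >= tuple(minimum)


def first_gpu_compute_capability():
    """Read GPU 0's compute capability via pynvml; None if it cannot be read."""
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        return tuple(pynvml.nvmlDeviceGetCudaComputeCapability(handle))
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


# A Tesla T4 has compute capability 7.5, so it passes the assumed threshold.
print(is_rapids_compatible((7, 5)))
```

Keeping the comparison separate from the hardware query means the threshold can be adjusted or tested without a GPU present.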

In [9]:
# This fetches the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.
# Please read the output of this cell. If your Colab instance is not RAPIDS-compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 390, done.
remote: Counting objects: 100% (121/121), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 390 (delta 89), reused 51 (delta 51), pack-reused 269
Receiving objects: 100% (390/390), 107.11 KiB | 17.85 MiB/s, done.
Resolving deltas: 100% (191/191), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pynvml
  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 6.5 MB/s eta 0:00:00
Installing collected packages: pynvml
Successfully installed pynvml-11.5.0
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
We will now install RAPIDS cuDF, cuML, and cuGraph via pip! 
Please stand by, should be quick...
***********************************************************************

Looking in ind
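
Once the install cell finishes, it can be worth confirming that the libraries are importable before moving on. A minimal sketch, assuming the import names `cudf`, `cuml`, `cugraph`, and `xgboost` (taken from the list of core libraries above); `installed_rapids_modules` is a hypothetical helper, and `find_spec` only locates each package without importing it.

```python
import importlib.util


def installed_rapids_modules(names=("cudf", "cuml", "cugraph", "xgboost")):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Any False entry means that library is not importable in this environment.
for module_name, found in installed_rapids_modules().items():
    print(f"{module_name}: {'found' if found else 'missing'}")
```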

# JoinBoost GPU

In [29]:
!git clone https://github.com/zachary62/JoinBoostGPU

Cloning into 'JoinBoostGPU'...
remote: Enumerating objects: 52, done.
remote: Counting objects: 100% (52/52), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 52 (delta 22), reused 35 (delta 9), pack-reused 0
Unpacking objects: 100% (52/52), 19.54 MiB | 9.08 MiB/s, done.


In [30]:
import sys

import cudf

# Make the cloned repository importable
sys.path.append('JoinBoostGPU/')
import joinboostgpu

In [31]:
%%time
# Load the fact table (lineorder) and the four dimension tables onto the GPU
customer = cudf.read_csv("JoinBoostGPU/data/customer.csv")
lineorder_o = cudf.read_csv("JoinBoostGPU/data/lineorder.csv")
date = cudf.read_csv("JoinBoostGPU/data/date.csv")
part = cudf.read_csv("JoinBoostGPU/data/part.csv")
supplier = cudf.read_csv("JoinBoostGPU/data/supplier.csv")

CPU times: user 53 ms, sys: 13 ms, total: 66 ms
Wall time: 69.3 ms


In [32]:
# Dimension tables, keyed by relation name
dim_df = {
    "customer": customer,
    "part": part,
    "date": date,
    "supplier": supplier,
}
# Join key of each dimension table
dim_key = {
    "customer": "CUSTKEY",
    "part": "PARTKEY",
    "date": "DATEKEY",
    "supplier": "SUPPKEY",
}
# Feature columns taken from each dimension table
dim_feature = {
    "customer": ["NAME", "ADDRESS", "CITY"],
    "part": ["NAME", "MFGR", "CATEGORY", "BRAND1"],
    "date": ["DATE", "DAYOFWEEK", "MONTH", "YEAR", "YEARMONTHNUM", "YEARMONTH", "DAYNUMINWEEK"],
    "supplier": ["NAME", "ADDRESS", "CITY", "NATION"],
}

# Index each dimension table by its join key
for relation, key in dim_key.items():
    dim_df[relation].set_index(key, inplace=True)

# Rename the fact table's date column so that it matches the date dimension's key
lineorder_o.rename(columns={"ORDERDATE": "DATEKEY"}, inplace=True)
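
Because the three dictionaries must agree with the actual column names, a quick consistency check can catch a mismatch (such as the `ORDERDATE` versus `DATEKEY` naming handled above) before training. The sketch below is a hypothetical helper, not part of `joinboostgpu`; it works on plain lists of column names, so it runs without a GPU, and the small example schema is made up for illustration.

```python
def validate_star_schema(fact_columns, dim_columns, dim_key, dim_feature):
    """Return a list of problems; an empty list means the schema is consistent."""
    problems = []
    fact = set(fact_columns)
    for relation, key in dim_key.items():
        columns = set(dim_columns.get(relation, []))
        if key not in fact:
            problems.append(f"fact table has no column {key!r} to join with {relation!r}")
        if key not in columns:
            problems.append(f"{relation!r} has no key column {key!r}")
        for feature in dim_feature.get(relation, []):
            if feature not in columns:
                problems.append(f"{relation!r} has no feature column {feature!r}")
    return problems


# Illustrative schema: the fact table still uses ORDERDATE, so the check complains.
example_dim_columns = {"date": ["DATEKEY", "YEAR"]}
example_dim_key = {"date": "DATEKEY"}
example_dim_feature = {"date": ["YEAR"]}
before = validate_star_schema(
    ["CUSTKEY", "ORDERDATE"], example_dim_columns, example_dim_key, example_dim_feature
)
after = validate_star_schema(
    ["CUSTKEY", "DATEKEY"], example_dim_columns, example_dim_key, example_dim_feature
)
print(before)
print(after)
```

With the real tables, the column lists would come from the DataFrames themselves. Since the cell above moves each key into the index, either run the check before `set_index` or add the index name to each dimension's column list.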

In [39]:
%%time
joinboostgpu.train_decision_tree(lineorder_o, dim_df, dim_key, dim_feature)

splitting relation date feature DAYNUMINWEEK value 19
splitting relation part feature CATEGORY value 70
splitting relation part feature BRAND1 value 4
splitting relation customer feature CITY value 950
splitting relation supplier feature ADDRESS value 30
splitting relation date feature YEARMONTHNUM value 5
splitting relation date feature DAYOFWEEK value 971
CPU times: user 761 ms, sys: 27.4 ms, total: 788 ms
Wall time: 780 ms
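
Each `splitting relation ... feature ... value ...` line records one split chosen during training. To see which relations the tree relied on, the log text can be tallied. This is a small sketch that assumes the line format shown above; `parse_splits` is a hypothetical helper, and the split value is kept as a string because its encoding is not documented here.

```python
import re
from collections import Counter

SPLIT_LINE = re.compile(r"^splitting relation (\S+) feature (\S+) value (\S+)$")


def parse_splits(log_text):
    """Return (relation, feature, value) string tuples for each split line."""
    splits = []
    for line in log_text.splitlines():
        match = SPLIT_LINE.match(line.strip())
        if match:
            splits.append(match.groups())
    return splits


# The split lines printed by the training cell above.
log = """\
splitting relation date feature DAYNUMINWEEK value 19
splitting relation part feature CATEGORY value 70
splitting relation part feature BRAND1 value 4
splitting relation customer feature CITY value 950
splitting relation supplier feature ADDRESS value 30
splitting relation date feature YEARMONTHNUM value 5
splitting relation date feature DAYOFWEEK value 971
"""
splits = parse_splits(log)
per_relation = Counter(relation for relation, _, _ in splits)
print(per_relation.most_common())
```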
