# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, or P100.

In [None]:
!nvidia-smi

Sat May 27 11:28:15 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
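
The same check can be done programmatically instead of reading the table by eye. The sketch below is only an illustration: it assumes `nvidia-smi` is on the `PATH`, uses its `--query-gpu=name --format=csv,noheader` options, and treats the three models named above as the supported list. The helper names (`parse_gpu_names`, `is_supported`, `allocated_gpus`) are hypothetical and not part of any RAPIDS tool.

```python
import re
import shutil
import subprocess

# GPU models the text above names as acceptable (taken from this notebook).
SUPPORTED_PATTERN = re.compile(r"\b(T4|P4|P100)\b")


def parse_gpu_names(query_output):
    """Split `nvidia-smi --query-gpu=name` output into one name per line."""
    return [line.strip() for line in query_output.splitlines() if line.strip()]


def is_supported(gpu_name):
    """Return True if the name contains T4, P4 or P100 as a whole token."""
    return SUPPORTED_PATTERN.search(gpu_name) is not None


def allocated_gpus():
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_names(result.stdout) if result.returncode == 0 else []


for name in allocated_gpus():
    verdict = "supported" if is_supported(name) else "not in the list above"
    print(f"{name}: {verdict}")
```

On a machine without a GPU the loop simply prints nothing, so the cell is safe to run before changing the runtime type.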

# Setup #
The setup script:
1. Updates gcc in Colab
1. Installs Conda
1. Installs the current stable version of the RAPIDS libraries, along with some external libraries, including:
  1. cuDF
  1. cuML
  1. cuGraph
  1. cuSpatial
  1. cuSignal
  1. BlazingSQL
  1. xgboost
1. Copies the RAPIDS .so files into the current working directory, a necessary workaround for RAPIDS+Colab integration.


In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 390, done.
remote: Counting objects: 100% (121/121), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 390 (delta 89), reused 51 (delta 51), pack-reused 269
Receiving objects: 100% (390/390), 107.11 KiB | 5.64 MiB/s, done.
Resolving deltas: 100% (191/191), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pynvml
  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.1/53.1 kB 1.2 MB/s eta 0:00:00
Installing collected packages: pynvml
Successfully installed pynvml-11.5.0
***********************************************************************
Woo! Your instance has the right kind of GPU, a Tesla T4!
We will now install RAPIDS via pip!  Please stand by, should be quick...
***********************************************************************



In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

Updating your Colab environment.  This will restart your kernel.  Don't Panic!
Found existing installation: cupy-cuda11x 11.0.0
Uninstalling cupy-cuda11x-11.0.0:
  Successfully uninstalled cupy-cuda11x-11.0.0
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:2 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease [18.1 kB]
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:6 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3,622 B]
Hit:7 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Hit:8 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal InRelease
Hit:9 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu focal InRelease
Get:10 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu focal InRelease [17.5 kB]
Hit:11 http://ppa.launchpad.net/ubuntugis/ppa/

In [None]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.1.0-1/Mambaforge-23.1.0-1-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:14
🔁 Restarting kernel...


In [None]:
# you can now run the rest of the cells as normal
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

Streaming output truncated to the last 5000 lines.

libcudf-23.04.01     | 286.0 MB  | #####7     |  58%
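
Because the install output is long and truncated, it can be hard to tell from the log alone whether the packages actually landed. A minimal sketch of a post-install check, using only the standard library: it asks Python whether each module can be located, without importing it. The package list is an assumption based on the imports used later in this notebook, and `importable` is a hypothetical helper name.

```python
import importlib.util

# Top-level module names this notebook goes on to import (assumed list).
PACKAGES = ("cudf", "cuml", "cupy")


def importable(module_name):
    """Return True if Python can locate the module without importing it."""
    return importlib.util.find_spec(module_name) is not None


status = {name: importable(name) for name in PACKAGES}
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry suggests the install cell did not complete in the current kernel and should be rerun.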

# cuDF and cuML Examples #

Now you can run code! 

What follows are basic examples where all processing takes place on the GPU.

# [cuDF](https://github.com/rapidsai/cudf) #

Load a dataset into a DataFrame that resides in GPU memory and perform a basic calculation.

Everything from CSV parsing to calculating tip percentage and computing a grouped average is done on the GPU.

In [None]:
import cudf
import io, requests

# download CSV file from GitHub
url = "https://github.com/plotly/datasets/raw/master/tips.csv"
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
content = response.content.decode('utf-8')

# read CSV from memory
tips_df = cudf.read_csv(io.StringIO(content))
tips_df['tip_percentage'] = tips_df['tip']/tips_df['total_bill']*100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())
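
To make the calculation itself concrete, here is a CPU-only sketch of the same two steps (tip percentage per row, then the mean per party size) in plain Python. The three rows are made up for illustration and are not taken from `tips.csv`; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; these are not from tips.csv.
SAMPLE_CSV = """total_bill,tip,size
10.00,2.00,2
20.00,3.00,2
40.00,4.00,4
"""


def mean_tip_percentage_by_size(csv_text):
    """Return {party size: mean tip percentage}, mirroring the groupby above."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        percentage = float(row["tip"]) / float(row["total_bill"]) * 100
        size = int(row["size"])
        totals[size] += percentage
        counts[size] += 1
    return {size: totals[size] / counts[size] for size in sorted(totals)}


print(mean_tip_percentage_by_size(SAMPLE_CSV))
```

cuDF performs the equivalent arithmetic column-wise on the GPU rather than row by row.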

# [cuML](https://github.com/rapidsai/cuml) #

This snippet loads a 

As above, all calculations are performed on the GPU.

In [None]:
import cuml



# Next Steps #

For an overview of how you can access and work with your own datasets in Colab, check out [this guide](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92).

For more RAPIDS examples, check out our RAPIDS notebooks repos:
1. https://github.com/rapidsai/notebooks
2. https://github.com/rapidsai/notebooks-contrib

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import cupy as cp


class KNeighborsClassifier:
    def __init__(self, k):
        self.k = k
        self.scaler = StandardScaler()

    def fit(self, X_train, y_train):
        X_train = self.scaler.fit_transform(X_train)
        self.X_train = cp.asarray(X_train)
        self.y_train = cp.asarray(y_train)

    def euclidean_distance(self, X1, X2):
        X1 = cp.asarray(X1)
        X2 = cp.asarray(X2)
        distances = cp.zeros((len(X1), len(X2)))
        for i, x1 in enumerate(X1):
            # Distance from one query row to every row of X2 in a single GPU operation
            distances[i] = cp.sqrt(cp.sum((X2 - x1) ** 2, axis=1))
        return distances

    def predict(self, X):
        X = self.scaler.transform(X)
        X = cp.asarray(X)
        distances = self.euclidean_distance(X, self.X_train)
        # Majority vote among the k nearest training rows for each query row
        predictions = cp.zeros(len(X), dtype=int)
        for i, distance_row in enumerate(distances):
            sorted_indices = cp.argsort(distance_row)
            k_nearest_labels = self.y_train[sorted_indices[:self.k]]
            unique_labels, label_counts = cp.unique(k_nearest_labels, return_counts=True)
            max_count_label = unique_labels[cp.argmax(label_counts)]
            predictions[i] = max_count_label
        return predictions.get()
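
The core of the class above is the vote in `predict`: sort training rows by distance to the query, take the first `k` labels, and return the most common one. A minimal CPU-only sketch of that step, using only the standard library, may make it easier to follow. The function name `knn_predict` and the toy data are hypothetical, and ties are broken differently here (the nearest label encountered first wins) than in the CuPy version.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    # most_common(1) returns the (label, count) pair with the highest count.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled 0 and 1.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (0.3, 0.3), k=3))
print(knn_predict(points, labels, (4.8, 5.0), k=3))
```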

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from cuml.neighbors import KNeighborsClassifier

class FraudDetection:
    def __init__(self, dataset_path, k, test_size=0.2):
        self.k = k
        self.test_size = test_size
        self.dataset_path = dataset_path
        self.model = None
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None

    def load_dataset(self):
        df = pd.read_csv(self.dataset_path)
        df = df.sample(frac=1).reset_index(drop=True)  # Shuffle the dataset

        # Preprocess the features
        X = df.drop(['isFraud'], axis=1)

        # One-hot encode categorical variables
        categorical_cols = ['type']
        X = pd.get_dummies(X, columns=categorical_cols)

        # Coerce every column to numeric; values that cannot be parsed
        # (for example string identifier columns) become NaN
        X = X.apply(pd.to_numeric, errors='coerce')

        # Replace the resulting missing values with 0
        X = X.fillna(0)

        # Standardize the feature matrix
        scaler = StandardScaler()
        X = scaler.fit_transform(X)

        # Preprocess the target variable
        y = df['isFraud'].values

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, test_size=self.test_size, random_state=42)

    def train(self):
        self.model = KNeighborsClassifier(n_neighbors=self.k)
        self.model.fit(self.X_train, self.y_train)

    def test(self):
        y_pred = self.model.predict(self.X_test)
        accuracy = np.sum(y_pred == self.y_test) / len(self.y_test)
        return accuracy


# # Example usage
# dataset_path = 'path/to/dataset.csv'  # Replace with the actual path to the downloaded dataset file
# k = 5  # Number of neighbors

# fraud_detector = FraudDetection(dataset_path, k)
# fraud_detector.load_dataset()
# fraud_detector.train()
# accuracy = fraud_detector.test()
# print("Accuracy:", accuracy)


In [None]:

dataset_path = r"/content/PS_20174392719_1491204439457_log.csv"
k = 5  # Number of neighbors to consider

fraud_detection = FraudDetection(dataset_path, k)
fraud_detection.load_dataset()
fraud_detection.train()
accuracy = fraud_detection.test()
print("Accuracy:", accuracy)

Accuracy: 0.9995825529706434
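
Accuracy is computed in `test()` as the fraction of matching labels. If the positive class is rare, that number can be high even when few positives are caught, so it may be worth looking at precision and recall as well. The sketch below is a plain-Python illustration with made-up labels; the function names are hypothetical and the numbers are not results from this dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the `positive` label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall as a dict (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 98 legitimate records, 2 fraudulent ones,
# and a model that predicts "not fraud" for every record.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(summarize(y_true, y_pred))
```

In this toy case accuracy is 0.98 while recall is 0.0, which is why a single accuracy figure is best read alongside the other two.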


In [2]:
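# Example prediction on a new record.
# FraudDetection as defined above has no predict() method, so this cell
# assumes one has been added that encodes and scales the record the same
# way load_dataset() does before calling the model.
new_record = [1, 30000, 'CASH_OUT', 100000, 0, 0, 10000]

prediction = fraud_detection.predict(new_record)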
if prediction[0] == 0:
    print("The new record is not a fraud.")
else:
    print("The new record is a fraud.")

The new record is a fraud.
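
A raw record such as the one above has to go through the same preprocessing as the training data before the model can score it: the `type` string is one-hot encoded, and every feature is standardized with the statistics learned during training. The following is a minimal sketch of those two steps in plain Python. The category list, the column order, and the helper names are assumptions for illustration; in practice they must match whatever `pd.get_dummies` and `StandardScaler` produced in `load_dataset()`.

```python
# Assumed category list for the 'type' column; verify it against your data.
TYPE_CATEGORIES = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]


def encode_record(numeric_values, type_value, categories=TYPE_CATEGORIES):
    """Return the numeric features followed by a one-hot encoding of the type."""
    if type_value not in categories:
        raise ValueError(f"unknown transaction type: {type_value!r}")
    one_hot = [1.0 if type_value == category else 0.0 for category in categories]
    return [float(value) for value in numeric_values] + one_hot


def standardize(features, means, stds):
    """Apply (x - mean) / std per feature, using 1.0 where std is zero."""
    return [(x - m) / (s or 1.0) for x, m, s in zip(features, means, stds)]


# Hypothetical record: six numeric fields plus the transaction type.
features = encode_record([1, 30000, 100000, 0, 0, 10000], "CASH_OUT")
print(features)

# Placeholder statistics (zero mean, unit variance) leave the values
# unchanged; real values would come from the scaler fitted on training data.
scaled = standardize(features, [0.0] * len(features), [1.0] * len(features))
print(scaled)
```

The scaled list is what would be passed, as a single-row 2D array, to the fitted model's `predict`.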
