# Chapter 8: Machine Learning using cuML

<img src="images/chapter-08/rapids_logo.png" style="width:600px;"/>

As part of the NVIDIA RAPIDS suite, cuML accelerates the end-to-end machine learning pipeline, from data preprocessing to model training and evaluation, by using the parallel processing capabilities of NVIDIA GPUs.

Like other RAPIDS libraries, cuML strives to mimic the behavior of its CPU counterpart in the Python scientific computing ecosystem, in this case scikit-learn. Because cuML matches the scikit-learn API, users who already know scikit-learn's syntax and functionality can move to cuML for GPU acceleration with few code changes.


## cuML Basics

cuML implements a suite of machine learning algorithms behind an easy-to-use, scikit-learn-like interface and runs them on GPUs, which can improve performance dramatically on large workloads.

### Advantages of Using cuML

cuML offers several advantages that make it an attractive choice for data scientists looking to accelerate their machine learning workflows:
- **Speed:** By leveraging GPU acceleration, cuML can significantly reduce the time required to train models and make predictions.
- **Scalability:** It’s designed to scale from a single GPU to multi-GPU setups, enabling the processing of large datasets more efficiently.
- **Ease of Use:** cuML’s API mirrors that of scikit-learn, making it accessible to those already familiar with the popular Python library for machine learning.
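
Because the two APIs line up, switching libraries is often little more than changing an import. The sketch below shows one possible way to pick a backend at run time. The helper name `pick_backend` is hypothetical (it is not part of cuML or scikit-learn), and the sketch assumes both packages expose `LogisticRegression` under `<package>.linear_model`, as the examples in this chapter do.

```python
import importlib
import importlib.util


def pick_backend(preferred=("cuml", "sklearn")):
    """Return the name of the first importable package in `preferred`, or None.

    Hypothetical helper for illustration; not part of cuML or scikit-learn.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is None:
            continue  # package is not installed
        try:
            importlib.import_module(name)  # the import itself can still fail
        except Exception:
            continue
        return name
    return None


backend = pick_backend()
if backend is None:
    print("Neither cuML nor scikit-learn is installed.")
else:
    # Both packages provide LogisticRegression in their linear_model module
    linear_model = importlib.import_module(f"{backend}.linear_model")
    model = linear_model.LogisticRegression()
    print(f"Using {backend}: {type(model).__name__}")
```

The rest of a training script can then call `model.fit(...)` and `model.predict(...)` without caring which library supplied the estimator.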


### When to Use cuML
cuML is worth considering if you encounter any of these scenarios:
- Large datasets slowing down your computations
- Performance-critical machine learning applications
- Desire to tap into the raw power of GPU processing


### Use Cases
- **Big Data Analytics:** Ideal for applications requiring the processing of large volumes of data, such as financial analysis or real-time analytics.
- **Deep Learning Preprocessing:** Use cuML for preprocessing steps in machine learning workflows, significantly reducing bottlenecks when training deep learning models.
- **Time Series Forecasting:** Speed up training on time series models that involve massive datasets.
  

### Shortcomings of cuML

- **GPU Requirement:** cuML is designed to run on NVIDIA GPUs, which means you need access to compatible hardware. For those without NVIDIA GPUs, cuML is not an option, limiting its accessibility compared to CPU-based libraries like scikit-learn.

- **CUDA Dependency:** The library depends on CUDA, NVIDIA’s parallel computing platform and programming model. This means users must have a compatible CUDA version installed, which can introduce compatibility issues and additional setup complexity.

- **GPU Memory Limitations:** The performance and scalability of cuML algorithms are directly tied to the GPU’s memory capacity. For very large datasets, this could become a bottleneck, as the entire dataset and intermediate computations need to fit into GPU memory, which is typically more limited than system RAM.

- **Limited Algorithm Selection:** While cuML offers a range of commonly used machine learning algorithms, its selection is not as comprehensive as scikit-learn’s. Certain niche or very new algorithms might not be available, which could be a limitation for some projects.

- **Scaling Challenges:** While cuML supports multi-GPU configurations for some algorithms, scaling out to multiple GPUs can introduce additional complexity in terms of setup and code. Managing data distribution and aggregation across GPUs can be challenging, particularly for algorithms that are not inherently designed for distributed computing.

- **Integration with Other Libraries:** Data scientists often use a wide range of tools and libraries in their workflow. cuML’s integration with other Python libraries is generally good, especially within the RAPIDS ecosystem, but there can be challenges when integrating with libraries that are not GPU-aware, requiring additional data transfers between CPU and GPU memory.

- **Ecosystem Compatibility:** Projects deeply integrated with other machine learning and data processing frameworks may encounter challenges incorporating cuML, especially if those frameworks do not natively support GPU acceleration or have specific dependencies on CPU-based algorithms.

- **Familiarity with GPU Computing:** To fully leverage cuML and troubleshoot any issues that arise, users may need a basic understanding of GPU computing principles, which can be a learning curve for those only familiar with CPU-based computing.

- **Documentation and Community Support:** While the RAPIDS ecosystem is growing, the documentation and community support for cuML might not be as extensive or mature as for more established libraries like scikit-learn. This can make solving specific problems or understanding advanced features more challenging.
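
To make the GPU memory limitation above more concrete, a rough size check before loading data can help. The sketch below counts only the raw feature matrix. The function names and the 3x headroom factor for copies and intermediate buffers are illustrative assumptions, not cuML guidance, and real memory use depends on the algorithm.

```python
def estimate_dataset_gib(n_rows, n_cols, bytes_per_value=4):
    """Raw size of a dense n_rows x n_cols array in GiB (float32 by default)."""
    return n_rows * n_cols * bytes_per_value / 2**30


def likely_fits(n_rows, n_cols, gpu_mem_gib, bytes_per_value=4, headroom=3.0):
    """Rough check: dataset size times an assumed headroom factor vs. GPU memory.

    `headroom` is an illustrative guess for copies and intermediate buffers.
    """
    needed = estimate_dataset_gib(n_rows, n_cols, bytes_per_value) * headroom
    return needed <= gpu_mem_gib


# 100 million rows x 20 float32 features is about 7.45 GiB of raw data
size = estimate_dataset_gib(100_000_000, 20)
print(f"Raw size: {size:.2f} GiB")
print("Fits on a 16 GiB GPU:", likely_fits(100_000_000, 20, gpu_mem_gib=16))
print("Fits on an 80 GiB GPU:", likely_fits(100_000_000, 20, gpu_mem_gib=80))
```

Passing `bytes_per_value=8` models float64 data, which doubles the estimate; this is one reason to cast to float32 where precision allows.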

## Advanced Features of cuML

### Multi-GPU Support 

cuML supports multi-GPU setups, allowing you to scale your computations further. This is especially useful for extremely large datasets or complex models that benefit from distributed processing.
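
As a sketch of what multi-GPU usage can look like, cuML's multi-GPU estimators live under `cuml.dask` and run on a Dask cluster with one worker per GPU. The example assumes that `dask_cuda` and `dask.distributed` are installed alongside cuML and that at least one NVIDIA GPU is visible. The function name and the dataset sizes are illustrative.

```python
def run_multi_gpu_kmeans(n_samples=1_000_000, n_features=10, n_clusters=5):
    """Fit K-Means across all visible GPUs using Dask (illustrative sketch)."""
    # Imports are kept inside the function so the definition loads without a GPU
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from cuml.dask.cluster import KMeans
    from cuml.dask.datasets import make_blobs

    # LocalCUDACluster starts one Dask worker per visible GPU
    with LocalCUDACluster() as cluster, Client(cluster):
        # Synthetic data is generated on the workers, partitioned across GPUs
        X, _ = make_blobs(n_samples=n_samples, n_features=n_features,
                          centers=n_clusters, random_state=0)

        model = KMeans(n_clusters=n_clusters, random_state=0)
        model.fit(X)

        # predict returns a lazy Dask collection; compute() gathers the labels
        labels = model.predict(X).compute()
        return labels[:10]


# Call this on a machine with one or more NVIDIA GPUs:
# print(run_multi_gpu_kmeans())
```

Only some cuML algorithms have a `cuml.dask` counterpart, so it is worth checking the API reference before planning a multi-GPU workflow.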

### Integration with Other RAPIDS Libraries

cuML integrates well with other RAPIDS libraries like cuDF (for data manipulation), cuGraph (for graph analytics), and cuSpatial (for spatial data). This synergy allows you to build comprehensive data science workflows entirely on the GPU.


## Links to Handy References

cuML Documentation: https://docs.rapids.ai/api/cuml/stable/

cuML API Reference: https://docs.rapids.ai/api/cuml/stable/api/ 

Scikit-learn Documentation: https://scikit-learn.org/stable/  

<img src="images/chapter-08/nvidia-cuda-ml.jpg" style="width:800px;"/>

# Coding Guide

### Installation 
Please use the RAPIDS Installation Guide for installation instructions appropriate to your hardware and Python environment: https://docs.rapids.ai/install/

# Examples

## Create a Simple DataFrame

In [None]:
import cudf
import numpy as np

# Creating a random DataFrame
data = cudf.DataFrame({
    'x1': np.random.rand(1000),
    'x2': np.random.rand(1000),
    'y': np.random.randint(0, 2, size=1000)
})

data.head()

### 💡 Challenge: Modify the number of rows in the DataFrame in the cell above and observe how it changes the output.

## Train a Machine Learning Model - Simple Logistic Regression

In [None]:
from cuml.linear_model import LogisticRegression
from cuml.model_selection import train_test_split

# Split the data into training and testing sets
X = data[['x1', 'x2']]
y = data['y']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

# Fit a logistic regression model on the GPU
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict labels for the held-out test set
predictions = model.predict(X_test)

### 💡 Challenge: Try different parameters, different solvers or adding regularization. What happens?

## Evaluate Model Performance Using Accuracy

In [None]:
from cuml.metrics import accuracy_score  # public import path for cuML metrics

accuracy = accuracy_score(y_test, predictions)
print(f"Model Accuracy: {accuracy:.4f}")
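
Accuracy is simply the fraction of predictions that match the true labels. The plain-Python sketch below, with made-up labels, illustrates the calculation; the helper name `simple_accuracy` is hypothetical. Note that the labels in this chapter's DataFrame are random and unrelated to `x1` and `x2`, so an accuracy near 0.5 is the expected result for the model above.

```python
def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
print(f"Accuracy: {simple_accuracy(y_true, y_pred):.4f}")  # 6 of 8 match -> 0.7500
```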

## Hyperparameter Tuning with Grid Search

In [None]:
from cuml.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'fit_intercept': [True, False],
    'max_iter': [100, 200]
}

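# GridSearchCV splits and scores the data with scikit-learn code, so pass
# host (NumPy) arrays to avoid implicit GPU-to-CPU conversions
grid_search = GridSearchCV(model, param_grid, scoring='accuracy', cv=3)
grid_search.fit(X_train.to_numpy(), y_train.to_numpy())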

# Get the best parameters and score
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Score: {grid_search.best_score_:.4f}")

## Cross-Validation

In [None]:
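# cross_val_score comes from scikit-learn; cuML estimators follow the same estimator API
from sklearn.model_selection import cross_val_score

# Host (NumPy) arrays let scikit-learn handle the fold splitting
scores = cross_val_score(model, X_train.to_numpy(), y_train.to_numpy(), cv=5)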
print(f"Cross-Validation Scores: {scores}")
print(f"Mean Cross-Validation Score: {scores.mean():.4f}")

## Comparing GPU & CPU Models

In [None]:
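import time

# Use 1,000,000 rows, matching the CPU example, so both timings cover the same work
n_rows = 1_000_000
data_large = cudf.DataFrame({'x1': np.random.rand(n_rows), 'x2': np.random.rand(n_rows), 'y': np.random.randint(0, 2, size=n_rows)})
X_train_large, _, y_train_large, _ = train_test_split(data_large[['x1', 'x2']], data_large['y'], train_size=0.8)

# Timing the GPU model training
start_time = time.time()
model = LogisticRegression()
model.fit(X_train_large, y_train_large)
gpu_time = time.time() - start_time
print(f"GPU Training Time: {gpu_time:.4f} seconds")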

Now fit the same kind of model on the CPU with scikit-learn for comparison:

In [None]:
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
# Alias the import so it does not shadow cuML's train_test_split used earlier
from sklearn.model_selection import train_test_split as sk_train_test_split
import pandas as pd

# Create a large random DataFrame using pandas
data_pd = pd.DataFrame({
    'x1': np.random.rand(1000000),
    'x2': np.random.rand(1000000),
    'y': np.random.randint(0, 2, size=1000000)
})

X_pd = data_pd[['x1', 'x2']]
y_pd = data_pd['y']
X_train_pd, X_test_pd, y_train_pd, y_test_pd = sk_train_test_split(X_pd, y_pd, test_size=0.2)


# Timing the CPU model training
start_time = time.time()
cpu_model = SklearnLogisticRegression()
cpu_model.fit(X_train_pd, y_train_pd)

cpu_time = time.time() - start_time
print(f"CPU Training Time: {cpu_time:.4f} seconds")

**NOTE:** Although the code for the two models looks almost identical, the cuML model can train far faster than the scikit-learn one (almost 100x in the run reported here). The exact speedup depends on the GPU, the CPU, and the dataset size, and the comparison is only meaningful when both models train on the same amount of data. cuML on a GPU can significantly outperform traditional CPU-based machine learning libraries, especially with large datasets, and the time savings generally become more pronounced as the data size increases.
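
Single timings like the ones above can be noisy, and the first GPU call often includes one-time setup cost. A slightly more careful approach, sketched below with only the standard library, runs a warm-up call and then reports the best of several repeats. The helper `time_fit` is hypothetical, and the `sum(...)` workload is a stand-in so the sketch runs anywhere; in the notebook you would pass something like `lambda: LogisticRegression().fit(X_train, y_train)` instead.

```python
import time


def time_fit(fit_fn, repeats=3, warmup=1):
    """Return the best wall-clock time in seconds over `repeats` calls of fit_fn."""
    for _ in range(warmup):
        fit_fn()  # warm-up run, not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs without a GPU
best = time_fit(lambda: sum(i * i for i in range(100_000)))
print(f"Best of 3: {best:.4f} seconds")

# The speedup is the ratio of the two timings, for example:
# speedup = cpu_time / gpu_time
```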

### 💡 Challenge: If you have a specific dataset or model in mind, run the examples above on it to see the time difference firsthand!


## Saving & Loading Your Model

In [None]:
!pip install joblib

In [None]:
# Save the trained model to disk with joblib
import joblib

joblib.dump(model, 'logistic_regression_model.pkl')

# Load the model
loaded_model = joblib.load('logistic_regression_model.pkl')

## K-Means Clustering

In [None]:
# 📊 Using K-Means Clustering
# Let's demonstrate K-Means clustering using cuML.

from cuml.cluster import KMeans
import cupy as cp

# Generate synthetic data for clustering
X_clustering = cp.random.rand(10000, 2)  # 10,000 samples, 2 features

# Initialize KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(X_clustering)

# Predict cluster labels
labels = kmeans.predict(X_clustering)

# Display the first few labels
print(labels[:10])
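
Conceptually, `predict` assigns each point to its nearest cluster center. The plain-Python sketch below illustrates that assignment step with made-up centers and points; the helper name is hypothetical, and this is not how cuML implements the step on the GPU.

```python
def assign_labels(points, centers):
    """Give each 2-D point the index of its nearest center (squared Euclidean distance)."""
    labels = []
    for x, y in points:
        distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        labels.append(distances.index(min(distances)))
    return labels


centers = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.9)]  # made-up cluster centers
points = [(0.1, 0.3), (0.9, 0.1), (0.5, 0.8), (0.6, 0.3)]
print(assign_labels(points, centers))  # [0, 1, 2, 1]
```

After fitting, the centers learned by the cuML model are available as `kmeans.cluster_centers_`.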


## PCA for Dimensionality Reduction 

In [None]:
# 📊 PCA for Dimensionality Reduction
# Perform PCA on the dataset for dimensionality reduction.

from cuml.decomposition import PCA

# X has only two features, so keep one component to actually reduce the dimensionality
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X)

print(f"PCA Transformed Shape: {X_pca.shape}")
