# **Speeding Up Pandas with GPU Acceleration Using cuDF**

Pandas is a flexible and powerful library for data manipulation, but it can be slow on large datasets. Limitations such as single-threaded execution and in-memory processing become bottlenecks as the data grows. If you have a supported NVIDIA GPU, **cuDF**, part of the RAPIDS ecosystem, can accelerate pandas-style code without major changes.

## Why Pandas Struggles with Performance

Despite its flexibility, pandas runs into performance problems for two main reasons:

~> **Single-Threaded Operations**: Most pandas operations run on a single thread, so on a multi-core CPU most cores sit idle, which hurts most on large datasets.
~> **Memory Handling**: Pandas holds the entire dataset in RAM. When the data exceeds available memory, the operating system may start swapping to disk (or the process may fail with a memory error), which slows operations dramatically.

## GPU Acceleration with cuDF

If you have access to an NVIDIA GPU, you can use **cuDF** to accelerate your pandas operations:

~> **cuDF** is part of the **RAPIDS** ecosystem and is designed to leverage the power of **NVIDIA GPUs** for data processing.
~> Because cuDF mirrors much of the pandas API, existing pandas code often runs on the GPU with few changes.

## How cuDF Works

~> **cuDF** provides a pandas-like API, but it runs computations on the GPU using **CUDA**.
~> This allows for substantial performance improvements, especially when dealing with large datasets, by utilizing the parallelism and speed of the GPU.

## Minimal Code Changes Required

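~> You don’t need to rewrite your entire codebase to use cuDF. For code that sticks to the parts of the pandas API that cuDF supports, it is often enough to swap the import:
  ```python
  # Before: import pandas as pd
  import cudf as pd  # After: same alias, GPU-backed DataFrames
  ```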
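A direct import swap fails on machines where cuDF is not installed or no usable GPU is present. One way to keep a script portable is to try cuDF first and fall back to pandas. The sketch below is only an illustration and not part of cuDF: `pick_backend` is a hypothetical helper name, and the broad `except` is an assumption based on GPU libraries sometimes failing at import time with errors other than `ImportError`.

```python
import importlib


def pick_backend(candidates=("cudf", "pandas")):
    """Return (module, name) for the first candidate that imports cleanly."""
    errors = {}
    for name in candidates:
        try:
            return importlib.import_module(name), name
        except Exception as exc:
            # Broad on purpose (assumption): a GPU library may raise
            # something other than ImportError when no driver is available.
            errors[name] = repr(exc)
    raise ImportError(f"no usable dataframe backend: {errors}")
```

Usage might look like `pd, backend = pick_backend()` followed by ordinary `pd.DataFrame(...)` calls. The fallback only guarantees that one of the libraries imports; cuDF does not implement every pandas method, so behavior can still differ between the two backends.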

```python
# !pip3 install --extra-index-url=https://pypi.nvidia.com polars[gpu] cudf-cu12
```

```python
import time
import cudf
import pandas as pd
import polars as pl
import numpy as np
from sklearn.datasets import load_diabetes
```

```python
# Load the diabetes dataset as a DataFrame and keep three columns
X, _ = load_diabetes(return_X_y=True, as_frame=True)
X = X[['age', 'bmi', 'bp']]
```

```python
# Repeat the 442-row dataset 100,000 times to get a large frame
repeat = 100000
X_big = pd.concat([X for _ in range(repeat)])
X_big.shape[0]
```

```
44200000
```
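Before copying a frame of this size to the GPU, it helps to estimate how much memory it needs. The following is a back-of-the-envelope sketch, assuming all three columns hold 8-byte `float64` values and ignoring the index; `estimate_column_bytes` is a hypothetical helper name used only for illustration.

```python
def estimate_column_bytes(n_rows, n_cols, itemsize=8):
    """Rough size of the column data alone: rows x columns x bytes per value."""
    return n_rows * n_cols * itemsize


# 44,200,000 rows x 3 columns x 8 bytes (float64 assumed)
n_bytes = estimate_column_bytes(44_200_000, 3)
print(f"{n_bytes:,} bytes (~{n_bytes / 2**30:.2f} GiB)")
# prints: 1,060,800,000 bytes (~0.99 GiB)
```

The real footprint is higher once the index and the intermediate results of a group-by are counted, so it is worth leaving headroom on the GPU.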

```python
start_time = time.time()
(
    X_big.groupby('age')
    .agg({'bmi': 'mean', 'bp': 'max'})
    .sort_values(by='bmi')
)
end_time = time.time()
print(f"Pandas took {end_time - start_time:.2f}s")
```

```
Pandas took 0.57s
```


```python
# Copy the pandas DataFrame to the GPU.
# Alternatively, load the cudf.pandas extension (%load_ext cudf.pandas) before importing pandas.
X_cudf = cudf.DataFrame.from_pandas(X_big)
```

```python
start_time = time.time()
(
    X_cudf.groupby('age')
    .agg({'bmi': 'mean', 'bp': 'max'})
    .sort_values(by='bmi')
)
end_time = time.time()
print(f"cuDF took {end_time - start_time:.2f}s")
```

```
cuDF took 0.26s
```


```python
# Polars has its own API, so the query has to be rewritten
X_polars = pl.from_dataframe(X_big).lazy()
```

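```python
start_time = time.time()
# Note: X_polars is a LazyFrame and .collect() is never called here,
# so this only builds the query plan; nothing is executed yet.
X_polars.group_by("age").agg([
    pl.col("bmi").mean(),
    pl.col("bp").max()
]).sort("bmi")
end_time = time.time()
print(f"Polars took {end_time - start_time:.4f}s")
```

```
Polars took 0.0021s
```

Because the lazy query is never collected, the 0.0021s above measures query construction only and is not comparable to the pandas and cuDF timings. Append `.collect()` to the chain to time the actual computation.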


```python
# Polars can also execute the query on the GPU (may need tuning)
X_polars = pl.from_dataframe(X_big).lazy()
```

```python
start_time = time.time()
X_polars.group_by("age").agg([
    pl.col("bmi").mean(),
    pl.col("bp").max()
]).sort("bmi").collect(engine='gpu')
end_time = time.time()
print(f"Polars (on GPU) took {end_time - start_time:.4f}s")
```

```
Polars (on GPU) took 0.6892s
```
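Each timing above comes from a single `time.time()` measurement, so it can include one-off costs such as library initialization on the first call. A minimal sketch of a more robust measurement uses a warm-up run and takes the fastest of several repeats; `best_of` is a hypothetical helper, not something provided by pandas, cuDF, or Polars.

```python
import time


def best_of(fn, repeats=5, warmup=1):
    """Call fn() `warmup` times untimed, then return the fastest of `repeats` timed runs (seconds)."""
    for _ in range(warmup):
        fn()  # warm-up: absorbs one-off setup costs
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

For example, `best_of(lambda: X_big.groupby('age').agg({'bmi': 'mean', 'bp': 'max'}).sort_values(by='bmi'))` would time the pandas query. For a lazy Polars query, the function passed in has to end with `.collect()`, otherwise only plan construction is measured.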
