# Pandas (cuDF)

This notebook tests pandas with cuDF acceleration. We use the `pandas` and `cudf` libraries to load a CSV file and run several operations on it, comparing the performance of each operation with and without cuDF acceleration.

### Dataset information
![dataset_information](../public/dataset_information.png)

In [2]:
import time
import pandas as pd
import cudf

csv_file = "../data/concat.csv"

### Time to load a large CSV

In [None]:
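# Measure pandas load time (CPU)
# perf_counter() is a monotonic, high-resolution clock intended for timing code
start_time = time.perf_counter()
df_pandas = pd.read_csv(csv_file)
pandas_time = time.perf_counter() - start_time
print(f"Pandas load time: {pandas_time:.4f} seconds")

# Measure cuDF load time (GPU)
# Note: the first cuDF call in a session may also include one-time GPU initialization
start_time = time.perf_counter()
df_cudf = cudf.read_csv(csv_file)
cudf_time = time.perf_counter() - start_time
print(f"cuDF (GPU) load time: {cudf_time:.4f} seconds")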

Pandas load time: 15.4155 seconds
cuDF (GPU) load time: 3.0132 seconds
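
A single timed run can be noisy: the operating system may cache the file after the first read, and the first GPU call can include one-time setup. The following is a minimal sketch of a reusable timing helper, not part of the original notebook. The name `time_call` and its `repeats` parameter are hypothetical, and the stand-in workload only exists to keep the example self-contained.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns (best_seconds, last_result). `time_call` is a hypothetical
    helper written for this example, not a pandas or cuDF API.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload so the example runs without the dataset or a GPU
best_seconds, total = time_call(sum, range(1_000_000))
print(f"Best of 3 runs: {best_seconds:.4f} seconds")
```

In this notebook the same helper could be called as `time_call(pd.read_csv, csv_file)` or `time_call(cudf.read_csv, csv_file)`. Reporting the best of several runs measures warm-cache performance, so a single cold run (as measured above) is still the better estimate of first-load cost.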


In [4]:
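# Enable the cudf.pandas accelerator. The cuDF documentation recommends loading
# the extension before pandas is first imported in the kernel.
%load_ext cudf.pandas
import pandas as pd  # same pandas API, now dispatched to cuDF where supported

# Measure pandas load time with cudf.pandas acceleration enabled
start_time = time.perf_counter()
df_pandas_load = pd.read_csv(csv_file)
pandas_load_time = time.perf_counter() - start_time
print(f"Pandas load time: {pandas_load_time:.4f} seconds")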

Pandas load time: 2.7375 seconds
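
The three measurements can be summarized as speedup factors relative to CPU pandas. This is a small sketch, not part of the original notebook: the `speedup` helper is a hypothetical name, and the timings are copied by hand from the recorded outputs of this run, so they will differ on other hardware and datasets.

```python
# Load times in seconds, copied from the cell outputs recorded in this notebook
pandas_time = 15.4155       # pandas on the CPU
cudf_time = 3.0132          # cudf.read_csv on the GPU
cudf_pandas_time = 2.7375   # pandas with the cudf.pandas extension loaded


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run was than the baseline."""
    return baseline_seconds / accelerated_seconds


print(f"cuDF speedup over pandas: {speedup(pandas_time, cudf_time):.2f}x")
print(f"cudf.pandas speedup over pandas: {speedup(pandas_time, cudf_pandas_time):.2f}x")
```

For this run both GPU paths load the file roughly five times faster than CPU pandas. Because each figure comes from a single timed read, small differences between the two GPU results should not be over-interpreted.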
