# Comparison

This notebook compares the following approaches to working with tabular data:
- Pandas
- Pandas with the PyArrow engine
- Polars (CPU-based)
- Polars with CUDA acceleration

### Dataset information
![dataset_information](./public/dataset_information.png)

### Time to load CSV

In [2]:
import time
import pandas as pd
import pyarrow.parquet as pq
import polars as pl

def time_csv_loading(file_path):
    """
    Measure the time taken to load a CSV file using:
    - Pandas (default C parser)
    - Pandas with the PyArrow parser engine
    - Polars (CPU)

    Returns a dict mapping each method name to its elapsed time in seconds.
    """
    
    results = {}
    
    # Pandas (default C parser); perf_counter is a monotonic, high-resolution timer
    start = time.perf_counter()
    df_pandas = pd.read_csv(file_path)
    results["Pandas"] = time.perf_counter() - start

    # Pandas with the PyArrow CSV parser engine
    start = time.perf_counter()
    df_pandas_pyarrow = pd.read_csv(file_path, engine="pyarrow")
    results["Pandas with PyArrow"] = time.perf_counter() - start

    # Polars (CPU)
    start = time.perf_counter()
    df_polars = pl.read_csv(file_path)
    results["Polars (CPU)"] = time.perf_counter() - start
    
    return results
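
A single wall-clock measurement per library can be noisy, and whichever reader runs first may also pay for a cold file cache. The sketch below is one possible way to repeat each measurement and keep the median. `median_time` and its `repeats` parameter are hypothetical names that do not appear in the notebook, and the stand-in workload uses the standard-library `csv` module only so that the block runs on its own.

```python
import csv
import io
import statistics
import time


def median_time(fn, repeats=3):
    """Call fn() `repeats` times and return the median elapsed seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload (assumption): parse a small in-memory CSV.
sample_csv = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))


def parse_sample():
    return list(csv.reader(io.StringIO(sample_csv)))


elapsed = median_time(parse_sample, repeats=5)
print(f"median of 5 runs: {elapsed:.6f} seconds")
```

In the notebook, the same helper could wrap each reader, for example `median_time(lambda: pd.read_csv(file_path))`.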

In [3]:
file_path = "./data/concat.csv"
timings = time_csv_loading(file_path)

for method, time_taken in timings.items():
    print(f"{method}: {time_taken:.4f} seconds")

Pandas: 11.3736 seconds
Pandas with PyArrow: 1.8045 seconds
Polars (CPU): 1.1111 seconds
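
To make the gap easier to read, the timings printed above can be turned into speedup factors relative to plain Pandas. This is a minimal sketch: the numbers are copied from the output above, and `speedup_vs_baseline` is a hypothetical helper name, not part of the notebook.

```python
def speedup_vs_baseline(timings, baseline="Pandas"):
    """Return how many times faster each method is than the baseline."""
    base = timings[baseline]
    return {method: base / seconds for method, seconds in timings.items()}


# Timings copied from the run shown above (seconds).
timings = {
    "Pandas": 11.3736,
    "Pandas with PyArrow": 1.8045,
    "Polars (CPU)": 1.1111,
}

speedups = speedup_vs_baseline(timings)
for method, factor in speedups.items():
    print(f"{method}: {factor:.1f}x")
```

On this run that works out to roughly 6.3x for Pandas with the PyArrow engine and roughly 10.2x for Polars, though a single run on one machine is only indicative.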
