ENH: preview_csv(***.csv) for Fast First-N-Line Preview on Large Plus Size (>100GB) #61281
Open
Labels
Enhancement
IO CSV: read_csv, to_csv
Needs Discussion: Requires discussion from core team before further action
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
The current pandas.read_csv() implementation is designed for robust and complete CSV parsing. However, even when users request only a few lines using nrows=X, the function still incurs significant I/O, CPU, and memory overhead on large datasets (10–100GB CSVs), all when the user likely just wants a quick preview of the data.
This is a common pattern in:
Currently, users resort to workarounds like:
or shell-level hacks like:
These are non-intuitive, unstructured, or outside the pandas ecosystem.
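As an illustration of the kind of workaround described above (not the snippet from the original issue), here is a minimal sketch that reads only the first few records with the standard library. The helper name peek_csv and its parameters are hypothetical, chosen for this example. It returns plain lists of strings, with no dtype inference and no DataFrame, which is the "unstructured" drawback mentioned above.

```python
import csv
from itertools import islice


def peek_csv(path, n=5, encoding="utf-8"):
    """Return (header, rows) for the first n data rows of a CSV file.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # newline="" lets the csv module handle quoted fields containing newlines.
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])       # empty list if the file is empty
        rows = list(islice(reader, n))  # stop pulling records after n
    return header, rows
```

Calling peek_csv("data.csv", n=10) would return the header and at most ten rows; converting them to typed columns is left to the caller.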
Feature Description
Introduce a new function, pandas.preview_csv().
Goals
dtype_infer = true
Proposed API:
Alternative Solutions
pd.read_csv(nrows=X)
pd.read_csv(chunksize=X)
csv.reader + slicing
subprocess.run(["head", "-n"])
Polars: pl.read_csv(..., n_rows)
Dask: dd.read_csv(...).head()
open(...).readlines(N)
pyarrow.csv.read_csv(...)[0:X]
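For reference, the first two alternatives can be sketched with the existing pandas API as follows. The wrapper names first_rows_nrows and first_rows_chunksize are hypothetical and exist only for this example; the sketch assumes pandas 1.2 or later, where the reader returned for chunksize can be used as a context manager.

```python
import pandas as pd


def first_rows_nrows(path, n=5):
    # Ask the parser for at most n data rows.
    return pd.read_csv(path, nrows=n)


def first_rows_chunksize(path, n=5):
    # chunksize yields DataFrames of up to n rows each; take the first
    # chunk, then let the context manager close the underlying file.
    with pd.read_csv(path, chunksize=n) as reader:
        return next(reader)
```

Both return a DataFrame of the first n rows. Note that the argument to open(...).readlines(N) is a size hint in bytes rather than a line count, so that alternative does not guarantee exactly N lines.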
While workarounds exist, none provide a clean, idiomatic, native pandas function to preview a file and get a DataFrame immediately.
A dedicated pandas.preview_csv() would fill this gap and offer an elegant, performant solution for quick data previews.
Additional Context
No response