Open
Labels
Component: C++, Component: Python, Status: stale-warning (issues and PRs flagged as stale which are due to be closed if no indication otherwise), Type: enhancement
Description
I'm trying to read only the first 1,000 rows of a huge CSV with PyArrow.
I don't see a way to do this with Arrow. I guess it should be easy to implement by adding a max_rows parameter to pyarrow.csv.ReadOptions.
After reading the first 1,000 rows, it should be possible to load the next 1,000 (or any other chunk) by combining the new max_rows with skip_rows. For example, pyarrow.csv.read_csv(path, pyarrow.csv.ReadOptions(skip_rows=1_000, max_rows=1_000)) would read rows 1,000 to 2,000.
Thanks!
Reporter: Marc Garcia
Note: This issue was originally created as ARROW-10419. Please see the migration documentation for further details.
RAMitchell