CSV inference reads in the whole file to memory, regardless of row limit #3658

@kmitchener

Describe the bug
During schema inference, the entire CSV file is read into memory, even when the row limit for inference is left at its default of 1000 rows.

To Reproduce

let df = ctx.read_csv("./test/", CsvReadOptions::new()).await?;

The full read happens here:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/datasource/file_format/csv.rs#L109

Expected behavior
Schema inference should read only as much data as it needs to cover the configured row limit.
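
To illustrate the expected behavior, here is a minimal sketch using only the Rust standard library. It is not DataFusion's implementation: `sample_records` is a hypothetical helper, and it assumes one record per line (no quoted fields containing newlines). The point is that the reader is consumed lazily and abandoned once the row limit is reached, so the bytes pulled into memory scale with the limit rather than with the file size.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads at most `max_records` lines from `reader`
/// and stops as soon as the limit is reached.
/// Returns the sampled lines and the number of bytes consumed.
/// Assumption: one record per line (no embedded newlines in quoted fields).
fn sample_records<R: BufRead>(
    mut reader: R,
    max_records: usize,
) -> io::Result<(Vec<String>, usize)> {
    let mut records = Vec::new();
    let mut bytes_read = 0;
    let mut line = String::new();

    while records.len() < max_records {
        line.clear();
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // end of input before the limit was reached
        }
        bytes_read += n;
        // Strip the trailing line terminator before storing the record.
        records.push(
            line.trim_end_matches(|c| c == '\r' || c == '\n')
                .to_string(),
        );
    }

    Ok((records, bytes_read))
}
```

Passing a `BufReader` over a file (or any streaming source) to a function shaped like this would leave the rest of the input unread; the sampled lines could then be handed to whatever performs the actual type inference.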
