@jychen7
Contributor

@jychen7 jychen7 commented Apr 3, 2022

Which issue does this PR close?

Closes #2109

Rationale for this change

Issue #2109 reports an almost 100x slowdown on 0.7.0 when reading a CSV file, because the entire file is parsed to infer the schema.

What changes are included in this PR?

Set the default of schema_infer_max_rec to 1000 instead of None (unlimited) for both the CSV and JSON file formats.
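
To illustrate why capping the number of records matters, here is a minimal, standard-library-only sketch of schema inference with an optional record limit. It is not the DataFusion or Arrow implementation: `ColType`, `classify`, and `infer_schema` are hypothetical names invented for this example, and the type lattice (Int < Float < Text) is a simplifying assumption.

```rust
/// Column types this sketch can infer (hypothetical; far simpler than Arrow's).
/// Declaration order doubles as the widening order: Int < Float < Text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ColType {
    Int,
    Float,
    Text,
}

/// Classify a single CSV field by trying the narrowest type first.
fn classify(field: &str) -> ColType {
    if field.parse::<i64>().is_ok() {
        ColType::Int
    } else if field.parse::<f64>().is_ok() {
        ColType::Float
    } else {
        ColType::Text
    }
}

/// Infer column types from at most `max_records` data rows.
/// `None` means "scan every row", which mirrors the old unlimited default.
/// Returns the inferred types and the number of data rows actually read.
fn infer_schema(csv: &str, max_records: Option<usize>) -> (Vec<ColType>, usize) {
    let mut lines = csv.lines();
    // The first line is treated as a header and only used to count columns.
    let n_cols = lines.next().map_or(0, |header| header.split(',').count());
    let mut types = vec![ColType::Int; n_cols];
    let mut rows_read = 0;

    for line in lines.take(max_records.unwrap_or(usize::MAX)) {
        for (ty, field) in types.iter_mut().zip(line.split(',')) {
            // Widen the column type if this field needs a broader one.
            *ty = (*ty).max(classify(field));
        }
        rows_read += 1;
    }
    (types, rows_read)
}

fn main() {
    let csv = "id,price,name\n1,2.5,a\n2,3,b\n3,4.0,c\n";

    // Limited inference stops after the requested number of rows.
    let (limited, n) = infer_schema(csv, Some(2));
    println!("{:?} after {} rows", limited, n);

    // Unlimited inference walks the whole input.
    let (full, n) = infer_schema(csv, None);
    println!("{:?} after {} rows", full, n);
}
```

The trade-off is that a limited scan does work proportional to the limit rather than to the file size, but it can miss a wider type that first appears after the cutoff.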

Are there any user-facing changes?

NA

jychen7 added 3 commits April 2, 2022 21:31
# Conflicts:
#	datafusion/core/src/datasource/file_format/csv.rs
#	datafusion/core/src/datasource/file_format/json.rs
#	datafusion/core/tests/sql/mod.rs
@jychen7 jychen7 marked this pull request as ready for review April 3, 2022 14:24
@alamb alamb merged commit 69ba713 into apache:master Apr 4, 2022
@alamb
Contributor

alamb commented Apr 4, 2022

Thanks @jychen7 !

@alamb alamb changed the title from "#2109 schema infer max" to "#2109 By default, use only 1000 rows to infer the schema" Apr 4, 2022
@jychen7 jychen7 mentioned this pull request Apr 5, 2022
Development

Successfully merging this pull request may close these issues.

Almost 100x slowdown on 0.7.0 with CSV file due to parsing entire file to infer schema

3 participants