#2109 By default, use only 1000 rows to infer the schema #2139

jychen7 · 2022-04-03T02:20:50Z

Which issue does this PR close?

Closes #2109

Rationale for this change

Almost 100x slowdown on 0.7.0 with CSV files, because the entire file is parsed to infer the schema.

What changes are included in this PR?

Set the default `schema_infer_max_rec` for CSV and JSON to 1000 instead of `None` (unlimited).
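To illustrate why capping the number of inspected records helps, here is a minimal, std-only sketch of bounded schema inference. It is not DataFusion's implementation: `ColType`, `infer_cell`, `widen`, `infer_schema`, and `DEFAULT_SCHEMA_INFER_MAX_REC` are hypothetical names invented for this example, and the naive comma splitting stands in for a real CSV parser.

```rust
/// Tiny stand-in for real Arrow data types (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Int,
    Float,
    Text,
}

/// Hypothetical constant mirroring the new default described in this PR.
const DEFAULT_SCHEMA_INFER_MAX_REC: Option<usize> = Some(1000);

/// Guess the narrowest type that can represent a single cell.
fn infer_cell(cell: &str) -> ColType {
    if cell.parse::<i64>().is_ok() {
        ColType::Int
    } else if cell.parse::<f64>().is_ok() {
        ColType::Float
    } else {
        ColType::Text
    }
}

/// Combine two guesses into the wider one: Int < Float < Text.
fn widen(a: ColType, b: ColType) -> ColType {
    use ColType::*;
    match (a, b) {
        (Text, _) | (_, Text) => Text,
        (Float, _) | (_, Float) => Float,
        _ => Int,
    }
}

/// Infer one type per column from CSV text whose first line is a header.
/// Reads at most `max_rec` data rows; `None` means "scan everything".
/// Returns the inferred column types and the number of rows inspected.
fn infer_schema(csv: &str, max_rec: Option<usize>) -> (Vec<ColType>, usize) {
    let mut lines = csv.lines();
    let ncols = lines.next().map_or(0, |header| header.split(',').count());
    // Start from the narrowest type and widen as rows are seen.
    let mut schema = vec![ColType::Int; ncols];
    let limit = max_rec.unwrap_or(usize::MAX);
    let mut rows_read = 0;
    for line in lines.take(limit) {
        for (col, cell) in schema.iter_mut().zip(line.split(',')) {
            *col = widen(*col, infer_cell(cell.trim()));
        }
        rows_read += 1;
    }
    (schema, rows_read)
}
```

Calling `infer_schema(&csv, DEFAULT_SCHEMA_INFER_MAX_REC)` stops after 1000 data rows regardless of file size, while passing `None` reproduces the old scan-everything behavior. The trade-off in a sketch like this is that a value appearing only after the sampled rows (for example, text in a column that looked numeric) will not influence the inferred type.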

Are there any user-facing changes?

NA


alamb · 2022-04-04T18:18:14Z

Thanks @jychen7 !

jychen7 added 3 commits April 2, 2022 21:31

set default schema infer max record

e99a7fe

fix unrelated issue "error: format argument must be a string literal" during `cargo test`

0c1276e

Merge remote-tracking branch 'origin/master' into 2109-schema-infer-max

b74571c

# Conflicts:
#   datafusion/core/src/datasource/file_format/csv.rs
#   datafusion/core/src/datasource/file_format/json.rs
#   datafusion/core/tests/sql/mod.rs

github-actions bot added the datafusion label Apr 3, 2022

Dandandan approved these changes Apr 3, 2022


Merge branch 'master' into 2109-schema-infer-max

79f0b07

jychen7 marked this pull request as ready for review April 3, 2022 14:24

alamb merged commit 69ba713 into apache:master Apr 4, 2022

alamb changed the title ~~#2109 schema infer max~~ #2109 By default, use only 1000 rows to infer the schema Apr 4, 2022

jychen7 mentioned this pull request Apr 5, 2022

#2109 schema infer max #2159

Merged
