Return row count when inferring schema from JSON #5007

Closed
asayers opened this issue Oct 30, 2023 · 2 comments · Fixed by #5008
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)

Comments

asayers commented Oct 30, 2023

arrow-csv's infer_schema() returns the number of records read, along with the schema. This is useful information! arrow-json's infer_json_schema(), on the other hand, only returns the schema itself. It would be nice if it matched arrow-csv.

asayers added the enhancement label Oct 30, 2023

asayers commented Oct 30, 2023

Arguably it's less important in arrow-json than it is in arrow-csv, since with NDJSON you can just count the newline characters to get the row count. (As opposed to CSV, where quoted fields may contain newline characters without indicating a new row, so you really need to parse it to get the row count.)
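
To illustrate the difference, here is a minimal sketch using only the Rust standard library. Both function names are hypothetical and are not part of arrow-json or arrow-csv. It assumes every record ends with a newline, that the NDJSON input has no blank lines, and that the CSV uses double quotes with `""` as the escaped quote; a real CSV parser handles more cases than this.

```rust
/// Row count for NDJSON: JSON strings cannot contain a raw newline
/// (it must be written as the escape sequence \n), so every newline
/// byte terminates a record. Assumes no blank lines in the input.
fn count_newlines(data: &str) -> usize {
    data.bytes().filter(|&b| b == b'\n').count()
}

/// Record count for CSV: a newline only ends a record when it appears
/// outside a quoted field, so the quote state has to be tracked.
/// An escaped quote ("") toggles the state twice and leaves it unchanged.
/// The header line, if present, is counted as a record.
fn count_csv_records(data: &str) -> usize {
    let mut in_quotes = false;
    let mut records = 0;
    for b in data.bytes() {
        match b {
            b'"' => in_quotes = !in_quotes,
            b'\n' if !in_quotes => records += 1,
            _ => {}
        }
    }
    records
}
```

For the CSV input `"a,b\n1,\"x\ny\"\n2,z\n"` (a Rust string literal whose second line has a quoted field containing a newline), `count_newlines` gives 4 while `count_csv_records` gives 3, which is why counting newlines is only safe for NDJSON.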

tustvold added the arrow label Nov 2, 2023

tustvold commented Nov 2, 2023

label_issue.py automatically added labels {'arrow'} from #5008
