burntsushi's issue #1092

Open
jariji opened this issue May 31, 2023 · 1 comment

Comments

@jariji

jariji commented May 31, 2023

Andrew Gallant (aka burntsushi), author of ripgrep and xsv, wrote in 2020 that CSV.jl's then-current parsing strategy would not work correctly on some CSV files:

https://news.ycombinator.com/item?id=24747509

Just thought I'd bring it up in case there's something worth documenting here.

@Drvi
Collaborator

Drvi commented May 31, 2023

That argument is exactly why ChunkedCSV.jl doesn't "jump and recover", even though I still think that is the most performant strategy. Ideally, users would get to choose which strategy to use for their file (e.g., if the file contains no string fields, what CSV.jl does is pretty much optimal and safe). Still, in practice CSV.jl seems safe enough, and with some work it could be made entirely safe: it would just need to detect that it has reached an inconsistent state and use that information to retry with better chunk boundaries.
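
To make the trade-off concrete, here is a small Python sketch. It is not CSV.jl or ChunkedCSV.jl code (both are Julia packages), and the function names `naive_resync`, `record_starts`, and `chunk_is_consistent` are hypothetical. `naive_resync` shows the "jump" step in its simplest form: seek to a byte offset and assume the next newline ends a record. `record_starts` is the safe alternative: one sequential, quote-aware pass that never splits a quoted field. `chunk_is_consistent` is a crude stand-in for "detect an inconsistent state", assuming every row should have a known number of columns. The quote handling assumes RFC 4180-style quoting, where a literal `"` inside a quoted field is written as `""`.

```python
import csv
import io


def naive_resync(data: bytes, offset: int) -> int:
    """Jump to `offset` and assume the next newline ends a record (quote-unaware)."""
    nl = data.find(b"\n", offset)
    return len(data) if nl == -1 else nl + 1


def record_starts(data: bytes) -> list[int]:
    """Sequential, quote-aware scan: a newline ends a record only outside quotes.

    Assumes RFC 4180-style quoting: an escaped quote ("") toggles the
    state twice, so it leaves `in_quotes` unchanged.
    """
    starts = [0]
    in_quotes = False
    for i, byte in enumerate(data):
        if byte == ord('"'):
            in_quotes = not in_quotes
        elif byte == ord("\n") and not in_quotes and i + 1 < len(data):
            starts.append(i + 1)
    return starts


def chunk_is_consistent(chunk: bytes, ncols: int) -> bool:
    """Heuristic check: every row parsed from `chunk` has exactly `ncols` fields.

    A misaligned chunk often fails this check, but not always (for example,
    if the split-off text happens to contain the expected number of commas).
    """
    rows = csv.reader(io.StringIO(chunk.decode("utf-8")))
    return all(len(row) == ncols for row in rows)


# A two-column file where one field contains a quoted newline.
data = b'id,note\n1,"line one\nline two"\n2,plain\n'

bad_start = naive_resync(data, 12)  # offset 12 is inside the quoted field
good_starts = record_starts(data)
print(bad_start, good_starts)
print(chunk_is_consistent(data[bad_start:], 2), chunk_is_consistent(data[8:], 2))
```

Here the jump lands on byte 20, the start of `line two"`, which is in the middle of a record, while the sequential scan finds the true record starts 0, 8, and 30. The field-count check rejects the chunk starting at byte 20 and accepts the one starting at byte 8, which is the kind of signal a detect-and-retry parser could use to pick a different boundary. The sequential scan never needs a retry, but it has to read every byte before chunks can be handed out, which is why it is usually slower than jumping.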

3 participants