
Unable to parse really long CSV cell (breaks Parsers.jl) #935

Open
hs-ye opened this issue Oct 21, 2021 · 4 comments

Comments


hs-ye commented Oct 21, 2021

Hi Team,

I have an unusual situation where I'm trying to read in CSV files with really long rows of geospatial data, about ~150k characters per row (sample attached).

Using the default CSV.File method with quote chars (see my sample file attached), I get the error below. Following the stacktrace, it seems the problem is with how Parsers.jl implements reading long strings from a file using its custom byte index, which only supports a maximum length of ~100k characters.

segment_mini.csv

Error stacktrace: (screenshot attached)

What is CSV.jl's stance on supporting this type of use case? Will there ever be support for super long cells, or should I raise this over at the Parsers.jl GitHub instead?


quinnj commented Oct 21, 2021

Just a clarification: the ~100K-character limit is per cell, not per row. I think we can support double the current length without too much trouble; we just need to add the bigger definition in Parsers.jl, and then provide a way in CSV.jl, probably just via a keyword arg, to specify that you need/want the larger PosLen.
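For intuition, here is a minimal sketch of why a PosLen-style index caps cell length. This is not the actual Parsers.jl bit layout; the field widths below (17 length bits) are made up purely for illustration. The idea is that a string's byte position and length are packed into a single UInt64, so the width of the length field bounds the largest representable cell:

```julia
# Illustrative only -- NOT the real Parsers.jl PosLen definition.
# Hypothetical layout: low 17 bits = length, remaining bits = position.
const LEN_BITS = 17
const MAX_LEN  = (UInt64(1) << LEN_BITS) - 1   # 131_071 bytes max per cell

# Pack a (position, length) pair into one UInt64.
poslen(pos::Integer, len::Integer) = (UInt64(pos) << LEN_BITS) | UInt64(len)

# Unpack the two fields again.
getpos(pl::UInt64) = pl >> LEN_BITS
getlen(pl::UInt64) = pl & MAX_LEN

pl = poslen(12_345, 9_000)
# A ~150k-character cell cannot be represented under this layout:
# 150_000 > MAX_LEN, so the length field would overflow.
```

Widening the length field (or, as suggested above, adding a bigger PosLen definition in Parsers.jl) raises this cap, at the cost of either fewer position bits or a larger index word.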


hs-ye commented Oct 21, 2021

Sorry, yes, per cell is correct; it's a limitation of the current PosLen primitive used for strings. In the data, it's just the one column, geoj_segment, that's a polygon of GPS co-ordinates, which can be really long (see the first data row in the sample I provided).

Double the length would be amazing; I think 150k is the largest cell we have right now. I'm also looking at compressing/truncating the data on my end to solve my immediate problem, but if this could be a future feature it would help a lot!


quinnj commented Oct 22, 2021

Some thoughts/initial work at increasing capacity in Parsers.jl: JuliaData/Parsers.jl#98

@nickrobinson251 nickrobinson251 changed the title Parsing really long csv row breaks parsers.jl Parsing really long csv cell breaks parsers.jl Dec 6, 2022
@brad-ross

Bumping this issue, since I'm running into a similar problem and it appears there's no alternative parsing option in this case.

@nickrobinson251 nickrobinson251 changed the title Parsing really long csv cell breaks parsers.jl Unable to parse really long CSV cell (breaks Parsers.jl) Nov 21, 2023