Confusing exception when a cell contains a newline #63
Comments
Hi @jpownby - thanks for reporting this issue and providing a very clear issue description! It's definitely valid feedback, and I'll see what I can do. My main concern is that rapidcsv will need to make assumptions about the use-case in order to provide more detailed error messages, but perhaps there's some way around it. I'll update here again once I've had some time to look at this.
Here's an example of the kind of error message that I think would be helpful:
I don't know what use-case assumptions you mean, but an exception like that would have helped me in my situation because I could have gone and looked at the file to see what the problem was. It also would have been helpful to have the problem identified immediately rather than having an exception get thrown later after I had assumed parsing was successful, regardless of what the exception message was. Hope that helps!
Thanks!
Yeah, I should've been a bit more specific. I think there could be CSV files that do not necessarily have the same number of columns on all rows. Definitely not a typical use-case, but I think it could exist. So I wouldn't want to restrict this at parser-level. But the type of check you suggested could of course be added in the relevant Get-functions. I'll play around with it a little and check the performance impact, and update here again. Thanks again for the feedback, it is a good idea.
Oh, ok, I see. Well, in that case, maybe the bug would be that an exception is thrown in GetColumn() if the index given is bigger than a particular row has :)
Yeah, I think that's a reasonable and good idea for rapidcsv to support. It also has minimal performance impact. The above commit implements support for this, with an exception message in the format
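For readers following along, here is a minimal sketch of the kind of per-row check discussed above. It is not rapidcsv's implementation: `Rows` and `GetColumnChecked` are hypothetical names, and the exception wording is only illustrative, not the format used by the commit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for parsed CSV data: one vector of cells per row.
using Rows = std::vector<std::vector<std::string>>;

// Returns column `col`, throwing as soon as a row is too short to contain it.
// Rows may have different lengths; only rows shorter than col + 1 are rejected.
std::vector<std::string> GetColumnChecked(const Rows& rows, std::size_t col)
{
  std::vector<std::string> column;
  column.reserve(rows.size());
  for (std::size_t r = 0; r < rows.size(); ++r)
  {
    if (col >= rows[r].size())
    {
      // Illustrative message only (assumption, not rapidcsv's actual wording).
      throw std::out_of_range("requested column index " + std::to_string(col) +
                              " >= " + std::to_string(rows[r].size()) +
                              " (number of columns on row index " +
                              std::to_string(r) + ")");
    }
    column.push_back(rows[r][col]);
  }
  return column;
}
```

Because the check runs in the getter rather than the parser, files with a varying number of columns per row still load; only a request for a column that a given row lacks is rejected, and the message names the offending row.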
Thanks to you both, @d99kris and @jpownby. With this discussion, I was able to sidestep this error with this code:
A CSV file threw an exception ("invalid vector subscript") when I called:
document.GetColumn<std::string>(someIndex);
This exception was confusing to me.
someIndex was less than the result returned by document.GetColumnCount(), so I didn't understand what the problem was and had to debug the code to figure it out.
It turns out that the CSV file has a newline (\n) character in the middle of a quoted cell. So, if I set pQuotedLinebreaks to true in my SeparatorParameters it fixes the problem.
But, this was really non-obvious to me, and it seems strange that rapidcsv doesn't do any validation when parsing to catch that a row has the wrong number of cells and then assumes in GetColumn() that the number will be correct. The way the behavior currently works makes it seem like there is a problem with GetColumn(), when really the problem is with the source data.
I would suggest, in ParseCsv(std::istream& pStream, std::streamsize p_FileLength), some kind of check whenever mData.push_back(row) is about to be called to verify that row.size() == GetColumnCount() (or similar), and if it doesn't then an exception could be thrown. That would help identify what the problem really is (whether it's the result of a newline or just bad data) rather than having parsing apparently succeed but then unexpected errors happen when the results are used.
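To make the failure mode concrete, the sketch below shows how a quoted cell containing \n can produce a short row when quotes are ignored, and what a parse-time width check might look like. This is not rapidcsv's parser: `ParseNaive` and `ValidateRowWidths` are hypothetical helpers, and the check assumes that every row should match the width of the first row.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Deliberately naive parser: every '\n' ends a row and every ',' ends a cell.
// Quotes are ignored, which mimics parsing without quoted-linebreak handling.
std::vector<Row> ParseNaive(const std::string& text)
{
  std::vector<Row> rows;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line))
  {
    Row row;
    std::istringstream cells(line);
    std::string cell;
    while (std::getline(cells, cell, ','))
    {
      row.push_back(cell);
    }
    rows.push_back(row);
  }
  return rows;
}

// Throws if any row has a different number of cells than the first row.
void ValidateRowWidths(const std::vector<Row>& rows)
{
  if (rows.empty())
  {
    return;
  }
  const std::size_t expected = rows.front().size();
  for (std::size_t r = 1; r < rows.size(); ++r)
  {
    if (rows[r].size() != expected)
    {
      // Illustrative message only (assumption, not rapidcsv's wording).
      throw std::runtime_error("row " + std::to_string(r) + " has " +
                               std::to_string(rows[r].size()) +
                               " cells, expected " + std::to_string(expected));
    }
  }
}
```

For an input such as `id,name,note`, then `1,Ann,"first`, then `second"`, then `2,Bob,ok`, the naive parser yields a one-cell row at index 2, and the check reports that row right after parsing instead of letting a later column lookup fail. Since some files may legitimately have rows of different lengths, a check like this would probably need to be optional.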