Confusing exception when a cell contains a newline #63

jpownby · 2020-11-04T16:45:31Z

A CSV file caused an exception ("invalid vector subscript") to be thrown when I called:

document.GetColumn<std::string>(someIndex);

This exception was confusing to me. someIndex was less than the result returned by document.GetColumnCount(), so I didn't understand what the problem was and had to debug the code to figure it out.

It turns out that the CSV file has a newline character (\n) in the middle of a quoted cell. Setting pQuotedLinebreaks to true in my SeparatorParams fixes the problem.
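To illustrate why a quoted newline produces a row with too few cells, here is a minimal sketch of a line-oriented parse next to a quote-aware one. ParseSketch is a hypothetical helper written only for this illustration; it is not rapidcsv's parser, and it deliberately ignores escaped quotes ("") and \r handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of rapidcsv): splits CSV text into rows of cells.
// quotedLinebreaks == false: every '\n' ends the row, even inside a quoted cell.
// quotedLinebreaks == true:  a '\n' inside a quoted cell is kept as cell content.
// Simplification: quote characters are dropped and escaped quotes ("") are not handled.
std::vector<std::vector<std::string>> ParseSketch(const std::string& text, bool quotedLinebreaks)
{
  std::vector<std::vector<std::string>> rows;
  std::vector<std::string> row;
  std::string cell;
  bool inQuotes = false;

  for (const char ch : text)
  {
    if (ch == '"')
    {
      inQuotes = !inQuotes;  // toggle quote state
    }
    else if (ch == ',' && !inQuotes)
    {
      row.push_back(cell);   // separator outside quotes ends the cell
      cell.clear();
    }
    else if (ch == '\n' && !(inQuotes && quotedLinebreaks))
    {
      row.push_back(cell);   // end of cell and end of row
      cell.clear();
      rows.push_back(row);
      row.clear();
      inQuotes = false;      // the line-oriented mode forgets the open quote
    }
    else
    {
      cell += ch;            // ordinary character, or a newline kept inside quotes
    }
  }

  // Flush a final row that is not terminated by '\n'.
  if (!cell.empty() || !row.empty())
  {
    row.push_back(cell);
    rows.push_back(row);
  }
  return rows;
}
```

With the input "id,note\n1,\"line one\nline two\"\n", the line-oriented mode yields three rows, the last of which has only one cell, so asking for column index 1 runs past the end of that row. The quote-aware mode yields two rows of two cells each.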

But, this was really non-obvious to me, and it seems strange that rapidcsv doesn't do any validation when parsing to catch that a row has the wrong number of cells and then assumes in GetColumn() that the number will be correct. The way the behavior currently works makes it seem like there is a problem with GetColumn(), when really the problem is with the source data.

I would suggest, in ParseCsv(std::istream& pStream, std::streamsize p_FileLength), some kind of check whenever mData.push_back(row) is about to be called to verify that row.size() == GetColumnCount() (or similar), and if it doesn't then an exception could be thrown. That would help identify what the problem really is (whether it's the result of a newline or just bad data) rather than having parsing apparently succeed but then unexpected errors happen when the results are used.

d99kris · 2020-11-05T13:46:44Z

Hi @jpownby - thanks for reporting this issue and providing a very clear issue description!

It's definitely valid feedback, and I'll see what I can do. My main concern is that rapidcsv will need to make assumptions about the use-case in order to provide more detailed error messages, but perhaps there's some way around it.

I'll update here again once I've had some time to look at this.

jpownby · 2020-11-05T16:52:49Z

Here's an example of the kind of error message that I think would be helpful:

if (!mData.empty() && (row.size() != GetColumnCount()))
{
	std::ostringstream errorMessage;
	errorMessage << "Row #" << mData.size() << " has " << row.size() << " cells (instead of " << GetColumnCount() << ")";
	throw std::invalid_argument(errorMessage.str());
}
mData.push_back(row);

I don't know what use-case assumptions you mean, but an exception like that would have helped me in my situation because I could have gone and looked at the file to see what the problem was. It also would have been helpful to have the problem identified immediately rather than having an exception get thrown later after I had assumed parsing was successful, regardless of what the exception message was.

Hope that helps!

d99kris · 2020-11-06T12:30:40Z

Thanks!

> I don't know what use-case assumptions you mean

Yeah, I should've been a bit more specific. I think there could be CSV files that do not necessarily have the same number of columns on all rows. Definitely not a typical use-case, but I think it could exist. So I wouldn't want to restrict this at the parser level. But the type of check you suggested could of course be added in the relevant Get-functions.

I'll play around with it a little and check performance impact, and update here again.

Thanks again for the feedback, it is a good idea.

jpownby · 2020-11-06T15:46:44Z

Oh, ok, I see. Well, in that case, maybe the bug would be that an exception is thrown in GetColumn() if the index given is bigger than a particular row has :)

d99kris · 2021-08-15T12:34:20Z

Yeah, I think that's a reasonable and good idea for rapidcsv to support. It will also have minimal performance impact.

The above commit implements support for this, with an exception message in the format "requested column index 2 >= 2 (number of columns on row index 1)".
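For readers who want to see the shape of such a per-row bounds check, here is a minimal sketch. GetColumnChecked is a hypothetical free function over a plain vector of rows, not rapidcsv's actual implementation, and the use of std::out_of_range is an assumption; only the message format is taken from the comment above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a column getter (NOT rapidcsv's code).
// Collects one column and throws a descriptive error as soon as a row
// is too short, instead of indexing past the end of that row.
std::vector<std::string> GetColumnChecked(const std::vector<std::vector<std::string>>& rows,
                                          std::size_t columnIdx)
{
  std::vector<std::string> column;
  for (std::size_t rowIdx = 0; rowIdx < rows.size(); ++rowIdx)
  {
    const std::vector<std::string>& row = rows[rowIdx];
    if (columnIdx >= row.size())
    {
      // Exception type is an assumption; the message mirrors the format quoted above.
      throw std::out_of_range("requested column index " + std::to_string(columnIdx) +
                              " >= " + std::to_string(row.size()) +
                              " (number of columns on row index " +
                              std::to_string(rowIdx) + ")");
    }
    column.push_back(row[columnIdx]);
  }
  return column;
}
```

Because the check runs per row inside the getter, files whose rows differ in length still parse; the error only appears when a caller asks for a column that some row does not have, and it names the offending row.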

homer6 · 2022-06-19T15:40:10Z

Thanks to you both, @d99kris and @jpownby. With this discussion, I was able to sidestep this error with this code:

auto separator_param = rapidcsv::SeparatorParams();
separator_param.mQuotedLinebreaks = true;
rapidcsv::Document csv_doc( input_file, rapidcsv::LabelParams(), separator_param, rapidcsv::ConverterParams(true) );

d99kris self-assigned this Nov 5, 2020

d99kris mentioned this issue Apr 11, 2021

Load from string #91

Closed

d99kris mentioned this issue Aug 15, 2021

std::out_of_range -- rapidcsv thinks some rows don't have all the columns #96

Closed

d99kris closed this as completed in 0d16673 Aug 15, 2021
