Ensure we check for commented rows when skipping rows for header/data #789
Improves #788. In the original issue, a quote character on a commented
row throws off the parser's position, because the parser keeps scanning
for a closing quote character. By checking for and skipping commented rows,
no matter which characters they contain, we ensure parsing integrity.
One ramification of this, however, is that commented rows now "no longer count" when
considering row numbers, i.e. when specifying the `header=2` or `datarow=4`
keyword arguments, because the commented rows are literally ignored when parsing.
This seems fine to me, but probably warrants some documentation so it's clear.
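
To make the quote problem from #788 concrete, here is a minimal Python sketch, not the package's actual code. The function names `skip_rows_naive` and `skip_rows_comment_aware` are hypothetical, and the `#` comment marker and `"` quote character are assumptions chosen for illustration. The naive version tracks quotes on every row, so a stray quote in a comment swallows the rest of the input; the comment-aware version jumps straight to the end of a commented row.

```python
def skip_rows_naive(text, n, quote='"'):
    """Skip n rows, treating newlines inside quotes as part of a field.

    Hypothetical helper: it has no notion of comments, so an unmatched
    quote on a commented row keeps it searching for a closing quote.
    """
    pos = 0
    in_quote = False
    skipped = 0
    while pos < len(text) and skipped < n:
        ch = text[pos]
        if ch == quote:
            in_quote = not in_quote
        elif ch == "\n" and not in_quote:
            skipped += 1
        pos += 1
    return pos


def skip_rows_comment_aware(text, n, comment="#", quote='"'):
    """Skip n rows, but pass over commented rows without looking at quotes.

    Hypothetical helper: a row starting with the (assumed) comment marker
    ends at the next newline, whatever characters it contains.
    """
    pos = 0
    skipped = 0
    while pos < len(text) and skipped < n:
        if text.startswith(comment, pos):
            end = text.find("\n", pos)
            pos = len(text) if end == -1 else end + 1
            skipped += 1
            continue
        in_quote = False
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == quote:
                in_quote = not in_quote
            elif ch == "\n" and not in_quote:
                break
        skipped += 1
    return pos


sample = '# stray " quote\na,b\n1,2\n'
print(repr(sample[skip_rows_naive(sample, 1):]))
print(repr(sample[skip_rows_comment_aware(sample, 1):]))
```

With the sample input, the naive skipper runs to the end of the text while looking for a closing quote, whereas the comment-aware one stops right after the first line.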
Update: this PR has been updated from the original approach to count commented/empty rows when specifying a `header` or `datarow` argument; this seems more natural/intuitive (i.e. look at a file, count # of rows, and provide it as an arg), and helps preserve existing behavior (i.e. non-breaking). If a header/datarow keyword arg is provided and that row in the file is commented or empty, we will skip to the first non-commented/non-empty row to parse the header/data.
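
The updated rule can be sketched as follows. This is an illustrative Python model of the behavior described above, not the PR's implementation: `resolve_row` is a hypothetical helper, the `#` comment marker is an assumption, and treating whitespace-only lines as empty is also an assumption. Every physical row counts toward the requested number, and if the requested row is commented or empty, the first later row that is neither is used instead.

```python
def resolve_row(lines, requested, comment="#"):
    """Return the 1-based row actually used for a requested header/datarow.

    Hypothetical helper: commented and empty rows still count toward the
    row number, but are never chosen as the header or first data row.
    Returns None if no usable row exists at or after the requested one.
    """
    idx = requested - 1
    while idx < len(lines):
        line = lines[idx]
        # Assumption: a whitespace-only line is treated as empty.
        if line.strip() and not line.startswith(comment):
            return idx + 1
        idx += 1
    return None


rows = ["# generated by tool", "", "a,b", "# note", "1,2"]
print(resolve_row(rows, 1))  # row 1 is a comment, row 2 is empty -> 3
print(resolve_row(rows, 3))  # row 3 is usable as-is -> 3
print(resolve_row(rows, 4))  # row 4 is a comment -> 5
```

Under this model, counting rows by eye in the file and passing that number as the argument gives the expected result, and pointing at a commented or empty row falls forward to the next usable one.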