[C++] CSV skip wrong rows #26306
Comments
Antoine Pitrou / @pitrou:
Maciej / @mskrzypkowski:
Joris Van den Bossche / @jorisvandenbossche: @pitrou when you say that skipping is difficult, is this because, if you encounter an error in the value for a certain column, the values for the previous columns have already been appended to their builders?
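To make the question concrete, here is a simplified, row-at-a-time model in Python. Each column has its own "builder" (plain lists stand in for Arrow builders), and `append_row_naive` / `append_row_atomic` are hypothetical names, not Arrow APIs. The model shows how a conversion failure in a later column leaves the earlier builders one value longer. It also shows one way to avoid that: convert the whole row before appending anything. The real Arrow reader is not necessarily structured this way.

```python
from typing import Callable, List, Sequence

Converter = Callable[[str], object]


def append_row_naive(builders: List[list], converters: Sequence[Converter], row: Sequence[str]) -> None:
    # Convert and append one cell at a time. If column k fails to convert,
    # columns 0..k-1 have already been extended, so the columns are now out
    # of step and skipping the row would require undoing those appends.
    for builder, convert, cell in zip(builders, converters, row):
        builder.append(convert(cell))


def append_row_atomic(builders: List[list], converters: Sequence[Converter], row: Sequence[str]) -> bool:
    # Convert every cell first and only touch the builders once the whole
    # row has converted, so a bad row can be dropped without any rollback.
    try:
        values = [convert(cell) for convert, cell in zip(converters, row)]
    except ValueError:
        return False  # the caller may record this row as skipped
    for builder, value in zip(builders, values):
        builder.append(value)
    return True


if __name__ == "__main__":
    converters = [int, float, int]

    naive: List[list] = [[], [], []]
    try:
        append_row_naive(naive, converters, ["1", "2.5", "x"])
    except ValueError:
        pass
    print([len(col) for col in naive])  # [1, 1, 0]: columns out of step

    atomic: List[list] = [[], [], []]
    print(append_row_atomic(atomic, converters, ["1", "2.5", "x"]), atomic)  # False [[], [], []]
```

The atomic variant trades a small temporary list per row for the ability to drop a row cleanly.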
Antoine Pitrou / @pitrou:
Joris Van den Bossche / @jorisvandenbossche: In general, it might be useful to add some options for dealing with lines that have the wrong number of elements (e.g. filling with nulls if there are too few, skipping the extra values if there are too many).
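The two policies suggested here can be sketched as a small row-normalization step applied after parsing. This is a minimal illustration under stated assumptions: `normalize_row` and its `too_few` / `too_many` policy strings are hypothetical, not existing Arrow options, and `None` stands in for a null value.

```python
from typing import List, Optional, Sequence


def normalize_row(
    row: Sequence[str],
    num_columns: int,
    too_few: str = "pad",       # hypothetical policy: "pad" or "skip"
    too_many: str = "truncate", # hypothetical policy: "truncate" or "skip"
) -> Optional[List[Optional[str]]]:
    """Adjust a parsed row to exactly num_columns cells.

    too_few:  "pad" fills missing trailing cells with None (null), "skip" drops the row.
    too_many: "truncate" drops the extra trailing values, "skip" drops the row.
    Any other policy value raises ValueError on a mismatch.
    Returns the adjusted row, or None if the row should be skipped.
    """
    cells: List[Optional[str]] = list(row)
    n = len(cells)
    if n < num_columns:
        if too_few == "pad":
            return cells + [None] * (num_columns - n)
        if too_few == "skip":
            return None
    elif n > num_columns:
        if too_many == "truncate":
            return cells[:num_columns]
        if too_many == "skip":
            return None
    else:
        return cells  # already the right width
    raise ValueError(f"expected {num_columns} columns, got {n}")


if __name__ == "__main__":
    print(normalize_row(["a"], 3))                 # ['a', None, None]
    print(normalize_row(["a", "b", "c", "d"], 3))  # ['a', 'b', 'c']
```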
Antoine Pitrou / @pitrou: Dealing with lines with the wrong number of elements wouldn't be easier, though the difficulties would reside in the parser.
Micah Kornfield / @emkornfield:
Antoine Pitrou / @pitrou:
It would be helpful to add another option to ReadOptions that enables skipping rows with invalid data (e.g. a value whose data type does not match the column type) and continuing with the following rows. The numbers of the skipped rows could be reported at the end of processing.
This way I could handle the wrongly formatted data separately, or ignore it if the load success rate is high and I don’t care about the exceptions.
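A pure-Python sketch of the requested behavior follows. It uses a hypothetical `skip_invalid` flag, which is not an existing ReadOptions field. The flag drops rows whose values fail to convert to the column type, or whose field count is wrong, and collects their record numbers for reporting at the end. Python's `csv` module and builtin converters such as `int` and `float` stand in for Arrow's parser and type conversion; this is not pyarrow's API.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class ReadResult:
    columns: List[list]                                    # one list per column
    skipped_rows: List[int] = field(default_factory=list)  # 1-based record numbers


def read_csv_skipping_bad_rows(
    text: str,
    converters: Sequence[Callable[[str], object]],
    skip_invalid: bool = True,  # hypothetical option
    has_header: bool = True,
) -> ReadResult:
    result = ReadResult(columns=[[] for _ in converters])
    reader = csv.reader(io.StringIO(text))
    # Record numbers count the header too; they match physical line numbers
    # only when no quoted field spans several lines.
    for record_no, row in enumerate(reader, start=1):
        if has_header and record_no == 1:
            continue
        try:
            if len(row) != len(converters):
                raise ValueError(f"expected {len(converters)} fields, got {len(row)}")
            values = [convert(cell) for convert, cell in zip(converters, row)]
        except ValueError:
            if not skip_invalid:
                raise
            result.skipped_rows.append(record_no)  # report later instead of failing
            continue
        for column, value in zip(result.columns, values):
            column.append(value)
    return result


if __name__ == "__main__":
    data = "id,score\n1,2.5\nx,3.0\n4,oops\n5,6\n"
    res = read_csv_skipping_bad_rows(data, [int, float])
    print(res.columns)       # [[1, 5], [2.5, 6.0]]
    print(res.skipped_rows)  # [3, 4]
```

Because every value in a row is converted before any column is extended, a skipped row never leaves the columns with unequal lengths.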
Reporter: Maciej / @mskrzypkowski
Note: This issue was originally created as ARROW-10315. Please see the migration documentation for further details.