Show more diverse set of rows in dropped.txt #242

Open
Deep1998 opened this issue Dec 21, 2021 · 0 comments
Labels
p4 P4

Comments

@Deep1998
Collaborator

Currently, we keep appending bad rows to conv until we hit the byte limit and then dump them to dropped.txt. With large tables, dropped.txt usually ends up holding rows from only one table, because a single issue occurs across many rows.
We could improve this by evicting some of the earlier rows to make room for bad rows from other tables, since additional rows that fail with the same error provide no new information. Reporting a few samples of each type of bad row is more informative.
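
One possible shape for this, as a rough sketch rather than a proposal for the actual implementation: group bad rows by (table, error) and, once the byte limit is reached, evict the oldest row from the largest group to make room for a row from a less represented group. The sketch is in Python for brevity; `BadRowSampler`, `per_group_cap`, and the idea of keying on an error string are all hypothetical and do not correspond to existing code in this project.

```python
from collections import defaultdict


class BadRowSampler:
    """Byte-bounded, diversity-preserving sample of bad rows (hypothetical)."""

    def __init__(self, byte_limit, per_group_cap=3):
        self.byte_limit = byte_limit
        self.per_group_cap = per_group_cap  # assumed cap per (table, error)
        self.groups = defaultdict(list)     # (table, error) -> sampled rows
        self.bytes_used = 0

    def add(self, table, error, row):
        """Try to keep `row`; return True if it was stored."""
        key = (table, error)
        size = len(row.encode("utf-8"))
        if size > self.byte_limit:
            return False  # can never fit
        if len(self.groups[key]) >= self.per_group_cap:
            return False  # enough samples of this kind already

        # Over budget: evict the oldest row of the largest group, but only
        # while that leaves the sample more balanced than before.
        while self.bytes_used + size > self.byte_limit:
            largest = max(self.groups, key=lambda k: len(self.groups[k]))
            if len(self.groups[largest]) <= len(self.groups[key]) + 1:
                return False  # eviction would not improve diversity
            evicted = self.groups[largest].pop(0)
            self.bytes_used -= len(evicted.encode("utf-8"))

        self.groups[key].append(row)
        self.bytes_used += size
        return True

    def rows(self):
        """Return the retained samples as (table, error, row) tuples."""
        return [(t, e, r) for (t, e), rs in self.groups.items() for r in rs]


if __name__ == "__main__":
    sampler = BadRowSampler(byte_limit=30, per_group_cap=10)
    for i in range(5):
        sampler.add("orders", "bad timestamp", "row-%04d.." % i)
    sampler.add("users", "null primary key", "user-00001")
    for table, error, row in sampler.rows():
        print(table, error, row)
```

With a 30-byte limit and 10-byte rows, the `orders` table fills the budget first; the later `users` row then displaces the oldest `orders` row instead of being dropped, so both tables appear in the output. Evicting from the largest group is only one policy; reservoir sampling per group would be another reasonable choice.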

@manitgupta manitgupta added the p4 P4 label Jan 18, 2023