Micah Kornfield / @emkornfield:
I think the current signature is Skip(num_rows_to_skip), which is why this is confusing. The docs seem accurate. Given that the docs are accurate (although they can probably be clarified), I think a new SkipRows method makes sense, and we should rename the variable as you suggested.
The TypedColumnReader::Skip method, with signature:
virtual int64_t Skip(int64_t num_levels_to_skip) = 0;
skips levels for both repeated and non-repeated fields. For repeated fields we want to be able to skip rows; skipping levels is not that useful.
For example, for the following rows:
message M { repeated int32 b = 1 }
rows: {}, {[10,10]}, {[20, 20, 20]}
values = {10, 10, 20, 20, 20};
def_levels = {0, 1, 1, 1, 1, 1};
rep_levels = {0, 0, 1, 0, 1, 1};
We want Skip(2) to skip the first two rows, so that the next value we read is 20. Instead, it skips the first two levels (the level for the empty row and the level for the first 10), so the next value we read is the second 10.
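The difference between the two behaviors can be sketched with a toy model of the column above. This is only an illustration, not the parquet-cpp implementation: `ToyColumn`, `Advance`, `SkipLevels`, `SkipRows`, `PeekValue`, and `MakeExampleColumn` are hypothetical names, and the sketch assumes a single repeated field with a maximum definition level of 1, where a row starts at every level whose repetition level is 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a decoded column of a repeated int32
// field. It does NOT mirror the real TypedColumnReader API.
struct ToyColumn {
  std::vector<int32_t> values;
  std::vector<int16_t> def_levels;
  std::vector<int16_t> rep_levels;
  int16_t max_def_level = 1;   // assumption: one repeated field, no nesting
  std::size_t level_pos = 0;   // index of the next level to read
  std::size_t value_pos = 0;   // index of the next value to read

  // Consume one level; a value is consumed only if the level is fully defined.
  void Advance() {
    if (def_levels[level_pos] == max_def_level) ++value_pos;
    ++level_pos;
  }

  // Level-based skipping, the behavior the issue describes for Skip().
  int64_t SkipLevels(int64_t num_levels) {
    int64_t skipped = 0;
    while (skipped < num_levels && level_pos < def_levels.size()) {
      Advance();
      ++skipped;
    }
    return skipped;
  }

  // Row-based skipping, the behavior the issue asks for.
  int64_t SkipRows(int64_t num_rows) {
    int64_t skipped = 0;
    while (skipped < num_rows && level_pos < rep_levels.size()) {
      Advance();  // the level that starts the row (rep_level == 0)
      // Consume the remaining levels of the same row (rep_level != 0).
      while (level_pos < rep_levels.size() && rep_levels[level_pos] != 0) {
        Advance();
      }
      ++skipped;
    }
    return skipped;
  }

  // Precondition: value_pos < values.size().
  int32_t PeekValue() const { return values[value_pos]; }
};

// Builds the column from the example: rows {}, {[10,10]}, {[20,20,20]}.
ToyColumn MakeExampleColumn() {
  ToyColumn col;
  col.values = {10, 10, 20, 20, 20};
  col.def_levels = {0, 1, 1, 1, 1, 1};
  col.rep_levels = {0, 0, 1, 0, 1, 1};
  return col;
}
```

On a column built with MakeExampleColumn(), SkipLevels(2) consumes the empty row's level and the first 10, so PeekValue() returns 10. SkipRows(2) consumes three levels (one for the empty row, two for the second row), so PeekValue() returns 20.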
Reporter: fatemah / @fatemehp
Note: This issue was originally created as PARQUET-2175. Please see the migration documentation for further details.