Add file stats range optimizations for DeleteFileIndex #1338
aokolnychyi merged 11 commits into apache:master
boolean dropStats = ManifestReader.dropStats(dataFilter, columns);
if (!deleteFiles.isEmpty()) {
  select(Streams.concat(columns.stream(), ManifestReader.STATS_COLUMNS.stream()).collect(Collectors.toList()));
This was needed to ensure the stats columns are projected for data files when there are delete files, even if the stats columns were not requested by the caller.
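A minimal sketch of the projection described above, assuming plain `java.util.stream.Stream.concat` in place of Guava's `Streams.concat`. The `StatsProjection` class, the `withStats` method, and the contents of `STATS_COLUMNS` are illustrative stand-ins rather than the code in this PR, and the `distinct()` call is an added assumption to avoid projecting a column twice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the Iceberg API.
class StatsProjection {
  // Illustrative stand-in for ManifestReader.STATS_COLUMNS; the real list may differ.
  static final List<String> STATS_COLUMNS =
      Arrays.asList("value_counts", "null_value_counts", "lower_bounds", "upper_bounds");

  // Returns the columns to project. Stats columns are appended only when
  // delete files exist, because only then are bounds needed for matching.
  static List<String> withStats(List<String> requested, boolean hasDeleteFiles) {
    if (!hasDeleteFiles) {
      return requested;
    }
    return Stream.concat(requested.stream(), STATS_COLUMNS.stream())
        .distinct() // assumption: skip columns the caller already requested
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(withStats(Arrays.asList("file_path", "lower_bounds"), true));
  }
}
```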
private final Iterator<T> items;
private boolean closed;
private boolean hasNext;
private boolean nextReady;
This fixes the filter with reused containers, like GenericRecord. The advance call in next would replace values in a reused row, which would in effect return the next matching row.
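The reused-container problem can be sketched as follows. This is an illustration under assumptions, not the class changed in this PR: `LazyFilterIterator` and `ReusingSource` are hypothetical names, and the sketch omits the `closed` handling. The idea is that the iterator advances only when the caller asks for the next element, so a row handed out by `next()` is not overwritten before the caller has read it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical filter iterator that advances lazily instead of inside next().
class LazyFilterIterator<T> implements Iterator<T> {
  private final Iterator<T> items;
  private final Predicate<T> shouldKeep;
  private boolean nextReady = false;
  private T next = null;

  LazyFilterIterator(Iterator<T> items, Predicate<T> shouldKeep) {
    this.items = items;
    this.shouldKeep = shouldKeep;
  }

  @Override
  public boolean hasNext() {
    return nextReady || advance();
  }

  @Override
  public T next() {
    if (!nextReady && !advance()) {
      throw new NoSuchElementException();
    }
    // Do not advance here: that would mutate a reused container
    // before the caller has read the row being returned.
    nextReady = false;
    return next;
  }

  private boolean advance() {
    while (items.hasNext()) {
      T candidate = items.next();
      if (shouldKeep.test(candidate)) {
        next = candidate;
        nextReady = true;
        return true;
      }
    }
    return false;
  }
}

// Hypothetical source that returns the same array instance for every row,
// mimicking a reader that reuses its record container.
class ReusingSource implements Iterator<int[]> {
  private final int[] row = new int[1];
  private final int max;
  private int current = 0;

  ReusingSource(int max) {
    this.max = max;
  }

  @Override
  public boolean hasNext() {
    return current < max;
  }

  @Override
  public int[] next() {
    row[0] = ++current;
    return row;
  }

  public static void main(String[] args) {
    Iterator<int[]> evens = new LazyFilterIterator<>(new ReusingSource(6), r -> r[0] % 2 == 0);
    List<Integer> seen = new ArrayList<>();
    while (evens.hasNext()) {
      seen.add(evens.next()[0]);
    }
    System.out.println(seen);
  }
}
```

An eager variant that looked ahead inside `next()` would hand back the shared array after it had already been overwritten with the following match.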
Preconditions.checkState(createWriterFunc != null,
    "Cannot create delete file with deletes rows unless createWriterFunc is set");

if (rowSchema != null && createWriterFunc != null) {
Checking for createWriterFunc here is needed because forTable sets the row schema. Rows are therefore included in position deletes only when both a writer func and a row schema are set.
openinx left a comment
The patch looks good to me overall, just left several comments.
this.keyMetadata = toCopy.keyMetadata == null ? null : Arrays.copyOf(toCopy.keyMetadata, toCopy.keyMetadata.length);
this.splitOffsets = toCopy.splitOffsets == null ? null :
    Arrays.copyOf(toCopy.splitOffsets, toCopy.splitOffsets.length);
this.equalityIds = toCopy.equalityIds != null ? Arrays.copyOf(toCopy.equalityIds, toCopy.equalityIds.length) : null;
Preconditions.checkState(createWriterFunc != null,
    "Cannot create delete file with deletes rows unless createWriterFunc is set");

if (rowSchema != null && createWriterFunc != null) {
Why change this? IMO, if rowSchema is not null and createWriterFunc is null, we should throw an exception rather than fall back to the path that writes delete files without rows.
Using forTable sets the row schema automatically, so we can't determine that the user intended to write rows that way.
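The decision discussed in this thread can be sketched as below. The `PosDeleteWriterChoice` class, the `Mode` enum, and the `choose` method are hypothetical names used only for illustration, and the parameter types are simplified stand-ins for the builder's real fields. The sketch shows why a row schema alone does not trigger the row-writing path.

```java
import java.util.function.Function;

// Hypothetical illustration of the builder's branch; not the Iceberg API.
class PosDeleteWriterChoice {
  enum Mode { PATH_AND_POS_ONLY, WITH_ROWS }

  // rowSchema may be set implicitly (for example by forTable), so it does not
  // prove that the caller wants deleted rows written. Rows are written only
  // when a writer function has also been supplied.
  static Mode choose(Object rowSchema, Function<?, ?> createWriterFunc) {
    if (rowSchema != null && createWriterFunc != null) {
      return Mode.WITH_ROWS;
    }
    return Mode.PATH_AND_POS_ONLY;
  }

  public static void main(String[] args) {
    System.out.println(choose("schema", null));
    System.out.println(choose("schema", Function.identity()));
  }
}
```

Throwing when the writer function is missing, as suggested in the review, would reject every builder configured through forTable without a writer function, which is the case the reply describes.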
Looks good to me too, some minor comments.
This adds two optimizations for DeleteFileIndex:
file_path column, ignore the delete file