Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor FilteredManifest and ManifestGroup #735

Merged
merged 5 commits into from Jan 27, 2020

Conversation

rdsr
Copy link
Contributor

@rdsr rdsr commented Jan 13, 2020

Changes

  1. liveEntries() and iterator() are now sharing underlying code
  2. Column stats are always projected when querying either allEntries() or liveEntries() or iterator()
    1. ManifestGroup now relies on a builder
  3. allEntries(), liveEntries() and iterator() does not reuse containers and creates a copy. If this is too expensive in some case we can have a option to reuseContainers

These changes will affect #315 as well

Update

  1. Column stats now projected in allEntries() or liveEntries() or iterator() if required for metrics evaluation
  2. Reverted to old behavior of allEntries(), liveEntries() do not create copies of entries whereas iterator() does

@rdblue
Copy link
Contributor

rdblue commented Jan 15, 2020

Thanks, @rdsr! I'll review this as soon as I can to unblock #315.

@rdsr rdsr force-pushed the refactor_filtered_manifest branch from d8627c9 to e36373a Compare January 19, 2020 04:00
@rdsr
Copy link
Contributor Author

rdsr commented Jan 19, 2020

Thanks for the review @rdblue! . I addressed your comments!

@rdsr rdsr requested a review from rdblue January 19, 2020 04:05
@rdsr rdsr force-pushed the refactor_filtered_manifest branch from e36373a to 4f8d017 Compare January 19, 2020 04:16
@rdsr
Copy link
Contributor Author

rdsr commented Jan 23, 2020

Any thoughts on the updated PR @rdblue ?

@rdblue
Copy link
Contributor

rdblue commented Jan 23, 2020

@rdsr, thanks for the update! I think it's close, but we need to still avoid a copy. Also, I think there is a better way to update ManifestGroup that requires fewer changes and accomplishes the same thing -- just make it mutable.

@rdsr rdsr force-pushed the refactor_filtered_manifest branch from 6e70203 to dd34f48 Compare January 23, 2020 23:34
@rdblue
Copy link
Contributor

rdblue commented Jan 24, 2020

Thanks @rdsr! Just one remaining issue with the stats column projection and I think we're ready to get this in.

}

/**
* @return an Iterator of DataFile. Makes defensive copies of files before returning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// Make sure we have all stats columns for metrics evaluator
return rowFilter != Expressions.alwaysTrue() &&
!columns.containsAll(ManifestReader.ALL_COLUMNS) &&
!columns.containsAll(STATS_COLUMNS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how straight-forward this is now.

@rdblue
Copy link
Contributor

rdblue commented Jan 27, 2020

@rdsr, looks good to me! I'll wait until tests pass and merge it.

Thanks for the quick update!

@rdblue rdblue merged commit c73b657 into apache:master Jan 27, 2020
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Apr 20, 2020
rdblue pushed a commit to rdblue/iceberg that referenced this pull request May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants