Added the SymlinkTextInputFormat manifest generation for Presto/Athena support #250

Closed
wants to merge 2 commits into from

@tdas tdas commented Nov 13, 2019

This PR is the first in a sequence of PRs that add manifest file generation (SymlinkTextInputFormat) to OSS Delta for Presto/Athena read support (issue #76). Specifically, this PR adds the core manifest generation functionality, along with rigorous tests that verify the contents of the manifests. Future PRs will add the public APIs for on-demand generation.

  • Added post-commit hooks to run tasks after a successful commit.

  • Added GenerateSymlinkManifest implementation of post-commit hook to generate the manifests.

    • Each manifest contains the names of the data files to read when querying the whole table or a single partition.
    • A non-partitioned table produces a single manifest file containing all the data files.
    • A partitioned table produces partitioned manifest files: the manifest directory mirrors the table's partition structure, and each partition directory contains one manifest file listing the data files of that partition. This allows partition-pruned Presto/Athena queries to read only the manifest files of the necessary partitions.
    • Each attempt to generate manifests atomically (as much as possible) overwrites the manifest files in the directories (if they exist) and also deletes the manifest files of partitions that have been deleted from the table.
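
The manifest layout described above can be sketched roughly as follows. This is a minimal local-filesystem illustration in Python, not the code in this PR (which is Scala inside Delta): the function names, the `manifest` file name, and the `manifest_root` parameter are all hypothetical, and "atomic" here only means write-to-temp-then-rename per file.

```python
from __future__ import annotations

import os
import tempfile
from collections import defaultdict
from pathlib import Path

MANIFEST_NAME = "manifest"  # hypothetical file name, chosen for this sketch


def write_atomically(path: Path, lines: list[str]) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp, path)  # overwrites an existing manifest in one step


def generate_manifests(table_root: str, data_files: list[str], manifest_root: str) -> set[str]:
    """Write one manifest per partition directory and remove stale ones.

    `data_files` are table-relative paths such as 'date=2019-11-13/part-0.parquet'.
    A file with no directory component belongs to the single, unpartitioned manifest.
    """
    by_partition: dict[str, list[str]] = defaultdict(list)
    for f in data_files:
        by_partition[os.path.dirname(f)].append(f)

    root = Path(manifest_root)
    written: set[str] = set()
    for part, files in by_partition.items():
        # Mirror the table's partition structure under the manifest root.
        target = root / part / MANIFEST_NAME
        write_atomically(target, [f"{table_root}/{f}" for f in sorted(files)])
        written.add(part)

    # Delete manifests of partitions that no longer have any data files.
    if root.exists():
        for existing in root.rglob(MANIFEST_NAME):
            part = existing.parent.relative_to(root).as_posix()
            part = "" if part == "." else part
            if part not in written:
                existing.unlink()
    return written
```

In this sketch, a post-commit hook would simply call `generate_manifests` with the table's current data files after each successful commit. Empty partition directories are left behind after a stale manifest is removed; whether to clean those up is a design choice the sketch does not address.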

Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
Co-authored-by: Rahul Mahadev <rahul.mahadev@databricks.com>

GitOrigin-RevId: 38d797a017fba103aa9750a8e465af8007ab0539

@tdas tdas requested a review from zsxwing November 13, 2019 01:11
@tdas tdas changed the title from "Added the Symlink Manifest generation to OSS (Presto/Athena support)" to "Added the Symlink Manifest generation for Presto/Athena support (issue #76)" Nov 13, 2019
@tdas tdas changed the title from "Added the Symlink Manifest generation for Presto/Athena support (issue #76)" to "Added the SymlinkTextInputFormat Manifest generation for Presto/Athena support (issue #76)" Nov 13, 2019
@tdas tdas changed the title from "Added the SymlinkTextInputFormat Manifest generation for Presto/Athena support (issue #76)" to "Added the SymlinkTextInputFormat manifest generation for Presto/Athena support (issue #76)" Nov 13, 2019
@tdas tdas changed the title from "Added the SymlinkTextInputFormat manifest generation for Presto/Athena support (issue #76)" to "Added the SymlinkTextInputFormat manifest generation for Presto/Athena support" Nov 13, 2019
GitOrigin-RevId: 745f6f222be74e1486ab45e2d04cba254da135f1

tdas commented Nov 18, 2019 via email

2 participants