Split files when planning scan tasks #36

Closed
rdblue opened this issue Dec 7, 2018 · 1 comment
Labels
good first issue Good for newcomers

Comments

@rdblue
Contributor

rdblue commented Dec 7, 2018

When building a scan, the TableScan API can plan the files to read (planFiles) or group the files into combined splits (planTasks). Split planning should also split files at the target split size before bin packing to create the final splits.

This relates to adding split locations (row group or stripe offsets) to the manifest file. The simple version of this issue is to split at the target split size and then combine, but eventually we want to take the split offsets into account, if it makes sense to store them in the manifest file.
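
To make the simple version concrete, here is a minimal sketch of "split at the target split size, then combine". It is not Iceberg's code: `FileSlice`, `split_file`, and `plan_combined_splits` are hypothetical names invented for illustration, files are modeled as `(path, size_in_bytes)` pairs, and the combine step is a plain in-order greedy packing, which may differ from the bin packing the real planner uses. Split offsets (row groups or stripes) are ignored here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FileSlice:
    """A byte range of one data file (hypothetical stand-in for a scan task)."""
    path: str
    start: int
    length: int


def split_file(path: str, file_size: int, target_size: int) -> List[FileSlice]:
    """Cut one file into consecutive slices of at most target_size bytes."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return [
        FileSlice(path, offset, min(target_size, file_size - offset))
        for offset in range(0, file_size, target_size)
    ]


def plan_combined_splits(
    files: Iterable[Tuple[str, int]], target_size: int
) -> List[List[FileSlice]]:
    """Split every file first, then pack the slices in order into combined splits."""
    combined: List[List[FileSlice]] = []
    current: List[FileSlice] = []
    current_size = 0
    for path, file_size in files:
        for piece in split_file(path, file_size, target_size):
            # Start a new combined split when this slice would overflow the target.
            if current and current_size + piece.length > target_size:
                combined.append(current)
                current, current_size = [], 0
            current.append(piece)
            current_size += piece.length
    if current:
        combined.append(current)
    return combined
```

With a target of 100 bytes, a 250-byte file is cut into slices of 100, 100, and 50 bytes, and the trailing 50-byte slice can share a combined split with a small neighboring file. Without the split step, the 250-byte file would have to be read as a single oversized task.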

@rdblue
Contributor Author

rdblue commented Jul 6, 2019

Fixed in #111.

@rdblue rdblue closed this as completed Jul 6, 2019