Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Ignore split offsets array when split offset is past file length #8925

Merged
merged 1 commit into from Oct 28, 2023

Conversation

amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Oct 26, 2023

Follow up from a miss in the fix done in #8860. There's an additional splitOffsetArray method which gets used here

return ((BaseFile<?>) file).splitOffsetArray();
. We would want to surface null for this array in case we know the split offsets are corrupted and invalid (stemming from an issue in 1.4.0). The split offsets are invalid when the final offset is beyond the end of the file.

Alternatively, we can undo the optimization done here: #8336 (comment) and then the original fix in #8860 would be sufficient, and be simpler. Since the optimization seemed impactful and there was community interest in it, this fix retains it and currently just fixes both cases explicitly.

@rdblue
Copy link
Contributor

rdblue commented Oct 26, 2023

@amogh-jahagirdar, maybe this time we should create a test to validate FileScanTask.split when there are bad split offsets? I think that would have caught this in the last PR.

@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-split-offsets-2 branch 5 times, most recently from 114387d to 0ff2819 Compare October 27, 2023 20:47
@amogh-jahagirdar
Copy link
Contributor Author

@amogh-jahagirdar, maybe this time we should create a test to validate FileScanTask.split when there are bad split offsets? I think that would have caught this in the last PR.

Done, added a new test which will exercise both splitOffsetArray and splitOffsets. This will also validate the number of tasks per file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants