Spark 3.3: Remove use of deprecated SparkFilesScan #7106
@@ -54,7 +54,7 @@ class SparkFilesScan extends SparkScan {
  @Override
  protected List<CombinedScanTask> taskGroups() {
    if (tasks == null) {
      FileScanTaskSetManager taskSetManager = FileScanTaskSetManager.get();
Can we remove this class and SparkFilesScanBuilder completely?
Shall we also drop FileScanTaskSetManager?
Oh, wait, 1.2 is not out yet.
If we plan to merge before 1.2, it is not safe to change SparkFilesScan. We can migrate the compaction code, but the old integration has to remain functional.
I'm ok to wait until 1.2 is released
I agree that this PR should only be merged once 1.2.0 is officially out. It would also be great to delete FileScanTaskSetManager as part of this PR.
Removed in the latest commit.
@@ -260,11 +260,6 @@ public MetadataColumn[] metadataColumns() {

  @Override
  public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
    if (options.containsKey(SparkReadOptions.FILE_SCAN_TASK_SET_ID)) {
We should either keep this or wait until 1.2 is out and then remove all the classes and deprecated variables.
Removed the class. Let's wait until after the 1.2 release to merge.
Overall LGTM, but we can only merge the PR after 1.2.0 is officially out. Also, I think we can delete FileScanTaskSetManager as part of this PR.
@@ -97,7 +97,7 @@ public void testBinPackRewrite() throws NoSuchTableException, IOException {
     FileRewriteCoordinator rewriteCoordinator = FileRewriteCoordinator.get();
     Set<DataFile> rewrittenFiles =
         taskSetManager.fetchTasks(table, fileSetID).stream()
-            .map(FileScanTask::file)
+            .map(t -> ((FileScanTask) t).file())
Suggested change:
- .map(t -> ((FileScanTask) t).file())
+ .map(t -> t.asFileScanTask().file())
Good idea, thanks
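For readers following the suggestion above, here is a minimal sketch of why an `asFileScanTask()` accessor can read better than a raw cast. The `ScanTask` and `FileScanTask` interfaces below are simplified, hypothetical stand-ins written for this note (the real Iceberg interfaces carry many more methods, and `file()` returns a `DataFile` rather than a `String`); the class name, `fileTask`, and `rewrittenFiles` helpers are likewise assumptions for illustration only.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AsFileScanTaskSketch {

  // Hypothetical, simplified stand-in for a generic scan task.
  interface ScanTask {
    default boolean isFileScanTask() {
      return false;
    }

    // By default a task is not a file scan task, so the accessor fails with a clear message.
    default FileScanTask asFileScanTask() {
      throw new IllegalStateException("Not a FileScanTask: " + this);
    }
  }

  // Hypothetical, simplified stand-in; file() returns a path string here for brevity.
  interface FileScanTask extends ScanTask {
    String file();

    @Override
    default boolean isFileScanTask() {
      return true;
    }

    @Override
    default FileScanTask asFileScanTask() {
      return this;
    }
  }

  // Helper that builds a file scan task for a given path (illustrative only).
  static FileScanTask fileTask(String path) {
    return () -> path;
  }

  // Mirrors the shape of the test code: map each task to its file without a raw cast.
  static Set<String> rewrittenFiles(List<? extends ScanTask> tasks) {
    return tasks.stream()
        .map(t -> t.asFileScanTask().file())
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    List<ScanTask> tasks = List.of(fileTask("a.parquet"), fileTask("b.parquet"));
    System.out.println(rewrittenFiles(tasks).size());
  }
}
```

In this sketch, a task that is not a file scan task fails with an `IllegalStateException` naming the offending task, whereas the cast would fail with a `ClassCastException`.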
@@ -167,7 +167,7 @@ public void testSortRewrite() throws NoSuchTableException, IOException {
     FileRewriteCoordinator rewriteCoordinator = FileRewriteCoordinator.get();
     Set<DataFile> rewrittenFiles =
         taskSetManager.fetchTasks(table, fileSetID).stream()
-            .map(FileScanTask::file)
+            .map(t -> ((FileScanTask) t).file())
Suggested change:
- .map(t -> ((FileScanTask) t).file())
+ .map(t -> t.asFileScanTask().file())
@@ -243,7 +243,7 @@ public void testCommitMultipleRewrites() throws NoSuchTableException, IOExceptio
     Set<DataFile> rewrittenFiles =
         fileSetIDs.stream()
             .flatMap(fileSetID -> taskSetManager.fetchTasks(table, fileSetID).stream())
-            .map(FileScanTask::file)
+            .map(t -> ((FileScanTask) t).file())
Same as above.
a916484 to 939d6c6
Let me take another look.
Thanks, @szehon-ho! Thanks for reviewing, @nastra!
Follow-up to #6924