Spark 3.3: Remove use of deprecated SparkFilesScan #7106

Merged 3 commits on Mar 21, 2023

szehon-ho
Collaborator

Follow-up to #6924.

@@ -54,7 +54,7 @@ class SparkFilesScan extends SparkScan {
@Override
protected List<CombinedScanTask> taskGroups() {
if (tasks == null) {
FileScanTaskSetManager taskSetManager = FileScanTaskSetManager.get();
Contributor

Can we remove this class and SparkFilesScanBuilder completely?

Contributor

Shall we also drop FileScanTaskSetManager?

Contributor

Oh, wait, 1.2 is not out yet.

Contributor

If we plan to merge before 1.2, it is not safe to change SparkFilesScan. We can migrate the compaction code, but the old integration has to remain functional.

szehon-ho (Collaborator, Author), Mar 15, 2023

I'm OK to wait until 1.2 is released.

Contributor

I agree that this PR should only be merged once 1.2.0 is officially out. It would also be great to delete FileScanTaskSetManager as part of this PR.

Collaborator Author

Removed in the latest commit.

@@ -260,11 +260,6 @@ public MetadataColumn[] metadataColumns() {

@Override
public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
if (options.containsKey(SparkReadOptions.FILE_SCAN_TASK_SET_ID)) {
Contributor

We should either keep this or wait until 1.2 is out and then remove all classes and deprecated vars.

szehon-ho (Collaborator, Author), Mar 15, 2023

Removed the class. Let's wait until after the 1.2 release to merge.

nastra (Contributor) left a comment

Overall LGTM, but we can only merge the PR after 1.2.0 is officially out. I also think we can delete FileScanTaskSetManager as part of this PR.

@@ -97,7 +97,7 @@ public void testBinPackRewrite() throws NoSuchTableException, IOException {
FileRewriteCoordinator rewriteCoordinator = FileRewriteCoordinator.get();
Set<DataFile> rewrittenFiles =
taskSetManager.fetchTasks(table, fileSetID).stream()
- .map(FileScanTask::file)
+ .map(t -> ((FileScanTask) t).file())
Contributor

Suggested change
- .map(t -> ((FileScanTask) t).file())
+ .map(t -> t.asFileScanTask().file())

Collaborator Author

Good idea, thanks
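
For readers following the suggestion above, here is a minimal sketch of why a conversion method such as `asFileScanTask()` tends to read better than an unchecked cast inside a stream. The `ScanTask` and `FileScanTask` interfaces below are simplified, hypothetical stand-ins written for this example (files are plain strings), not the Iceberg types; the real interfaces may differ in shape and error handling.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ScanTaskCastSketch {

  // Hypothetical stand-in for a generic scan task; not the Iceberg interface.
  interface ScanTask {
    default boolean isFileScanTask() {
      return false;
    }

    // Assumption: the conversion fails with a descriptive error
    // when the task is not a file scan task.
    default FileScanTask asFileScanTask() {
      throw new IllegalStateException("Not a FileScanTask: " + this);
    }
  }

  // Hypothetical stand-in for a task that reads a single data file.
  interface FileScanTask extends ScanTask {
    String file();

    @Override
    default boolean isFileScanTask() {
      return true;
    }

    @Override
    default FileScanTask asFileScanTask() {
      return this;
    }
  }

  // Collects the files behind a set of tasks using the conversion method
  // instead of a raw cast such as ((FileScanTask) t).file().
  static Set<String> rewrittenFiles(List<? extends ScanTask> tasks) {
    return tasks.stream()
        .map(t -> t.asFileScanTask().file())
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    FileScanTask a = () -> "data/a.parquet";
    FileScanTask b = () -> "data/b.parquet";
    System.out.println(rewrittenFiles(List.of(a, b)).size());

    try {
      // A task that is not a file scan task fails with a clear message
      // rather than a ClassCastException.
      rewrittenFiles(List.of(new ScanTask() {}));
    } catch (IllegalStateException e) {
      System.out.println("rejected non-file task");
    }
  }
}
```

In this sketch the conversion method keeps the failure mode under the interface's control, so a mismatched task type surfaces as an explicit, descriptive exception instead of a cast error at the call site.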

@@ -167,7 +167,7 @@ public void testSortRewrite() throws NoSuchTableException, IOException {
FileRewriteCoordinator rewriteCoordinator = FileRewriteCoordinator.get();
Set<DataFile> rewrittenFiles =
taskSetManager.fetchTasks(table, fileSetID).stream()
- .map(FileScanTask::file)
+ .map(t -> ((FileScanTask) t).file())
Contributor

Suggested change
- .map(t -> ((FileScanTask) t).file())
+ .map(t -> t.asFileScanTask().file())

@@ -243,7 +243,7 @@ public void testCommitMultipleRewrites() throws NoSuchTableException, IOExceptio
Set<DataFile> rewrittenFiles =
fileSetIDs.stream()
.flatMap(fileSetID -> taskSetManager.fetchTasks(table, fileSetID).stream())
- .map(FileScanTask::file)
+ .map(t -> ((FileScanTask) t).file())
Contributor

Same as above.

@szehon-ho szehon-ho closed this Mar 15, 2023
@szehon-ho szehon-ho reopened this Mar 15, 2023
@aokolnychyi
Contributor

Let me take another look.

@aokolnychyi aokolnychyi merged commit f8072ba into apache:master Mar 21, 2023
@aokolnychyi
Contributor

Thanks, @szehon-ho! Thanks for reviewing, @nastra!
