Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark: apply rewrite manifest action fix to 3.1,3.2 #7296

Merged
merged 2 commits into from
Apr 11, 2023

Conversation

bryanck
Copy link
Contributor

@bryanck bryanck commented Apr 7, 2023

This PR applies the changes from #7263 to Spark v3.2 and v3.1 also. It marks the unused method SparkUtil.serializableFileIO() as deprecated, as using this to broadcast a FileIO can lead to unintended consequences, i.e. the underlying S3 client being closed during broadcast removal.

@github-actions github-actions bot added the spark label Apr 7, 2023
@@ -77,6 +77,8 @@ public class SparkUtil {

private SparkUtil() {}

/** @deprecated will be removed in 1.4.0 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Maybe we can put a comment on why, and what the user can call.

Actually, I had a question, that I did not have a chance to ask in original pr: #7263, do we still need to close fileIO, at the end of action? Is that getting handled somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question and it impacts Spark jobs in general, not just this action, thus this is consistent with Iceberg behavior in general AFAIK. There has been some discussion on handling resource cleanup better, @danielcweeks @rdblue do you have any insights into that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this and added comments about the deprecation.

Copy link
Collaborator

@szehon-ho szehon-ho Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I see the SparkActions takes the table as argument, maybe it could be the responsibility of user then to close it, though it is a bit undocumented.

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well ! Thanks @bryanck.

@szehon-ho szehon-ho merged commit 92ef136 into apache:master Apr 11, 2023
21 checks passed
@szehon-ho
Copy link
Collaborator

Merged, thanks @bryanck , and others for review

ericlgoodman pushed a commit to ericlgoodman/iceberg that referenced this pull request Apr 12, 2023
manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants