Spark: apply rewrite manifest action fix to 3.1,3.2 #7296
@@ -77,6 +77,8 @@ public class SparkUtil {

  private SparkUtil() {}

  /** @deprecated will be removed in 1.4.0 */
Great. Maybe we can put a comment on why, and what the user can call.
Actually, I had a question that I did not get a chance to ask in the original PR, #7263: do we still need to close the FileIO at the end of the action? Is that handled somewhere?
That's a good question. It affects Spark jobs in general, not just this action, so AFAIK this is consistent with Iceberg behavior elsewhere. There has been some discussion on handling resource cleanup better; @danielcweeks @rdblue, do you have any insights into that?
I updated this and added comments about the deprecation.
Makes sense. I see that SparkActions takes the table as an argument, so maybe it could then be the user's responsibility to close it, though that is a bit undocumented.
Looks good to me
LGTM as well! Thanks @bryanck.
Merged, thanks @bryanck and others for the review.
apache#7263) (apache#7296) This change backports PR apache#7263 to Spark 3.1 and 3.2
This PR also applies the changes from #7263 to Spark v3.2 and v3.1. It marks the unused method SparkUtil.serializableFileIO() as deprecated, because using it to broadcast a FileIO can lead to unintended consequences, i.e. the underlying S3 client being closed during broadcast removal.
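The failure mode described above can be illustrated with a small, dependency-free sketch. It does not use Iceberg or Spark: FakeS3Client, FakeFileIO, and FakeBroadcast are hypothetical stand-ins, and the sketch assumes that removing a broadcast closes any closeable value it holds, which is what the description implies. Under that assumption, a FileIO that shares its client with the table stops working once the broadcast is removed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class BroadcastCloseSketch {

  /** Hypothetical stand-in for an S3 client; not a real AWS SDK class. */
  static final class FakeS3Client implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    String read(String path) {
      if (closed.get()) {
        throw new IllegalStateException("client already closed");
      }
      return "contents of " + path;
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /** Hypothetical stand-in for a FileIO that closes its client when closed. */
  static final class FakeFileIO implements Closeable {
    private final FakeS3Client client;

    FakeFileIO(FakeS3Client client) {
      this.client = client;
    }

    String read(String path) {
      return client.read(path);
    }

    @Override
    public void close() {
      client.close();
    }
  }

  /** Hypothetical broadcast wrapper; assumed to close closeable values on removal. */
  static final class FakeBroadcast<T> {
    private final T value;

    FakeBroadcast(T value) {
      this.value = value;
    }

    T value() {
      return value;
    }

    void destroy() {
      if (value instanceof Closeable) {
        try {
          ((Closeable) value).close();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    }
  }

  /** Returns whether the table's own FileIO still works after the broadcast is removed. */
  public static boolean tableIoUsableAfterBroadcastRemoval() {
    FakeFileIO tableIo = new FakeFileIO(new FakeS3Client());

    // Broadcasting the same instance means the broadcast and the table share one client.
    FakeBroadcast<FakeFileIO> broadcast = new FakeBroadcast<>(tableIo);
    broadcast.value().read("s3://bucket/manifest.avro");

    // Removing the broadcast closes the shared client as a side effect.
    broadcast.destroy();

    try {
      tableIo.read("s3://bucket/manifest.avro");
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // prints "usable after removal: false"
    System.out.println("usable after removal: " + tableIoUsableAfterBroadcastRemoval());
  }
}
```

In this sketch the problem comes from shared ownership: the broadcast and the table hold the same client, so whichever is cleaned up first breaks the other.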