[SPARK-32838][SQL] Static partition overwrite could use staging dir insert #35608
Conversation
cc @CHENXCHEN
Gentle ping @cloud-fan @HyukjinKwon @viirya @dongjoon-hyun. This is a long-standing issue, and the current code is an easy and reasonable way to resolve it. Hoping for your reviews; many Spark users encounter this issue. cc @TongWei1105
ping @cloud-fan
We also encounter this issue for partitioned tables (possibly converted from HiveTableRelation).
The reason we don't hit this issue in Hive is that Hive uses a staging dir.
What are you trying to express here?
DynamicPartitionOverwrite does not delete data before the job begins, so it does not hit this verification problem: the files are written directly through the outputCommitter and only swapped in at commit.
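The write-then-swap behavior described above can be sketched with plain filesystem operations. This is a hypothetical helper, not Spark's real FileCommitProtocol; names and layout are illustrative only:

```python
import shutil
import tempfile
from pathlib import Path


def overwrite_partition_staged(table_dir: Path, partition: str, rows) -> None:
    """Overwrite one partition the staging-dir way: write first, swap at commit.

    Hypothetical sketch of the dynamic-partition-overwrite commit flow,
    not Spark's actual implementation.
    """
    # Tasks write their output under a hidden staging dir inside the table
    # directory; the live partition data stays readable while the job runs.
    staging = Path(tempfile.mkdtemp(prefix=".spark-staging-", dir=table_dir))
    (staging / "part-00000").write_text("\n".join(rows))
    # Only at job commit is the old partition directory replaced by the
    # staged files, so no data is deleted before the job begins.
    target = table_dir / partition
    if target.exists():
        shutil.rmtree(target)
    staging.rename(target)
```

Because the source partition is untouched until commit, a job can read from the same partition it is overwriting.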
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
@AngersZhuuuu @cloud-fan We are facing this issue with Spark 3.2. Can we get this fix merged? |
What changes were proposed in this pull request?
Currently, we verify the output path in DataSourceAnalysis, and an insert that reads from the path it overwrites fails with "Cannot overwrite a path that is also being read from."
A static partition insert that reads data from the same table is a perfectly normal case, and this bug troubles users a lot.
In this PR, static partition inserts use the same logic as dynamic partition overwrite (write to a staging dir, then commit) to avoid this issue.
Why are the changes needed?
Support more ETL cases.
Does this PR introduce any user-facing change?
After this patch, users can insert overwrite a static partition of a table while also reading from that table.
How was this patch tested?
Added unit tests.