[SPARK-34926][SQL] PartitioningUtils.getPathFragment() should respect partition value is null #32018
ping @MaxGekk Since your PR makes partition values support value as
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #136787 has finished for PR 32018 at commit
I am not sure that this is the right fix. I guess null should be replaced by "__HIVE_DEFAULT_PARTITION__", like:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala
Lines 122 to 127 in 0494dc9
def getPartitionPathString(col: String, value: String): String = {
  val partitionString = if (value == null || value.isEmpty) {
    DEFAULT_PARTITION_NAME
  } else {
    escapePathName(value)
  }
Looks like we should handle it in spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala Lines 351 to 355 in 0494dc9
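For readers following the suggestion above, here is a minimal, self-contained sketch of the null/empty handling in the quoted snippet. The `escapePathName` below is a simplified stand-in (an assumption, not Spark's implementation), and the final `col=value` concatenation is also assumed, because the quoted lines stop before the function returns.

```scala
object DefaultPartitionSketch {
  // Literal quoted in the review comment.
  val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

  // Assumption: a small set of characters to percent-encode, for illustration only.
  private val unsafeChars: Set[Char] = Set('/', '=', '%', ':')

  // Hypothetical, simplified stand-in for Spark's escapePathName.
  def escapePathName(path: String): String =
    path.flatMap { c =>
      if (unsafeChars(c)) "%" + "%02X".format(c.toInt) else c.toString
    }

  // Null or empty values map to the default partition name; anything else is escaped.
  // The return expression is assumed; the quoted snippet is cut off before it.
  def getPartitionPathString(col: String, value: String): String = {
    val partitionString =
      if (value == null || value.isEmpty) DEFAULT_PARTITION_NAME
      else escapePathName(value)
    escapePathName(col) + "=" + partitionString
  }

  def main(args: Array[String]): Unit = {
    println(getPartitionPathString("c", null)) // c=__HIVE_DEFAULT_PARTITION__
    println(getPartitionPathString("c", ""))   // c=__HIVE_DEFAULT_PARTITION__
    println(getPartitionPathString("c", "a/b")) // c=a%2Fb
  }
}
```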
What confused me is that
The path can be c=null; with the current code, in which case will the path be null and in which case can it be
The DSv2 impl should handle
Thanks a lot for clarifying this problem. It confused me for a long time.
I know why I was confused: I tested this in Spark 3.0, and this behavior has since been changed to keep consistency.
ping @MaxGekk Updated, current code should be ok since and I have checked that
@AngersZhuuuu Can you add a test for the changes?
@@ -350,7 +350,12 @@ object PartitioningUtils {
   */
  def getPathFragment(spec: TablePartitionSpec, partitionSchema: StructType): String = {
    partitionSchema.map { field =>
      escapePathName(field.name) + "=" + escapePathName(spec(field.name))
      val value = if (spec(field.name) == null || spec(field.name).isEmpty) {
Can we re-use the existing function or, if not, could you extract the common code?
      val value = if (spec(field.name) == null || spec(field.name).isEmpty) {
        DEFAULT_PARTITION_NAME
      } else {
        escapePathName(spec(field.name))
The lookup time in spec is probably not a big deal, but I would store spec(field.name) in a val.
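The two review suggestions (reuse the existing helper, look the value up once) could be combined roughly as follows. This is only a sketch under assumptions: `partitionColumns: Seq[String]` stands in for the `StructType` schema, `escapePathName` is a simplified placeholder, and `getPartitionPathString` is re-declared locally so the example compiles on its own. It is not claimed to be the code that was merged.

```scala
object PathFragmentSketch {
  // Assumption: a partition spec is a column-name to value map.
  type TablePartitionSpec = Map[String, String]

  val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

  // Hypothetical placeholder for Spark's escapePathName (escapes only '/').
  def escapePathName(s: String): String = s.replace("/", "%2F")

  // Shared helper: null or empty values become the default partition name.
  def getPartitionPathString(col: String, value: String): String = {
    val partitionString =
      if (value == null || value.isEmpty) DEFAULT_PARTITION_NAME
      else escapePathName(value)
    escapePathName(col) + "=" + partitionString
  }

  // partitionColumns replaces the StructType schema to keep the sketch dependency-free.
  def getPathFragment(spec: TablePartitionSpec, partitionColumns: Seq[String]): String =
    partitionColumns.map { name =>
      val value = spec(name) // single lookup, stored in a val
      getPartitionPathString(name, value)
    }.mkString("/")

  def main(args: Array[String]): Unit = {
    val spec: TablePartitionSpec = Map("a" -> "1", "b" -> null)
    println(getPathFragment(spec, Seq("a", "b"))) // a=1/b=__HIVE_DEFAULT_PARTITION__
  }
}
```

Delegating to one helper keeps the null/empty rule in a single place, so the catalog path and the data source path cannot drift apart.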
Kubernetes integration test starting
UT added; it comes from the case in https://issues.apache.org/jira/browse/SPARK-24937. Without the current change it fails.
Also cc @wangyum
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #136821 has finished for PR 32018 at commit
Test build #136816 has finished for PR 32018 at commit
Test build #136817 has finished for PR 32018 at commit
LGTM, @AngersZhuuuu could you update the PR's description?
Done
GA is failing on Avro tests, for instance, and the Jenkins build failed on the latest commit. @AngersZhuuuu To continue with the fix, let's re-trigger the tests. Also, @cloud-fan, could you look at this PR since you reviewed previous changes related to null partition values?
jenkins, retest this, please
Kubernetes integration test starting
Kubernetes integration test status failure
+1, LGTM. Merging to master. The failed GA is a known issue.
BTW, @AngersZhuuuu does the issue exist in 3.1/3.0/2.4? If so, please backport the changes.
Need to check. I will update here after checking this.
Test build #136841 has finished for PR 32018 at commit
LGTM2
Checked all branches; we need to backport to branch-3.0/branch-3.1. Should I raise separate PRs, or can you just merge to those branches?
Could you open a separate PR for each branch, please?
OK, I will ping you when I finish these.
What changes were proposed in this pull request?
When we insert data into a partition of a partitioned table with an empty DataFrame, we call PartitioningUtils.getPathFragment() to update this partition's metadata too.
When we insert into a partition whose partition value is null, it throws an exception.
PartitioningUtils.getPathFragment() should support null values too.
Why are the changes needed?
Fix bug
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UT
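
The added unit test itself is not shown in this thread. As a rough illustration of the behavior it would need to cover, the sketch below contrasts the shape of the removed line with the shape of the added lines from the diff. All names here (`fragmentBefore`, `fragmentAfter`, the simplified `escapePathName`) are hypothetical, and the failure in this stand-in is a NullPointerException; the exact exception raised in Spark is not shown in the thread.

```scala
import scala.util.Try

object NullPartitionValueCheck {
  val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

  // Hypothetical stand-in for escapePathName; calling a method on null throws.
  def escapePathName(s: String): String = s.replace("/", "%2F")

  // Shape of the removed line: the value is escaped unconditionally.
  def fragmentBefore(spec: Map[String, String], cols: Seq[String]): String =
    cols.map(c => escapePathName(c) + "=" + escapePathName(spec(c))).mkString("/")

  // Shape of the added lines: null or empty maps to the default partition name.
  def fragmentAfter(spec: Map[String, String], cols: Seq[String]): String =
    cols.map { c =>
      val raw = spec(c)
      val value =
        if (raw == null || raw.isEmpty) DEFAULT_PARTITION_NAME
        else escapePathName(raw)
      escapePathName(c) + "=" + value
    }.mkString("/")

  def main(args: Array[String]): Unit = {
    val spec = Map[String, String]("a" -> "1", "b" -> null)
    val cols = Seq("a", "b")
    println(Try(fragmentBefore(spec, cols)).isFailure) // true
    println(fragmentAfter(spec, cols)) // a=1/b=__HIVE_DEFAULT_PARTITION__
  }
}
```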