Skip to content

[SPARK-36861][SQL] Use yyyy-MM-dd as the date pattern in partition discovery #34700

Closed
MaxGekk wants to merge 3 commits into apache:master from MaxGekk:fix-infer-of-date-part

MaxGekk (Member) commented Nov 24, 2021

What changes were proposed in this pull request?

In this PR, I propose to explicitly set the date pattern to yyyy-MM-dd when inferring the types of partition values.

Why are the changes needed?

The existing date partition parser is much more tolerant of its input and can skip parts of date strings (for an example, see SPARK-36861). As a consequence, it can lose user information, namely parts of partition values.
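To illustrate the difference between a tolerant parser and an explicit yyyy-MM-dd pattern, here is a minimal sketch in plain Scala using java.time. It is not Spark's implementation: the object PartitionDateInference and its helpers inferDate and lenientInferDate are hypothetical, and 2021-01-01T00 is an illustrative partition value rather than one quoted from SPARK-36861. The strict helper requires the whole string to match the pattern; the lenient one stops after the date prefix and silently drops the rest.

```scala
import java.text.ParsePosition
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helpers for illustration only; Spark's partition inference
// uses its own formatter classes, not this object.
object PartitionDateInference {
  // The explicit date pattern proposed in this PR.
  private val datePattern: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Strict: LocalDate.parse requires the entire string to match the pattern,
  // so any trailing text makes inference fail (None), and a caller could
  // keep the value as a string instead.
  def inferDate(value: String): Option[LocalDate] =
    Try(LocalDate.parse(value, datePattern)).toOption

  // Lenient: parsing with a ParsePosition accepts a matching prefix and
  // ignores whatever follows it, so part of the value is silently dropped.
  def lenientInferDate(value: String): Option[LocalDate] =
    Try(LocalDate.from(datePattern.parse(value, new ParsePosition(0)))).toOption

  def main(args: Array[String]): Unit = {
    // "2021-01-01T00" is an illustrative hour-style partition value.
    for (v <- Seq("2018-01-01", "2021-01-01T00")) {
      println(s"$v -> strict=${inferDate(v)}, lenient=${lenientInferDate(v)}")
    }
    // 2018-01-01 -> strict=Some(2018-01-01), lenient=Some(2018-01-01)
    // 2021-01-01T00 -> strict=None, lenient=Some(2021-01-01)
  }
}
```

In this sketch, the strict variant prefers failing inference over returning a truncated date, so the T00 suffix is never lost. That is the trade-off the explicit pattern is meant to make.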

Does this PR introduce any user-facing change?

No. The new behaviour introduced by #33709 hasn't been released yet.

How was this patch tested?

By running the modified test suites:

$ build/sbt "test:testOnly *ParquetV2PartitionDiscoverySuite"
$ build/sbt "test:testOnly *ImageFileFormatSuite"

MaxGekk requested a review from cloud-fan on November 24, 2021 20:43
github-actions bot added the SQL label on Nov 24, 2021

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50062/


SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50062/


SparkQA commented Nov 25, 2021

Test build #145590 has finished for PR 34700 at commit c80127c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

github-actions bot added the ML label on Nov 25, 2021
.collect()

assert(Set(result: _*) === Set(
Row("29.5.a_b_EGDP022204.jpg", "kittens", Date.valueOf("2018-01-01")),
MaxGekk (Member Author) commented on this test code:

I reverted the changes made by https://github.com/apache/spark/pull/33709/files#r688851936. Now the test looks the same as in branch-3.2.


SparkQA commented Nov 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50090/


SparkQA commented Nov 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50090/

cloud-fan (Contributor) commented:

thanks, merging to master!

cloud-fan closed this in 2c69f14 on Nov 25, 2021

SparkQA commented Nov 25, 2021

Test build #145618 has finished for PR 34700 at commit 505f8ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.



3 participants