Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path #31449

Closed
wants to merge 2 commits into from

Conversation

HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.

Why are the changes needed?

The test DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths is failing conditionally, depending on the length of the temp directory.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Modified UTs explain the missing points, which also do the test.

@@ -122,24 +123,45 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
test("SPARK-31793: FileSourceScanExec metadata should contain limited file paths") {
withTempPath { path =>
val dir = path.getCanonicalPath

// create a sub-directory with long name so that each root path will always exceed the limit
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just ensure we test on path truncation; other cases should be tested against UtilsSuite.

@HeartSaVioR HeartSaVioR requested review from HyukjinKwon, gengliangwang and maropu and removed request for HyukjinKwon February 3, 2021 04:48
@HeartSaVioR
Copy link
Contributor Author

Sorry there looked to be some case I haven't taken care of. (The log file doesn't say about the actual path, but I'm suspecting the difference between path without schema vs with schema - file:///.)

Instead of struggling with thinking of all possible cases, I think it's more appropriate if we ensure the test always verifies the case path truncation happens and only one path is accepted. Other cases are covered by UtilsSuite.

Please take a look at this again. Thanks!

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39395/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39395/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Test build #134807 has finished for PR 31449 at commit 5f37003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// create a sub-directory with long name so that each root path will always exceed the limit
// this is to ensure we always test the case for the path truncation
// 110 = limit 100 + margin 10 to clearly avoid edge case
val dataDir = if (dir.length >= 110) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit comment; for test simplicity, how about always creating a new dir with fixed-length long name (e.g., 110?) instead of checking the length (if (dir.length >= 110))?

        val tooLongDirName = Random.alphanumeric.take(110).toList.mkString
        val f = new File(path, tooLongDirName)
        f.mkdir()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about the possible glitch on long long path (like exceeding 255 chars), but I agree that should be simpler if there's no problem on the length of path. Do you have any idea on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. If so, the current one looks fine. Could you leave some comments there about the limit? I didn't know it and it seems the max name length is 255 and the max path length is 4096? (I googled it)

Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about the actual problem; probably just worrying too much (that's why I said "not 100% sure"). Let's simplify as you suggested and revisit if we trigger any glitch.

@maropu
Copy link
Member

maropu commented Feb 3, 2021

To check if we can resolve the flakiness, could you run GA tests multiple times (3~4?) before merging it?

@HeartSaVioR
Copy link
Contributor Author

Both Jenkins and GA are happy. Will retrigger both again. (Leaving the fact as comment because GA doesn't leave the build result to the comment.)

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Test build #134822 has started for PR 31449 at commit 5f37003.

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39408/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39408/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39410/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39410/

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Test build #134823 has finished for PR 31449 at commit c6cd289.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HeartSaVioR
Copy link
Contributor Author

OK second run for Jenkins / GA passed. I'll execute another run and check tomorrow morning, and merge if the next run also passes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Test build #134838 has finished for PR 31449 at commit c6cd289.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Feb 3, 2021

Jenkins failed, but not from related UTs. Let me retrigger one. (GA is still running, looks like the workflow was started less than 2 hours?)

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@HeartSaVioR
Copy link
Contributor Author

GA failed but also not from related UT. retriggering...

@SparkQA
Copy link

SparkQA commented Feb 3, 2021

Test build #134846 has finished for PR 31449 at commit c6cd289.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

OK Jenkins/GA passed. Merging...

HeartSaVioR added a commit that referenced this pull request Feb 3, 2021
…e length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified UTs explain the missing points, which also do the test.

Closes #31449 from HeartSaVioR/SPARK-34326-v2.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 44dcf00)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@HeartSaVioR
Copy link
Contributor Author

Thanks for the quick reviews! Merged into master/branch-3.1.

@HeartSaVioR
Copy link
Contributor Author

Will monitor the GA run for master/branch-3.1.

@HyukjinKwon
Copy link
Member

Nice, thanks for fixing it @HeartSaVioR

@HeartSaVioR HeartSaVioR deleted the SPARK-34326-v2 branch February 4, 2021 01:30
@maropu
Copy link
Member

maropu commented Feb 5, 2021

Thanks~, @HeartSaVioR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants