Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path #31435

Closed
wants to merge 1 commit into from

Conversation

HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.

Why are the changes needed?

The test DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths is failing conditionally, depending on the length of the temp directory.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Modified UTs explain the missing points, which also do the test.

val pathsInLocation = location.get.substring(
location.get.indexOf('[') + 1, location.get.indexOf(']')).split(", ").toSeq

// If the temp path length is less than (stop appending threshold - 1), say, 100 - 1 = 99,
Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally excluded the thing on closing bracket (']'), but the closing bracket will be added in any way, regardless of the current location string length. This was also not accounted as well.

@HeartSaVioR
Copy link
Contributor Author

cc.ing @gengliangwang @cloud-fan @HyukjinKwon @maropu who are author/reviewers of #28610

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Test build #134770 has started for PR 31435 at commit 36943c6.

@@ -137,9 +137,24 @@ class DataSourceScanExecRedactionSuite extends DataSourceScanRedactionTest {
assert(location.isDefined)
// The location metadata should at least contain one path
assert(location.get.contains(paths.head))
// If the temp path length is larger than 100, the metadata length should not exceed
// twice of the length; otherwise, the metadata length should be controlled within 200.
assert(location.get.length < Math.max(paths.head.length, 100) * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this can be flaky.

// path.
// (Note we apply subtraction with 1 to count start bracket '['.)
if (paths.head.length < 99) {
assert(pathsInLocation.size >= 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to check that the paths are truncated here.

Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. Could you please elaborate?

We don't truncate the path itself, right? I think it's also something to be fixed (I'd rather want to see path being truncated with ellipses (...) instead of not adding and leaving it as it is.) but it's more likely bigger fix which may worth another fix instead of test fix.

If you meant counting the number of paths or something for edge-case, dealing with another UT would be easier, like we could simply add edge-cases there in this PR. (And that UT already tests basic functionality.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean simply checking

assert(pathsInLocation.size < paths.size)

But this is a minor comment. You can ignore it since there is UT for it already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense, but let's leave it as it is, as I think I'd propose the another form which would make clear how many elements they are instead of only showing available paths and don't say there're more.

@maropu
Copy link
Member

maropu commented Feb 2, 2021

Nice catch!

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@HeartSaVioR
Copy link
Contributor Author

Just retriggered Github Action as well.

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39369/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39369/

@SparkQA
Copy link

SparkQA commented Feb 2, 2021

Test build #134782 has finished for PR 31435 at commit 36943c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

Merging to master/3.1.

HeartSaVioR added a commit that referenced this pull request Feb 2, 2021
…e length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified UTs explain the missing points, which also do the test.

Closes #31435 from HeartSaVioR/SPARK-34326.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 6386602)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@HeartSaVioR
Copy link
Contributor Author

Thanks all for the quick reviews!

@HeartSaVioR HeartSaVioR deleted the SPARK-34326 branch February 2, 2021 22:38
@HyukjinKwon
Copy link
Member

Hm, the tests started to fail:

[info] - SPARK-31793: FileSourceScanExec metadata should contain limited file paths *** FAILED *** (265 milliseconds)
[info]   1 was not greater than or equal to 2 (DataSourceScanExecRedactionSuite.scala:154)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$9(DataSourceScanExecRedactionSuite.scala:154)
[info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$9$adapted(DataSourceScanExecRedactionSuite.scala:123)
[info]   at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69)
[info]   at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66)
[info]   at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:34)
[info]   at org.apache.spark.sql.execution.DataSourceScanExecRedactionSuite.$anonfun$new$8(DataSourceScanExecRedactionSuite.scala:123)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

https://github.com/apache/spark/runs/1818352990
https://github.com/apache/spark/runs/1818908589

@HeartSaVioR, please let me revert this for now since it targeted to fix the test but it started to fail in any event.

@HeartSaVioR
Copy link
Contributor Author

Ah thanks for the check. Looks odd... I'll check it.

@HeartSaVioR
Copy link
Contributor Author

Raised a new PR #31449

1 similar comment
@HeartSaVioR
Copy link
Contributor Author

Raised a new PR #31449

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…e length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs being added in SPARK-31793, so that all things contributing the length limit are properly accounted.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified UTs explain the missing points, which also do the test.

Closes apache#31435 from HeartSaVioR/SPARK-34326.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants