Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32526][SQL]Fix some test cases of sql/catalyst module in scala 2.13 #29370

Closed

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Aug 6, 2020

What changes were proposed in this pull request?

The purpose of this pr is to partial resolve SPARK-32526, total of 88 failed and 2 aborted test cases were fixed, the related suite as follow:

  • DataSourceV2AnalysisBaseSuite related test cases (71 FAILED -> Pass)
  • TreeNodeSuite (1 FAILED -> Pass)
  • MetadataSuite (1 FAILED -> Pass)
  • InferFiltersFromConstraintsSuite (3 FAILED -> Pass)
  • StringExpressionsSuite (1 FAILED -> Pass)
  • JacksonParserSuite (1 FAILED -> Pass)
  • HigherOrderFunctionsSuite (1 FAILED -> Pass)
  • ExpressionParserSuite (1 FAILED -> Pass)
  • CollectionExpressionsSuite (6 FAILED -> Pass)
  • SchemaUtilsSuite (2 FAILED -> Pass)
  • ExpressionSetSuite (ABORTED -> Pass)
  • ArrayDataIndexedSeqSuite (ABORTED -> Pass)

The main change of this pr as following:

  • Optimizer and Analyzer are changed to pass compile, ArrayBuffer is not a Seq in scala 2.13, call toSeq method manually to compatible with Scala 2.12

  • m.mapValues().view.force pattern return a Map in scala 2.12 but return a IndexedSeq in scala 2.13, call toMap method manually to compatible with Scala 2.12. TreeNode are changed to pass DataSourceV2AnalysisBaseSuite related test cases and TreeNodeSuite failed case.

  • call toMap method of Metadata#hash method case map branch because map.mapValues return Map in Scala 2.12 and return MapView in Scala 2.13.

  • impl contact method of ExpressionSet in Scala 2.13 version refer to ExpressionSet in Scala 2.12 to support + + method conform to ExpressionSet semantics

  • GenericArrayData not accept ArrayBuffer input, call toSeq when use ArrayBuffer construction GenericArrayData for Scala version compatibility

  • Call toSeq in RandomDataGenerator#randomRow method to ensure contents of fields is Seq not ArrayBuffer

  • Call toSeq Let JacksonParser#parse still return a Seq because the check method of JacksonParserSuite#"skipping rows using pushdown filters" dependence on Seq type

  • Call toSeq in AstBuilder#visitFunctionCall, otherwise ctx.argument.asScala.map(expression) is Buffer in Scala 2.13

  • Add a LongType match to ArraySetLike.nullValueHolder

  • Add a sorted to ensure duplicateColumns string in SchemaUtils.checkColumnNameDuplication method error message have a deterministic order

Why are the changes needed?

We need to support a Scala 2.13 build.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Scala 2.12: Pass the Jenkins or GitHub Action

  • Scala 2.13: Do the following:

dev/change-scala-version.sh 2.13
mvn clean install -DskipTests  -pl sql/catalyst -Pscala-2.13 -am
mvn test -pl sql/catalyst -Pscala-2.13

Before

Tests: succeeded 3853, failed 103, canceled 0, ignored 6, pending 0
*** 3 SUITES ABORTED ***
*** 103 TESTS FAILED ***

After

Tests: succeeded 4035, failed 17, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 15 TESTS FAILED ***

@LuciferYang LuciferYang changed the title [SPARK-32526][SQL]Pass DataSourceV2AnalysisBaseSuite and TreeNodeSuite in scala 2.13 [SPARK-32526][SQL]Pass DataSourceV2AnalysisBaseSuite and TreeNodeSuite related test cases in scala 2.13 Aug 6, 2020
@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 6, 2020

Thanks. You don't need to fix everything in one shot. Please group the failures and fix the one you think this is related to many others. You can proceed by suite by suite.

@dongjoon-hyun, this pr try to fix DataSourceV2AnalysisBaseSuite and TreeNodeSuite related test cases in scala 2.13 of sql/catalyst module, after this pr the failed cases count from 103 decrease to; 31, but no aborted cases fixed. Can you give a review? thx ~

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 6, 2020

cc @srowen from the numbers alone, this pr fixed about 70% failed cases of sql/catalyst module, do we accept partial fixed and fix the remaining cases in other pr?

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 6, 2020

I will update failure list to JIRA after this pr later

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 6, 2020

details of remaining failed cases already updated to the attachment and added comments in SPARK-32526

@srowen
Copy link
Member

srowen commented Aug 6, 2020

This change itself looks fine. If you think the rest of the changes in sql / catalyst are similar, I'd make them in one go. If there are difficult or logically distinct other fixes, that can be another PR. They can all be attached to one JIRA

@HyukjinKwon
Copy link
Member

Yeah, let's try to do it in one go.

@LuciferYang
Copy link
Contributor Author

@HyukjinKwon @srowen ok ~ let me give it a try~

@dongjoon-hyun
Copy link
Member

ok to test

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 7, 2020

The main change of Address beb3928 is call toMap method of Metadata#hash method case map branch because map.mapValues return Map in Scala 2.12 and return MapView in Scala 2.13.

The effect as follow:

  • Let ExpressionSetSuite no longer ABORTED but add 1 failed case because of ExpressionSet not implement ++ method

  • Pass MetadataSuite

Before

Tests: succeeded 3925, failed 31, canceled 0, ignored 6, pending 0
*** 3 SUITES ABORTED ***
*** 31 TESTS FAILED ***

After

Tests: succeeded 3973, failed 31, canceled 0, ignored 6, pending 0
*** 2 SUITES ABORTED ***
*** 31 TESTS FAILED ***

@SparkQA
Copy link

SparkQA commented Aug 7, 2020

Test build #127179 has finished for PR 29370 at commit beb3928.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

…etSuite and InferFiltersFromConstraintsSuite
@LuciferYang
Copy link
Contributor Author

The main change of Address eeeb28a impl contact method of ExpressionSet in Scala 2.13 version refer to ExpressionSet in Scala 2.12

The effect as follow:

  • Pass ExpressionSetSuite

  • Pass InferFiltersFromConstraintsSuite

Before

Tests: succeeded 3973, failed 31, canceled 0, ignored 6, pending 0
*** 2 SUITES ABORTED ***
*** 31 TESTS FAILED ***

After

Tests: succeeded 3973, failed 31, canceled 0, ignored 6, pending 0
*** 2 SUITES ABORTED ***
*** 27 TESTS FAILED ***

@SparkQA
Copy link

SparkQA commented Aug 7, 2020

Test build #127188 has finished for PR 29370 at commit eeeb28a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 8, 2020

Looking good. I think you're welcome to keep going and we can 'checkpoint' and commit some fixes whenever there is a logical and large enough group of them to commit. That could be now too, up to you.

@LuciferYang
Copy link
Contributor Author

The main change of Address 4018b1f as follow:

  • GenericArrayData not accept ArrayBuffer input, call toSeq for Scala version compatibility

  • Call toSeq in RandomDataGenerator#randomRow method to ensure contents of fields is Seq not ArrayBuffer

  • Call toSeq Let JacksonParser#parse still return a Seq because the check method of JacksonParserSuite#"skipping rows using pushdown filters" dependence on Seq type

The effect as follow:

  • Let ArrayDataIndexedSeqSuite no longer ABORTED and all pass

  • Pass StringExpressionsSuite

  • Pass JacksonParserSuite

  • Pass HigherOrderFunctionsSuite

  • Partial Pass CollectionExpressionsSuite , failed case from 6 to 3

Before

Tests: succeeded 3973, failed 31, canceled 0, ignored 6, pending 0
*** 2 SUITES ABORTED ***
*** 27 TESTS FAILED ***

After

Tests: succeeded 4035, failed 21, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 21 TESTS FAILED ***

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 10, 2020

@srowen agree with u, we can make a 'checkpoint' if 4018b1f no problem, and I will update the PR description.

I found that It is difficult to fix the cases caused by WrappedArray.

For example, WrappedArray.make method return WrappedArray in Scala 2.12 and return ArraySeq in Scala 2.13, then in Scala 2.13 Row.getSeq method can not convert ArraySeq to Seq type, the failed cases in RowEncoderSuite related to this problem.

@LuciferYang LuciferYang changed the title [SPARK-32526][SQL]Pass DataSourceV2AnalysisBaseSuite and TreeNodeSuite related test cases in scala 2.13 [SPARK-32526][SQL]Fix some test cases of sql/catalyst module in scala 2.13 Aug 10, 2020
@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127260 has finished for PR 29370 at commit 4018b1f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 10, 2020

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Aug 10, 2020

Test build #127283 has finished for PR 29370 at commit 4018b1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 11, 2020

If there aren't any more fixes that are similar here, I can merge this. We can add more now if there are others that are easily fixable in SQL.

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127356 has started for PR 29370 at commit db90b1a.

@LuciferYang
Copy link
Contributor Author

@srowen There are still some fixes left to commit and will be done today

@LuciferYang
Copy link
Contributor Author

The main change of Address db90b1a as follow:

  • Call toSeq in AstBuilder#visitFunctionCall, otherwise ctx.argument.asScala.map(expression) is Buffer in Scala 2.13

The effect as follow:

  • Pass ExpressionParserSuite

Before

Tests: succeeded 4035, failed 21, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 21 TESTS FAILED ***

After

Tests: succeeded 4035, failed 20, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 20 TESTS FAILED ***

@LuciferYang
Copy link
Contributor Author

The main change of Address fca4fa7 as follow:

  • Add a LongType match to ArraySetLike.nullValueHolder to resolve problem as follow:
java.lang.Integer cannot be cast to java.lang.Long
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:103)
	at scala.collection.mutable.ArrayBuilder$ofLong.addOne(ArrayBuilder.scala:321)
	at scala.collection.mutable.Growable.$plus$eq(Growable.scala:38)
	at scala.collection.mutable.Growable.$plus$eq$(Growable.scala:38)
	at scala.collection.mutable.ArrayBuilder.$plus$eq(ArrayBuilder.scala:23)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.ArrayUnion_0$(Unknown Source)

The effect as follow:

  • Pass CollectionExpressionsSuite

Before

Tests: succeeded 4035, failed 20, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 20 TESTS FAILED ***

After

Tests: succeeded 4035, failed 17, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 17 TESTS FAILED ***

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127360 has started for PR 29370 at commit fca4fa7.

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127364 has started for PR 29370 at commit 5328d33.

@LuciferYang
Copy link
Contributor Author

LuciferYang commented Aug 12, 2020

The main change of Address 5328d33 is add a sort

  • Add a sorted to ensure duplicateColumns string in SchemaUtils#checkColumnNameDuplication method error message have a deterministic order because IterableOps.groupBy in Scala 2.13 and TraversableLike.groupBy in Scala 2.12 have diffent result order

The effect as follow:

  • Pass SchemaUtilsSuite

Before

Tests: succeeded 4035, failed 17, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 17 TESTS FAILED ***

After

Tests: succeeded 4035, failed 17, canceled 0, ignored 6, pending 0
*** 1 SUITE ABORTED ***
*** 15 TESTS FAILED ***

@LuciferYang
Copy link
Contributor Author

@srowen After 5328d33, I think that there are no more similar problems that can be fixed.

The reasons for the remaining failure cases can be divided into two categories:

  1. Need to deal with the pattern matching problem of 'ArraySeq' and 'ArrayBuffer' except 'StarJoinCostBasedReorderSuite`

  2. `StarJoinCostBasedReorderSuite' a little special, the plan that caused the failure maybe also correct because it has the same cost as the expected plan

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127367 has finished for PR 29370 at commit 75d85ee.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 12, 2020

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127385 has finished for PR 29370 at commit 75d85ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 6ae2cb2 Aug 13, 2020
@srowen
Copy link
Member

srowen commented Aug 13, 2020

Merged to master. I didn't resolve the JIRA yet because there are a few more tests to fix.
Good work! let's keep going.

@HyukjinKwon
Copy link
Member

Nice to get this being fixed!

@LuciferYang
Copy link
Contributor Author

@srowen @HyukjinKwon I will try to give a new pr to resolve rest problems

@LuciferYang LuciferYang deleted the fix-DataSourceV2AnalysisBaseSuite branch June 6, 2022 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants