
refine UT framework to promote GPU evaluation, Part 1 #10851

Closed

Conversation

@binmahone (Collaborator) commented May 21, 2024

contributes to #10850

Changes in this PR:

  1. Changed the default way to evaluate an expression, taking into consideration that:
     • bound references are not well supported
     • many of RAPIDS' expressions do not accept vectorized parameters
     (see the sketch after this list)
  2. Changed the default timezone to UTC to promote more expressions being evaluated on the GPU.
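A minimal sketch, assuming a local-mode session and the public Catalyst API, of the literal-based evaluation style item 1 describes: inputs are inlined as Literal children instead of BoundReferences, and the expression runs through a one-row projection so the plugin can replace it with its GPU version. The Add example and helper object are illustrative, not the PR's actual framework code.

  import org.apache.spark.sql.{Column, SparkSession}
  import org.apache.spark.sql.catalyst.expressions.{Add, Literal}

  object LiteralEvalSketch {
    def main(args: Array[String]): Unit = {
      // The RAPIDS plugin jar must be on the classpath for GPU replacement.
      val spark = SparkSession.builder()
        .master("local[1]")
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        .getOrCreate()

      // Inputs are inlined as scalars: no BoundReference to resolve and no
      // vectorized parameters for the expression to reject.
      val expr = Add(Literal(1), Literal(2))
      val result = spark.range(1).select(new Column(expr)).head().get(0)
      println(result) // 3
    }
  }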

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author):

Hi @jlowe, can you please review this PR?

@@ -1485,6 +1485,13 @@ val GPU_COREDUMP_PIPE_PATTERN = conf("spark.rapids.gpu.coreDump.pipePattern")
.stringConf
.createWithDefault(false.toString)

val FOLDABLE_NON_LIT_ALLOWED = conf("spark.rapids.sql.test.isFoldableNonLitAllowed")
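The hunk above is truncated by the diff view. A hypothetical reconstruction of the full definition, assuming the conf-builder methods used elsewhere in RapidsConf; the doc text and default are assumptions, not the repo's exact code:

  // Hypothetical reconstruction of the truncated conf definition above;
  // the doc string and default value are assumptions for illustration.
  val FOLDABLE_NON_LIT_ALLOWED = conf("spark.rapids.sql.test.isFoldableNonLitAllowed")
    .doc("Only to be used in tests. When true, foldable expressions that " +
      "are not literals are allowed on the GPU.")
    .internal()
    .booleanConf
    .createWithDefault(false)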
Collaborator:

I'm not sure we need this at all. The foldable check was put in place a long time ago, when each expression had to have explicit support for scalar values. We refactored the code a while ago and removed that requirement. The fact that it works for testing means the refactor was correct, so I think we can just delete the check entirely.

@binmahone (Collaborator, Author):

Should I do it in this PR or in a separate one?

Collaborator:

I am fine with doing it later, as this is an internal config and it might be a large-ish change. But at the same time I don't want to forget about it.

@binmahone (Collaborator, Author):

This issue will be tracked in #10887.

@jlowe linked an issue May 21, 2024 that may be closed by this pull request.
@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
"org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
.set("spark.sql.warehouse.dir", warehouse)
.set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
.set("spark.sql.session.timeZone", "UTC")
.set("spark.rapids.sql.explain", "ALL")
Member:

This will make the tests very verbose. Intentional, or debug code accidentally left in?

@binmahone (Collaborator, Author):

Both are intentional.

  • For "spark.sql.session.timeZone": we found that many expressions won't take the GPU path unless the timezone is set to UTC. Since we want as many test cases as possible to run on the GPU, we'd better use the UTC timezone.

  • For "spark.rapids.sql.explain": since tests/src/test/resources/log4j2.properties already sets the console appender's log level to error by default, we won't see verbose logs even though spark.rapids.sql.explain is set to ALL here. The benefit of setting it to ALL is that when we want to check and debug test cases, we only need to change log4j2.properties, instead of changing both log4j2.properties and adding .set("spark.rapids.sql.explain", "ALL") in code. (A sketch of the relevant logging threshold follows below.)
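For illustration, a minimal log4j2.properties sketch of the kind of console threshold the comment refers to. This is an assumption about the file's shape, not a copy of the repo's actual tests/src/test/resources/log4j2.properties:

  # Console appender; the root logger level gates what reaches the terminal,
  # so explain output stays hidden even with spark.rapids.sql.explain=ALL.
  appender.console.type = Console
  appender.console.name = console
  appender.console.target = SYSTEM_ERR
  appender.console.layout.type = PatternLayout
  appender.console.layout.pattern = %d{HH:mm:ss} %p %c: %m%n

  # Flip this to "info" when debugging test cases to see the GPU plan output.
  rootLogger.level = error
  rootLogger.appenderRef.console.ref = console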

Member:

> Since we want to test as many test cases as possible in GPU, we'd better use UTC timezone

I'm not questioning that we need to control the timezone for these tests; the discussion is about how to control it. Normally we don't want hardcoding like this, because then the tests cannot check whether we're doing the right thing in any timezone other than UTC, and as I said, we do have some timezone support with more on the way. Hardcoding this precludes using these tests to verify we're doing the right thing in any other timezone. Having the test environment setup script control it makes the tests reusable.

@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
"org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
.set("spark.sql.warehouse.dir", warehouse)
.set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
.set("spark.sql.session.timeZone", "UTC")
Member:

Do we really want to force UTC in the tests? We have partial support for timezones, and it's likely more will be added later. Typically what we've done in the past is leave it up to the environment setup to specify the timezone to be tested (e.g., CI scripts set the TZ environment variable or other JVM settings), rather than have the tests smash the timezone directly. That way the tests can be run across multiple timezones by changing the environment before running them. (A sketch of this environment-driven approach follows below.)

This comment applies to a couple of other places below where the timezone is being smashed to UTC.
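A minimal sketch of the environment-driven alternative; the helper object and function name are hypothetical, not part of the PR:

  import org.apache.spark.SparkConf

  object TimeZoneFromEnv {
    // Hypothetical helper: derive the session timezone from the environment
    // (TZ is the conventional POSIX variable; user.timezone is the standard
    // JVM property), falling back to UTC only when neither is set.
    def withSessionTimeZone(conf: SparkConf): SparkConf = {
      val tz = sys.env.getOrElse("TZ", System.getProperty("user.timezone", "UTC"))
      conf.set("spark.sql.session.timeZone", tz)
    }
  }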

@binmahone (Collaborator, Author):

The consideration for spark.sql.session.timeZone is explained in the previous comment.

@binmahone (Collaborator, Author):

@GaryShen2008 told me that they're already suggesting our customers set the timezone to UTC to facilitate GPU execution, so at least we're not testing a fictitious scenario.

Collaborator:

We should enable non-UTC test cases later, but as a first step, setting UTC here lets us test in a fixed configuration that we support today. It helps us detect failures in the UTC case. Without this UTC setting, test cases may pass only because of fallback to the CPU, and currently we haven't been able to detect wrong fallbacks in the UTs.

Member:

I'm not sure why we're not setting UTC in the environment of the test setup, which would let us reuse these tests. If we're going to leave this as-is, there needs to be a TODO comment linking to a tracking issue to fix it.

@binmahone (Collaborator, Author) commented May 23, 2024:

I opened a TODO issue for this, #10874, and will reference it in a code comment.

@sameerz added the "test (Only impacts tests)" label May 21, 2024.
class RapidsJsonExpressionsSuite extends JsonExpressionsSuite with RapidsTestsTrait {
override def beforeAll(): Unit = {
super.beforeAll()
SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
Collaborator:

Here and elsewhere: use a compile-time constant.

Suggested change:
-  SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
+  SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", true.toString)

@binmahone (Collaborator, Author):

Hi @gerashegalov, I will change it accordingly. However, it doesn't seem to be generally enforced; one anti-pattern example is at https://github.com/NVIDIA/spark-rapids/blob/branch-24.06/tests/src/test/scala/com/nvidia/spark/rapids/ApproximatePercentileSuite.scala#L108

Collaborator:

Fair; we can add an issue to enforce it via scalastyle.

Comment on lines 35 to 41
override def afterAll(): Unit = {
super.afterAll()
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
}
Collaborator:

Consider extracting this into a base trait and reusing the conf handling across unit tests:

override def afterAll(): Unit = {
super.afterAll()
TrampolineUtil.cleanupAnyExistingSession()
}

@binmahone (Collaborator, Author):

Will do.

@binmahone (Collaborator, Author): build

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone (Collaborator, Author): build

@binmahone changed the title from "refine UT framework to promote GPU evaluation" to "refine UT framework to promote GPU evaluation, Part 1" on May 22, 2024.
@jlowe (Member) left a review:

Looks OK to me, but we should get feedback from @revans2 and @gerashegalov as well.


Comment on lines +35 to +41
override def afterAll(): Unit = {
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
super.afterAll()
}
Collaborator:

This is not how I thought my previous comment would be addressed. Shouldn't beforeAll and afterAll largely be the same implementation as in other unit tests based off SparkQueryCompareTestSuite, especially the cleanup afterwards?

  override def afterAll(): Unit = {
    super.afterAll()
    TrampolineUtil.cleanupAnyExistingSession()
  }

And we should just make sure that we reuse it for new UTs.
@binmahone (Collaborator, Author) commented May 24, 2024:

RapidsJsonConfTrait is an extra trait that injects more JSON-related configs, and it will call the beforeAll/afterAll methods of its parent traits (because of trait linearization). A minimal sketch of that layering follows below.
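A minimal sketch of the layering described above, assuming ScalaTest's BeforeAndAfterAll; the trait body is illustrative, not the PR's exact code:

  import org.apache.spark.sql.internal.SQLConf
  import org.scalatest.{BeforeAndAfterAll, Suite}

  // The mixed-in trait layers its conf handling around the base traits':
  // trait linearization makes super.beforeAll() run the base setup first,
  // and super.afterAll() run the base teardown last.
  trait RapidsJsonConfTrait extends BeforeAndAfterAll { this: Suite =>
    override def beforeAll(): Unit = {
      super.beforeAll() // base-trait setup runs first
      SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", true.toString)
    }

    override def afterAll(): Unit = {
      SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
      super.afterAll() // then the base-trait teardown
    }
  }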

Collaborator:

I am talking about the fact that we already have code cleaning out the SparkSession via TrampolineUtil.cleanupAnyExistingSession(). Instead of adding special logic for this one test, we could reuse the more general cleanup code.

@binmahone (Collaborator, Author):

As discussed with Gera offline, we'll use #10886 to track this.

@gerashegalov (Collaborator) left a review:

LGTM. The comment about conf cleanup between tests is optional.

@binmahone (Collaborator, Author):

All of the comments on this PR have now been addressed. I'll close this PR and turn to #10861, as it is a superset of this PR.
