
refine UT framework to promote GPU evaluation, Part 1 #10851

Closed

Conversation

@binmahone (Collaborator) commented May 21, 2024

contributes to #10850

Changes in this PR:

  1. Changed the default way to evaluate an expression, taking into consideration that:
     • bound references are not well supported
     • many of RAPIDS' expressions do not accept vectorized parameters
     (see the sketch after this list)
  2. Changed the default timezone to UTC to promote more expressions being evaluated on the GPU.
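A minimal sketch, assuming a local-mode session and the public Catalyst API, of the literal-based evaluation style item 1 describes: inputs are inlined as Literal children instead of BoundReferences, and the expression runs through a one-row projection so the plugin can replace it with its GPU version. The Add example and helper object are illustrative, not the PR's actual framework code.

  import org.apache.spark.sql.{Column, SparkSession}
  import org.apache.spark.sql.catalyst.expressions.{Add, Literal}

  object LiteralEvalSketch {
    def main(args: Array[String]): Unit = {
      // The RAPIDS plugin jar must be on the classpath for GPU replacement.
      val spark = SparkSession.builder()
        .master("local[1]")
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        .getOrCreate()

      // Inputs are inlined as scalars: no BoundReference to resolve and no
      // vectorized parameters for the expression to reject.
      val expr = Add(Literal(1), Literal(2))
      val result = spark.range(1).select(new Column(expr)).head().get(0)
      println(result) // 3
    }
  }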

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author): build

@binmahone (Collaborator, Author):

Hi @jlowe, can you please review this PR?

@@ -1485,6 +1485,13 @@ val GPU_COREDUMP_PIPE_PATTERN = conf("spark.rapids.gpu.coreDump.pipePattern")
.stringConf
.createWithDefault(false.toString)

val FOLDABLE_NON_LIT_ALLOWED = conf("spark.rapids.sql.test.isFoldableNonLitAllowed")
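The hunk above is truncated by the diff view. A hypothetical reconstruction of the full definition, assuming the conf-builder methods used elsewhere in RapidsConf; the doc text and default are assumptions, not the repo's exact code:

  // Hypothetical reconstruction of the truncated conf definition above;
  // the doc string and default value are assumptions for illustration.
  val FOLDABLE_NON_LIT_ALLOWED = conf("spark.rapids.sql.test.isFoldableNonLitAllowed")
    .doc("Only to be used in tests. When true, foldable expressions that " +
      "are not literals are allowed on the GPU.")
    .internal()
    .booleanConf
    .createWithDefault(false)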
Collaborator:

I'm not sure we need this at all. The foldable check was put in place a long time ago, when each expression had to have explicit support for scalar values. We refactored the code a while ago and removed that requirement. The fact that it works for testing means the refactor was correct, so I think we can just delete the check entirely.

@binmahone (Collaborator, Author):

Should I do it in this PR or in a separate one?

Collaborator:

I am fine with doing it later, as this is an internal config and it might be a large-ish change. But at the same time I don't want to forget about it.

@binmahone (Collaborator, Author):

This issue will be tracked in #10887.

@jlowe linked an issue May 21, 2024 that may be closed by this pull request.
@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
"org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
.set("spark.sql.warehouse.dir", warehouse)
.set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
.set("spark.sql.session.timeZone", "UTC")
.set("spark.rapids.sql.explain", "ALL")
Member:

This will make the tests very verbose. Intentional, or debug code accidentally left in?

@binmahone (Collaborator, Author):

Both are intentional.

  • For "spark.sql.session.timeZone": we found that many expressions won't take the GPU path unless the timezone is set to UTC. Since we want as many test cases as possible to run on the GPU, we'd better use the UTC timezone.

  • For "spark.rapids.sql.explain": since tests/src/test/resources/log4j2.properties already sets the console appender's log level to error by default, we won't see verbose logs even though spark.rapids.sql.explain is set to ALL here. The benefit of setting it to ALL is that when we want to check and debug test cases, we only need to change log4j2.properties, instead of changing both log4j2.properties and adding .set("spark.rapids.sql.explain", "ALL") in code. (A sketch of the relevant logging threshold follows below.)
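For illustration, a minimal log4j2.properties sketch of the kind of console threshold the comment refers to. This is an assumption about the file's shape, not a copy of the repo's actual tests/src/test/resources/log4j2.properties:

  # Console appender; the root logger level gates what reaches the terminal,
  # so explain output stays hidden even with spark.rapids.sql.explain=ALL.
  appender.console.type = Console
  appender.console.name = console
  appender.console.target = SYSTEM_ERR
  appender.console.layout.type = PatternLayout
  appender.console.layout.pattern = %d{HH:mm:ss} %p %c: %m%n

  # Flip this to "info" when debugging test cases to see the GPU plan output.
  rootLogger.level = error
  rootLogger.appenderRef.console.ref = console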

Member:

> Since we want to test as many test cases as possible in GPU, we'd better use UTC timezone

I'm not questioning that we need to control the timezone for these tests; the discussion is about how to control it. Normally we don't want hardcoding like this, because then the tests cannot check whether we're doing the right thing in any timezone other than UTC, and as I said, we do have some timezone support with more on the way. Hardcoding this precludes using these tests to verify we're doing the right thing in any other timezone. Having the test environment setup script control it makes the tests reusable.

@@ -121,6 +121,12 @@ object RapidsSQLTestsBaseTrait {
"org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback")
.set("spark.sql.warehouse.dir", warehouse)
.set("spark.sql.cache.serializer", "com.nvidia.spark.ParquetCachedBatchSerializer")
.set("spark.sql.session.timeZone", "UTC")
Member:

Do we really want to force UTC in the tests? We have partial support for timezones, and it's likely more will be added later. Typically what we've done in the past is leave it up to the environment setup to specify the timezone to be tested (e.g., CI scripts set the TZ environment variable or other JVM settings), rather than have the tests smash the timezone directly. That way the tests can be run across multiple timezones by changing the environment before running them. (A sketch of this environment-driven approach follows below.)

This comment applies to a couple of other places below where the timezone is being smashed to UTC.
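A minimal sketch of the environment-driven alternative; the helper object and function name are hypothetical, not part of the PR:

  import org.apache.spark.SparkConf

  object TimeZoneFromEnv {
    // Hypothetical helper: derive the session timezone from the environment
    // (TZ is the conventional POSIX variable; user.timezone is the standard
    // JVM property), falling back to UTC only when neither is set.
    def withSessionTimeZone(conf: SparkConf): SparkConf = {
      val tz = sys.env.getOrElse("TZ", System.getProperty("user.timezone", "UTC"))
      conf.set("spark.sql.session.timeZone", tz)
    }
  }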

@binmahone (Collaborator, Author):

The consideration for spark.sql.session.timeZone is explained in the previous comment.

@binmahone (Collaborator, Author):

@GaryShen2008 told me that they're already suggesting our customers set the timezone to UTC to facilitate GPU execution, so at least we're not testing a fictitious scenario.

Collaborator:

We should enable non-UTC test cases later, but as a first step, setting UTC here lets us test in a fixed configuration that we support today. It helps us detect failures in the UTC case. Without this UTC setting, test cases may pass only because of fallback to the CPU, and currently we haven't been able to detect wrong fallbacks in the UTs.

Member:

I'm not sure why we're not setting UTC in the environment of the test setup, which would let us reuse these tests. If we're going to leave this as-is, there needs to be a TODO comment linking to a tracking issue to fix it.

@binmahone (Collaborator, Author) commented May 23, 2024:

I opened a TODO issue for this, #10874, and will reference it in a code comment.

@sameerz added the "test (Only impacts tests)" label May 21, 2024.
class RapidsJsonExpressionsSuite extends JsonExpressionsSuite with RapidsTestsTrait {
override def beforeAll(): Unit = {
super.beforeAll()
SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
Collaborator:

Here and elsewhere: use a compile-time constant.

Suggested change:
-  SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", "true")
+  SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", true.toString)

@binmahone (Collaborator, Author):

Hi @gerashegalov, I will change it accordingly. However, it doesn't seem to be generally enforced; one anti-pattern example is at https://github.com/NVIDIA/spark-rapids/blob/branch-24.06/tests/src/test/scala/com/nvidia/spark/rapids/ApproximatePercentileSuite.scala#L108

Collaborator:

Fair; we can add an issue to enforce it via scalastyle.

Comment on lines 35 to 41
override def afterAll(): Unit = {
super.afterAll()
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
}
Collaborator:

Consider extracting this into a base trait and reusing the conf handling across unit tests:

override def afterAll(): Unit = {
super.afterAll()
TrampolineUtil.cleanupAnyExistingSession()
}

@binmahone (Collaborator, Author):

Will do.

@binmahone (Collaborator, Author): build

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone (Collaborator, Author): build

@binmahone changed the title from "refine UT framework to promote GPU evaluation" to "refine UT framework to promote GPU evaluation, Part 1" on May 22, 2024.
@jlowe (Member) left a review:

Looks OK to me, but we should get feedback from @revans2 and @gerashegalov as well.


Comment on lines +35 to +41
override def afterAll(): Unit = {
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
SQLConf.get.unsetConf("spark.rapids.sql.expression.GetJsonObject")
SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonToStructs")
SQLConf.get.unsetConf("spark.rapids.sql.expression.StructsToJson")
super.afterAll()
}
Collaborator:

This is not how I thought my previous comment would be addressed. Shouldn't beforeAll and afterAll largely be the same implementation as in other unit tests based off SparkQueryCompareTestSuite, especially the cleanup afterwards?

  override def afterAll(): Unit = {
    super.afterAll()
    TrampolineUtil.cleanupAnyExistingSession()
  }

And we should just make sure that we reuse it for new UTs.
@binmahone (Collaborator, Author) commented May 24, 2024:

RapidsJsonConfTrait is an extra trait that injects more JSON-related configs, and it will call the beforeAll/afterAll methods of its parent traits (because of trait linearization). A minimal sketch of that layering follows below.
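A minimal sketch of the layering described above, assuming ScalaTest's BeforeAndAfterAll; the trait body is illustrative, not the PR's exact code:

  import org.apache.spark.sql.internal.SQLConf
  import org.scalatest.{BeforeAndAfterAll, Suite}

  // The mixed-in trait layers its conf handling around the base traits':
  // trait linearization makes super.beforeAll() run the base setup first,
  // and super.afterAll() run the base teardown last.
  trait RapidsJsonConfTrait extends BeforeAndAfterAll { this: Suite =>
    override def beforeAll(): Unit = {
      super.beforeAll() // base-trait setup runs first
      SQLConf.get.setConfString("spark.rapids.sql.expression.JsonTuple", true.toString)
    }

    override def afterAll(): Unit = {
      SQLConf.get.unsetConf("spark.rapids.sql.expression.JsonTuple")
      super.afterAll() // then the base-trait teardown
    }
  }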

Collaborator:

I am talking about the fact that we already have code cleaning out the SparkSession via TrampolineUtil.cleanupAnyExistingSession(). Instead of adding special logic for this one test, we could reuse the more general cleanup code.

@binmahone (Collaborator, Author):

As discussed with Gera offline, we'll use #10886 to track this.

@gerashegalov (Collaborator) left a review:

LGTM. The comment about conf cleanup between tests is optional.

@binmahone (Collaborator, Author):

All of the comments on this PR have now been addressed. I'll close this PR and turn to #10861, as it is a superset of this PR.
