
[SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite #41572

Closed
wants to merge 8 commits

Conversation

panbingkun
Contributor

@panbingkun panbingkun commented Jun 13, 2023

What changes were proposed in this pull request?

This PR aims to improve PlanGenerationTestSuite and ProtoToParsedPlanTestSuite. It includes:

  • When generating golden files, first delete the corresponding directories and then generate fresh ones, to avoid committing redundant files during the review process (a rough sketch follows the list). For example:
    Suppose we write a test named make_timestamp_ltz for an overloaded method, and during review the reviewer asks for more tests for that method, so the test is renamed in the next commit, e.g. to make_timestamp_ltz without timezone. If the old queries/function_make_timestamp_ltz.json, queries/function_make_timestamp_ltz.proto.bin and explain-results/function_make_timestamp_ltz.explain files are already in the commit, and there are many such files, we generally do not notice the problem, because the stale files have no impact on the UTs. They end up being committed even though they are redundant.

  • Clean up and update redundant files that were previously committed by mistake.
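
For illustration, here is a minimal sketch of the delete-then-regenerate guard described in the first bullet. It is not the PR's exact code: the directory names and the helper are hypothetical, and the real suites derive their paths from the Spark source tree.

import java.io.File
import java.nio.file.Paths

// Illustrative sketch only, not the PR's exact code: before regenerating golden files,
// wipe the target directories so that renamed or removed tests cannot leave stale
// .json / .proto.bin / .explain files behind to be committed by accident.
object RegenerateGoldenFilesSketch {

  // Recursively delete a file or directory tree.
  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).toSeq.flatten.foreach(deleteRecursively)
    f.delete()
  }

  // Called once before the suite runs; hypothetical directory names for illustration.
  def prepare(regenerateGoldenFiles: Boolean, goldenDirs: Seq[String]): Unit = {
    if (regenerateGoldenFiles) {
      goldenDirs.map(d => Paths.get(d).toFile).filter(_.exists()).foreach(deleteRecursively)
    }
  }

  def main(args: Array[String]): Unit = {
    prepare(
      regenerateGoldenFiles = sys.env.contains("SPARK_GENERATE_GOLDEN_FILES"),
      goldenDirs = Seq("target/golden/queries", "target/golden/explain-results"))
  }
}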

Why are the changes needed?

Keep the test suites and their golden-file directories clean.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA.

@@ -103,6 +104,12 @@ class PlanGenerationTestSuite
    val client = SparkConnectClient(InProcessChannelBuilder.forName("/dev/null").build())
    session =
      new SparkSession(client, cleaner = SparkSession.cleaner, planIdGenerator = new AtomicLong)
    if (regenerateGoldenFiles) {
Contributor Author

A simple method is to delete all the corresponding directories when regenerating golden files. That way our logic cannot accidentally commit redundant files.

Contributor

Good catch! Thank you! Personally, I feel deleting all golden files and regenerating them is OK.

@panbingkun
Contributor Author

@@ -103,6 +104,12 @@ class PlanGenerationTestSuite
    val client = SparkConnectClient(InProcessChannelBuilder.forName("/dev/null").build())
    session =
      new SparkSession(client, cleaner = SparkSession.cleaner, planIdGenerator = new AtomicLong)
    if (regenerateGoldenFiles) {
      if (queryFilePath.toFile.exists()) {
Contributor

nit: maybe we can use Utils.deleteRecursively
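
For reference, a sketch of what this suggestion could look like. Utils here is Spark's org.apache.spark.util.Utils, which is package-private to org.apache.spark, so the sketch sits in a spark sub-package; the object and parameter names are illustrative rather than the PR's actual code.

package org.apache.spark.sql

import java.io.File

import org.apache.spark.util.Utils

// Sketch only: Utils.deleteRecursively removes a file or an entire directory tree,
// so the hand-written exists()/delete logic can collapse to a single call.
object GoldenDirCleanupSketch {
  def clean(regenerateGoldenFiles: Boolean, goldenDir: File): Unit = {
    if (regenerateGoldenFiles && goldenDir.exists()) {
      Utils.deleteRecursively(goldenDir)
    }
  }
}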

Contributor Author

Done.

@zhengruifeng
Contributor

cc @hvanhovell

@@ -103,6 +105,12 @@ class PlanGenerationTestSuite
    val client = SparkConnectClient(InProcessChannelBuilder.forName("/dev/null").build())
    session =
      new SparkSession(client, cleaner = SparkSession.cleaner, planIdGenerator = new AtomicLong)
    if (regenerateGoldenFiles) {
      if (queryFilePath.toFile.exists()) {
Contributor

@beliefer beliefer Jun 14, 2023


Please extract queryFilePath.toFile as a variable like val ...Path = queryFilePath.toFile

Contributor Author

Done.
Meanwhile, let's verify #41579.

@@ -90,6 +92,13 @@ class ProtoToParsedPlanTestSuite
        " theid INTEGER, \"Dept\" INTEGER)")
      .executeUpdate()
    conn.commit()

    if (regenerateGoldenFiles) {
      if (goldenFilePath.toFile.exists()) {
Contributor

ditto

Comment on lines 108 to 109
    val queryFile = queryFilePath.toFile
    if (regenerateGoldenFiles) {
Contributor

Suggested change
-    val queryFile = queryFilePath.toFile
-    if (regenerateGoldenFiles) {
+    if (regenerateGoldenFiles) {
+      val queryFile = queryFilePath.toFile

Comment on lines 96 to 97
    val goldenFile = goldenFilePath.toFile
    if (regenerateGoldenFiles) {
Contributor

Suggested change
-    val goldenFile = goldenFilePath.toFile
-    if (regenerateGoldenFiles) {
+    if (regenerateGoldenFiles) {
+      val goldenFile = goldenFilePath.toFile

@beliefer
Contributor

LGTM.

@zhengruifeng
Contributor

SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite -- -z lpad"
...

[info] PlanGenerationTestSuite:
[info] - function lpad (35 milliseconds)
[info] - function lpad binary (1 millisecond)
[info] Run completed in 2 seconds, 58 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 120 s (02:00), completed Jun 15, 2023, 10:42:52 AM

Will regenerating golden files for a single test or a group of tests still be supported after this PR?

@panbingkun
Contributor Author

> Will regenerating golden files for a single test or a group of tests still be supported after this PR?

That is no longer supported. Let me improve it again.

@hvanhovell
Contributor

I have another concern: for testing backwards compatibility it might be useful to keep 'orphaned' protos around. This change would effectively kill that.

@panbingkun
Contributor Author

panbingkun commented Jun 24, 2023

> Will regenerating golden files for a single test or a group of tests still be supported after this PR?

@zhengruifeng
The current logic already supports the above scenario.
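
For context, per-test regeneration can coexist with cleanup in principle by deleting and rewriting only the golden files that belong to the test currently being regenerated, so a filtered run such as the -z lpad invocation above leaves every other test's files alone. The sketch below is a hypothetical illustration of that idea, not the code this PR merges; the file-naming convention and directory are assumptions.

import java.nio.file.{Files, Path, Paths}

// Hypothetical illustration, not the PR's code: regenerate only the golden file that
// belongs to the test currently running, leaving other tests' files untouched.
object PerTestGoldenRegenSketch {
  def regenerate(goldenDir: String, testName: String, newContent: String): Path = {
    val dir = Paths.get(goldenDir)
    Files.createDirectories(dir)
    val goldenFile = dir.resolve(s"$testName.json") // hypothetical naming convention
    Files.deleteIfExists(goldenFile)                // drop only this test's stale output
    Files.write(goldenFile, newContent.getBytes("UTF-8"))
  }

  def main(args: Array[String]): Unit = {
    // Running only the "function_lpad" test rewrites only its own file.
    regenerate("target/golden/queries", "function_lpad", "{\"plan\": \"...\"}")
  }
}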

@panbingkun
Contributor Author

panbingkun commented Jun 24, 2023

> I have another concern: for testing backwards compatibility it might be useful to keep 'orphaned' protos around. This change would effectively kill that.

@hvanhovell

1. Very good reminder. However, the orphaned files deleted here, such as function_percentile.proto.bin, function_regexp_extract_all.proto.bin and function_regexp_instr.proto.bin, are all files that were committed by mistake during the review process.

2. At the same time, I have added additional explanatory notes in the code comments:
https://github.com/apache/spark/pull/41572/files#diff-a2b06ed831053e086e47d191c1e86d26da81c2d2a5cbaedd34a712297a2b418dR70
(screenshot of the added comment omitted)

3. We should provide an automated way to find orphaned files; otherwise, finding deletable files among more than 600 of them can be difficult (a rough sketch of such a check follows). Whether to actually delete them should be weighed between the submitter and the reviewer. Otherwise the number of orphaned files keeps growing, and many of them are only created by mistaken commits. For example, between the first submission of this PR and today, two orphaned files have appeared: function_regexp_extract_all.proto.bin and function_regexp_instr.proto.bin.
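
A rough sketch of what such an automated orphan check might look like; the directory name and the way known test names are collected are assumptions for illustration, not something this PR adds.

import java.io.File

// Hypothetical illustration (not part of this PR): compare the golden files on disk
// against the set of test names the suite currently defines and report the leftovers.
object OrphanGoldenFilesSketch {
  def findOrphans(goldenDir: File, knownTestNames: Set[String]): Seq[File] = {
    Option(goldenDir.listFiles()).toSeq.flatten
      .filter(_.isFile)
      .filterNot { f =>
        // e.g. "function_lpad.proto.bin" -> "function_lpad"
        knownTestNames.contains(f.getName.takeWhile(_ != '.'))
      }
  }

  def main(args: Array[String]): Unit = {
    val known = Set("function_lpad", "function_lpad_binary") // gathered from the suite
    findOrphans(new File("target/golden/queries"), known)
      .foreach(f => println(s"orphan golden file: ${f.getName}"))
  }
}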

@hvanhovell
Contributor

Alright, works for me.

I guess orphaned protos will show up in the git diff now. I guess we can create a separate folder for the ones we want to keep.

Contributor

@hvanhovell hvanhovell left a comment


LGTM

@hvanhovell
Contributor

Merging

Member

@HyukjinKwon HyukjinKwon left a comment


LGTM2
