Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48060][SS][TESTS] Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly #46304

Closed
wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Apr 30, 2024

What changes were proposed in this pull request?

This PR aims to fix StreamingQueryHashPartitionVerifySuite to update golden files correctly.

  • The documentation is added.
  • Newly generated files are updated.

Why are the changes needed?

Previously, SPARK_GENERATE_GOLDEN_FILES doesn't work as expected because it updates the files under target directory. We need to update src/test files.

BEFORE

$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite"

$ git status
On branch master
Your branch is up to date with 'apache/master'.

nothing to commit, working tree clean

AFTER

$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \
-Dspark.sql.test.randomDataGenerator.maxStrLen=100 \
-Dspark.sql.test.randomDataGenerator.maxArraySize=4

$ git status
On branch SPARK-48060
Your branch is up to date with 'dongjoon/SPARK-48060'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas
	modified:   sql/core/src/test/resources/structured-streaming/partition-tests/rowsAndPartIds

no changes added to commit (use "git add" and/or "git commit -a")

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs. I regenerate the data like the following.

$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \
-Dspark.sql.test.randomDataGenerator.maxStrLen=100 \
-Dspark.sql.test.randomDataGenerator.maxArraySize=4

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48060][SS][TESTS] Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly [SPARK-48060][SS][TESTS] Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly Apr 30, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as draft April 30, 2024 16:51
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review April 30, 2024 18:00
@dongjoon-hyun
Copy link
Member Author

Could you review this test PR about generating golden files, too, @viirya ?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@dongjoon-hyun
Copy link
Member Author

Thank you, @viirya !
Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-48060 branch April 30, 2024 19:52
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
… to update golden files correctly

### What changes were proposed in this pull request?

This PR aims to fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly.
- The documentation is added.
- Newly generated files are updated.

### Why are the changes needed?

Previously, `SPARK_GENERATE_GOLDEN_FILES` doesn't work as expected because it updates the files under `target` directory. We need to update `src/test` files.

**BEFORE**
```
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite"

$ git status
On branch master
Your branch is up to date with 'apache/master'.

nothing to commit, working tree clean
```

**AFTER**
```
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \
-Dspark.sql.test.randomDataGenerator.maxStrLen=100 \
-Dspark.sql.test.randomDataGenerator.maxArraySize=4

$ git status
On branch SPARK-48060
Your branch is up to date with 'dongjoon/SPARK-48060'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas
	modified:   sql/core/src/test/resources/structured-streaming/partition-tests/rowsAndPartIds

no changes added to commit (use "git add" and/or "git commit -a")
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. I regenerate the data like the following.

```
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \
-Dspark.sql.test.randomDataGenerator.maxStrLen=100 \
-Dspark.sql.test.randomDataGenerator.maxArraySize=4
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46304 from dongjoon-hyun/SPARK-48060.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants