
[SPARK-45827][SQL][FOLLOWUP] Fix variant parquet reader. #43825

Closed


chenhao-db
Contributor

What changes were proposed in this pull request?

This is a follow-up of #43707. The previous PR missed a piece of the variant Parquet reader. The reader treats the variant type as struct<value binary, metadata binary>, so it also needs an assembleStruct step, like the one used for structs. That step correctly derives the nullness of variant values from the definition/repetition (def/rep) levels.
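As a rough illustration of why the struct-level assembly step matters, here is a minimal sketch. It is not Spark's `ParquetColumnVector.assembleStruct`. It assumes the variant column is stored as `optional group v { required binary value; required binary metadata; }`, so the group and its leaves share definition level 1. A row whose leaf definition level is 0 must become a null variant. Without the struct step, it would wrongly become a non-null variant with null fields. The names `VariantAssemblySketch`, `VariantValue`, `VariantDefLevel`, and `assemble` are all hypothetical.

```scala
object VariantAssemblySketch {
  // Hypothetical model of one decoded variant row (Seq[Byte] gives structural equality).
  final case class VariantValue(value: Seq[Byte], metadata: Seq[Byte])

  // Assumed schema: optional group v { required binary value; required binary metadata; }
  // The optional group is present at definition level 1; the required leaves add no levels.
  val VariantDefLevel = 1

  // Parquet stores only defined leaf values, so they are consumed in order, and only
  // for rows whose definition level reaches the group's level. Lower levels mean the
  // whole variant (the struct) is null, not just its fields.
  def assemble(
      defLevels: Seq[Int],
      values: Seq[Seq[Byte]],
      metadata: Seq[Seq[Byte]]): Seq[Option[VariantValue]] = {
    val leaves = values.iterator.zip(metadata.iterator)
    defLevels.toVector.map { d =>
      if (d < VariantDefLevel) None
      else {
        val (v, m) = leaves.next()
        Some(VariantValue(v, m))
      }
    }
  }
}
```

For example, def levels `1, 0, 1` with two stored leaf values yield a non-null variant, a null variant, and another non-null variant. An all-non-null input would never exercise the `None` branch.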

How was this patch tested?

Extended the existing unit test; it fails without this change.

@github-actions github-actions bot added the SQL label Nov 15, 2023
@chenhao-db
Contributor Author

@cloud-fan @HyukjinKwon could you help take a look? Thanks!

@@ -73,5 +73,12 @@ class VariantSuite extends QueryTest with SharedSparkSession {
values.map(v => if (v == null) "null" else v.debugString()).sorted
}
assert(prepareAnswer(input) == prepareAnswer(result))

withTempDir { dir =>
Contributor


The basic test case also tests Parquet write and read. Why didn't it expose the bug?

Contributor Author


Because the variant values it writes are all non-null. The bug only shows up when there is a null variant value.

@HyukjinKwon HyukjinKwon changed the title [SPARK-45827] Fix variant parquet reader. [SPARK-45827][SQL[ Fix variant parquet reader. Nov 16, 2023
@HyukjinKwon HyukjinKwon changed the title [SPARK-45827][SQL[ Fix variant parquet reader. [SPARK-45827][SQL] Fix variant parquet reader. Nov 16, 2023
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f7d56e2 Nov 16, 2023
@cloud-fan cloud-fan changed the title [SPARK-45827][SQL] Fix variant parquet reader. [SPARK-45827][SQL][FOLLOWUP] Fix variant parquet reader. Nov 16, 2023