Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45891][SQL][FOLLOW-UP] Added length check to the is_variant_null expression #46311

Closed
wants to merge 26 commits into from

Conversation

harshmotw-db
Copy link
Contributor

What changes were proposed in this pull request?

Added a check in the is_variant_null expression where the length of the value field is verified to be greater than zero. If the length is zero, a MALFORMED_VARIANT exception is thrown.

Why are the changes needed?

Earlier, is_variant_null was simply checking if the first byte of a variant value was zero. However, if the value field is empty, the first byte logically doesn't exist and therefore, it could result in undefined behavior. Such a case should ideally never be seen but it could appear in the case of data corruption.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Additional unit test to check if the zero-length variant throws an exception.

Was this patch authored or co-authored using generative AI tooling?

No

@harshmotw-db harshmotw-db marked this pull request as ready for review April 30, 2024 22:31
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-45891][FOLLOW-UP] Added length check to the is_variant_null expression [SPARK-45891][SQL][FOLLOW-UP] Added length check to the is_variant_null expression May 1, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @harshmotw-db .

@harshmotw-db
Copy link
Contributor Author

Thanks for the reviews @cloud-fan @HyukjinKwon @dongjoon-hyun. The tests are completed and it is ready to be merged now.

@dongjoon-hyun
Copy link
Member

Yep. Merged to master for Apache Spark 4.0.0. Thank you, @harshmotw-db .

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…ll expression

### What changes were proposed in this pull request?

Added a check in the `is_variant_null` expression where the length of the value field is verified to be greater than zero. If the length is zero, a `MALFORMED_VARIANT` exception is thrown.

### Why are the changes needed?

Earlier, `is_variant_null` was simply checking if the first byte of a variant value was zero. However, if the value field is empty, the first byte logically doesn't exist and therefore, it could result in undefined behavior. Such a case should ideally never be seen but it could appear in the case of data corruption.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Additional unit test to check if the zero-length variant throws an exception.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46311 from harshmotw-db/is_variant_null_fix.

Authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants