Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData #37807

Closed
wants to merge 1 commit into from

Conversation

c27kwan
Copy link
Contributor

@c27kwan c27kwan commented Sep 6, 2022

What changes were proposed in this pull request?

There is no explicit hashCode() function override for ArrayBasedMapData. As a result, there is a non-deterministic error where the hashCode() computed for Literals of ArrayBasedMapData can be different for two equal objects (Literals of ArrayBasedMapData with equal keys and values).

In this PR, we add a hashCode function so that it works exactly as we expect.

Why are the changes needed?

This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the hashCode method instead of relying on defaults. We can't add the hashCode directly to ArrayBasedMapData because of SPARK-9415.

Does this PR introduce any user-facing change?

No

How was this patch tested?

A simple unit test was added.

@@ -369,6 +369,9 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
val valueHashCode = value match {
case null => 0
case binary: Array[Byte] => util.Arrays.hashCode(binary)
// SPARK-40315: Literals of ArrayBasedMapData should have deterministic hashCode.
case arrayBasedMapData: ArrayBasedMapData =>
arrayBasedMapData.keyArray.hashCode() * 37 + arrayBasedMapData.valueArray.hashCode()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply re-adding the old hashCode function that was removed in: #13847

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I didn't realize that we already handled ArrayBasedMapData in Literal.equals

@cloud-fan cloud-fan closed this in e85a4ff Sep 6, 2022
cloud-fan pushed a commit that referenced this pull request Sep 6, 2022
### What changes were proposed in this pull request?
There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A simple unit test was added.

Closes #37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e85a4ff)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Sep 6, 2022
### What changes were proposed in this pull request?
There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A simple unit test was added.

Closes #37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e85a4ff)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Sep 6, 2022
### What changes were proposed in this pull request?
There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A simple unit test was added.

Closes #37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e85a4ff)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

thanks, merging to master/3.3/3.2/3.1!

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
### What changes were proposed in this pull request?
There is no explicit `hashCode()` function override for `ArrayBasedMapData`. As a result, there is a non-deterministic error where the `hashCode()` computed for `Literal`s of `ArrayBasedMapData` can be different for two equal objects (`Literal`s of `ArrayBasedMapData` with equal keys and values).

In this PR, we add a `hashCode` function so that it works exactly as we expect.

### Why are the changes needed?
This is a bug fix for a non-deterministic error. It is also more consistent with the rest of Spark if we implement the `hashCode` method instead of relying on defaults. We can't add the `hashCode` directly to `ArrayBasedMapData` because of SPARK-9415.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
A simple unit test was added.

Closes apache#37807 from c27kwan/SPARK-40315-lit.

Authored-by: Carmen Kwan <carmen.kwan@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e85a4ff)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit c7cc0ae)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants