[SPARK-45078][SQL] Fix array_insert ImplicitCastInputTypes not work #42951

Closed

Conversation

Hisoka-X
Member

What changes were proposed in this pull request?

This PR fixes a case where calling array_insert with an item whose type differs from the array's element type throws an exception, even though some such calls should succeed.
For example:

select array_insert(array(1), 2, cast(2 as tinyint))

The inputTypes that ArrayInsert declares for ImplicitCastInputTypes currently always evaluate to an empty Seq, so Spark cannot implicitly cast tinyint to int.

Why are the changes needed?

To fix incorrect behavior in array_insert.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 16, 2023

@Daniel-Davies
Contributor

Thank you for fixing this @Hisoka-X!

Member

@MaxGekk MaxGekk left a comment

Can't we require that anyone who extends ExpectsInputTypes return a non-empty Seq from inputTypes?

@Hisoka-X
Member Author

Can't we require that anyone who extends ExpectsInputTypes return a non-empty Seq from inputTypes?

Maybe because an empty Seq means "do nothing" in this case?

@MaxGekk
Member

MaxGekk commented Sep 16, 2023

Maybe because an empty Seq means "do nothing" in this case?

If it does nothing, why extend ExpectsInputTypes at all, given that extending it implies some input types are expected?

@Hisoka-X
Member Author

Hisoka-X commented Sep 16, 2023

Maybe because an empty Seq means "do nothing" in this case?

If it does nothing, why extend ExpectsInputTypes at all, given that extending it implies some input types are expected?

Different input types may require different implicit conversions. Whether to perform conversion depends on the type of input and the characteristics of the function. There is no way to know whether conversion is required when the function is defined.

Do you mean we should do something like this? @MaxGekk

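  override def inputTypes: Seq[AbstractDataType] = {
    (srcArrayExpr.dataType, posExpr.dataType, itemExpr.dataType) match {
      // Array source with a non-long integral position: try to unify element and item types.
      case (ArrayType(e1, hasNull), e2: IntegralType, e3) if (e2 != LongType) =>
        TypeCoercion.findTightestCommonType(e1, e3) match {
          case Some(dt) => Seq(ArrayType(dt, hasNull), IntegerType, dt)
          // No common type: expect the item to match the array's element type.
          case _ => Seq(ArrayType(e1), IntegerType, e1)
        }
      // Fallback for every other combination; still a non-empty Seq.
      case (e1, e2, e3) => Seq(e1, e2, e1)
    }
  }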
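
To illustrate the coercion step that the snippet above relies on, here is a minimal, self-contained sketch that does not use Spark. `ToyType`, `tightestCommonType`, and `resolveInputTypes` are hypothetical names made up for this illustration; the real `TypeCoercion.findTightestCommonType` covers many more cases than integral widening.

```scala
// Toy model only: these types and helpers are hypothetical and are not Spark's API.
object ArrayInsertCoercionSketch {
  sealed trait ToyType
  // Integral types carry a byte width, so the wider of two is their common type.
  sealed abstract class ToyIntegral(val width: Int) extends ToyType
  case object ToyByte extends ToyIntegral(1)   // stands in for tinyint
  case object ToyShort extends ToyIntegral(2)  // stands in for smallint
  case object ToyInt extends ToyIntegral(4)    // stands in for int
  case object ToyLong extends ToyIntegral(8)   // stands in for bigint
  case object ToyString extends ToyType
  final case class ToyArray(element: ToyType) extends ToyType

  // Simplified stand-in for a "tightest common type" lookup.
  def tightestCommonType(a: ToyType, b: ToyType): Option[ToyType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (x: ToyIntegral, y: ToyIntegral) => Some(if (x.width >= y.width) x else y)
    case _ => None
  }

  // Expected types for (array, position, item); never an empty Seq.
  def resolveInputTypes(src: ToyType, pos: ToyType, item: ToyType): Seq[ToyType] =
    (src, pos) match {
      case (ToyArray(elem), p: ToyIntegral) if p != ToyLong =>
        tightestCommonType(elem, item) match {
          case Some(dt) => Seq(ToyArray(dt), ToyInt, dt)
          case None => Seq(ToyArray(elem), ToyInt, elem)
        }
      case _ => Seq(src, pos, src)
    }
}
```

In this toy model, `resolveInputTypes(ToyArray(ToyInt), ToyInt, ToyByte)` asks for a `ToyInt` item, which corresponds to widening the `tinyint` in the example query to `int`.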

@MaxGekk
Member

MaxGekk commented Sep 16, 2023

Maybe because an empty Seq means "do nothing" in this case?
Do you mean we should do something like this? @MaxGekk

Yep, eliminate the special meaning of empty Seq.
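
One way to picture the "special meaning" of an empty Seq discussed here: if a coercion step pairs each child with its expected type, an empty list of expected types leaves every child untouched. The following is a minimal sketch under that assumption. `coerce` and the string type names are hypothetical, and this is not Spark's analyzer code.

```scala
// Hypothetical sketch: types are plain strings and `coerce` is not a Spark function.
object EmptyInputTypesSketch {
  // Pair each actual child type with its expected type and cast where they differ.
  // When `expected` is empty, the children are returned unchanged ("do nothing").
  def coerce(actual: Seq[String], expected: Seq[String]): Seq[String] =
    if (expected.isEmpty) actual
    else actual.zip(expected).map { case (have, want) =>
      if (have == want) have else s"cast($have as $want)"
    }
}
```

With an empty `expected`, a `tinyint` item stays a `tinyint`; with `Seq("array<int>", "int", "int")`, the third child is wrapped in a cast to `int`.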

Member

@MaxGekk MaxGekk left a comment

In any case, this PR LGTM

@Hisoka-X
Member Author

collectionOperations.scala has a lot of Seq.empty occurrences. If we need to remove them all, I can create a PR for that.

@MaxGekk
Member

MaxGekk commented Sep 17, 2023

collectionOperations.scala has a lot of Seq.empty occurrences. If we need to remove them all, I can create a PR for that.

Let's leave them as is for now.

@MaxGekk
Member

MaxGekk commented Sep 17, 2023

+1, LGTM. Merging to master/3.5/3.4.
Thank you, @Hisoka-X.

MaxGekk pushed a commit that referenced this pull request Sep 17, 2023
Closes #42951 from Hisoka-X/SPARK-45078_arrayinsert_type_mismatch.

Authored-by: Jia Fan <fanjiaeminem@qq.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit e84c66d)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this in e84c66d Sep 17, 2023
@MaxGekk
Member

MaxGekk commented Sep 17, 2023

@Hisoka-X The changes cause some conflicts in 3.4. Could you please open a backport PR against branch-3.4?

Hisoka-X added a commit to Hisoka-X/spark that referenced this pull request Sep 17, 2023
(cherry picked from commit e84c66d)
@@ -4749,7 +4749,6 @@ case class ArrayInsert(
         }
       case (e1, e2, e3) => Seq.empty
     }
-    Seq.empty
Contributor

what a mistake...
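
For readers less familiar with Scala: a block evaluates to its last expression, so a leftover `Seq.empty` after the `match` discards whatever the `match` computed. Below is a minimal sketch of that effect, using hypothetical method names and plain strings as stand-ins for data types. It is not the actual `ArrayInsert` code.

```scala
// Hypothetical stand-in for the method in the diff; types are plain strings.
object TrailingExpressionSketch {
  def buggyInputTypes(elem: String, item: String): Seq[String] = {
    (elem, item) match {
      case ("int", "tinyint") => Seq("array<int>", "int", "int")
      case _ => Seq.empty
    }
    Seq.empty // the block's last expression: this is what gets returned
  }

  def fixedInputTypes(elem: String, item: String): Seq[String] = {
    // With the stray line removed, the match result is the block's value.
    (elem, item) match {
      case ("int", "tinyint") => Seq("array<int>", "int", "int")
      case _ => Seq.empty
    }
  }
}
```

Here `buggyInputTypes("int", "tinyint")` is empty, while `fixedInputTypes("int", "tinyint")` yields the three expected types.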

@Hisoka-X Hisoka-X deleted the SPARK-45078_arrayinsert_type_mismatch branch September 18, 2023 06:32
dongjoon-hyun pushed a commit that referenced this pull request Sep 18, 2023
… work

### What changes were proposed in this pull request?
This is a backport of #42951, which fixes `ImplicitCastInputTypes` not working for `array_insert`.

### Why are the changes needed?
To fix incorrect behavior in `array_insert`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42960 from Hisoka-X/arrayinsert-fix-3.4.

Authored-by: Jia Fan <fanjiaeminem@qq.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023