Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params #40349

Closed
wants to merge 3 commits into from

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Mar 9, 2023

What changes were proposed in this pull request?

Make LiteralExpression support array

Why are the changes needed?

MLIib requires literal to carry the array params, like IntArrayParam, DoubleArrayArrayParam.

Note that this PR doesn't affect existing functions.lit method which apply unresolved CreateArray expression to support array input.

Does this PR introduce any user-facing change?

No, dev-only

How was this patch tested?

added UT

@zhengruifeng
Copy link
Contributor Author

cc @WeichenXu123 @HyukjinKwon

@zhengruifeng zhengruifeng changed the title [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params Mar 9, 2023
@zhengruifeng zhengruifeng changed the title [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params [SPARK-42725][CONNECT][PYTHON][ML] Make LiteralExpression support array params Mar 9, 2023
@zhengruifeng zhengruifeng changed the title [SPARK-42725][CONNECT][PYTHON][ML] Make LiteralExpression support array params [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params Mar 9, 2023
@@ -277,14 +281,26 @@ def _infer_type(cls, value: Any) -> DataType:
return DateType()
elif isinstance(value, datetime.timedelta):
return DayTimeIntervalType()
else:
elif isinstance(value, np.generic):
if isinstance(value, np.generic):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems duplicate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@@ -437,7 +436,6 @@ def test_literal_with_unsupported_type(self):
(0.1, DecimalType()),
(datetime.date(2022, 12, 13), TimestampType()),
(datetime.timedelta(1, 2, 3), DateType()),
([1, 2, 3], ArrayType(IntegerType())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add test for Array<Array> type as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrayType was not support before, here just check the raised AssertionError

self.assertEqual(l2.array.elements[1].double, -3.0)
self.assertEqual(l2.array.elements[2].double, 0.0)

l3 = LiteralExpression._from_value([[3, 4], [5, 6, 7]]).to_plan(None).literal
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WeichenXu123 l3 and l4 are array < array >

fix test

fix test
add a test for array_string
zhengruifeng added a commit that referenced this pull request Mar 10, 2023
…arams

### What changes were proposed in this pull request?
Make LiteralExpression support array

### Why are the changes needed?
MLIib requires literal to carry the array params, like  `IntArrayParam`, `DoubleArrayArrayParam`.

Note that this PR doesn't affect existing `functions.lit` method which apply unresolved `CreateArray` expression to support array input.

### Does this PR introduce _any_ user-facing change?
No, dev-only

### How was this patch tested?
added UT

Closes #40349 from zhengruifeng/connect_py_ml_lit.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit d6d0fc7)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Copy link
Contributor Author

merged into master/branch-3.4

@zhengruifeng zhengruifeng deleted the connect_py_ml_lit branch March 10, 2023 02:19
DataType elementType = 1;
repeated Literal element = 2;
DataType element_type = 1;
repeated Literal elements = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first defined it, I confirmed to @hvanhovell that it should be element rather than elements , so I defined it as element.

#40218 (comment)

At the same time, I found that we sometimes use the singular and sometimes the plural in the definition of the repeated field.

repeated Expression partition_spec = 2;
// (Optional) Ordering of rows in a partition.
repeated SortOrder order_spec = 3;

I think we should have a unified rule.

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…arams

### What changes were proposed in this pull request?
Make LiteralExpression support array

### Why are the changes needed?
MLIib requires literal to carry the array params, like  `IntArrayParam`, `DoubleArrayArrayParam`.

Note that this PR doesn't affect existing `functions.lit` method which apply unresolved `CreateArray` expression to support array input.

### Does this PR introduce _any_ user-facing change?
No, dev-only

### How was this patch tested?
added UT

Closes apache#40349 from zhengruifeng/connect_py_ml_lit.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit d6d0fc7)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants