[SPARK-46028][CONNECT][PYTHON] Make Column.__getitem__ accept input column #43930

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:connect_column_getitem

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Make Column.__getitem__ in the Spark Connect Python client accept a Column as the key, in addition to literal values and slices.

Why are the changes needed?

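Column.__getitem__ should accept a Column as input, for example to look up a map value by a key column. Currently the Spark Connect implementation always wraps a non-slice key in a literal expression, so the lookup below fails: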

In [1]: from pyspark.sql.functions import col,lit,create_map
   ...: from itertools import chain
   ...:
   ...: mapping = {
   ...:   'A': '20',
   ...:   'B': '28',
   ...:   'C': '34'
   ...: }
   ...:
   ...: x = [['A','10'],['B','14'],['C','17']]
   ...: df = spark.createDataFrame(data=x, schema = ["key", "value"])
   ...:
   ...: mapping_expr = create_map([lit(x) for x in chain(*mapping.items())])
   ...: df = df.withColumn("square_value", mapping_expr[col("key")])
   ...: df.show()
---------------------------------------------------------------------------
PySparkTypeError                          Traceback (most recent call last)
Cell In[1], line 14
     11 df = spark.createDataFrame(data=x, schema = ["key", "value"])
     13 mapping_expr = create_map([lit(x) for x in chain(*mapping.items())])
---> 14 df = df.withColumn("square_value", mapping_expr[col("key")])
     15 df.show()

File ~/Dev/spark/python/pyspark/sql/connect/column.py:465, in Column.__getitem__(self, k)
    463     return self.substr(k.start, k.stop)
    464 else:
--> 465     return Column(UnresolvedExtractValue(self._expr, LiteralExpression._from_value(k)))

File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:336, in LiteralExpression._from_value(cls, value)
    334 @classmethod
    335 def _from_value(cls, value: Any) -> "LiteralExpression":
--> 336     return LiteralExpression(value=value, dataType=LiteralExpression._infer_type(value))

File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:329, in LiteralExpression._infer_type(cls, value)
    323         raise PySparkTypeError(
    324             error_class="CANNOT_INFER_ARRAY_TYPE",
    325             message_parameters={},
    326         )
    327     return ArrayType(LiteralExpression._infer_type(first), True)
--> 329 raise PySparkTypeError(
    330     error_class="UNSUPPORTED_DATA_TYPE",
    331     message_parameters={"data_type": type(value).__name__},
    332 )

PySparkTypeError: [UNSUPPORTED_DATA_TYPE] Unsupported DataType `Column`.
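
The traceback suggests the fix is a type dispatch in __getitem__: pass a Column key through as an expression and only wrap plain Python values in a literal. The following is a minimal, pure-Python sketch of that idea, not the actual patch. Literal, ColumnRef, ExtractValue and this Column class are hypothetical stand-ins for the Spark Connect classes, and slice handling is left out.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a literal expression (plain scalars only)."""
    value: Any

    @classmethod
    def from_value(cls, value: Any) -> "Literal":
        if not isinstance(value, (bool, int, float, str)):
            # Mimics the UNSUPPORTED_DATA_TYPE failure seen in the traceback.
            raise TypeError(f"Unsupported DataType `{type(value).__name__}`.")
        return cls(value)


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a reference to a named column."""
    name: str


@dataclass(frozen=True)
class ExtractValue:
    """Hypothetical stand-in for 'extract `extraction` from `child`'."""
    child: Any
    extraction: Any


class Column:
    """Hypothetical stand-in for a column wrapping an expression tree."""

    def __init__(self, expr: Any) -> None:
        self._expr = expr

    def __getitem__(self, k: Any) -> "Column":
        if isinstance(k, Column):
            # A Column key is already an expression: use it directly.
            return Column(ExtractValue(self._expr, k._expr))
        # Anything else is treated as a literal value.
        return Column(ExtractValue(self._expr, Literal.from_value(k)))


mapping = Column(ColumnRef("mapping"))
by_column = mapping[Column(ColumnRef("key"))]  # key taken from another column
by_literal = mapping["A"]                      # key given as a plain value
```

With this dispatch, a key column becomes the extraction expression itself, while a plain key such as "A" still goes through the literal path.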

Does this PR introduce any user-facing change?

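Yes. In Spark Connect, Column.__getitem__ now accepts a Column as the key instead of raising PySparkTypeError.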

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No.

init
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @zhengruifeng and @HyukjinKwon.
Merged to master for Apache Spark 4.0.0.

@zhengruifeng zhengruifeng deleted the connect_column_getitem branch November 22, 2023 01:17