[SPARK-41779][SPARK-41771][CONNECT][PYTHON] Make __getitem__ support filter and select #39300

Closed

@zhengruifeng zhengruifeng commented Dec 30, 2022

What changes were proposed in this pull request?

Make the Spark Connect DataFrame __getitem__ support:
1. filter: cdf[cdf.a.isin(1, 2, 3)]
2. select: cdf[["col1", cdf.a]]
3. index: cdf[0]
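
The three cases above come down to dispatching on the type of the key passed to __getitem__. The following is a minimal sketch of that idea only, not the code in this PR: ToyColumn and ToyDataFrame are hypothetical stand-ins (not the pyspark.sql.connect classes), and the plan tuples are an invented representation used purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class ToyColumn:
    """Hypothetical stand-in for a column expression (not pyspark's Column)."""
    expr: str


class ToyDataFrame:
    """Hypothetical stand-in that only records which operation would be built."""

    def __init__(self, columns: List[str], plan: Tuple = ("read",)):
        self.columns = columns
        self.plan = plan

    def __getitem__(self, item: Union[str, int, ToyColumn, list, tuple]):
        if isinstance(item, str):
            # df["a"] -> reference a column by name
            return ToyColumn(item)
        if isinstance(item, ToyColumn):
            # df[df.a.isin(...)] -> use the column expression as a filter condition
            return ToyDataFrame(self.columns, ("filter", item.expr, self.plan))
        if isinstance(item, (list, tuple)):
            # df[["col1", df.a]] -> select the listed names and columns
            names = [c.expr if isinstance(c, ToyColumn) else c for c in item]
            return ToyDataFrame(names, ("select", tuple(names), self.plan))
        if isinstance(item, int):
            # df[0] -> reference a column by position
            return ToyColumn(self.columns[item])
        raise TypeError(f"unexpected item type: {type(item)}")


df = ToyDataFrame(["a", "col1"])
filtered = df[ToyColumn("a IN (1, 2, 3)")]   # filter case
selected = df[["col1", df["a"]]]             # select case
first = df[0]                                # index case
```

In this sketch a Column key yields a new DataFrame, while a string or integer key yields a Column. Passing a Column down the string path is what trips the isinstance(name, str) assertion in the traceback quoted later in this thread.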

Why are the changes needed?

To be consistent with PySpark.

Does this PR introduce any user-facing change?

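Yes. DataFrame.__getitem__ in Spark Connect now accepts a Column condition (filter), a list of column names or Columns (select), and an integer index.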

How was this patch tested?

Added unit tests.

@zhengruifeng
Contributor Author

This PR fixes the following doctest failure:

File "/.../spark/python/pyspark/sql/connect/column.py", line 364, in pyspark.sql.connect.column.Column.isin
Failed example:
    df[df.name.isin("Bob", "Mike")].collect()
Exception raised:
    Traceback (most recent call last):
      File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pyspark.sql.connect.column.Column.isin[1]>", line 1, in <module>
        df[df.name.isin("Bob", "Mike")].collect()
      File "/.../workspace/forked/spark/python/pyspark/sql/connect/dataframe.py", line 888, in __getitem__
        return col(name)
      File "/.../workspace/forked/spark/python/pyspark/sql/connect/functions.py", line 161, in col
        return Column(ColumnReference(col))
      File "/.../spark/python/pyspark/sql/connect/expressions.py", line 322, in __init__
        assert isinstance(name, str)
    AssertionError

init
@zhengruifeng
Contributor Author

cc @HyukjinKwon @grundprinzip

@zhengruifeng
Contributor Author

Thanks, merged into master.

@zhengruifeng zhengruifeng deleted the connect_df_getitem_filter branch December 30, 2022 08:11