Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41002][CONNECT][PYTHON] Compatible take, head and first API in Python client #38488

Closed
wants to merge 4 commits into from

Conversation

amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Nov 3, 2022

What changes were proposed in this pull request?

  1. Add take(n) API.
  2. Change head(n) API to return Union[Optional[Row], List[Row]].
  3. Update first() to return Optional[Row].

Why are the changes needed?

Improve API coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM!

@amaliujia amaliujia changed the title [SPARK-41002][CONNECT][PYTHON] Compatible take and head API in Python client [SPARK-41002][CONNECT][PYTHON] Compatible take, head and first API in Python client Nov 3, 2022
:class:`Row`
First row if :class:`DataFrame` is not empty, otherwise ``None``.
"""
return self.head()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is copied from PySpark, but isn't it better to signal intent here with a 1 as explicit param?

Copy link
Contributor Author

@amaliujia amaliujia Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implementation details though but updated to self.head(1).

Copy link
Contributor Author

@amaliujia amaliujia Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah actually we cannot. self.head() returns Optional[Row] but self.head(n) returns List[Row]. self.head() is to make sure mypy check pass.

@amaliujia
Copy link
Contributor Author

actually I will follow https://spark.apache.org/contributing.html. to add a short description for test cases in this PR/

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@amaliujia
Copy link
Contributor Author

Ok added short description for the new test cases.

@zhengruifeng
Copy link
Contributor

merged into master

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…API in Python client

### What changes were proposed in this pull request?

1. Add `take(n)` API.
2. Change `head(n)` API to return `Union[Optional[Row], List[Row]]`.
3. Update `first()` to return `Optional[Row]`.

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes apache#38488 from amaliujia/SPARK-41002.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants