Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head #45197

Closed
wants to merge 4 commits into from

Conversation

wunderalbert
Copy link
Contributor

@wunderalbert wunderalbert commented Feb 21, 2024

What changes were proposed in this pull request?

Change description of returns in docstring for pyspark's dataframe's head to match actual behaviour.

Why are the changes needed?

The docstring claimed that head(n) would return a Row (rather than a list of rows) iff n == 1, but that's incorrect.

Type hints, example, and implementation show that the difference between row or list of rows lies in whether n is supplied at all -- if it isn't, head() returns a Row, if it is, even if it is 1, head(n) returns a list.

Does this PR introduce any user-facing change?

No, only a Python docstring is changed.

How was this patch tested?

No tests were added since only docs changed.

Was this patch authored or co-authored using generative AI tooling?

No.

The docstring claimed that `head(n)` would return a `Row` (rather than a list of rows) iff n == 1, but that's incorrect.

Type hints, example, and implementation show that the difference between row or list of rows lies in whether n is supplied at all -- if it isn't, `head()` returns a `Row`, if it is, even if it is 1, `head(n)` returns a list.

This commit corrects the docstring.
@xinrong-meng
Copy link
Member

xinrong-meng commented Feb 21, 2024

Good catch! Thanks for working on that!

Would you create a JIRA and add the JIRA number to the PR title, along with [DOCS][PYTHON] labels?

Also feel free to answer no to user-facing change section in the PR description, considering that's a doc change only.

@@ -3526,8 +3526,8 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:

Returns
-------
If n is greater than 1, return a list of :class:`Row`.
If n is 1, return a single Row.
If n is supplied, return a list of :class:`Row`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> spark.range(3).head(0)
[]

I'm wondering if we should call out the empty list when n=0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'm suggesting adding this as an example, and extending the returns section to mention the length of the list (see commit below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@wunderalbert wunderalbert changed the title Correct docstring for pyspark's dataframe.head [SPARK-47132][DOCS][PYTHON] Correct docstring for pyspark's dataframe.head Feb 22, 2024
@wunderalbert
Copy link
Contributor Author

Thank you @xinrong-meng ! I've created the JIRA and edited the PR.

@@ -3526,8 +3526,9 @@ def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:

Returns
-------
If n is greater than 1, return a list of :class:`Row`.
If n is 1, return a single Row.
If n is supplied, return a list of :class:`Row` of length n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is so much better! Thanks!

@xinrong-meng
Copy link
Member

Merged to master, thank you!

ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
….head

### What changes were proposed in this pull request?

Change description of `returns` in docstring for pyspark's dataframe's `head` to match actual behaviour.

### Why are the changes needed?

The docstring claimed that `head(n)` would return a `Row` (rather than a list of rows) iff n == 1, but that's incorrect.

Type hints, example, and implementation show that the difference between row or list of rows lies in whether n is supplied at all -- if it isn't, `head()` returns a `Row`, if it is, even if it is 1, `head(n)` returns a list.

### Does this PR introduce _any_ user-facing change?

No, only a Python docstring is changed.

### How was this patch tested?

No tests were added since only docs changed.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45197 from wunderalbert/patch-1.

Authored-by: Albert Ziegler <55346556+wunderalbert@users.noreply.github.com>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants