Implement Series.item #1502

Merged · 4 commits · May 19, 2020
1 change: 0 additions & 1 deletion databricks/koalas/missing/series.py
@@ -53,7 +53,6 @@ class _MissingPandasLikeSeries(object):
    first = unsupported_function("first")
    infer_objects = unsupported_function("infer_objects")
    interpolate = unsupported_function("interpolate")
    item = unsupported_function("item")
    items = unsupported_function("items")
    iteritems = unsupported_function("iteritems")
    last = unsupported_function("last")
25 changes: 25 additions & 0 deletions databricks/koalas/series.py
@@ -4730,6 +4730,31 @@ def asof(self, where):
        result_series.name = self.name
        return result_series

    def item(self):
        """
        Return the first element of the underlying data as a Python scalar.

        Returns
        -------
        scalar
            The first element of Series.

        Raises
        ------
        ValueError
            If the data is not length-1.

        Examples
        --------
        >>> kser = ks.Series([10])
        >>> kser.item()
        10
        """
        item_top_two = self[:2]
        if len(item_top_two) != 1:
            raise ValueError("can only convert an array of size 1 to a Python scalar")
        return item_top_two[0]
Comment on lines +4753 to +4756
Collaborator (@ueshin):

Seems like this still runs Spark jobs twice? We should explicitly call to_pandas() or collect()?

Contributor, author (@itholic):

Ah, I'll check and fix it. Thanks!

Member (@HyukjinKwon):

Let's be very careful about this next time, @itholic. This isn't a trivial mistake; this one-line mistake makes the performance two times slower. Thanks for pointing this out, @ueshin.

Contributor, author (@itholic):

@HyukjinKwon Yeah, I should have considered it much more carefully. Thanks for reminding me once again; I'll keep that in mind.
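
To make the suggestion above concrete, here is a minimal sketch (not necessarily the exact code that was eventually merged) of how the double Spark job could be avoided: collect at most two rows into a local pandas Series once, then run both the length check and the element access on that local copy. It assumes Series.head() and to_pandas() are available, as they are elsewhere in Koalas.

    def item(self):
        # Sketch only: materialize at most two rows locally in a single collect,
        # then validate and index the local pandas Series.
        item_top_two = self.head(2).to_pandas()
        if len(item_top_two) != 1:
            raise ValueError("can only convert an array of size 1 to a Python scalar")
        return item_top_two.iloc[0]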


    def _cum(self, func, skipna, part_cols=()):
        # This is used to cummin, cummax, cumsum, etc.

4 changes: 4 additions & 0 deletions databricks/koalas/tests/test_series.py
@@ -1520,3 +1520,7 @@ def test_shape(self):
        pser = kser.to_pandas()

        self.assert_eq(pser.shape, kser.shape)

    def test_item(self):
        kser = ks.Series([10, 20])
        self.assertRaises(ValueError, lambda: kser.item())
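
For illustration only (not part of this PR), a companion check for the length-1 success path could compare against pandas; the test name below is hypothetical:

    def test_item_scalar(self):
        # Hypothetical: a length-1 Series should yield the same scalar as pandas.
        kser = ks.Series([10])
        pser = kser.to_pandas()
        self.assert_eq(kser.item(), pser.item())
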
1 change: 1 addition & 0 deletions docs/source/reference/series.rst
@@ -56,6 +56,7 @@ Indexing, iteration
Series.iloc
Series.keys
Series.pop
Series.item
Series.xs
Series.get
