Implement Series.item #1502

Merged
merged 4 commits into from May 19, 2020

Contributor

@itholic itholic commented May 16, 2020

This PR proposes Series.item, which returns the single element of a one-element Series as a Python scalar:

>>> kser = ks.Series([10])
>>> kser.item()
10
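
For context, the contract being mirrored is small: return the sole element when the size is exactly 1, and raise `ValueError` otherwise. A minimal pure-Python sketch of that contract follows. `item_of` is a hypothetical helper introduced only for illustration; it is not part of Koalas or pandas, and the error message is copied from this PR's diff.

```python
from typing import Any, Sequence


def item_of(values: Sequence[Any]) -> Any:
    """Return the only element of `values`, or raise if its size is not 1.

    Hypothetical helper for illustration; not the Koalas implementation.
    """
    if len(values) != 1:
        # Same message as the one raised in this PR's diff.
        raise ValueError("can only convert an array of size 1 to a Python scalar")
    return values[0]


print(item_of([10]))  # 10
```

Calling `item_of([])` or `item_of([1, 2])` raises `ValueError` under this sketch.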

@itholic changed the title from "Implemented Series.item" to "Implement Series.item" on May 16, 2020

codecov-io commented May 16, 2020

Codecov Report

Merging #1502 into master will decrease coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1502      +/-   ##
==========================================
- Coverage   93.93%   93.90%   -0.04%     
==========================================
  Files          36       36              
  Lines        8445     8448       +3     
==========================================
  Hits         7933     7933              
- Misses        512      515       +3     

Impacted Files                        Coverage Δ
databricks/koalas/missing/series.py   100.00% <ø> (ø)
databricks/koalas/base.py             98.02% <100.00%> (ø)
databricks/koalas/series.py           96.98% <100.00%> (+0.01%) ⬆️
databricks/koalas/frame.py            95.43% <0.00%> (-0.15%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6fbe6d0...17f9dfc.

@HyukjinKwon HyukjinKwon merged commit 8c0f135 into databricks:master May 19, 2020
Comment on lines +4753 to +4756
item_top_two = self[:2]
if len(item_top_two) != 1:
    raise ValueError("can only convert an array of size 1 to a Python scalar")
return item_top_two[0]
Collaborator

Seems like this still runs Spark jobs twice? We should explicitly call to_pandas() or collect()?

Contributor Author

Ah, I'll check and fix it. Thanks!

Member

Let's be very careful about this next time, @itholic. This isn't a trivial mistake: this one-line mistake doubles the running time. Thanks for pointing this out, @ueshin.

Contributor Author

@HyukjinKwon
Yeah, I should have considered this much more carefully.
Thanks for reminding me once again, @HyukjinKwon! I'll keep that in mind.
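
The concern raised in this thread can be illustrated with a toy model. The sketch below assumes, as the reviewer's comment suggests, that both `len(...)` and the positional lookup on the sliced series each trigger their own Spark job, whereas collecting the two-row slice once does not. `LazySeries`, `item_two_jobs`, and `item_one_job` are hypothetical names made up for this example; none of them is the Koalas API or the actual follow-up fix.

```python
from typing import Any, List


class LazySeries:
    """Toy stand-in for a distributed series; NOT the Koalas API.

    Every method that needs real data increments `jobs_run`,
    mimicking how each Spark action launches a job.
    """

    jobs_run = 0  # counter shared by all instances

    def __init__(self, data: List[Any]) -> None:
        self._data = data

    def head(self, n: int) -> "LazySeries":
        # Lazy: only describes the first n rows, runs nothing.
        return LazySeries(self._data[:n])

    def __len__(self) -> int:
        LazySeries.jobs_run += 1  # counting rows is an action
        return len(self._data)

    def first(self) -> Any:
        LazySeries.jobs_run += 1  # fetching a value is an action
        return self._data[0]

    def collect(self) -> List[Any]:
        LazySeries.jobs_run += 1  # one action brings the rows local
        return list(self._data)


MESSAGE = "can only convert an array of size 1 to a Python scalar"


def item_two_jobs(s: LazySeries) -> Any:
    """Shape of the reviewed code: length check and lookup both hit the cluster."""
    top_two = s.head(2)
    if len(top_two) != 1:
        raise ValueError(MESSAGE)
    return top_two.first()


def item_one_job(s: LazySeries) -> Any:
    """Collect the two-row slice once, then work on the local list."""
    rows = s.head(2).collect()
    if len(rows) != 1:
        raise ValueError(MESSAGE)
    return rows[0]


series = LazySeries([10])

LazySeries.jobs_run = 0
item_two_jobs(series)
print(LazySeries.jobs_run)  # 2

LazySeries.jobs_run = 0
item_one_job(series)
print(LazySeries.jobs_run)  # 1
```

Taking at most two rows is enough in either variant: it distinguishes "exactly one element" from "more than one" without pulling the whole series to the driver.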

HyukjinKwon pushed a commit that referenced this pull request May 21, 2020
Following the comment #1502 (comment), fixed `Series.item` to run a Spark job once instead of twice.
@itholic itholic deleted the s_item branch May 29, 2020 00:50
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023
Following the comment databricks/koalas#1502 (comment), fixed `Series.item` to run a Spark job once instead of twice.