Support MultiIndex for DataFrame.duplicated, drop_duplicates. #1363

Merged: 1 commit into databricks:master on Mar 24, 2020

ueshin (Collaborator) commented Mar 24, 2020

Currently, DataFrame.duplicated and DataFrame.drop_duplicates don't support MultiIndex.
This PR adds support for it.

e.g.,

>>> pdf
          a  b  c
0.660073  1  1  1
0.255808  1  1  1
0.796535  2  1  1
0.562986  3  4  5
>>> pdf.duplicated()
0.660073    False
0.255808     True
0.796535    False
0.562986    False
dtype: bool
>>> pdf.set_index("a", append=True).duplicated()
          a
0.660073  1    False
0.255808  1     True
0.796535  2     True
0.562986  3    False
dtype: bool

or

>>> pdf.drop_duplicates()
          a  b  c
0.660073  1  1  1
0.796535  2  1  1
0.562986  3  4  5
>>> pdf.set_index("a", append=True).drop_duplicates()
            b  c
         a
0.660073 1  1  1
0.562986 3  4  5
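The outputs above can be reproduced with the following minimal sketch. It assumes `pdf` is a plain pandas DataFrame; the variable names `multi`, `dup_flat`, `dup_multi`, and `deduped` are illustrative and not part of this PR. It shows that once `a` moves into the index, only columns `b` and `c` are compared, so the third row becomes a duplicate.

```python
import pandas as pd

# Same data as in the example above; the float index values are copied from it.
pdf = pd.DataFrame(
    {"a": [1, 1, 2, 3], "b": [1, 1, 1, 4], "c": [1, 1, 1, 5]},
    index=[0.660073, 0.255808, 0.796535, 0.562986],
)

# Appending "a" to the index yields a two-level MultiIndex and leaves
# only "b" and "c" as data columns.
multi = pdf.set_index("a", append=True)

# duplicated() compares data columns only; index levels are ignored.
dup_flat = pdf.duplicated()
dup_multi = multi.duplicated()

# drop_duplicates() keeps the first occurrence of each distinct (b, c) pair.
deduped = multi.drop_duplicates()

print(dup_flat)
print(dup_multi)
print(deduped)
```

With the MultiIndex frame, the row with `a == 2` is flagged because its `(b, c)` values repeat the first row's, which is why it disappears from the `drop_duplicates()` result.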

ueshin requested a review from HyukjinKwon March 24, 2020 01:48
@codecov-io

Codecov Report

Merging #1363 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #1363      +/-   ##
=========================================
- Coverage   95.22%   95.2%   -0.02%     
=========================================
  Files          34      34              
  Lines        7705    7699       -6     
=========================================
- Hits         7337    7330       -7     
- Misses        368     369       +1
Impacted Files                  Coverage Δ
databricks/koalas/frame.py      96.73% <100%> (-0.02%) ⬇️
databricks/koalas/generic.py    97.13% <0%> (-0.41%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3d7427f...7b8f546.

itholic (Contributor) commented Mar 24, 2020

LGTM.

(FYI: Although we're having some problems with Travis right now, it seems okay to just merge this since the GitHub Actions checks passed.)

ueshin (Collaborator, Author) commented Mar 24, 2020

Thanks! Let me merge this for now, since the GitHub Actions builds passed.

ueshin merged commit 704747f into databricks:master Mar 24, 2020
ueshin deleted the duplicated branch March 24, 2020 20:15