GroupBy.rowNumbers() #16

Closed
andrus opened this issue Mar 28, 2019 · 0 comments

andrus commented Mar 28, 2019

Add a GroupBy.rowNumbers() "window" function that returns a Series of the same size and in the same row order as the original pre-GroupBy DataFrame, where each value is the number of that row within its group. E.g.:

df = df.addColumn("row_number", df.group("a").sort("a").rowNumbers())
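
To make the intended semantics concrete, here is a minimal plain-Java sketch that does not use the library. The `RowNumbersSketch` class and its `rowNumbers` helper are hypothetical names, and 1-based numbering is an assumption (it matches the usual SQL/Spark "row_number" convention). The sketch numbers rows in the order they are encountered and skips the per-group sort step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RowNumbersSketch {

    // Hypothetical helper, not the library API: for each row, returns its
    // 1-based number within its group. The result is aligned with the
    // original row order, so it can be attached back as a column.
    public static int[] rowNumbers(String[] groupKeys) {
        int[] result = new int[groupKeys.length];
        Map<String, Integer> counters = new HashMap<>();

        for (int i = 0; i < groupKeys.length; i++) {
            // merge() increments the per-group counter and returns the new value
            result[i] = counters.merge(groupKeys[i], 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] a = {"x", "y", "x", "y", "x"};
        // prints [1, 1, 2, 2, 3]
        System.out.println(Arrays.toString(rowNumbers(a)));
    }
}
```

An empty input yields an empty result, since the loop body never runs.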

In the future we can add more ranking functions similar to Spark "rank" and "dense_rank".
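
For reference, a rough sketch of how "rank" and "dense_rank" differ on ties, written in plain Java over a single group that is already sorted. `RankSketch`, `rank` and `denseRank` are hypothetical names used only for illustration, not a proposed API.

```java
import java.util.Arrays;

public class RankSketch {

    // "rank": tied values share a rank, and the next distinct value
    // skips ahead to its 1-based position, leaving gaps.
    // Assumes the input is already sorted in ascending order.
    public static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? out[i - 1] : i + 1;
        }
        return out;
    }

    // "dense_rank": tied values share a rank, and the next distinct value
    // gets the previous rank plus one, so there are no gaps.
    public static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                out[i] = (sorted[i] == sorted[i - 1]) ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 20, 30};
        System.out.println(Arrays.toString(rank(values)));      // prints [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(values))); // prints [1, 2, 2, 3]
    }
}
```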

andrus changed the title from GroupBy.rank() to GroupBy.rowNumber() Mar 28, 2019
andrus changed the title from GroupBy.rowNumber() to GroupBy.rowNumbers() Mar 31, 2019
andrus added a commit that referenced this issue Mar 31, 2019
* preliminary refactoring - extracting index sort algorithms in a standalone IndexSorter
andrus added a commit that referenced this issue Mar 31, 2019
* preliminary refactoring - redoing GroupBy to store groups index instead of full DataFrames
andrus added a commit that referenced this issue Mar 31, 2019
* adding Series.concat functionality to concatenate series
andrus added a commit that referenced this issue Mar 31, 2019
andrus added this to the 1.0 milestone Mar 31, 2019
andrus closed this as completed Mar 31, 2019
andrus added a commit that referenced this issue Mar 31, 2019
andrus added a commit that referenced this issue Mar 31, 2019
* testing combination of sort and row number
andrus added a commit that referenced this issue Mar 31, 2019
* refactoring
andrus added a commit that referenced this issue Jul 11, 2019
* making sure this doesn't blow up on an empty DataFrame