
[SPARK-46976][PS] Implement DataFrameGroupBy.corr#45028

Closed
zhengruifeng wants to merge 6 commits into apache:master from zhengruifeng:ps_df_groupby_corr

Conversation

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Implement DataFrameGroupBy.corr in the pandas API on Spark.

Why are the changes needed?

For parity with pandas; see the pandas API reference:
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.corr.html

Does this PR introduce any user-facing change?

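Yes. DataFrameGroupBy.corr is now available in the pandas API on Spark and produces the same result as pandas: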

In [5]: pdf = pd.DataFrame({'A': [0, 0, 0, 1, 1, 2], 'B': [-1, 2, 3, 5, 6, 0], 'C': [4, 6, 5, 1, 3, 0]}, columns=['A', 'B', 'C'])

In [6]: pdf.groupby("A").corr()
Out[6]: 
            B         C
A                      
0 B  1.000000  0.720577
  C  0.720577  1.000000
1 B  1.000000  1.000000
  C  1.000000  1.000000
2 B       NaN       NaN
  C       NaN       NaN

In [7]: psdf = ps.from_pandas(pdf)

In [8]: psdf.groupby("A").corr()
Out[8]:
            B         C
A                      
0 B  1.000000  0.720577
  C  0.720577  1.000000
1 B  1.000000  1.000000
  C  1.000000  1.000000
2 B       NaN       NaN
  C       NaN       NaN
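To make explicit what the example computes, here is a minimal pure-Python sketch: for each value of the grouping column A, it forms the Pearson correlation of every pair of the remaining columns (B and C). This is not the Spark implementation from this PR; the helper names pearson and groupby_corr are hypothetical, and rows are plain dicts rather than a DataFrame.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(xs)
    if n == 0:
        return math.nan
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    # Zero variance (e.g. a single-row group) leaves the correlation undefined.
    return sxy / denom if denom > 0 else math.nan


def groupby_corr(rows, key, columns):
    """Return {group_key: {(col_a, col_b): corr}} for every pair of value columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = {}
    for k in sorted(groups):
        members = groups[k]
        result[k] = {
            (a, b): pearson([r[a] for r in members], [r[b] for r in members])
            for a in columns
            for b in columns
        }
    return result


# Same data as pdf in the example above, stored as a list of row dicts.
data = {"A": [0, 0, 0, 1, 1, 2], "B": [-1, 2, 3, 5, 6, 0], "C": [4, 6, 5, 1, 3, 0]}
rows = [dict(zip(data, values)) for values in zip(*data.values())]
corr = groupby_corr(rows, key="A", columns=["B", "C"])

# Print one line per (group, column), mirroring the MultiIndex layout above.
for k, matrix in corr.items():
    for a in ["B", "C"]:
        print(k, a, " ".join(f"{matrix[(a, b)]:.6f}" for b in ["B", "C"]))
```

The group A == 2 has a single row, so both variances are zero and every entry is NaN, which matches the last two rows of both outputs above.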

How was this patch tested?

added tests

Was this patch authored or co-authored using generative AI tooling?

no

Review comment on python/pyspark/pandas/groupby.py (outdated diff):
2 B       NaN       NaN
  C       NaN       NaN

>>> psdf.groupby("A").corr(min_periods=2)
Member

typo: df

Contributor Author

thanks
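The doctest in this thread passes min_periods. A minimal pure-Python sketch of that parameter, assuming it follows the pandas semantics (the minimum number of observations required per pair of columns for a valid result) and pairwise handling of missing values; the helper corr_pair is hypothetical and uses None to stand in for a missing value.

```python
import math


def corr_pair(xs, ys, min_periods=1):
    """Pearson correlation over pairwise-complete observations.

    Rows where either value is None are dropped for this pair only; if fewer
    than ``min_periods`` rows remain, the result is NaN.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0 or n < min_periods:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    denom = math.sqrt(sxx * syy)
    # Zero variance makes the correlation undefined.
    return sxy / denom if denom > 0 else math.nan


# Group A == 1 from the PR example has exactly two rows.
b, c = [5, 6], [1, 3]
print(corr_pair(b, c, min_periods=2))  # two observations meet the threshold
print(corr_pair(b, c, min_periods=3))  # too few observations, so NaN
# A missing value shrinks the usable sample for this pair only.
print(corr_pair([-1, 2, None], [4, 6, 5], min_periods=2))
```

Under this reading, raising min_periods above a group's row count turns that group's entries into NaN even when the correlation itself would be defined.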

@xinrong-meng
Member

LGTM after fixing the typo in doctest, thanks!

@zhengruifeng
Contributor Author

Thanks for the reviews, merged to master.

zhengruifeng deleted the ps_df_groupby_corr branch on February 6, 2024 08:56

3 participants