@zhengruifeng
Contributor

What changes were proposed in this pull request?

Make ps.DataFrame(data, index) support data and index that share the same anchor.

Why are the changes needed?

before:

In [1]: import pyspark.pandas as ps

In [2]: psdf = ps.DataFrame([[1,2], [3,4]], columns=["A", "B"])

In [3]: ps.DataFrame(data=psdf[["A"]] * 2, index=psdf.index)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: Cannot combine the series or dataframe because it comes from a different dataframe. In order to allow this operation, enable 'compute.ops_on_diff_frames' option.

after:

In [1]: import pyspark.pandas as ps

In [2]: psdf = ps.DataFrame([[1,2], [3,4]], columns=["A", "B"])

In [3]: ps.DataFrame(data=psdf[["A"]] * 2, index=psdf.index)
   A
0  2
1  6
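
To illustrate what "same anchor" appears to mean here: the data and the index are both derived from the same underlying DataFrame, so they can be combined without a join across frames. The sketch below is a toy model in plain Python, not the pyspark.pandas implementation; ToyFrame, ToyIndex, and build_frame are hypothetical names introduced only for illustration.

```python
# Toy model only: these classes are hypothetical and do NOT mirror
# pyspark.pandas internals. They just illustrate an anchor check.

class ToyIndex:
    def __init__(self, labels, anchor):
        self.labels = list(labels)
        self.anchor = anchor  # the frame this index was derived from


class ToyFrame:
    def __init__(self, values, anchor=None):
        self.values = list(values)
        # A freshly created frame is its own anchor; a derived frame
        # keeps the anchor of the frame it came from.
        self.anchor = anchor if anchor is not None else self

    def times(self, k):
        # Derived data stays attached to the original anchor.
        return ToyFrame([v * k for v in self.values], anchor=self.anchor)

    @property
    def index(self):
        return ToyIndex(range(len(self.values)), anchor=self.anchor)


def build_frame(data, index):
    # Combining is only allowed when both pieces share one anchor.
    if data.anchor is not index.anchor:
        raise ValueError("data and index come from different frames")
    return dict(zip(index.labels, data.values))


base = ToyFrame([1, 3])                            # plays the role of column "A"
result = build_frame(base.times(2), base.index)    # same anchor: allowed

other = ToyFrame([1, 3])                           # equal values, different anchor
try:
    build_frame(other.times(2), base.index)
    rejected = False
except ValueError:
    rejected = True
```

In this toy model, equality of values is not enough: two frames built separately have different anchors, which corresponds to the case where the real library asks for the 'compute.ops_on_diff_frames' option.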

Does this PR introduce any user-facing change?

Yes, DataFrame creation now supports data and index that share the same anchor.

How was this patch tested?

Added unit tests.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks fine. Cc @ueshin

Contributor

@itholic itholic left a comment

Looks good otherwise!

Contributor

Can we test it with MultiIndex and other index types such as CategoricalIndex and TimedeltaIndex?

Contributor Author

Let me add them.

Contributor Author

Done

Contributor

Cool!

@zhengruifeng zhengruifeng force-pushed the ps_creation_same_anchor branch from 2a19763 to 38044d7 Compare September 5, 2022 08:09
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the ps_creation_same_anchor branch September 6, 2022 01:49
@zhengruifeng
Contributor Author

Thank you @itholic @HyukjinKwon for the reviews.

3 participants