Skip to content

[SPARK-56607][PYTHON] Remove unnecessary __new__ and __init__#55526

Closed
gaogaotiantian wants to merge 5 commits intoapache:masterfrom
gaogaotiantian:remove-unnecessary-new
Closed

[SPARK-56607][PYTHON] Remove unnecessary __new__ and __init__#55526
gaogaotiantian wants to merge 5 commits intoapache:masterfrom
gaogaotiantian:remove-unnecessary-new

Conversation

@gaogaotiantian
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

  • Removed __init__ call in __new__ because it's unnecessary and could be unsound.
  • Removed unnecessary __new__ method when it just does the default thing.
  • Kept the weird behavior of DataFrame where the base class calls the derived class __new__ method, but made it cleaner.

Why are the changes needed?

Remove some unnecessary and potentially buggy code. Also less mypy ignores.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -99,11 +100,6 @@ def tearDownClass(cls):


class MockDataset(DataFrame):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, actually why should we change this parent dataframe to classic dataframe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is supposed to use the classic dataframe. The "parent" dataframe is not supposed to be used directly.

    # HACK ALERT!! this is to reduce the backward compatibility concern, and returns
    # Spark Classic DataFrame by default. This is NOT an API, and NOT supposed to
    # be directly invoked. DO NOT use this constructor.

We have a really weird way to keep backward compatibility by calling child __new__ in parent class. If we do not inherit the classic dataframe, we need to write a __new__ method for MockDataset because the parent dataframe's __new__ expects certain arguments that MockDataset does not provide. Both classic and connect DataFrame is fine because they have __init__ to indicate type annotations.

@zhengruifeng
Copy link
Copy Markdown
Contributor

merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants