
[SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py #34671

Conversation

nchammas (Contributor)

What changes were proposed in this pull request?

This PR inlines the type annotations for {ml, mllib}/common.py.
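
For illustration, a minimal sketch of the pattern, assuming the signature discussed later in this review (the real file differs in detail). The hints previously lived in a separate common.pyi stub; after this change they sit directly on the implementations:

# common.pyi (stub, removed by this PR):
#     def _java2py(sc: SparkContext, r: JavaObjectOrPickleDump, encoding: str = "bytes") -> Any: ...

# common.py, with the annotation now written inline on the real implementation:
from typing import TYPE_CHECKING, Any

from pyspark import SparkContext

if TYPE_CHECKING:
    from pyspark.ml._typing import JavaObjectOrPickleDump


def _java2py(sc: SparkContext, r: "JavaObjectOrPickleDump", encoding: str = "bytes") -> Any:
    ...  # runtime body unchanged; only the signature gains annotations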

Why are the changes needed?

This allows us to run type checks against the code within both versions of common.py.

This would help contributors catch some issues more easily, like this one: #34606 (comment)

Does this PR introduce any user-facing change?

Potentially. The C TypeVar is now public.

How was this patch tested?

Existing tests.

python/pyspark/ml/common.py (resolved review thread, outdated)
@zero323 (Member) left a comment


We should probably separate the ml and mllib parts and add umbrella tickets for both, for bookkeeping.

Additionally, we'll need return types for all functions.
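
For instance (a hypothetical before/after, not taken from the actual diff), a helper whose parameters are annotated would also gain an explicit return annotation:

from py4j.java_gateway import JavaObject
from pyspark import RDD

# Before: parameter annotated, return type left implicit (inferred as Any)
def _to_java_object_rdd(rdd: RDD):
    ...

# After: explicit return type, so callers of the helper are checked too
def _to_java_object_rdd(rdd: RDD) -> JavaObject:
    ...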

@zero323 (Member) commented Nov 19, 2021

cc @HyukjinKwon @ueshin @xinrong-databricks FYI

python/mypy.ini (two resolved review threads, outdated)
@nchammas (Contributor, Author)

> We should probably separate the ml and mllib parts and add umbrella tickets for both, for bookkeeping.

Just to be clear, are you saying I should split this PR into ml/common.py vs. mllib/common.py?

And then have an umbrella ticket for adding type annotations to all of ml/, and another one for mllib/?

@SparkQA commented Nov 19, 2021

Test build #145468 has finished for PR 34671 at commit 8bb6757.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49940/

@zero323 (Member) commented Nov 19, 2021

> Just to be clear, are you saying I should split this PR into ml/common.py vs. mllib/common.py?
>
> And then have an umbrella ticket for adding type annotations to all of ml/, and another one for mllib/?

For context ‒ we're in the middle of migrating type hints from separate stubs to inline annotations. At the moment we have two umbrella tickets ‒ SPARK-36845 and SPARK-37094, for SQL and core respectively. We should follow this convention for ml and mllib as well.

It should be OK to have two tickets (ml.common and mllib.common) and resolve both in this PR, since you've already started working on that.

@SparkQA commented Nov 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49940/

@zero323 changed the title from [SPARK-37393][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py to [SPARK-37399][SPARK-37403][PySpark][ML] Merge {ml, mllib}/common.pyi into common.py on Nov 20, 2021
@nchammas force-pushed the SPARK-37393-inline-ml-common-type-annotations branch from 8bb6757 to db6a5b9 on November 22, 2021 15:47
@SparkQA commented Nov 22, 2021

Test build #145514 has finished for PR 34671 at commit db6a5b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49985/

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49985/

python/pyspark/ml/_typing.pyi (resolved review thread)
python/pyspark/ml/common.py (resolved review thread)
@nchammas (Contributor, Author)

The number of ignore[attr-defined] hints required seems a little wrong. But I suppose addressing that would require changes to SparkContext, which is out of scope for this PR.

@SparkQA commented Nov 22, 2021

Test build #145516 has finished for PR 34671 at commit ed56686.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 (Member) commented Nov 22, 2021

> The number of ignore[attr-defined] hints required seems a little wrong. But I suppose addressing that would require changes to SparkContext, which is out of scope for this PR.

That's not optimal, but expected (see #34680 (comment)). If you encounter a case where there is no ongoing migration work and you can avoid the ignores, it should be OK to extend the stub. Otherwise, we'll do another pass (ignores on _jvm are likely to stay, even if with a different error code, because we cannot look into the JVM at this stage).
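
As a rough, hypothetical illustration of the kind of ignore being discussed (the function name here is made up; ml/common.py reaches the JVM through sc._jvm in a similar way):

from typing import Any

from pyspark import SparkContext


def _example_call_into_jvm(sc: SparkContext, data: bytes) -> Any:
    # _jvm is populated dynamically through Py4J, so mypy cannot see its
    # attributes; the access is silenced with a targeted ignore rather than
    # by loosening the SparkContext stubs.
    return sc._jvm.org.apache.spark.ml.python.MLSerDe.loads(data)  # type: ignore[attr-defined]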

@zero323 (Member) left a comment


Ignoring minor issues, LGTM.

@SparkQA commented Nov 22, 2021

Test build #145518 has finished for PR 34671 at commit 9b08ad0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49988/

@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49990/

@SparkQA commented Nov 22, 2021

Test build #145519 has finished for PR 34671 at commit 15b5fc4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49988/

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49990/

@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49991/

@@ -15,11 +15,15 @@
# limitations under the License.
#

from typing import Any, Callable
from pyspark.ml._typing import C, JavaObjectOrPickleDump
@zero323 (Member) commented Nov 22, 2021


This import should happen in a TYPE_CHECKING block:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.ml._typing import C, JavaObjectOrPickleDump

Consequently, JavaObjectOrPickleDump and C have to be quoted when used ("JavaObjectOrPickleDump"), i.e.

def _java2py(sc: SparkContext, r: "JavaObjectOrPickleDump", encoding: str = "bytes") -> Any: ...

That's because objects in stubs have no runtime equivalents.

@nchammas (Contributor, Author)


Ah OK. The TYPE_CHECKING block addresses the namespace pollution issue, I suppose.

But can you elaborate on why the type needs to be quoted? I understand that's for when the type is not known at that point in time (e.g. a self-referential type), but that isn't the case here.

@zero323 (Member)


In general, whatever goes into a TYPE_CHECKING block is not imported during normal execution. So, as is, all the names we use would be undefined when these modules are imported.

PEP 563 introduces a concept of postponed evaluation, but we're not ready to go there yet (for starters, we still haven't formally dropped 3.6 support ‒ I am working on cleaning up the code, and then we have some components that might require code changes).
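
For context, a sketch of what postponed evaluation would allow once it can be relied on (not something this PR uses): with PEP 563 every annotation is stored as a string and never evaluated at import time, so TYPE_CHECKING-only names need no manual quoting.

from __future__ import annotations  # PEP 563: postponed evaluation of annotations

from typing import TYPE_CHECKING, Any

from pyspark import SparkContext

if TYPE_CHECKING:
    from pyspark.ml._typing import JavaObjectOrPickleDump


# No quotes needed: the annotation stays a string that only the type checker resolves.
def _java2py(sc: SparkContext, r: JavaObjectOrPickleDump, encoding: str = "bytes") -> Any:
    ...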

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49991/

@SparkQA commented Nov 23, 2021

Test build #145550 has finished for PR 34671 at commit 61fe9fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50022/

@SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50022/

@zero323 (Member) left a comment


I'll merge it tomorrow, unless there are any further comments.

@zero323 closed this in 3565c3a on Nov 24, 2021
@zero323 (Member) commented Nov 24, 2021

Merged to master.

Thanks all!
