Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37421][PYTHON] Inline type hints for python/pyspark/mllib/evaluation.py #34680

Closed
wants to merge 9 commits into from

Conversation

dchvn
Copy link
Contributor

@dchvn dchvn commented Nov 22, 2021

What changes were proposed in this pull request?

Inline type hints for evaluation.py in python/pyspark/mllib/

Why are the changes needed?

We can take advantage of static type checking within the functions by inlining the type hints.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@dchvn
Copy link
Contributor Author

dchvn commented Nov 22, 2021

cc @zero323, @HyukjinKwon, @ueshin FYI. Many thanks!

java_model = java_class(df._jdf)
super(BinaryClassificationMetrics, self).__init__(java_model)

@property
@property # type: ignore[misc]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I met error: Decorated property not supported [misc] when checking mypy for every @property. As mypy's issue #1362, I think we should ignore this.

@SparkQA
Copy link

SparkQA commented Nov 22, 2021

Test build #145503 has finished for PR 34680 at commit e686682.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • java_class = (
  • java_class = (
  • java_class = (
  • java_class = (

@SparkQA
Copy link

SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49975/

@SparkQA
Copy link

SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49975/

@zero323
Copy link
Member

zero323 commented Nov 22, 2021

Let's wait with this until we clear the backlog and all dependencies (to clearly handle this, and majority of modules from ml and mllib, we need SPARK-37094 to be completed) are annotated.

If there are any modules that have no direct dependencies on core, that's not an issue (though it seems like we already saturated our reviewing capabilities for this particular component), but I think we want to avoid having FOLLOW-UPs for each module, or worse, putting more strain on SPARK-37152

@zero323
Copy link
Member

zero323 commented Feb 21, 2022

Both RDD and core mllib modules should have stable annotations for the moment, so it might be a good idea to resync this and other PRs and check if everything still passes (and if any quotes, casts, and type ignores can be removed).

@dchvn
Copy link
Contributor Author

dchvn commented Feb 21, 2022

Can you have a look back in #37405 before moving to mllib? @zero323 , many thanks!

@dchvn
Copy link
Contributor Author

dchvn commented Feb 22, 2022

cc @zero323 , I have updated this PR, hope you review it again. Thanks

@dchvn
Copy link
Contributor Author

dchvn commented Feb 22, 2022

cc @HyukjinKwon , TIA

Copy link
Member

@zero323 zero323 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't check this thoroughly, but at first glance looks OK.

We might change

class RankingMetrics(JavaModelWrapper):

to

class RankingMetrics(JavaModelWrapper, Generic[T]): 

though it doesn't make much difference, as far as I am aware.

@dchvn dchvn requested a review from zero323 February 24, 2022 03:16
@dchvn
Copy link
Contributor Author

dchvn commented Feb 24, 2022

Didn't check this thoroughly, but at first glance looks OK.

We might change

class RankingMetrics(JavaModelWrapper):

to

class RankingMetrics(JavaModelWrapper, Generic[T]): 

though it doesn't make much difference, as far as I am aware.

updated

@dchvn
Copy link
Contributor Author

dchvn commented Feb 24, 2022

@zero323 Could you please review the latest update to me as soon as possible?
many thanks

@zero323
Copy link
Member

zero323 commented Feb 26, 2022

Sorry @dchvn, but I won't be available for reviews at the moment. Given the situation, my properties are somewhere else (blame Putin).

@itholic, @ueshin, @xinrong-databricks ‒ would be so kind and take a look at the and remaining ML / MLLib tasks? Important prerequisites are already done, so the remaining PRs should go smoothly. I'll owe you one!

@dchvn
Copy link
Contributor Author

dchvn commented Feb 26, 2022

Sorry @dchvn, but I won't be available for reviews at the moment. Given the situation, my properties are somewhere else (blame Putin).

@itholic, @ueshin, @xinrong-databricks ‒ would be so kind and take a look at the and remaining ML / MLLib tasks? Important prerequisites are already done, so the remaining PRs should go smoothly. I'll owe you one!

Hope you are not affected by the war, good luck!

@zero323
Copy link
Member

zero323 commented Feb 27, 2022

Hope you are not affected by the war, good luck!

Oh, I am on the safe side of the border, so I am OK. There is just tons of stuff to be done here, will all the people coming.

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

Btw, maybe we can create the ticket to track the https://github.com/apache/spark/pull/34680/files#r754003277 so that we can resolve the current ignores after mypy issue resolving ??

@zero323
Copy link
Member

zero323 commented Mar 6, 2022

Btw, maybe we can create the ticket to track the https://github.com/apache/spark/pull/34680/files#r754003277 so that we can resolve the current ignores after mypy issue resolving ??

@itholic I was actually thinking about disallowing dead casts and ignores completely #35740. This should handle properties as well, in case it is solved upstream.

@@ -61,7 +67,7 @@ class BinaryClassificationMetrics(JavaModelWrapper):
0.88...
"""

def __init__(self, scoreAndLabels):
def __init__(self, scoreAndLabels: RDD[Tuple[float, float]]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. No need for -> None here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated! Thanks

Copy link
Member

@zero323 zero323 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a quick pass and it looks good, as long as #34680 (review) is addressed.

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good except #34680 (comment)

@dchvn dchvn requested a review from zero323 March 8, 2022 06:40
@itholic
Copy link
Contributor

itholic commented Mar 9, 2022

LGTM. cc @zero323

@zero323 zero323 closed this in 62e4c29 Mar 9, 2022
@zero323
Copy link
Member

zero323 commented Mar 9, 2022

Merged into master.

@dchvn
Copy link
Contributor Author

dchvn commented Mar 9, 2022

Thanks all! Hope you review all my PR in SPARK-37396 and SPARK-37395

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Mar 10, 2022
…uation.py

### What changes were proposed in this pull request?
Inline type hints for evaluation.py in python/pyspark/mllib/
### Why are the changes needed?
We can take advantage of static type checking within the functions by inlining the type hints.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests

Closes apache#34680 from dchvn/SPARK-37421.

Lead-authored-by: dch nguyen <dchvn.dgd@gmail.com>
Co-authored-by: dch nguyen <dgd_contributor@viettel.com.vn>
Signed-off-by: zero323 <mszymkiewicz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants