[SPARK-34544][PYTHON] Convert PandasDataFrameLike and PandasSeriesLike to aliases of Pandas types #34927
zero323 wants to merge 46 commits into apache:master
FYI @HyukjinKwon There is still a lot of work to be done here (reduced typing errors from ~150 to <50 so far). Also, the commit messages explain why certain fixes are needed; I'll include these here once I am closer to completion.
I like this!!
I've briefly reviewed the work, and so far it looks pretty nice to me!! Let me revisit and take a closer look when it's ready for review.
I'll ping you once I am closer to completion. I resolved most of the minor problems, but hit a bigger issue with
We can remove the pandas entry from
Type cannot be more precise, because hole param is unbound
Issue with upstream annotations
Internal, not exposed in upstream hints
itholic left a comment:
Thanks for the bunch of cleanup!
Merged into master. Thanks all!
### What changes were proposed in this pull request?

This PR is a minor followup of #34927 that adds `pandas-stubs` dependency into `dev/requirements.txt`.

### Why are the changes needed?

For easier development setup.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested as below:

```bash
pip install -r dev/requirements.txt
```

Closes #35029 from HyukjinKwon/SPARK-34544.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?

This PR proposes replacing the currently used `Protocols`:

- `PandasDataFrameLike`
- `PandasSeriesLike`

with simple aliases of upstream types.

This exposed a number of typing issues, primarily around the `pyspark.pandas` API, which will be resolved in this PR.

Additionally, it adds VirtusLab/pandas-stubs to CI dependencies.
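The difference between a hand-written `Protocol` and a plain alias can be sketched as follows. This is an illustration only, not the code from this PR: `UpstreamDataFrame` is a hypothetical stand-in for `pandas.DataFrame` (so the snippet runs without pandas installed), and `DataFrameLikeProtocol`, `DataFrameLikeAlias`, and `Impostor` are made-up names.

```python
from typing import Any, Protocol, runtime_checkable


class UpstreamDataFrame:
    """Hypothetical stand-in for pandas.DataFrame (assumption, not the real class)."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def head(self, n: int = 5) -> "UpstreamDataFrame":
        # Keep only the first n values of every column.
        return UpstreamDataFrame({k: v[:n] for k, v in self.data.items()})

    def describe(self) -> str:
        return f"{len(self.data)} column(s)"


# Before: a hand-maintained Protocol. A type checker only knows the members
# declared here, so `describe` is invisible through this type.
@runtime_checkable
class DataFrameLikeProtocol(Protocol):
    def head(self, n: int = 5) -> Any: ...


# After: a plain alias. Every member of the upstream type is visible,
# and nothing has to be kept in sync by hand.
DataFrameLikeAlias = UpstreamDataFrame


class Impostor:
    """Unrelated class that satisfies the Protocol structurally."""

    def head(self, n: int = 5) -> Any:
        return None


df = UpstreamDataFrame({"a": [1, 2, 3]})

# The Protocol matches anything with a `head` attribute; the alias does not.
protocol_accepts_impostor = isinstance(Impostor(), DataFrameLikeProtocol)
alias_accepts_impostor = isinstance(Impostor(), DataFrameLikeAlias)
alias_is_upstream = DataFrameLikeAlias is UpstreamDataFrame
```

Because the alias is the upstream class itself, stricter checking follows automatically, which is consistent with the PR surfacing previously hidden typing issues.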
### Why are the changes needed?

The currently used `Protocols` were a workaround, included to improve typing coverage until Pandas exposes its type hints.

In the meantime, relatively stable stub sources emerged (with a lot of ongoing discussion around them), and the `pandas-stubs` package (available on PyPI and conda-forge) provides better coverage without adding maintenance overhead on our side.

### Does this PR introduce any user-facing change?

Better typing experience around Pandas UDFs.

### How was this patch tested?

Existing typecheck pipeline and unit tests.