Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48068][PYTHON] mypy should have --python-executable parameter #46314

Closed
wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented May 1, 2024

What changes were proposed in this pull request?

This PR aims to fix mypy failure by propagating lint-python's PYTHON_EXECUTABLE to mypy's parameter correctly.

Why are the changes needed?

We assumed that PYTHON_EXECUTABLE is used for dev/lint-python like the following. That's not always guaranteed. We need to use mypy's parameter to make it sure.

run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python

This patch is useful whose python3 chooses one of multiple Python installation like our CI environment.

$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@2ef6ce08d2c4:/# python3 --version
Python 3.10.12
root@2ef6ce08d2c4:/# python3.9 --version
Python 3.9.19

For example, the following shows that PYTHON_EXECUTABLE is not considered by mypy.

root@18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
3428
root@18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
root@18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Copy link
Member Author

Could you review this, @HyukjinKwon ?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48068][PYTHON] Fix mypy failure in Python 3.10 [SPARK-48068][PYTHON] Fix mypy failure in Python 3.10+ May 1, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48068][PYTHON] Fix mypy failure in Python 3.10+ [SPARK-48068][PYTHON] Fix mypy failure in Python 3.10 and 3.11 May 1, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as draft May 1, 2024 06:34
@dongjoon-hyun
Copy link
Member Author

Although I verified it locally with Python 3.9, CI seems to behave differently. Let me check more.

$ python3 --version
Python 3.9.18

$ dev/lint-python
starting python compilation test...
python compilation succeeded.

starting black test...
black checks passed.

starting PySpark custom errors check...
PySpark custom errors check passed.

starting flake8 test...
flake8 checks passed.

starting mypy annotations test...
annotations passed mypy checks.

starting mypy examples test...
examples passed mypy checks.

starting mypy data test...
annotations passed data checks.


all lint-python tests passed!

@HyukjinKwon
Copy link
Member

I have the same problem in my local but I think it's because of the Python version we're using

@HyukjinKwon
Copy link
Member

Maybe we should just upgrade our Python version in CI.

@dongjoon-hyun
Copy link
Member Author

Thank you, @HyukjinKwon .

Ya, let me upgrade it, @HyukjinKwon .

@dongjoon-hyun
Copy link
Member Author

Since Infra Image is using 3.9.19 already, let me upgrade Python version to Python 3.10 at linter job first.

$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash
root@7983729755b6:/# python3.9 --version
Python 3.9.19

@github-actions github-actions bot added the BUILD label May 1, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48068][PYTHON] Fix mypy failure in Python 3.10 and 3.11 [SPARK-48068][PYTHON] mypy should have --python-executable parameter May 1, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review May 1, 2024 15:39
@dongjoon-hyun
Copy link
Member Author

For the record, I refocused this PR to --python-executable issue only in this PR. Python linter passed without any issues.

Screenshot 2024-05-01 at 10 11 51

@dongjoon-hyun
Copy link
Member Author

All linters passed.

Screenshot 2024-05-01 at 10 35 56

Could you review this PR, @viirya ?

@dongjoon-hyun
Copy link
Member Author

Could you review this PR, @huaxingao ?

Copy link
Contributor

@huaxingao huaxingao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @dongjoon-hyun

@dongjoon-hyun
Copy link
Member Author

Thank you, @huaxingao !
Merged to master/3.5/3.4.

dongjoon-hyun added a commit that referenced this pull request May 1, 2024
### What changes were proposed in this pull request?

This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly.

### Why are the changes needed?

We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure.
https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705

This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment.
```
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root2ef6ce08d2c4:/# python3 --version
Python 3.10.12
root2ef6ce08d2c4:/# python3.9 --version
Python 3.9.19
```

For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`.
```
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
3428
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46314 from dongjoon-hyun/SPARK-48068.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 26c871f)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request May 1, 2024
### What changes were proposed in this pull request?

This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly.

### Why are the changes needed?

We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure.
https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705

This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment.
```
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root2ef6ce08d2c4:/# python3 --version
Python 3.10.12
root2ef6ce08d2c4:/# python3.9 --version
Python 3.9.19
```

For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`.
```
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
3428
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46314 from dongjoon-hyun/SPARK-48068.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 26c871f)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-48068 branch May 1, 2024 17:51
@viirya
Copy link
Member

viirya commented May 1, 2024

Looks good to me. Thanks @dongjoon-hyun

@dongjoon-hyun
Copy link
Member Author

Thank you, @viirya !

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
### What changes were proposed in this pull request?

This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly.

### Why are the changes needed?

We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure.
https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705

This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment.
```
$ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root2ef6ce08d2c4:/# python3 --version
Python 3.10.12
root2ef6ce08d2c4:/# python3.9 --version
Python 3.9.19
```

For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`.
```
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
3428
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l
1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46314 from dongjoon-hyun/SPARK-48068.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants