Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48107][PYTHON] Exclude tests from Python distribution #46354

Closed
wants to merge 1 commit into from

Conversation

nchammas
Copy link
Contributor

@nchammas nchammas commented May 3, 2024

What changes were proposed in this pull request?

Change the Python manifest so that tests are excluded from the packages that are built for distribution.

Why are the changes needed?

Tests were unintentionally included in the distributions as part of #44920. See this comment.

Does this PR introduce any user-facing change?

No, since #44920 hasn't been released to any users yet.

How was this patch tested?

I built Python packages and inspected SOURCES.txt to confirm that tests were excluded:

cd python
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
python3 packaging/classic/setup.py sdist
python3 packaging/connect/setup.py sdist
find dist -name '*.tar.gz' | xargs -I _ tar xf _ --directory=dist
cd ..
open python/dist
find python/dist -name SOURCES.txt | xargs code

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the PYTHON label May 3, 2024
graft pyspark
recursive-include pyspark *.pyi py.typed *.json
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon - Instead of this, I prefer to exclude tests as follows:

$ diff python/MANIFEST-json.in python/MANIFEST-notests.in 
19c19
< recursive-include pyspark *.pyi py.typed *.json
---
> graft pyspark
31a32
> global-exclude **/tests/**

That way, if we add new files in the future that don't happen to match the glob expressions on this line they will still be picked up.

But I believe you prefer the approach taken here, so I've started with that.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@HyukjinKwon
Copy link
Member

Merged to master.

@HyukjinKwon
Copy link
Member

:-)

@dongjoon-hyun
Copy link
Member

:)

@nchammas nchammas deleted the SPARK-48107-package-json branch May 3, 2024 16:35
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
### What changes were proposed in this pull request?

Change the Python manifest so that tests are excluded from the packages that are built for distribution.

### Why are the changes needed?

Tests were unintentionally included in the distributions as part of apache#44920. See [this comment](https://github.com/apache/spark/pull/44920/files#r1586979834).

### Does this PR introduce _any_ user-facing change?

No, since apache#44920 hasn't been released to any users yet.

### How was this patch tested?

I built Python packages and inspected `SOURCES.txt` to confirm that tests were excluded:

```sh
cd python
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
python3 packaging/classic/setup.py sdist
python3 packaging/connect/setup.py sdist
find dist -name '*.tar.gz' | xargs -I _ tar xf _ --directory=dist
cd ..
open python/dist
find python/dist -name SOURCES.txt | xargs code
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46354 from nchammas/SPARK-48107-package-json.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants