Don't fetch trove-classifiers from the web #31

Closed
FRidh opened this issue Mar 30, 2022 · 8 comments · Fixed by pypa/setuptools#3229

Comments

@FRidh

FRidh commented Mar 30, 2022

For builds to be reproducible, everything that is checked and built must be checked and built consistently when repeated. When downloading the trove-classifiers from the web, it is (theoretically) possible that a validation passes one time and fails another. This should be avoided.

Furthermore, downstreams such as distributors put a lot of effort into avoiding unwanted network lookups. We should not be adding more.

Note that if there is a setuptools option to disable this, this could make distributors happy!

By the way, I put it here instead of setuptools as the code path is in here if I am correct.

@abravalheri
Owner

Hi @FRidh, thank you very much for reporting this.

Adding an option seems very reasonable, but I have to engineer a way of passing it through the chain.

Currently there is a way of achieving that, but it involves setting an environment variable: NO_NETWORK. Does that work for you?
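
For readers who want to try this from a packaging script, here is a minimal sketch of setting the variable for a single build subprocess. The helper names `offline_env` and `run_offline` are hypothetical, and the value `"1"` is an assumption: the thread only names the variable, not which values it accepts.

```python
import os
import subprocess
import sys


def offline_env(base=None):
    """Return a copy of the environment with NO_NETWORK set."""
    env = dict(os.environ if base is None else base)
    # Assumption: any non-empty value is enough to disable the download.
    env["NO_NETWORK"] = "1"
    return env


def run_offline(cmd):
    """Run `cmd` with NO_NETWORK set, leaving this process's environment untouched."""
    return subprocess.run(
        cmd, env=offline_env(), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Stand-in for a real build command; it only echoes the variable back.
    probe = [sys.executable, "-c", "import os; print(os.environ['NO_NETWORK'])"]
    print(run_offline(probe).stdout.strip())
```

Passing a copied environment to the subprocess keeps the setting scoped to that one build instead of changing it for the whole calling process.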

@FRidh
Author

FRidh commented Mar 30, 2022

Best to ask other redistributors what their point of view is on this matter. Let's start with Arch, cc @FFY00
For me that works, but as you said, in the end this needs to be in setuptools.

@FFY00

FFY00 commented Mar 30, 2022

Yup, the NO_NETWORK environment variable as it is currently implemented should be perfectly fine for us. Note that I haven't really tried it, but looking at the code I don't see any architectural reason why it wouldn't.

@abravalheri
Owner

abravalheri commented Mar 30, 2022

> For me that works, but as you said, in the end this needs to be in setuptools.

Hi @FRidh, I am not sure if I understood this correctly.

I am planning to disable the trove-classifiers when running setuptools tests.
However, disabling trove-classifiers by default every time the validations run via setuptools seems like a drop in functionality to me.

Isn't it a good thing that packages are having their classifiers validated by default during the build? I would say that is a feature...

validate-pyproject will not fail if it does not manage to download files, so it does play nice even if the end user is offline.


When running the tests people can opt out of this particular behaviour by setting the environment variable, or have a more consistent one by installing a pinned version of the trove-classifiers package in the build environment.
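
The behaviour described above (prefer an installed package, honour the environment variable, and never fail just because a download did not work) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual validate-pyproject code: `resolve_classifiers`, `installed_classifiers`, `classifier_ok`, and the `download` callback are hypothetical names. The only real API used is `trove_classifiers.classifiers`.

```python
import os
from typing import Callable, Mapping, Optional, Set


def installed_classifiers() -> Optional[Set[str]]:
    """Return the classifiers from an installed trove-classifiers, if any."""
    try:
        from trove_classifiers import classifiers
    except ImportError:
        return None
    return set(classifiers)


def resolve_classifiers(
    installed: Optional[Set[str]],
    download: Callable[[], Set[str]],
    env: Mapping[str, str] = os.environ,
) -> Optional[Set[str]]:
    """Pick the list to validate against; None means "skip the check"."""
    if installed is not None:
        return installed  # pinned package: same answer on every run
    if env.get("NO_NETWORK"):
        return None  # the user opted out of network access
    try:
        return download()
    except OSError:
        return None  # offline or download failed: do not fail the build


def classifier_ok(value: str, known: Optional[Set[str]]) -> bool:
    """Accept everything when no list is available, else check membership."""
    return known is None or value in known
```

A caller would pass `installed_classifiers()` as the first argument. Only that first branch gives the same result on every run, which is the reproducibility concern raised in this issue.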

@FRidh
Author

FRidh commented Mar 30, 2022

> Isn't it a good thing that packages are having their classifiers validated by default during the build? I would say that is a feature...

Absolutely! However, fetching something from the web is not the way, since, as I mentioned, it can affect reproducibility. When a classifier is deprecated in the future, it will fail the validation and thereby the build, right?

I'm going to drop this here. https://reproducible-builds.org/.
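
To make the concern concrete, here is a small sketch of the same metadata being validated against two snapshots of a classifier list. The classifier strings and both snapshots are made up for illustration, and `invalid_classifiers` is a hypothetical helper, not part of any library.

```python
def invalid_classifiers(declared, known):
    """Return the declared classifiers that are missing from the known list."""
    return sorted(set(declared) - set(known))


# Hypothetical snapshots of a downloaded list at two points in time.
snapshot_today = {"Example :: Alpha", "Example :: Beta"}
snapshot_later = {"Example :: Alpha"}  # "Example :: Beta" no longer accepted

# The package metadata itself never changes between the two builds.
declared = ["Example :: Alpha", "Example :: Beta"]

print(invalid_classifiers(declared, snapshot_today))  # []
print(invalid_classifiers(declared, snapshot_later))  # ['Example :: Beta']
```

The sources are identical in both runs, yet only the first validation passes, because the outcome depends on what the network returned at build time.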

@abravalheri
Owner

abravalheri commented Mar 30, 2022

Thank you very much @FRidh.

I feel like if the users are interested in reproducibility, it would be fair to expect them to pin trove-classifiers in the build environment.

However, I understand that this is not something you can teach easily and it is very easy to get wrong, so it is just easier to sacrifice the classifier validation for the sake of pragmatism.

Unfortunately 😢

@FRidh
Author

FRidh commented Mar 30, 2022

Thanks for your understanding!

> I feel like if the users are interested in reproducibility, it would be fair to expect them to pin trove-classifiers in the build environment.

They would have to know about that then. You could ask this for this package, but what about every other package out there?

The issue really is that there can be a near infinite number of ways impurities are introduced. So in this case, while providing an environment variable is a good idea, it becomes something that those who bother with reproducibility will need to be aware of. Knowing these aspects of every package (and its vendored dependencies!) is not doable. So many fixes have been made over the past years by a lot of people to achieve reproducible builds that it would be a pity to take a step back, even for something as relatively small as this.

@abravalheri
Owner

It is a pity to lose the functionality, but I understand. I will work on that.
