Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

Discuss removing unnecessary files from collections #126

Closed
gotmax23 opened this issue Aug 10, 2022 · 15 comments
Closed

Discuss removing unnecessary files from collections #126

gotmax23 opened this issue Aug 10, 2022 · 15 comments

Comments

@gotmax23
Copy link
Contributor

Summary

Many collections install a bunch of unnecessary files. Just look at the number of files the Fedora ansible-collection-kubernetes-core has to remove:

# Exclude some files from being installed
# Last last yaml key is build_ignore: so this works, despite being a bit hacky.
cat << EOF >> galaxy.yml
  - .github
  - .gitignore
  - .yamllint
  - setup.cfg
  - codecov.yml
  - tox.ini
  - Makefile
  - tests
EOF

(The package also remove docs and licenses from the built collection, as we install them under /usr/share/doc and /usr/share/licenses, but that's not relevant here).

@dmsimard and I submitted PRs 1 2 to community.general to remove unnecessary files from that collection, but @felixfontein claimed there were some legal issues with doing so.

I really don't think there are legal issues here. The collection is a built artifact and doesn't (and already isn't given e.g. the missing galaxy.yml) need to be a full representation of the source code. As long as the collection includes a link to the source, it fulfills the GPL obligations.

I propose that the steering committee allow and encourage collections to drop unnecessary files from the built collection artifacts (i.e. include them in build_ignore).

@mariolenz
Copy link
Contributor

@felixfontein said:

The legal problem still is (if I remember correctly): the collection tarball acts both as the source and the installable artefact. Excluding files used for the development from it would violate the GPL.

I don't understand why the collection tarball should be both the source and the installable artifact. It's only the artifact, the source is somewhere on Github or Gitlab or (does this still exist?) Sourceforge... or wherever.

As far as I know, GPL isn't violated as long as you provide the source code somehow. You don't have to provide it in the same package. Otherwise, all Linux distributions would violate the GPL because they typically have the compiled kernel and C libraries under GPL in packages without including the source code in the same package.

As @bcoca said:

even including a URI to those files should be sufficient, the files and any changes must be available, but it does not mean every possible form of distribution of the product must have them ... otherwise distributing binary packages would be untenable.

Yep, you have to provide the sources. A URI to them is sufficient.

@gotmax23
Copy link
Contributor Author

gotmax23 commented Aug 10, 2022

Explanation for RH Legal

Ansible collections contain modules and other types of Ansible plugins. These plugins are usually written in either Python or Powershell. These collections are built using ansible-galaxy collection build. This command takes the configuration from a galaxy.yml file and creates a tarball containing all of the files (including tests and other development files) in the collection's repository besides galaxy.yml and any files that are manually excluded. The tarball also contains some metadata files that are generated as part of the build process. These collection artifacts are then published to Ansible Galaxy (https://galaxy.ansible.com).

The question we have is whether we can exclude the tests and other development files from the built collection artifact and fulfill our GPL obligations by linking to the actual source code on Github (or to whatever Git forge the project is hosted on). Note that collections already link to their respective source repository, and the community collection guidelines require that all collection releases are tagged in said repositories.

@felixfontein
Copy link
Contributor

This might be linked with legal questions regarding the ansible community package. Namely we need to provide the sources of that one as well, and I guess the question is what would count as sources. Right now, we assume that the Galaxy tarballs are the sources of the collections, and thus the ansible tarball is the source distribution of ansible since it contains the contents of the Galaxy tarballs for all collections included.

If the tarballs on Galaxy would not be the sources, we might have to actively engage in collecting the sources of the collections (next to the collections themselves) to be able to provide the sources of the ansible package.

(At least I'm not remembering that this was part of the discussion as well. There might have been more points which I do not remember right now.)

@bcoca
Copy link

bcoca commented Aug 11, 2022

Yet you are not providing the sources to build 'ansible' itself (aka anstibull). In any case, I think this is your choice as packager, other people are free to repackage in other ways (no docs, no tests).

It would not be the first time, ansible has had an 'ansible-doc' package (with the html docs from site) in many distros for a long time.

Another option is to have ansible-galaxy install w/o tests/docs (not extract them from package), but that would be a core feature instead.

@felixfontein
Copy link
Contributor

@bcoca the ansible tarball contains a script that will re-build the tarball using antsibull (templated from https://github.com/ansible-community/antsibull/blob/main/src/antsibull/data/build-ansible.sh.j2).

@bcoca
Copy link

bcoca commented Aug 11, 2022

yet you still don't distribute anstibull in it, my point was not to force you to do so, but to show you you don't need to distribute everything in it.

@felixfontein
Copy link
Contributor

@bcoca I would be really glad if RH legal would agree with that. That would make our life a lot easier. But for some reason they didn't wanted to do that so far (I don't even know what exactly they were asked, or if they actually read the question(s), so 🤷).

@bcoca
Copy link

bcoca commented Aug 11, 2022

wasn't that posted above? In any case I would keep the current distribution 'as is' and if someone wants an 'ansible-lean' that moves out docs/tests ... DIY?

@gotmax23
Copy link
Contributor Author

Thanks for bringing up the ansible community package, @felixfontein. I did not think about that when bringing this up, as I thought about it as a separate issue.

This might be linked with legal questions regarding the ansible community package. Namely we need to provide the sources of that one as well, and I guess the question is what would count as sources. Right now, we assume that the Galaxy tarballs are the sources of the collections, and thus the ansible tarball is the source distribution of ansible since it contains the contents of the Galaxy tarballs for all collections included.

Would it be practical to just rebuild the collections in the ansible community package from the source repositories to begin with? All collections are already supposed to tag releases in the source repository1. We could require that all tags exactly match (with an optional leading v) the version from the galaxy.yml.

Either way, it shouldn't be too hard to provide a separate source tarball that includes all of the tests and development files and then remove them from the sdist. So many upstreams do already do this that we have a specific guideline about handling it in the Fedora Python Packaging Guidelines. Currently, ansible uses some deprecated setuptools hackery (it's hackery in my opinion) to remove the unnecessary from the built wheel that is deprecated and still leaves a fair amount of cruft included2. It would be a lot easier if we didn't have to do this.

If the tarballs on Galaxy would not be the sources, we might have to actively engage in collecting the sources of the collections (next to the collections themselves) to be able to provide the sources of the ansible package.

This is basically what I said above. Even if RH Legal says we don't have to, I still think we should try to gather the full sources including the tests so upstream and downstreams can run the tests for the whole package. Currently, Fedora does not run the tests in the ansible package (we may in the future), but I know that the Debian package runs some of the unit tests. We run unit tests in the RPM build for our packaged standalone collections where we already get the source from Github and build the collections (i.e. ansible-galaxy collection build and ansible-galaxy collection install) from source.

Footnotes

  1. Notably, the fortinet collections don't do this and instead have an inane version per branch structure.

  2. I know, because @dmsimard and I have taken lengths to remove the remaining unnecessary files from the Fedora ansible packages.

@bcoca
Copy link

bcoca commented Aug 15, 2022

One thing about spiting the tests/docs/etc into their own package, this does not work well for collection nor pip formats as both tend to assume that each package owns it's directory and are not friendly to 'sharing' and those files are expected to be within the collection to be functional.

@felixfontein
Copy link
Contributor

Not directly related, by related and potentially of interest: #65 and ansible/ansible#78422

@sivel
Copy link

sivel commented Aug 18, 2022

If the plans are to extend the ignores in the collection_template, switching to using manifest over build_ignore would be an improvement, even without listing any explicit directives, as the defaults do a pretty good job on their own.

https://docs.ansible.com/ansible-core/devel/dev_guide/developing_collections_distributing.html#manifest-directives

The default manifest directives would already exclude what is in the build_ignore in collection_template. It would also already exclude all of the extra build_ignores referenced at the top of this issue other than tests are included by default.

See also ansible-collections/news-for-maintainers#22

@gotmax23
Copy link
Contributor Author

It would also already exclude all of the extra build_ignores referenced at the top of this issue other than tests are included by default.

Thanks for the update, @sivel. That's definitely an improvement! I do wish we could figure out the tests issue, but I'm happy to see progress.

@mariolenz
Copy link
Contributor

@gotmax23 Please close this issue if done, or open a new forum topic and then close the issue with a pointer to the new discussion: Community-topics: Archiving the repo

@Andersson007
Copy link
Contributor

as @gotmax23 is notified, closing, thanks everyone!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants