Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inclusion of tests in Python package installations is causing collisions #81482

Closed
Shados opened this issue Mar 2, 2020 · 13 comments
Closed

Inclusion of tests in Python package installations is causing collisions #81482

Shados opened this issue Mar 2, 2020 · 13 comments

Comments

@Shados
Copy link
Contributor

@Shados Shados commented Mar 2, 2020

Describe the bug
Many Python packages store their tests in a top-level tests package, and include this in their sdist tarballs, which is typically what nixpkgs installs from. Combining said packages in a Python environment thus results in multiple tests packages being installed, with a decent chance of collision.

To Reproduce
Example collision:

shados@forcedperspective[~] λ nix-build -E 'with import <nixpkgs> { }; python38.withPackages(ps: with ps; [poetry])'
these derivations will be built:
  /nix/store/bfkjlxi7v6amg0kf2jyx69j3nkxnwkh4-python3-3.8.1-env.drv
building '/nix/store/bfkjlxi7v6amg0kf2jyx69j3nkxnwkh4-python3-3.8.1-env.drv'...
collision between `/nix/store/7fmcrha8ars9zq5lpakad3cf6yc2nmlv-python3.8-tomlkit-0.5.8/lib/python3.8/site-packages/tests/__pycache__/__init__.cpython-38.pyc' and `/nix/store/wmzbyb7r9qf4g3wqxamv1b228zdk603v-python3.8-cachy-0.3.0/lib/python3.8/site-packages/tests/__pycache__/__init__.cpython-38.pyc'
builder for '/nix/store/bfkjlxi7v6amg0kf2jyx69j3nkxnwkh4-python3-3.8.1-env.drv' failed with exit code 255
error: build of '/nix/store/bfkjlxi7v6amg0kf2jyx69j3nkxnwkh4-python3-3.8.1-env.drv' failed

Expected behavior
I would expect the above command to succeed, and produce a working Python environment.

Additional context
Arguably, Python packages that wish to include their tests in their sdist tarball should not be marking them as an installable package. However, this practice doesn't really cause any issues outside of Nix-land; tests are never run from the installed tests package, and we're unfortunately the only ones who care about file collisions.

As a result, I'm not sure we'd have much luck convincing package maintainers to alter their behaviour, so we might be better off working around the issue in nixpkgs. Possibly we could either prevent installation of tests somehow, or remove the tests package in post-install.

Metadata

  • system: "x86_64-linux"
  • host os: Linux 5.5.0, NixOS, 20.09.git.353f2215dc6 (Nightingale)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.3.3
  • channels(shados): "home-manager"
  • channels(root): "nixos-20.09pre215333.42f0be81ae0"
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

@FRidh

@Shados Shados added the 0.kind: bug label Mar 2, 2020
@FRidh

This comment has been minimized.

Copy link
Member

@FRidh FRidh commented Mar 2, 2020

They're included in a tarball through MANIFEST file. They are installed depending on the contents of setup.py. Maybe if if find_packages is used, and a tests folder is present, it is considered a package. I have not checked this, but if it is the case, then this would typically be the case when fetching from say GitHub. cc @jonringer

@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 2, 2020

this is likely an upstream issue, the test directories should be getting filtered out with

setup(
...
find_packages(...,exclude=["tests"])`
@FRidh

This comment has been minimized.

Copy link
Member

@FRidh FRidh commented Mar 2, 2020

You are right there, though most won't bother with that because they don't add tests to MANIFEST to begin with.

@Shados

This comment has been minimized.

Copy link
Contributor Author

@Shados Shados commented Mar 3, 2020

In the case of tomlkit, specifically, it's using poetry, with the following configuration:

packages = [
    {include = "tomlkit"},
    {include = "tests", format = "sdist"}
]

This appears to be poetry's officially documented approach for including tests in the sdist tarball, and the resultant generated setup.py in the sdist tarball includes:

packages = \
['tests', 'tomlkit']

So I suspect we'd see this with a lot of poetry packages.

@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 3, 2020

.............. If this installs "tests" into site-packages, then this is just wrong

@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 3, 2020

what's the benefit to using poetry? Seems to be switching one packaging paradigm with a fair amount of gotchas with another. Syntax?

@jtrakk

This comment has been minimized.

Copy link

@jtrakk jtrakk commented Mar 3, 2020

Poetry provides a static declarative syntax, virtualenv management, a dependency-version resolver, lockfiles, and publication to PyPI, none of which are offered by setuptools. poetry2nix is a new tool for helping Nix use Poetry-based projects by pulling the hashes from the lockfiles.

@jtrakk

This comment has been minimized.

Copy link

@jtrakk jtrakk commented Mar 3, 2020

That include = "tests" documentation does seem likely to lead to conflicts.

jtrakk added a commit to jtrakk/poetry that referenced this issue Mar 3, 2020
If multiple distributions have `tests` as a top-level package, they'll conflict whenever both are installed. (Examples [here]((https://github.com/python-poetry/poetry/issues/1905) and [here](NixOS/nixpkgs#81482). Two common alternative strategies are:

1. not distributing tests (as [here](https://github.com/pypa/sampleproject)), or 
2. placing tests in a subdirectory of the main package, rather than adjacent (as [here](http://blog.habnab.it/blog/2013/07/21/python-packages-and-you/) and [here](http://as.ynchrono.us/2007/12/filesystem-structure-of-python-project_21.html)). Each of these strategies will avoid this issue. 

Users may fall into this trap because of the [package documentation](https://python-poetry.org/docs/pyproject/#packages) page, which gives an example:

```
packages = [
    { include = "my_package" },
    { include = "tests", format = "sdist" },
]
```

Having two top-level packages in a distribution is relatively unusual, but does have some use cases.  Using `"tests"` as a top-level package name in the example is likely to lead to conflicts, however. The alternate package in the documentation example could have a unique name like `"my_other_package"`, which would reduce the likelihood of this kind of overlap.
@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 3, 2020

I won't call the python tooling "cohesive", but I think all of these have analogous tools/solutions:

  • static declarative syntax -> setup.cfg (although user experience..... is not great, and toml isn't significantly better)
  • virtualenv managment -> venv, pipenv, conda
  • dependency-version resolver -> both setuptools, conda and pip are able to resolve ranges, (although pip is more "aware" of transitive dependencies)
  • lockfiles -> pip freeze > requirements-frozen.txt, i think conda has something
  • publication to PyPI -> twine

So yes, none of which are offered by setuptools is generally correct. Maybe poetry is the future of python tooling. But, to the defense of setuptools a lot of the changes coincided with PEP's which have organically evolved over time. Individually, these PEPs make a lot of sense and solve the issues they are tying to address with the constraints given. Holistically, this has made somewhat of an ergonomic mess, for example: requirements can be in setup.py, requirements.txt, setup.cfg, flit.ini(deprecated) or a pyproject.toml.

This is in contrast to rust with cargo. Where not only does it address all the points you raised, but is also the canonical tool for all rust development and packaging. Maybe poetry will be python's cargo. I hope so.

@jtrakk

This comment has been minimized.

Copy link

@jtrakk jtrakk commented Mar 3, 2020

Yep, I agree on all of those :-)

@jtrakk

This comment has been minimized.

Copy link

@jtrakk jtrakk commented Mar 3, 2020

I've submitted a PR to Poetry's docs to avoid user confusion. The particular case of tomlkit has already been reported upstream.

@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 14, 2020

#81538 has made it's way to master, this now works on master

$ nix-build -E 'with import <nixpkgs> { }; python38.withPackages(ps: with ps; [poetry])'
/nix/store/qgn56a6gc5swg8h766vwf0mf0wp9prv3-python3-3.8.2-env
@jonringer jonringer closed this Mar 14, 2020
@jonringer

This comment has been minimized.

Copy link
Contributor

@jonringer jonringer commented Mar 14, 2020

thank you @Shados for opening the issue

stephsamson pushed a commit to python-poetry/poetry that referenced this issue Mar 29, 2020
If multiple distributions have `tests` as a top-level package, they'll conflict whenever both are installed. (Examples [here]((https://github.com/python-poetry/poetry/issues/1905) and [here](NixOS/nixpkgs#81482). Two common alternative strategies are:

1. not distributing tests (as [here](https://github.com/pypa/sampleproject)), or 
2. placing tests in a subdirectory of the main package, rather than adjacent (as [here](http://blog.habnab.it/blog/2013/07/21/python-packages-and-you/) and [here](http://as.ynchrono.us/2007/12/filesystem-structure-of-python-project_21.html)). Each of these strategies will avoid this issue. 

Users may fall into this trap because of the [package documentation](https://python-poetry.org/docs/pyproject/#packages) page, which gives an example:

```
packages = [
    { include = "my_package" },
    { include = "tests", format = "sdist" },
]
```

Having two top-level packages in a distribution is relatively unusual, but does have some use cases.  Using `"tests"` as a top-level package name in the example is likely to lead to conflicts, however. The alternate package in the documentation example could have a unique name like `"my_other_package"`, which would reduce the likelihood of this kind of overlap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.