
Proposal: Pin only minimum versions of pip and setuptools #2

Open
blag opened this issue Apr 7, 2020 · 4 comments
blag commented Apr 7, 2020

TL;DR

If we use our tools properly, we can relax their version requirements, take advantage of their new features, and let our transitive dependencies take advantage of those features as well, all without constantly breaking the build.

Background

PR #4895

This came up in PR StackStorm/st2#4895.

In that PR, pip and setuptools were pinned to specific versions:

$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "pip==19.3.1"
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "setuptools==41.0.1"

Pinning setuptools that tightly caused a bug when a transitive dependency (read: a dependency of a dependency) was updated to support only Python 3. This has happened twice in recent history:

First case

  • python-mistralclient depended on
    • osc-lib >= 1.8.0 (resolved to 2.0.0) which depended on
      • openstacksdk >= 0.15.0 (resolved to 0.44.0) which depended on
        • futurist >= 2.1.0 which does not support Python 2

Second case

As noted in this PR to importlib-metadata, all of these dependencies properly declare the Requires-Python metadata field.

Example

It's easier to show an example than to describe it in prose. In a Python project's setup.py file, the project can specify the python_requires keyword argument to setup(), like so:

from setuptools import setup

setup(
    ...
    python_requires='>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*',
    ...
)

That means that pip version 9 or later, when installing packages in a Python 2.7 environment, will ignore any candidate package versions that do not declare that they can run on Python 2.7.
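To illustrate what pip is doing under the hood, here is a quick sketch (not project code) using the packaging library, which pip vendors internally, to evaluate the same specifier:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet('>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*')
print('2.7.18' in spec)  # True: a pip running under Python 2.7 would consider this package
print('3.4.10' in spec)  # False: excluded by !=3.4.*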

Building ST2 in Travis

However, some of our "build" (e.g. CI) commands involve installing packages but do not involve pip. We run python setup.py develop and python setup.py install in a few different parts of the Makefile:
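A hypothetical sketch of the pattern (the actual target and directory names in our Makefile differ):

# Hypothetical Makefile excerpt -- targets and paths are illustrative only
install-st2client:
	cd st2client && python setup.py install

develop-st2common:
	cd st2common && python setup.py develop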

None of these commands rely on pip to install packages; they rely on setuptools to do the installation itself. That is where the problem comes in...

The problem

setuptools < 42 doesn't respect the python_requires keyword argument

The setuptools project added support for python_requires in version 42.0.0.

This means that when we pin setuptools to a specific version, we guarantee that when transitive dependencies of ST2 start relying on features of more modern versions of our tools, we will miss out on those changes and our transitive dependencies will break our build.

Problems with other solutions

There are some potential workarounds.

Pin transitive dependency versions

We could pin futurist and configparser to versions that still support Python 2. This would ensure that even older versions of pip and setuptools do not break the Python 2.7 build.
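For illustration, such pins would look something like this in a requirements file (the version boundaries here are hypothetical):

futurist < 2.0.0      # hypothetical boundary: last series claiming Python 2 support
configparser < 5.0.0  # hypothetical boundary: 5.x dropped Python 2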

However, this approach does not scale. We would have to track layers upon layers of dependencies, even though we do not directly depend on or import those packages. Furthermore, this problem will only get worse as more and more projects drop support for Python 2.7. Each one will cause a build failure for us, and we will have to troubleshoot, fix, test, review, and merge a change every time that happens. This is not a good use of anybody's time.

Use pip-compile (from pip-tools) to catch dependency conflicts

As implemented in StackStorm/st2#4895, we now use pip-compile to check for dependency version conflicts across our entire installation. The output of pip-compile is much more complete and explicit than the previous output of pipconflictchecker.
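For reference, a typical pip-compile invocation looks something like this (a sketch; the actual input and output file names in our repo may differ):

pip-compile requirements.in -o requirements.txt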

But this solution only checks that our dependencies don't conflict. Our setuptools commands will continue to try to install dependencies that are Python 3-only, even in our Python 2.7 tests.

The proposal

The only workable solution was to update setuptools to version 42.0.0 (or later). With that change, both pip and setuptools respect a project's python_requires field and, when run in Python 2.7 environments, only try to install packages that declare compatibility with that version of Python.

However, pinning setuptools (or pip) to a specific version just recreates the same situation in the future. If some other feature is introduced to pip or setuptools, and any of our transitive dependencies starts using that feature before we update our build to rely on up-to-date versions of pip and setuptools, those changes will again cause our builds to fail.

Instead of pinning pip and setuptools to specific versions during the build, I propose that we only specify their known minimum versions. Specifying pip >= 9.0 and setuptools >= 42.0.0 would ensure that whatever version of pip is installed at least supports Requires-Python. As pip and setuptools release new versions that support new features, our builds would automatically incorporate those versions, and our transitive dependencies could use those features as soon as they were released without breaking our build.
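Concretely, the pinned Makefile lines quoted at the top of this issue would become something like:

$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "pip>=9.0"
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "setuptools>=42.0.0"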

Potential drawbacks

So what if pip or setuptools release versions that break things?

That's a valid concern, and it's something that has happened in the past, and it has also - wait for it - broken our build.

But pip and setuptools are used by nearly every single Python project, so I think they have a pretty good test infrastructure, and excellent canaries if they do inadvertently release anything that isn't backwards compatible.

This isn't the only way for updates to break our stuff. We used to rely on the internals of pip being available at build time to parse requirements.txt files:

import sys

from six import text_type  # six's unicode/str alias, used in the error messages below

GET_PIP = 'curl https://bootstrap.pypa.io/get-pip.py | python'

try:
    import pip
    from pip import __version__ as pip_version
except ImportError as e:
    print('Failed to import pip: %s' % (text_type(e)))
    print('')
    print('Download pip:\n%s' % (GET_PIP))
    sys.exit(1)

try:
    # pip < 10.0
    from pip.req import parse_requirements
except ImportError:
    # pip >= 10.0

    try:
        from pip._internal.req.req_file import parse_requirements
    except ImportError as e:
        print('Failed to import parse_requirements from pip: %s' % (text_type(e)))
        print('Using pip: %s' % (str(pip_version)))
        sys.exit(1)

Previous workaround

However, that must have broken when pip updated its internals, because (I assume) that's why we now parse requirements.txt files ourselves:

    with open(requirements_file_path, 'r') as fp:
        for line in fp.readlines():
            line = line.strip()

            if line.startswith('#') or not line:
                continue

            link, req_name = _get_link(line=line)

            if link:
                links.append(link)
            else:
                req_name = line

                if ';' in req_name:
                    req_name = req_name.split(';')[0].strip()  # <-- new code to support environment markers

            reqs.append(req_name)

Problems with the workaround

But note that parsing requirements.txt files ourselves also didn't work when I tried to use environment markers:

singledispatch ; python_version < "3.4"

I think this is one reason why we believe we have to pin pip to a specific version: because we import from the private module pip._internal.

A better way

But we can actually refactor fetch_requirements to use the packaging library to parse requirements.txt lines. Doing so would handle environment markers for us automatically, instead of our parsing them ourselves (which, even with my improvement, is still brittle code that probably has bugs).
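As a minimal sketch, assuming packaging is importable at build time and simplifying away the link-handling in the real fetch_requirements:

from packaging.requirements import InvalidRequirement, Requirement


def fetch_requirements(requirements_file_path):
    reqs = []
    with open(requirements_file_path, 'r') as fp:
        for line in fp:
            line = line.strip()
            # Skip blank lines and comments, as the hand-rolled parser does.
            if not line or line.startswith('#'):
                continue
            try:
                req = Requirement(line)
            except InvalidRequirement:
                # Lines like "-e git+..." are pip options, not PEP 508
                # requirements, and would still need special handling.
                continue
            # packaging evaluates environment markers against the running
            # interpreter for us, instead of naively splitting on ';'.
            if req.marker is not None and not req.marker.evaluate():
                continue
            reqs.append(req.name)
    return reqs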

Not so fast...

However, the dist_utils.py file also comes with an ominous note:

# NOTE: This script can't rely on any 3rd party dependency so we need to use this code here

There is no further definitive explanation for that note that I have been able to find. However, my current working theory is that when that code is being run, the virtualenv has been created and activated, but third party dependencies have not yet been installed via pip or setuptools, so all imports from third party packages will fail. We need a way to install dependencies during "build" time...

Using setuptools properly

After a little digging, I found that setuptools supports specifying build-time dependencies in the setup_requires keyword argument to setup() (copying and pasting here since there isn't an anchor to that section on the linked page):

setup_requires

A string or list of strings specifying what other distributions need to be present in order for the setup script to run. setuptools will attempt to obtain these (using pip if available) before processing the rest of the setup script or commands. This argument is needed if you are using distutils extensions as part of your build process; for example, extensions that process setup() arguments and turn them into EGG-INFO metadata files.

(Note: projects listed in setup_requires will NOT be automatically installed on the system where the setup script is being run. They are simply downloaded to the ./.eggs directory if they're not locally available already. If you want them to be installed, as well as being available when the setup script is run, you should add them to install_requires and setup_requires.)

So, instead of writing all of this brittle NIH code, and instead of pinning pip to a specific version so we can import its private internals (and if that sounds gross to you - good, that's very much the intent!), I believe all we need to do is specify the setup_requires argument to setup() in our subpackages:

from setuptools import setup

setup(
    ...
    setup_requires=['packaging'],
    ...
)

and then specify minimum version constraints for both pip and setuptools.

Other Reasons

There are probably other reasons why we pin to specific versions, and I'd like a chance to investigate them fully and discuss them here. As demonstrated, some of our code is working around problems that have been solved in better ways by more recent versions of our tools, and we should use our tools as best we can instead of trying (and failing) to duplicate them.

I've realized that some of our inline comments and developer documentation are not detailed enough to fully understand some of the constraints, so at the very least we need to get better at fully explaining ourselves in pull requests and developer documentation, and ideally in comments in the code. We cannot continue to carry this information around in our heads; we need to get better at communicating our rationale for major decisions in the code itself.

Possible compromise

Now, some of this might be a "bridge too far" for some people. Reproducible builds are becoming more and more important, and I think we should support them as well.

To that end, one possible compromise would be to pin specific versions of pip and setuptools only when we release a version of StackStorm. For example, ST2 v3.1 was built with pip == 19.3.1 and setuptools == 41.0.1. During development, we would only specify minimum versions, so that we continue to incorporate the most recent versions of our tools as they are released. So the git branch for ST2 v3.1.0 would pin specific versions of pip and setuptools in the Makefile:

git checkout v3.1.0

pip install --upgrade "pip == 19.3.1"
pip install --upgrade "setuptools == 41.0.1"

and the git master branch would only specify minimum versions for pip and setuptools in the Makefile:

git checkout master

pip install --upgrade "pip >= 20"
pip install --upgrade "setuptools >= 42.0.1"

Updating and pinning the versions could be incorporated into the release process.

This would still allow us to achieve completely reproducible builds while preventing the development branch from breaking all the time.

Further Reading

  • What the heck is pyproject.toml - I need to read up more on this, but the Python ecosystem seems to be moving to pyproject.toml instead of setup.py. This post goes through the problems with setuptools, the chicken-and-egg problem, and how pyproject.toml solves these problems. I'm not done reading it yet (tons of interrupts all day), but the first few paragraphs looked promising. If this is The Way™️ we should go, then we will need to update our dependency-update scripts and Makefile to work with pyproject.toml files.

blag commented Apr 27, 2020

Related: StackStorm/st2#4750


nmaludy commented Apr 29, 2020

Looks like master is failing: https://travis-ci.org/github/StackStorm/st2/jobs/680685353#L1059

We just noticed our internal CI started failing last night with the same error.

Looks like not pinning pip is going to bite us on this one.


nmaludy commented Apr 29, 2020

As the person who just had to fix this as a "hot fix", I've changed my mind and would prefer to keep this pinned in the future. Then we can upgrade and fix things "on our own time".


arm4b commented May 4, 2020

There are many problems with unpinning, though, related to build reproducibility and consistency. And it's not only about pip: virtualenv, setuptools, and wheel all fall into the same build-tools category and need to follow the same approach.

Even if we unpin, the https://github.com/stackstorm/st2packaging-dockerfiles are generated at build time. This will still pin a version during the Dockerfile build and make the pip version inconsistent across the repos.

We started pinning pip and other Python build tools based on operational knowledge and fires we fought before. It looks like @nmaludy just caught one of them after pip was unpinned in master.

What makes sense to me today, keeping in mind previous incidents:

  • Use a patch-version interval like pip>=19.1.0,<19.2.0 (following @nmaludy's operational experience, where even a pip minor version broke the build)
    • This would still allow us to benefit from potential security fixes
  • Try to rely less on the pip API to minimize possible impact (as @blag suggested, and it's also new operational experience now)
  • The pip version should be the same across all the repositories
  • Plan pip upgrades and follow them as a maintenance procedure
  • Upgrade the pip version periodically (ideally at least once per release) across all the repositories

This would give us more control and a cyclic approach to pip maintenance, rather than more chances of a sudden fire at any time with pip unpinned.
I believe this also fits the support velocity of StackStorm with the current team, where we have better 🔥 control. If unpinning didn't work for the previous full-time dedicated team, who could work on fixing things almost immediately, it is less likely to work for the current team of maintainers, who dedicate their free time to the project.

I'd prefer us to keep the maintenance minimal which would potentially win us more time to spend on other important activities.
