Skip to content
This repository has been archived by the owner on Oct 25, 2022. It is now read-only.

Pin specific versions of Python dependencies #1

Merged
merged 1 commit into from
Jul 28, 2015
Merged

Conversation

jennyd
Copy link

@jennyd jennyd commented Jul 22, 2015

MapIt defines its dependencies flexibly in setup.py, but we want to pin to
exact versions of all dependencies for deployment.

We're using the most recent versions for most packages, except for GDAL,
which needs to be kept at the same version as the system packages it depends
on. gdal-config --version shows the installed version to match to.

GDAL is not strictly required, either by MapIt or for Django's geospatial
support, but it is recommended 1. The Python bindings for GDAL aren't
required by Django or MapIt, but if they are present then exceptions will be
raised 2 instead of only error messages being sent to stdout 3. We would
prefer to have exceptions raised so that we have better feedback about any
errors, so we are using a bindings package. Both GeoDjango and the Python
bindings for GDAL depend on several system packages, which we're installing
with Puppet 4.

We will be using virtualenv to install the Python dependencies on deployment,
and GDAL is a bit tricky to get working reliably inside virtualenvs. This is
because it needs access to config and header files from the system packages
which it doesn't know how to find from inside a virtualenv which was created
with --no-site-packages (which is the default now).

We found a couple of approaches which successfully install a Python GDAL
package in a virtualenv in a test vagrant machine:

  • Setting an envvar before installing GDAL with pip, as suggested here 5:

    export CFLAGS=$(gdal-config --cflags)
    pip install GDAL==1.10.0

    This sets the compiler flags so that the headers can be found during the pip
    compilation process. However, setting extra environment variables during
    deployment seems brittle, especially when the envvar is defined in a
    separate repo from the Python package dependency. We also don't know of any
    prior art in our stack for setting extra envvars during deployment, and that
    would introduce extra complexity by moving further away from a standard
    deployment.

  • Using pygdal, a Python package which is intended to make it easier to use
    GDAL inside a virtualenv. This package is suggested by various people in the
    same thread as the first option 6 and it installs without needing any extra
    configuration. There are a couple of downsides to this:

    • this package includes some C++ extensions which are presumably used for
      compiling GDAL instead of the system header files. This is a lot of
      auto-generated code that we don't feel confident to review.
    • that package is maintained separately from the system packages it
      depends on, and we aren't sure what the behaviour would be if the
      versions got out of sync.

    This approach has the advantage of working in the dev VM in a more standard
    way. This package also requires numpy, which we're not sure if we actually
    need or not, but the only downside of having it is extra compilation time on
    a first deploy.

We're going with the second option because it's simpler, more standard and
works reliably for us now, allowing us to continue with this work.

Matthew Somerville (@dracos) of mySociety commented very helpfully on this PR
as follows:

Note that Django recommends GDAL (the library), but python-gdal (the python
bindings, called "GDAL" on PyPI, but I'll call it python-gdal here) is
totally optional. Django doesn't use python-gdal, it uses the library
directly.

MapIt only uses python-gdal, if present, to turn on GDAL exceptions (in
mapit/views/area.py) rather than GDAL's default "print/return None"; if you
think you'll be okay without this, you don't need to install python-gdal at
all, and you need do nothing :)

If you do want that behaviour, either of the two options sound fine to me.
pygdal is basically doing the first option for you, setting the
library/include paths, that's all I think it's doing. It hasn't added any
additional auto-generated code that isn't present in GDAL itself, I believe
it's simply copied from the correct version of GDAL, all the 'fixing' is
in setup.py to call gdal-config appropriately. So you only need to worry
about reviewing that if you've been reviewing GDAL's code too ;) I imagine
it wouldn't work if they got out of sync, but that'd only presumably happen
at the point you do a whole OS upgrade.

As an aside, MapIt does not fully support 1.8 yet – the only issue I
currently know of is the use of the gone-in-1.8 manual transaction
management in the mapit_UK_import_nspd_ni command, but that will need
fixing if you wish to use it with 1.8 (PRs welcome of course if you need
it before we do :) ).

Django 1.7 is already past the end of its 'mainstream support' period 7
which means that it is now only getting security fixes and fixes for data loss
bugs, so we'll stay with Django 1.8 which is a Long Term Support version. If
we need to fix the manual transaction management Matthew mentions above, we'll
address that when we need to.

@dracos
Copy link

dracos commented Jul 23, 2015

Note that Django recommends GDAL (the library), but python-gdal (the python bindings, called "GDAL" on PyPI, but I'll call it python-gdal here) is totally optional. Django doesn't use python-gdal, it uses the library directly.

MapIt only uses python-gdal, if present, to turn on GDAL exceptions (in mapit/views/area.py) rather than GDAL's default "print/return None"; if you think you'll be okay without this, you don't need to install python-gdal at all, and you need do nothing :)

If you do want that behaviour, either of the two options sound fine to me. pygdal is basically doing the first option for you, setting the library/include paths, that's all I think it's doing. It hasn't added any additional auto-generated code that isn't present in GDAL itself, I believe it's simply copied from the correct version of GDAL, all the 'fixing' is in setup.py to call gdal-config appropriately. So you only need to worry about reviewing that if you've been reviewing GDAL's code too ;) I imagine it wouldn't work if they got out of sync, but that'd only presumably happen at the point you do a whole OS upgrade.

As an aside, MapIt does not fully support 1.8 yet – the only issue I currently know of is the use of the gone-in-1.8 manual transaction management in the mapit_UK_import_nspd_ni command, but that will need fixing if you wish to use it with 1.8 (PRs welcome of course if you need it before we do :) ).

MapIt defines its dependencies flexibly in setup.py, but we want to pin to
exact versions of all dependencies for deployment.

We're using the most recent versions for most packages, except for GDAL,
which needs to be kept at the same version as the system packages it depends
on. `gdal-config --version` shows the installed version to match to.

GDAL is not strictly required, either by MapIt or for Django's geospatial
support, but it is recommended [1]. The Python bindings for GDAL aren't
required by Django or MapIt, but if they are present then exceptions will be
raised [2] instead of only error messages being sent to stdout [3]. We would
prefer to have exceptions raised so that we have better feedback about any
errors, so we are using a bindings package. Both GeoDjango and the Python
bindings for GDAL depend on several system packages, which we're installing
with Puppet [4].

We will be using virtualenv to install the Python dependencies on deployment,
and GDAL is a bit tricky to get working reliably inside virtualenvs. This is
because it needs access to config and header files from the system packages
which it doesn't know how to find from inside a virtualenv which was created
with --no-site-packages (which is the default now).

We found a couple of approaches which successfully install a Python GDAL
package in a virtualenv in a test vagrant machine:

- Setting an envvar before installing GDAL with pip, as suggested here [5]:

    export CFLAGS=$(gdal-config --cflags)
    pip install GDAL==1.10.0

  This sets the compiler flags so that the headers can be found during the pip
  compilation process. However, setting extra environment variables during
  deployment seems brittle, especially when the envvar is defined in a
  separate repo from the Python package dependency. We also don't know of any
  prior art in our stack for setting extra envvars during deployment, and that
  would introduce extra complexity by moving further away from a standard
  deployment.

- Using pygdal, a Python package which is intended to make it easier to use
  GDAL inside a virtualenv. This package is suggested by various people in the
  same thread as the first option [6] and it installs without needing any extra
  configuration. There are a couple of downsides to this:

    - this package includes some C++ extensions which are presumably used for
      compiling GDAL instead of the system header files. This is a lot of
      auto-generated code that we don't feel confident to review.
    - that package is maintained separately from the system packages it
      depends on, and we aren't sure what the behaviour would be if the
      versions got out of sync.

  This approach has the advantage of working in the dev VM in a more standard
  way. This package also requires numpy, which we're not sure if we actually
  need or not, but the only downside of having it is extra compilation time on
  a first deploy.

We're going with the second option because it's simpler, more standard and
works reliably for us now, allowing us to continue with this work.

Matthew Somerville (@dracos) of mySociety commented very helpfully on this PR
as follows:

    Note that Django recommends GDAL (the library), but python-gdal (the python
    bindings, called "GDAL" on PyPI, but I'll call it python-gdal here) is
    totally optional. Django doesn't use python-gdal, it uses the library
    directly.

    MapIt only uses python-gdal, if present, to turn on GDAL exceptions (in
    mapit/views/area.py) rather than GDAL's default "print/return None"; if you
    think you'll be okay without this, you don't need to install python-gdal at
    all, and you need do nothing :)

    If you do want that behaviour, either of the two options sound fine to me.
    pygdal is basically doing the first option for you, setting the
    library/include paths, that's all I think it's doing. It hasn't added any
    additional auto-generated code that isn't present in GDAL itself, I believe
    it's simply copied from the correct version of GDAL, all the 'fixing' is
    in setup.py to call gdal-config appropriately. So you only need to worry
    about reviewing that if you've been reviewing GDAL's code too ;) I imagine
    it wouldn't work if they got out of sync, but that'd only presumably happen
    at the point you do a whole OS upgrade.

    As an aside, MapIt does not fully support 1.8 yet – the only issue I
    currently know of is the use of the gone-in-1.8 manual transaction
    management in the mapit_UK_import_nspd_ni command, but that will need
    fixing if you wish to use it with 1.8 (PRs welcome of course if you need
    it before we do :) ).

Django 1.7 is already past the end of its 'mainstream support' period [7]
which means that it is now only getting security fixes and fixes for data loss
bugs, so we'll stay with Django 1.8 which is a Long Term Support version. If
we need to fix the manual transaction management Matthew mentions above, we'll
address that when we need to.

[1]: https://docs.djangoproject.com/en/1.8/ref/contrib/gis/install/geolibs/
[2]: https://github.com/mysociety/mapit/blob/c4f68f05cdde7cae22d53c2373c832202fa66096/mapit/views/areas.py#L324-L325
[3]: https://trac.osgeo.org/gdal/wiki/PythonGotchas#PythonbindingsdonotraiseexceptionsunlessyouexplicitlycallUseExceptions
[4]: https://github.gds/gds/puppet/commit/03a5d4f50024c442128fe01a8b8135a80efa7f40
[5]: http://gis.stackexchange.com/a/115502
[6]: http://gis.stackexchange.com/a/124420
[7]: https://www.djangoproject.com/download/
@jennyd jennyd changed the title Pin explicit versions of Python dependencies Pin specific versions of Python dependencies Jul 23, 2015
@jennyd
Copy link
Author

jennyd commented Jul 23, 2015

Thanks @dracos for the really useful comment - you've clarified lots of things for us. I've updated the commit message to:

  • include the full comment
  • make some parts of our original explanation clearer
  • add a note about picking Django 1.8 at the end

@mattbostock
Copy link

👍 Top commit message.

tommyp added a commit that referenced this pull request Jul 28, 2015
Pin specific versions of Python dependencies
@tommyp tommyp merged commit 0446675 into master Jul 28, 2015
@tommyp tommyp deleted the add-requirements branch July 28, 2015 15:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants