-
Notifications
You must be signed in to change notification settings - Fork 5
Pin specific versions of Python dependencies #1
Conversation
Note that Django recommends GDAL (the library), but python-gdal (the python bindings, called "GDAL" on PyPI, but I'll call it python-gdal here) is totally optional. Django doesn't use python-gdal, it uses the library directly. MapIt only uses python-gdal, if present, to turn on GDAL exceptions (in mapit/views/area.py) rather than GDAL's default "print/return None"; if you think you'll be okay without this, you don't need to install python-gdal at all, and you need do nothing :) If you do want that behaviour, either of the two options sound fine to me. pygdal is basically doing the first option for you, setting the library/include paths, that's all I think it's doing. It hasn't added any additional auto-generated code that isn't present in GDAL itself, I believe it's simply copied from the correct version of GDAL, all the 'fixing' is in setup.py to call As an aside, MapIt does not fully support 1.8 yet – the only issue I currently know of is the use of the gone-in-1.8 manual transaction management in the |
MapIt defines its dependencies flexibly in setup.py, but we want to pin to exact versions of all dependencies for deployment. We're using the most recent versions for most packages, except for GDAL, which needs to be kept at the same version as the system packages it depends on. `gdal-config --version` shows the installed version to match to. GDAL is not strictly required, either by MapIt or for Django's geospatial support, but it is recommended [1]. The Python bindings for GDAL aren't required by Django or MapIt, but if they are present then exceptions will be raised [2] instead of only error messages being sent to stdout [3]. We would prefer to have exceptions raised so that we have better feedback about any errors, so we are using a bindings package. Both GeoDjango and the Python bindings for GDAL depend on several system packages, which we're installing with Puppet [4]. We will be using virtualenv to install the Python dependencies on deployment, and GDAL is a bit tricky to get working reliably inside virtualenvs. This is because it needs access to config and header files from the system packages which it doesn't know how to find from inside a virtualenv which was created with --no-site-packages (which is the default now). We found a couple of approaches which successfully install a Python GDAL package in a virtualenv in a test vagrant machine: - Setting an envvar before installing GDAL with pip, as suggested here [5]: export CFLAGS=$(gdal-config --cflags) pip install GDAL==1.10.0 This sets the compiler flags so that the headers can be found during the pip compilation process. However, setting extra environment variables during deployment seems brittle, especially when the envvar is defined in a separate repo from the Python package dependency. We also don't know of any prior art in our stack for setting extra envvars during deployment, and that would introduce extra complexity by moving further away from a standard deployment. - Using pygdal, a Python package which is intended to make it easier to use GDAL inside a virtualenv. This package is suggested by various people in the same thread as the first option [6] and it installs without needing any extra configuration. There are a couple of downsides to this: - this package includes some C++ extensions which are presumably used for compiling GDAL instead of the system header files. This is a lot of auto-generated code that we don't feel confident to review. - that package is maintained separately from the system packages it depends on, and we aren't sure what the behaviour would be if the versions got out of sync. This approach has the advantage of working in the dev VM in a more standard way. This package also requires numpy, which we're not sure if we actually need or not, but the only downside of having it is extra compilation time on a first deploy. We're going with the second option because it's simpler, more standard and works reliably for us now, allowing us to continue with this work. Matthew Somerville (@dracos) of mySociety commented very helpfully on this PR as follows: Note that Django recommends GDAL (the library), but python-gdal (the python bindings, called "GDAL" on PyPI, but I'll call it python-gdal here) is totally optional. Django doesn't use python-gdal, it uses the library directly. MapIt only uses python-gdal, if present, to turn on GDAL exceptions (in mapit/views/area.py) rather than GDAL's default "print/return None"; if you think you'll be okay without this, you don't need to install python-gdal at all, and you need do nothing :) If you do want that behaviour, either of the two options sound fine to me. pygdal is basically doing the first option for you, setting the library/include paths, that's all I think it's doing. It hasn't added any additional auto-generated code that isn't present in GDAL itself, I believe it's simply copied from the correct version of GDAL, all the 'fixing' is in setup.py to call gdal-config appropriately. So you only need to worry about reviewing that if you've been reviewing GDAL's code too ;) I imagine it wouldn't work if they got out of sync, but that'd only presumably happen at the point you do a whole OS upgrade. As an aside, MapIt does not fully support 1.8 yet – the only issue I currently know of is the use of the gone-in-1.8 manual transaction management in the mapit_UK_import_nspd_ni command, but that will need fixing if you wish to use it with 1.8 (PRs welcome of course if you need it before we do :) ). Django 1.7 is already past the end of its 'mainstream support' period [7] which means that it is now only getting security fixes and fixes for data loss bugs, so we'll stay with Django 1.8 which is a Long Term Support version. If we need to fix the manual transaction management Matthew mentions above, we'll address that when we need to. [1]: https://docs.djangoproject.com/en/1.8/ref/contrib/gis/install/geolibs/ [2]: https://github.com/mysociety/mapit/blob/c4f68f05cdde7cae22d53c2373c832202fa66096/mapit/views/areas.py#L324-L325 [3]: https://trac.osgeo.org/gdal/wiki/PythonGotchas#PythonbindingsdonotraiseexceptionsunlessyouexplicitlycallUseExceptions [4]: https://github.gds/gds/puppet/commit/03a5d4f50024c442128fe01a8b8135a80efa7f40 [5]: http://gis.stackexchange.com/a/115502 [6]: http://gis.stackexchange.com/a/124420 [7]: https://www.djangoproject.com/download/
Thanks @dracos for the really useful comment - you've clarified lots of things for us. I've updated the commit message to:
|
👍 Top commit message. |
Pin specific versions of Python dependencies
MapIt defines its dependencies flexibly in setup.py, but we want to pin to
exact versions of all dependencies for deployment.
We're using the most recent versions for most packages, except for GDAL,
which needs to be kept at the same version as the system packages it depends
on.
gdal-config --version
shows the installed version to match to.GDAL is not strictly required, either by MapIt or for Django's geospatial
support, but it is recommended 1. The Python bindings for GDAL aren't
required by Django or MapIt, but if they are present then exceptions will be
raised 2 instead of only error messages being sent to stdout 3. We would
prefer to have exceptions raised so that we have better feedback about any
errors, so we are using a bindings package. Both GeoDjango and the Python
bindings for GDAL depend on several system packages, which we're installing
with Puppet 4.
We will be using virtualenv to install the Python dependencies on deployment,
and GDAL is a bit tricky to get working reliably inside virtualenvs. This is
because it needs access to config and header files from the system packages
which it doesn't know how to find from inside a virtualenv which was created
with --no-site-packages (which is the default now).
We found a couple of approaches which successfully install a Python GDAL
package in a virtualenv in a test vagrant machine:
Setting an envvar before installing GDAL with pip, as suggested here 5:
export CFLAGS=$(gdal-config --cflags)
pip install GDAL==1.10.0
This sets the compiler flags so that the headers can be found during the pip
compilation process. However, setting extra environment variables during
deployment seems brittle, especially when the envvar is defined in a
separate repo from the Python package dependency. We also don't know of any
prior art in our stack for setting extra envvars during deployment, and that
would introduce extra complexity by moving further away from a standard
deployment.
Using pygdal, a Python package which is intended to make it easier to use
GDAL inside a virtualenv. This package is suggested by various people in the
same thread as the first option 6 and it installs without needing any extra
configuration. There are a couple of downsides to this:
compiling GDAL instead of the system header files. This is a lot of
auto-generated code that we don't feel confident to review.
depends on, and we aren't sure what the behaviour would be if the
versions got out of sync.
This approach has the advantage of working in the dev VM in a more standard
way. This package also requires numpy, which we're not sure if we actually
need or not, but the only downside of having it is extra compilation time on
a first deploy.
We're going with the second option because it's simpler, more standard and
works reliably for us now, allowing us to continue with this work.
Matthew Somerville (@dracos) of mySociety commented very helpfully on this PR
as follows:
Django 1.7 is already past the end of its 'mainstream support' period 7
which means that it is now only getting security fixes and fixes for data loss
bugs, so we'll stay with Django 1.8 which is a Long Term Support version. If
we need to fix the manual transaction management Matthew mentions above, we'll
address that when we need to.