Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python bindings: add a pyproject.toml with numpy as a build requirement #8926

Merged
merged 9 commits into from
Dec 14, 2023

Conversation

rouault
Copy link
Member

@rouault rouault commented Dec 6, 2023

Fixes #8069.
Hopefully a compromise that will satisfy most needs...

  • add a pyproject.toml with numpy as a build requirement
  • make absence of numpy at build time an error by default, unless the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY env var is set
  • despite numpy being a build requirement, numpy remains an optional dependency at install time if users just install with "pip install gdal" (for folks that would do vector only operations with GDAL)
  • for people needing numpy support, they have to use "pip install gdal[numpy]".

Demo:

$ pip list
Package       Version
------------- -------
pip           20.0.2 
pkg-resources 0.0.0  
setuptools    44.0.0 

$ pip install -v dist/GDAL-3.9.0.tar.gz
[...]
  Collecting oldest-supported-numpy
[...]
Successfully installed GDAL-3.9.0
[...]
$ pip list
Package       Version
------------- -------
pip           20.0.2 
pkg-resources 0.0.0  
setuptools    44.0.0 

$ pip install -v dist/GDAL-3.9.0.tar.gz[numpy]
[...]
Collecting numpy>1.0.0; extra == "numpy"
[...]
Successfully installed GDAL-3.9.0 numpy-1.24.4
[...]

$ pip list
Package       Version
------------- -------
GDAL          3.9.0  
numpy         1.24.4 
pip           20.0.2 
pkg-resources 0.0.0  
setuptools    44.0.0 

@pl-kevinwurster
Copy link

@rouault Have been trying to figure out how to respond on gdal-dev – Sean's point about the OGR side is really good. I have spent too much time on the raster side and forgot about those users ...

Adding a pyproject.toml should alleviate some other problems with getting gdal installed using pip v23. Currently the presence of the wheel package triggers new build machinery, but I think with pyproject.toml the new build machinery will always be triggers.

Functionally I'm not so sure that adding Numpy as a build requirement changes anything for users right now. Since the gdal package does not provide wheel builds on PyPI, I believe any user doing $ pip install gdal is running the build process locally, which means that Numpy is still effectively a dependency for everyone. Some users may be building a wheel and hosting in their own infrastructure, but that is a class of user that knows how to navigate all of this. Obviously the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY alleviates this issue, but it does feel a bit like just moving the pain point from one place to another. I do like that this enables $ pip install gdal[numpy] to do what it should.

Researching this a bit more, it seems pyproject.toml-based builds have a mechanism for dynamically determining build dependencies via the get_requires_for_build_wheel hook for dynamically determining requirements at build time. It looks like this requires adding a local Python file that serves as the build backend, and overrides some stuff. But of course, an open pip issue (pypa/pip#12273) suggests this not working right now!

Unclear to me if the extra being installed is available to get_requires_for_build_wheel – doing some experimenting. if get_requires_for_build_wheel can determine that gdal[numpy] is being installed then that might solve the problem.

@rouault
Copy link
Member Author

rouault commented Dec 6, 2023

Have been trying to figure out how to respond on gdal-dev

subscribe at https://lists.osgeo.org/mailman/listinfo/gdal-dev and check your spam folder for confirmation email

Functionally I'm not so sure that adding Numpy as a build requirement changes anything for users right now

This does fix "pip install gdal[numpy]"

To be honest, I'm quite beyond my mastering of Python packaging, which itself seems to be in a quite sad state, so I don't really feel like being able to do any further improvements/complications. Door open to anyone else feeling like they want to try :-)

I'm a bit ambivalent about this PR as they are situations like the GDAL conda packaging which use pip options to not install dependencies and ignore installed depencies, so the "require numpy at build time" check fails, and I had to get around that with 442b785. Hopefully that's an unsual way of installing the bindings.

@latot
Copy link

latot commented Dec 6, 2023

Should setup.py.in still be there? pyproject.toml should fully replace setup.py with new ways to handle it.
Or this will works as the first working version of the new install method? or I'm confused.

@rouault
Copy link
Member Author

rouault commented Dec 6, 2023

Should setup.py.in still be there? ``

how could it not be there ? It does a lot of things that a .toml file can't do... The .toml file indicates that the underlying build system is setuptools. I've taken inspiration from https://github.com/rasterio/rasterio/blob/main/pyproject.toml and https://github.com/toblerity/fiona/blob/master/pyproject.toml which do similar things
Basically the only purpose of the .toml file is to indicate to pip that numpy is a build requirement, so it gets installed before calling setup.py which can't do that.

@latot
Copy link

latot commented Dec 6, 2023

After research and ask ppl in matrix, seems pyproject.toml new deprecated in part setuptools/setup.py, but still there is no considerations when needs compilation.

So, there is some alternatives, one keep using setuptools but with pyproject.toml which is still right, other one is use flit package (https://flit.pypa.io/en/stable/index.html) to construct a build env, and finally develop a build env (https://stackoverflow.com/a/71523567)

Maybe, the easiest way now if just keep the mix, while GDAL uses the new pyproject.toml should be fine, info about it and how to organize a little would be here: https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html

@pl-kevinwurster
Copy link

OK I tried the hook, and found some additional documentation showing that the settings these hooks receive are not useful. I also did some digging into the pip and setuptools source code, and it appears that it is not possible to know that gdal[numpy] is requested. The build machinery only knows that gdal is being built, and higher-level machinery deals with the extra numpy requirement. Very frustrating.

This does fix "pip install gdal[numpy]"

Yes, but in exchange it forces users who just want the OGR side to install numpy or set the environment variable. I wasn't thinking about that when I suggested just making Numpy a requirement. The mailing list thread suggests the primary maintainers at least are OK with this change, and I do think most users probably already have Numpy installed in their environment.

To be honest, I'm quite beyond my mastering of Python packaging, which itself seems to be in a quite sad state, so I don't really feel like being able to do any further improvements/complications. Door open to anyone else feeling like they want to try :-)

Yep totally agree. Not at all asking you to do any additional work. I have been dealing with Python packaging a lot recently, and it is incredibly frustrating. I think the additional research I mentioned is enough to understand that this PR is probably as good as it gets.

I'm a bit ambivalent about this PR

Yeah, same. What about just providing more detailed installation instructions? I wrote the snippet below for isofit/isofit#420, but it seems like something that should be included in the GDAL documentation. I think this probably provides enough information for most users to reliably get gdal installed, and enough context for users to do their own debugging rather than opening tickets on the issue tracker. Merging this PR and including this bit of documentation are not mutually exclusive either!

Installing ISOFIT with ``pip``
******************************

These instructions look scary at first glance, but the abbreviated version is:
ISOFIT can absolutely be installed via ``$ pip``, but it has some dependencies
that make this process difficult.

Instructions for installing on `Windows`_ are covered in a separate section.

Background
^^^^^^^^^^

The ``gdal`` Python package provides bindings to the underlying
`GDAL <https://gdal.org/>`_ library. GDAL itself provides vector support via
the OGR library, and raster support from the GDAL library. The ``gdal`` Python
package uses a Python extension to interact with these libraries, and it is
possible to install the ``gdal`` Python package in a manner where it only
provides vector support, or both vector and raster. The presence of the
``numpy`` library is used to make this determination. These difficulties are
rooted in difficulties in the Python packaging ecosystem. It is very easy to
get the ``gdal`` Python package installed in a mode where it only provides
vector support. To determine if the package was built with raster support, run:

.. code-block:: console
    $ python3 -c "from osgeo import gdal_array"
The command will issue an error if raster support is not enabled.

Users attempting to debug their ``gdal`` Python package installation should be
aware that when the package is installed from a source distribution, the package
is first built before it is installed. This matters because it is not possible
for the build process to know that the user has requested the package be
installed with raster support. The intention is that this command:

.. code-block:: console
    $ pip install 'gdal[numpy]'
should install the ``gdal`` Python package with raster support, but ``pip``
installs the "extras" (in this case ``numpy``) after installing the primary
package. It is not possible for a package to know that it is being installed
with "extras". Ultimately this is because the build and install processes are
separate, which is a good thing, but it is transparent and confusing to the
user.

To further complicate the issue, the Python world is transitioning to a
different mechanism for building packages (`PEP-517 <https://peps.python.org/pep-0517/>`_).
The details are mostly irrelevant, but users should know that
`pip v23 <https://pip.pypa.io/en/stable/news/#v23-0>`_ has some behavior
changes. The commands below are intended to work on v23 and also recent-ish
versions of ``pip``, but have not been extensively tested and may not be
_exactly_ right. Hopefully they are well commented enough to point users in the
right direction. The project welcomes improved documentation in this area.

Installing the Underlying GDAL Library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before installing the ``gdal`` Python package, the underlying GDAL library
must be installed. The exact commands are extremely dependent on the user's
platform and environment, but some common approaches are listed below. Users
can also build GDAL from source, although this can also be a very difficult
process. All of these approaches have caveats and edge cases that the ISOFIT
project cannot directly debug or support, but they should point users in the
right direction.

# Ubuntu users can use ``apt`` and/or ``apt-get`` to install the ``gdal-bin``
  and ``libgdal-dev`` libraries from the upstream package repository for their
  distribution. ISOFIT's CI system uses Ubuntu, and its configuration files and
  associated scripts may provide some helpful context.
# MacOS users can use `Homebrew <https://brew.sh/>`_  to install the ``gdal``
  package.


Installing the ``gdal`` Python Package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As mentioned, the ``gdal`` Python package must be installed with raster support,
which requires users to issue a specific set of ``$ pip install`` commands.

Users may find it odd that these commands do not invoke ``$ pip`` directly.
It is not uncommon for users to have multiple versions of ``python``,
``python3``, ``pip``, and ``pip3`` installed. Invoking ``pip`` via a Python
interpreter ensures that the package is installed for that interpreter, and not
some other interpreter attached to an unrelated ``pip`` executable that happens
to appear earlier on ``$PATH`` for some reason.

.. code-block:: bash

    pip install isofit
    # The 'gdal' Python package expects 'gdal-config' to be on '$PATH'.
    gdal-config --version
    # The presence of the 'wheel' package triggers 'pip' to use some of
    # PEP-517's behavior.
    python3 -m pip install wheel
    # As mentioned, we need to manually install 'numpy' prior to 'gdal. Users
    # may want to compare this against the project's 'numpy' version
    # requirement, otherwise it is possible for 'gdal' to be built against an
    # existing version of 'numpy', but for 'numpy' to then be upgraded when
    # 'isofit' is installed. This does not seem to matter though.
    python3 -m pip install numpy
    # Install 'gdal'. Note that '$ gdal-config' is used to identify exactly
    # which version of the 'gdal' package should be installed. Since the 'gdal'
    # package does not list 'numpy' as a requirement in 'setup_requires', we
    # must add the '--no-build-isolation' to allow the package to see the
    # version of 'numpy' that we pre-installed.
    #
    # This process builds a 'wheel' for GDAL, and 'pip' caches
    # the wheel locally even if the package is uninstalled. Users iterating on
    # these commands while attempting to determine exactly how to install 'gdal'
    # on their system may end up in a state where a bad 'wheel' is cached.
    # adding the '--force-reinstall' and '--no-cache-dir' flags will clear the
    # bad 'wheel'.
    python3 -m pip install \
      --no-build-isolation \
      "gdal==$(gdal-config --version)"
    # Install 'isofit'.
    python3 -m pip install isofit

@pl-kevinwurster
Copy link

@latot IMO this is the right approach right now:

Maybe, the easiest way now if just keep the mix, while GDAL uses the new pyproject.toml should be fine

Fully moving off of setup.py is a much more involved effort. Just determining a migration strategy will be time consuming and difficult. The current state of Python packaging really is a big poorly documented mess.

@latot
Copy link

latot commented Dec 6, 2023

I agree, actually would be better just get the pyproject.toml file, wait until things get organized and documented.

@rouault
Copy link
Member Author

rouault commented Dec 6, 2023

These instructions look scary at first glance

@pl-kevinwurster Thanks for the proposal ! yes, a bit scary...

Actually the python bindings should now always include the needed part (gdal_array module) for numpy support (unless people have built with the env variable set). The only missing thing is just to install numpy if it isn't already there (but yes I'd be unclear of what will happen when a ABI breaking numpy 2.0 will come...)

Since the 'gdal'
package does not list 'numpy' as a requirement in 'setup_requires', we
must add the '--no-build-isolation' to allow the package to see the
version of 'numpy' that we pre-installed.

The last part of those instructions are no longer true with this PR (at least when installing through pip), are they?

To install the version of the Python bindings matching your native GDAL library:

::

pip install GDAL=="$(gdal-config --version).*"
pip install gdal=="$(gdal-config --version).*"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think this does something I have not accounted for: Does the trailing .* account for versions like 3.8.1.1 where a new version of the Python package has been released to fix some bug with the package itself? I have always done "gdal==$(gdal-config --version)".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the trailing .* account for versions like 3.8.1.1

yes

@@ -0,0 +1,32 @@
[build-system]
requires = ["setuptools", "oldest-supported-numpy"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just use a broad numpy dependency? Forcing users to install the oldest supported version of Numpy supported by their version of Python doesn't seem great. They may have installed a specific version of Numpy already, and I think this could cause a downgrade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just use a broad numpy dependency?

just copying what rasterio does...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I've changed that to numpy in an extra commit. I suspect rasterio does what it does to be able to do binary wheels using ancient toolchain so that they can work on the largest number of systems. Plain numpy should be fine in a GDAL context where we don't provide binary wheels

Comment on lines +8 to +11
authors = [
{name = "Frank Warmerdam"},
{name = "Howard Butler"},
{name = "Even Rouault"},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a few fields that are duplicated between setup.py and pyproject.toml. You should be able to introduce a setup.cfg file and define common bits of metadata there to avoid having to update both setup.py and pyproject.toml. I'm pretty sure the setuptools.setup() call in setup.py will use the contents of setup.cfg as defaults, and override with whatever parameters are still provided via the setup() function. I can't think of any adverse effects in terms of maintainability, but I also have never dealt with a project containing all 3 files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to introduce a setup.cfg file

sigh, yet another config file... I'd prefer sticking with the current 2 file setup, unless we see that it is broken...

Copy link
Member

@avalentino avalentino Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you no longer need setup.cfg if you use

[build-system]
requires = ["setuptools>=61.0.0", "wheel"]

Almost all the metadata can be simply moved from setup.py to pyproject.toml leaving in setup.py only the part defining the Extensions and the logic that cannot be expressed in a declarative manner.

@pl-kevinwurster
Copy link

@rouault Oh yeah. Sorry, I should have clarified that the instructions are written for users attempting to install the gdal Python package today. I'll let you determine if you want to make the setup.py changes, and I can suggest some documentation to be included once you have made a determination.

If these setup.py changes will be merged, then I will also include a note about how to install older versions of gdal, which I think is important because users are likely to go to pypi.org/pypi/gdal to seek help, so including instructions for older versions should prevent people from coming to the issue tracker for help.

@rouault
Copy link
Member Author

rouault commented Dec 6, 2023

I'll let you determine if you want to make the setup.py changes,

you mean, if we merge this PR ?

If these setup.py changes will be merged, then I will also include a note about how to install older versions of gdal

sounds good

@avalentino
Copy link
Member

AFAIK normally the python packages are built using "build isolation".
In my understanding this should mean

  1. if you put numpy in the list of build dependencies it is installed in the isolated environment used to build the package but the final user is not forced to install it in his own working environment
  2. using oldest-supported-numpy is recommended for improving the binary compatibility with all the currently supported python versions, but also in this case it is used in the isolated build environment. In the use environment one should be able to to use newer numpy versions just running pip install gdal[numpy]

@rouault
Copy link
Member Author

rouault commented Dec 7, 2023

2. using oldest-supported-numpy is recommended

ok, I've reverted to that, as well as adding ["setuptools>=61.0.0", "wheel"] to requires

@avalentino
Copy link
Member

In https://github.com/avalentino/gdal/tree/fix_8069 I have committed a change to move all the static(-ish) metadata form setup.py to pyproject.toml. This should avoid duplication and simplify the maintenance.
Please feel free to merge in your branch if you like.

I have also noted that both entry-points and scripts are installed. Is it on purpose?

Finally there is another setup.py in gdal-utils.
If you want I could provide a pyproject.toml also for that.

@rouault
Copy link
Member Author

rouault commented Dec 8, 2023

Please feel free to merge in your branch if you like.

thanks. I've just done it, with an extra commit to change from 'python setup.py sdist' to 'python -m build . --sdist', but I see that in swig/python/CMakeLists.txt we do also other invokations of setup.py directly. I believe further changes would be needed

I have also noted that both entry-points and scripts are installed. Is it on purpose?

I'm not sure... scripts was historical, and entry-points is a recent addition. I'd prefer not modifying that at that time if possible.

If you want I could provide a pyproject.toml also for that.

when the dust has settled on this PR... :-)

@rouault
Copy link
Member Author

rouault commented Dec 8, 2023

@avalentino Your changes borke "python setup.py install".
cf https://github.com/OSGeo/gdal/actions/runs/7141152960/job/19447889661?pr=8926:

2023-12-08T12:16:26.1914914Z running egg_info
2023-12-08T12:16:26.1917289Z creating UNKNOWN.egg-info
2023-12-08T12:16:26.1917999Z writing UNKNOWN.egg-info/PKG-INFO
2023-12-08T12:16:26.1920748Z writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
2023-12-08T12:16:26.1922865Z writing entry points to UNKNOWN.egg-info/entry_points.txt
2023-12-08T12:16:26.1925501Z writing top-level names to UNKNOWN.egg-info/top_level.txt
2023-12-08T12:16:26.1927341Z writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
2023-12-08T12:16:26.1933425Z reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
2023-12-08T12:16:26.1941772Z writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
2023-12-08T12:16:26.1943747Z Copying UNKNOWN.egg-info to /usr/lib/python3/dist-packages/UNKNOWN-3.9.0.egg-info
2023-12-08T12:16:26.1944456Z Skipping SOURCES.txt

I reproduce locally too doing "make install DESTDIR=/tmp"
I know python setup.py install is deprecated, but I'm nervous if that no longer works and one must necessarily use pip install. There are esoteric setup.py install options like --install-layout=deb that are required for Debian.
I've added a commit to re-add name to setup() to fix that

@rouault
Copy link
Member Author

rouault commented Dec 8, 2023

I've added a commit to re-add name to setup() to fix that

unfortunately that's not enough to fix it. I'm going to revert most of your changes, as I feel them too risky. The subset I've kept is in 80a147b

@avalentino
Copy link
Member

I will have another look later today.
Meanwhile, could you please confirm that the setuptools version that you use to run

python3 setup.py

is >=61.0?

@rouault
Copy link
Member Author

rouault commented Dec 8, 2023

Meanwhile, could you please confirm that the setuptools version that you use to run

python3 setup.py

is >=61.0?

ah, no, I was at 60.0. But even when upgrading, I run into issues when running setup.py

@avalentino
Copy link
Member

I'm not able to reproduce locally on ubuntu 23.10 (setuptools v68.1.2).

@coveralls
Copy link
Collaborator

Coverage Status

coverage: 68.008% (-0.003%) from 68.011%
when pulling 0be30ab on rouault:fix_8069
into 06ef969 on OSGeo:master.

@pl-kevinwurster
Copy link

@rouault Looking good. I'll get you some documentation early next week. As for potentially breaking $ python setup.py install, I agree that now is not the time. Maybe something to consider for inclusion in GDAL 4.0 though?

@rouault rouault merged commit ef7bcde into OSGeo:master Dec 14, 2023
30 checks passed
@rouault
Copy link
Member Author

rouault commented Mar 7, 2024

for those following this PR, we have a new related PR in #9405 . I don't see anything obviously wrong to me in it, but thoughts appreciated

@pl-kevinwurster
Copy link

@rouault Oh good! Obviously I have failed to deliver my documentation. Commenting on #9405 right now with some suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug & Feature Request: Python bindings support pyproject.toml, conflicts with setuptools & setup.py
5 participants