Bug & Feature Request: Python bindings support pyproject.toml, conflicts with setuptools & setup.py #8069

Closed
latot opened this issue Jul 6, 2023 · 19 comments · Fixed by #8926

Comments

@latot

latot commented Jul 6, 2023

Hi all, this issue started with #8024

Bug

OK, let's start with setup.py.

It has two steps:

  1. Check the libs and options to be used in the setup statement (including checking whether numpy exists)
  2. Run the setup statement

This workflow has worked for a long time, but it now fails for the following reason: setuptools no longer allows importing any module that is not declared in the setup_requires param of the setup statement, and the setup statement must be executed before that module can be imported. The module is only available after the setup statement.

As a result, when the Python module needs to be compiled, the numpy import fails and triggers the except branch, which tells GDAL that numpy is not installed, because the numpy check has to run before the setup statement.

# check if numpy is installed or not
numpy_include_dir = '.'
try:
    # get_numpy_include() calls import numpy
    numpy_include_dir = get_numpy_include()
    HAVE_NUMPY = numpy_include_dir != '.'
    if not HAVE_NUMPY:
        print("WARNING: numpy found, but numpy headers were not found!  Array support will not be enabled")
except ImportError:
    HAVE_NUMPY = False
    print('WARNING: numpy not available!  Array support will not be enabled')
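
The check above can be reduced to a small, self-contained sketch. This is not GDAL's actual get_numpy_include(); the helper name find_include_dir and its module_name parameter are assumptions made for illustration. It shows that, from the script's point of view, an import that fails inside the build environment looks exactly like numpy not being installed at all.

```python
import importlib


def find_include_dir(module_name="numpy"):
    """Return the header directory reported by a module, or None.

    Hypothetical helper: None means either the module could not be
    imported or it does not expose a get_include() function.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Same branch the setup.py above takes when numpy is not visible
        # to the interpreter that runs the build.
        return None
    get_include = getattr(module, "get_include", None)
    return get_include() if callable(get_include) else None


if __name__ == "__main__":
    include_dir = find_include_dir()
    if include_dir is None:
        print("numpy not importable here: array support would be disabled")
    else:
        print(f"numpy headers: {include_dir}")
```

Running it with the interpreter that performs the build shows what that build can actually see.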

So, these are the rules that are in conflict right now:

Numpy must be checked before the setup statement
Numpy can only be imported after the setup statement, with the right params to allow the import

Is this problem hard to see?

Well... yes, let's go step by step.

On Debian or Ubuntu there is the package libgdal-python, which installs everything precompiled, so none of the above is needed. python-numpy is probably even installed as a dep of gdal or of some other package, so any attempt to use the system Python gdal will just work.

With venv it is different: there are no gdal bindings and no numpy from the start, and the no-numpy warning is just that, a warning, so it usually scrolls past with the rest of the output without anyone noticing.

So, in order to find this bug, you need a setuptools with the actual rule of blocked imports, use venv to isolate the system, and then run pip install -vvv without wheels just to read the warning that tells you numpy was not detected... and only if you know numpy is installed do you realize "here is a problem".

Solution

OK, I have asked on the Python Matrix channel, the pip devs and the setuptools community... is there a way to solve this? Yes.

But first, this is not directly a GDAL problem. setuptools has a lot of problems of its own, and one of them is that you are not able to control the setup env.

Here is where the solution comes in: use pyproject.toml. It replaces setup.py and has the features needed to handle these cases; even setuptools now is legacy due to the problems it has.

They sent me this link: https://godatadriven.com/blog/a-practical-guide-to-setuptools-and-pyproject-toml/

There is this one too: https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/

I was not able to find a solution with setuptools, but maybe this way is safer for the mid and long term.

Special thanks to @rouault for the advice and help debugging this!

Thx!

@rouault
Member

rouault commented Jul 6, 2023

you need a setuptools with the actual rule of blocked imports

@latot What do you mean by the above ? How to configure that ?

@latot
Author

latot commented Jul 7, 2023

Well... it is weird. setuptools actually blocks imports, which means you can't import anything. You can run the import statement, but it will always say there is no such module, unless you pass the module in the setup_requires param of the setup statement, and then you can only import it after that call.

The weird part is that "block the imports" is the default behavior right now... you should not need to do anything to activate it.

I have the following ideas about why this happens:

  • Maybe it depends on the Python version: I have tested 3.10.12 and 3.11.4, and the block works in both
  • Maybe it is the setuptools version: I have the following setuptools packages, scm 7.1.0 and rust 1.6.0, with the same versions for both Pythons
  • Maybe Debian/Ubuntu change setuptools to disable this feature
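
To compare environments along the lines of the list above, it helps to record the interpreter and packaging-tool versions in one place. A minimal sketch using only the standard library; the function name report_versions and the default package list are assumptions, not something used elsewhere in this thread.

```python
import sys
from importlib import metadata


def report_versions(packages=("pip", "setuptools", "wheel", "numpy")):
    """Map 'python' and each distribution name to its installed version.

    A distribution that is not installed in this environment maps to None.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name:<12}{version or 'not installed'}")
```

Running it inside the venv and with the system Python makes the differences between the two setups easy to paste into a report.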

@latot
Author

latot commented Jul 7, 2023

This is an example setup.py that shows what happens with the import, and when, given the expected behavior of setuptools.

# This fails with "no module named numpy"; inside a try it triggers the except branch
import numpy
setup(
  ...
  setup_requires=["numpy"],
  ...
)
# This one works fine
import numpy

I asked about this on setuptools, where the answer was that the solution is to use pyproject.toml: pypa/setuptools#3971

Thx!

@avalentino
Member

@latot yes, pyproject.toml is the way to specify the build system and the build-time dependencies (i.e. the packages that must be there in order to be able to run setup.py).

If there is interest I could try to provide a patch.

I would also suggest moving as much as possible from setup.py to configuration files like pyproject.toml itself or setup.cfg.
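
For reference, the build-time dependencies mentioned above live in the [build-system] table of pyproject.toml. The sketch below only renders such a table as text; the helper name render_build_system and the requirement list are assumptions for illustration, not the contents of GDAL's actual file. setuptools.build_meta is the build backend that setuptools provides.

```python
import json


def render_build_system(requires, backend="setuptools.build_meta"):
    """Render a [build-system] table for pyproject.toml as text."""
    # json.dumps yields double-quoted strings, which are also valid TOML
    # basic strings for plain ASCII requirement specifiers like these.
    quoted = ", ".join(json.dumps(req) for req in requires)
    return (
        "[build-system]\n"
        f"requires = [{quoted}]\n"
        f"build-backend = {json.dumps(backend)}\n"
    )


if __name__ == "__main__":
    # Hypothetical requirement list: numpy is declared as a build-time
    # dependency so that setup.py can import it.
    print(render_build_system(["setuptools", "wheel", "numpy"]))
```

With a table like this, the build frontend installs the listed packages into the build environment before setup.py runs.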

even setuptools now is deprecated due to the problems it has.

AFAIK distutils has been deprecated and removed from Python 3.12.
Honestly I'm not aware of the deprecation of setuptools.
Could you please provide a pointer?

@latot
Author

latot commented Jul 7, 2023

Hi, sorry, I used the wrong word: it is not deprecated, it is now legacy. You can check that in the pyproject.toml docs

https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/

I changed the word in the main post from deprecated to legacy to be clear.

I would like to write a patch, but right now I'm out of time; debugging this issue was really slow D:

@rouault
Member

rouault commented Jul 7, 2023

If there is interest I could try to provide a patch.

That would be welcome. But cf my question at pypa/setuptools#3971 (reply in thread)
And I also guess that we'd also want numpy to be automatically taken into account if it is already available, as today, at least in environments where already installed dependencies are available (I couldn't reproduce @latot issue on my env where "pip install numpy" before "pip install gdal" is enough for numpy to be usable by GDAL)

Hum, wondering if having numpy as a required dependency wouldn't make things simpler. I presume most people who use GDAL Python bindings enable numpy. Perhaps something to raise on the mailing list to see if there are reactions to such proposal.

@latot
Author

latot commented Jul 7, 2023

If there is interest I could try to provide a patch.

That would be welcome.

I think that would be great too: adopt the new way and avoid problems now and in the future.

Hum, wondering if having numpy as a required dependency wouldn't make things simpler. I presume most people who use GDAL Python bindings enable numpy. Perhaps something to raise on the mailing list to see if there are reactions to such proposal.

I think that would make a lot of things simpler, at least from my perspective.

  • GDAL would support rasters natively (without numpy there is no full raster support)
  • Not having raster support in the GDAL bindings could even be unexpected
  • Requiring numpy simplifies the installation script. The script imports from: sys, traceback, tempfile, subprocess, numpy, os, glob and setuptools. IIRC numpy is the only module that does not come with Python.
  • Reinstalling GDAL to get raster support is annoying: we need to uninstall gdal, install numpy and install gdal again, without forgetting to clear the wheel cache.
  • As described, knowing when this problem is active is not always easy; with numpy as a dep, any problem with it will be explicit

So I think it would be great!

@avalentino
Member

That would be welcome. But cf my question at pypa/setuptools#3971 (reply in thread)

I think that the use case of an optional build dependency is not addressed by any of the build backends AFAIK.

By the way, one could approach the problem from a different perspective.
If the additional module requiring numpy is in the pre-built wheel package, then you can decide whether you want numpy in your environment just by running

$ pip install gdal[numpy]

or

$ pip install gdal

In this case we would have a (mandatory) build dependency on numpy and just an optional dependency at runtime.

The problem at this point is that binary wheels for GDAL are currently not produced, so the above commands will always result in the download of numpy. If the isolated build is used, then numpy will not go into the user environment unless explicitly requested ... but I need to check this.
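
The split described here (numpy mandatory at build time, optional at runtime) maps onto the extras_require metadata of setuptools. A minimal sketch with hypothetical names (SETUP_KWARGS, runtime_requirements, the example-bindings project); only the numpy>1.0.0 specifier is taken from the pip output shown further down in this thread.

```python
# Hypothetical metadata, not GDAL's real setup.py arguments.
SETUP_KWARGS = {
    "name": "example-bindings",
    # Nothing is required at runtime for a plain "pip install".
    "install_requires": [],
    # "pip install example-bindings[numpy]" additionally pulls in numpy.
    "extras_require": {"numpy": ["numpy>1.0.0"]},
}


def runtime_requirements(setup_kwargs, extras=()):
    """Return the runtime requirements selected by the requested extras."""
    requirements = list(setup_kwargs.get("install_requires", []))
    for extra in extras:
        requirements.extend(setup_kwargs["extras_require"][extra])
    return requirements


if __name__ == "__main__":
    print(runtime_requirements(SETUP_KWARGS))
    print(runtime_requirements(SETUP_KWARGS, extras=["numpy"]))
```

The build-time requirement is declared separately, in the build-system configuration, and is independent of which extras the user asks for.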

And I also guess that we'd also want numpy to be automatically taken into account if it is already available, as today, at least in environments where already installed dependencies are available (I couldn't reproduce @latot issue on my env where "pip install numpy" before "pip install gdal" is enough for numpy to be usable by GDAL)

for this, I think you need to use

$ pip install --no-build-isolation [...]
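
When the install is driven from a script, the same switch can be assembled programmatically. A sketch under assumptions: the helper name pip_install_command and its parameters are made up for illustration; --no-build-isolation and --no-binary are existing pip options. The function only builds the argument list and does not run pip.

```python
import sys


def pip_install_command(requirement, build_isolation=True, no_binary=None):
    """Build the argument list for a 'pip install' of one requirement.

    no_binary is the project name to pass to --no-binary (forcing a
    build from source), or None to leave pip's default behavior.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if not build_isolation:
        # Build against the packages already installed in this
        # environment (e.g. an existing numpy) instead of an isolated one.
        command.append("--no-build-isolation")
    if no_binary is not None:
        command += ["--no-binary", no_binary]
    command.append(requirement)
    return command


if __name__ == "__main__":
    print(" ".join(pip_install_command("gdal", build_isolation=False)))
```

The resulting list can be handed to subprocess.run(command, check=True) once numpy and the build tools are installed in the environment.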

Hum, wondering if having numpy as a required dependency wouldn't make things simpler. I presume most people who use GDAL Python bindings enable numpy. Perhaps something to raise on the mailing list to see if there are reactions to such proposal.

I don't think I've ever heard about people using GDAL Python bindings without numpy.
I think that it would make perfect sense to discuss it on the mailing list.

@latot
Author

latot commented Jul 7, 2023

@avalentino Great to know all that.

One extra point, described above, is that setuptools right now only allows you to import a module when it is declared in setup, and only after the call: #8069 (comment). It is explained better in the main post.

So even if you have numpy installed, you will not be allowed to import numpy with the current setup.py.

With the current behavior of setuptools, even pip install gdal[numpy] in a clean venv will fail... it is still a mystery to me why Debian/Ubuntu do not show the default behavior of setuptools... which setuptools versions are you using?

@rouault
Member

rouault commented Jul 7, 2023

@latot

$ python3 -m venv myvenv

$ source myvenv/bin/activate

$ pip list
Package    Version
---------- -------
pip        21.2.3
setuptools 57.4.0

$ python3 -m pip install -U pip setuptools wheel numpy
Requirement already satisfied: pip in ./myvenv/lib/python3.10/site-packages (21.2.3)
Collecting pip
  Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
Requirement already satisfied: setuptools in ./myvenv/lib/python3.10/site-packages (57.4.0)
Collecting setuptools
  Using cached setuptools-68.0.0-py3-none-any.whl (804 kB)
Collecting wheel
  Using cached wheel-0.40.0-py3-none-any.whl (64 kB)
Collecting numpy
  Using cached numpy-1.25.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Installing collected packages: wheel, setuptools, pip, numpy
  Attempting uninstall: setuptools
    Found existing installation: setuptools 57.4.0
    Uninstalling setuptools-57.4.0:
      Successfully uninstalled setuptools-57.4.0
  Attempting uninstall: pip
    Found existing installation: pip 21.2.3
    Uninstalling pip-21.2.3:
      Successfully uninstalled pip-21.2.3
Successfully installed numpy-1.25.0 pip-23.1.2 setuptools-68.0.0 wheel-0.40.0

$ pip list
Package    Version
---------- -------
numpy      1.25.0
pip        23.1.2
setuptools 68.0.0
wheel      0.40.0

$ python3 -m pip install gdal[numpy] --verbose
Using pip 23.1.2 from /home/even/gdal/gdal/build_cmake/myvenv/lib/python3.10/site-packages/pip (python 3.10)
Collecting gdal[numpy]
  Using cached GDAL-3.7.0.tar.gz (775 kB)
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info
  writing /tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/SOURCES.txt'
  writing manifest file '/tmp/pip-pip-egg-info-epo1biei/GDAL.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy>1.0.0 in ./myvenv/lib/python3.10/site-packages (from gdal[numpy]) (1.25.0)
[....]

@latot
Author

latot commented Jul 7, 2023

@rouault Can you try upgrading pip + setuptools?

remember no wheels O.o

@latot
Author

latot commented Jul 7, 2023

Hi!, I was able to reproduce it in debian!

python3 -m venv pyenv
source pyenv/bin/activate
python -m pip install --upgrade pip setuptools
pip install numpy
# without -vvv there are no logs about numpy
pip install -vvv gdal
  WARNING: numpy not available!  Array support will not be enabled

@rouault
Member

rouault commented Jul 7, 2023

Hi!, I was able to reproduce it in debian!

ok I can now reproduce "pip install numpy" being ignored by "pip install gdal", when not installing wheel (I was probably confused by one of your previous reports)

@pl-kevinwurster

@rouault I just stepped through this problem and would like to echo @avalentino's summary (#8069 (comment)) – especially the bit about --no-build-isolation.

Ultimately I agree that this is a flaw in the Python packaging ecosystem. Ideally the Python Packaging Authority would have provided one of these paths:

  • Install extra requirements before the primary package.
  • Provide a mechanism for modifying setup_requires dynamically based on which extra is being installed. I believe this is the recommended solution per the discussion on pypa/setuptools that @latot opened, although I find it to be overly complicated.

However, I am also struggling to see why a user would want to install the gdal Python package, and not Numpy.


The current method of installing numpy before gdal also means that it is very difficult to get a stable version of numpy installed. For example, let's say that:

  • I am a user attempting to install some package that lists a numpy>=1.20 dependency.
  • I am installing package into an environment that already has numpy v1.17.

In order to install gdal I need to install Numpy first, however I cannot find a mechanism for discovering that I actually need to install numpy>=1.20. Ideally I could do:

$ pip install package --only-deps

to install only package's dependencies, but this option does not exist. Issue pypa/pip#11440 requests this feature, but it has not yet been implemented. As far as I can tell, pip does not provide a mechanism for listing a package's dependencies without installing it.

So, if I invoke these commands to install gdal, the result is that gdal's extensions are compiled against numpy v1.17, which is then replaced by whatever satisfies numpy>=1.20. Given how gdal uses numpy I suspect this doesn't matter – especially since I suspect this is happening today in many installations and it seems nobody has complained. But it is still surprising behavior that could present problems in the future, possibly once numpy v2.0 is released?

# Discovers that 'numpy' 1.17 is already installed and does nothing
$ pip install numpy

# Compiles 'gdal' against 'numpy' 1.17
$ pip install gdal --no-build-isolation --no-binary gdal

# Uninstalls 'numpy' 1.17 and installs some version >= 1.20
$ pip install package

@rouault
Member

rouault commented Dec 4, 2023

Mailing list probed about requiring numpy: https://lists.osgeo.org/pipermail/gdal-dev/2023-December/058064.html

@pl-kevinwurster

Great, thank you!

@nilason
Contributor

nilason commented Dec 4, 2023

Some related background: https://peps.python.org/pep-0517/

rouault added a commit to rouault/gdal that referenced this issue Dec 6, 2023
…ult, unless the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY env var is set (refs OSGeo#8069)
@rouault
Member

rouault commented Dec 6, 2023

cf #8926 for hopefully a resolution of this issue

rouault added a commit that referenced this issue Dec 14, 2023
…nt (#8926)

Fixes #8069.

Hopefully a compromise that will satisfy most needs...
- add a pyproject.toml with numpy as a build requirement
- make absence of numpy at build time an error by default, unless the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY env var is set.
- despite numpy being a build requirement, numpy remains an optional dependency at install time if users just install with "pip install gdal" (for folks that would do vector only operations with GDAL)
- for people needing numpy support, they have to use "pip install gdal[numpy]".

* swig/python/CMakeLists.txt: check return value of execute_process() at various places
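
The second point of the commit message (a missing numpy is an error unless the environment variable is set) can be sketched as follows. This is an illustration, not the code merged in #8926: the function name numpy_support_enabled is hypothetical, and treating any value of GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY as "set" is an assumption, since the thread does not say how the value is interpreted.

```python
import os


def numpy_support_enabled(numpy_available, environ=None):
    """Decide whether to build with numpy support.

    Returns True if numpy is available, False if it is missing but the
    opt-out variable is present, and raises otherwise.
    """
    environ = os.environ if environ is None else environ
    if numpy_available:
        return True
    # Assumption: the mere presence of the variable counts as "set".
    if "GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY" in environ:
        return False
    raise RuntimeError(
        "numpy is not available at build time; set "
        "GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY to build without array support"
    )


if __name__ == "__main__":
    print(numpy_support_enabled(False, {"GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY": "1"}))
```

Failing by default turns the easily missed warning from the original report into an explicit build error.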
@Jonasmpi

Jonasmpi commented Apr 3, 2024

I don't want to reopen this issue, but going back to gdal 3.6.2 with python3.11 and following this answer solved the problem for me. I tried every suggestion here and also everything I could find on the internet. So maybe this will help someone in the future.

https://gis.stackexchange.com/a/465888

My adaptation for my Dockerfile

# To make sure it is not installed by any means
RUN pip uninstall -y gdal || true

RUN pip install numpy

RUN pip install -U setuptools wheel

RUN pip install --no-build-isolation --no-cache-dir --force-reinstall gdal==3.6.2


6 participants