Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include spatialindex headers #292

Merged
merged 10 commits into from
Jan 18, 2024

Conversation

JDBetteridge
Copy link
Contributor

It would be great if the spatialindex headers could be included with the packaged binary wheel of Rtree. We have two projects that currently depend on spatialindex:

  • Firedrake, which links to libspatialindex when performing code generation and we hope to in the future transition to using the Python interface that Rtree provides.
  • Supermesh, which also links to libspatialindex, but is minimally maintained for users who depend on the functionality that this library provides.

We would like to move away from building our own copy of spatialindex (twice!) to linking against the shared library packaged with Rtree. To be able to build the projects all we require are the spatialindex headers and the introspection functions that this PR provides.

For reference you can see how we would accomplish this in these PRs:

Please let me know your thoughts and suggestions 🙂

@JDBetteridge
Copy link
Contributor Author

All the CI checks appear to be passing now! Please let me know if you have any requested changes or would like me to tidy the git history, etc

@hobu
Copy link
Member

hobu commented Nov 14, 2023

I'll admit I'm anxious about this change. Is it usual procedure for wheel builders to install header files and advertise their libraries for linkage? Are there other examples of this?

It would seem to me to be a rather fraught policy.

@JDBetteridge
Copy link
Contributor Author

I'm not saying it 100% justifies the practise, but numpy do this, for one example. (import numpy; numpy.get_include())

@mwtoews
Copy link
Member

mwtoews commented Nov 14, 2023

Thanks for the PR! As for the analog with NumPyh, I see headers in numpy/core/include/numpy for both their sdist (source distribution) but also their bdist (built distribution) on PyPI.

As for this PR, I see two behaviors:

  1. If libspatialindex is built and installed (e.g. via ci/install_libspatialindex.* scripts), then the headers are included for both sdist and bdist (like NumPy on PyPI)
  2. If libspatialindex is not installed, then headers are absent from sdist and bdist

Is this expected? (I think so, but just checking).

@JDBetteridge
Copy link
Contributor Author

I think this is the correct behaviour, my intention being that if the libspatialindex being used is not installed by Rtree, then it is either

  • a system package and the headers are in some standard location, or
  • the user has built it themselves and hence knows where the headers are.

If, however, the libspatialindex being used is the one included in a binary distribution of Rtree, currently the headers aren't included.

In this PR those headers are not in a particularly discoverable place, hence the get_include() function. Perhaps this function needs to return something different in case (2) in your above comment to avoid potential issues. My intention was that in this case a user would have no reason to want to call it in the first place.

The PR itself is largely motivated by our desire to perform automatic code generation using libspatialindex, the same way we do with numpy and petsc4py.

@JDBetteridge
Copy link
Contributor Author

Not wanting to be impatient, just want to check the status of this PR 🙂

I think we are in very different time zones, so I appreciate communication might take more time!

@mwtoews
Copy link
Member

mwtoews commented Nov 22, 2023

I've been thinking more about headers for sdist and bdist. The shared library is only bundled with the bdist, so perhaps headers should only be included with these builds? Whereas the sdist doesn't imply any specific version of libspatialindex, thus shouldn't need to include headers.

@JDBetteridge
Copy link
Contributor Author

Thanks for the feedback, I will address this as soon as I get a chance. In the sdist case I imagine that the best thing for the get_include() introspection function to return is an empty string.

@JDBetteridge
Copy link
Contributor Author

I realise that fixing this requires minimal changes. Because the sdist is created in a separate CI step ("Package and push release") in an isolated environment, you already get the desired behaviour. Looking in the previous logs, you can see when building the sdist a warning is raised warning: no files found matching '*.h' under directory 'rtree/include'. I believe that if you were to additionally upload the sdist artifact (which the below patch would do, but I do not include it as a commit in case it is undesirable behaviour!) you would see that the include directory is absent.

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index e43b341..274d8e8 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -138,6 +138,10 @@ jobs:
             python3 -m pip install setuptools numpy pytest wheel
             export PATH=$PATH:/home/runner/.local/bin
             python3 setup.py sdist
+      - uses: actions/upload-artifact@v3
+        with:
+          name: sdist
+          path: dist/*.tar.gz
 
       - uses: actions/download-artifact@v3
         with:

Regardless, I have fixed the get_include() function to return an empty string if running as part of an sdist installation. This is consistent with the behaviour of get_libraries().

@JDBetteridge
Copy link
Contributor Author

Sorry if I didn't make this obvious, but I think I'm done here! Polite bump 🙂

@hobu
Copy link
Member

hobu commented Dec 8, 2023

I'm not enthusiastic about this but I'm not going to stand in the way. Because Mike has been so active with Rtree lately, I'd feel a lot more comfortable if he presses the button.

@mwtoews
Copy link
Member

mwtoews commented Dec 10, 2023

Hi yes, I'm happy to go forward with some version of this PR, but will still need some time to do a more thorough review, so appreciate the time. I hope to find a bit more time over the week.

@mwtoews
Copy link
Member

mwtoews commented Jan 11, 2024

Note that I've not forgotten this PR, and have a few local edits that I've made that I'll push here when I'm ready.

@JDBetteridge
Copy link
Contributor Author

Awesome! Thanks for the update 🙂

@mwtoews
Copy link
Member

mwtoews commented Jan 16, 2024

Ok, back on this for more questions.

What is the purpose of get_libraries(), and who would use it? It seems a bit out of place. If anyone really needs it, I suppose they could just do this:

import rtree
rtree.core.rt._name  # /path/to/lib/libspatialindex_c.so.6

As for get_include(), this mirrors nicely with numpy.get_include(). But could I move it to rtree.finder.get_include()? I have a few adjustments to make it work for Python 3.8 and installs that get libspatialindex via conda-forge or Debian-like users with libspatialindex-dev installed. I also would like to add a bit of testing.

@mwtoews
Copy link
Member

mwtoews commented Jan 17, 2024

Back to @JDBetteridge for review. A summary of relevant changes:

  • Moved get_include() to rtree.finder because it is a more appropriate module
  • Adapted it to search for headers in priority from (1) binary wheel (using importlib.metadata), (2) sys.prefix for conda, (3) /usr for system package install (e.g. libspatialindex-dev).
  • Removed rtree.core.get_libraries() and rtree.core.get_library_name() since this a bit out of scope

Some other notes is that a binary wheel with lib and include is created via python -m build --wheel, but not with python -m build, because the later command builds the wheel from the sdist. I've reworked this PR to exclude lib and include from the sdist.

The built wheel artefacts can be found here, which can be downloaded unzipped and installed via pip.

Any feedback welcome!

@JDBetteridge
Copy link
Contributor Author

Thanks @mwtoews!

To answer your first question, I added get_libraries() mainly as a convenience function for our use case, which is to compile small C kernels using symbols from the spatialindex shared library. However, I will concede that this is a fairly niche use case and I'm happy to use your suggestion of obtaining the path and shared library from the private attribute rt._name.

  • I agree that rtree.finder is a better place for the get_include() function.
  • I am not a *conda user, but the header priority appears correct. Just to double check: Your intention with this priority is to have users who install the binary wheel to use the bundled libspatialindex_c.so over any system spatialindex, unless they set the SPATIALINDEX_C_LIBRARY environment variable? I don't disagree with the choice, I just want to check I have understood correctly.
  • My key aim for this PR was to get the headers bundled with the wheel, the library introspection functions can be omitted.

I'm very happy for this PR to be merged in its current state. I have run our CI on this branch today and everything is working as expected.

@mwtoews
Copy link
Member

mwtoews commented Jan 18, 2024

The environ SPATIALINDEX_C_LIBRARY can be used to load a specified shared library at runtime, so yes you are understanding this correctly. It's mostly a developer feature, or where multiple instances of the library are available.

Thanks for testing and feedback!

@mwtoews mwtoews merged commit d816ef2 into Toblerity:master Jan 18, 2024
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants