Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update python build system #701

Merged
merged 12 commits into from
Apr 16, 2024
Merged

Update python build system #701

merged 12 commits into from
Apr 16, 2024

Conversation

gspowley
Copy link
Member

@gspowley gspowley commented Apr 12, 2024

The goal of this PR is to update the python build system to the more modern scikit-build-core, with inspiration from TileDB-Inc/TileDB-Vector-Search.

Looking for feedback from build experts, especially related to the current Windows failure and releasing to pypi in the future.

@jdblischak
Copy link
Collaborator

related to the current Windows failure

I think the Windows failure is due to the loss of the flag --libtiledbvcf. The Windows build setup is similar to that used by tiledb-vcf-feedstock. The C++ library is installed into the conda environment, and then the Python client finds it via --libtiledbvcf (note that in the feedstock the linux and osx builds also use this strategy).

TileDB-Vector-Search and tiledb-vector-search-feedstock are different. The recipe does not have a separate output for the C++ library. Since the C++ library is always vendored with the Python package, there is no need to have a command line flag to help the Python package find it.

TileDB-VCF is more similar to TileDB-SOMA, and it is still using setup.py. In additional to passing command line flags to setup.py, it also supports setting env vars so that you can use pip install.

I don't know if we have an example Python package that uses scikit-build-core but also maintains the feature of finding an existing shared library. @dudoslav and @ihnorton will have to advise on that aspect.

@jdblischak
Copy link
Collaborator

Also note that the macOS Python build works without --libtiledbvcf only because I put libtiledbvcf in dist/lib, which is exactly where CMake looks for it.

- name: Download libtiledbvcf artifact
uses: actions/download-artifact@v3
with:
name: libtiledbvcf
path: dist

Also, as an aside, I see that this macOS job suffers from the same SIP problems I just fixed yesterday in single-cell-data/TileDB-SOMA#2435. That is why DYLD_LIBRARY_PATH is empty. I'll address that in a separate PR later.

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉. One high-level comment: we should probably make the libtiledbvcf cmake build do a configured install that can be found with find_package instead of the individual find_library calls for libtiledbvcf. Probably not necessary to get this PR working, but if we don't do it here then @dudoslav could take a look as a backlog item.

find_library(
TILEDBVCF_LIB
NAMES tiledbvcf
PATHS "${CMAKE_CURRENT_SOURCE_DIR}/../../dist/lib"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the direct cause of the error, but this will almost certainly be /bin on windows. I believe if you remove /lib then find_library will search both on Windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hmm maybe not IIUC this override

LIBRARY DESTINATION lib

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made windows happy with proper path manipulation, which could move to an improved libtiledbvcf cmake structure.

PATHS "${CMAKE_CURRENT_SOURCE_DIR}/../../dist/lib"
)
endif()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably check TILEDBVCF_LIB-FOUND here to make sure one of the statements succeeded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code to check for success. Moving to find_package would definitely simplify this CMakeLists.txt file.

@gspowley
Copy link
Member Author

I think the Windows failure is due to the loss of the flag --libtiledbvcf.

Thanks @jdblischak, this was the problem. We can now pass in the location of libtiledbvcf with the LIBTILEDBVCF_DIR environment variable.

we should probably make the libtiledbvcf cmake build do a configured install that can be found with find_package instead of the individual find_library calls for libtiledbvcf.

I agree @ihnorton. This PR works for now and @dudoslav could improve it in the future to be more idiomatic.

@gspowley gspowley marked this pull request as ready for review April 13, 2024 22:26
@gspowley gspowley requested a review from awenocur April 13, 2024 22:26
Copy link
Contributor

@dudoslav dudoslav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nit-picks and questions. Before merging please:

  1. Create sdist and wheel packages
  2. For each of these, install them and print out their version (something like print(tilevcf.version) if it exists)
  3. Optional: Also test on Windows, because I remember issues with it in TileDB-Vector-Search

"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this pyproject.toml is missing dependencies, however, even the setup.py does not have any requires (except the build_requires). Doesn't this project really depend on any other packages? (tiledb, tiledb-cloud, numpy, etc?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been a long-standing frustration of mine since it makes it difficult to know what needs to be installed (eg for nightly builds or conda recipes), eg most recently in #669.

I think this overhaul of the Python build system is a good time to explicitly list the required and optional dependencies

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Since VCF has been a conda only install so far, we've push dependency management to the conda environments. We will make the dependencies clear in pyproject.toml.

build-dir = "build/{wheel_tag}"
metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"
sdist.include = ["src/tiledbvcf/version.py"]
sdist.cmake = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is needed. This just runs CMake during sdist packaging process. In Vector-Search we use this to generate version files/query git status, since .git is not present in sdist package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can remove this.

cmake_minimum_required(VERSION 3.21)

project(
${SKBUILD_PROJECT_NAME}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to use SKBUILD_PROJECT_NAME because it implies that we are going to run this CMakeLists.txt only in scikit-build-core environment. Same goes for the version. But this is just a suggestion, if you think that no-one will ever call this cmake file on its own then keep it as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this CMakeLists.txt will only be called in the scikit-build-core environment. Let's keep it as.

message(STATUS "Searching for libtiledbvcf in ${LIBTILEDBVCF_SEARCH_PATHS}")

# TODO: Use find_package() instead of find_library() to find libtiledbvcf.
find_library(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use find_package? Or if this is not a different package but build by this project then add_subdirectory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was @ihnorton's comment too. The libtiledbvcf cmake files will need some work to support a different approach. I'll add that task as a backlog item instead of blocking this PR.


target_include_directories(
${VCF_TARGET_NAME} PRIVATE
${pybind11_INCLUDE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed. PYBIND headers are included automatically by pybind11_add_module.

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use cibuildwheel instead of this script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. These are VCF's first steps towards building wheels and suggestions for standard processes are welcome.

@@ -0,0 +1,7 @@
black
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between the dev requirements and requirements for the package? (This is genuine question I don't know :D )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment above (https://github.com/TileDB-Inc/TileDB-VCF/pull/701/files#r1565992899), I agree that this file requirements-dev.txt is confusing because it includes more than just development-only dependencies. Here is how I would classify them based on my current understanding:

  • dev (only needed for local development and CI): black
  • runtime (required to import tiledbvcf because they are hit by __init__.py): pandas, pyarrow, pyarrow-hotfix
  • optional (conditionally used if installed): dask
  • test (required to run the test suite): dask, pytest, tiledb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements-dev.txt was only there for building a local dev environment. As mentioned above, we'll make the dependencies clear in a more standard way.

@jdblischak
Copy link
Collaborator

The nightly builds against libtiledb 2.20 are failing with the CMake 2.29.1 error. @gspowley please rebase your PR onto main to incorporate 23565d2, (#700), which bumps the nightlies to build against libtiledb 2.21 (which had the fix backported)

@jdblischak
Copy link
Collaborator

this was the problem

Great! Glad this was able to fix the Windows build

We can now pass in the location of libtiledbvcf with the LIBTILEDBVCF_DIR environment variable.

@gspowley how strong is your preference for LIBTILEDBVCF_DIR over the existing LIBTILEDBVCF_PATH env var? Are you following a convention for these types of env vars? I would default to keeping the env var name the same for two primary motivations:

  1. Avoid breaking any existing setups. Currently users can install tiledbvcf-py with pip install plus LIBTILEDBVCF_PATH. If the env var name is kept the same, the transition to scikit-build-core would be seamless
  2. Consistency with TileDB-SOMA. The setup.py for tiledbsoma-py uses the env vars TILEDB_PATH and TILEDBSOMA_PATH

@jdblischak
Copy link
Collaborator

I tested this PR in my fork of tiledb-vcf-feedstock.

Note that the failing win-64 build is a known issue due to linking to libfmt (discussed in TileDB-Inc/tiledb-vcf-feedstock#116). The error messages are the same as in the failing CI for those most recently merged commit, TileDB-Inc/tiledb-vcf-feedstock@761620c

@gspowley
Copy link
Member Author

how strong is your preference for LIBTILEDBVCF_DIR over the existing LIBTILEDBVCF_PATH

@jdblischak not very strong. I was pattern matching the environment variables for finding tiledb, htslib, spdlog, catch, and cli11. I value not breaking things over matching names, so we can go back to LIBTILEDBVCF_PATH.

@gspowley gspowley merged commit d7bc780 into main Apr 16, 2024
23 checks passed
@gspowley gspowley deleted the gspowley/update-python-build branch April 16, 2024 13:57
@jdblischak
Copy link
Collaborator

I tested the most recent changes in the feedstock. The linux and osx builds are now failing with the following configuration error that has something to do with aws-c-auth. Note that I replaced the long environment path with $CONDA_PREFIX to improve readability:

https://dev.azure.com/jdblischak/feedstock-builds/_build/results?buildId=744&view=logs&j=656edd35-690f-5c53-9ba3-09c10d0bea97&t=986b1512-c876-5f92-0d81-ba851554a0a3

  -- Found PythonInterp: $CONDA_PREFIX/bin/python (found suitable version "3.10.14", minimum required is "3.6")
  -- Found PythonLibs: $CONDA_PREFIX/lib/libpython3.10.so
  -- Performing Test HAS_FLTO
  -- Performing Test HAS_FLTO - Success
  -- Found pybind11: $CONDA_PREFIX/lib/python3.10/site-packages/pybind11/include (found version "2.12.0")
  -- Searching for libtiledbvcf in /home/conda/feedstock_root/build_artifacts/tiledb-vcf_1713276872393/work/apis/python/../../dist/lib;
  -- Found libtiledbvcf: $CONDA_PREFIX/lib/libtiledbvcf.so
  CMake Error at CMakeLists.txt:122 (install):
    install FILES given directory
    "$CONDA_PREFIX/lib/aws-c-auth"
    to install.


  -- Configuring incomplete, errors occurred!

  *** CMake configuration failed

Also, today I noticed some CMake warnings. These were there previously, so they aren't fatal, but they might be worth addressing:

  Building wheel for tiledbvcf (pyproject.toml): started
  Running command Building wheel for tiledbvcf (pyproject.toml)
  *** scikit-build-core 0.8.2 using CMake 3.29.2 (wheel)
  *** Configuring CMake...
  2024-04-16 14:26:58,194 - scikit_build_core - WARNING - Unsupported CMAKE_ARGS ignored: -DCMAKE_BUILD_TYPE=Release
  2024-04-16 14:26:58,194 - scikit_build_core - WARNING - Unsupported CMAKE_ARGS ignored: -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
  2024-04-16 14:26:58,194 - scikit_build_core - WARNING - libdir/ldlibrary: $CONDA_PREFIX/lib/libpython3.10.a is not a real file!
  2024-04-16 14:26:58,194 - scikit_build_core - WARNING - Can't find a Python library, got libdir=$CONDA_PREFIX/lib, ldlibrary=libpython3.10.a, multiarch=x86_64-linux-gnu, masd=None
  2024-04-16 14:26:58,348 - scikit_build_core - WARNING - Unsupported CMAKE_ARGS ignored: -DCMAKE_BUILD_TYPE=Release
  2024-04-16 14:26:58,348 - scikit_build_core - WARNING - Unsupported CMAKE_ARGS ignored: -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
  loading initial cache file build/cp310-cp310-manylinux_2_17_x86_64/CMakeInit.txt

@gspowley
Copy link
Member Author

The linux and osx builds are now failing with the following configuration error

Thanks for the heads up. I'll follow up with a PR to fix this later today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants