
Join forces? #18

Open
JanWielemaker opened this issue Nov 1, 2023 · 15 comments
Labels
question Further information is requested

Comments

@JanWielemaker commented Nov 1, 2023

Hi Team,

I'm the lead developer of SWI-Prolog. You may be aware that SWI-Prolog has a new Python interface called "Janus". Given that, we want an easy way to get Python, SWI-Prolog and Janus in its two incarnations (as a Prolog library to embed Python and as a Python package to embed Prolog) installed everywhere. As is, the Prolog build picks up Python when the embedding headers and library are found. Pip may be used to build Janus (named janus_swi to avoid ambiguity with an existing package as well as with Janus for XSB Prolog) as a Python package on Linux, MacOS and Windows (using MSVC).

Conda came to mind and I started a feedstock. You can find that at https://github.com/SWI-Prolog/swi-prolog-feedstock

Today I was happy to find this repo. My aim is to come to a recipe that supports at least Linux, MacOS and Windows. Linux and MacOS are working; Windows somehow stalls during the build.

Are you interested in merging my work into this repo? My aim is to make it cross-platform, support the new Janus interface and provide regular updates that follow the development branch, if this can be achieved with little overhead. My hope is that there is some more knowledge about Conda here to help streamline the feedstock.

   --- Jan
@JanWielemaker JanWielemaker added the question Further information is requested label Nov 1, 2023
@bollwyvl (Contributor) commented Nov 1, 2023

Certainly happy for the help. Please make an issue to request adding yourself to the maintainer team. The biggest thing that would help is refactoring this to generate multiple outputs, so we can emit smaller individual packages to remove e.g. the openjdk dependency, akin to the debian split. This increases build time, but makes it easier for downstreams to select just the features they need, while still getting a coherent solve.

As to language-specific binding outputs: initially, it would likely only make sense to export other language bindings if they follow the same version scheme, and/or are released from the same repo. The main challenge is that each additional factor increases the build matrix size and time-to-complete. If janus is fundamentally a separate project, with a different version, it may be better handled as a downstream, which depends on a specific version of the upstream headers. The win here is that when a new Python release drops, it would be possible to rebuild just the Python pieces, rather than the whole system.

As to tracking development upstreams: this is possible, but typically is done on a case-by-case basis, on a separate, long-running branch, and with a custom channel_targets, e.g. conda-forge/label/swi_prolog_dev. The bot automation which is the core of conda-forge basically won't handle any version updates on such branches, but will handle platform/compiler upgrades.

@JanWielemaker (Author) commented

Thanks for the supportive feedback. I hope to fix the Windows build issue first. Then I'll come back here and see whether we can deal with the three issues. With some luck we each understand the other half of the problem 😄 Might take some time (read: possibly weeks).

@JanWielemaker (Author) commented

If janus is fundamentally a separate project, with a different version, it may be better handled as a downstream, which depends on a specific version of the upstream headers. The win here is that when a new Python release drops, it would be possible to rebuild just the Python pieces, rather than the whole system.

To answer this: "half". Building the Prolog extension is part of the rather involved Prolog build process. When built as a Python package, it is a stand-alone git submodule of the SWI-Prolog repo that is built using pip; setup.py finds the SWI-Prolog headers and libraries.

@bollwyvl (Contributor) commented Nov 1, 2023

submodule

conda-forge really, really prefers building from canonical tarballs... if this is the road that needs to be taken, then having a "fat" upstream source release, with all de-referenced submodules included and a published sha256 sum, would be substantially better.

@JanWielemaker (Author) commented

canonical tarballs

That is available for each SWI-Prolog release. Not for the Python interface alone, but of course that can be created.

@JanWielemaker (Author) commented

Sorry for the delay. Working on install scripts is not really my favorite, in particular on Windows where each build takes 16 minutes ... Anyway, at https://github.com/SWI-Prolog/swi-prolog-feedstock is now my recipe that works on Linux, MacOS (tested arm64 only) and Windows using VS2022. That is, it produces a complete SWI-Prolog environment with bi-directional integration with Python. I won't say it is a good feedstock (it is not), but SWI-Prolog received the changes needed to make this work smoothly and the feedstock contains all dependencies. Some remarks:

  • As is, it works using git as source. That is way easier for testing. In a few days I will probably release 9.1.20 and we can use that tarball.
  • The Windows version uses VS2022. This produces a complete, but rather slow version. MinGW can produce much faster binaries. Are Windows Conda versions either MSVC or MinGW based, or do they allow for a mix of compilers?
  • Something similar applies to MacOS, though Clang ends somewhere between GCC and MSVC.
  • I'm not so happy with the way dependencies are selected. For many, I'd simply like to say "get me this dependency if you have it", as the installer falls back to a plan B if the package is not available. I would like to distinguish required and optional dependencies. Is that possible?

And then we have the issues of outputs, channel_targets and abi_migration_branches. Can someone make a proposal for that? Roughly, we have

  • The core system. That requires some version of zlib and optionally gmp. The gmp binding currently only works on Linux. Without it, a modified bundled version of LibBF provides the same functionality (but is generally slower).
  • A lot of extensions that have no dependencies (either Prolog code or Prolog+stand-alone C/C++)
  • Various extensions with large dependencies: ssl (OpenSSL or LibreSSL), odbc (UnixODBC, native Windows ODBC), JPL (any Java SDK), archive (libarchive, needed to install Prolog packs, so you want that), Janus (Python) and xpce (X11, for Windows the native Win32 API).
  • Janus creates two extensions: one to load Python into Prolog and one to load Prolog into Python.

@bollwyvl (Contributor) commented Dec 3, 2023

In the main, I'd just start a pull request to conda-forge once the tarball is available. It could even start with just linux-64, and initially just end up on conda-forge/label/swi_prolog_experimental, but this can all be handled after a PR is started.

MacOS (tested arm64 only)

conda-forge can only cross-compile and not test for that platform on CI at the moment, which will of course complicate things. Getting osx-64 working and tested first is a gate that would have to be passed before worrying about the larger headache.

use the tarball

Yep, hard block on that.

mix of compilers

If the compiled thing is an entirely standalone binary, then yes. If it links against python/numpy/gdal/etc, then no, I don't believe so. It's also possible to have variants that use multiple compilers, but that takes rather a lot of work.

I am also not an expert on these topics; a (pared down) version of that question should likely be raised on gitter to get a current ruling, as I can't find the docs for it.

required and optional dependencies

Optional dependencies aren't really a thing in conda. The only real solution at scale is multiple recipe outputs from a single recipe that scope down the requirements.

The original design of outputs had a single #/outputs/*/files with a list of things to copy, or a script that copied them dynamically, but it's also possible to have individual build files.

If they aren't linked in as binaries, such extra outputs can be pretty "dumb" metapackages that just gather run dependencies, but if they are actually linked, they'll need to be properly declared on the package that deploys the thing in /lib or wherever that needs them.

A lot of extensions

Not sure at what scale this is relevant: independently-packaged extensions, or extensions created through this recipe? Either way, if possible they should just be separate packages. Indeed, if the python side of the house can be built independently, it can be a whole separate recipe.

@JanWielemaker (Author) commented

Thanks. I guess we can simply disable the tests for osx/m*? Packaging and dependencies are still somewhat unclear to me. Let me try to describe.

The core system comes with a number of optional extensions, some of which have serious external dependencies. The build process for these dependencies is pretty much wired into the CMake build infrastructure though. I think I understand I could use one build project and then use multiple output sections that create multiple Conda packages, each with its own runtime requirements? That would be pretty close to what is done for the Debian packages. Is it possible to trigger an action on installing or removing such a package? The current system maintains a single library index file that needs to be rebuilt after adding or removing extensions.

For embedding Python into Prolog, this is just one of the extensions mentioned above. The Python package to embed Prolog can be built as an independent package from its own source. This requires a tarball from one of the git submodules. Is the best option to distribute that to PyPI as a pip package first?

@bollwyvl (Contributor) commented Dec 7, 2023

simply disable the tests

generally tests (or even built binaries) are never executed for cross-compiled-under-emulation artifacts.

use one build project and then use multiple output sections that create multiple Conda packages, each with its own runtime requirements?

yes, as long as the thing-under-test at the lowest level (e.g. swi-prolog-base) doesn't fail to run at all without the external dependencies installed.

There are a number of syntax helpers for making sure these end up with robustly-solvable packaging. In the "one-true-build" pattern, a move-files.sh knows how to correctly deploy all the little bits and pieces based on ${PKG_NAME}:

build:
  script: # all the main complexity here
requirements:
  build:
    - {{ compiler('c') }}
    # whatever else
  host:
    - python  # this gets expanded in the CI matrix
outputs:
  - name: swi-prolog-base
    script: move-files.sh
    requirements:
      run: # ...
    test:
      commands:
        - swi-prolog --version

  - name: swi-prolog-python-inside-prolog
    script: move-files.sh
    requirements:
      run:
        - python  # will be magically pinned
        - {{ pin_subpackage('swi-prolog-base', exact=True) }}
    test:
      commands:
        - swi-prolog-but-using-python-somehow

  - name: swi-prolog-prolog-inside-python
    script: move-files.sh
    requirements:
      run:
        - python  # will be magically pinned
        - {{ pin_subpackage('swi-prolog-base', exact=True) }}
    test:
      imports:
        - swi_prolog_inside_python
      commands:
        - pip check  # ensure any _other_ python deps aren't screwed up
        - swi-prolog-but-using-python-somehow --version
      requires:
        - pip

  # the 12 days of swi-prolog...

  - name: swi-prolog-and-a-pear-tree
    requirements:
      run:
        - python
        - {{ pin_subpackage('swi-prolog-base', exact=True) }}
        - {{ pin_subpackage('swi-prolog-python-inside-prolog', exact=True) }}
        - {{ pin_subpackage('swi-prolog-prolog-inside-python', exact=True) }}

git submodules

no git checkouts or submodules will be used in conda-forge CI. this is not my call.

the biggest win is to have upstream CI build and upload a "fat" tarball as part of the release process (not a git(hub|lab) workflow artifact, which doesn't get a real URL). ideally, the release artifacts also include a SHA256SUMS file containing the sha of the fat tarball, as these won't change randomly based on decisions made by the hosting platform. even more ideal is if every single LICENSE, COPYRIGHT, NOTICE, whatever, of every vendored (or build-time-downloaded) thing is also extracted out into a well-known location, either in the "fat" tarball or at the end of the build.
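as a rough sketch of that release step (function and file names hypothetical), assuming the submodules are already checked out into the source tree:

```python
# Hypothetical CI release step: bundle a fully checked-out source tree
# (submodules included) into a "fat" tarball and append its SHA-256 to a
# SHA256SUMS file next to it.
import hashlib
import tarfile
from pathlib import Path


def make_fat_tarball(source_dir: str, out_dir: str, name: str) -> Path:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tarball = out / f"{name}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        # arcname gives the archive a single, predictable top-level directory
        tar.add(source_dir, arcname=name)
    digest = hashlib.sha256(tarball.read_bytes()).hexdigest()
    # "<sha256>  <filename>" is the format `sha256sum -c` understands
    with open(out / "SHA256SUMS", "a") as sums:
        sums.write(f"{digest}  {tarball.name}\n")
    return tarball
```

the resulting SHA256SUMS can then be verified on the feedstock side (or by anyone) with `sha256sum -c SHA256SUMS`.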

For embedding Python into Prolog

if this builds against libpython, this will currently, unavoidably, trigger os × arch × python builds, as conda does not support the relatively newfangled ABI stuff for "universal binary" wheels... which apparently doesn't even work yet, introducing yet another thing, beyond the python major version, sadly.

PyPi as a pip package first?

yes, canonical source upstreams separate from forge URLs are always a win. this can take some heat off building the main "platform" package, where the possibility of a CI timeout is very real.

In the case of PyPI: a .tar.gz sdist is the preferred input, even if it's... not really viable for a drive-by pip install user to be able to build the thing, but it helps the overall understandability of the system (think CVE reporting, etc).

Everything that can be a separate feedstock is a win, even if they are built from the same upstream fat tarball... unless the above × python matrix factor is already being exercised, and the marginal time for building the package is "trivial".

@JanWielemaker (Author) commented

Thanks. I think I get the idea 😄 The arm64 version could be a real problem if the system is created using cross-compilation. Several parts of the build process run Prolog itself. The build system supports two routes for cross compilation. The simple route is if there is an emulator to run the binary (e.g., Wine for cross-compilation for Windows on Linux). The complicated route is to also generate a compatible native Prolog in the same CMake build tree and use the native executable to run the Prolog steps of the build. I think we should skip that for now ...

I'll first look into avoiding shared index files that complicate adding and removing extensions. That would also simplify various Linux distributions. Next, give the outputs thing a try. More will follow ...

@JanWielemaker (Author) commented

Sorry for the delay. I did the rewrites to make generating sub-packages and combining them easier. The current result is at https://github.com/SWI-Prolog/swi-prolog-feedstock/tree/outputs

Currently it only works on Linux. I'll take care of the porting after we get the structure right for Linux.

I should have re-read your skeleton more carefully, as I see you can use a single move-files.sh and make it work depending on the package name. I now use one script per output and use Python scripts to avoid having to write both Windows and non-Windows versions. Conda complains about this file type though:

WARNING: Not detecting used variables in output script /home/janw/src/conda/swi-prolog-feedstock/recipe/install-ssl.py; conda-build only knows how to search .sh and .bat files right now.
WARNING: not adding activation to /home/janw/miniconda3/conda-bld/swi-prolog_1703194081926/work/install-tests.py - I don't know how to do so
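For illustration, a minimal sketch of such a shared install script, keyed on conda-build's PKG_NAME environment variable (the package names and file lists here are hypothetical):

```python
# Sketch of a single install script shared by all outputs: conda-build sets
# PKG_NAME (and PREFIX, SRC_DIR) in the environment of output scripts, so
# one script can decide which files to copy for each output. The package
# names and file lists below are purely illustrative.
import os
import shutil
from pathlib import Path

# Map each output package to the files it should deploy (hypothetical)
FILES = {
    "swi-prolog-base": ["bin/swipl", "lib/libswipl.so"],
    "swi-prolog-ssl": ["lib/swipl/library/ssl.pl"],
}


def install(pkg_name: str, build_dir: str, prefix: str) -> list:
    """Copy the files belonging to pkg_name from build_dir into prefix."""
    copied = []
    for rel in FILES.get(pkg_name, []):
        src = Path(build_dir) / rel
        dst = Path(prefix) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(str(dst))
    return copied


if __name__ == "__main__" and "PKG_NAME" in os.environ:
    # conda-build exports these variables when running an output script
    install(os.environ["PKG_NAME"], os.environ["SRC_DIR"], os.environ["PREFIX"])
```

The same idea works in .sh with a `case "${PKG_NAME}" in ... esac` block; the Python version just avoids maintaining parallel Windows and non-Windows scripts.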

I use the CMake components to create the packages. That works nicely, but I think it is using the host cmake rather than the Conda one, even though I added cmake as a build requirement for each output. Is there a problem with installing the component by calling cmake as part of the output script?

As is, I have no tests. I'll figure out a way to do this cleanly. I did add a swi-prolog-tests package that you can install alongside the conda packages and that enables running the complete regression test suite in the installed environment. We could also make all other swi-prolog sub-packages requirements of this and then run the tests? Questions:

  • Is it a bad idea to use Python for moving the files? Should I write .sh and .bat files instead?
  • Can cmake be used for moving the files?
  • For the Python module I now use python -m build . in build.sh and use pip to install the wheel in (now) install-python.py. Is that ok?
  • Does the selection of outputs make sense?
  • Do you have a good reference for this "pinning"? I see it mentioned in several places, but I haven't found a good description of what it actually does.

Thanks --- Jan

@bollwyvl (Contributor) commented

bad idea to use Python for moving the files? Should I write .sh and .bat files instead?

.py is fine:

Currently the list of recognized extensions is py, bat, ps1, and sh.

Not sure about the warnings, but presumably they are there for a reason... can't say for sure. A relatively useful approach is to include posix in the win-64 environments so you can use only .sh, and avoid any weird, unexpected python special cases (aside from packages that are actually python relevant).

cmake be used for moving the files?

I don't know. Some environment detection and rewriting may not be correct, and, as mentioned above, it may not be using the correct cmake. Looking for active prior art with both outputs and cmake is probably better than relying on my incomplete knowledge.

use python -m build . in build.sh and use pip

Ensure that $PYTHON -m build --no-isolation and $PYTHON -m pip install --no-deps --no-build-isolation are being used to avoid any surprises.

selection of outputs make sense?

Haven't looked: I would start a (draft) PR against this repo so that there can be line-by-line discussion rather than telephone. Here is an experimental branch to target, which would upload to a new conda-forge/label/swi_prolog_experimental label when PRs are merged, so nobody's workflows break.

https://github.com/conda-forge/swi-prolog-feedstock/tree/experimental

reference to this "pinning".

Other than the conda-build docs, linked above, there are many examples in the wild that again represent more knowledge than I have.

@bollwyvl (Contributor) commented Feb 15, 2024

Over on #20, the bot is starting to find janus, etc. in the distribution. Going to go ahead and merge that, and not go digging around on how to do the new builds, and will wait to hear whether a PR is coming for the work described here (splitting/new package outputs) before looking into it more deeply.

@bollwyvl (Contributor) commented

Actually, on second thought (and a build that failed), I will not merge that (or go digging much further) and will await some discussion here.
