
Vendoring Python Wheels as Artifacts #1439

@staticfloat


We have a pretty good Python interop story, but it lacks many of the reproducibility guarantees that Pkg3 has. In particular, when Python packages come from the system, or even from Conda.jl, the Python dependencies are managed separately from the Julia dependencies, so Julia and Python packages can get out of sync and break. To resolve this, I propose a technique for creating Python "virtual environments" that give us fully controlled Python installations.

System design constraints:

  • pkg> instantiate should "just work". No matter how much time has passed, you must be able to get back exactly the same Python packages you had before, so that PyCall, IJulia, etc. all keep "just working" far into the future, no matter how much breaking progress the Python ecosystem makes.

  • Upgrading Python packages should be simple: not via pkg> upgrade, but through some separate, relatively simple mechanism.

  • Isolation from the system. System Python packages should neither interfere with nor assist these packages in any way.

Reading this list of design constraints, you might think it sounds an awful lot like what I've been working on for JLL packages and Pkg Artifacts, and you would be correct. At least I'm consistent in the kinds of ideas I come up with. Since Artifacts are the hammer of the day, as it were, let's recklessly apply them here and see what kind of system we can create:

  • Bundle a Python interpreter as an artifact, e.g. Python_jll. Not too difficult.

  • Translate Python packages into artifacts. Something like translate_py_pkg(name::String, version = nothing) would hit PyPI's JSON API for a listing of versions, then generate an Artifacts.toml entry for that Python package by downloading, extracting, and tree-hashing it.

    • Pure-source Python packages are usually tarballs.
    • Wheels are zipballs (we'll need .zip support for this).
    • Explicitly do not support any Python package that is neither pure-source nor a wheel. Anything else probably requires arbitrary code execution to install.
  • Once Python packages are downloaded as artifacts, we set PYTHONPATH appropriately before loading libpython or invoking python, so that these packages are found.

  • Future invocations of the Julia package manager will see these binary blobs attached to the current project and re-instantiate them from PyPI.
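The translate_py_pkg step can be sketched in Python. This is a minimal sketch, not Pkg code: it assumes `pypi_json` is the parsed response from PyPI's JSON API (`https://pypi.org/pypi/<name>/json`, whose `releases` mapping lists the files published for each version), keeps only wheels and `.tar.gz` source tarballs, prefers a pure-Python wheel, and renders an Artifacts.toml stanza in Pkg's `git-tree-sha1` plus `[[name.download]]` layout. The helpers `select_file` and `artifact_stanza` are hypothetical, and the `tree_sha1` argument has to come from hashing the extracted archive separately.

```python
from typing import Any, Dict

# Only wheels and source tarballs are accepted; eggs, .exe/.msi installers,
# and .zip sdists are all rejected.
ACCEPTED_SUFFIXES = (".whl", ".tar.gz")


def select_file(pypi_json: Dict[str, Any], version: str) -> Dict[str, Any]:
    """Pick one file for `version`: a pure-Python wheel if available, else an sdist tarball."""
    files = pypi_json["releases"].get(version)
    if not files:
        raise KeyError(f"no files published for version {version!r}")
    usable = [f for f in files if f["filename"].endswith(ACCEPTED_SUFFIXES)]
    # "-none-any.whl" marks a wheel with no ABI or platform dependence.
    pure_wheels = [f for f in usable if f["filename"].endswith("-none-any.whl")]
    sdists = [f for f in usable if f["packagetype"] == "sdist"]
    candidates = pure_wheels or sdists
    if not candidates:
        raise ValueError(f"no pure wheel or source tarball for version {version!r}")
    return candidates[0]


def artifact_stanza(name: str, file_info: Dict[str, Any], tree_sha1: str) -> str:
    """Render an Artifacts.toml entry; keys are quoted so names like "zope.interface" stay valid TOML."""
    return (
        f'["{name}"]\n'
        f'git-tree-sha1 = "{tree_sha1}"\n'
        "\n"
        f'    [["{name}".download]]\n'
        f'    url = "{file_info["url"]}"\n'
        f'    sha256 = "{file_info["digests"]["sha256"]}"\n'
    )
```

Platform-specific compiled wheels are deliberately skipped here; choosing among them is where the ABI questions come in.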
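The tree-hashing step needs a content hash of the extracted package directory. Pkg's `git-tree-sha1` is, as I understand it, modeled on git's tree object hash, so here is a hedged Python sketch of a git-style tree hash. The helper names (`blob_sha1`, `tree_hash`) are hypothetical; the sketch skips empty directories because git cannot record them, and whether it matches Pkg's treatment of every edge case (empty directories, symlinks, permission bits) is an assumption, not something verified against Pkg.

```python
import hashlib
import os
import stat
from typing import Optional


def _object_sha1(kind: str, payload: bytes) -> bytes:
    # git hashes every object as "<kind> <size>" + NUL + payload.
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).digest()


def blob_sha1(data: bytes) -> str:
    """Hex SHA-1 of a file's contents, as `git hash-object` computes it."""
    return _object_sha1("blob", data).hex()


def _tree_digest(path: str) -> Optional[bytes]:
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # Symlinks are stored as blobs containing the link target.
            mode, digest = "120000", _object_sha1("blob", os.fsencode(os.readlink(full)))
        elif stat.S_ISDIR(st.st_mode):
            digest = _tree_digest(full)
            if digest is None:
                continue  # git cannot record empty directories
            mode = "40000"
        else:
            with open(full, "rb") as f:
                digest = _object_sha1("blob", f.read())
            mode = "100755" if st.st_mode & stat.S_IXUSR else "100644"
        # git orders entries by name, comparing directories as if they ended in "/".
        key = os.fsencode(name) + (b"/" if mode == "40000" else b"")
        entries.append((key, mode, name, digest))
    if not entries:
        return None
    entries.sort()
    payload = b"".join(
        mode.encode() + b" " + os.fsencode(name) + b"\0" + digest
        for _, mode, name, digest in entries
    )
    return _object_sha1("tree", payload)


def tree_hash(path: str) -> str:
    """Hex git tree hash of `path`; a directory with no files gives git's empty tree."""
    digest = _tree_digest(path)
    return (digest if digest is not None else _object_sha1("tree", b"")).hex()
```

Because the hash covers only file names, contents, and modes, re-extracting the same archive years later yields the same value, which is what lets a later instantiate confirm it got back exactly the same package.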
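Setting PYTHONPATH before invoking the interpreter could look like the following sketch. Here `python_exe` stands in for the interpreter shipped in a Python_jll-style artifact and `artifact_dirs` for the extracted package artifacts; those names, like `isolated_env` and `run_python`, are hypothetical. For pure-Python wheels, the extracted directory can usually go on `sys.path` directly, since wheels are designed to install by simple extraction. `PYTHONNOUSERSITE=1` is a real CPython environment variable that keeps the per-user site-packages directory off `sys.path`, one small piece of the isolation constraint.

```python
import os
import subprocess
from typing import Dict, Iterable, Mapping, Optional


def isolated_env(artifact_dirs: Iterable[str],
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and point PYTHONPATH at the artifact directories only."""
    env = dict(os.environ if base is None else base)
    # Any PYTHONPATH inherited from the caller is replaced, not extended.
    env["PYTHONPATH"] = os.pathsep.join(artifact_dirs)
    # Keep the per-user site-packages directory (e.g. under ~/.local) off sys.path.
    env["PYTHONNOUSERSITE"] = "1"
    return env


def run_python(python_exe: str, artifact_dirs: Iterable[str], code: str) -> str:
    """Run `code` with `python_exe` in the isolated environment and return its stdout."""
    result = subprocess.run(
        [python_exe, "-c", code],
        env=isolated_env(artifact_dirs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

This only covers the subprocess case; loading libpython in-process (as PyCall does) would presumably need the same variables set in Julia's own environment before the interpreter is initialized.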

There's some subtlety here related to the implicit Python compiler ABI. In particular, on Windows, Python assumes MSVC, which is fine until you start compiling C++ code: it's highly unlikely that Python wheels containing C++ code will link properly against Julia. This has never worked and probably never will, though, so we don't lose much here. C and FORTRAN code should interoperate just fine, so we should be okay for 95% of what we want to do; if you want to do something more complicated, you can always spin up a properly compiled Python interpreter in a separate process and communicate with it over a socket.
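Wheel filenames at least make the ABI question visible. Per PEP 427 they have the form `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so a resolver can tell pure-Python wheels from compiled ones and see which platform (e.g. `win_amd64`) a compiled wheel targets. The tags cannot distinguish C from C++ extensions, so this hedged sketch (hypothetical `parse_wheel_filename` and `is_pure` helpers) can only flag wheels that need a closer look.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: frozenset
    abi: frozenset
    platform: frozenset


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a PEP 427 wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"malformed wheel filename: {filename!r}")
    dist, version = parts[0], parts[1]
    build = parts[2] if len(parts) == 6 else None
    py, abi, plat = parts[-3:]
    # Each tag may be a '.'-separated "compressed tag set", e.g. py2.py3.
    return WheelTags(dist, version, build,
                     frozenset(py.split(".")),
                     frozenset(abi.split(".")),
                     frozenset(plat.split(".")))


def is_pure(tags: WheelTags) -> bool:
    """Pure-Python wheels declare no ABI and run on any platform."""
    return tags.abi == {"none"} and tags.platform == {"any"}
```

A wheel for which `is_pure` is false carries compiled code built against a particular ABI, so it is a candidate for the compatibility concerns described above.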
