
Discussion: support bundling libmagic #233

Closed
pombredanne opened this issue Jan 21, 2021 · 13 comments · May be fixed by #294

Comments

@pombredanne

I forked this fine code for a long while at https://github.com/nexB/typecode/blob/8e926684f260ce1cf7ffed74b2da99db97210f13/src/typecode/magic2.py

One of the key changes is that I can provide a bundled pre-built binary or use a system-provided binary for libmagic and the magic db, which is not possible here.
For instance:
https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic-linux and https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic_system_provided

I would much prefer to fold that code back here at some point.
Would you be open to having a way to provide a libmagic and db path, rather than always using the same heuristics code?

@ahupp
Owner

ahupp commented Feb 17, 2021

Thanks for starting this discussion. I think a nice way to handle this is publishing a separate package that exposes the shared lib/data file with package_data, and then making that an optional dependency of python-magic with extras_require. The extras_require is nice because we can bump the versions together, though that's not strictly necessary. What do you think?
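
The two-package idea above could be sketched roughly like this (the package name, version, and layout are invented for illustration; each dict holds the arguments the respective package's setup.py would pass to setuptools.setup):

```python
# Hypothetical setup() arguments for a data-only companion package that
# ships the shared library and magic database via package_data.
data_pkg = dict(
    name="python-magic-data",            # invented name
    version="5.45",                      # could track the libmagic version
    packages=["magic_data"],
    package_data={"magic_data": ["lib/*", "magic.mgc"]},
)

# python-magic itself would then declare the companion package as an
# optional extra, so `pip install python-magic[data]` pulls in a
# version-matched binary bundle.
main_pkg = dict(
    name="python-magic",
    extras_require={"data": ["python-magic-data==5.45"]},
)
```

Pinning the companion package's version in extras_require is what lets the two packages be bumped together, as suggested above.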

@pombredanne
Author

package_data and extras_require are the way to go indeed. And that can then be used either through a conditional try/except import or a setuptools entry-point plugin.
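
The conditional try/except import could look roughly like this (a minimal sketch; "magic_data" and its layout are hypothetical names, not an existing package's API):

```python
# Sketch: prefer a bundled libmagic if the hypothetical companion
# package is installed, otherwise fall back to the system library.
import ctypes.util
import os

def locate_libmagic():
    """Return a path or name for libmagic, preferring a bundled copy."""
    try:
        import magic_data  # hypothetical bundled-binaries package
        return os.path.join(os.path.dirname(magic_data.__file__),
                            "lib", "libmagic.so.1")
    except ImportError:
        # No bundled copy installed: search the standard system paths.
        return ctypes.util.find_library("magic")
```

The entry-point variant would do the same lookup via importlib.metadata.entry_points() instead of a hardcoded import name.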

This is more or less what we do now:

And I have a build loop otherwise in https://github.com/nexB/scancode-plugins/blob/develop/etc/scripts/fetch-plugins.sh to do the actual footwork of assembling pre-built binaries for all OSes.

So to recap: we can adapt, steal, or reuse (or not) any of the code above, for make benefit of the great libmagic!

@kratsg

kratsg commented Mar 20, 2022

In the meantime, since this discussion didn't seem to go anywhere, I went and created a pylibmagic package that should provide the appropriate libraries for most Mac/Linux distros (I need help getting Windows supported).

https://pypi.org/project/pylibmagic/

You just need to install and import this before importing magic, and there's no change needed in python-magic. All that's really needed is a patch to override the hardcoded libmagic.so.1 that python-magic uses for Linux, since this is related to a minor bug in the core Python code.
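
Illustrative sketch of avoiding a single hardcoded soname (this is not python-magic's actual code): try ctypes.util.find_library first, then fall back to common platform-specific names.

```python
import ctypes
import ctypes.util

def candidate_names():
    """Ordered list of names to try when loading libmagic."""
    names = []
    found = ctypes.util.find_library("magic")
    if found:  # may be None on some Linux setups even if the lib exists
        names.append(found)
    # Fall back to common platform-specific filenames.
    names += ["libmagic.so.1", "libmagic.dylib", "magic1.dll"]
    return names

def load_libmagic():
    for name in candidate_names():
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    raise ImportError("failed to locate libmagic")
```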

@ahupp
Owner

ahupp commented Aug 25, 2023

Merging into #293

@kratsg

kratsg commented Aug 26, 2023

> Merging into #293

Is this merging appropriate? The merged issue is specifically about Windows, but this issue is not OS-specific.

@ahupp
Owner

ahupp commented Aug 28, 2023

@kratsg Basically 100% of the issues with libmagic are on Windows, so my intent was to just solve it there. macOS and Linux both have good solutions for this. Of course, in principle, once this is set up for Windows, other platforms are straightforward, but given that Python doesn't have awesome tooling for building and shipping binaries, I'd rather keep it limited.

@kratsg

kratsg commented Aug 28, 2023

@ahupp that's fair. I've solved it for macOS and Linux via https://github.com/kratsg/pylibmagic/ right now. The solution there is that it ships a pre-built libmagic shared library with the package, so if import magic doesn't work, then

import pylibmagic
import magic

will. It does require some monkeypatching of utilities that python-magic depends on, but does so in order to make sure the shared libs are findable.
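
A hedged sketch of the kind of monkeypatching described here (illustrative only, not pylibmagic's actual implementation): wrap ctypes.util.find_library so a directory of bundled shared libs is searched before the system paths.

```python
import ctypes.util
import pathlib

# Hypothetical directory where the bundled shared libraries live.
BUNDLED_DIR = pathlib.Path("bundled_libs")

_original_find_library = ctypes.util.find_library

def patched_find_library(name):
    # Check for a bundled copy first (e.g. libmagic.so.1, libmagic.dylib),
    # then defer to the original search.
    for suffix in (".so.1", ".so", ".dylib", ".dll"):
        candidate = BUNDLED_DIR / ("lib" + name + suffix)
        if candidate.exists():
            return str(candidate)
    return _original_find_library(name)

# Installing the patch at import time is what makes
# `import pylibmagic; import magic` work without changing python-magic.
ctypes.util.find_library = patched_find_library
```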

@ddelange
Contributor

the whole idea of python-magic uploading a binary (wheel) distribution would be to package the libmagic binary into the wheel (a zip file).

from the packaging point of view, you should either:

  • only host a source distribution (which will fail to install if libmagic is not available on the system), or
  • upload linux/win/mac platform-dependent wheels that include libmagic, e.g. using cibuildwheel in GitHub Actions with a platform-aware CIBW_BEFORE_ALL=./install_libmagic.sh

full example

@ahupp
Owner

ahupp commented Aug 29, 2023 via email

@ddelange
Contributor

ddelange commented Aug 30, 2023

I only know of one other package that serves binary distributions (.whl), and still requires the user to additionally install an external binary (which will be dynamically linked / searched for at runtime): https://pypi.org/project/mxnet/

But that's only because of licensing of that one binary, which they would otherwise include in the binary distribution. Wheels are officially only allowed to dynamically link against glibc on the system, anything else needs to be included in the wheel.

Generally, you would:

  • have a source distribution (.tar.gz) that will fail to install if binaries that your setup.py tries to link (dynamically or statically) are missing. Static binaries are included in site-packages (setup.py install) or the wheel (setup.py bdist_wheel); dynamic binaries (glibc) are not copied and are assumed to remain available.
  • additionally have binary distributions that contain all (statically linked) binaries needed to use the library.
    • Wheels assume some basic stuff to be present on the system which may be dynamically linked against. For instance, manylinux_2_28 in the wheel filename stands for glibc >= 2.28 (mostly all 2020+ linux distributions like debian 10 buster, ubuntu 20.04 focal, almalinux/rhel 8, ...). When building a wheel under this assumption (example PR), pip will only install it on compatible (new enough) systems.
    • If the libmagic binary you would include in the wheel uses dynamic linking, this is relevant. Otherwise (only statically linked binaries baked into the wheel), you'll most likely end up with a manylinux2014 wheel, which goes back as far as debian 6 or something.
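
The glibc assumption behind the manylinux tags above can be inspected at runtime using only the standard library:

```python
import platform

# libc_ver() reports the C library the interpreter was linked against.
# On a manylinux_2_28-compatible system this is something like
# ('glibc', '2.31'); on musl-based distros and macOS it returns ('', ''),
# so the glibc check simply doesn't apply there.
libc, version = platform.libc_ver()
print(libc, version)
```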

> Do you feel like using outside packages from Debian, homebrew etc is a problem?

So to answer your question:

  • if you have a sdist (.tar.gz) on PyPI, the user is assumed to have installed libmagic from any source, prior to installation. Up to you to decide whether that sdist dynamically links against that libmagic, or copies it over into site-packages.
  • if you have a bdist (.whl) on PyPI, strictly it is only allowed to dynamically link against glibc, and anything else needs to be statically linked and included in the wheel.

@ahupp
Owner

ahupp commented Sep 2, 2023

@ddelange I did a random sample of some top-250 packages that are (afaik) source-only, and they all distribute a *-py3-none-any.whl:

https://pypi.org/project/typing-extensions/#files
https://pypi.org/project/requests/#files
https://pypi.org/project/wheel/#files

I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?

@ddelange
Contributor

ddelange commented Sep 2, 2023

py3-none-any.whl wheels (a wheel is just a zip file with a .whl extension) can run on any Python 3.5+ installation, on win, mac, and *nix, regardless of CPU architecture (aarch64, x86_64, etc), because they contain only Python files and no compiled binaries. These are pure-Python libraries. If the code will run on both py2.7 and py3.5+, you can python setup.py bdist_wheel --universal and you'll get a py2.py3-none-any.whl.

Any project that needs compiled binaries (cythonized, rust binaries, c++ backend etc), will publish wheels for a wealth of combinations of python version (minor version specific ABI), OS and CPU architecture, containing pre-compiled binaries that will execute on the target system. See for instance this list of popular python libraries.

> I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?

That is correct, when wheels are available on PyPI, pip does not need to execute setup.py, but can copy the python (and binary) files from the wheel straight into site-packages. But as explained above, wheels hosted on PyPI should be self-contained.

So in the case of libmagic, strictly speaking you should host an sdist on PyPI, which will detect a missing libmagic at install time by an assertion in setup.py (or by a copy attempt). If you choose to additionally host bdists (wheels) on PyPI, they should be self-contained, system-specific wheels containing precompiled libmagic binaries.
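
The "assertion in setup.py" idea could be sketched like this (a minimal sketch, assuming the sdist only needs to verify a system libmagic is loadable):

```python
import ctypes.util

def assert_libmagic_present():
    """Abort an sdist install early, with a clear message, if libmagic
    cannot be found on the system."""
    if ctypes.util.find_library("magic") is None:
        raise RuntimeError(
            "libmagic not found; install it first, e.g. "
            "'apt install libmagic1' or 'brew install libmagic'"
        )
```

Calling this at the top of setup.py turns a confusing runtime ImportError into an actionable install-time failure.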

Does that make sense?

@ahupp
Owner

ahupp commented Sep 20, 2023 via email
