Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for libblosc2 with pkg-config if not in the python blosc2 installation #1000

Merged
merged 8 commits into from Mar 8, 2023

Conversation

bnavigator
Copy link
Contributor

This is a bit different than #983, but works for my packaging efforts on openSUSE. I thought I should share. Feel free to use it in #983 or modify.

@bnavigator bnavigator changed the title Search the libblosc2 with pkg-build if not in the python blosc2 installation Search for libblosc2 with pkg-build if not in the python blosc2 installation Feb 19, 2023
@bnavigator bnavigator changed the title Search for libblosc2 with pkg-build if not in the python blosc2 installation Search for libblosc2 with pkg-config if not in the python blosc2 installation Feb 19, 2023
@avalentino
Copy link
Member

@bnavigator thanks for submitting this PR.
There seems to be a failure in the CI tests, could you please give a look?
It seems to be unrelated but it would be nice to have it fixed.

@avalentino avalentino added this to the 3.8.1 milestone Feb 20, 2023
@FrancescAlted
Copy link
Member

@bnavigator If I read this correctly, this would use the system blosc2 library in case this is not found; could you develop more on why this could happen when python-blosc2 is a requirement? If for some reason you want to enforce this, wouldn't be better to use an environment variable (let's say PYTABLES_ENFORCE_SYSTEM_BLOSC2 or similar)?

@bnavigator
Copy link
Contributor Author

I made it because like @avalentino pointed out in #983 linux distributions like to separate libraries and python modules. The blosc2 wheel on PyPI ships libblosc2.so.2, but we at openSUSE as well as other linux distributions would like to provide a python-blosc2 package without libblosc2.so.2. Thus, our RECORD does not have the necessary entry despite the python module blosc2 along wiht blosc2-%{version}.dist-info existst in the platlib. Instead, a separate blosc2-devel installed from the c-blosc2 sources provides the headers and pkgconfig configuration.

I made this before I have seen #983, and #983 seems to inlcude a lot more stuff which is not yet merged into the main branch.
I was not aware that you do not really need the blosc2 python module.

@FrancescAlted
Copy link
Member

Ah, that clarifies your intent indeed. Just to confirm, python-blosc2 is not meant as a strict requirement, but as a way to be able to fulfill dependencies and publish our wheels. After figuring out what you are after, I suggest to first try to use the system c-blosc2 and then fallback to use the python-blosc2 wheel.

@avalentino
Copy link
Member

Hi @bnavigator, builds on windows are still failing due to a blosc2 relate issue.

Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I have just found a small issue in the minimum blosc2 required.

tables/req_versions.py Outdated Show resolved Hide resolved
@avalentino avalentino merged commit ce7ebab into PyTables:master Mar 8, 2023
@avalentino
Copy link
Member

Thanks @bnavigator

blosc2_found = False
blosc2_search_paths = [blosc2_lib_hardcoded,
os.path.join(current_dir, blosc2_lib_hardcoded),
find_library("blosc2")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a few problems:

  1. find_library() returns None if it doesn't find the library, and cdll.LoadLibrary(None) doesn't DTRT:
>>> ctypes.util.find_library('missing.so') is None
True
>>> ctypes.cdll.LoadLibrary(None)
<CDLL 'None', handle 7f317dd372c0 at 0x7f316f9e7f10>

That loop needs an additional if blosc2_lib check.

  1. Loading of library from $PWD creates the obvious security issue where an attacker can place a file in $PWD and cause the user to load it. Loading from $PWD can be done in a test environment, but MUST NOT be done in production code.

  2. find_library() doesn't seem to fit the bill. Quoting the docs: "The purpose of the find_library() function is to locate a library in a way similar to what the compiler or runtime loader does (on platforms with several versions of a shared library the most recent should be loaded)". It would find the newest version of the library, which might or might not be what is needed. The whole point of the SONAME field (i.e. the thing that gives the .2 suffix) is to refer to a specific version from code using the library, so that when a future incompatible version appears, existing code is not broken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://src.fedoraproject.org/rpms/python-tables/blob/rawhide/f/0007-Drop-misguided-check.patch
This is the patch I pushed in Fedora. It ignores non-Linux systems, so it's not suitable for inclusion upstream.

@avalentino
Copy link
Member

Dear @keszybz, I think that the purpose of that code is just to raise an early error in case any kind of block2 library is available in the system.

  1. do you think that the following patch could fix the first point?
 for blosc2_lib in blosc2_search_paths:
     try:
-        cdll.LoadLibrary(blosc2_lib)
-        blosc2_found = True
-        break
+        if blosc2_lib and cdll.LoadLibrary(blosc2_lib) is not None:
+            blosc2_found = True
+            break
     except OSError:
         pass
  1. please not that current_dir is not $PWD, it is current_dir = os.path.dirname(__file__), I think that it should be safe enough also in production. Do you agree?
  2. I think hat the loading mechanism for dynamic libraries takes care of checking the correct library version. E.g., in my local build:
ldd build/lib.linux-x86_64-cpython-311/tables/hdf5extension.cpython-311-x86_64-linux
-gnu.so | grep blosc
      libblosc.so.1 => /home/antonio/anaconda3/envs/p311/lib/libblosc.so.1 (0x00007f56fc367000)
      libblosc2.so.2 => /home/antonio/anaconda3/envs/p311/lib/libblosc2.so.2 (0x00007f56fc2a7000)

As I said we use ctypes only to for a "coarse" check and to raise an early warning in the case blosc2 is totally missing in the system.

Please let me know if the patch proposed at point 1. is OK for you, and I will commit it.

@keszybz
Copy link
Contributor

keszybz commented Aug 16, 2023

Hi, sorry for my snarky tone earlier. There was no good reason for it.

Dear @keszybz, I think that the purpose of that code is just to raise an early error in case any kind of block2 library is available in the system.

Yes. This check is not very useful on systems with package management, but it makes sense in other scenarios.

1. do you think that the following patch could fix the first point?
 for blosc2_lib in blosc2_search_paths:
     try:
-        cdll.LoadLibrary(blosc2_lib)
-        blosc2_found = True
-        break
+        if blosc2_lib and cdll.LoadLibrary(blosc2_lib) is not None:
+            blosc2_found = True
+            break
     except OSError:
         pass

I think this is better:

for blosc2_lib in blosc2_search_paths:
      if blosc2_lib:
           try:
               cdll.LoadLibrary(blosc2_lib)
           except OSError:
                pass
           break
else:
     raise RuntimeError(…)
2. please not that `current_dir` is not `$PWD`, it is `current_dir = os.path.dirname(__file__)`, I think that it should be safe enough also in production. Do you agree?

Yep, my bad. There is no security issue.

3. I think hat the loading mechanism for dynamic libraries takes care of checking the correct library version. E.g., in my local build:
ldd build/lib.linux-x86_64-cpython-311/tables/hdf5extension.cpython-311-x86_64-linux
-gnu.so | grep blosc
      libblosc.so.1 => /home/antonio/anaconda3/envs/p311/lib/libblosc.so.1 (0x00007f56fc367000)
      libblosc2.so.2 => /home/antonio/anaconda3/envs/p311/lib/libblosc2.so.2 (0x00007f56fc2a7000)

Let's say that we have libblosc2.so.3, with the symlink libblosc2.solibblosc2.so.3. Since the SONAME version was bumped, presumably version 3 broke compatiblity with version 2. We still want to load version 2, but find_library() will return libblosc2.so.3.
Either loading version 3 instead of 2 or loading both versions into the same process is likely to cause issues.

As I said we use ctypes only to for a "coarse" check and to raise an early warning in the case blosc2 is totally missing in the system.

Please let me know if the patch proposed at point 1. is OK for you, and I will commit it.

I don't think it quite fixes the issue, because of the problem with find_library().

@avalentino
Copy link
Member

Hi, sorry for my snarky tone earlier. There was no good reason for it.

no problem

I think this is better:

for blosc2_lib in blosc2_search_paths:
      if blosc2_lib:
           try:
               cdll.LoadLibrary(blosc2_lib)
           except OSError:
                pass
           break
else:
     raise RuntimeError(…)

OK, but we still need to check the case in which cdll.LoadLibrary(blosc2_lib) returns None, correct?

ldd build/lib.linux-x86_64-cpython-311/tables/hdf5extension.cpython-311-x86_64-linux
-gnu.so | grep blosc
      libblosc.so.1 => /home/antonio/anaconda3/envs/p311/lib/libblosc.so.1 (0x00007f56fc367000)
      libblosc2.so.2 => /home/antonio/anaconda3/envs/p311/lib/libblosc2.so.2 (0x00007f56fc2a7000)

Let's say that we have libblosc2.so.3, with the symlink libblosc2.solibblosc2.so.3. Since the SONAME version was bumped, presumably version 3 broke compatiblity with version 2. We still want to load version 2, but find_library() will return libblosc2.so.3. Either loading version 3 instead of 2 or loading both versions into the same process is likely to cause issues.

As I said we use ctypes only to for a "coarse" check and to raise an early warning in the case blosc2 is totally missing in the system.
Please let me know if the patch proposed at point 1. is OK for you, and I will commit it.

I don't think it quite fixes the issue, because of the problem with find_library().

My point is that in the worst case we could have a misdetection of the correct version of the blosc2 library. This should result in an error at a later stage then tables/hdf5extension.*.so will try to load libblosc2.so.2.
The early error detection will not work in this case but, according to my understanding, still if the correct version of libblosc2 exists it will be loaded despite of the presence of newer ones.
Is it correct?

By the way I agree with you that it makes totally sense to drop totally the ctypes based check on system having a proper packaging system

@keszybz
Copy link
Contributor

keszybz commented Aug 17, 2023

OK, but we still need to check the case in which cdll.LoadLibrary(blosc2_lib) returns None, correct?

cdll.LoadLibrary seems to throw OSError when it can't load the library, which is already covered.

The early error detection will not work in this case but, according to my understanding, still if the correct version of libblosc2 exists it will be loaded despite of the presence of newer ones.
Is it correct?

I'm not sure. blosc2 doesn't use versioned symbols, so only one version of each symbol (function) can be loaded into the global namespace. So when the library is loaded in the normal fashion (with RTLD_GLOBAL), which is what would happen when the linker loads hdf5extension.*.so, having a different version loaded earlier would either cause the second load not to happen (if all symbols are already resolved), or possibly break it (if a mix of old and new symbols are used). But I'm not sure what exactly happens when ctypes.cdll.LoadLibrary is called. Maybe it doesn't do the equivalent of RTLD_GLOBAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants