Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add /vsirar/ virtual file system for RAR compression format #7253

Merged
merged 11 commits into from
Feb 17, 2023

Conversation

e-n-f
Copy link
Contributor

@e-n-f e-n-f commented Feb 16, 2023

What does this PR do?

Adds support for using /vsirar to access RAR-format compressed files via libarchive, in the same manner as the existing /vsi7z

What are related issues/pull requests?

None found through issue search

Tasklist

  • Add test case(s)
  • Add documentation
  • Updated Python API documentation (swig/include/python/docs/)
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

Environment

Provide environment details, if relevant:

  • OS: Tested on MacOS 12.6
  • Compiler: clang

@@ -700,6 +700,9 @@ gdal_check_package(Blosc "Blosc compression" CAN_DISABLE)
define_find_package2(ARCHIVE archive.h archive)
gdal_check_package(ARCHIVE "Multi-format archive and compression library library (used for /vsi7z/" CAN_DISABLE)

define_find_package2(BLAKE2 blake2.h b2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to conditionalized RAR support with b2 dependency ? Looking at the sources of libarchive, it seems to have the blake2 routines if libb2 is not available.

For example on Ubuntu 20.04, libarchive.so.13 doesn't link to libb2, but has blake2 symbols:

$ ldd /lib/x86_64-linux-gnu/libarchive.so.13
	linux-vdso.so.1 (0x00007ffc9e3d4000)
	libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007f226ff68000)
	libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f226ff5d000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f226ff32000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f226fe89000)
	liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f226fe68000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f226fe55000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f226fe39000)
	libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f226fc7f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f226fa8d000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f226fa68000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f226fa62000)
	libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f226f87c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f226f72d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2270099000)
	libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f226dc6c000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f226da88000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f226da6d000)
even@even-ThinkPad-P15v-Gen-1:~/gdal/gdal/build_cmake$ objdump -T /lib/x86_64-linux-gnu/libarchive.so.13 | grep blak
0000000000093650 g    DF .text	000000000000010c  Base        blake2s_init_key
0000000000093760 g    DF .text	0000000000000128  Base        blake2s_final
00000000000934a0 g    DF .text	0000000000000084  Base        blake2s_init
0000000000093d20 g    DF .text	00000000000001d5  Base        blake2sp_update
0000000000093400 g    DF .text	0000000000000097  Base        blake2s_init_param
0000000000094020 g    DF .text	00000000000002fb  Base        blake2sp
0000000000093890 g    DF .text	00000000000000f4  Base        blake2s
0000000000093f00 g    DF .text	000000000000011d  Base        blake2sp_final
0000000000093530 g    DF .text	0000000000000117  Base        blake2s_update
0000000000093b80 g    DF .text	000000000000019c  Base        blake2sp_init_key
0000000000093aa0 g    DF .text	00000000000000de  Base        blake2sp_init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't work on my Mac (with libarchive explicitly installed through brew and libb2 pulled in as an implicit dependency) without linking GDAL against libb2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's super weird, unless you statically link against libarchive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not doing anything unusual with linkage, just installing libarchive through my system's usual package manager and building with cmake ..; make.

But you're right, it is finding /opt/homebrew/opt/libarchive/lib/libarchive.a, not /opt/homebrew/opt/libarchive/lib/libarchive.13.dylib. I'm not sure why but will try to figure it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then I guess that all HAVE_BLAKE2 conditionals can be removed

Copy link
Contributor

@dg0yt dg0yt Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/opt/homebrew/opt/libarchive/lib/libarchive.a, not /opt/homebrew/opt/libarchive/lib/libarchive.13.dylib

Does libarchive.13.dylib qualify for find_library(.... NAMES archive), or does it need a filename without the version number?
I think you could pass a cmake input variable for the library filepath.

For libarchive, there is an official CMake find module, but it can't handle static link libraries. libarchive also provides a pkg-config file, but it is also incomplete for static linkage without extra patching.

In vcpkg, everything is ready on the libarchive side. But I didn't yet attempt to integrate libarchive in the gdal port.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I didn't yet attempt to integrate libarchive in the gdal port.

@dg0yt libarchive dependency is in GDAL master only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, and my problem was that I had specified the static library earlier when I was trying to get it to work and it was cached in CMakeCache.txt. It works correctly without all the BLAKE2 conditionals when I configure with

cmake -DARCHIVE_INCLUDE_DIR=/opt/homebrew/opt/libarchive/include -DARCHIVE_LIBRARY=/opt/homebrew/opt/libarchive/lib/libarchive.dylib -DGDAL_USE_ARC=ON ..

so I will simplify this PR by removing all of the checks for BLAKE2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is not released yet... I could start working on it anyways, as time permits, and see what can be done in gdal for good integration.

port/cpl_vsil_libarchive.cpp Outdated Show resolved Hide resolved
port/cpl_vsil_libarchive.cpp Outdated Show resolved Hide resolved
@@ -441,7 +477,7 @@ class VSILibArchiveFilesystemHandler final : public VSIArchiveFilesystemHandler
}
virtual std::vector<CPLString> GetExtensions() override
{
return {".7z", ".lpk", ".lpkx", ".mpk", ".mpkx"};
return {".7z", ".lpk", ".lpkx", ".mpk", ".mpkx", ".rar"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the returned extensions should probably depend on the prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll make that change.

@@ -529,4 +576,27 @@ void VSIInstall7zFileHandler(void)
"/vsi7z/", new VSILibArchiveFilesystemHandler("/vsi7z"));
}

#ifdef HAVE_BLAKE2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the #ifdef HAVE_BLAKE2 is really needed, it should be moved inside the function body, so that the function is always defined in the HAVE_LIBARCHIVE branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was trying to follow the pattern of the #ifdefs for VSIInstall7zFileHandler but will change it to a single function body instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't perhaps clear. You need 2 bodies. One at the top of the file in the #ifndef HAVE_LIBARCHIVE branch (without a #ifdef HAVE_BLAKE2) condition.
And the one at the bottom of the file, which is now fine with your modifications

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I hope this is correct now, with everything controlled just by the #ifndef HAVE_LIBARCHIVE.

@rouault
Copy link
Member

rouault commented Feb 16, 2023

A autotest/gcore/vsirar.py file would be needed (strong inspiration can be taken from autotest/gcore/vsi7z.py)
As well as doc in doc/source/user/virtual_file_systems.rst

@e-n-f
Copy link
Contributor Author

e-n-f commented Feb 16, 2023

Thanks, I'll add the test and documentation.

@rouault
Copy link
Member

rouault commented Feb 16, 2023

We use clang-format rules. Cf https://gdal.org/development/dev_practices.html#commit-hooks

@e-n-f
Copy link
Contributor Author

e-n-f commented Feb 16, 2023

Thanks, just saw the failing lint test and pushed a commit to fix.

.. _vsirar:

/vsirar/ (.rar archives)
----------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
----------------------
------------------------

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just fixed it. I am having trouble getting the lint to run locally so I was relying on seeing the test output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try setting up pre-commit ( https://gdal.org/development/dev_practices.html#commit-hooks) ? Once done, all reformatting is automatically done at git commit time.
Just after initial installation, to fix possibly wrongly formatted files in your branch: "pre-commit run --all"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but there is something wrong with my python environment that is keeping the python code from running. I will try to figure it out before I bother you with more commits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps create a dedicated virtualenv ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still doesn't work for me, even with a brand-new virtual environment. The error log is https://gist.github.com/e-n-f/394bf3c1b2e52aaf27c7bd708f23e476. I'll keep trying to figure it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing this modification to .pre-commit-config.yaml worked for me, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump to isort 5.12.0 just committed in master per f44f38e

@rouault rouault added this to the 3.7.0 milestone Feb 17, 2023
@rouault rouault merged commit ce559c3 into OSGeo:master Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants