Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

License, Windows and Compatibility Upgrade #18

Closed
wants to merge 12 commits into from

Conversation

neurolabusc
Copy link

This pull request adds 3 benefits:

  1. The GPL code for SIMD CRC has been replaced by BSD-license code from the Chrome browser. Performance is identical, as demonstrated in this minimal project. This addresses complications raised with GPL code.
  2. At run-time, the software now detects if the host computer supports CLMUL instructions, using the accelerated CRC if supported. This prevents the software from executing illegal instructions on hardware that does not support CLMUL or where CLMUL has been disabled by a OEM BIOS or virtual machine. The code for detection is designed to be backward compatible with old compilers (e.g. gcc 4.8.4) that report to CMAKE that they support CLMUL instructions but do not provide the cpuid detection of CLMUL.
  3. The CMAKE files have been updated to support modern Microsoft compilers allowing simpler distribution to those using the Windows operating system.

A benchmark demonstrates that this code is robust and outperforms the baseline and zlib-ng on a wide range of hardware and operating systems.

I would be excited to see this pull request included, as it can be called from the pigz make. This dramatically accelerates pigz, with the SIMD CRC directly accelerating the serial stage that is most impacted by Amdahl's law.

crc32.c Outdated
#ifdef HAS_PCLMUL

//https://github.com/webosose/chromium68/blob/master/src/third_party/zlib/crc32_simd.c

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of pointing to an outdated copy of Chromium (M68), you could reference the ToT (Top of Tree) original repository Chromium repo*?

Since M68, I've added some new optimizations, refactorings and bug fixes on Chromium's zlib.
:-)

*https://cs.chromium.org/chromium/src/third_party/zlib/

@@ -21,6 +21,13 @@
DYNAMIC_CRC_TABLE and MAKECRCH can be #defined to write out crc32.h.
*/

#ifdef HAS_PCLMUL

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you try to follow the integration method used in Chromium's zlib i.e. keeping SIMD/optimizations in separated files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Adenilson I have done so in my fork. Can I suggest I withdraw this pull request, and you either comment on my fork or make your own fork. Once all your concerns are addressed, we can generate a new pull request to the CloudFlare repo. I have tested the separate files on both MacOS and Linux, so I think this addresses your concern.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Adenilson I should also point out that my repo copies the Chromium zlib code verbatim, and still includes the ARM code, even though I only tested and included the x86 code. I think the whole community would benefit if the ARM code was enhanced, but that is outside my wheelhouse.

crc32.c Outdated
#ifdef HAS_PCLMUL
#include <emmintrin.h>
#include <smmintrin.h>
#include <wmmintrin.h>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of those can be replaced by a single #include <immintrin.h>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defines no longer in crc32, @Adenilson can decide if these can be replaced in arc_simd.c

crc32.c Outdated
#ifdef _MSC_VER
#define zalign(x) __declspec(align(x))
#else
#define zalign(x) __attribute__((aligned((x))))
#endif

uLong crc32_simd(unsigned char const *buf, size_t len, uInt crc) {
uint crc32_simd(unsigned char const *buf, size_t len, uInt crc) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use uint32_t everywhere the size is known to be 32 bits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define PCLMUL_MIN_LEN 64
#define PCLMUL_ALIGN 16
#define PCLMUL_ALIGN_MASK 15

int cpu_has_pclmul = -1; //global: will be 0 or 1 after first test

int has_pclmul(void) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this perhaps be called at init time with __attribute__((constructor)) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vkrasnov I think my solution is robust and efficient and works on all the compilers I tested it on (gcc 4.8..9.2; Clang 10, Microsoft C compiler). No other location in this library uses the __attribute__((constructor)) style, and it is unclear if it is universally accepted. This is your code, which you maintain, so ultimately should be familiar to you. Since that style is unfamiliar to me, I would be grateful if you or @Adenilson could devise the implementation you are happy with.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about __attribute__((constructor)), e.g. Rust intentionally avoids life before main, so I'm not sure how that would work if this library was wrapped as a Rust crate.

However, unsychronized access to a global variable is Undefined Behavior if code runs on multiple threads, and we definitely will use it in multi-threaded programs. So at very least it should be marked as atomic. I know x86 memory model is forgiving about such things, but things that are UB in the compiler, but happen to work in practice are a thin ice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kornelski -I have tried to address this in my fork. The value is atomic. There is also no a compiler flag SKIP_CPUID_CHECK that allows one to assume that the CPU supports PCLMUL. One option would be to make the default behavior be to assume PCLMUL support. This would my commit would have the same compatibility as the current code, simply using the Chromium BSD-licensed SIMD CRC in place of the GPL code. This would eliminate @Adenilson's performance concerns, albeit just like the current code it will fail if the code is compiled on a system that supports PCLMUL but is run on a computer without support.

@Adenilson
Copy link

@neurolabusc A question: how are you benchmarking? And with which data corpus?

@neurolabusc
Copy link
Author

@Adenilson here is my benchmark for zlib and pigz. I explicitly test NIfTI-format images that are popular in my field. Tools like AFNI and FSL will save gzip-compressed images at each stage of processing, and individual files are often ~2Gb each (e.g. multi-band fMRI session).

I understand that @vkrasnov's original use case was a huge number of very small files. In my case, the penalty for a single cpuid call is negligible, while it may be more of an issue for other cases. One option is to simply have a compile flag to determine if the cpuid should be checked. People like @vkrasnov who build for modern facilities can compile to without supporting older processors (just like previous versions of this repository), while scientists may want to compile a version that does not crash if run on old hardware. I am open for any suggestions.

@neurolabusc
Copy link
Author

@Adenilson are you interested in implementing your changes to the CloudFlare zlib? Your changes cover the same ground as mine (remove GPL, aid MSVC updates), but is superior to mine with ARM support. Also, you provide a CPU detection that you suggest might be superior to mine. I am reluctant to put your changes into my fork, as I am not familiar with ARM.

If you will devote time to this, I will withdraw my PR. Once your PR is included, I can work to ensure the cmake supports MSVC.

On the other hand, if you are not interested in this, I would like to go ahead and submit a revised version of my own CloudFlare pull request. While it does not include all the features of yours, it is still superior in every respect to the current CloudFlare implementation.

Just tell me how to proceed. I can completely understand if you do not want to devote time to this. From my perspective, CloudFlare zlib includes benefits beyond the enhanced CRC discussed here, both for serial as well as parallel compression.

@Adenilson
Copy link

@neurolabusc I'm afraid I lack the bandwidth to dedicate to upstreaming changes to Cloudflare fork.

Maintaining the zlib used by Chrome and Android and trying to upstream changes to Canonical zlib is already a bit taxing.

vkrasnov pushed a commit that referenced this pull request Feb 26, 2020
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc

* Westmere detection

* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)

* use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)

* remove gpl code

* Improve support for compiling using Windows (https://github.com/ningfei/zlib)

* Import ucm.cmake from https://github.com/ningfei/zlib

* crc32_simd as separate file (#18)

* atomic and SKIP_CPUID_CHECK (#18)

* Remove unused code

* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)

* Atomic does not compile on Ubuntu 14.04

* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.

* Restore Windows compilation using "nmake -f win32\Makefile.msc"

* zconf.h required for Windows nmake

* support crc intrinsics for ARM CPUs

* Requested changes (#19)
vkrasnov pushed a commit that referenced this pull request Dec 2, 2020
* Fix architecture macro on MSVC.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Update '.gitignore'.

Also remove 'zconf.h', it will be generated during building.

* Add 'build' subdir to .gitignore

* Fix architecture macro on MSVC.

* Update CMakeLists.txt

Clean up.
Add option to set runtime (useful for MSVC).
Add 'SSE4.2' and 'AVX' option.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt.

Also remove checking fseeko.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Set 'CMAKE_MACOSX_RPATH' to TRUE.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Update SSE4.2 and PCLMUL setting in CMakeLists.txt

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Add 'HAVE_HIDDEN' to CMakeLists.txt

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Fix building dll on MSVC.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Fix zlib.pc file generation.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Refine cmake settings.

* Change cmake function to lowercase.

* Fix zlib.pc

* Fix crc32_pclmul_le_16 type.

* Set POSITION_INDEPENDENT_CODE flag to ON by default.

* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc

* Westmere detection

* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)

* use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)

* remove gpl code

* Improve support for compiling using Windows (https://github.com/ningfei/zlib)

* Import ucm.cmake from https://github.com/ningfei/zlib

* crc32_simd as separate file (#18)

* atomic and SKIP_CPUID_CHECK (#18)

* Suppress some MSVC warnings.

* Remove unused code

* Fix ucm_set_runtime when only C is enabled for the project.

* Removed configured header file.

* Unify zconf.h template.

* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)

* Atomic does not compile on Ubuntu 14.04

* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.

* Restore Windows compilation using "nmake -f win32\Makefile.msc"

* Do not set SOVERSION on Cygwin.

* Fix file permission.

* Refine PCLMUL CMake option.

* zconf.h required for Windows nmake

* Do not set visibility flag on Cygwin.

* Fix compiling dll resource.

* Fix compiling using MinGW.

SSE4.2 and PCLMUL are also supported.

* Fix zconf.h for Windows.

* support crc intrinsics for ARM CPUs

* Add gzip -k option to minigzip (to aid benchmarks)

* Support Intel and ARMv8 optimization in CMake.

* Clean up.

* Fix compiling on Apple M1.

check_c_compiler_flag detects SSE2, SSE3, SSE42 and PCLMUL but compiling
fails. Workaround fix, need further investigation.

* Restore MSVC warnings.

Co-authored-by: neurolabusc <rorden@sc.edu>
Co-authored-by: Chris Rorden <rorden@mailbox.sc.edu>
fhanau pushed a commit to fhanau/zlib that referenced this pull request Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants