Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

License, Windows and Compatibility Upgrade (redux) #19

Merged
merged 17 commits into from Feb 26, 2020

Conversation

neurolabusc
Copy link

This pull request responds to feedback to an earlier pull request.

It adds 3 benefits:

  1. The GPL code for SIMD CRC has been replaced by BSD-license code from the Chrome browser. Performance is identical, as demonstrated in this minimal project. This addresses complications raised with GPL code.
  2. At run-time, the software now detects if the host computer supports CLMUL instructions, using the accelerated CRC if supported. This prevents the software from executing illegal instructions on hardware that does not support CLMUL or where CLMUL has been disabled by a OEM BIOS or virtual machine. The code for detection is designed to be backward compatible with old compilers (e.g. gcc 4.8.4) that report to CMAKE that they support CLMUL instructions but do not provide the cpuid detection of CLMUL.
  3. The CMAKE files have been updated to support modern Microsoft compilers allowing simpler distribution to those using the Windows operating system.

I would be excited to see this pull request included, as it can be called from the pigz make. This dramatically accelerates pigz, with the SIMD CRC directly accelerating the serial stage that is most impacted by Amdahl's law.

The first time the CRC is called it set the atomic int cpu_has_pclmul, which reports whether or not the clmul instructions are available. This ensures the program does not crash if compiled on a modern computer and executed on an old computer. I appreciate different people have different views on this, which I summarize below. I do not have a preference, the pull request I am providing is robust (tested on Ubuntu 14.04 as well as recent versions) and simple. The CRC SIMD code and CPUID is only tested for x86-64, but again this extends the current repository. Future developments my consider alternatives:

Other comments

  • @Adenilson suggested SIMD CRC for ARM but I am unable to validate that platform.
  • The compiler -mpclmul flag is only applied to the crc32_simd unit. In theory, if this flag were applied globally, a compiler could optimize other sections of code to use these instructions, and the resulting code would fail if run on older (<2009) computers.

For most cases (where files are large) testing the CPU for CLMUL support on has negligible performance impact. One can conceive of use cases where an statically compiled executable is called many times to compress tiny files. If one is confident the resulting executable is going to be only run on modern (Westmere or later) CPUs, one can compile with the DSKIP_CPUID_CHECK compiler directive to skip this check for cmake the command is "cmake -DSKIP_CPUID_CHECK=ON ..".

configure Outdated
cat > $test.c << EOF
#include <immintrin.h>
void foo(void) {
_mm_clmulepi64_si128(_mm_set1_epi16(1), _mm_set1_epi16(2), 0);
#include <wmmintrin.h>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this include is not required AFAIK

crc32.c Outdated
#ifndef _MSC_VER
#include <cpuid.h>
#endif
//#include <stdio.h>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

crc32.c Outdated
#endif
if ((ecx & crc_bit_PCLMUL) != 0)
cpu_has_pclmul = 1;
//printf("leaf=%d, eax=0x%x, ebx=0x%x, ecx=0x%x, edx=0x%x, cpu_has_pclmul %d\n", leaf, eax, ebx, ecx, edx, cpu_has_pclmul);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

#ifndef SKIP_CPUID_CHECK
if (!has_pclmul())
return crc32_generic(crc, buf, len);
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not aligned with the #ifndef above it

deflate.c Outdated
@@ -545,7 +545,7 @@ int ZEXPORT deflateTune(strm, good_length, max_lazy, nice_length, max_chain)
* upper bound of about 14% expansion does not seem onerous for output buffer
* allocation.
*/
uint64_t ZEXPORT deflateBound(strm, sourceLen)
uLong ZEXPORT deflateBound(strm, sourceLen)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changed?

@neurolabusc
Copy link
Author

@vkrasnov thanks for your careful review. I believe my latest commit includes all your suggestions.

@vkrasnov
Copy link

@neurolabusc thank you very much for the contribution!

@vkrasnov vkrasnov merged commit 372bcd1 into cloudflare:gcc.amd64 Feb 26, 2020
@ningfei ningfei mentioned this pull request Nov 18, 2020
fhanau pushed a commit to fhanau/zlib that referenced this pull request Feb 27, 2023
Add a "compression context" to CompressionStream and DecompressionStream
to make the tracking of the internal state explicit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants