Commits on Aug 26, 2022
-
-
Inflate: Use zero-latency refill strategy
This requires refilling while there are still enough bits in the buffer for the next table lookup. The next table lookup can then use the old bit-buffer and then shift the refilled bit-buffer, completely removing the refill from the critical path.
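A minimal sketch of the refill ordering described above, not the code from this commit. The `read_fields` helper, its fixed-width fields, and the branchless refill idiom are assumptions chosen for illustration. The point is that each lookup reads the old bit-buffer while the refill proceeds independently, and only the refilled buffer is shifted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable little-endian 64-bit load; 8 bytes must be readable at p. */
static uint64_t load64le(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: reads `count` LSB-first fields of `width` bits
 * (1..16) into `out`. The caller must pad `in` so that 8 bytes are
 * readable at every position the reader reaches. */
void read_fields(const uint8_t *in, unsigned width, size_t count, uint16_t *out) {
    assert(width >= 1 && width <= 16);
    const uint64_t mask = (1ull << width) - 1;
    uint64_t hold = 0;   /* bit buffer, least significant bit first */
    unsigned bits = 0;   /* number of valid bits in hold */

    /* Initial branchless refill: afterwards 56 <= bits <= 63. */
    hold |= load64le(in) << bits;
    in += (63 - bits) >> 3;
    bits |= 56;

    for (size_t i = 0; i < count; i++) {
        /* The old buffer still holds at least `width` valid bits. */
        uint64_t old = hold;

        /* Refill early; nothing below waits on this load for its lookup. */
        hold |= load64le(in) << bits;
        in += (63 - bits) >> 3;
        bits |= 56;

        /* The "table lookup" uses the old buffer. */
        out[i] = (uint16_t)(old & mask);

        /* Consume the field from the refilled buffer. */
        hold >>= width;
        bits -= width;
    }
}
```

In a real decoder the mask would be replaced by a Huffman table lookup and `width` by the decoded code length.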
Commits on Aug 24, 2022
Commits on Aug 21, 2022
Commits on Aug 20, 2022
Commits on Aug 19, 2022
-
Inflate: Remove some AArch64-specific code
This is a ~0.3% regression on Apple M1 and a ~2% gain on Intel. Not a huge deal either way, so the simplicity is nice to have. This is possible by still using empty inline-assembly blocks to hint the desired order of operations to the compiler, and because the commit 'Inflate: rearrange shifts and ignore high bits of "bits"' encourages the compiler to use 32-bit loads anyway. Removing the "asm volatile" hints causes a 12% slowdown on M1.
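For readers unfamiliar with the idiom, an empty inline-assembly block can be sketched as follows. This is an illustration under assumptions, not the hints used in the commit: `ORDER_HINT` and `low_bits_then_shift` are hypothetical names, and the macro relies on GCC/Clang extended asm.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)   /* GCC and Clang both define this */
/* Emits no instructions, but the compiler must have x in a register here
 * and must treat it as possibly modified by the block. */
#define ORDER_HINT(x) __asm__ volatile("" : "+r"(x))
#else
#define ORDER_HINT(x) ((void)0)
#endif

/* Hypothetical example: return the low n bits (n < 64) of *hold and
 * consume them, nudging the compiler to compute the shift first. */
uint64_t low_bits_then_shift(uint64_t *hold, unsigned n) {
    assert(n < 64);
    uint64_t old = *hold;
    uint64_t next = old >> n;
    ORDER_HINT(next);                 /* shift is materialised at this point */
    uint64_t field = old & ((1ull << n) - 1);
    *hold = next;
    return field;
}
```

The block changes no results; it only constrains how the compiler may schedule the surrounding operations.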
-
Add adler32 implementation using ARMv8.2-A dotprod extension (~1% speedup)
This extension is enabled by default from ARMv8.4-A. This commit only enables it on macOS. This implementation runs at ~60 GB/s on an Apple M1, but the existing NEON implementation is already very fast, so only a ~1% gain shows in inflate benchmarks.
-
-
-
Inflate: preload for next iteration before chunkcopy (~6% speedup)
This overlaps huffman latency with any branch-mispredict latency in the chunkcopy.
-
Inflate: use one shift for both huffman code and extra bits (~5.5% speedup)
This changes the "code" structure so that "bits" contains the total number of bits, and "op & 15" contains the non-extra bit count. The inffast*.c implementations save the unshifted bit-buffer and extract the extra bits from that, whereas inflate.c simply converts the values back to the old format.
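The single-shift decode can be sketched roughly as below. The `code_entry` layout and `decode_value` are hypothetical and only loosely modelled on the description above: `bits` holds the total bit count, `op & 15` the length of the Huffman code alone, and the extra bits are taken from the saved, unshifted buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry, not the real zlib "code" structure. */
typedef struct {
    uint8_t  op;    /* low 4 bits: length of the Huffman code alone */
    uint8_t  bits;  /* total bits to consume: code plus extra bits */
    uint16_t val;   /* base length or distance */
} code_entry;

/* Consumes one symbol from *hold with a single shift and returns
 * base value plus extra bits. */
uint32_t decode_value(uint64_t *hold, code_entry e) {
    assert(e.bits < 64 && (e.op & 15) <= e.bits);
    uint64_t saved = *hold;           /* unshifted copy for the extra bits */
    *hold >>= e.bits;                 /* the only shift on the critical path */
    uint64_t field = saved & ((1ull << e.bits) - 1);
    return e.val + (uint32_t)(field >> (e.op & 15));
}
```

For example, an entry with a 3-bit code, 2 extra bits, and base value 11 consumes 5 bits in one shift and yields a value in the range 11..14.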
-
Commits on May 26, 2022
-
Fix a bug that can crash deflate on some input when using Z_FIXED. (cloudflare#29)
Port of the patch for CVE-2018-25032 from madler@5c44459.
This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13 years before being found! The bug was introduced in zlib 1.2.2.2, with the addition of the Z_FIXED option. That option forces the use of fixed Huffman codes. For rare inputs with a large number of distant matches, the pending buffer into which the compressed data is written can overwrite the distance symbol table which it overlays. That results in corrupted output due to invalid distances, and can result in out-of-bound accesses, crashing the application.
The fix here combines the distance buffer and literal/length buffers into a single symbol buffer. Now three bytes of pending buffer space are opened up for each literal or length/distance pair consumed, instead of the previous two bytes. This assures that the pending buffer cannot overwrite the symbol table, since the maximum fixed code compressed length/distance is 31 bits, and since there are four bytes of pending space for every three bytes of symbol space.
Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
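The combined symbol buffer and the space argument can be illustrated with a small sketch. `sym_state`, `sym_push`, and `worst_case_pending_bytes` are hypothetical stand-ins, not the deflate state itself. The sketch assumes each symbol is stored as two distance bytes followed by one literal/length byte, with a distance of zero marking a literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYM_CAPACITY 16   /* symbols, chosen arbitrarily for the sketch */

typedef struct {
    uint8_t sym_buf[3 * SYM_CAPACITY];  /* three bytes per symbol */
    size_t  sym_next;                   /* next free byte in sym_buf */
} sym_state;

/* Appends one symbol: dist == 0 means lc is a literal byte,
 * otherwise lc is the length code of a match at distance dist. */
void sym_push(sym_state *s, uint16_t dist, uint8_t lc) {
    assert(s->sym_next + 3 <= sizeof s->sym_buf);
    s->sym_buf[s->sym_next++] = (uint8_t)(dist & 0xff);
    s->sym_buf[s->sym_next++] = (uint8_t)(dist >> 8);
    s->sym_buf[s->sym_next++] = lc;
}

/* Worst-case fixed-code output for n symbols at 31 bits each,
 * rounded up to whole bytes. */
size_t worst_case_pending_bytes(size_t n) {
    return (n * 31 + 7) / 8;
}
```

With four bytes of pending space per symbol, `worst_case_pending_bytes(n)` never exceeds `4 * n`, which is the margin the commit message relies on.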
Commits on Mar 11, 2021
-
Handling undefined behavior in inffast_chunk
This ports a21a4e8 - "Handling undefined behavior in inffast_chunk" from the Chromium fork of zlib.
Commits on Dec 2, 2020
-
* Fix architecture macro on MSVC. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Update '.gitignore'. Also remove 'zconf.h', it will be generated during building.
* Add 'build' subdir to .gitignore
* Fix architecture macro on MSVC.
* Update CMakeLists.txt. Clean up. Add option to set runtime (useful for MSVC). Add 'SSE4.2' and 'AVX' option. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt. Also remove checking fseeko. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Set 'CMAKE_MACOSX_RPATH' to TRUE. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Update SSE4.2 and PCLMUL setting in CMakeLists.txt. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Add 'HAVE_HIDDEN' to CMakeLists.txt. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Fix building dll on MSVC. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Fix zlib.pc file generation. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Refine cmake settings.
* Change cmake function to lowercase.
* Fix zlib.pc
* Fix crc32_pclmul_le_16 type.
* Set POSITION_INDEPENDENT_CODE flag to ON by default.
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
* Westmere detection
* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
* use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
* remove gpl code
* Improve support for compiling using Windows (https://github.com/ningfei/zlib)
* Import ucm.cmake from https://github.com/ningfei/zlib
* crc32_simd as separate file (cloudflare#18)
* atomic and SKIP_CPUID_CHECK (cloudflare#18)
* Suppress some MSVC warnings.
* Remove unused code
* Fix ucm_set_runtime when only C is enabled for the project.
* Removed configured header file.
* Unify zconf.h template.
* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)
* Atomic does not compile on Ubuntu 14.04
* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.
* Restore Windows compilation using "nmake -f win32\Makefile.msc"
* Do not set SOVERSION on Cygwin.
* Fix file permission.
* Refine PCLMUL CMake option.
* zconf.h required for Windows nmake
* Do not set visibility flag on Cygwin.
* Fix compiling dll resource.
* Fix compiling using MinGW. SSE4.2 and PCLMUL are also supported.
* Fix zconf.h for Windows.
* support crc intrinsics for ARM CPUs
* Add gzip -k option to minigzip (to aid benchmarks)
* Support Intel and ARMv8 optimization in CMake.
* Clean up.
* Fix compiling on Apple M1. check_c_compiler_flag detects SSE2, SSE3, SSE42 and PCLMUL but compiling fails. Workaround fix, need further investigation.
* Restore MSVC warnings.
Co-authored-by: neurolabusc <rorden@sc.edu>
Co-authored-by: Chris Rorden <rorden@mailbox.sc.edu>
-
Create make.yml (cloudflare#27)
Vlad Krasnov committed Dec 2, 2020
-
Create cmake.yml (cloudflare#26)
Vlad Krasnov committed Dec 2, 2020
Commits on Sep 29, 2020
-
Fix typo in configure file (cloudflare#24)
Vlad Krasnov committed Sep 29, 2020
Commits on Sep 28, 2020
-
Port Intel optimizations (adler32, chunkcopy) to cloudflare (cloudflare#23)
* Add SIMD SSSE3 implementation of the adler32 checksum
  Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib. 17bbb3d73c84 ("zlib adler_simd.c")
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Port inflate chunk SIMD SSE2 improvements for cloudflare
  Based on 2 patches from zlib chromium fork:
  * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com) 3060dcb - "zlib: inflate using wider loads and stores"
  * Noel Gordon (noel@chromium.org) 64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
  The improvement in inflate performance is around 15-35%, based on the workload, when checked with a modified zpipe.c and the Silesia corpus.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
Commits on Sep 23, 2020
-
Port ARM inflate performance improvement patches (chunk SIMD, read64le) (cloudflare#22)
* When windowBits is zero, the size of the sliding window comes from the zlib header. The allowed values of the four-bit field are 0..7, but when windowBits is zero, values greater than 7 are permitted and acted upon, resulting in large, mostly unused memory allocations. This fix rejects such invalid zlib headers.
* Add option to not compute or check check values.
  The undocumented (except in these commit comments) function inflateValidate(strm, check) can be called after an inflateInit(), inflateInit2(), or inflateReset2() with check equal to zero to turn off the check value (CRC-32 or Adler-32) computation and comparison. Calling with check not equal to zero turns checking back on. This should only be called immediately after the init or reset function. inflateReset() does not change the state, so a previous inflateValidate() setting will remain in effect. This also turns off validation of the gzip header CRC when present. This should only be used when a zlib or gzip stream has already been checked, and repeated decompressions of the same stream no longer need to be validated.
* This verifies that the state has been initialized, that it is the expected type of state, deflate or inflate, and that at least the first several bytes of the internal state have not been clobbered.
* Use macros to represent magic numbers
  This combines two patches which help in improving the readability and maintainability of the code by making magic numbers into #defines. Based on Chris Blume's (cblume@chromium) patches for zlib chromium:
  8888511 - "Zlib: Use defines for inffast"
  b9c1566 - "Share inffast names in zlib"
  These patches are needed when introducing chunk SIMD NEON enhancements.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Port inflate chunk SIMD NEON patches for cloudflare
  Based on 2 patches from zlib chromium fork:
  * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com) 3060dcb - "zlib: inflate using wider loads and stores"
  * Noel Gordon (noel@chromium.org) 64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
  The two patches combined provide around 5-25% increase in inflate performance, based on the workload, when checked with a modified zpipe.c and the Silesia corpus.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Increase inflate speed: read decode input into a uint64_t
  Update the chunk-copy code with a wide input data reader, which consumes input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the wide reader. Based on Noel Gordon's (noel@chromium.org) patch for the zlib chromium fork 8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"
  This patch provides 7-10% inflate performance improvement when tested with a modified zpipe.c and the Silesia corpus.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
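The header check from the first item of this commit (rejecting window-size fields greater than 7) can be sketched as a standalone validator. `zlib_header_window` is a hypothetical helper, not a zlib function. It follows the two-byte zlib header layout from RFC 1950: compression method in the low four bits of the first byte, window-size field in the high four bits, and a check value that makes the 16-bit header a multiple of 31.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the sliding-window size in bytes announced by a zlib header,
 * or 0 if the header is invalid. */
uint32_t zlib_header_window(const uint8_t *hdr, size_t len) {
    if (len < 2)
        return 0;
    unsigned cmf = hdr[0];
    unsigned flg = hdr[1];
    if (((cmf << 8) | flg) % 31 != 0)   /* header check value */
        return 0;
    if ((cmf & 0x0f) != 8)              /* compression method must be deflate */
        return 0;
    unsigned cinfo = cmf >> 4;          /* four-bit window-size field */
    if (cinfo > 7)                      /* more than 32 KiB: reject */
        return 0;
    return 1u << (cinfo + 8);
}
```

The common header bytes `78 9C` pass and announce a 32 KiB window, while a header with a window-size field of 8 is rejected even when its check value is valid.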
Commits on Aug 20, 2020
-
-
Add SIMD NEON implementation of the adler32 checksum. Inflate speed increased by ~10% with the Silesia corpus for ARMv8. Tested with a modified zpipe.c to run inflate, deflate for a stream of size 100M. Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib. 17bbb3d73c84 ("zlib adler_simd.c")
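For reference, the checksum that the SIMD code accelerates can be written as a scalar baseline. This is not the NEON implementation; `adler32_ref` is a hypothetical name, and the sketch only shows the two running sums and the deferred modulo that vectorised versions build on.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* most bytes that can be summed before the
                              32-bit accumulators could overflow */

/* Updates a running Adler-32 value with len bytes from buf.
 * Start with adler = 1 for a new checksum. */
uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff;          /* sum of bytes, plus one */
    uint32_t b = (adler >> 16) & 0xffff;  /* sum of the running a values */
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;                  /* modulo deferred to once per block */
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Calling `adler32_ref(1, (const uint8_t *)"Wikipedia", 9)` gives `0x11E60398`.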
Commits on Feb 26, 2020
-
License, Windows and Compatibility Upgrade (redux) (cloudflare#19)
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
* Westmere detection
* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
* use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
* remove gpl code
* Improve support for compiling using Windows (https://github.com/ningfei/zlib)
* Import ucm.cmake from https://github.com/ningfei/zlib
* crc32_simd as separate file (cloudflare#18)
* atomic and SKIP_CPUID_CHECK (cloudflare#18)
* Remove unused code
* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)
* Atomic does not compile on Ubuntu 14.04
* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.
* Restore Windows compilation using "nmake -f win32\Makefile.msc"
* zconf.h required for Windows nmake
* support crc intrinsics for ARM CPUs
* Requested changes (cloudflare#19)
Commits on Apr 3, 2019
-
zconf.h is generated by configure; it should not be in git. (cloudflare#16)
configure mostly generates this file by copying from zconf.h.in.
Commits on Apr 1, 2019
-
Don't disable target optimizations when cross-compiling. (cloudflare#15)
* PS-237 Don't disable target optimizations when cross-compiling. AFAICT there's no reason to check the host platform here. The test only runs the compiler; it doesn't execute the output.
* Add misc intermediate files to .gitignore.
Commits on May 31, 2018
Commits on May 30, 2018
Commits on May 27, 2018
-
Fix compile of crc32-pclmul_asm on macOS (cloudflare#8)
* .type and .size are ELF/COFF specific, so drop them
* the macOS equivalent of .globl + .hidden is .private_extern
* symbol names are not mangled on macOS, so we need to prefix them with _
-
Correctly check architecture on FreeBSD (cloudflare#10)
On FreeBSD amd64 is named... 'amd64', not 'x86_64'. This is what 'uname -m' prints.
Commits on Mar 18, 2018
-
faster crc32 of last <8 bytes on aarch64 (cloudflare#9)
* faster crc32 of last <8 bytes on aarch64
Commits on Nov 7, 2017
-
Merge pull request cloudflare#7 from cloudflare/vlad/aarch64
Support for aarch64 with crc extension