Commits on Aug 26, 2022

  1. Inflate: Use zero-latency refill strategy

    This requires refilling while there are still enough bits in the
    buffer for the next table lookup. The next table lookup can then
    use the old bit-buffer and then shift the refilled bit-buffer,
    completely removing the refill from the critical path.
    dougallj committed Aug 26, 2022
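A minimal sketch of the refill ordering described in the commit above, assuming an LSB-first 64-bit bit buffer and a branch-free refill. The names `bitreader`, `entry`, `refill` and `decode_step` are hypothetical and do not come from the zlib sources. The point shown is only that the table lookup reads the old buffer value, so it does not have to wait for the refill.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry and bit reader; illustrative names only. */
typedef struct { uint8_t sym; uint8_t len; } entry;

typedef struct {
    const uint8_t *in;  /* next input byte not yet counted in bits */
    uint64_t hold;      /* bit buffer, least significant bit first */
    unsigned bits;      /* number of valid bits in hold (0..63) */
} bitreader;

/* Portable little-endian 64-bit load. */
static uint64_t load64le(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Branch-free refill to 56..63 valid bits; needs 8 readable bytes at br->in. */
static void refill(bitreader *br) {
    br->hold |= load64le(br->in) << br->bits;
    br->in += (63 - br->bits) >> 3;
    br->bits |= 56;
}

/* Decode one symbol. The lookup depends only on the old buffer, so the
 * refill can proceed in parallel; the refilled buffer is shifted afterwards. */
static unsigned decode_step(bitreader *br, const entry *table, unsigned tbits) {
    assert(br->bits >= tbits);       /* enough bits remain for this lookup */
    uint64_t old = br->hold;         /* value used by the table lookup */
    refill(br);                      /* off the lookup's dependency chain */
    entry e = table[old & ((1u << tbits) - 1)];
    br->hold >>= e.len;              /* consume from the refilled buffer */
    br->bits -= e.len;
    return e.sym;
}
```

The caller is assumed to perform one `refill` before the first `decode_step` and to keep at least 8 readable bytes past `br->in`.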

Commits on Aug 24, 2022

Commits on Aug 21, 2022

Commits on Aug 20, 2022

  1. Fix INFLATE_FAST_MIN_OUTPUT

    dougallj committed Aug 20, 2022

Commits on Aug 19, 2022

  1. Inflate: Remove some AArch64-specific code

    This is a ~0.3% regression on Apple M1 and a ~2% gain on Intel. Not a
    huge deal either way, so the simplicity is nice to have.
    
    This is possible by still using empty inline-assembly blocks to hint
    the desired order of operations to the compiler, and because the
    commit 'Inflate: rearrange shifts and ignore high bits of "bits"'
    encourages the compiler to use 32-bit loads anyway.
    
    Removing the "asm volatile" hints causes a 12% slowdown on M1.
    dougallj committed Aug 19, 2022
  2. Add adler32 implementation using ARMv8.2-A dotprod extension (~1% speedup)
    
    This extension is enabled by default from ARMv8.4-A. This commit
    only enables it on macOS.
    
    This implementation runs at ~60GB/s on an Apple M1, but the existing
    NEON implementation is already very fast, so only a ~1% gain shows
    in inflate benchmarks.
    dougallj committed Aug 19, 2022
  3. Inflate: preload for next iteration before chunkcopy (~6% speedup)

    This overlaps huffman latency with any branch-mispredict latency in
    the chunkcopy.
    dougallj committed Aug 19, 2022
  4. Inflate: use one shift for both huffman code and extra bits (~5.5% speedup)
    
    This changes the "code" structure so that "bits" contains the total
    number of bits, and "op & 15" contains the non-extra bit count.
    
    The inffast*.c implementations save the unshifted bit-buffer and
    extract the extra bits from that, whereas inflate.c simply converts
    the values back to the old format.
    dougallj committed Aug 19, 2022
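The single-shift idea in the last commit above can be sketched as follows. This assumes a table entry whose `bits` field holds the total bit count (code plus extra bits) and whose `op & 15` holds the code-only bit count, as the commit message describes. The struct layout, the `val` base field and the function name are hypothetical, not the actual zlib definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry layout mirroring the "op"/"bits" description. */
typedef struct {
    uint8_t  op;    /* low 4 bits: length of the Huffman code alone */
    uint8_t  bits;  /* total bits to consume: code + extra bits */
    uint16_t val;   /* base value the extra bits are added to */
} code_entry;

/* Consume code and extra bits with one shift; the extra bits are then
 * extracted from the saved, unshifted copy of the bit buffer. */
static unsigned decode_with_extra(uint64_t *hold, code_entry e) {
    unsigned code_bits = e.op & 15;
    assert(e.bits >= code_bits && e.bits < 64);
    uint64_t old = *hold;                       /* unshifted bit buffer */
    *hold >>= e.bits;                           /* the only shift on the critical path */
    uint64_t mask = ((uint64_t)1 << e.bits) - 1;
    unsigned extra = (unsigned)((old & mask) >> code_bits);
    return e.val + extra;
}
```

For example, with a 7-bit code, 2 extra bits holding the value 3, and a base of 19, the call returns 22 and leaves the buffer shifted by 9 bits.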

Commits on May 26, 2022

  1. Fix a bug that can crash deflate on some input when using Z_FIXED. (cloudflare#29)
    
    Port of the patch for CVE-2018-25032 from
    madler@5c44459
    
    This bug was reported by Danilo Ramos of Eideticom, Inc. It has
    lain in wait 13 years before being found! The bug was introduced
    in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
    option forces the use of fixed Huffman codes. For rare inputs with
    a large number of distant matches, the pending buffer into which
    the compressed data is written can overwrite the distance symbol
    table which it overlays. That results in corrupted output due to
    invalid distances, and can result in out-of-bound accesses,
    crashing the application.
    
    The fix here combines the distance buffer and literal/length
    buffers into a single symbol buffer. Now three bytes of pending
    buffer space are opened up for each literal or length/distance
    pair consumed, instead of the previous two bytes. This assures
    that the pending buffer cannot overwrite the symbol table, since
    the maximum fixed code compressed length/distance is 31 bits, and
    since there are four bytes of pending space for every three bytes
    of symbol space.
    
    Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
    LloydW93 and madler committed May 26, 2022
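The 31-bit bound quoted in the commit above can be checked with a little arithmetic. The sketch below uses the fixed-code bit lengths from RFC 1951; the constant and function names are hypothetical, and the four bytes of pending space per symbol is taken from the commit message rather than derived here.

```c
#include <assert.h>

/* Worst-case bit counts for the fixed Huffman codes of RFC 1951. */
enum {
    FIXED_LEN_CODE_BITS_MAX  = 8,   /* length symbols 280..287 use 8-bit codes */
    LEN_EXTRA_BITS_MAX       = 5,   /* most extra bits on any length symbol */
    FIXED_DIST_CODE_BITS     = 5,   /* every fixed distance code is 5 bits */
    DIST_EXTRA_BITS_MAX      = 13,  /* most extra bits on any distance symbol */
    FIXED_LITERAL_BITS_MAX   = 9,   /* literals 144..255 use 9-bit codes */
    PENDING_BYTES_PER_SYMBOL = 4    /* per the commit message */
};

/* Largest number of output bits one length/distance pair can produce. */
static unsigned worst_case_pair_bits(void) {
    return FIXED_LEN_CODE_BITS_MAX + LEN_EXTRA_BITS_MAX
         + FIXED_DIST_CODE_BITS + DIST_EXTRA_BITS_MAX;
}

/* Nonzero when the output for any one symbol fits in its pending space. */
static int pending_space_suffices(void) {
    unsigned budget = 8u * PENDING_BYTES_PER_SYMBOL;
    return worst_case_pair_bits() <= budget
        && (unsigned)FIXED_LITERAL_BITS_MAX <= budget;
}
```

Under these figures a pair needs at most 8 + 5 + 5 + 13 = 31 bits, one fewer than the 32 bits available per symbol.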

Commits on Mar 11, 2021

  1. Handling undefined behavior in inffast_chunk

    This ports a21a4e8 - "Handling undefined behavior in inffast_chunk" from the Chromium fork of zlib.
    janaknat authored and Vlad Krasnov committed Mar 11, 2021

Commits on Dec 2, 2020

  1. CMake update. (cloudflare#25)

    * Fix architecture macro on MSVC.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Update '.gitignore'.
    
    Also remove 'zconf.h', it will be generated during building.
    
    * Add 'build' subdir to .gitignore
    
    * Fix architecture macro on MSVC.
    
    * Update CMakeLists.txt
    
    Clean up.
    Add option to set runtime (useful for MSVC).
    Add 'SSE4.2' and 'AVX' option.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt.
    
    Also remove checking fseeko.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Set 'CMAKE_MACOSX_RPATH' to TRUE.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Update SSE4.2 and PCLMUL setting in CMakeLists.txt
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Add 'HAVE_HIDDEN' to CMakeLists.txt
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Fix building dll on MSVC.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Fix zlib.pc file generation.
    
    Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
    
    * Refine cmake settings.
    
    * Change cmake function to lowercase.
    
    * Fix zlib.pc
    
    * Fix crc32_pclmul_le_16 type.
    
    * Set POSITION_INDEPENDENT_CODE flag to ON by default.
    
    * Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
    
    * Westmere detection
    
    * Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
    
    * use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
    
    * remove gpl code
    
    * Improve support for compiling using Windows (https://github.com/ningfei/zlib)
    
    * Import ucm.cmake from https://github.com/ningfei/zlib
    
    * crc32_simd as separate file (cloudflare#18)
    
    * atomic and SKIP_CPUID_CHECK (cloudflare#18)
    
    * Suppress some MSVC warnings.
    
    * Remove unused code
    
    * Fix ucm_set_runtime when only C is enabled for the project.
    
    * Removed configured header file.
    
    * Unify zconf.h template.
    
    * Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)
    
    * Atomic does not compile on Ubuntu 14.04
    
    * Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.
    
    * Restore Windows compilation using "nmake -f win32\Makefile.msc"
    
    * Do not set SOVERSION on Cygwin.
    
    * Fix file permission.
    
    * Refine PCLMUL CMake option.
    
    * zconf.h required for Windows nmake
    
    * Do not set visibility flag on Cygwin.
    
    * Fix compiling dll resource.
    
    * Fix compiling using MinGW.
    
    SSE4.2 and PCLMUL are also supported.
    
    * Fix zconf.h for Windows.
    
    * support crc intrinsics for ARM CPUs
    
    * Add gzip -k option to minigzip (to aid benchmarks)
    
    * Support Intel and ARMv8 optimization in CMake.
    
    * Clean up.
    
    * Fix compiling on Apple M1.
    
    check_c_compiler_flag reports SSE2, SSE3, SSE42 and PCLMUL as
    available, but compilation then fails. This is a workaround; further
    investigation is needed.
    
    * Restore MSVC warnings.
    
    Co-authored-by: neurolabusc <rorden@sc.edu>
    Co-authored-by: Chris Rorden <rorden@mailbox.sc.edu>
    3 people committed Dec 2, 2020
  2. Create make.yml (cloudflare#27)

    Vlad Krasnov committed Dec 2, 2020
  3. Create cmake.yml (cloudflare#26)

    Vlad Krasnov committed Dec 2, 2020

Commits on Sep 29, 2020

  1. Fix typo in configure file (cloudflare#24)

    Vlad Krasnov committed Sep 29, 2020

Commits on Sep 28, 2020

  1. Port Intel optimizations (adler32, chunkcopy) to cloudflare (cloudflare#23)
    
    * Add SIMD SSSE3 implementation of the adler32 checksum
    
    Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
    17bbb3d73c84 ("zlib adler_simd.c")
    
    Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
    
    * Port inflate chunk SIMD SSE2 improvements for cloudflare
    
    Based on 2 patches from zlib chromium fork:
    
    * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
      3060dcb - "zlib: inflate using wider loads and stores"
    
    * Noel Gordon (noel@chromium.org)
      64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
    
    The improvement in inflate performance is around 15-35%, based
    on the workload, when checked with a modified zpipe.c and the
    Silesia corpus.
    
    Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
    janaknat committed Sep 28, 2020
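A rough sketch of what "wider loads and stores" means for an LZ77 match copy, as ported in the commit above. This is not the Chromium or cloudflare code: the function name and `CHUNK` constant are hypothetical, plain `memcpy` stands in for SSE2 or NEON loads and stores, and short distances fall back to a byte loop instead of the vectorized pattern replication a real implementation would use. It assumes the caller guarantees up to `CHUNK - 1` writable bytes of slack past the end of the match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 16 };  /* stand-in for one 128-bit vector register */

/* Copy len bytes from out - dist to out, returning out + len.
 * May write up to CHUNK - 1 bytes beyond out + len when dist >= CHUNK. */
static uint8_t *chunkcopy_match(uint8_t *out, unsigned dist, unsigned len) {
    assert(dist >= 1 && len >= 1);
    const uint8_t *from = out - dist;
    uint8_t *end = out + len;
    if (dist >= CHUNK) {
        /* Source and destination chunks cannot overlap, so copy whole
         * chunks and let the last one overshoot into the slack area. */
        do {
            memcpy(out, from, CHUNK);
            out += CHUNK;
            from += CHUNK;
        } while (out < end);
    } else {
        /* Overlapping match: bytes written earlier feed later reads. */
        while (out < end) *out++ = *from++;
    }
    return end;
}
```

Copying whole chunks trades a few wasted byte writes for far fewer loop iterations and branches than a byte-at-a-time copy.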

Commits on Sep 23, 2020

  1. Port ARM inflate performance improvement patches (chunk SIMD, read64le) (cloudflare#22)
    
    * When windowBits is zero, the size of the sliding window comes from
    the zlib header.  The allowed values of the four-bit field are
    0..7, but when windowBits is zero, values greater than 7 are
    permitted and acted upon, resulting in large, mostly unused memory
    allocations.  This fix rejects such invalid zlib headers.
    
    * Add option to not compute or check check values.
    
    The undocumented (except in these commit comments) function
    inflateValidate(strm, check) can be called after an inflateInit(),
    inflateInit2(), or inflateReset2() with check equal to zero to
    turn off the check value (CRC-32 or Adler-32) computation and
    comparison. Calling with check not equal to zero turns checking
    back on. This should only be called immediately after the init or
    reset function. inflateReset() does not change the state, so a
    previous inflateValidate() setting will remain in effect.
    
    This also turns off validation of the gzip header CRC when
    present.
    
    This should only be used when a zlib or gzip stream has already
    been checked, and repeated decompressions of the same stream no
    longer need to be validated.
    
    * This verifies that the state has been initialized, that it is the
    expected type of state, deflate or inflate, and that at least the
    first several bytes of the internal state have not been clobbered.
    
    * Use macros to represent magic numbers
    
    This combines two patches which help in improving the readability and
    maintainability of the code by making magic numbers into #defines.
    
    Based on Chris Blume's (cblume@chromium) patches for zlib chromium:
    8888511 - "Zlib: Use defines for inffast"
    b9c1566 - "Share inffast names in zlib"
    
    These patches are needed when introducing chunk SIMD NEON enhancements.
    
    Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
    
    * Port inflate chunk SIMD NEON patches for cloudflare
    
    Based on 2 patches from zlib chromium fork:
    
    * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
      3060dcb - "zlib: inflate using wider loads and stores"
    
    * Noel Gordon (noel@chromium.org)
      64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
    
    The two patches combined provide around 5-25% increase in inflate
    performance, based on the workload, when checked with a modified
    zpipe.c and the Silesia corpus.
    
    Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
    
    * Increase inflate speed: read decode input into a uint64_t
    
    Update the chunk-copy code with a wide input data reader, which consumes
    input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the
    wide reader.
    
    Based on Noel Gordon's (noel@chromium.org) patch for the zlib chromium fork
    8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"
    
    This patch provides 7-10% inflate performance improvement when tested with a
    modified zpipe.c and the Silesia corpus.
    
    Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
    
    Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
    janaknat and madler committed Sep 23, 2020
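The first item of the commit above concerns the window-size field of the zlib header. A minimal sketch of that check, assuming the two-byte header layout of RFC 1950: the function name is hypothetical and this is not the inflate.c code, which performs the test inside its state machine.

```c
#include <assert.h>
#include <stdint.h>

/* Validate a zlib header (CMF, FLG) against the requested windowBits.
 * Returns the window bits declared by the header (8..15), or -1 if the
 * header is rejected. wbits == 0 means "take the size from the header". */
static int zlib_header_wbits(uint8_t cmf, uint8_t flg, int wbits) {
    /* The 16-bit header value must be a multiple of 31. */
    if ((((unsigned)cmf << 8) | flg) % 31 != 0) return -1;
    /* Compression method 8 is deflate. */
    if ((cmf & 0x0f) != 8) return -1;
    /* Four-bit field: log2(window size) - 8; only 0..7 is valid. */
    int len = (cmf >> 4) + 8;
    if (len > 15) return -1;            /* rejected even when wbits == 0 */
    if (wbits != 0 && len > wbits) return -1;
    return len;
}
```

With the common header bytes 0x78 0x9C and `wbits` of 0 this yields 15, while a header whose field is 8 (for example 0x88 0x1C) is rejected instead of triggering a large allocation.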

Commits on Aug 20, 2020

  1. Add gzip -k option to minigzip (to aid benchmarks)

    neurolabusc authored and Vlad Krasnov committed Aug 20, 2020
  2. Add adler32 SIMD ARM support

    Add SIMD NEON implementation of the adler32 checksum.
    
    Inflate speed increased by ~10% with the Silesia corpus for ARMv8. Tested with
    a modified zpipe.c to run inflate, deflate for a stream of size 100M.
    
    Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
    17bbb3d73c84 ("zlib adler_simd.c")
    janaknat authored and Vlad Krasnov committed Aug 20, 2020
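For reference, a scalar Adler-32 sketch that any SIMD version, NEON or otherwise, has to match bit for bit. This is not the NEON code from the commit above; the function and macro names are hypothetical. It defers the modulo for up to 5552 bytes at a time, the same batching trick that makes the checksum friendly to vectorization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* most bytes that can be summed before 32-bit overflow */

/* Update a running Adler-32 value (start with 1) over len bytes. */
static uint32_t adler32_scalar(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff;  /* 1 + sum of bytes */
    uint32_t b = adler >> 16;     /* sum of the running a values */
    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_MOD;           /* one reduction per batch, not per byte */
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

A SIMD implementation computes the same two sums over many bytes per instruction, which is why a scalar version like this is useful as a test oracle.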

Commits on Feb 26, 2020

  1. License, Windows and Compatibility Upgrade (redux) (cloudflare#19)

    * Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
    
    * Westmere detection
    
    * Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
    
    * use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
    
    * remove gpl code
    
    * Improve support for compiling using Windows (https://github.com/ningfei/zlib)
    
    * Import ucm.cmake from https://github.com/ningfei/zlib
    
    * crc32_simd as separate file (cloudflare#18)
    
    * atomic and SKIP_CPUID_CHECK (cloudflare#18)
    
    * Remove unused code
    
    * Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)
    
    * Atomic does not compile on Ubuntu 14.04
    
    * Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.
    
    * Restore Windows compilation using "nmake -f win32\Makefile.msc"
    
    * zconf.h required for Windows nmake
    
    * support crc intrinsics for ARM CPUs
    
    * Requested changes (cloudflare#19)
    neurolabusc committed Feb 26, 2020

Commits on Apr 3, 2019

  1. zconf.h is generated by configure; it should not be in git. (cloudflare#16)
    
    configure mostly generates this file by copying from zconf.h.in.
    kentonv authored and Vlad Krasnov committed Apr 3, 2019

Commits on Apr 1, 2019

  1. Don't disable target optimizations when cross-compiling. (cloudflare#15)

    * PS-237 Don't disable target optimizations when cross-compiling.
    
    AFAICT there's no reason to check the host platform here. The test only runs the compiler; it doesn't execute the output.
    
    * Add misc intermediate files to .gitignore.
    kentonv authored and vkrasnov committed Apr 1, 2019

Commits on May 30, 2018

  1. Set SSE4.2 flag (cloudflare#13)

    kornelski authored and vkrasnov committed May 30, 2018

Commits on May 27, 2018

  1. Fix compile of crc32-pclmul_asm on macOS (cloudflare#8)

    * .type and .size are ELF/COFF specific so drop them
    * .globl + .hidden equivalent for macOS is .private_extern
    * symbol names are not mangled on macOS, so we need to prefix them with _
    felixbuenemann authored and vkrasnov committed May 27, 2018
  2. Correctly check architecture on FreeBSD (cloudflare#10)

    On FreeBSD amd64 is named... 'amd64', not 'x86_64'. This is what
    'uname -m' prints.
    tomaszmduda authored and vkrasnov committed May 27, 2018

Commits on Mar 18, 2018

  1. faster crc32 of last <8 bytes on aarch64 (cloudflare#9)

    * faster crc32 of last <8 bytes on aarch64
    landfillbaby authored and vkrasnov committed Mar 18, 2018

Commits on Nov 7, 2017

  1. Merge pull request cloudflare#7 from cloudflare/vlad/aarch64

    Support for aarch64 with crc extension
    vkrasnov committed Nov 7, 2017