lib/lz4: enable LZ4_FAST_DEC_LOOP on aarch64 Clang builds
Upstream lz4 reported a performance regression on Qualcomm SoCs
when built with Clang, but not with GCC [1]. However, in my testing
on sdm845 with LLVM Clang 15, this patch gives a 5-10% boost in
decompression throughput, so enable the fast dec loop for Clang
as well.

Testing procedure:
- pre-fill zram with 1GB of real-world zram data dumped under memory
  pressure, for example
  $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000
- $ fio --readonly --name=randread --direct=1 --rw=randread \
  --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \
  --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M

Results:
- vanilla lz4: read: IOPS=1282k, BW=5006MiB/s (5249MB/s)(4000MiB/799msec)
- lz4 fast dec: read: IOPS=1382k, BW=5398MiB/s (5660MB/s)(4000MiB/741msec)
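As a quick sanity check on the figures above, the relative change can be worked out directly. This is a minimal sketch; `percent_change` is a hypothetical helper, not part of the patch. With the reported numbers it gives roughly +7.8% for both IOPS (1282k to 1382k) and bandwidth (5006 to 5398 MiB/s), and roughly -7.3% for run time (799 to 741 msec).

```c
/* Hypothetical helper (not part of the patch): relative change from
 * `before` to `after`, expressed in percent. Positive means an increase. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, `percent_change(1282.0, 1382.0)` covers the IOPS figures and `percent_change(799.0, 741.0)` covers the run time.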

[1] lz4/lz4#707

Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
jjpprrrr authored and Kyuofox committed Jul 10, 2022
1 parent 0f96034 commit c99b145
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/lz4/lz4_decompress.c
@@ -53,7 +53,7 @@
 #ifndef LZ4_FAST_DEC_LOOP
 #if defined(__i386__) || defined(__x86_64__)
 #define LZ4_FAST_DEC_LOOP 1
-#elif defined(__aarch64__) && !defined(__clang__)
+#elif defined(__aarch64__)
 /* On aarch64, we disable this optimization for clang because on certain
  * mobile chipsets and clang, it reduces performance. For more information
  * refer to https://github.com/lz4/lz4/pull/707. */
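The effect of the changed condition can be sketched as a standalone fragment. This is an illustration under assumptions, not the file's actual contents: the hunk ends at the comment, so the `#define` under the aarch64 branch and the `#else` fallback are assumed here, and `lz4_fast_dec_loop_enabled` is a hypothetical helper added only to make the selected value observable.

```c
/* Sketch only: mirrors the selection logic after this patch.
 * The #define under the aarch64 branch and the #else fallback are
 * assumptions; the hunk shown in the commit ends before those lines. */
#ifndef LZ4_FAST_DEC_LOOP
#if defined(__i386__) || defined(__x86_64__)
#define LZ4_FAST_DEC_LOOP 1
#elif defined(__aarch64__)
#define LZ4_FAST_DEC_LOOP 1   /* assumed: now taken for Clang as well as GCC */
#else
#define LZ4_FAST_DEC_LOOP 0   /* assumed fallback for other targets */
#endif
#endif

/* Hypothetical helper: report which path this build selected. */
int lz4_fast_dec_loop_enabled(void)
{
    return LZ4_FAST_DEC_LOOP;
}
```

Because the whole selection sits inside `#ifndef LZ4_FAST_DEC_LOOP`, a build that passes `-DLZ4_FAST_DEC_LOOP=0` still overrides the default, which leaves a way to turn the loop off on chipsets where it regresses.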
