
Change default raw index compression format to LZ4 #7795

Closed
richardstartin opened this issue Nov 18, 2021 · 2 comments

@richardstartin
Member

richardstartin commented Nov 18, 2021

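LZ4 decompression is consistently faster than Snappy: in the read benchmarks below it is roughly 2.0x to 2.4x faster, while the files it produces are only slightly larger (about 3-5%). The results come from a benchmark over raw string sentences composed of words from Wikipedia, where _distribution is the distribution of sentence length in words. In the writer rows, the :b, :kb and :mb suffixes report the size of the written index in bytes, kilobytes and megabytes: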

Benchmark                                  (_chunkCompressionType)      (_distribution)  (_maxChunkSize)  (_records)  Mode  Cnt           Score      Error  Units
BenchmarkRawForwardIndexReader.readV3                       SNAPPY  UNIFORM(1000,10000)          1048576      100000  avgt    5        4472.955 ±   45.246  ms/op
BenchmarkRawForwardIndexReader.readV3                       SNAPPY           EXP(0.001)          1048576      100000  avgt    5         861.179 ±   75.607  ms/op
BenchmarkRawForwardIndexReader.readV3                          LZ4  UNIFORM(1000,10000)          1048576      100000  avgt    5        2178.528 ±   46.808  ms/op
BenchmarkRawForwardIndexReader.readV3                          LZ4           EXP(0.001)          1048576      100000  avgt    5         360.927 ±   12.732  ms/op
BenchmarkRawForwardIndexReader.readV3                    ZSTANDARD  UNIFORM(1000,10000)          1048576      100000  avgt    5        4116.442 ±   88.894  ms/op
BenchmarkRawForwardIndexReader.readV3                    ZSTANDARD           EXP(0.001)          1048576      100000  avgt    5         789.733 ±   35.641  ms/op
BenchmarkRawForwardIndexReader.readV4                       SNAPPY  UNIFORM(1000,10000)          1048576      100000  avgt    5        4471.859 ±   55.049  ms/op
BenchmarkRawForwardIndexReader.readV4                       SNAPPY           EXP(0.001)          1048576      100000  avgt    5         791.099 ±    3.990  ms/op
BenchmarkRawForwardIndexReader.readV4                          LZ4  UNIFORM(1000,10000)          1048576      100000  avgt    5        2096.095 ±   57.949  ms/op
BenchmarkRawForwardIndexReader.readV4                          LZ4           EXP(0.001)          1048576      100000  avgt    5         344.592 ±    3.445  ms/op
BenchmarkRawForwardIndexReader.readV4                    ZSTANDARD  UNIFORM(1000,10000)          1048576      100000  avgt    5        4136.956 ±   98.780  ms/op
BenchmarkRawForwardIndexReader.readV4                    ZSTANDARD           EXP(0.001)          1048576      100000  avgt    5         742.575 ±   23.906  ms/op
BenchmarkRawForwardIndexWriter.writeV3                      SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5       71012.041 ±  425.849  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                    SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5  7560214965.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                   SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5     7383020.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                   SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5        7205.000                 #
BenchmarkRawForwardIndexWriter.writeV3                      SNAPPY           EXP(0.001)          1048576      100000    ss    5        9968.552 ±  785.299  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                    SNAPPY           EXP(0.001)          1048576      100000    ss    5  1387187650.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                   SNAPPY           EXP(0.001)          1048576      100000    ss    5     1354675.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                   SNAPPY           EXP(0.001)          1048576      100000    ss    5        1320.000                 #
BenchmarkRawForwardIndexWriter.writeV3                         LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5       72593.624 ± 7111.796  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                       LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5  7801600700.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                      LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5     7618750.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                      LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5        7440.000                 #
BenchmarkRawForwardIndexWriter.writeV3                         LZ4           EXP(0.001)          1048576      100000    ss    5       10565.451 ±  473.811  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                       LZ4           EXP(0.001)          1048576      100000    ss    5  1458628405.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                      LZ4           EXP(0.001)          1048576      100000    ss    5     1424440.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                      LZ4           EXP(0.001)          1048576      100000    ss    5        1390.000                 #
BenchmarkRawForwardIndexWriter.writeV3                   ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5       48690.818 ± 2852.004  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                 ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5  5010074735.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5     4892650.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5        4775.000                 #
BenchmarkRawForwardIndexWriter.writeV3                   ZSTANDARD           EXP(0.001)          1048576      100000    ss    5        8096.877 ±  806.071  ms/op
BenchmarkRawForwardIndexWriter.writeV3:b                 ZSTANDARD           EXP(0.001)          1048576      100000    ss    5   967798195.000                 #
BenchmarkRawForwardIndexWriter.writeV3:kb                ZSTANDARD           EXP(0.001)          1048576      100000    ss    5      945115.000                 #
BenchmarkRawForwardIndexWriter.writeV3:mb                ZSTANDARD           EXP(0.001)          1048576      100000    ss    5         920.000                 #
BenchmarkRawForwardIndexWriter.writeV4                      SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5       16158.218 ±  363.814  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                    SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5  7551233800.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                   SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5     7374250.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                   SNAPPY  UNIFORM(1000,10000)          1048576      100000    ss    5        7200.000                 #
BenchmarkRawForwardIndexWriter.writeV4                      SNAPPY           EXP(0.001)          1048576      100000    ss    5        2914.195 ±   81.574  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                    SNAPPY           EXP(0.001)          1048576      100000    ss    5  1367008240.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                   SNAPPY           EXP(0.001)          1048576      100000    ss    5     1334965.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                   SNAPPY           EXP(0.001)          1048576      100000    ss    5        1300.000                 #
BenchmarkRawForwardIndexWriter.writeV4                         LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5        9818.165 ±  490.165  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                       LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5  7785795405.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                      LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5     7603315.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                      LZ4  UNIFORM(1000,10000)          1048576      100000    ss    5        7425.000                 #
BenchmarkRawForwardIndexWriter.writeV4                         LZ4           EXP(0.001)          1048576      100000    ss    5        1765.996 ±   77.316  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                       LZ4           EXP(0.001)          1048576      100000    ss    5  1410988895.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                      LZ4           EXP(0.001)          1048576      100000    ss    5     1377915.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                      LZ4           EXP(0.001)          1048576      100000    ss    5        1345.000                 #
BenchmarkRawForwardIndexWriter.writeV4                   ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5       18359.873 ±  380.187  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                 ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5  4964714505.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5     4848350.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                ZSTANDARD  UNIFORM(1000,10000)          1048576      100000    ss    5        4730.000                 #
BenchmarkRawForwardIndexWriter.writeV4                   ZSTANDARD           EXP(0.001)          1048576      100000    ss    5        3346.148 ±  169.362  ms/op
BenchmarkRawForwardIndexWriter.writeV4:b                 ZSTANDARD           EXP(0.001)          1048576      100000    ss    5   900821780.000                 #
BenchmarkRawForwardIndexWriter.writeV4:kb                ZSTANDARD           EXP(0.001)          1048576      100000    ss    5      879705.000                 #
BenchmarkRawForwardIndexWriter.writeV4:mb                ZSTANDARD           EXP(0.001)          1048576      100000    ss    5         855.000                 #
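The headline ratios can be recomputed directly from the table. The sketch below is a hypothetical helper class (not part of the benchmark suite) that divides the Snappy read time by the LZ4 read time and compares the index sizes reported by the writeV4 :b rows; all inputs are copied from the table above.

```java
// Hypothetical helper: recomputes LZ4-vs-Snappy ratios from the benchmark table.
class Lz4VsSnappy {

    // Snappy time divided by LZ4 time; a value above 1 means LZ4 reads faster.
    static double speedup(double snappyMsPerOp, double lz4MsPerOp) {
        return snappyMsPerOp / lz4MsPerOp;
    }

    // Extra size of the LZ4 index relative to the Snappy index, in percent.
    static double sizeOverheadPercent(double snappyBytes, double lz4Bytes) {
        return (lz4Bytes / snappyBytes - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // readV4 average time in ms/op, copied from the table
        System.out.printf("readV4 UNIFORM speedup: %.2fx%n", speedup(4471.859, 2096.095));
        System.out.printf("readV4 EXP speedup:     %.2fx%n", speedup(791.099, 344.592));
        // writeV4 index size in bytes (the :b rows), copied from the table
        System.out.printf("writeV4 UNIFORM size overhead: %.1f%%%n",
                sizeOverheadPercent(7551233800.0, 7785795405.0));
        System.out.printf("writeV4 EXP size overhead:     %.1f%%%n",
                sizeOverheadPercent(1367008240.0, 1410988895.0));
    }
}
```

With these inputs the readV4 speedups come out at about 2.1x (UNIFORM) and 2.3x (EXP), and the LZ4 index is about 3% larger than the Snappy index in both cases.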

Because overriding the chunk compression type requires verbose configuration, and because the change is entirely backward compatible (each raw index records its compression type in its header, so segments already written with Snappy remain readable), I would like to propose changing the default to LZ4.
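Backward compatibility follows from the reader trusting the header rather than the configured default. The following is a minimal, hypothetical sketch of that idea, not Pinot's actual on-disk format: the HeaderDispatch class, the Codec ids and the header layout (codec id, then uncompressed length) are assumptions, and the JDK's DEFLATE stands in for LZ4 and Snappy, which require third-party libraries.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not Pinot's format: each chunk names its own codec in a
// header, so the reader works no matter which default the writer used.
class HeaderDispatch {

    // Hypothetical codec ids; DEFLATE stands in for LZ4/Snappy (not in the JDK).
    enum Codec { PASS_THROUGH, DEFLATE }

    // Layout: [codec id: int][uncompressed length: int][payload]
    static byte[] write(byte[] data, Codec codec) {
        byte[] payload = codec == Codec.DEFLATE ? deflate(data) : data;
        return ByteBuffer.allocate(8 + payload.length)
                .putInt(codec.ordinal())
                .putInt(data.length)
                .put(payload)
                .array();
    }

    static byte[] read(byte[] chunk) {
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        Codec codec = Codec.values()[buf.getInt()]; // chosen by the header, not by config
        int rawLength = buf.getInt();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        if (codec == Codec.PASS_THROUGH) {
            return payload;
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(payload);
            byte[] out = new byte[rawLength];
            int n = 0;
            while (n < rawLength) {
                int k = inflater.inflate(out, n, rawLength - n);
                if (k == 0) {
                    throw new IllegalStateException("truncated or corrupt chunk");
                }
                n += k;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] text = "the same sentence, stored twice".getBytes(StandardCharsets.UTF_8);
        byte[] before = write(text, Codec.PASS_THROUGH); // written under the old default
        byte[] after = write(text, Codec.DEFLATE);       // written under the new default
        // One reader decodes both chunks because each chunk names its own codec.
        System.out.println(new String(read(before), StandardCharsets.UTF_8));
        System.out.println(new String(read(after), StandardCharsets.UTF_8));
    }
}
```

Changing the writer's default only changes which id new chunks carry; nothing about reading older chunks depends on it.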

@siddharthteotia
Contributor

+1
Linking related issue: #6804

@richardstartin
Member Author

Done in #7797
