
Benchmark data table serialization logic and pre-allocate byte[] array if need be #6714

Open
mqliang opened this issue Mar 23, 2021 · 1 comment


@mqliang
Contributor

mqliang commented Mar 23, 2021

As @siddharthteotia pointed out in #6710 (comment):

The serialization functions first write to a temporary output stream, then convert it to a byte array that is returned to the caller and written to the main stream. I think the reason for doing that is that we don't know upfront the length of the byte[] array to allocate.

However, we could probably do it differently, and it might be faster:

  • Write a loop that goes over each entry and keeps a running sum of the size
  • At the end of the loop, allocate a byte array of that size
  • Start another loop, go over each entry again, and fill the pre-allocated byte array
  • Return the filled byte array
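For concreteness, the current approach and the proposed two-loop approach can be sketched side by side. This is a minimal sketch, not Pinot's actual code: the class `MetadataSerializers`, its method names, and the wire format (`[int numEntries]`, then per entry `[int keyLen][key UTF-8][int valueLen][value UTF-8]`) are hypothetical, chosen only to make the comparison concrete.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical layout, assumed for illustration only:
// [int numEntries] then, per entry, [int keyLen][key UTF-8][int valueLen][value UTF-8].
final class MetadataSerializers {

  // Current approach: the final size is unknown, so write into a growable
  // temporary stream and copy the result out with toByteArray().
  static byte[] viaTemporaryStream(Map<String, String> metadata) {
    ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(byteStream)) {
      out.writeInt(metadata.size());
      for (Map.Entry<String, String> entry : metadata.entrySet()) {
        writeLengthPrefixed(out, entry.getKey().getBytes(StandardCharsets.UTF_8));
        writeLengthPrefixed(out, entry.getValue().getBytes(StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      // ByteArrayOutputStream never throws, but DataOutputStream declares IOException.
      throw new UncheckedIOException(e);
    }
    return byteStream.toByteArray();
  }

  private static void writeLengthPrefixed(DataOutputStream out, byte[] bytes) throws IOException {
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Proposed approach: loop once to sum the size, allocate exactly once,
  // then loop again to fill the array. Each key/value is encoded twice.
  static byte[] viaPreAllocation(Map<String, String> metadata) {
    int size = Integer.BYTES; // numEntries
    for (Map.Entry<String, String> entry : metadata.entrySet()) {
      size += Integer.BYTES + entry.getKey().getBytes(StandardCharsets.UTF_8).length;
      size += Integer.BYTES + entry.getValue().getBytes(StandardCharsets.UTF_8).length;
    }
    byte[] result = new byte[size];
    // ByteBuffer is big-endian by default, the same byte order DataOutputStream uses.
    ByteBuffer buffer = ByteBuffer.wrap(result);
    buffer.putInt(metadata.size());
    for (Map.Entry<String, String> entry : metadata.entrySet()) {
      byte[] keyBytes = entry.getKey().getBytes(StandardCharsets.UTF_8);
      byte[] valueBytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
      buffer.putInt(keyBytes.length).put(keyBytes);
      buffer.putInt(valueBytes.length).put(valueBytes);
    }
    return result;
  }
}
```

Both methods produce identical bytes for the same map; the difference is only where the copying and the repeated encoding happen. The two-loop version relies on the map being iterated in the same order twice, which holds for an unmodified `Map`.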

We need to benchmark these two serialization approaches. If the proposed approach is better, I will send a PR to address it.

@mqliang
Contributor Author

mqliang commented Mar 25, 2021

I wrote a benchmark here: mqliang@7892423

The benchmark compares three serialization methods, each serializing a typical metadata map:

  • temporaryOutputStream: for each KV pair in the metadata, first write to a temporary output stream, then convert it to a byte array that is returned to the caller and written to the main stream
  • preAllocateByteArrayNative:
    • Loop over each entry and keep a running sum of the size
    • At the end of the loop, allocate a byte array of that size
    • Start another loop, go over each entry again, and fill the pre-allocated byte array
    • Return the filled byte array
    • Keys and values are encoded twice, once in each loop
  • preAllocateByteArrayWithBytesCache: the same logic as preAllocateByteArrayNative, but with a cache that stores the encoded keys/values so they can be reused in the second loop
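A minimal sketch of the preAllocateByteArrayWithBytesCache idea, under stated assumptions: the class `CachedMetadataSerializer` and the layout (`[int numEntries]`, then per entry `[int keyLen][key UTF-8][int valueLen][value UTF-8]`) are hypothetical and are not the code in mqliang@7892423. The first loop encodes every key and value once and keeps the bytes; the second loop only copies them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical layout, assumed for illustration only:
// [int numEntries] then, per entry, [int keyLen][key UTF-8][int valueLen][value UTF-8].
final class CachedMetadataSerializer {
  static byte[] serialize(Map<String, String> metadata) {
    int numEntries = metadata.size();
    // Cache of encoded bytes: slot 2i holds key i, slot 2i+1 holds value i.
    byte[][] encoded = new byte[2 * numEntries][];
    int size = Integer.BYTES; // numEntries
    int slot = 0;
    // Loop 1: encode each key/value exactly once and sum the serialized size.
    for (Map.Entry<String, String> entry : metadata.entrySet()) {
      byte[] keyBytes = entry.getKey().getBytes(StandardCharsets.UTF_8);
      byte[] valueBytes = entry.getValue().getBytes(StandardCharsets.UTF_8);
      encoded[slot++] = keyBytes;
      encoded[slot++] = valueBytes;
      size += 2 * Integer.BYTES + keyBytes.length + valueBytes.length;
    }
    // Single exact-size allocation; no temporary stream and no final copy.
    ByteBuffer buffer = ByteBuffer.allocate(size);
    buffer.putInt(numEntries);
    // Loop 2: copy the cached bytes with their length prefixes; nothing is re-encoded.
    for (byte[] bytes : encoded) {
      buffer.putInt(bytes.length).put(bytes);
    }
    return buffer.array();
  }
}
```

The trade-off is that the encoded byte[] arrays stay reachable until the second loop finishes, in exchange for skipping the second round of String-to-UTF-8 encoding.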

Here are the results:

# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative

# Run progress: 0.00% complete, ETA 00:08:00
# Fork: 1 of 1
# Warmup Iteration   1: 552.178 us/op
Iteration   1: 519.531 us/op
                 ·gc.alloc.rate:                   3270.480 MB/sec
                 ·gc.alloc.rate.norm:              1811608.009 B/op
                 ·gc.churn.PS_Eden_Space:          3275.114 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1814175.318 B/op
                 ·gc.churn.PS_Survivor_Space:      0.558 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 309.168 B/op
                 ·gc.count:                        525.000 counts
                 ·gc.time:                         261.000 ms

Iteration   2: 524.659 us/op
                 ·gc.alloc.rate:                   3238.871 MB/sec
                 ·gc.alloc.rate.norm:              1811608.011 B/op
                 ·gc.churn.PS_Eden_Space:          3242.901 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1813862.347 B/op
                 ·gc.churn.PS_Survivor_Space:      0.563 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 314.968 B/op
                 ·gc.count:                        516.000 counts
                 ·gc.time:                         263.000 ms

Iteration   3: 526.323 us/op
                 ·gc.alloc.rate:                   3228.230 MB/sec
                 ·gc.alloc.rate.norm:              1811608.008 B/op
                 ·gc.churn.PS_Eden_Space:          3232.024 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1813736.682 B/op
                 ·gc.churn.PS_Survivor_Space:      0.471 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 264.539 B/op
                 ·gc.count:                        470.000 counts
                 ·gc.time:                         254.000 ms

Iteration   4: 521.779 us/op
                 ·gc.alloc.rate:                   3256.320 MB/sec
                 ·gc.alloc.rate.norm:              1811608.008 B/op
                 ·gc.churn.PS_Eden_Space:          3261.433 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1814452.617 B/op
                 ·gc.churn.PS_Survivor_Space:      0.560 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 311.772 B/op
                 ·gc.count:                        534.000 counts
                 ·gc.time:                         270.000 ms

Iteration   5: 524.474 us/op
                 ·gc.alloc.rate:                   3239.855 MB/sec
                 ·gc.alloc.rate.norm:              1811608.008 B/op
                 ·gc.churn.PS_Eden_Space:          3242.045 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1812832.659 B/op
                 ·gc.churn.PS_Survivor_Space:      0.547 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 305.975 B/op
                 ·gc.count:                        483.000 counts
                 ·gc.time:                         255.000 ms



Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative":
  523.353 ±(99.9%) 10.345 us/op [Average]
  (min, avg, max) = (519.531, 523.353, 526.323), stdev = 2.687
  CI (99.9%): [513.008, 533.698] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate":
  3246.751 ±(99.9%) 64.066 MB/sec [Average]
  (min, avg, max) = (3228.230, 3246.751, 3270.480), stdev = 16.638
  CI (99.9%): [3182.685, 3310.818] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm":
  1811608.009 ±(99.9%) 0.005 B/op [Average]
  (min, avg, max) = (1811608.008, 1811608.009, 1811608.011), stdev = 0.001
  CI (99.9%): [1811608.003, 1811608.014] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space":
  3250.704 ±(99.9%) 66.578 MB/sec [Average]
  (min, avg, max) = (3232.024, 3250.704, 3275.114), stdev = 17.290
  CI (99.9%): [3184.126, 3317.282] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm":
  1813811.924 ±(99.9%) 2365.646 B/op [Average]
  (min, avg, max) = (1812832.659, 1813811.924, 1814452.617), stdev = 614.351
  CI (99.9%): [1811446.279, 1816177.570] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space":
  0.540 ±(99.9%) 0.150 MB/sec [Average]
  (min, avg, max) = (0.471, 0.540, 0.563), stdev = 0.039
  CI (99.9%): [0.390, 0.690] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm":
  301.285 ±(99.9%) 80.118 B/op [Average]
  (min, avg, max) = (264.539, 301.285, 314.968), stdev = 20.806
  CI (99.9%): [221.166, 381.403] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count":
  2528.000 ±(99.9%) 0.001 counts [Sum]
  (min, avg, max) = (470.000, 505.600, 534.000), stdev = 27.700
  CI (99.9%): [2528.000, 2528.000] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time":
  1303.000 ±(99.9%) 0.001 ms [Sum]
  (min, avg, max) = (254.000, 260.600, 270.000), stdev = 6.504
  CI (99.9%): [1303.000, 1303.000] (assumes normal distribution)


# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache

# Run progress: 33.33% complete, ETA 00:05:28
# Fork: 1 of 1
# Warmup Iteration   1: 390.616 us/op
Iteration   1: 375.676 us/op
                 ·gc.alloc.rate:                   3524.091 MB/sec
                 ·gc.alloc.rate.norm:              1411608.008 B/op
                 ·gc.churn.PS_Eden_Space:          3532.601 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1415016.587 B/op
                 ·gc.churn.PS_Survivor_Space:      0.538 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 215.400 B/op
                 ·gc.count:                        458.000 counts
                 ·gc.time:                         248.000 ms

Iteration   2: 375.171 us/op
                 ·gc.alloc.rate:                   3528.907 MB/sec
                 ·gc.alloc.rate.norm:              1411608.006 B/op
                 ·gc.churn.PS_Eden_Space:          3534.356 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1413787.624 B/op
                 ·gc.churn.PS_Survivor_Space:      0.494 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 197.609 B/op
                 ·gc.count:                        435.000 counts
                 ·gc.time:                         247.000 ms

Iteration   3: 373.233 us/op
                 ·gc.alloc.rate:                   3547.720 MB/sec
                 ·gc.alloc.rate.norm:              1411608.005 B/op
                 ·gc.churn.PS_Eden_Space:          3559.728 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1416385.929 B/op
                 ·gc.churn.PS_Survivor_Space:      0.539 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 214.343 B/op
                 ·gc.count:                        462.000 counts
                 ·gc.time:                         247.000 ms

Iteration   4: 371.186 us/op
                 ·gc.alloc.rate:                   3567.068 MB/sec
                 ·gc.alloc.rate.norm:              1411608.006 B/op
                 ·gc.churn.PS_Eden_Space:          3566.702 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1411463.405 B/op
                 ·gc.churn.PS_Survivor_Space:      0.597 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 236.411 B/op
                 ·gc.count:                        520.000 counts
                 ·gc.time:                         271.000 ms

Iteration   5: 370.738 us/op
                 ·gc.alloc.rate:                   3571.354 MB/sec
                 ·gc.alloc.rate.norm:              1411608.005 B/op
                 ·gc.churn.PS_Eden_Space:          3582.874 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1416161.234 B/op
                 ·gc.churn.PS_Survivor_Space:      0.588 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 232.322 B/op
                 ·gc.count:                        509.000 counts
                 ·gc.time:                         262.000 ms



Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache":
  373.201 ±(99.9%) 8.639 us/op [Average]
  (min, avg, max) = (370.738, 373.201, 375.676), stdev = 2.243
  CI (99.9%): [364.562, 381.840] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate":
  3547.828 ±(99.9%) 82.702 MB/sec [Average]
  (min, avg, max) = (3524.091, 3547.828, 3571.354), stdev = 21.477
  CI (99.9%): [3465.126, 3630.530] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm":
  1411608.006 ±(99.9%) 0.005 B/op [Average]
  (min, avg, max) = (1411608.005, 1411608.006, 1411608.008), stdev = 0.001
  CI (99.9%): [1411608.001, 1411608.011] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space":
  3555.252 ±(99.9%) 83.120 MB/sec [Average]
  (min, avg, max) = (3532.601, 3555.252, 3582.874), stdev = 21.586
  CI (99.9%): [3472.132, 3638.373] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm":
  1414562.956 ±(99.9%) 7771.212 B/op [Average]
  (min, avg, max) = (1411463.405, 1414562.956, 1416385.929), stdev = 2018.159
  CI (99.9%): [1406791.744, 1422334.168] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space":
  0.551 ±(99.9%) 0.162 MB/sec [Average]
  (min, avg, max) = (0.494, 0.551, 0.597), stdev = 0.042
  CI (99.9%): [0.389, 0.713] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm":
  219.217 ±(99.9%) 60.046 B/op [Average]
  (min, avg, max) = (197.609, 219.217, 236.411), stdev = 15.594
  CI (99.9%): [159.171, 279.263] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count":
  2384.000 ±(99.9%) 0.001 counts [Sum]
  (min, avg, max) = (435.000, 476.800, 520.000), stdev = 36.134
  CI (99.9%): [2384.000, 2384.000] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time":
  1275.000 ±(99.9%) 0.001 ms [Sum]
  (min, avg, max) = (247.000, 255.000, 271.000), stdev = 10.977
  CI (99.9%): [1275.000, 1275.000] (assumes normal distribution)


# JMH version: 1.26
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=65146:/Users/mqliang/Library/Application Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/203.7717.56/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 30 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream

# Run progress: 66.67% complete, ETA 00:02:44
# Fork: 1 of 1
# Warmup Iteration   1: 483.366 us/op
Iteration   1: 408.351 us/op
                 ·gc.alloc.rate:                   3580.758 MB/sec
                 ·gc.alloc.rate.norm:              1558808.007 B/op
                 ·gc.churn.PS_Eden_Space:          3586.078 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1561123.846 B/op
                 ·gc.churn.PS_Survivor_Space:      0.511 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 222.603 B/op
                 ·gc.count:                        476.000 counts
                 ·gc.time:                         253.000 ms

Iteration   2: 410.342 us/op
                 ·gc.alloc.rate:                   3563.256 MB/sec
                 ·gc.alloc.rate.norm:              1558808.009 B/op
                 ·gc.churn.PS_Eden_Space:          3569.765 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1561655.686 B/op
                 ·gc.churn.PS_Survivor_Space:      0.451 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 197.394 B/op
                 ·gc.count:                        409.000 counts
                 ·gc.time:                         244.000 ms

Iteration   3: 407.314 us/op
                 ·gc.alloc.rate:                   3589.291 MB/sec
                 ·gc.alloc.rate.norm:              1558808.006 B/op
                 ·gc.churn.PS_Eden_Space:          3592.335 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1560130.076 B/op
                 ·gc.churn.PS_Survivor_Space:      0.557 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 241.833 B/op
                 ·gc.count:                        495.000 counts
                 ·gc.time:                         261.000 ms

Iteration   4: 407.294 us/op
                 ·gc.alloc.rate:                   3590.035 MB/sec
                 ·gc.alloc.rate.norm:              1558808.006 B/op
                 ·gc.churn.PS_Eden_Space:          3595.643 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1561243.143 B/op
                 ·gc.churn.PS_Survivor_Space:      0.439 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 190.513 B/op
                 ·gc.count:                        382.000 counts
                 ·gc.time:                         239.000 ms

Iteration   5: 410.068 us/op
                 ·gc.alloc.rate:                   3565.783 MB/sec
                 ·gc.alloc.rate.norm:              1558808.006 B/op
                 ·gc.churn.PS_Eden_Space:          3576.571 MB/sec
                 ·gc.churn.PS_Eden_Space.norm:     1563524.046 B/op
                 ·gc.churn.PS_Survivor_Space:      0.542 MB/sec
                 ·gc.churn.PS_Survivor_Space.norm: 236.741 B/op
                 ·gc.count:                        460.000 counts
                 ·gc.time:                         252.000 ms



Result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream":
  408.674 ±(99.9%) 5.641 us/op [Average]
  (min, avg, max) = (407.294, 408.674, 410.342), stdev = 1.465
  CI (99.9%): [403.033, 414.314] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate":
  3577.824 ±(99.9%) 48.952 MB/sec [Average]
  (min, avg, max) = (3563.256, 3577.824, 3590.035), stdev = 12.713
  CI (99.9%): [3528.873, 3626.776] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm":
  1558808.007 ±(99.9%) 0.005 B/op [Average]
  (min, avg, max) = (1558808.006, 1558808.007, 1558808.009), stdev = 0.001
  CI (99.9%): [1558808.002, 1558808.011] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space":
  3584.078 ±(99.9%) 41.614 MB/sec [Average]
  (min, avg, max) = (3569.765, 3584.078, 3595.643), stdev = 10.807
  CI (99.9%): [3542.465, 3625.692] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm":
  1561535.360 ±(99.9%) 4793.590 B/op [Average]
  (min, avg, max) = (1560130.076, 1561535.360, 1563524.046), stdev = 1244.880
  CI (99.9%): [1556741.769, 1566328.950] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space":
  0.500 ±(99.9%) 0.204 MB/sec [Average]
  (min, avg, max) = (0.439, 0.500, 0.557), stdev = 0.053
  CI (99.9%): [0.296, 0.704] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm":
  217.817 ±(99.9%) 88.656 B/op [Average]
  (min, avg, max) = (190.513, 217.817, 241.833), stdev = 23.024
  CI (99.9%): [129.161, 306.473] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count":
  2222.000 ±(99.9%) 0.001 counts [Sum]
  (min, avg, max) = (382.000, 444.400, 495.000), stdev = 47.300
  CI (99.9%): [2222.000, 2222.000] (assumes normal distribution)

Secondary result "org.apache.pinot.perf.BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time":
  1249.000 ±(99.9%) 0.001 ms [Sum]
  (min, avg, max) = (239.000, 249.800, 261.000), stdev = 8.526
  CI (99.9%): [1249.000, 1249.000] (assumes normal distribution)


# Run complete. Total time: 00:08:12

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                                                            Mode  Cnt        Score      Error   Units
BenchmarkDataTableSerialization.preAllocateByteArrayNative                                           avgt    5      523.353 ±   10.345   us/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate                            avgt    5     3246.751 ±   64.066  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.alloc.rate.norm                       avgt    5  1811608.009 ±    0.005    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space                   avgt    5     3250.704 ±   66.578  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Eden_Space.norm              avgt    5  1813811.924 ± 2365.646    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space               avgt    5        0.540 ±    0.150  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.churn.PS_Survivor_Space.norm          avgt    5      301.285 ±   80.118    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.count                                 avgt    5     2528.000             counts
BenchmarkDataTableSerialization.preAllocateByteArrayNative:·gc.time                                  avgt    5     1303.000                 ms
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache                                   avgt    5      373.201 ±    8.639   us/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate                    avgt    5     3547.828 ±   82.702  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.alloc.rate.norm               avgt    5  1411608.006 ±    0.005    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space           avgt    5     3555.252 ±   83.120  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Eden_Space.norm      avgt    5  1414562.956 ± 7771.212    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space       avgt    5        0.551 ±    0.162  MB/sec
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.churn.PS_Survivor_Space.norm  avgt    5      219.217 ±   60.046    B/op
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.count                         avgt    5     2384.000             counts
BenchmarkDataTableSerialization.preAllocateByteArrayWithBytesCache:·gc.time                          avgt    5     1275.000                 ms
BenchmarkDataTableSerialization.temporaryOutputStream                                                avgt    5      408.674 ±    5.641   us/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate                                 avgt    5     3577.824 ±   48.952  MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.alloc.rate.norm                            avgt    5  1558808.007 ±    0.005    B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space                        avgt    5     3584.078 ±   41.614  MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Eden_Space.norm                   avgt    5  1561535.360 ± 4793.590    B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space                    avgt    5        0.500 ±    0.204  MB/sec
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.churn.PS_Survivor_Space.norm               avgt    5      217.817 ±   88.656    B/op
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.count                                      avgt    5     2222.000             counts
BenchmarkDataTableSerialization.temporaryOutputStream:·gc.time                                       avgt    5     1249.000                 ms

Process finished with exit code 0

If my implementation is correct, the benchmark results show that pre-allocating the byte array with a cache is slightly better than using a temporary output stream: it is about 9% faster (373.201 us/op vs. 408.674 us/op), and total GC time stays roughly the same (1275 ms vs. 1249 ms). Caching the encoded K/V keeps those bytes alive until the second loop, yet the measured allocation per op is actually lower (1,411,608 B/op vs. 1,558,808 B/op). The reason preAllocateByteArrayNative is the slowest (523.353 us/op) is easy to understand: it encodes each K/V twice, whereas the other two methods encode them only once.
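To make the comparison explicit, the relative differences can be computed directly from the JMH averages in the summary table. The class name `BenchmarkComparison` is illustrative; the inputs are the reported scores, nothing more. Computed this way, the cached variant spends about 8.7% less time per op (about 1.095x the throughput) and allocates about 9.4% fewer bytes per op, and the 99.9% confidence intervals for time/op ([364.562, 381.840] vs. [403.033, 414.314]) do not overlap.

```java
final class BenchmarkComparison {
  // Percent change of `candidate` relative to `baseline` (negative means lower).
  static double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Scores copied from the JMH summary table (avgt, 5 measurement iterations each).
    double streamTimeUs = 423 - 14.326;  // temporaryOutputStream: 408.674 us/op
    double cacheTimeUs = 373.201;        // preAllocateByteArrayWithBytesCache, us/op
    double nativeTimeUs = 523.353;       // preAllocateByteArrayNative, us/op
    double streamAllocB = 1558808.007;   // temporaryOutputStream, gc.alloc.rate.norm (B/op)
    double cacheAllocB = 1411608.006;    // preAllocateByteArrayWithBytesCache, gc.alloc.rate.norm (B/op)
    double streamGcMs = 1249.0;          // temporaryOutputStream, summed gc.time (ms)
    double cacheGcMs = 1275.0;           // preAllocateByteArrayWithBytesCache, summed gc.time (ms)

    System.out.printf("time/op, cache vs. stream:  %+.2f%%%n", percentChange(streamTimeUs, cacheTimeUs));
    System.out.printf("throughput, cache / stream: %.3fx%n", streamTimeUs / cacheTimeUs);
    System.out.printf("time/op, native vs. stream: %+.2f%%%n", percentChange(streamTimeUs, nativeTimeUs));
    System.out.printf("B/op, cache vs. stream:     %+.2f%%%n", percentChange(streamAllocB, cacheAllocB));
    System.out.printf("gc.time, cache vs. stream:  %+.2f%%%n", percentChange(streamGcMs, cacheGcMs));
  }
}
```

Because gc.time is summed over measurement windows of equal length, and the cached variant completes more operations in that time, its slightly higher total GC time does not imply a higher GC cost per operation.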

I'm not sure whether we should make this change for an improvement of that size.
