
CASSANDRA-15410: Improve strings encoding speed and add benchmarks #382

Closed
yifan-c wants to merge 3 commits into apache:trunk from yifan-c:CASSANDRA-15410

@yifan-c yifan-c commented Nov 11, 2019

Because encodeSize has already been calculated by the time a message is encoded, we can rely on that size and safely reserve only the buffer's remaining capacity when writing strings, which avoids resizing.

A set of benchmarks was run to show the difference. For the long text, the change cuts the string encoding time by more than half, from 571.9 ns to 216.1 ns (about 62% less). For the short text, the time drops from 62.8 ns to 36.4 ns (about 42% less).

The improvement comes from avoiding unnecessary buffer resizing and data copying.
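To illustrate why the reservation strategy matters, here is a minimal sketch. It assumes a hypothetical `GrowableBuffer` as a stand-in for the real buffer (it is not Netty's ByteBuf and not the code in this patch), allocated with exactly the encoded size. Reserving a worst-case estimate of 3 bytes per char forces a reallocation and copy, while reserving only the remaining capacity does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: GrowableBuffer is a hypothetical stand-in, not Netty's ByteBuf. */
public class ReserveSketch {
    static final class GrowableBuffer {
        byte[] data;
        int writerIndex = 0;
        int resizes = 0; // how many times the backing array was reallocated

        GrowableBuffer(int initialCapacity) {
            data = new byte[initialCapacity];
        }

        /** Reallocates and copies if fewer than minWritable bytes remain. */
        void ensureWritable(int minWritable) {
            int needed = writerIndex + minWritable;
            if (needed > data.length) {
                data = Arrays.copyOf(data, needed);
                resizes++;
            }
        }

        void writeBytes(byte[] src) {
            ensureWritable(src.length);
            System.arraycopy(src, 0, data, writerIndex, src.length);
            writerIndex += src.length;
        }
    }

    /** Reserves a worst-case estimate (3 bytes per UTF-16 char) before writing. */
    static int writeWithWorstCaseReserve(GrowableBuffer buf, String s) {
        buf.ensureWritable(s.length() * 3);
        buf.writeBytes(s.getBytes(StandardCharsets.UTF_8));
        return buf.resizes;
    }

    /** Reserves only what is left; assumes the buffer was sized from the encoded size. */
    static int writeWithRemainingCapacity(GrowableBuffer buf, String s) {
        buf.ensureWritable(buf.data.length - buf.writerIndex); // never grows the buffer
        buf.writeBytes(s.getBytes(StandardCharsets.UTF_8));
        return buf.resizes;
    }

    public static void main(String[] args) {
        String text = "hello cassandra";
        int exactSize = text.getBytes(StandardCharsets.UTF_8).length;
        // Both buffers start with exactly the encoded size, as a pre-sized message buffer would.
        System.out.println(writeWithWorstCaseReserve(new GrowableBuffer(exactSize), text));  // 1
        System.out.println(writeWithRemainingCapacity(new GrowableBuffer(exactSize), text)); // 0
    }
}
```

The sketch only counts reallocations; the benchmark numbers in this PR measure the real buffer implementation.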

[java] Benchmark                                                  Mode  Cnt    Score    Error  Units
[java] Utf8StringEncodeBench.writeLongText                        avgt    6  571.949 ± 19.791  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSize           avgt    6  459.932 ± 27.790  ns/op
[java] Utf8StringEncodeBench.writeLongTextWithExactSizeSkipCalc   avgt    6  216.085 ±  3.480  ns/op
[java] Utf8StringEncodeBench.writeShortText                       avgt    6   62.775 ±  6.159  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSize          avgt    6   44.071 ±  5.645  ns/op
[java] Utf8StringEncodeBench.writeShortTextWithExactSizeSkipCalc  avgt    6   36.358 ±  5.135  ns/op
  • writeLongText: the original implementation, which calls ByteBufUtil.writeUtf8. It over-estimates the encoded size of the string, which forces the buffer to resize.
  • writeLongTextWithExactSize: calls TypeSizes.encodeUTF8Length to reserve exactly the number of bytes to be written.
  • writeLongTextWithExactSizeSkipCalc: skips the UTF-8 length calculation. Because encodeSize is calculated before a message is encoded, the size of the final bytes is already known, so we can simply reserve the remaining capacity.
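For readers unfamiliar with the exact-size variant, the following is a rough sketch of how a UTF-8 encoded length can be computed without allocating a byte array. The class and method names are hypothetical, and this is not the implementation of TypeSizes.encodeUTF8Length; it only shows the kind of per-character work that the SkipCalc variant avoids.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: names are hypothetical, not Cassandra's TypeSizes. */
public class Utf8LengthSketch {
    /** Counts the UTF-8 bytes needed for s without allocating an intermediate byte[]. */
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                       && i + 1 < s.length()
                       && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // a surrogate pair encodes one supplementary code point
                i++;        // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1; // unpaired surrogate: String.getBytes substitutes '?'
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u20ac \uD83D\uDE00";
        // Both lines print the same number.
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The loop visits every character once, so its cost grows with the string length, which is consistent with the long-text benchmark gaining more from skipping it than the short-text one.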

@yifan-c yifan-c changed the title Estimate UTF-8 string size based on encodeSize and add benchmarks Improve strings encoding speed and add benchmarks Nov 16, 2019

yifan-c commented Nov 16, 2019

Based on Aleksey's comment, a new ASCII-only string write method was added.
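As a rough idea of what an ASCII-only write path can look like, here is a minimal sketch using java.nio.ByteBuffer. The class name, method name, 2-byte length prefix, and validation checks are assumptions made for illustration; they are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: not the method added in this PR. */
public class AsciiWriteSketch {
    /** Writes a 2-byte length prefix followed by exactly one byte per character. */
    static void writeAsciiString(String s, ByteBuffer out) {
        if (s.length() > 0xFFFF)
            throw new IllegalArgumentException("string too long for a 2-byte length prefix");
        out.putShort((short) s.length()); // for ASCII, byte count == char count
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F)
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            out.put((byte) c);
        }
    }

    public static void main(String[] args) {
        String s = "hello";
        ByteBuffer buf = ByteBuffer.allocate(2 + s.length()); // exact size, no resizing needed
        writeAsciiString(s, buf);
        buf.flip();
        int length = buf.getShort() & 0xFFFF;
        byte[] payload = new byte[length];
        buf.get(payload);
        System.out.println(new String(payload, StandardCharsets.US_ASCII));
    }
}
```

Because every ASCII character occupies one byte, the encoded size equals the string length, so neither a worst-case estimate nor a separate length pass is needed.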

The latest benchmark shows similar performance for writeLongTextAsASCII and writeLongTextWithExactSizeSkipCalc:

[java] Benchmark                                               Mode  Cnt    Score    Error  Units
[java] StringsEncodeBench.writeLongText                        avgt    6  570.920 ± 21.376  ns/op
[java] StringsEncodeBench.writeLongTextAsASCII                 avgt    6  291.466 ±  9.508  ns/op
[java] StringsEncodeBench.writeLongTextWithExactSize           avgt    6  467.222 ± 25.140  ns/op
[java] StringsEncodeBench.writeLongTextWithExactSizeSkipCalc   avgt    6  285.320 ± 10.883  ns/op
[java] StringsEncodeBench.writeShortText                       avgt    6   62.076 ±  2.107  ns/op
[java] StringsEncodeBench.writeShortTextAsASCII                avgt    6   32.121 ±  0.403  ns/op
[java] StringsEncodeBench.writeShortTextWithExactSize          avgt    6   41.929 ±  1.783  ns/op
[java] StringsEncodeBench.writeShortTextWithExactSizeSkipCalc  avgt    6   34.638 ±  0.455  ns/op

@yifan-c yifan-c changed the title Improve strings encoding speed and add benchmarks CASSANDRA-15410: Improve strings encoding speed and add benchmarks Nov 18, 2019
@yifan-c yifan-c closed this Nov 20, 2019
blambov pushed a commit to blambov/cassandra that referenced this pull request Jun 13, 2022
blambov pushed a commit to blambov/cassandra that referenced this pull request Nov 24, 2022
adelapena pushed a commit to adelapena/cassandra that referenced this pull request Sep 26, 2023
(cherry picked from commit 40e0941)
(cherry picked from commit 2815146)
(cherry picked from commit af7af2c)
(cherry picked from commit 0b9dad4)
ekaterinadimitrova2 pushed a commit to ekaterinadimitrova2/cassandra that referenced this pull request Jun 3, 2024
(cherry picked from commit 40e0941)
(cherry picked from commit 2815146)
(cherry picked from commit af7af2c)
(cherry picked from commit 0b9dad4)
(cherry picked from commit f842e4b)
michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request Jan 7, 2026
(cherry picked from commit 40e0941)
(cherry picked from commit 2815146)
(cherry picked from commit af7af2c)
(cherry picked from commit 0b9dad4)
(cherry picked from commit f842e4b)