[FLINK-39044][benchmark] Add StringDataBenchmark for Table API string serialization #113

Closed
nateab wants to merge 1 commit into apache:master from nateab:feature/table-api-serialization-benchmarks

nateab commented Feb 7, 2026

What is the purpose of the change

Add a JMH benchmark for StringDataSerializer that measures the performance of the copy, serialize, and deserialize operations on BinaryStringData.

This benchmark was used to validate the optimization in FLINK-39044 (apache/flink#27549), which adds a fast path in BinaryStringData.copy() for compact strings.

Benchmark methods:

  • copyCompactString / copyLargeString - measures serializer.copy() performance
  • serializeCompactString / serializeLargeString - measures serializer.serialize() performance
  • deserializeCompactString / deserializeLargeString - measures serializer.deserialize() performance

Sample results showing FLINK-39044 optimization impact:

| Benchmark | Baseline (ops/ms) | Optimized (ops/ms) | Improvement |
|---|---|---|---|
| copyCompactString | 18,381 ± 22,770 | 807,198 ± 13,825 | +4,292% |
| copyLargeString | 3,551 ± 4,718 | 748,334 ± 6,211 | +20,975% |
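
For readers who want to sanity-check the Improvement column, here is a minimal sketch that assumes the column is the relative change in mean throughput, (optimized - baseline) / baseline. The class and method names are hypothetical and not part of the PR. Fed with the rounded means from the table, it yields roughly +4,291% and +20,974%, within about one percentage point of the reported figures; the small gap is presumably due to rounding of the published means.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes the "Improvement"
// column under the assumption that it is a relative change in throughput.
final class ImprovementCalc {

    /** Relative change of optimized over baseline throughput, in percent. */
    static double percentImprovement(double baselineOpsPerMs, double optimizedOpsPerMs) {
        if (baselineOpsPerMs <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (optimizedOpsPerMs - baselineOpsPerMs) / baselineOpsPerMs * 100.0;
    }

    public static void main(String[] args) {
        // Mean values copied from the table above (error margins ignored).
        System.out.printf(Locale.ROOT, "copyCompactString: +%.1f%%%n",
                percentImprovement(18_381, 807_198));
        System.out.printf(Locale.ROOT, "copyLargeString:   +%.1f%%%n",
                percentImprovement(3_551, 748_334));
    }
}
```

The sketch deliberately ignores the ± error margins; it only reproduces the arithmetic on the means.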

Brief change log

  • Added StringDataBenchmark.java in org.apache.flink.benchmark.full package

Verifying this change

This change adds a benchmark and can be verified as follows:

  • Run the benchmark directly: mvn clean install exec:exec -Dbenchmarks=".*StringDataBenchmark.*"
  • Or run via main method in the benchmark class
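
To illustrate what the `-Dbenchmarks` pattern selects, here is a small sketch that assumes the value is treated as a regular expression over fully qualified benchmark method names, as JMH include patterns are. The `BenchmarkFilter` class and the `SomeOtherBenchmark.run` entry are hypothetical; the package, class, and method names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of regex-based benchmark selection.
// It does not call JMH; it only shows which names the pattern would match.
final class BenchmarkFilter {

    /** Returns the names that the given regular expression matches in full. */
    static List<String> select(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            // With ".*" on both ends, a full match and a substring search agree.
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String prefix = "org.apache.flink.benchmark.full.StringDataBenchmark.";
        List<String> names = List.of(
                prefix + "copyCompactString",
                prefix + "copyLargeString",
                prefix + "serializeCompactString",
                prefix + "serializeLargeString",
                prefix + "deserializeCompactString",
                prefix + "deserializeLargeString",
                // Hypothetical unrelated benchmark that the pattern should skip.
                "org.apache.flink.benchmark.SomeOtherBenchmark.run");
        select(".*StringDataBenchmark.*", names).forEach(System.out::println);
    }
}
```

Under that assumption, a narrower pattern such as `.*StringDataBenchmark\.copy.*` would restrict a run to the two copy benchmarks.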

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

… serialization

Add JMH benchmark for StringDataSerializer operations:
- copyCompactString / copyLargeString
- serializeCompactString / serializeLargeString
- deserializeCompactString / deserializeLargeString

This benchmark measures the performance of BinaryStringData copy,
serialize, and deserialize operations, used to validate FLINK-39044
(BinaryStringData.copy() optimization for compact strings).
nateab closed this Feb 7, 2026