KAFKA-8334 Make sure the thread which tries to complete delayed reque… #8657

Merged: 17 commits, merged on Sep 9, 2020

chia7712 (Contributor) commented on May 12, 2020

This PR makes two main changes:

  1. Replace tryLock with lock in DelayedOperation#maybeTryComplete.
  2. Complete delayed requests without holding the group lock.
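The two changes above can be illustrated with a minimal Scala sketch. It is hypothetical and simplified, not Kafka's actual DelayedOperation or group coordinator code: SketchDelayedOperation, SketchGroup and SketchJoinOp are invented names, and the real classes also carry purgatory, timeout and watcher-key machinery that is omitted here. The sketch shows maybeTryComplete acquiring the operation lock with a blocking lock() instead of tryLock(), and the group releasing its own lock before it tries to complete the operations watching it.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-in for a delayed operation (not Kafka's real class).
abstract class SketchDelayedOperation {
  protected val lock = new ReentrantLock()
  private val completed = new AtomicBoolean(false)

  def isCompleted: Boolean = completed.get()

  // Checks the completion criteria and calls forceComplete() when they are met.
  def tryComplete(): Boolean

  def onComplete(): Unit

  // Only the first caller to flip the flag runs onComplete().
  def forceComplete(): Boolean =
    if (completed.compareAndSet(false, true)) { onComplete(); true }
    else false

  // Change 1 (sketch): block on lock() instead of tryLock(), so a contended
  // completion check waits for the lock rather than being silently skipped.
  def maybeTryComplete(): Boolean = {
    lock.lock()
    try tryComplete()
    finally lock.unlock()
  }
}

// Hypothetical consumer group guarded by its own lock.
final class SketchGroup {
  private val groupLock = new ReentrantLock()
  private val members = new AtomicInteger(0)
  private val watchers = ListBuffer.empty[SketchDelayedOperation]

  def memberCount: Int = members.get()

  def watch(op: SketchDelayedOperation): Unit = {
    groupLock.lock()
    try { watchers += op; () }
    finally groupLock.unlock()
  }

  def addMember(): Unit = {
    // Mutate group state and snapshot the watchers while holding the group lock...
    val toCheck = {
      groupLock.lock()
      try {
        members.incrementAndGet()
        watchers.toList
      } finally groupLock.unlock()
    }
    // Change 2 (sketch): ...then try to complete delayed operations only after
    // releasing it, so the blocking lock() is never taken under the group lock.
    toCheck.foreach(_.maybeTryComplete())
  }
}

// Hypothetical delayed operation that completes once enough members have joined.
final class SketchJoinOp(group: SketchGroup, expected: Int) extends SketchDelayedOperation {
  @volatile var fired = false
  override def tryComplete(): Boolean =
    if (group.memberCount >= expected) forceComplete() else false
  override def onComplete(): Unit = { fired = true }
}

object LockSketchDemo {
  def main(args: Array[String]): Unit = {
    val group = new SketchGroup
    val op = new SketchJoinOp(group, expected = 2)
    group.watch(op)
    group.addMember()
    println(op.isCompleted) // false
    group.addMember()
    println(op.isCompleted) // true
  }
}
```

In this sketch the two changes go together: once maybeTryComplete blocks, calling it while still holding the group lock could form a lock-ordering cycle with a thread that holds an operation lock and waits for the group lock. Deferring completion until the group lock is released is one way to avoid that.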

BEFORE (benchmark results without this patch)

test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.num_producers=3.acks=1
status: PASS
run time: 56.718 seconds
{"records_per_sec": 621619.67445, "mb_per_sec": 59.28}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 16.067 seconds
{"records_per_sec": 1565190.1706, "mb_per_sec": 149.2682}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 1 minute 2.486 seconds
{"records_per_sec": 3165558.7211, "mb_per_sec": 301.8912}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 19.929 seconds
{"records_per_sec": 1350621.2858, "mb_per_sec": 128.8053}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 1 minute 3.014 seconds
{"records_per_sec": 3653635.3672, "mb_per_sec": 348.4378}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 58.852 seconds
{"records_per_sec": 3252032.5203, "mb_per_sec": 310.138}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 59.315 seconds
{"records_per_sec": 3825554.7054, "mb_per_sec": 364.8333}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none
status: PASS
run time: 41.012 seconds
{"latency_99th_ms": 6.0, "latency_50th_ms": 0.0, "latency_999th_ms": 16.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy
status: PASS
run time: 44.975 seconds
{"latency_99th_ms": 5.0, "latency_50th_ms": 0.0, "latency_999th_ms": 19.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none
status: PASS
run time: 49.868 seconds
{"latency_99th_ms": 5.0, "latency_50th_ms": 0.0, "latency_999th_ms": 15.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy
status: PASS
run time: 48.454 seconds
{"latency_99th_ms": 5.0, "latency_50th_ms": 0.0, "latency_999th_ms": 19.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 9.145 seconds
{"consumer": {"records_per_sec": 610426.0774, "mb_per_sec": 58.2148}, "producer": {"records_per_sec": 620385.880017, "mb_per_sec": 59.16}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 1 minute 2.140 seconds
{"consumer": {"records_per_sec": 1465845.793, "mb_per_sec": 139.7939}, "producer": {"records_per_sec": 1416831.963729, "mb_per_sec": 135.12}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 10.968 seconds
{"consumer": {"records_per_sec": 599089.3841, "mb_per_sec": 57.1336}, "producer": {"records_per_sec": 626370.184779, "mb_per_sec": 59.74}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 58.237 seconds
{"consumer": {"records_per_sec": 1298532.6581, "mb_per_sec": 123.8377}, "producer": {"records_per_sec": 1315443.304394, "mb_per_sec": 125.45}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 1 minute 0.201 seconds
{"consumer": {"records_per_sec": 997705.2779, "mb_per_sec": 95.1486}, "producer": {"records_per_sec": 957212.596918, "mb_per_sec": 91.29}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 56.187 seconds
{"consumer": {"records_per_sec": 1313025.2101, "mb_per_sec": 125.2198}, "producer": {"records_per_sec": 1363512.407963, "mb_per_sec": 130.03}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 57.195 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 11.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 57.311 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 8.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 57.756 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 11.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 57.291 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 8.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 48.981 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 15.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 51.503 seconds
{"latency_99th_ms": 3.0, "latency_50th_ms": 0.0, "latency_999th_ms": 9.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 8.161 seconds
{"0": {"records_per_sec": 698421.567258, "mb_per_sec": 66.61}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 56.530 seconds
{"0": {"records_per_sec": 1639881.928501, "mb_per_sec": 156.39}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 4.389 seconds
{"0": {"records_per_sec": 720097.933319, "mb_per_sec": 68.67}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 59.589 seconds
{"0": {"records_per_sec": 1621271.076524, "mb_per_sec": 154.62}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 56.165 seconds
{"0": {"records_per_sec": 1152737.752161, "mb_per_sec": 109.93}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 54.846 seconds
{"0": {"records_per_sec": 1646903.820817, "mb_per_sec": 157.06}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 59.692 seconds
{"records_per_sec": 1794354.545455, "mb_per_sec": 17.11}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 58.774 seconds
{"records_per_sec": 1973499.779444, "mb_per_sec": 18.82}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 57.450 seconds
{"records_per_sec": 325613.051917, "mb_per_sec": 31.05}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 54.134 seconds
{"records_per_sec": 734232.49453, "mb_per_sec": 70.02}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 54.106 seconds
{"records_per_sec": 41259.452813, "mb_per_sec": 39.35}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 52.577 seconds
{"records_per_sec": 51681.555641, "mb_per_sec": 49.29}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 57.747 seconds
{"records_per_sec": 4320.991629, "mb_per_sec": 41.21}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 55.024 seconds
{"records_per_sec": 4223.096287, "mb_per_sec": 40.27}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 53.355 seconds
{"records_per_sec": 817.794028, "mb_per_sec": 77.99}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 53.097 seconds
{"records_per_sec": 797.859691, "mb_per_sec": 76.09}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 57.602 seconds
{"records_per_sec": 1779132.025451, "mb_per_sec": 16.97}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 59.207 seconds
{"records_per_sec": 1935367.267484, "mb_per_sec": 18.46}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 58.127 seconds
{"records_per_sec": 330911.489152, "mb_per_sec": 31.56}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 50.805 seconds
{"records_per_sec": 615677.522936, "mb_per_sec": 58.72}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 51.221 seconds
{"records_per_sec": 40378.158845, "mb_per_sec": 38.51}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 50.578 seconds
{"records_per_sec": 51901.392111, "mb_per_sec": 49.5}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 53.369 seconds
{"records_per_sec": 4363.13394, "mb_per_sec": 41.61}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 53.982 seconds
{"records_per_sec": 4323.775773, "mb_per_sec": 41.23}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 54.736 seconds
{"records_per_sec": 810.386473, "mb_per_sec": 77.28}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 54.867 seconds
{"records_per_sec": 795.023697, "mb_per_sec": 75.82}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-one.acks=1
status: PASS
run time: 48.440 seconds
{"records_per_sec": 701608.468374, "mb_per_sec": 66.91}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.acks=-1
status: PASS
run time: 55.268 seconds
{"records_per_sec": 268274.435339, "mb_per_sec": 25.58}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.acks=1
status: PASS
run time: 50.207 seconds
{"records_per_sec": 467657.491289, "mb_per_sec": 44.6}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=10
status: PASS
run time: 55.395 seconds
{"records_per_sec": 2038543.742406, "mb_per_sec": 19.44}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=100
status: PASS
run time: 52.835 seconds
{"records_per_sec": 479520.185781, "mb_per_sec": 45.73}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=1000
status: PASS
run time: 47.566 seconds
{"records_per_sec": 50609.728507, "mb_per_sec": 48.27}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=10000
status: PASS
run time: 49.949 seconds
{"records_per_sec": 5941.124391, "mb_per_sec": 56.66}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=100000
status: PASS
run time: 48.718 seconds
{"records_per_sec": 1698.734177, "mb_per_sec": 162.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=10
status: PASS
run time: 55.253 seconds
{"records_per_sec": 1946594.923858, "mb_per_sec": 18.56}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=100
status: PASS
run time: 50.712 seconds
{"records_per_sec": 986894.852941, "mb_per_sec": 94.12}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=1000
status: PASS
run time: 58.378 seconds
{"records_per_sec": 112787.394958, "mb_per_sec": 107.56}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=10000
status: PASS
run time: 50.972 seconds
{"records_per_sec": 5747.751606, "mb_per_sec": 54.81}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=100000
status: PASS
run time: 47.419 seconds
{"records_per_sec": 1580.683157, "mb_per_sec": 150.75}
--------------------------------------------------------------------------------

AFTER (benchmark results with this patch)

test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.num_producers=3.acks=1
status: PASS
run time: 56.262 seconds
{"records_per_sec": 625731.99396, "mb_per_sec": 59.68}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 15.345 seconds
{"records_per_sec": 1458151.0645, "mb_per_sec": 139.0601}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 1 minute 5.284 seconds
{"records_per_sec": 3173595.6839, "mb_per_sec": 302.6577}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 16.265 seconds
{"records_per_sec": 1477759.7163, "mb_per_sec": 140.9301}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 1 minute 4.992 seconds
{"records_per_sec": 2992220.2274, "mb_per_sec": 285.3604}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 1 minute 2.562 seconds
{"records_per_sec": 3987240.8293, "mb_per_sec": 380.2529}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 1 minute 1.068 seconds
{"records_per_sec": 3531073.4463, "mb_per_sec": 336.7494}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none
status: PASS
run time: 43.213 seconds
{"latency_99th_ms": 6.0, "latency_50th_ms": 0.0, "latency_999th_ms": 19.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy
status: PASS
run time: 44.302 seconds
{"latency_99th_ms": 6.0, "latency_50th_ms": 0.0, "latency_999th_ms": 17.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none
status: PASS
run time: 52.117 seconds
{"latency_99th_ms": 6.0, "latency_50th_ms": 0.0, "latency_999th_ms": 17.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy
status: PASS
run time: 48.599 seconds
{"latency_99th_ms": 6.0, "latency_50th_ms": 0.0, "latency_999th_ms": 15.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 6.347 seconds
{"consumer": {"records_per_sec": 610165.3548, "mb_per_sec": 58.1899}, "producer": {"records_per_sec": 645161.290323, "mb_per_sec": 61.53}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 58.196 seconds
{"consumer": {"records_per_sec": 1365001.365, "mb_per_sec": 130.1767}, "producer": {"records_per_sec": 1315270.288044, "mb_per_sec": 125.43}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 8.056 seconds
{"consumer": {"records_per_sec": 635364.3815, "mb_per_sec": 60.5931}, "producer": {"records_per_sec": 645369.474024, "mb_per_sec": 61.55}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 55.585 seconds
{"consumer": {"records_per_sec": 1396453.0094, "mb_per_sec": 133.1761}, "producer": {"records_per_sec": 1345170.836696, "mb_per_sec": 128.29}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 57.844 seconds
{"consumer": {"records_per_sec": 995024.8756, "mb_per_sec": 94.893}, "producer": {"records_per_sec": 934928.9454, "mb_per_sec": 89.16}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 57.728 seconds
{"consumer": {"records_per_sec": 1442793.2477, "mb_per_sec": 137.5955}, "producer": {"records_per_sec": 1343002.954607, "mb_per_sec": 128.08}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 59.918 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 10.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 58.414 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 13.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 58.689 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 10.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 57.322 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 11.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 53.221 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 12.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 53.012 seconds
{"latency_99th_ms": 4.0, "latency_50th_ms": 0.0, "latency_999th_ms": 13.0}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 9.797 seconds
{"0": {"records_per_sec": 712352.186921, "mb_per_sec": 67.94}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.2.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 58.567 seconds
{"0": {"records_per_sec": 1586294.416244, "mb_per_sec": 151.28}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=none
status: PASS
run time: 1 minute 3.881 seconds
{"0": {"records_per_sec": 730513.551026, "mb_per_sec": 69.67}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.interbroker_security_protocol=PLAINTEXT.tls_version=TLSv1.3.security_protocol=SSL.compression_type=snappy
status: PASS
run time: 58.038 seconds
{"0": {"records_per_sec": 1624959.376016, "mb_per_sec": 154.97}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=none
status: PASS
run time: 54.064 seconds
{"0": {"records_per_sec": 1184834.123223, "mb_per_sec": 112.99}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=snappy
status: PASS
run time: 53.514 seconds
{"0": {"records_per_sec": 1647175.094713, "mb_per_sec": 157.09}}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 1 minute 0.244 seconds
{"records_per_sec": 1814733.910222, "mb_per_sec": 17.31}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 59.460 seconds
{"records_per_sec": 2014373.705538, "mb_per_sec": 19.21}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 58.051 seconds
{"records_per_sec": 328240.890193, "mb_per_sec": 31.3}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 52.228 seconds
{"records_per_sec": 747730.91922, "mb_per_sec": 71.31}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 53.001 seconds
{"records_per_sec": 40475.572979, "mb_per_sec": 38.6}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 54.782 seconds
{"records_per_sec": 70752.24038, "mb_per_sec": 67.47}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 57.875 seconds
{"records_per_sec": 4461.768617, "mb_per_sec": 42.55}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 57.089 seconds
{"records_per_sec": 4282.386726, "mb_per_sec": 40.84}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 54.228 seconds
{"records_per_sec": 824.324324, "mb_per_sec": 78.61}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.2.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 51.462 seconds
{"records_per_sec": 809.897405, "mb_per_sec": 77.24}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 59.836 seconds
{"records_per_sec": 1812773.095624, "mb_per_sec": 17.29}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 56.087 seconds
{"records_per_sec": 2029604.113111, "mb_per_sec": 19.36}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 55.743 seconds
{"records_per_sec": 348707.976098, "mb_per_sec": 33.26}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 52.794 seconds
{"records_per_sec": 974003.628447, "mb_per_sec": 92.89}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 55.364 seconds
{"records_per_sec": 41284.835435, "mb_per_sec": 39.37}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=1000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 51.486 seconds
{"records_per_sec": 57827.229642, "mb_per_sec": 55.15}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 55.384 seconds
{"records_per_sec": 4502.180476, "mb_per_sec": 42.94}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=10000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 54.028 seconds
{"records_per_sec": 4217.787555, "mb_per_sec": 40.22}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=none
status: PASS
run time: 56.079 seconds
{"records_per_sec": 839.79975, "mb_per_sec": 80.09}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.tls_version=TLSv1.3.message_size=100000.topic=topic-replication-factor-three.security_protocol=SSL.acks=1.compression_type=snappy
status: PASS
run time: 54.970 seconds
{"records_per_sec": 826.35468, "mb_per_sec": 78.81}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-one.acks=1
status: PASS
run time: 46.430 seconds
{"records_per_sec": 746068.371317, "mb_per_sec": 71.15}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.acks=-1
status: PASS
run time: 53.541 seconds
{"records_per_sec": 318277.685558, "mb_per_sec": 30.35}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.acks=1
status: PASS
run time: 49.139 seconds
{"records_per_sec": 487355.482934, "mb_per_sec": 46.48}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=10
status: PASS
run time: 53.150 seconds
{"records_per_sec": 2153686.136072, "mb_per_sec": 20.54}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=100
status: PASS
run time: 51.156 seconds
{"records_per_sec": 455438.411944, "mb_per_sec": 43.43}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=1000
status: PASS
run time: 51.568 seconds
{"records_per_sec": 52820.543093, "mb_per_sec": 50.37}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=10000
status: PASS
run time: 46.992 seconds
{"records_per_sec": 6253.960857, "mb_per_sec": 59.64}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=none.acks=1.message_size=100000
status: PASS
run time: 46.280 seconds
{"records_per_sec": 1669.154229, "mb_per_sec": 159.18}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=10
status: PASS
run time: 54.138 seconds
{"records_per_sec": 1951122.546882, "mb_per_sec": 18.61}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=100
status: PASS
run time: 47.680 seconds
{"records_per_sec": 1021443.683409, "mb_per_sec": 97.41}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=1000
status: PASS
run time: 47.886 seconds
{"records_per_sec": 116104.67128, "mb_per_sec": 110.73}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=10000
status: PASS
run time: 54.126 seconds
{"records_per_sec": 5550.454921, "mb_per_sec": 52.93}
--------------------------------------------------------------------------------
test_id: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=snappy.acks=1.message_size=100000
status: PASS
run time: 46.799 seconds
{"records_per_sec": 1582.54717, "mb_per_sec": 150.92}
--------------------------------------------------------------------------------


Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712
Contributor Author

@rajinisivaram @junrao @windkit please take a look :)

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the PR. Sorry for the delay. Made a pass of non-testing files. Overall, I felt that this approach works. It adds its own complexity, but it's probably better than adding a separate thread pool. A few comments below.

core/src/main/scala/kafka/server/ReplicaManager.scala (outdated, resolved)
core/src/main/scala/kafka/server/ReplicaManager.scala (outdated, resolved)
core/src/main/scala/kafka/server/DelayedOperation.scala (outdated, resolved)
core/src/main/scala/kafka/cluster/Partition.scala (outdated, resolved)
@chia7712
Contributor Author

chia7712 commented Jun 2, 2020

@junrao Could you take a look?

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. Made a pass of all files. A few more comments below.

core/src/main/scala/kafka/cluster/Partition.scala (outdated, resolved)
core/src/main/scala/kafka/server/ReplicaManager.scala (outdated, resolved)
core/src/main/scala/kafka/server/DelayedOperation.scala (outdated, resolved)
core/src/main/scala/kafka/cluster/Partition.scala (outdated, resolved)
offsetsPartitions.map(_.partition).toSet, isCommit = isCommit)
catch {
case e: IllegalStateException if isCommit
&& e.getMessage.contains("though the offset commit record itself hasn't been appended to the log")=>
Contributor

Hmm, why do we need this logic now?

Contributor Author

TestReplicaManager#appendRecords (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/coordinator/AbstractCoordinatorConcurrencyTest.scala#L207) always completes the delayedProduce immediately, so the txn offset is appended as well. This PR completes the delayedProduce only after the group lock has been released, so the following execution order becomes possible:

  1. txn prepare
  2. txn completion (fail)
  3. txn append (this is executed by delayedProduce)
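To make that ordering concrete, here is a minimal Scala sketch. `PendingTxnOffsetCommit` is a hypothetical, heavily simplified stand-in for the pending-commit bookkeeping in `GroupMetadata` (it is not Kafka's code); it only models the rule that completing a commit whose record has not been appended yet throws an `IllegalStateException`.

```scala
// Hypothetical, simplified model of a pending transactional offset commit.
// It is NOT Kafka's GroupMetadata; it only illustrates the ordering problem.
final class PendingTxnOffsetCommit {
  // Stays None until the offset commit record is appended (the putCacheCallback step).
  private var appendedBatchOffset: Option[Long] = None
  private var prepared = false

  // Step 1: txn prepare / initialization.
  def prepare(): Unit = prepared = true

  // The "txn append" step, executed by the delayed produce.
  def append(offset: Long): Unit = {
    require(prepared, "append without prepare")
    appendedBatchOffset = Some(offset)
  }

  // The "txn completion" step; fails if the append has not happened yet.
  def complete(): Long = appendedBatchOffset match {
    case Some(offset) => offset
    case None => throw new IllegalStateException(
      "Trying to complete a transactional offset commit even though " +
        "the offset commit record itself hasn't been appended to the log.")
  }
}

// Order the old test always observed: prepare -> append -> complete.
val ok = new PendingTxnOffsetCommit
ok.prepare()
ok.append(42L)
val completedOffset = ok.complete()

// Order that becomes possible once the append runs after the lock is released.
val racy = new PendingTxnOffsetCommit
racy.prepare()
val failure = scala.util.Try(racy.complete()) // completion before append
racy.append(42L)                              // append arrives too late
```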

Contributor

@junrao junrao Jun 16, 2020

Thanks. I am still not sure that I fully understand this. It seems that by not completing the delayedProduce within the group lock, we are hitting IllegalStateException. That seems a bug. Do you know which code depends on that? It seems that we do hold a group lock when updating the txnOffset.

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L462

Contributor Author

That seems a bug.

The root cause (introduced by this PR) is that "txn initialization" and "txn append" are no longer executed under the same lock.

The test story is shown below.

CommitTxnOffsetsOperation calls GroupMetadata.prepareTxnOffsetCommit to add CommitRecordMetadataAndOffset(None, offsetAndMetadata) to pendingTransactionalOffsetCommits (this is the link you attached).

GroupMetadata.completePendingTxnOffsetCommit called by CompleteTxnOperation throws IllegalStateException if CommitRecordMetadataAndOffset.appendedBatchOffset is None (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadata.scala#L664).

Why didn't it cause an error before?

CommitRecordMetadataAndOffset.appendedBatchOffset is updated by the callback putCacheCallback (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L407). TestReplicaManager always creates a delayedProduce to handle the putCacheCallback (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/coordinator/AbstractCoordinatorConcurrencyTest.scala#L188). The condition for completing the delayedProduce is completeAttempts.incrementAndGet() >= 3, and it becomes true once both producePurgatory.tryCompleteElseWatch(delayedProduce, producerRequestKeys) and tryCompleteDelayedRequests() have been called, since the former calls tryComplete twice and the latter calls it once. This means putCacheCallback is always executed inside TestReplicaManager.appendRecords, and TestReplicaManager.appendRecords is executed while the group lock is held (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L738).

In short, txn initialization (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L464) and txn append (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L407) are executed under the same group lock. Hence, the following execution order is impossible:

  1. txn initialization
  2. txn completion
  3. txn append
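The completion counting described above can be modeled as follows. This is a hypothetical sketch that assumes only what the comment states: the test's delayed produce completes on its third `tryComplete` attempt, `tryCompleteElseWatch` attempts twice, and `tryCompleteDelayedRequests` attempts once. The class and method bodies are illustrative stand-ins, not the real purgatory.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the test's delayed produce: it completes on the
// third tryComplete() call, mirroring `completeAttempts.incrementAndGet() >= 3`.
final class CountingDelayedProduce(onComplete: () => Unit) {
  private val completeAttempts = new AtomicInteger(0)
  private var completed = false

  def tryComplete(): Boolean = {
    if (!completed && completeAttempts.incrementAndGet() >= 3) {
      completed = true
      onComplete() // plays the role of putCacheCallback
    }
    completed
  }

  def isCompleted: Boolean = completed
}

// Simplified model (assumption): one attempt before watching, one after.
def tryCompleteElseWatch(produce: CountingDelayedProduce): Boolean =
  produce.tryComplete() || produce.tryComplete()

// Simplified model (assumption): one more attempt.
def tryCompleteDelayedRequests(produce: CountingDelayedProduce): Unit = {
  produce.tryComplete()
  ()
}

var callbackRan = false
val produce = new CountingDelayedProduce(() => { callbackRan = true })

// Both calls happen inside a single appendRecords call, i.e. under the group lock.
val completedByWatch = tryCompleteElseWatch(produce) // only 2 attempts so far
tryCompleteDelayedRequests(produce)                  // 3rd attempt completes it
```

Because the third attempt happens before appendRecords returns, the callback always ran while the caller still held the group lock.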

However, this PR no longer completes delayed requests while the caller holds the group lock. The putCacheCallback, which appends the txn offsets, therefore has to acquire the group lock again.
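A sketch of the general pattern under discussion, assuming a hypothetical `DeferredActions` queue (not claimed to be the class or API this PR adds): work that would complete delayed operations is queued while the group lock is held and run only after the lock is released, so the callback must take the group lock itself.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.locks.ReentrantLock

// Hypothetical helper: actions are queued while a lock is held and executed
// by the caller after the lock has been released.
final class DeferredActions {
  private val queue = new ConcurrentLinkedQueue[() => Unit]()

  def add(action: () => Unit): Unit = { queue.add(action); () }

  def runAll(): Unit = {
    var action = queue.poll()
    while (action != null) {
      action()
      action = queue.poll()
    }
  }
}

val groupLock = new ReentrantLock()
val deferred = new DeferredActions
var holdCountInCallback = -1

// Hypothetical handler: appends under the group lock but defers the
// putCacheCallback-like completion instead of running it inline.
def handleTxnCommit(): Unit = {
  groupLock.lock()
  try {
    // ... append the offset commit record while holding the group lock ...
    deferred.add(() => {
      // The callback re-acquires the group lock on its own.
      groupLock.lock()
      try holdCountInCallback = groupLock.getHoldCount
      finally groupLock.unlock()
    })
  } finally groupLock.unlock()
  // Deferred completions run only after the caller released the lock.
  deferred.runAll()
}

handleTxnCommit()
```

A hold count of 1 inside the callback shows that only the callback's own acquisition is active, i.e. the caller no longer holds the lock when the delayed work completes.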

Contributor

Thanks for the great explanation. I understand the issue now. Essentially, this exposed a limitation of the existing test. The existing test happens to work because the producer callbacks are always completed in the same ReplicaManager.appendRecords() call under the group lock. However, this is not necessarily the general case.

Your fix works, but may hide other real problems. I was thinking that another way to fix this is to change the test a bit. For example, we expect CompleteTxnOperation to happen after CommitTxnOffsetsOperation. So, instead of letting them run in parallel, we can change the test to make sure that CompleteTxnOperation only runs after CommitTxnOffsetsOperation completes successfully. JoinGroupOperation and SyncGroupOperation might need a similar consideration.
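The suggested test change could look roughly like this. It is a sketch with hypothetical names, using a `java.util.concurrent.CountDownLatch` so that the stand-in for `CompleteTxnOperation` can only proceed after the stand-in for `CommitTxnOffsetsOperation` has finished.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors, TimeUnit}

// Hypothetical sketch: the "complete txn" task waits for the "commit txn
// offsets" task to signal success instead of racing with it.
val commitDone = new CountDownLatch(1)
val order = new ConcurrentLinkedQueue[String]()
val pool = Executors.newFixedThreadPool(2)

// Submitted first on purpose: without the latch it could overtake the commit.
val completeTxn = pool.submit(new Runnable {
  def run(): Unit = {
    commitDone.await()        // block until the commit has completed
    order.add("completeTxn")  // stands in for CompleteTxnOperation
    ()
  }
})

val commitOffsets = pool.submit(new Runnable {
  def run(): Unit = {
    order.add("commitTxnOffsets") // stands in for CommitTxnOffsetsOperation
    commitDone.countDown()        // allow the completion to proceed
  }
})

commitOffsets.get(10, TimeUnit.SECONDS)
completeTxn.get(10, TimeUnit.SECONDS)
pool.shutdown()
```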

Contributor Author

we expect CompleteTxnOperation to happen after CommitTxnOffsetsOperation. So, instead of letting them run in parallel, we can change the test to make sure that CompleteTxnOperation only runs after CommitTxnOffsetsOperation completes successfully.

Roger that!

JoinGroupOperation and SyncGroupOperation might need a similar consideration.

I didn't notice anything interesting there. Could you elaborate?

@@ -536,6 +537,11 @@ class GroupCoordinatorTest {
// Make sure the NewMemberTimeout is not still in effect, and the member is not kicked
assertEquals(1, group.size)

// prepare the mock replica manager again since the delayed join is going to complete
EasyMock.reset(replicaManager)
EasyMock.expect(replicaManager.getMagic(EasyMock.anyObject())).andReturn(Some(RecordBatch.MAGIC_VALUE_V1)).anyTimes()
Contributor

Hmm, why do we need to mock this since replicaManager.getMagic() is only called through replicaManager.handleWriteTxnMarkersRequest()?

Contributor Author

GroupMetadataManager#storeGroup (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L245) also calls ReplicaManager.getMagic.

Some delayed ops are completed by timer.advanceClock, so we have to mock replicaManager.getMagic. The mock is the same as https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/coordinator/group/GroupCoordinatorTest.scala#L3823.

@chia7712
Contributor Author

Another way that doesn't require checking lock.isHeldByCurrentThread is the following. But your approach seems simpler.

So... could we keep it simpler?

@chia7712 chia7712 force-pushed the fix_8334_avoid_deadlock branch 2 times, most recently from 4a9cbc9 to d8beeab June 15, 2020 06:37
@chia7712
Contributor Author

kafka.admin.ReassignPartitionsUnitTest > testModifyBrokerThrottles FAILED

This flaky test is tracked by #8853.

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. Just one comment below. Also, there are a few comments not addressed from the previous round.

It will be helpful if you could preserve the commit history in future updates to the PR since that makes it easier to identify the delta changes.

@chia7712
Contributor Author

It will be helpful if you could preserve the commit history in future updates to the PR since that makes it easier to identify the delta changes.

my bad :(

I'll keep that in mind

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. Added a few more comments below.

core/src/main/scala/kafka/server/DelayedOperation.scala (outdated, resolved)
core/src/main/scala/kafka/server/DelayedOperation.scala (outdated, resolved)
core/src/main/scala/kafka/server/ReplicaManager.scala (outdated, resolved)
Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. Just a few more comments below.

core/src/main/scala/kafka/cluster/Partition.scala (outdated, resolved)
core/src/main/scala/kafka/server/ReplicaManager.scala (outdated, resolved)
groupCoordinator.groupManager.addPartitionOwnership(groupPartitionId)
val lock = new ReentrantLock()
val producerId = producerIdCount
producerIdCount += 1
Contributor

I think the intention for the test is probably to use the same producerId since it tests more on transactional conflicts.

Contributor Author

Got it. However, with the same producerId, the group completed by a CompleteTxnOperation can be affected by any CommitTxnOffsetsOperation (since the partitions are also the same). The side effect is that we would need a single lock to enforce the happens-before relation between txn commit and txn completion, which makes the test slower.

@chia7712
Contributor Author

chia7712 commented Sep 7, 2020

@junrao Thanks for all reviews again 👍

Do you plan to remove some of the unused methods in DelayedOperations in Partition?

my bad. I forgot this request :(

Except for checkAndCompleteFetch, the other unused methods (in production scope) are removed by this PR.

Currently, when calling checkAndComplete() for the produce/fetch/deleteRecords purgatory, we still hold replicaStateChangeLock. This doesn't seem to cause any deadlock for now. In the future, we can potentially improve this by calling checkAndComplete() outside of the replicaStateChangeLock by passing leader epoch into those delayed operations and checking if leader epoch has changed in tryComplete().

It seems we can remove delayedOperations from Partition. That follows the same idea as this PR: Partition SHOULD NOT complete delayed requests anymore. I can take this over in a separate PR :)
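A rough sketch of the leader-epoch idea quoted above, with hypothetical `PartitionState` and `EpochAwareDelayedOp` types (not Kafka's `Partition` or `DelayedOperation`): the operation records the leader epoch when it is created, and `tryComplete()` first checks whether the epoch has changed, which is what would let it be called without holding `replicaStateChangeLock`. The sketch is single-threaded; a real implementation would also need an atomic completion flag.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical partition state exposing only the leader epoch.
final class PartitionState(initialEpoch: Int) {
  private val epoch = new AtomicInteger(initialEpoch)
  def leaderEpoch: Int = epoch.get()
  def bumpLeaderEpoch(): Unit = { epoch.incrementAndGet(); () }
}

sealed trait Outcome
case object Satisfied extends Outcome
case object LeaderChanged extends Outcome

// Hypothetical delayed operation that captures the epoch at creation time.
final class EpochAwareDelayedOp(partition: PartitionState, isSatisfied: () => Boolean) {
  private val expectedEpoch = partition.leaderEpoch
  private var result: Option[Outcome] = None
  def outcome: Option[Outcome] = result

  def tryComplete(): Boolean = {
    if (result.isEmpty) {
      // A changed epoch means leadership moved: complete without the normal result.
      if (partition.leaderEpoch != expectedEpoch) result = Some(LeaderChanged)
      else if (isSatisfied()) result = Some(Satisfied)
    }
    result.isDefined
  }
}

val partition = new PartitionState(initialEpoch = 5)
val stale = new EpochAwareDelayedOp(partition, () => false)
partition.bumpLeaderEpoch()              // leadership changes before completion
val staleCompleted = stale.tryComplete() // completes with LeaderChanged

val fresh = new EpochAwareDelayedOp(partition, () => true)
val freshCompleted = fresh.tryComplete() // completes with Satisfied
```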

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. Just a few more minor comments.

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the new update. A few more minor comments.

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the updated PR. A few more minor comments below.

Contributor

@junrao junrao left a comment

@chia7712 : Thanks for the latest changes. LGTM.

Latest system result has 1 failure.
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2020-09-08--001.1599611744--chia7712--fix_8334_avoid_deadlock--fbd46565a/report.html

Also, are the jenkins test failures related to this PR?

@junrao
Contributor

junrao commented Sep 9, 2020

@ijuma @hachikuji @rajinisivaram : I think this PR is ready to be merged. Any further comments from you?

@chia7712
Contributor Author

chia7712 commented Sep 9, 2020

Build / JDK 15 / kafka.network.ConnectionQuotasTest.testNoConnectionLimitsByDefault
Build / JDK 11 / kafka.network.DynamicConnectionQuotaTest.testDynamicListenerConnectionCreationRateQuota
Build / JDK 11 / org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta[true]
Module: kafkatest.tests.connect.connect_distributed_test
Class:  ConnectDistributedTest
Method: test_bounce
Arguments:
{
  "clean": true,
  "connect_protocol": "sessioned"
}

On my local machine, they are also flaky on the trunk branch.

@junrao junrao merged commit c2273ad into apache:trunk Sep 9, 2020
@junrao
Contributor

junrao commented Sep 9, 2020

@chia7712 : Thanks a lot for staying on this tricky issue and finding a simpler solution!

@chia7712
Contributor Author

Thanks a lot for staying on this tricky issue and finding a simpler solution!

Thanks for all the suggestions. I learned a lot from them.

@chia7712 chia7712 deleted the fix_8334_avoid_deadlock branch March 25, 2024 15:22
5 participants