[pip][design] PIP-274: Support pluggable topic compactor #20493

coderzc · 2023-06-05T13:35:13Z

Motivation

This PIP proposes support for a pluggable topic compactor. The motivation is described in the PR contents.

poorbarcode · 2023-06-05T14:19:08Z

pip/pip-274.md

+-->
+
+
+### Configuration


broker.conf

compactorClassName=org.apache.pulsar.compaction.TwoPhaseCompactor
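
To make the setting above concrete, here is a minimal sketch of how a broker could instantiate the configured class by name. `PluggableCompactor`, `NoopCompactor`, and `CompactorLoader` are hypothetical names used only for illustration; they are not types from the PIP or from Pulsar. The sketch also assumes the implementation exposes a no-argument constructor, which the PIP does not state.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the compactor interface discussed in the PIP.
interface PluggableCompactor {
    CompletableFuture<Long> compact(String topic);
}

// Hypothetical implementation that exists only to exercise the loader.
class NoopCompactor implements PluggableCompactor {
    public NoopCompactor() { }

    @Override
    public CompletableFuture<Long> compact(String topic) {
        // Pretend the compaction finished immediately; -1 is a placeholder result.
        return CompletableFuture.completedFuture(-1L);
    }
}

final class CompactorLoader {
    private CompactorLoader() { }

    // Instantiates the class named by compactorClassName via reflection.
    static PluggableCompactor load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!PluggableCompactor.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(className + " is not a PluggableCompactor");
            }
            return (PluggableCompactor) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create compactor " + className, e);
        }
    }
}
```

Rejecting classes that do not implement the interface at load time surfaces a misconfigured `compactorClassName` at startup rather than at the first compaction.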

If I want namespace A to apply policy TwoPhaseCompactor, but namespace B applies policy KafkaCompactor, how can I configure it?

Namespace-level configuration is not supported. This is a global configuration.
By the way, since a topic may contain both Kafka Format and Pulsar Format messages at the same time, KafkaCompactor needs to be compatible with Pulsar Format.

If I have a Pulsar cluster that already has some topics, and we now need to migrate Kafka to Pulsar, how can I do that?

> since a topic may contain both Kafka Format and Pulsar Format messages at the same time, KafkaCompactor needs to be compatible with Pulsar Format.

Then you copied the logic from the existing compactor? :(
IMO, we should support namespace-level or topic-level configuration. Otherwise we will someday have to implement one all-powerful compactor (a single compactor that supports every protocol).
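
The namespace-level or topic-level configuration requested here is not part of the PIP. Purely as an illustration of the idea, the sketch below resolves a compactor class name with a "most specific wins" rule: a topic-level override first, then a namespace-level override, then the broker-wide default. `CompactorClassResolver` and its methods are hypothetical, not existing Pulsar APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver: topic override, then namespace override, then broker default.
final class CompactorClassResolver {
    private final String brokerDefault;
    private final Map<String, String> byNamespace = new HashMap<>();
    private final Map<String, String> byTopic = new HashMap<>();

    CompactorClassResolver(String brokerDefault) {
        this.brokerDefault = brokerDefault;
    }

    void setForNamespace(String namespace, String className) {
        byNamespace.put(namespace, className);
    }

    void setForTopic(String topic, String className) {
        byTopic.put(topic, className);
    }

    // Returns the class name that applies to the given topic in the given namespace.
    String resolve(String namespace, String topic) {
        String name = byTopic.get(topic);
        if (name == null) {
            name = byNamespace.get(namespace);
        }
        return name != null ? name : brokerDefault;
    }
}
```

With such a lookup, namespace A could keep `TwoPhaseCompactor` as the default while namespace B is pointed at a Kafka-aware compactor.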

> If I have a Pulsar cluster that already has some topics, and we now need to migrate Kafka to Pulsar, how can I do that?

KafkaCompactor needs to handle both Kafka Format and Pulsar Format, because KoP can send either Pulsar Format or Kafka Format.

Did you change the interface to support all functions of TwoPhaseCompactor?

> Did you change the interface to support all functions of TwoPhaseCompactor?

Supporting all functions of TwoPhaseCompactor is not required in general; it is only a requirement in the KoP case.

> KafkaCompactor needs to handle both Kafka Format and Pulsar Format, because KoP can send either Pulsar Format or Kafka Format.
> Supporting all functions of TwoPhaseCompactor is not required in general; it is only a requirement in the KoP case.

I have three questions:

1. The format rules of "Pulsar Format" and of existing Pulsar are exactly the same, right?

2. KafkaCompactor can handle both Kafka Format and Pulsar Format, right?

3. If not (see question 2), when users want to migrate Kafka to an existing Pulsar cluster, how can they do it?

> The format rules of "Pulsar Format" and of existing Pulsar are exactly the same, right?

Yes

> KafkaCompactor can handle both Kafka Format and Pulsar Format, right?

Yes

> If not (see question 2), when users want to migrate Kafka to an existing Pulsar cluster, how can they do it?

If KafkaCompactor is not implemented, you may need to use Pulsar Format in KoP to produce messages, and KoP will automatically transform Kafka Format to Pulsar Format.

BewareMyPower · 2023-06-07T07:25:38Z

pip/pip-274.md

+                    BookKeeper bk,
+                    ScheduledExecutorService scheduler);


It's not a good abstraction to have BookKeeper (or even PulsarClient) involved. These parameters are just copied from the existing Compactor's constructor, which is not abstracted well.

If the motivation is to handle messages in a format other than Pulsar's, we might need to pass in some functions like:

interface CompactEntryProcessor {
    // Just an example method
    void handle(Entry entry);
}

and pass the processor to the compactor.

@BewareMyPower Do you mean abstracting a CompactEntryProcessor and adding a compactEntryProcessorClassName configuration?
But we would still need BookKeeper to construct this CompactEntryProcessor.

How about abstracting a CompactStorage instead, like:

interface CompactionHandle {
    CompletableFuture<Void> asyncAddEntry(ByteBuf entry);
    CompletableFuture<List<ByteBuf>> asyncReadEntries(long from, long end);
    CompletableFuture<Long> getId();
    CompletableFuture<Void> flush();
    CompletableFuture<Void> close();
}

public interface CompactStorage {
    CompletableFuture<CompactionHandle> createCompactionStorage();
    void deleteCompactionStorage(CompactionHandle compactionHandle);
}

No, you misunderstood. These two interfaces would be complicated for users to implement. What I mean is that we only need a way to handle entries of different formats, so that users don't need to implement the same entry-reading and entry-writing logic again.

For example, we only need to replace

if (RawBatchConverter.isReadableBatch(m)) {
    // ...
} else {
    // ...
}

with

processor.handle(m, latestForKey);

Where the processor is an interface:

interface CompactMessageProcessor {
    /**
     * Process the message to update the latest key => value map.
     *
     * @param message the raw message
     * @param latestForKey the map that maps the key to the latest value
     */
    void handle(RawMessage message, Map<String, MessageId> latestForKey);
}

Then, we don't need to rewrite the same logic again. We only need to change the logic about how to process a RawMessage.
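
The delegation described above can be sketched as follows. To keep the example self-contained, `SketchMessageId` and `SketchRawMessage` are simplified stand-ins for Pulsar's `MessageId` and `RawMessage`, and `PhaseOneSketch` is a hypothetical name. The real compactor's read loop is considerably more involved, so this only illustrates where a processor would plug in.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Pulsar's MessageId and RawMessage (assumptions for this sketch).
final class SketchMessageId {
    final long ledgerId;
    final long entryId;

    SketchMessageId(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }
}

final class SketchRawMessage {
    final SketchMessageId id;
    final String key;

    SketchRawMessage(SketchMessageId id, String key) {
        this.id = id;
        this.key = key;
    }
}

// Same shape as the CompactMessageProcessor suggested above, on the stand-in types.
interface SketchMessageProcessor {
    void handle(SketchRawMessage message, Map<String, SketchMessageId> latestForKey);
}

final class PhaseOneSketch {
    // One possible processor: a later message with the same key replaces the earlier one.
    static final SketchMessageProcessor KEEP_LAST = (message, latestForKey) -> {
        if (message.key != null) {
            latestForKey.put(message.key, message.id);
        }
    };

    // The shared read loop stays in the compactor; only per-message handling is pluggable.
    static Map<String, SketchMessageId> buildLatestForKey(
            List<SketchRawMessage> messages, SketchMessageProcessor processor) {
        Map<String, SketchMessageId> latestForKey = new LinkedHashMap<>();
        for (SketchRawMessage message : messages) {
            processor.handle(message, latestForKey);
        }
        return latestForKey;
    }
}
```

A Kafka-aware processor would implement the same single method and decode the Kafka batch format inside `handle`, leaving the loop untouched.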

asafm · 2023-06-07T08:54:12Z

Can you please follow https://github.com/apache/pulsar/blob/master/pip/README.md regarding the PR title and get the number of the pip?

asafm · 2023-06-07T10:06:09Z

pip/pip-274.md

+
+    CompletableFuture<Long> compact(String topic);
+
+    CompactorMXBean getStats();


I think CompactorMXBean is a very specific abstraction for metrics. It doesn't allow a compactor to add its own metrics and will make future updates to metrics hard.

First, some metrics are going to be common for any compactor used:

CompactionSucceedCount - how many compactions have succeeded

CompactionFailedCount - how many compactions have failed

CompactionDurationInMillis - a counter of how much time has been spent in compactions so far

Those metrics should be maintained by the service that calls the compactor interface's compact(topic) method. It only needs to know whether the compaction succeeded or not, based on the return value.
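
A minimal sketch of that idea: the calling service wraps whatever `compact(topic)` returns and updates the three common counters itself, so no compactor implementation has to. `CompactionServiceMetricsSketch` is a hypothetical name, and `compactFn` stands in for the pluggable compactor's `compact` method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical service-side wrapper that owns the metrics common to every compactor.
final class CompactionServiceMetricsSketch {
    final AtomicLong succeedCount = new AtomicLong();
    final AtomicLong failedCount = new AtomicLong();
    final AtomicLong durationMillis = new AtomicLong();

    // compactFn stands in for the pluggable compactor's compact(topic) method.
    CompletableFuture<Long> compact(String topic, Function<String, CompletableFuture<Long>> compactFn) {
        long startNanos = System.nanoTime();
        CompletableFuture<Long> result;
        try {
            result = compactFn.apply(topic);
        } catch (RuntimeException e) {
            // Treat a synchronous throw the same way as an asynchronously failed compaction.
            result = new CompletableFuture<>();
            result.completeExceptionally(e);
        }
        return result.whenComplete((ledgerId, error) -> {
            durationMillis.addAndGet((System.nanoTime() - startNanos) / 1_000_000);
            if (error == null) {
                succeedCount.incrementAndGet();
            } else {
                failedCount.incrementAndGet();
            }
        });
    }
}
```

Because success or failure is read from the returned future, the compactor interface needs no metrics method for these three values.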

I suggest using two interfaces which will be given in initialize():

interface CompactionMetricsRecorder {
    TopicCompactionMetricsRecorder provideTopicCompactionMetricsRecorder(String topic)
}

interface TopicCompactionMetricsRecorder {
    recordMessageRemoved()
    recordMessagesWritten(numOfMessages, messagesSizeInBytes, duration, durationUnit)
    recordMessagesRead(numOfMessages, messagesSizeInBytes, duration, durationUnit)
    ...
}
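
For illustration, here is one way the two recorder interfaces could look as compilable Java, together with a simple in-memory implementation. The parameter types (`long`, `TimeUnit`), the `void` return types, and the `InMemory...` classes are assumptions, since the suggestion above leaves them unspecified. Only a subset of the methods is shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Typed version of the suggested per-topic recorder (types are assumptions).
interface TopicCompactionMetricsRecorder {
    void recordMessageRemoved();

    void recordMessagesWritten(long numOfMessages, long messagesSizeInBytes,
                               long duration, TimeUnit durationUnit);
}

interface CompactionMetricsRecorder {
    TopicCompactionMetricsRecorder provideTopicCompactionMetricsRecorder(String topic);
}

// Hypothetical in-memory recorder; a real one would feed the broker's metrics system.
final class InMemoryTopicMetrics implements TopicCompactionMetricsRecorder {
    final LongAdder removed = new LongAdder();
    final LongAdder written = new LongAdder();
    final LongAdder writtenBytes = new LongAdder();
    final LongAdder writeNanos = new LongAdder();

    @Override
    public void recordMessageRemoved() {
        removed.increment();
    }

    @Override
    public void recordMessagesWritten(long numOfMessages, long messagesSizeInBytes,
                                      long duration, TimeUnit durationUnit) {
        written.add(numOfMessages);
        writtenBytes.add(messagesSizeInBytes);
        writeNanos.add(durationUnit.toNanos(duration));
    }
}

final class InMemoryCompactionMetrics implements CompactionMetricsRecorder {
    private final Map<String, InMemoryTopicMetrics> perTopic = new ConcurrentHashMap<>();

    @Override
    public InMemoryTopicMetrics provideTopicCompactionMetricsRecorder(String topic) {
        // One recorder per topic, created on first use.
        return perTopic.computeIfAbsent(topic, t -> new InMemoryTopicMetrics());
    }
}
```

A compactor would receive the `CompactionMetricsRecorder` in `initialize()` and ask it for a per-topic recorder at the start of each compaction, without knowing how the values are exported.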

There are some metrics for which I still need to figure out where they fit, since I need help understanding something. The metrics are:

compactedEntriesCount - how many entries were written in the last compaction

compactedEntriesSize - what's the total size of entries written in the last compaction

Once compact() has ended, how does the compactor hand over the details of the output, such as the ledger ID, the new compaction horizon, etc.? I don't see that in the interface.
The reason I'm asking is that maybe other compactors would do incremental compaction, so it changes what to expect from it.

Now, what's left is a way to record custom metrics specific to your compactor.
Need to think about it

It seems that using CompactionMetricsRecorder instead of CompactorMXBean still does not solve the problem that other compactors can't add their own metrics. Can you explain the benefits of doing this?

As for CompactionSucceedCount/CompactionFailedCount/CompactionDurationInMillis, I think they are already contained in AbstractCompactor; other compactors only need to extend it.

Extending is the worst option. We should stick to interfaces only.

The interface creates a clean design that makes it easier to change in the future to some other mechanism (say OTel...). Somebody external to Pulsar needs to be able to depend as little as possible on Pulsar. CompactorMXBean is the opposite of that. First, nobody is using MXBean, and it has a different meaning (it was named like this to be integrated into JMX back in 2012). Second, we need to separate the metrics as I wrote.

Today, high-cardinality metrics (those that contain a topic label) are added by writing them directly to SimpleTextOutputStream. You usually add a method for that and call it from PrometheusMetricsGenerator.

Thinking out loud - maybe we can add a method to CompactionMetricsRecorder

void writeCustomMetrics(SimpleTextOutputStream)

WDYT?

asafm · 2023-06-07T10:13:05Z

pip/pip-274.md

+
+## Public-facing Changes
+
+<!--


I think it's ok to remove all comments if you're done writing

asafm · 2023-06-07T10:14:39Z

pip/pip-274.md

+-->
+
+
+### Configuration


Did you change the interface to support all functions of TwoPhaseCompactor?

asafm · 2023-06-07T13:56:07Z

@coderzc Have you seen you are colliding with another PIP?

coderzc · 2023-06-21T02:15:03Z

Closing this PIP; I opened PIP-278 to discuss related topics.

PIP-274: Support pluggable compactor

00e2141

github-actions bot added the doc-not-needed label Jun 5, 2023

coderzc changed the title ~~[feat][doc] PIP-274: Support pluggable compactor~~ [feat][doc] PIP-274: Support pluggable topic compactor Jun 5, 2023

Add discussion thread

b7a57fb

poorbarcode reviewed Jun 5, 2023


BewareMyPower reviewed Jun 7, 2023


coderzc changed the title ~~[feat][doc] PIP-274: Support pluggable topic compactor~~ [pip][design] PIP-274: Support pluggable topic compactor Jun 7, 2023

asafm reviewed Jun 7, 2023


codelipenghui assigned coderzc Jun 7, 2023

coderzc marked this pull request as draft June 8, 2023 09:35

coderzc closed this Jun 21, 2023



