
Multiplex all produce requests into a single Producer. #787

Merged
merged 18 commits into from Jan 19, 2021

Conversation

rigelbm
Contributor

@rigelbm rigelbm commented Jan 7, 2021

Background: Currently, V2 produce works internally by keeping one producer for each serialization format (BINARY, JSON, AVRO, JSONSCHEMA, PROTOBUF). Based on the format of the incoming request, REST Proxy decides which producer to send the request to.

Problem: This works "well" for V2 because, in V2, both key and value must use the same serialization format. I say "well" in quotes because it still requires keeping 5 producers up, but that is manageable. In V3, unfortunately, that won't do. One of the design differences of produce V3 is that it decouples the serialization formats of keys and values. If we were to follow the existing design, we would need to keep 25 (5x5) producers, which is not manageable.

Solution: This PR changes the produce internals to use a single producer instead: a byte-array producer. To achieve this, we change REST Proxy to serialize records BEFORE sending them to the producer, instead of handing the producer the unserialized record for the producer itself to serialize.

Detailed Solution:

The existing flow looks something like:

  1. schema = getSchema(request) // this is only done for AVRO, JSONSCHEMA and PROTOBUF
  2. producer = getProducer(request)
  3. producer.produce(schema, request)

The new flow looks like:

  1. schema = getSchema(request) // this is only done for AVRO, JSONSCHEMA and PROTOBUF
  2. serialized = serializeRecord(schema, request)
  3. binaryProducer.produce(serialized)

Step (1) is handled by SchemaManager, step (2) is handled by RecordSerializerFacade, and step (3) is handled by ProduceController.
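
For illustration, here is a minimal sketch of the new flow in Java. The RecordSerializer interface and all signatures below are hypothetical simplifications, not the PR's actual code; the real SchemaManager, RecordSerializerFacade, and ProduceController carry more context (topic, subject-name strategy, headers, etc.):

  import java.util.Optional;
  import org.apache.kafka.clients.producer.Producer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  final class ProduceFlowSketch {

    // Hypothetical stand-in for RecordSerializerFacade: turns an (optional
    // schema, payload) pair into bytes. Key and value can each use a
    // different implementation, e.g. Avro for the key, Protobuf for the value.
    interface RecordSerializer {
      byte[] serialize(Optional<String> rawSchema, Object payload);
    }

    private final Producer<byte[], byte[]> binaryProducer; // the single producer
    private final RecordSerializer keySerializer;
    private final RecordSerializer valueSerializer;

    ProduceFlowSketch(
        Producer<byte[], byte[]> binaryProducer,
        RecordSerializer keySerializer,
        RecordSerializer valueSerializer) {
      this.binaryProducer = binaryProducer;
      this.keySerializer = keySerializer;
      this.valueSerializer = valueSerializer;
    }

    void produce(
        String topic,
        Optional<String> keySchema, Object key,
        Optional<String> valueSchema, Object value) {
      // Step 2: serialize BEFORE producing; a schema is only present for
      // AVRO, JSONSCHEMA and PROTOBUF.
      byte[] serializedKey = keySerializer.serialize(keySchema, key);
      byte[] serializedValue = valueSerializer.serialize(valueSchema, value);

      // Step 3: every request, whatever its format(s), funnels into the one
      // byte-array producer.
      binaryProducer.send(new ProducerRecord<>(topic, serializedKey, serializedValue));
    }
  }

Because the producer only ever sees byte arrays, key and value formats can vary independently per request without multiplying producer instances.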

Testing: Great care has been taken to preserve exactly the same behaviour, since we are basically reimplementing the entire V2 produce API. All the integration tests (produce is arguably the most tested feature of REST Proxy) are largely unchanged. SchemaManager, RecordSerializer and ProduceController are thoroughly unit-tested for parity with the now-deleted ProducerPool, NoSchemaRestProducer and SchemaRestProducer.

@rigelbm rigelbm requested a review from mhowlett January 7, 2021 20:07
@rigelbm
Contributor Author

rigelbm commented Jan 14, 2021

After investigating deeper into how the Kafka(Avro|JsonSchema|Protobuf)Serializers are implemented, I realised the V2 implementation that I basically mimicked was not very good, and that I could do it much more simply by using SchemaRegistryClient directly (instead of via the serializers). So I just had to rewrite SchemaManagerImpl entirely. Sorry for the extra burden. Can you PTAL? Happy news: the cache is gone (SchemaRegistryClient already has an internal cache).
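
For context, a minimal sketch of direct SchemaRegistryClient usage. The lookup methods shown (getSchemaById, getLatestSchemaMetadata) are part of the SchemaRegistryClient interface, but the surrounding class is hypothetical and not how SchemaManagerImpl is actually structured:

  import io.confluent.kafka.schemaregistry.ParsedSchema;
  import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
  import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

  final class SchemaLookupSketch {

    // CachedSchemaRegistryClient caches lookups internally, which is why no
    // extra cache is needed on top of it.
    private final SchemaRegistryClient client =
        new CachedSchemaRegistryClient("http://localhost:8081", 1000);

    // Look a schema up by its registry id, as when a produce request names
    // the schema id explicitly.
    ParsedSchema bySchemaId(int schemaId) throws Exception {
      return client.getSchemaById(schemaId);
    }

    // Look up the latest schema registered under a subject,
    // e.g. "my-topic-value".
    ParsedSchema latestForSubject(String subject) throws Exception {
      int id = client.getLatestSchemaMetadata(subject).getId();
      return client.getSchemaById(id);
    }
  }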

@rigelbm rigelbm merged commit 157a5b1 into confluentinc:master Jan 19, 2021