
NIFI-9822 - ConsumeKafkaRecord allows writing out Kafka record key#6045

Closed
greyp9 wants to merge 3 commits into apache:main from greyp9:NIFI-9822-B

Conversation

@greyp9 greyp9 (Contributor) commented May 13, 2022

Summary

NIFI-9822

  • ConsumeKafkaRecord allows writing out Kafka record key

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 8
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@greyp9 greyp9 (Contributor, Author) commented May 13, 2022

@markap14 , @pvillard31

I wanted to vet an initial implementation of this ticket, including a first pass at documentation. It seems to work in the happy-path case. I would like to hold off on error handling, unit tests, and (properly formatted) documentation until you both are happy with the approach.

@markap14 markap14 (Contributor) left a comment

Hey @greyp9, thanks for the updates! I think this is a good direction. I left several inline comments about things that I noticed. Some of these you may well already have realized, given that you said you need to do more work on tests & docs. Some are minor recommendations that you can accept or ignore.

I do think we need to improve the Publisher a bit more, though. I think the publisher needs a new property that tells it whether the incoming data should be sent as-is, as a single record, or if the incoming data is in the "Wrapper" format. And only expose the Key Record Writer if using the Wrapper format. That way, we can easily receive a message from ConsumeKafkaRecord, do some processing, etc. and then push to another kafka topic, for instance, and easily retain the headers, the key, and the value.

@@ -426,11 +465,13 @@ protected ConsumerPool createConsumerPool(final ProcessContext context, final Co
}

return new ConsumerPool(maxLeases, readerFactory, writerFactory, props, topics, maxUncommittedTime, securityProtocol,
markap14 (Contributor) commented:
Doesn't necessarily need to be done in this ticket. But probably makes sense to introduce a Builder pattern here instead of so many constructor args. It made sense before this, too, though :)
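The Builder pattern markap14 suggests might look roughly like the following. This is a hypothetical, simplified sketch: `ConsumerPoolSketch` and its field names are illustrative stand-ins, not NiFi's actual `ConsumerPool` API, which takes many more constructor arguments.

```java
// Hypothetical, simplified sketch of a Builder replacing a long constructor
// argument list. Field names are illustrative, not NiFi's actual API.
class ConsumerPoolSketch {
    private final int maxLeases;
    private final long maxUncommittedTimeMillis;
    private final String securityProtocol;

    private ConsumerPoolSketch(Builder b) {
        this.maxLeases = b.maxLeases;
        this.maxUncommittedTimeMillis = b.maxUncommittedTimeMillis;
        this.securityProtocol = b.securityProtocol;
    }

    public int getMaxLeases() { return maxLeases; }
    public String getSecurityProtocol() { return securityProtocol; }

    static class Builder {
        private int maxLeases;
        private long maxUncommittedTimeMillis;
        private String securityProtocol;

        Builder maxLeases(int v) { this.maxLeases = v; return this; }
        Builder maxUncommittedTimeMillis(long v) { this.maxUncommittedTimeMillis = v; return this; }
        Builder securityProtocol(String v) { this.securityProtocol = v; return this; }

        // Each setter returns the Builder, so construction reads as a chain
        // of named values rather than a positional argument list.
        ConsumerPoolSketch build() { return new ConsumerPoolSketch(this); }
    }
}
```

The main win is readability at the call site: each value is named, and optional arguments can simply be omitted rather than passed as nulls in a long positional list.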

Comment on lines +697 to +698
final InputStream is = new ByteArrayInputStream(key);
final RecordReader reader = keyReaderFactory.createRecordReader(attributes, is, key.length, logger);
markap14 (Contributor) commented:
Should use try-with-resources here to ensure that we close the InputStream and the Record Reader.
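The try-with-resources shape markap14 is asking for might look like the sketch below. A `BufferedReader` stands in for NiFi's `RecordReader` (both are `Closeable`), since the point is the resource-management pattern rather than the NiFi API itself.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch of the try-with-resources pattern: both the InputStream and the
// reader wrapping it are declared in the try header, so both are closed
// automatically even if reading throws. A BufferedReader stands in for
// NiFi's RecordReader here.
class KeyReadSketch {
    static String readKey(byte[] key) throws IOException {
        try (InputStream is = new ByteArrayInputStream(key);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
        // is and reader are closed here, in reverse declaration order
    }
}
```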

throws IOException, SchemaNotFoundException, MalformedRecordException {
final Tuple<RecordField, Object> tuple;
final byte[] key = consumerRecord.key() == null ? new byte[0] : consumerRecord.key();
if (KafkaProcessorUtils.RECORD.getValue().equals(keyFormat)) {
markap14 (Contributor) commented:
Took me a minute to figure out what this means - KafkaProcessorUtils.RECORD wasn't immediately obvious to me. Perhaps it makes sense to rename RECORD, STRING, etc. to something that makes more sense outside this context, such as KEY_AS_RECORD, KEY_AS_STRING, etc.? It's a bit of a nitpick, and you can feel free to ignore it if you want.

tuple = new Tuple<>(recordField, record);
} else if (KafkaProcessorUtils.STRING.getValue().equals(keyFormat)) {
final RecordField recordField = new RecordField("key", RecordFieldType.STRING.getDataType());
tuple = new Tuple<>(recordField, new String(key, StandardCharsets.UTF_8));
markap14 (Contributor) commented:
We'll definitely want to make sure that we document that this strategy requires the key to be a UTF-8 compatible String. And we should probably ensure that we test with a non-UTF-8 compatible key; in that case, the record should probably go to the parse.failure relationship.
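One way to detect the non-UTF-8 case markap14 describes: `new String(key, StandardCharsets.UTF_8)` never fails, because it silently substitutes replacement characters for malformed bytes. Detecting a bad key so it can be routed to parse.failure requires a strict `CharsetDecoder`. The helper name below is hypothetical, not part of the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode the key strictly, so a non-UTF-8 key raises
// CharacterCodingException instead of being silently replaced. The caller
// could catch that exception and route the record to parse.failure.
class Utf8KeySketch {
    static String decodeStrict(byte[] key) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail, don't replace
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(key)).toString();
    }
}
```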

} else {
final RecordField recordField = new RecordField("key",
RecordFieldType.ARRAY.getArrayDataType(RecordFieldType.BYTE.getDataType()));
tuple = new Tuple<>(recordField, key);
markap14 (Contributor) commented:
Eventually we need to introduce a BYTES data type for Records. Right now, when we have an Array of type Byte, the record API expects this to be an array of Byte objects, not primitive bytes. So, as inefficient as it is, in this case I think we need to create a Byte[] for the key instead of providing the byte[].
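The boxing markap14 describes is a simple element-by-element copy; Java's autoboxing handles each element, but there is no built-in conversion for the whole array. The helper name is illustrative.

```java
// Minimal sketch of the boxing described above: the record API expects
// Byte[] (objects) for an ARRAY[BYTE] field, so the primitive byte[] key
// must be copied element by element.
class ByteBoxSketch {
    static Byte[] box(byte[] key) {
        Byte[] boxed = new Byte[key.length];
        for (int i = 0; i < key.length; i++) {
            boxed[i] = key[i]; // autoboxing: byte -> Byte
        }
        return boxed;
    }
}
```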

.map(s -> s.getBytes(StandardCharsets.UTF_8)).orElse(null);
} else {
final ByteArrayOutputStream os = new ByteArrayOutputStream(1024);
final MapRecord keyRecord = (MapRecord) record.getValue(messageKeyField);
markap14 (Contributor) commented:
Should probably be using Record here - not MapRecord

Comment on lines +262 to +264
- Output Strategy "Write Value Only" (the default) emits flowfile records containing only the Kafka record value.
- Output Strategy "Use Wrapper" (new) emits flowfile records containing the Kafka record key, value, and headers, as
well as additional metadata from the Kafka record.</p>
markap14 (Contributor) commented:
Should probably use <ul> with <li> rather than - for denoting lists.
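The suggested markup change might look like the following, reusing the list content quoted above from the processor's documentation:

```html
<ul>
  <li>Output Strategy "Write Value Only" (the default) emits flowfile records containing only the
      Kafka record value.</li>
  <li>Output Strategy "Use Wrapper" (new) emits flowfile records containing the Kafka record key,
      value, and headers, as well as additional metadata from the Kafka record.</li>
</ul>
```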

<p>This processor (NiFi 1.17+) offers multiple output strategies (configured via processor property "Output Strategy")
for converting Kafka records into flow files.
- Output Strategy "Write Value Only" (the default) emits flowfile records containing only the Kafka record value.
- Output Strategy "Use Wrapper" (new) emits flowfile records containing the Kafka record key, value, and headers, as
markap14 (Contributor) commented:
We need to be sure that we call out the Record Schema that will be used here.


<p>Additionally, the choice of the "Output Strategy" processor property affects the related properties "Headers to Add
as Attributes (Regex)" and "Key Attribute Encoding". These properties are available only when "Output Strategy" is set
to "Write Value Only".</p>
markap14 (Contributor) commented:
Might make sense to mention the reason they are only available when Output Strategy = Write Value Only. I.e., because it doesn't make sense when writing Records, as the Headers and keys are not attributes, they are part of the Record/FlowFile content.

descriptors.add(KafkaProcessorUtils.TOKEN_AUTH);
descriptors.add(KafkaProcessorUtils.SSL_CONTEXT_SERVICE);
descriptors.add(SEPARATE_BY_KEY);
descriptors.add(OUTPUT_STRATEGY);
markap14 (Contributor) commented:
Output Strategy (and related properties) is going to be a very important thing for the user to think through when configuring this. Because of that, I'd recommend moving this property up in the list to just after Group ID

@greyp9 greyp9 (Contributor, Author) commented Jun 16, 2022

Closing in favor of #6131.

@greyp9 greyp9 closed this Jun 16, 2022