Conversation

@poornachandra poornachandra commented Apr 12, 2018

JIRA - https://issues.cask.co/browse/CDAP-13280
Build - Ran mvn clean package locally

  • Add support for the Spark 2 streaming Kafka source
    • Upgrade the Kafka client for Spark 2 to 0.10.2.0, since Spark 2 needs a 0.10 Kafka client for streaming
  • Add test cases for the Kafka streaming source, Kafka batch sink, and Kafka alerts publisher
  • Fix some dependencies in the POM
  • Update documentation
  • Add screenshots and icons

Note: The majority of the change is due to renaming the kafka-plugins-0.9 module to kafka-plugins-0.10


@yaojiefeng yaojiefeng left a comment


Just some small comments.
Also wondering if we should keep the Kafka 0.9 module, since the Kafka 0.9 client will not be able to read from Kafka 0.10


## License and Trademarks

Copyright © 2017 Cask Data, Inc.
yaojiefeng:

This should be 2018

poornachandra:

Done


## License and Trademarks

Copyright © 2017 Cask Data, Inc.
yaojiefeng:

This should be 2018

poornachandra:

Done

producer.send(record);
} catch (Exception e) {
// catch the exception and continue processing rest of the alerts
LOG.error("Exception while emitting alert {}", alert, e);
yaojiefeng:

Should we limit the logs we send, since we send the records one by one to Kafka? If something breaks, this will flood the log

poornachandra:

Filed a JIRA to do this later since the log sampling classes are not directly usable now - https://issues.cask.co/browse/CDAP-13479
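For readers following the thread: the fix deferred to CDAP-13479 is log sampling. A minimal sketch of count-based sampling (a hypothetical stand-in class, not CDAP's actual log-sampling utilities):

```java
import java.util.concurrent.atomic.AtomicLong;

// Count-based log sampler: allows the first occurrence through,
// then only every Nth after that, so a broken Kafka connection
// cannot flood the log with one error per alert.
class SampledLogger {
    private final AtomicLong count = new AtomicLong();
    private final long sampleEvery;

    SampledLogger(long sampleEvery) {
        this.sampleEvery = sampleEvery;
    }

    /** Returns true when this occurrence should actually be logged. */
    boolean shouldLog() {
        return count.getAndIncrement() % sampleEvery == 0;
    }
}
```

In the alert publisher's catch block, the `LOG.error(...)` call would then be guarded by `if (sampler.shouldLog())`.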

import co.cask.hydrator.common.KeyValueListParser;
import co.cask.hydrator.common.ReferenceBatchSink;
import co.cask.hydrator.common.ReferencePluginConfig;
import co.cask.hydrator.plugin.common.KafkaHelpers;
yaojiefeng:

It is not related to this PR, but I think we should rename this class to KafkaBatchSink to be consistent with KafkaBatchSource.

poornachandra:

Done


public KafkaConfig(String referenceName, String brokers, String topic, String partitions,
String initialPartitionOffsets, Long defaultInitialOffset, String schema, String format,
String timeField, String keyField, String partitionField, String offsetField) {
yaojiefeng:

I know this constructor is not used, but should we have the principal and keytab location in the parameters, since they are now fields in the config?

poornachandra:

The constructor already has too many parameters, and it is not used anywhere, so I just removed the constructors.
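For context on the principal and keytab fields being discussed: a Kerberos-enabled Kafka 0.10.2 client typically needs properties along these lines. This is a sketch of the standard Kafka client settings, not the exact output of the plugin's KafkaHelpers.setupKerberosLogin:

```java
import java.util.HashMap;
import java.util.Map;

class KerberosKafkaParams {
    /** Builds standard SASL/Kerberos Kafka client properties from a
        principal and keytab location (sasl.jaas.config requires Kafka 0.10.2+). */
    static Map<String, String> build(String principal, String keytabLocation) {
        Map<String, String> params = new HashMap<>();
        params.put("security.protocol", "SASL_PLAINTEXT");
        params.put("sasl.kerberos.service.name", "kafka");
        params.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required"
            + " useKeyTab=true storeKey=true"
            + " keyTab=\"" + keytabLocation + "\""
            + " principal=\"" + principal + "\";");
        return params;
    }
}
```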

producer.send(record);
} catch (Exception e) {
// catch the exception and continue processing rest of the alerts
LOG.error("Exception while emitting alert {}", alert, e);
yaojiefeng:

Should we limit the logs here? If something bad happens while sending the alerts, the messages may flood the log

poornachandra:

Duplicate comment

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.Text;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
yaojiefeng:

Though it is not related to this PR, I think we should name this class KafkaBatchSink to be consistent with the batch source; otherwise it will confuse people about why this is called Kafka

poornachandra:

Done

kafkaParams.put("value.deserializer", ByteArrayDeserializer.class.getCanonicalName());
KafkaHelpers.setupKerberosLogin(kafkaParams, conf.getPrincipal(), conf.getKeytabLocation());
// Create a unique string for the group.id using the pipeline name and the topic
kafkaParams.put("group.id", Joiner.on("-").join(context.getPipelineName().length(), conf.getTopic().length(),
yaojiefeng:

Can you add a comment about this property?

poornachandra:

Done
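The snippet above derives a unique group.id from the pipeline name and the topic, so that separate pipelines reading the same topic do not share consumer offsets. A sketch of the idea (the plugin's exact scheme is truncated above, and `String.join` stands in here for Guava's `Joiner`):

```java
class GroupIdBuilder {
    /** Derives a stable, per-pipeline consumer group id by joining the
        lengths and the values of the pipeline name and topic, so two
        pipelines on the same topic get distinct Kafka consumer groups. */
    static String groupId(String pipelineName, String topic) {
        return String.join("-",
            String.valueOf(pipelineName.length()),
            String.valueOf(topic.length()),
            pipelineName,
            topic);
    }
}
```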

for (Map.Entry<TopicPartition, Long> entry : offsets.entrySet()) {
TopicPartition topicAndPartition = entry.getKey();
Long offset = entry.getValue();
if (offset == OffsetRequest.EarliestTime()) {
yaojiefeng:

I don't understand this logic. The offsets can have -1 and -2 as their values; if that is the case, this will never match, right?

poornachandra:

OffsetRequest.EarliestTime() always returns -2 and OffsetRequest.LatestTime() always returns -1, so they will match.

yaojiefeng:

👍
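For readers unfamiliar with these sentinels: `kafka.api.OffsetRequest.EarliestTime()` always returns -2 and `LatestTime()` always returns -1, so configured offsets of -2/-1 do match the comparison above. A sketch of the dispatch, with the constants inlined rather than pulled from the Kafka API:

```java
class OffsetResolver {
    // Sentinel values returned by kafka.api.OffsetRequest
    static final long EARLIEST_TIME = -2L; // OffsetRequest.EarliestTime()
    static final long LATEST_TIME = -1L;   // OffsetRequest.LatestTime()

    /** Maps a configured offset to a concrete starting position. */
    static String describe(long offset) {
        if (offset == EARLIEST_TIME) {
            return "seek to earliest";
        } else if (offset == LATEST_TIME) {
            return "seek to latest";
        } else {
            return "seek to offset " + offset;
        }
    }
}
```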

@ClassRule
public static TemporaryFolder tmpFolder = new TemporaryFolder();


yaojiefeng:

extra new line

poornachandra:

Done

poornachandra:

@yaojiefeng I have addressed the comments, please take another look.

Since not many people are using Kafka 0.9, I think it is expensive to maintain support for it. The Kafka 0.8 client can still be used for non-Kerberos Kafka 0.9 servers; just that Kerberos support is not available. We can revisit this decision later if needed.

@yaojiefeng yaojiefeng left a comment


Just a few comments about imports; rest LGTM

import java.util.Set;
import javax.annotation.Nullable;
import java.io.IOException;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import java.util.Map;
import java.util.Set;
import javax.annotation.Nullable;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import org.junit.BeforeClass;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.*;
yaojiefeng:

We should use specific imports instead of using *


@yaojiefeng yaojiefeng left a comment


lgtm

@poornachandra poornachandra merged commit c19e181 into release/1.8 Jun 14, 2018
@poornachandra poornachandra deleted the feature/CDAP-13280-kafka-spark2-streaming branch June 14, 2018 23:15