Conversation

@poornachandra poornachandra commented Apr 12, 2018

JIRA - https://issues.cask.co/browse/CDAP-13280
Build - Ran mvn clean package locally

  • Add support for the Spark 2 streaming Kafka source
    • Upgrade the Kafka client for Spark 2 to 0.10.2.0, since Spark 2 needs a 0.10 Kafka client for streaming
  • Add test cases for the Kafka streaming source, Kafka batch sink, and Kafka alerts publisher
  • Fix some dependencies in the POM
  • Update documentation
  • Add screenshots and icons

Note: The majority of the change is due to renaming the kafka-plugins-0.9 module to kafka-plugins-0.10


@yaojiefeng yaojiefeng left a comment


Just some small comments.
Also wondering if we should keep the Kafka 0.9 module, since the Kafka 0.9 client will not be able to read from Kafka 0.10


## License and Trademarks

Copyright © 2017 Cask Data, Inc.
yaojiefeng:

This should be 2018

poornachandra:

Done


## License and Trademarks

Copyright © 2017 Cask Data, Inc.
yaojiefeng:

This should be 2018

poornachandra:

Done

producer.send(record);
} catch (Exception e) {
// catch the exception and continue processing rest of the alerts
LOG.error("Exception while emitting alert {}", alert, e);
yaojiefeng:

Should we limit the logs we send, since we send the records one by one to Kafka? If something breaks, this will flood the log

poornachandra:

Filed a JIRA to do this later since the log sampling classes are not directly usable now - https://issues.cask.co/browse/CDAP-13479
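For readers following the thread: the fix deferred to CDAP-13479 is log sampling. A minimal sketch of count-based sampling (a hypothetical stand-in class, not CDAP's actual log-sampling utilities):

```java
import java.util.concurrent.atomic.AtomicLong;

// Count-based log sampler: allows the first occurrence through,
// then only every Nth after that, so a broken Kafka connection
// cannot flood the log with one error per alert.
class SampledLogger {
    private final AtomicLong count = new AtomicLong();
    private final long sampleEvery;

    SampledLogger(long sampleEvery) {
        this.sampleEvery = sampleEvery;
    }

    /** Returns true when this occurrence should actually be logged. */
    boolean shouldLog() {
        return count.getAndIncrement() % sampleEvery == 0;
    }
}
```

In the alert publisher's catch block, the `LOG.error(...)` call would then be guarded by `if (sampler.shouldLog())`.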

import co.cask.hydrator.common.KeyValueListParser;
import co.cask.hydrator.common.ReferenceBatchSink;
import co.cask.hydrator.common.ReferencePluginConfig;
import co.cask.hydrator.plugin.common.KafkaHelpers;
yaojiefeng:

It is not related to this PR, but I think we should rename this class to KafkaBatchSink to be consistent with KafkaBatchSource.

poornachandra:

Done


public KafkaConfig(String referenceName, String brokers, String topic, String partitions,
String initialPartitionOffsets, Long defaultInitialOffset, String schema, String format,
String timeField, String keyField, String partitionField, String offsetField) {
yaojiefeng:

I know this constructor is not used, but should we have the principal and keytab location in the parameters, since they are now fields in the config?

poornachandra:

The constructor already has too many parameters, and it is not used anywhere, so I just removed the constructors.
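For context on the principal and keytab fields being discussed: a Kerberos-enabled Kafka 0.10.2 client typically needs properties along these lines. This is a sketch of the standard Kafka client settings, not the exact output of the plugin's KafkaHelpers.setupKerberosLogin:

```java
import java.util.HashMap;
import java.util.Map;

class KerberosKafkaParams {
    /** Builds standard SASL/Kerberos Kafka client properties from a
        principal and keytab location (sasl.jaas.config requires Kafka 0.10.2+). */
    static Map<String, String> build(String principal, String keytabLocation) {
        Map<String, String> params = new HashMap<>();
        params.put("security.protocol", "SASL_PLAINTEXT");
        params.put("sasl.kerberos.service.name", "kafka");
        params.put("sasl.jaas.config",
            "com.sun.security.auth.module.Krb5LoginModule required"
            + " useKeyTab=true storeKey=true"
            + " keyTab=\"" + keytabLocation + "\""
            + " principal=\"" + principal + "\";");
        return params;
    }
}
```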

producer.send(record);
} catch (Exception e) {
// catch the exception and continue processing rest of the alerts
LOG.error("Exception while emitting alert {}", alert, e);
yaojiefeng:

Should we limit the logs here? If something bad happens while sending the alerts, the messages may flood the log

poornachandra:

Duplicate comment

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.Text;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
yaojiefeng:

Though it is not related to this PR, I think we should name this class KafkaBatchSink to be consistent with the batch source; otherwise it will confuse people about why this is called Kafka

poornachandra:

Done

kafkaParams.put("value.deserializer", ByteArrayDeserializer.class.getCanonicalName());
KafkaHelpers.setupKerberosLogin(kafkaParams, conf.getPrincipal(), conf.getKeytabLocation());
// Create a unique string for the group.id using the pipeline name and the topic
kafkaParams.put("group.id", Joiner.on("-").join(context.getPipelineName().length(), conf.getTopic().length(),
yaojiefeng:

Can you add a comment about this property?

poornachandra:

Done
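The snippet above derives a unique group.id from the pipeline name and the topic, so that separate pipelines reading the same topic do not share consumer offsets. A sketch of the idea (the plugin's exact scheme is truncated above, and `String.join` stands in here for Guava's `Joiner`):

```java
class GroupIdBuilder {
    /** Derives a stable, per-pipeline consumer group id by joining the
        lengths and the values of the pipeline name and topic, so two
        pipelines on the same topic get distinct Kafka consumer groups. */
    static String groupId(String pipelineName, String topic) {
        return String.join("-",
            String.valueOf(pipelineName.length()),
            String.valueOf(topic.length()),
            pipelineName,
            topic);
    }
}
```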

for (Map.Entry<TopicPartition, Long> entry : offsets.entrySet()) {
TopicPartition topicAndPartition = entry.getKey();
Long offset = entry.getValue();
if (offset == OffsetRequest.EarliestTime()) {
yaojiefeng:

I don't understand this logic. The offsets can have -1 and -2 as their values; if that is the case, this will never match, right?

poornachandra:

OffsetRequest.EarliestTime() always returns -2 and OffsetRequest.LatestTime() always returns -1, so they will match.

yaojiefeng:

👍
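For readers unfamiliar with these sentinels: `kafka.api.OffsetRequest.EarliestTime()` always returns -2 and `LatestTime()` always returns -1, so configured offsets of -2/-1 do match the comparison above. A sketch of the dispatch, with the constants inlined rather than pulled from the Kafka API:

```java
class OffsetResolver {
    // Sentinel values returned by kafka.api.OffsetRequest
    static final long EARLIEST_TIME = -2L; // OffsetRequest.EarliestTime()
    static final long LATEST_TIME = -1L;   // OffsetRequest.LatestTime()

    /** Maps a configured offset to a concrete starting position. */
    static String describe(long offset) {
        if (offset == EARLIEST_TIME) {
            return "seek to earliest";
        } else if (offset == LATEST_TIME) {
            return "seek to latest";
        } else {
            return "seek to offset " + offset;
        }
    }
}
```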

@ClassRule
public static TemporaryFolder tmpFolder = new TemporaryFolder();


yaojiefeng:

extra new line

poornachandra:

Done

poornachandra:

@yaojiefeng I have addressed the comments, please take another look.

Since not many people are using Kafka 0.9, I think it is expensive to maintain support for it. The Kafka 0.8 client can still be used for non-Kerberos Kafka 0.9 servers; just that Kerberos support is not available. We can revisit this decision later if needed.

@yaojiefeng yaojiefeng left a comment


Just a few comments about imports; rest LGTM

import java.util.Set;
import javax.annotation.Nullable;
import java.io.IOException;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import java.util.Map;
import java.util.Set;
import javax.annotation.Nullable;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.*;
yaojiefeng:

We should use specific imports instead of using *

import org.junit.BeforeClass;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.*;
yaojiefeng:

We should use specific imports instead of using *


@yaojiefeng yaojiefeng left a comment


lgtm

@poornachandra poornachandra merged commit c19e181 into release/1.8 Jun 14, 2018
@poornachandra poornachandra deleted the feature/CDAP-13280-kafka-spark2-streaming branch June 14, 2018 23:15