This charm installs a Kafka application that consumes the Twitter Streaming API and sends all messages to a Kafka broker.
Note that this application is useless unless it is combined with a Kafka Server / Broker cluster.
In essence, you will need a Kafka cluster to make it run. As Kafka requires a Zookeeper cluster to run, we are also going to deploy that.
Your first task is to pick your favourite Hadoop distribution and use the ZooKeeper charm from that distribution. The example below uses the vanilla version, but hdp-zookeeper would work too.
:~$ juju deploy trusty/zookeeper
:~$ juju deploy trusty/kafka
:~$ juju add-relation kafka zookeeper
This will make sure you have a cluster available with an up & running Kafka broker.
Now install this Producer and add a relation between it and the Kafka Broker.
:~$ juju deploy trusty/kafka-twitter
:~$ juju add-relation kafka-twitter kafka
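The whole deployment sequence can be wrapped in one function if you repeat it often. This is only a sketch: deploy_stack and the JUJU variable are hypothetical names introduced here, not part of the charm. Setting JUJU=echo prints the commands instead of running them, which is handy for checking the order first.

```shell
#!/bin/sh
# deploy_stack: run the five commands from this README in order.
# JUJU is a hypothetical override: leave it unset to call the real client,
# or set JUJU=echo to print the commands instead of running them.
deploy_stack() {
    juju_cmd="${JUJU:-juju}"
    "$juju_cmd" deploy trusty/zookeeper &&
    "$juju_cmd" deploy trusty/kafka &&
    "$juju_cmd" add-relation kafka zookeeper &&
    "$juju_cmd" deploy trusty/kafka-twitter &&
    "$juju_cmd" add-relation kafka-twitter kafka
}
```

Call deploy_stack once the function is defined. The && chain stops at the first command that fails.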
Scale out Usage
This charm doesn't need to scale out for demo purposes. With Twitter's current implementation, the maximum bandwidth you will get from the Streaming API is about 140 Mbps uncompressed, so a single well-configured machine is enough.
However, it can use a Kafka cluster so you can scale that out.
:~$ juju add-unit kafka -n 3
If you expect a very high volume of tweets, you can also increase the number of partitions Kafka uses in its configuration file. This is not exposed as a configuration option in this charm, so please let me know if you need it and I can add it.
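Until such an option exists, the change can be made by hand. Here is a minimal sketch, assuming the broker setting you want to raise is num.partitions in server.properties. That key sets the default partition count for newly created topics, so existing topics are not affected. The helper name with_partitions is hypothetical. It prints the rewritten file to stdout rather than editing in place, so you can review the result first.

```shell
#!/bin/sh
# with_partitions: print a properties file with num.partitions rewritten.
# $1 = path to the properties file, $2 = desired partition count
# Assumption: the key is written as "num.partitions=N" with no spaces.
with_partitions() {
    sed -e "s/^num\.partitions=.*/num.partitions=$2/" "$1"
}
```

For example, with_partitions /opt/kafka/config/server.properties 4 shows what the file would look like with four partitions. Kafka has to be restarted after the real file is changed.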
Production environments usually run ZooKeeper in HA mode with 3 nodes. If you already have one unit, you can add the other two by running:
:~# juju add-unit zookeeper -n 2
Known Limitations and Issues
The default Kafka configuration doesn't work well for our purpose, as it assumes a full production environment. So we are going to change some settings to make it sustainable on an AWS m1.small instance. Note that these changes are not mandatory if you have enough disk space.
First, connect to your Kafka server and edit the server configuration with your favorite editor. We'll use nano in this example, and only show the lines we change.
:~$ juju ssh kafka/0
:~$ sudo nano /opt/kafka/config/server.properties
Now modify the following lines (the commented-out lines show the previous values):
# log.retention.hours=168
log.retention.hours=24
# log.cleaner.enable=false
log.cleaner.enable=true
Then you should restart Kafka.
sudo service kafka restart
You can also apply the same change from your computer with:
:~# juju run --service=kafka 'sed -i -e "s/^log\.retention\.hours.*/log.retention.hours=24/" -e "s/^log\.cleaner\.enable.*/log.cleaner.enable=true/" /opt/kafka/config/server.properties && service kafka restart'
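Before running the edit across the whole service, it can be worth trying the two substitutions on a scratch copy. The sketch below builds a sample file containing only the two settings discussed above (a stand-in, not the real server.properties) and prints what sed would produce, without modifying anything.

```shell
#!/bin/sh
# Scratch file standing in for /opt/kafka/config/server.properties.
# Assumption: only the two settings discussed above are reproduced here.
sample=$(mktemp)
printf '%s\n' 'log.retention.hours=168' 'log.cleaner.enable=false' > "$sample"

# The two substitutions, run without -i so the file itself is left untouched.
preview=$(sed -e 's/^log\.retention\.hours.*/log.retention.hours=24/' \
              -e 's/^log\.cleaner\.enable.*/log.cleaner.enable=true/' "$sample")
echo "$preview"
rm -f "$sample"
```

Each expression is anchored with ^, so commented-out lines starting with # are left alone.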
We should also check the ZooKeeper connection string. On some occasions, if the ZooKeeper charm is not the one planned for, it may fail to expose the right URL. So have a look at zookeeper.connect in the same file and make sure it is correct.
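A quick way to read that value is to extract it from the properties file. This is a minimal sketch: zk_connect is a hypothetical helper name, and it assumes the key is written as zookeeper.connect=... with no spaces around the equals sign.

```shell
#!/bin/sh
# zk_connect: print the value of zookeeper.connect from a Kafka properties file.
# $1 = path to the properties file
# The pattern requires "=" right after "connect", so related keys such as
# zookeeper.connection.timeout.ms are not matched.
zk_connect() {
    sed -n 's/^zookeeper\.connect=//p' "$1"
}
```

On the Kafka unit, zk_connect /opt/kafka/config/server.properties should print a comma-separated list of host:port entries, one per ZooKeeper unit (2181 is ZooKeeper's default client port). An empty result means the line is missing or commented out.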
The settings below apply to /etc/kafka-twitter/producer.conf:
consumerKey = YourTwitterConsumerKey
consumerSecret = YourTwitterConsumerSecret
accessToken = YourTwitterToken
accessTokenSecret = YourTwitterTokenSecret
keywords = space separated list of hashtags to follow (without #)
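Since the producer cannot authenticate without all four credentials, it may help to check the file before starting it. The sketch below lists any of the five keys above that are missing or left empty. The helper name missing_keys is hypothetical, and the check assumes the key = value layout shown above.

```shell
#!/bin/sh
# missing_keys: list required keys that are absent or empty in a
# producer.conf-style file. Prints nothing when every key has a value.
# $1 = path to the configuration file
missing_keys() {
    for key in consumerKey consumerSecret accessToken accessTokenSecret keywords; do
        # A key counts as set when "key = value" carries a non-blank value.
        grep -Eq "^[[:space:]]*$key[[:space:]]*=[[:space:]]*[^[:space:]]" "$1" || echo "$key"
    done
}
```

Run it as missing_keys /etc/kafka-twitter/producer.conf on the kafka-twitter unit. It only checks that a value is present, not that Twitter accepts it.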
Maintainer of this charm: Samuel Cozannet firstname.lastname@example.org
Upstream Project Name
This project was inspired by the NF Labs project for a Kafka Twitter agent. Most of the Java code comes from them.