# Apache Kafka Streaming Analytics
This notebook uses Apache Kafka to stream the Twitter dataset and analyzes its streaming performance.

The setup uses a single **Kafka Broker** that streams the data to the **Kafka Consumer**.

In [2]:
# Install the Python Client for Apache Kafka
!pip install confluent-kafka

Collecting confluent-kafka
  Downloading confluent_kafka-1.6.1-cp37-cp37m-manylinux2010_x86_64.whl (2.7 MB)
Installing collected packages: confluent-kafka
Successfully installed confluent-kafka-1.6.1


In [9]:
# Load dependencies and set constants
import matplotlib.pyplot as plt
import json
from confluent_kafka import Producer, Consumer, KafkaError

TWITTER_DATA_PATH = "../data/dataset.json"
KAFKA_TOPIC_TWITTER = "twitter-stream"
CONSUMER_GROUP_ID = "twitter-consumers"

### Reminder: A running Kafka environment is required
The following cells assume a running Apache Kafka environment with a broker reachable at `localhost:9092`.
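
Before running the cells below, it can help to confirm that something is actually listening on the broker address. The following is a minimal sketch using only the standard library; the helper name `port_is_open` and its default timeout are assumptions made for illustration. It only checks that a TCP connection can be opened, not that the listener is a healthy Kafka broker.

```python
import socket


def port_is_open(host="localhost", port=9092, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it proves only that a listener exists on the port,
    not that the listener is a working Kafka broker.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False
```

Calling `port_is_open()` with the defaults checks the same `localhost:9092` address used in the producer and consumer configuration. If it returns `False`, the broker should be started before continuing.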

In [None]:
# Produce the data / write it to the Kafka Cluster
producer_config = {
    "bootstrap.servers": "localhost:9092"
}
p = Producer(producer_config)

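with open(TWITTER_DATA_PATH, "r") as dataset:
    for tweet in dataset:  # assumes one JSON-encoded tweet per line
        p.produce(KAFKA_TOPIC_TWITTER, value=tweet.strip())
        p.poll(0)  # serve delivery callbacks and free space in the local send queue

# produce() only enqueues messages; block until all of them have been delivered
p.flush()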
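
`produce()` is asynchronous, so a successful call does not mean the broker has received the message. One way to observe the outcome is a delivery callback, which confluent-kafka invokes as `callback(err, msg)`. The sketch below is an illustration, not part of the original notebook: the class name `DeliveryStats` and its fields are assumptions.

```python
class DeliveryStats:
    """Counts delivery reports; intended for use as a delivery callback."""

    def __init__(self):
        self.delivered = 0      # messages acknowledged by the broker
        self.failed = 0         # messages that could not be delivered
        self.last_error = None  # most recent delivery error, if any

    def __call__(self, err, msg):
        # err is None when delivery succeeded, otherwise it describes the failure
        if err is None:
            self.delivered += 1
        else:
            self.failed += 1
            self.last_error = err
```

To use it, create `stats = DeliveryStats()` and pass `on_delivery=stats` to `p.produce(...)`. The callbacks fire from within `p.poll()` or `p.flush()`, so the counters are only complete after `p.flush()` returns.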

In [None]:
# Consume the data
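consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": CONSUMER_GROUP_ID,
    "client.id": "client-1",
    "enable.auto.commit": True,
    "session.timeout.ms": 6000,
    # Read from the beginning of the topic when the group has no committed offset.
    # Set at the top level because "default.topic.config" is deprecated.
    "auto.offset.reset": "earliest"
}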
c = Consumer(consumer_config)

c.subscribe([KAFKA_TOPIC_TWITTER])

try:
    while True:
        msg = c.poll(0.1)
        
        if msg is None:
            continue
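        elif not msg.error():
            # Decode the JSON payload (bytes) and display the received tweet
            tweet = json.loads(msg.value())
            print(json.dumps(tweet, indent=4, ensure_ascii=False, sort_keys=True))
        elif msg.error().code() == KafkaError._PARTITION_EOF:
            # Only reported when "enable.partition.eof" is enabled in the consumer config
            print("End of partition reached {}/{}".format(msg.topic(), msg.partition()))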
        else:
            print("Error occurred: {}".format(msg.error().str()))
except KeyboardInterrupt:
    pass
finally:
    c.close()
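
Since the goal of the notebook is to analyze streaming performance, the consumer loop needs some way to time the messages it receives. The following is a minimal sketch of such a measurement, not the notebook's actual method: the class name `ThroughputMeter`, its fields, and the choice of rate definition (messages divided by the time between the first and last recorded message) are all assumptions.

```python
import time


class ThroughputMeter:
    """Tracks message count, payload bytes, and elapsed time (hypothetical helper)."""

    def __init__(self):
        self.messages = 0
        self.total_bytes = 0
        self.first_ts = None
        self.last_ts = None

    def record(self, payload, now=None):
        # now can be injected for testing; otherwise a monotonic clock is used
        now = time.perf_counter() if now is None else now
        if self.first_ts is None:
            self.first_ts = now
        self.last_ts = now
        self.messages += 1
        # A message value may be None (for example a tombstone record)
        self.total_bytes += len(payload) if payload is not None else 0

    def summary(self):
        # Elapsed time spans the first to the last recorded message
        elapsed = 0.0 if self.first_ts is None else self.last_ts - self.first_ts
        return {
            "messages": self.messages,
            "bytes": self.total_bytes,
            "elapsed_s": elapsed,
            "msgs_per_s": self.messages / elapsed if elapsed > 0 else 0.0,
            "bytes_per_s": self.total_bytes / elapsed if elapsed > 0 else 0.0,
        }
```

In the consumer loop, `meter.record(msg.value())` could be called in the branch that handles a successfully received message, and `meter.summary()` printed in the `finally` block. Printing every tweet slows the loop considerably, so the `print` call would likely need to be removed for a meaningful measurement.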