# Apache Kafka Streaming Analytics
In this notebook Apache Kafka is going to be used and analyzed with reference to the streaming performance using the twitter dataset.

In this case, we are going to use only one **Kafka Broker** that streams the data to the **Kafka Consumer**.

In [2]:
# Install the Python Client for Apache Kafka
!pip install confluent-kafka

Collecting confluent-kafka
  Downloading confluent_kafka-1.6.1-cp37-cp37m-manylinux2010_x86_64.whl (2.7 MB)
[K     |████████████████████████████████| 2.7 MB 6.5 MB/s eta 0:00:01     |██████████████████████████▋     | 2.3 MB 6.5 MB/s eta 0:00:01
[?25hInstalling collected packages: confluent-kafka
Successfully installed confluent-kafka-1.6.1


In [1]:
# Load dependencies and set constants
import matplotlib.pyplot as plt
import json

from confluent_kafka import Producer, Consumer, KafkaError

DATA_GENERATION_IN_MB = 1000 # ~ 1GB
DATASET_SIZE_MB = 10
TWITTER_DATA_PATH = "../data/dataset.json"
KAFKA_TOPIC_TWITTER = "twitter-stream"
CONSUMER_GROUP_ID = "twitter-consumers"

### Reminder: Running Kafka Architecture required
The following cells assume a running Apache Kafka Environment.

In [9]:
# Produce the data / write it to the Kafka Cluster
producer_config = {
    "bootstrap.servers": "localhost:9092"
}
p = Producer(producer_config)

# Fill the topic with the specified amount of data
generation_steps = int(DATA_GENERATION_IN_MB / DATASET_SIZE_MB)
with open(TWITTER_DATA_PATH, "r") as dataset:
    for step in range(generation_steps):
        print(f"Executing data generation step {step}...")
        dataset.seek(0) # Jump back to first line  
        
        try: 
            for tweet in dataset:  
                p.produce(KAFKA_TOPIC_TWITTER, value=tweet)
        except BufferError:
            print('[INFO] Local producer queue is full (%d messages awaiting delivery): Trying again after flushing...\n' % len(p)) 
            p.poll(0)
            
            p.flush() # Send the messages in the queue

Data Generation step 0
Data Generation step 1
Data Generation step 2
Data Generation step 3
Data Generation step 4
Data Generation step 5
Data Generation step 6
Local producer queue is full (100000 messages awaiting delivery): try again

Data Generation step 7
Data Generation step 8
Data Generation step 9
Data Generation step 10
Data Generation step 11
Data Generation step 12
Data Generation step 13
Local producer queue is full (100122 messages awaiting delivery): try again

Data Generation step 14
Data Generation step 15
Data Generation step 16
Data Generation step 17
Data Generation step 18
Data Generation step 19
Data Generation step 20
Local producer queue is full (100125 messages awaiting delivery): try again

Data Generation step 21
Data Generation step 22
Data Generation step 23
Data Generation step 24
Data Generation step 25
Data Generation step 26
Data Generation step 27
Local producer queue is full (100128 messages awaiting delivery): try again

Data Generation step 28
Data G

In [10]:
def get_kafka_stats(json_stats_bytes):
    """ Callback for the Apache Kafka Consumer Configuration which 
    requests performance metrics / statistics from Kafka. 
    
    :param json_stats: The JSON statistics data in bytes
    """
    # Decode the bytes into a python dictionary representing the JSON
    stats = json.loads(json_stats_bytes)
    
    brokers = stats["brokers"]
    broker = brokers["ubuntu-apache-kafka-master-vm:9092/0"]
    print("Broker {0} Avg. Output Buffer Latency: {1}".format(broker["name"], broker["outbuf_latency"]["avg"]))
    
# Consume the data
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": CONSUMER_GROUP_ID,
    "client.id": "client-1",
    "stats_cb": get_kafka_stats,
    "statistics.interval.ms": 20,
    'api.version.request': True,
    "enable.auto.commit": True,
    "session.timeout.ms": 6000,
    "default.topic.config": {"auto.offset.reset": "smallest"}
}
c = Consumer(consumer_config)

c.subscribe([KAFKA_TOPIC_TWITTER])

try:
    while True:
        msg = c.poll(0.1)
        
        if msg is None:
            continue
        elif not msg.error():
            pass
            # Display the received tweet
            #tweet_json = json.loads(msg.value())
            #print(json.dumps(tweet_json, indent=4, ensure_ascii=False, sort_keys=True))
        elif msg.error().code() == KafkaError._PARTITION_EOF:
            print("End of partition reached {}/{}".format(msg.topic(), msg.partition()))
        else:
            print("Error occured: {}".format(msg.error().str()))
except KeyBoardInterrupt:
    pass
finally:
    c.close()

Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9

Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 56
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 296
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 297
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 47
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 12
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 11
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 275
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 159
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 12
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 12
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 12
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 26
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 18
Broker ubuntu-apache-

Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9

Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9

Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 50
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:9092/0 Avg. Output Buffer Latency: 0
Broker ubuntu-apache-kafka-master-vm:

NameError: name 'KeyBoardInterrupt' is not defined