### Preparation

Step 1: Install dependencies, if not already installed

In [None]:
!pip install sqlalchemy ipython-sql kafka-python
%conda install -c cyclus java-jre

### Execution

Step 2: Download Kafka

In [None]:
!wget -P /tmp https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz

Step 3: Extract Kafka

In [None]:
!tar -xzf /tmp/kafka_2.12-2.8.0.tgz -C /tmp

Step 7: Start Zookeeper

* Open a new terminal and run the following command:
* `/tmp/kafka_2.12-2.8.0/bin/zookeeper-server-start.sh /tmp/kafka_2.12-2.8.0/config/zookeeper.properties`

Step 8: Start Kafka server

* Open a new terminal and run the following command:
* `/tmp/kafka_2.12-2.8.0/bin/kafka-server-start.sh /tmp/kafka_2.12-2.8.0/config/server.properties`
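
Before creating the topic, it can help to confirm that both services are actually listening. The following is a minimal sketch, not part of the original lab: `port_open` is a hypothetical helper, and the ports are assumed to be the defaults from the stock config files (2181 for ZooKeeper, 9092 for Kafka).

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening"
        return False


# Assumed default ports: 2181 for ZooKeeper, 9092 for Kafka
for name, port in (("ZooKeeper", 2181), ("Kafka", 9092)):
    status = "listening" if port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```

An open port only shows that a process is accepting connections; it does not prove the broker has finished starting, so a short wait before step 9 may still be needed.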

Step 9: Create a topic named `toll` in Kafka

In [None]:
!/tmp/kafka_2.12-2.8.0/bin/kafka-topics.sh --create --topic toll --bootstrap-server localhost:9092

Step 10: Download the toll traffic simulator program <<Thanks, IBM!>>

In [None]:
!wget -P /tmp https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0250EN-SkillsNetwork/labs/Final%20Assignment/toll_traffic_generator.py

Step 11: Customize the generator program, changing the topic to "toll"

* Open a new terminal and type `nano /tmp/toll_traffic_generator.py`
* Change the TOPIC variable to "toll"
* Press CTRL+O to save
* Press ENTER/RETURN to confirm
* Press CTRL+X to exit nano
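
If you would rather not edit the file by hand, the same change can be made from Python. This is a sketch under one assumption: that the generator sets the topic with a top-level line of the form `TOPIC = ...`. The helpers `set_topic` and `patch_file` are hypothetical names, and `sample` is a stand-in for the real file contents.

```python
import re
from pathlib import Path


def set_topic(source: str, topic: str) -> str:
    """Replace the first top-level `TOPIC = ...` assignment in source."""
    pattern = re.compile(r"^TOPIC\s*=.*$", flags=re.MULTILINE)
    # A function replacement avoids any backslash interpretation in the topic
    new_source, count = pattern.subn(lambda _m: f'TOPIC = "{topic}"', source, count=1)
    if count == 0:
        raise ValueError("no TOPIC assignment found")
    return new_source


def patch_file(path: str, topic: str) -> None:
    """Rewrite the file at path in place with the new topic."""
    target = Path(path)
    target.write_text(set_topic(target.read_text(), topic))


# Hypothetical stand-in for the generator's contents
sample = 'import time\nTOPIC = "set your topic here"\nprint(TOPIC)\n'
print(set_topic(sample, "toll"))
```

To apply it to the downloaded script, call `patch_file("/tmp/toll_traffic_generator.py", "toll")`. The function raises `ValueError` when no matching line exists, which signals that the assumption about the file layout does not hold.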

Step 12: Run the Toll Traffic Simulator

* Open a new terminal and type: `python3 /tmp/toll_traffic_generator.py`
* If `python3` does not point to the environment where the dependencies from step 1 are installed, call that interpreter directly. In my case: `~/miniconda3/envs/mytest/bin/python /tmp/toll_traffic_generator.py`

Step 13: Download the streaming data reader (consumer) <<Thanks, IBM!>>

* This download is optional, because step 14 writes a customized version of the script from scratch. To skip it, go straight to step 14.
* Source: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0250EN-SkillsNetwork/labs/Final%20Assignment/streaming_data_reader.py

Step 14: Customize the consumer program to write into a SQLite database table

In [None]:
%%bash
echo -e 'from datetime import datetime
from kafka import KafkaConsumer
import sqlite3

TOPIC="toll"

print("Connecting to the database")
try:
    connection = sqlite3.connect("file:/tmp/kafka-project.db", uri=True)
except sqlite3.Error as err:
    # Without a connection nothing below can work, so stop here
    raise SystemExit(f"Could not connect to database: {err}")
print("Connected to database")
cursor = connection.cursor()

# Create the table only if it is not there yet, so reruns are safe
print("Creating table if needed")
cursor.execute("""create table if not exists livetolldata(
    timestamp datetime,
    vehicle_id int,
    vehicle_type char(15),
    toll_plaza_id smallint
    );""")
print("Table ready")

print("Connecting to Kafka")
consumer = KafkaConsumer(TOPIC, bootstrap_servers="localhost:9092")
print("Connected to Kafka")
print(f"Reading messages from the topic {TOPIC}")
for msg in consumer:

    # Extract information from kafka
    message = msg.value.decode("utf-8")

    # Transform the date format to suit the database schema
    (timestamp, vehicle_id, vehicle_type, plaza_id) = message.split(",")

    dateobj = datetime.strptime(timestamp, "%a %b %d %H:%M:%S %Y")
    timestamp = dateobj.strftime("%Y-%m-%d %H:%M:%S")

    # Load the row into the database table
    sql = "insert into livetolldata values(?,?,?,?)"
    cursor.execute(sql, (timestamp, vehicle_id, vehicle_type, plaza_id))
    print(f"{timestamp}: A {vehicle_type} was inserted into the database")
    connection.commit()
connection.close()' > /tmp/streaming_data_reader.py
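
The transformation inside the consumer loop can be tried on its own, without Kafka running. The sketch below isolates it in a hypothetical helper, `parse_toll_message`. The sample record is invented for illustration; it only assumes the four comma-separated fields and the timestamp format that the consumer script expects.

```python
from datetime import datetime


def parse_toll_message(message: str) -> tuple:
    """Split one comma-separated toll record and normalize its timestamp."""
    timestamp, vehicle_id, vehicle_type, plaza_id = message.split(",")
    parsed = datetime.strptime(timestamp, "%a %b %d %H:%M:%S %Y")
    return (parsed.strftime("%Y-%m-%d %H:%M:%S"), vehicle_id, vehicle_type, plaza_id)


# Hypothetical sample record, shaped like the fields the consumer unpacks
sample = "Mon Aug 02 14:05:09 2021,1234567,car,4001"
print(parse_toll_message(sample))
# → ('2021-08-02 14:05:09', '1234567', 'car', '4001')
```

All four values stay strings here. When they are inserted, SQLite's column affinity stores the numeric ones as integers, which is why the consumer does not convert them explicitly.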

Step 15: Run the consumer script

* Open a new terminal and type: `python3 /tmp/streaming_data_reader.py`
* If `python3` does not point to the environment where the dependencies from step 1 are installed, call that interpreter directly. In my case: `~/miniconda3/envs/mytest/bin/python /tmp/streaming_data_reader.py`

Step 16: Verify that streamed data is being collected in the database table

In [None]:
%load_ext sql

In [None]:
%sql sqlite:////tmp/kafka-project.db

In [None]:
%sql SELECT * FROM livetolldata order by timestamp desc LIMIT 10
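
The same check also works outside the notebook, using only the standard `sqlite3` module. This is a minimal sketch: `latest_rows` is a hypothetical helper, and the demo runs against a throwaway database with invented rows so that it does not depend on the pipeline being up.

```python
import os
import sqlite3
import tempfile


def latest_rows(db_path: str, limit: int = 10) -> list:
    """Return the most recent rows of livetolldata, newest first."""
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            "SELECT * FROM livetolldata ORDER BY timestamp DESC LIMIT ?", (limit,)
        )
        return cursor.fetchall()
    finally:
        connection.close()


# Throwaway database with hypothetical rows, using the table layout from step 14
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
demo = sqlite3.connect(demo_path)
demo.execute(
    "create table livetolldata(timestamp datetime, vehicle_id int, "
    "vehicle_type char(15), toll_plaza_id smallint)"
)
demo.executemany(
    "insert into livetolldata values(?,?,?,?)",
    [
        ("2021-08-02 14:05:09", 1234567, "car", 4001),
        ("2021-08-02 14:05:11", 7654321, "truck", 4002),
    ],
)
demo.commit()
demo.close()

for row in latest_rows(demo_path):
    print(row)
```

Against the real data, call `latest_rows("/tmp/kafka-project.db")`. Running it twice a few seconds apart should return newer timestamps the second time if the consumer is still writing.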