## Read streaming data from a Kafka Topic
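The notebook creates a Spark session that runs in Spark's local mode inside the container: driver and executors share a single JVM instead of connecting to a cluster.
The master URL still has to be specified. Setting `master` to `local[*]` runs Spark locally with one worker thread per logical CPU core.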
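To make the `local[*]` master URL concrete, the plain-Python sketch below mirrors how Spark documents the local-mode master URLs. `local` uses one worker thread, `local[N]` uses N, `local[*]` uses one per logical core, and `local[N,F]` additionally allows F task failures. `local_thread_count` is a hypothetical helper for illustration only; it is not part of PySpark.

```python
import os
import re

# Hypothetical helper (not part of PySpark): interpret a local-mode master URL
# the way Spark's documentation describes it.
_LOCAL_MASTER = re.compile(r"local\[(\*|\d+)(?:\s*,\s*\d+)?\]")

def local_thread_count(master: str) -> int:
    """Return the number of worker threads a local-mode master URL asks for.

    local      -> 1 thread
    local[N]   -> N threads
    local[*]   -> one thread per logical CPU core
    local[N,F] -> N threads (F, the allowed task failures, is ignored here)
    """
    if master == "local":
        return 1
    match = _LOCAL_MASTER.fullmatch(master)
    if match is None:
        raise ValueError(f"not a local-mode master URL: {master!r}")
    if match.group(1) == "*":
        # os.cpu_count() can return None on unusual platforms; fall back to 1
        return os.cpu_count() or 1
    return int(match.group(1))
```

In the container, `local_thread_count("local[*]")` would return the number of cores visible to the Python process.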

The messages on the Kafka topic are plain text, so this example defines no schema; the message payload is simply cast to a string.
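Spark's Kafka source always produces the same fixed columns: `key` and `value` as binary, plus `topic`, `partition`, `offset`, `timestamp` and `timestampType`. Because no schema is applied to the payload, the only step needed is turning the `value` bytes into text. The plain-Python sketch below mimics that step for illustration. `KafkaRecord` and `value_as_string` are hypothetical names, and the sketch assumes the producer writes UTF-8 text.

```python
from typing import NamedTuple, Optional

class KafkaRecord(NamedTuple):
    """Hypothetical stand-in for one row of Spark's Kafka source (timestamp columns omitted)."""
    key: Optional[bytes]    # binary in Spark; may be null
    value: Optional[bytes]  # binary in Spark; the message payload
    topic: str
    partition: int
    offset: int

def value_as_string(record: KafkaRecord) -> Optional[str]:
    """Rough analogue of CAST(value AS STRING): decode the payload bytes as UTF-8."""
    if record.value is None:
        return None  # a null value stays null after the cast
    # errors="replace" is a choice made here to tolerate non-UTF-8 bytes
    return record.value.decode("utf-8", errors="replace")
```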

To read from the Kafka stream, we use `spark.readStream`, specifying the Kafka bootstrap server and the topic to subscribe to:

In [None]:
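from IPython.display import clear_output, display
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session in local mode with one worker thread per CPU core.
# format("kafka") below needs the spark-sql-kafka-0-10 connector on the classpath;
# this assumes the container image already provides it.
spark = SparkSession.builder \
    .appName("KafkaStream") \
    .master("local[*]") \
    .getOrCreate()

def handle_batch(batch_df, epoch_id):
    """Called once per micro-batch with that batch's rows and its id."""
    # Kafka delivers `value` as binary: cast it to a string, then convert to pandas for display
    pdf = batch_df.selectExpr("CAST(value AS STRING)").toPandas()
    # Uncomment to replace the previous batch's output instead of appending below it
    #clear_output(wait=True)
    display(pdf)

# Subscribe to `test-topic`, starting from the earliest offset still available
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka:9092") \
    .option("subscribe", "test-topic") \
    .option("startingOffsets", "earliest") \
    .load()

# Call handle_batch for every micro-batch; start() returns immediately
# and the query keeps running in the background
query = df.writeStream \
    .outputMode("append") \
    .foreachBatch(handle_batch) \
    .start()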
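The cell above sets `startingOffsets` to `"earliest"`, so a new query first replays everything still retained in the topic. Besides `"earliest"` and `"latest"`, the option also accepts a JSON string with a starting offset per topic partition, where `-2` stands for earliest and `-1` for latest. The sketch below builds such a string; `starting_offsets_json` is a hypothetical helper, and the partition numbers and offsets in the example are made-up values.

```python
import json
from typing import Dict

EARLIEST = -2  # in the JSON form of startingOffsets, -2 means "earliest"
LATEST = -1    # ... and -1 means "latest"

def starting_offsets_json(offsets: Dict[str, Dict[int, int]]) -> str:
    """Hypothetical helper: build a per-partition startingOffsets value.

    `offsets` maps topic -> {partition number: offset}. JSON object keys must be
    strings, so partition numbers are converted with str().
    """
    return json.dumps(
        {topic: {str(partition): offset for partition, offset in parts.items()}
         for topic, parts in offsets.items()}
    )

# Example: start partition 0 at offset 23 and partition 1 at the earliest offset
print(starting_offsets_json({"test-topic": {0: 23, 1: EARLIEST}}))
# → {"test-topic": {"0": 23, "1": -2}}
```

The resulting string would be passed via `.option("startingOffsets", ...)` in place of `"earliest"`. For streaming queries this setting only applies when a new query starts; a query resumed from a checkpoint continues where it left off.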


## Output
New messages "arrive" as soon as a Kafka producer sends them to `test-topic` (see the console example); each new micro-batch is passed to `handle_batch` and displayed.
To stop stream processing, call `query.stop()`:

In [None]:
print("Is active:", query.isActive)
query.stop()
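Instead of stopping the query by hand, a cell can also let it run for a fixed time. PySpark's `StreamingQuery.awaitTermination(timeout)` blocks for up to `timeout` seconds and returns whether the query terminated within that time (re-raising the error if the query failed). The sketch below builds on that; `stop_after` is a hypothetical name, and the 30-second duration in the usage comment is an arbitrary example.

```python
def stop_after(query, seconds: float) -> bool:
    """Let a streaming query run for up to `seconds`, then stop it if still active.

    `query` is assumed to be a pyspark.sql.streaming.StreamingQuery (any object with
    awaitTermination(timeout) and stop() works). Returns True if this helper stopped
    the query, False if it had already terminated on its own.
    """
    if query.awaitTermination(seconds):
        return False  # terminated within the timeout (a failed query raises instead)
    query.stop()
    return True

# Usage in the notebook (arbitrary duration): stop_after(query, 30)
```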

## Cleanup
Finally, we also stop the Spark session to release its resources and leave a clean state.

In [None]:
spark.stop()