# Generating the serving layer

In this exercise, we'll use Spark Structured Streaming to generate the serving layer. Specifically, we'll write each event to a single measurement (InfluxDB's equivalent of a table). We'll use [the `influxdb` Python library](https://influxdb-python.readthedocs.io/en/latest/api-documentation.html) because there is no dedicated InfluxDB Spark connector.

In [None]:
%%bash
# Ensure the required Python 3 dependencies are installed.
python3 -m pip install kafka-python influxdb

First, create the `process` function that writes a single row to InfluxDB. Add `visitor_browser` and `visitor_country` as tags, `ts_ingest` as the time, and the remaining values as fields.

In [None]:
from influxdb import InfluxDBClient
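
One possible shape for `process` is sketched below. It is a starting point under stated assumptions, not the reference solution: the database name `data` and the measurement name `clicks` are taken from the query example in this notebook, while the host `localhost`, the helper name `build_point`, and the sample `url` field are assumptions made for illustration. It also assumes the `data` database already exists.

```python
from datetime import datetime


def build_point(event):
    """Convert one event (a dict of column name -> value) into an InfluxDB point."""
    tag_keys = ("visitor_browser", "visitor_country")
    return {
        "measurement": "clicks",
        # Tags are indexed strings; skip missing values.
        "tags": {k: str(event[k]) for k in tag_keys if event.get(k) is not None},
        "time": event["ts_ingest"],
        # Every remaining non-null column becomes a field.
        "fields": {
            k: v
            for k, v in event.items()
            if k not in tag_keys and k != "ts_ingest" and v is not None
        },
    }


def process(row):
    """Write a single Spark Row to InfluxDB."""
    # Imported inside the function so the import happens wherever Spark runs it.
    from influxdb import InfluxDBClient

    # Assumption: InfluxDB is reachable on localhost:8086; adjust the host to your setup.
    client = InfluxDBClient(host="localhost", port=8086, database="data")
    try:
        client.write_points([build_point(row.asDict())])
    finally:
        client.close()


# Hypothetical event, only to show the resulting point structure.
sample = {
    "visitor_browser": "Firefox",
    "visitor_country": "BE",
    "ts_ingest": datetime(2019, 1, 1, 12, 0, 0),
    "url": "/home",
}
print(build_point(sample))
```

Opening a client per row keeps the sketch simple; for higher throughput, a writer object with `open` and `close` methods, as described in the Spark `foreach` documentation, lets you reuse one connection per partition.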


To query the database, we can send an HTTP request to InfluxDB. More information can be found in the InfluxDB [documentation](https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/). Run the curl command from a local terminal, since the notebook image does not have curl installed.

```text
curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=data" --data-urlencode "q=SELECT * FROM \"clicks\""
```
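
If you would rather stay inside the notebook, the same request can be made with the Python standard library. This is a minimal sketch that mirrors the curl command above; the helper names `build_query_url` and `run_query` are made up for this example, and `localhost:8086` is only correct if InfluxDB is reachable under that name from where the code runs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(query, db="data", base="http://localhost:8086"):
    """Build the /query URL with the same parameters as the curl command."""
    params = urlencode({"pretty": "true", "db": db, "q": query})
    return f"{base}/query?{params}"


def run_query(query, db="data", base="http://localhost:8086"):
    """Send the query to InfluxDB and return the decoded JSON response."""
    with urlopen(build_query_url(query, db, base)) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_query_url('SELECT * FROM "clicks"'))
```

Calling `run_query('SELECT * FROM "clicks"')` should return the same JSON document that curl prints, provided the database is reachable.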

Create a Spark context and specify that the Spark-Kafka integration package (`spark-sql-kafka-0-10`) must be added to the session.

In [None]:
import os

# Must be set before the SparkContext is created so that the Kafka source
# for Structured Streaming is downloaded and added to the classpath.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 pyspark-shell'

import pyspark
from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *

# Create the context, reduce log noise, and wrap it in a SparkSession.
sc = SparkContext()
sc.setLogLevel("WARN")
spark = SparkSession(sc)

Create a streaming DataFrame that represents the events received from the Kafka topic `clicks-cleaned`.

Parse the JSON payload into columns of the DataFrame. Make sure to use `TimestampType` for `ts_ingest`, since we already converted it in the `clean` notebook.

Use the Spark `foreach` sink to call the `process` function for each row, then start the query. See the [`foreach` documentation](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#foreach) for details.
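
The three steps above could be wired together roughly as follows. Treat this as a sketch under assumptions rather than the reference solution: the function name `start_serving_query` and the Kafka address `localhost:9092` are placeholders, and the schema lists only the three columns named in this notebook, so add the remaining fields of your cleaned events.

```python
def start_serving_query(spark, process, bootstrap_servers="localhost:9092"):
    """Read `clicks-cleaned` from Kafka, parse the JSON, and call `process` per row."""
    # Imported inside the function so the sketch can be defined without a running Spark.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    # Only the columns mentioned in this exercise; extend with your other fields.
    schema = StructType([
        StructField("visitor_browser", StringType()),
        StructField("visitor_country", StringType()),
        StructField("ts_ingest", TimestampType()),
    ])

    # Streaming DataFrame over the Kafka topic; `value` arrives as bytes.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", "clicks-cleaned")
        .load()
    )

    # Decode the payload and expand the parsed struct into top-level columns.
    events = raw.select(
        from_json(col("value").cast("string"), schema).alias("event")
    ).select("event.*")

    # Call `process` once per row and start the streaming query.
    return events.writeStream.foreach(process).start()
```

With the `spark` session and the `process` function from this exercise, `query = start_serving_query(spark, process)` would produce the `query` object that the final cell waits on.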

In [None]:
query.awaitTermination()