## Reading Web Server logs using Spark Structured Streaming

Now that we can simulate log message generation, let us read those logs using Spark Structured Streaming.
* `spark`, which is of type `SparkSession`, has an attribute called `readStream`. It is of type `pyspark.sql.streaming.DataStreamReader`.
* It exposes APIs such as `csv` and `json`, along with the generic `format`. To read the web server log messages published on a network port, we can use `socket` as the format.
* We need to set the options `host` and `port`, then invoke `load` to read the data in streaming fashion.
* `load` returns an object of type `pyspark.sql.dataframe.DataFrame`.

In [0]:
spark.readStream

In [0]:
socketDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9000) \
    .load()
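
The socket source connects to `host:port` as a client and reads newline-delimited UTF-8 text, so something has to be listening on that port before a query on `socketDF` is started. If the log simulator is not running, a stand-in along the following lines could serve a few lines for testing. This is only a sketch: `serve_lines` is a hypothetical helper and the sample lines are placeholders, neither is part of Spark.

```python
import socket
import threading
import time


def serve_lines(lines, host="127.0.0.1", port=0, delay=0.0):
    """Send each line, newline-terminated, to the first client that connects.

    Hypothetical helper for local testing only. With port=0 the OS picks a
    free port; the port actually bound is returned with the worker thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def _run():
        # Block until one client (for example the Spark socket source) connects.
        conn, _ = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
                time.sleep(delay)
        server.close()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return bound_port, thread
```

For example, `serve_lines(["placeholder log line 1", "placeholder log line 2"], port=9000, delay=1.0)` would listen on the port used in the cell above and send one line per second once a client connects.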

In [0]:
socketDF.isStreaming

In [0]:
socketDF.printSchema()

In [0]:
socketDF.show()  # raises AnalysisException: queries with streaming sources must be executed with writeStream.start()

* If no trigger setting is explicitly specified, then by default, the query will be executed in micro-batch mode, where micro-batches will be generated as soon as the previous micro-batch has completed processing.

In [0]:
socketDF.writeStream

In [0]:
socketDF \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

* Run the code below and watch the output. With a processing-time trigger of 5 seconds, a new micro-batch is started every 5 seconds.

In [0]:
socketDF \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .trigger(processingTime='5 seconds') \
    .start()

# Triggers every 5 seconds
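
The two console queries above differ only in the trigger, and `start()` returns a `StreamingQuery` handle that is worth keeping so the query can be stopped later. A small wrapper could capture both points. This is a sketch under assumptions: `start_console_query` is a hypothetical helper name, not a Spark API, and it only chains the `writeStream` calls already used in this section.

```python
def start_console_query(df, processing_time=None):
    """Start an append-mode console query on a streaming DataFrame.

    Hypothetical helper. If processing_time (for example "5 seconds") is
    given, a processing-time trigger is set; otherwise the default
    micro-batch behavior applies. Returns the StreamingQuery from start().
    """
    writer = df.writeStream.outputMode("append").format("console")
    if processing_time is not None:
        writer = writer.trigger(processingTime=processing_time)
    return writer.start()
```

A typical use would be `query = start_console_query(socketDF, "5 seconds")`, followed by `query.awaitTermination(30)` to let it run for up to 30 seconds and `query.stop()` to end it.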