## Writing Streaming Data to Files

We have already read the data from the socket and verified, using `writeStream.format('console')`, that it is being processed. Now let us see how the streaming data can be written to files.

Here are the steps to write the data to files:
1. Ensure the generated logs are being redirected to a Netcat listener on `localhost` port 9000.
2. Read the data using `spark.readStream` with `format('socket')`.
3. Use `writeStream.format` with the options appropriate to the file format. We will use `writeStream.format('csv')`, so we need to specify both a checkpoint location (`checkpointLocation`) and a target location (`path`).
```
socketDF \
    .writeStream \
    .format("csv") \
    .option("checkpointLocation", "/FileStore/retail_logs/gen_logs/checkpoint") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .start()
```
4. Validate both the checkpoint location and the data location to which the files are being written.

In [0]:
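```
# Read lines of text from the Netcat listener on localhost:9000;
# each line becomes one row in a single string column named `value`
socketDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9000) \
    .load()
```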
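
To make explicit what the socket source consumes, here is a stdlib-only sketch, not Spark code. The helper `serve_lines` is a hypothetical stand-in for the Netcat listener, and `read_lines` reads newline-delimited text and returns one string per line, which mirrors how the socket source turns each line into one `value` row. Both helper names are our own.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Hypothetical stand-in for a Netcat listener: accept one client,
    send each line terminated by a newline, then close the connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]  # port 0 asks the OS for a free port

    def _run():
        try:
            conn, _ = server.accept()
            with conn:
                for line in lines:
                    conn.sendall((line + "\n").encode("utf-8"))
        finally:
            server.close()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return bound_port, thread


def read_lines(port, host="127.0.0.1"):
    """Read newline-delimited text until the sender closes the connection,
    returning one string per line (one row per line, as in the socket source)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]
```

Unlike this sketch, which reads until the sender closes the connection, Spark's socket source keeps the connection open and, roughly speaking, turns the lines that arrived since the previous trigger into the next micro-batch.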

In [0]:
```
%fs rm -r dbfs:/FileStore/retail_logs/gen_logs
```

In [0]:
```
%fs ls dbfs:/FileStore/retail_logs/
```

| path | name | size (bytes) |
|---|---|---|
| `dbfs:/FileStore/retail_logs/checkpoint/` | `checkpoint/` | 0 |
| `dbfs:/FileStore/retail_logs/visitor_traffic/` | `visitor_traffic/` | 0 |


In [0]:
```
%fs mkdirs dbfs:/FileStore/retail_logs/gen_logs
```

In [0]:
```
%fs mkdirs dbfs:/FileStore/retail_logs/gen_logs/data
```

In [0]:
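```
# This query omits checkpointLocation. File sinks require one (set either
# with this option or via spark.sql.streaming.checkpointLocation), so
# start() is expected to fail here.
socketDF \
    .writeStream \
    .format("csv") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .start()
```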

In [0]:
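```
# Write each micro-batch as CSV part files under `path`, record progress
# in `checkpointLocation`, and trigger a micro-batch every 5 seconds
socketDF \
    .writeStream \
    .format("csv") \
    .option("checkpointLocation", "/FileStore/retail_logs/gen_logs/checkpoint") \
    .option("path", "/FileStore/retail_logs/gen_logs/data") \
    .trigger(processingTime='5 seconds') \
    .start()
```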
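
The Spark documentation describes the processing-time trigger as follows: if a micro-batch finishes within the interval, the engine waits until the interval is over before starting the next one; if it overruns, the next micro-batch starts as soon as the previous one completes. The sketch below is a simplified model of that schedule, not Spark's implementation. It assumes batches are aligned to wall-clock multiples of the interval, and the function names are hypothetical.

```python
def next_batch_time(now_ms: int, interval_ms: int) -> int:
    """Next trigger time: the first multiple of interval_ms strictly after now_ms."""
    if interval_ms == 0:
        return now_ms  # a zero interval means "run again immediately"
    return now_ms // interval_ms * interval_ms + interval_ms


def simulate_batch_starts(first_start_ms, interval_ms, durations_ms):
    """Start time of each micro-batch, given how long each batch runs.

    A batch that overruns its slot delays the next one until it finishes;
    otherwise the next batch waits for the next interval boundary."""
    starts = []
    start = first_start_ms
    for duration in durations_ms:
        starts.append(start)
        finished = start + duration
        start = max(finished, next_batch_time(start, interval_ms))
    return starts


# With a 5-second trigger: the second batch overruns (7 s), so the third
# starts as soon as it finishes instead of waiting for the 10 s boundary.
print(simulate_batch_starts(0, 5000, [1000, 7000, 500]))
```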

In [0]:
```
%fs ls /FileStore/retail_logs/gen_logs/data
```

| path | name | size (bytes) |
|---|---|---|
| `dbfs:/FileStore/retail_logs/gen_logs/data/_spark_metadata/` | `_spark_metadata/` | 0 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-1766ac92-de59-4329-9ff5-8b020ea780d7-c000.csv` | `part-00000-1766ac92-de59-4329-9ff5-8b020ea780d7-c000.csv` | 0 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-1bce272e-20b7-4c57-89b3-0fa33b07169e-c000.csv` | `part-00000-1bce272e-20b7-4c57-89b3-0fa33b07169e-c000.csv` | 422 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-2134960e-9053-472e-928b-3384d0eadbc1-c000.csv` | `part-00000-2134960e-9053-472e-928b-3384d0eadbc1-c000.csv` | 390 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-322aa87a-a33b-4727-90e8-eca834913b5c-c000.csv` | `part-00000-322aa87a-a33b-4727-90e8-eca834913b5c-c000.csv` | 418 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-380f9d1f-3d70-47bc-a018-661c4225412d-c000.csv` | `part-00000-380f9d1f-3d70-47bc-a018-661c4225412d-c000.csv` | 382 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-3de4181b-548c-487e-8018-6a4120ae7d36-c000.csv` | `part-00000-3de4181b-548c-487e-8018-6a4120ae7d36-c000.csv` | 439 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-48dbe17c-a925-48fe-9f4e-b26b021254be-c000.csv` | `part-00000-48dbe17c-a925-48fe-9f4e-b26b021254be-c000.csv` | 388 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-60429dab-ec58-43c8-a2f9-0b18a8c9aa61-c000.csv` | `part-00000-60429dab-ec58-43c8-a2f9-0b18a8c9aa61-c000.csv` | 359 |
| `dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-65469286-fe4d-4836-b4fd-12a09e6003db-c000.csv` | `part-00000-65469286-fe4d-4836-b4fd-12a09e6003db-c000.csv` | 392 |
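
The data location can also be validated with plain Python, assuming the directory is reachable as a local path (for example through a `/dbfs` mount such as `/dbfs/FileStore/retail_logs/gen_logs/data`; that mount is an assumption about your cluster). The helpers `list_part_files` and `read_part_lines` are hypothetical. The sketch skips `_spark_metadata`, the sink's own log of committed files. Spark consults that log when it reads the directory, whereas this sketch does not, so it may also pick up files from a batch that has not been committed yet.

```python
from pathlib import Path


def list_part_files(data_dir):
    """Return (name, size_in_bytes) for every CSV part file in data_dir,
    skipping the _spark_metadata directory and anything else that is not a part file."""
    parts = []
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_file() and entry.name.startswith("part-") and entry.suffix == ".csv":
            parts.append((entry.name, entry.stat().st_size))
    return parts


def read_part_lines(data_dir):
    """Concatenate the lines of all non-empty part files, ordered by file name
    (not by arrival time). Lines come back exactly as the CSV writer produced
    them, including any quoting."""
    lines = []
    for name, size in list_part_files(data_dir):
        if size == 0:
            continue  # e.g. a micro-batch that wrote no rows
        with open(Path(data_dir) / name, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines
```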


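The checkpoint directory holds the query's `metadata` file along with the `offsets/` and `commits/` logs, which Spark uses to track which micro-batches have been planned and which have completed.

In [0]:
```
%fs ls /FileStore/retail_logs/gen_logs/checkpoint
```

| path | name | size (bytes) |
|---|---|---|
| `dbfs:/FileStore/retail_logs/gen_logs/checkpoint/commits/` | `commits/` | 0 |
| `dbfs:/FileStore/retail_logs/gen_logs/checkpoint/metadata` | `metadata` | 45 |
| `dbfs:/FileStore/retail_logs/gen_logs/checkpoint/offsets/` | `offsets/` | 0 |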
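
To go one step further than listing the checkpoint, the sketch below reports the latest batch id in each log. It assumes that `offsets/` and `commits/` contain one file per micro-batch, named by its numeric batch id, and that the checkpoint is reachable as a local path. The helper names are hypothetical.

```python
from pathlib import Path


def batch_ids(log_dir):
    """Sorted batch ids in a checkpoint log directory (offsets/ or commits/).

    Assumes one file per batch named by its numeric id; hidden files such as
    checksum files are ignored because their names are not purely digits."""
    path = Path(log_dir)
    if not path.is_dir():
        return []  # the log may not exist yet before the first batch
    return sorted(int(p.name) for p in path.iterdir() if p.is_file() and p.name.isdigit())


def checkpoint_progress(checkpoint_dir):
    """Latest batch id planned (offsets/) and completed (commits/), or None if absent."""
    root = Path(checkpoint_dir)
    offsets = batch_ids(root / "offsets")
    commits = batch_ids(root / "commits")
    return {
        "last_planned": offsets[-1] if offsets else None,
        "last_committed": commits[-1] if commits else None,
    }
```

If `last_planned` is ahead of `last_committed`, a batch has been started but not yet committed. When the query restarts from this checkpoint, Spark should re-run that batch from the recorded offsets.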


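Finally, preview the contents of one of the part files. Part file names are generated on every run, so substitute a name from your own listing.

In [0]:
```
%fs head dbfs:/FileStore/retail_logs/gen_logs/data/part-00000-8895135d-18ca-4f84-a2d2-c1db77cc754e-c000.csv
```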
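
`%fs head` shows only the beginning of a file. A local-file equivalent might look like the sketch below, assuming the file is reachable through a local path such as the `/dbfs` mount. The helper name `head` and its 65536-byte default limit are assumptions chosen for illustration.

```python
def head(path, max_bytes=65536):
    """Return up to max_bytes from the start of a file, decoded as UTF-8.

    errors="replace" keeps a multi-byte character cut at the byte limit
    from raising; it shows up as U+FFFD instead."""
    with open(path, "rb") as f:
        data = f.read(max_bytes)
    return data.decode("utf-8", errors="replace")
```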