# Financial Data Stream Producer

This notebook provides the basic functionality needed to keep a stream running so that the Spark Streaming engine can process it. The stream is produced from a file of trading data from the New York Stock Exchange for the years 2010-2011. To simulate streaming data, the producer reads this file in a loop and appends a timestamp to each record.
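
The replay idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the producer itself: the function name `replay_with_timestamp` and the sample records are hypothetical placeholders, and the sketch stops after a fixed number of records instead of looping forever.

```python
import itertools
from datetime import datetime


def replay_with_timestamp(lines, limit):
    """Yield `limit` records, cycling over `lines` and appending the current time."""
    # cycle() restarts from the first line once the input is exhausted
    source = itertools.cycle(lines)
    for line in itertools.islice(source, limit):
        yield line.rstrip() + "," + str(datetime.now())


# Hypothetical placeholder records standing in for the rows of the input file
sample = ["row-a,1\n", "row-b,2\n"]
records = list(replay_with_timestamp(sample, 5))
for record in records:
    print(record)
```

With two input lines and a limit of five, the first line is emitted three times and the second twice, each with its own timestamp.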

The **only change** you need to make to run this notebook is to set the path to the input file after you upload it to Databricks. The producer then serves the financial streaming data on a local socket on port 9998 (see Cmd 2 below).

The simplest way to start the producer is to click `Run All` at the top. <br>
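
To check that the producer is actually serving data on the socket, a small client can connect to the port and read a few records. The sketch below is an assumption-laden helper, not part of the notebook: the name `read_lines` is hypothetical, and it assumes that each record is terminated by a newline, as in the producer code.

```python
import itertools
import socket


def read_lines(host, port, limit):
    """Connect to the producer and return the first `limit` newline-terminated records."""
    with socket.create_connection((host, port), timeout=5) as conn:
        # makefile() wraps the socket so that records can be read line by line
        with conn.makefile("r", encoding="utf-8") as stream:
            # islice() stops after `limit` records without waiting for more data
            return [line.rstrip("\n") for line in itertools.islice(stream, limit)]


# Usage (only works while the producer cell is running):
# print(read_lines("localhost", 9998, 3))
```

Because the producer accepts one connection at a time, run this check from a separate notebook or process while the producer cell is running, and stop it before attaching the Spark consumer.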

**NOTE:** When you replace the inputFile below with your own uploaded file, don't forget to **copy the uploaded input file from "dbfs:" to "file:"** (as in the example below). The file is opened with native Python, which needs the full local path. Databricks performs this conversion automatically only when files are opened through the Spark API.
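
As a rough sketch of the copy step described in the note, `dbutils.fs.cp` can copy a file from DBFS to the driver's local file system. Both paths below are hypothetical placeholders; replace them with the location of your own upload. The `dbutils` object is predefined only inside Databricks notebooks, so the sketch skips the copy elsewhere.

```python
# Hypothetical paths: replace them with the location of your own upload
dbfs_path = "dbfs:/FileStore/tables/my_input.csv"
local_path = "file:/tmp/my_input.csv"

try:
    # dbutils is predefined in Databricks notebooks; it cannot be imported elsewhere
    dbutils.fs.cp(dbfs_path, local_path)
except NameError:
    print("dbutils is not available outside Databricks; skipping the copy")

# Native Python then opens the copy through its plain local path
native_path = local_path[len("file:"):]
print(native_path)
```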

In [0]:
###########################################################################

###### Financial Data Stream Producer morphed into Meteo Data Stream #####

###########################################################################


In [0]:
import socket
import time
import pandas as pd
from datetime import datetime
from pyspark import SparkFiles

# Download the CSV file from the external website

url1 = "https://www.web.statistik.zh.ch/awel/LoRa/data/AWEL_Sensors_LoRa_202208.csv"
spark.sparkContext.addFile(url1)
inputFile = spark.read.csv("file://"+SparkFiles.get("AWEL_Sensors_LoRa_202208.csv"), header=True, inferSchema=True, sep=';')
 
# Enable Apache Arrow to speed up the conversion into a pandas DataFrame
# (Spark 3.x names this key spark.sql.execution.arrow.pyspark.enabled)

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

def send():
  
  ifile = inputFile.toPandas()
  i = 0

  while True:
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    client_socket.bind(('localhost', 9998))
    client_socket.listen(10)
    conn, addr = client_socket.accept()
    try:  
      while True:
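        for row in ifile.itertuples(index=False):
          # itertuples() yields one tuple per row; join its fields into a CSV line
          line = ",".join(str(value) for value in row)
          message = line + "," + str(datetime.now()) + "\n"
          print("sending: " + message.rstrip())
          # sendall() keeps writing until the whole message has been sent
          conn.sendall(message.encode())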
          # send data every 100 ms
          #TODO: change the input rate here (or remove) to see effects in the consumer on total number of items processed per window
          time.sleep(0.1)

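    except Exception as e:
        # Typically raised when the consumer disconnects; log it and wait for a new connection
        print(str(e))
    finally:
        # Always release both sockets before the outer loop binds the port again
        conn.close()
        client_socket.close()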
send()



