# ***Exercise 62 - Spark Streaming***

Real-time identification of stocks with high price variation

Input:
- A stream of stock prices
    - Each input record has the format `Timestamp,StockID,Price`
    
Output:
 - Every 30 seconds, print to standard output the StockID and the price variation (%) over the last 30 seconds for each stock whose price variation in that window is greater than 0.5%
    - Given a stock, its price variation (%) during the last 30 seconds is:

                100 * (max(price) - min(price)) / max(price)
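
As a quick check of the formula, here is a minimal sketch in plain Python (no Spark needed). The function name `price_variation_pct` and the sample prices are assumptions made for illustration. The result is expressed as a percentage so that it can be compared directly with the 0.5% threshold.

```python
def price_variation_pct(prices):
    """Return the price variation of one stock, as a percentage."""
    highest = max(prices)
    lowest = min(prices)
    return 100.0 * (highest - lowest) / highest


# Hypothetical prices observed for one stock within a 30-second window
window_prices = [100.0, 100.4, 99.8, 100.6]

variation = price_variation_pct(window_prices)
print(round(variation, 3))   # roughly 0.795
print(variation > 0.5)       # this stock would be selected
```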

In [25]:
from pyspark.streaming import StreamingContext

In [26]:
# Create a Spark Streaming Context object with a 30-second batch interval
ssc = StreamingContext(sc, 30)

In [27]:
# Create a (Receiver) DStream that will connect to localhost:9999
linesDStream = ssc.socketTextStream("localhost", 9999)
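
`socketTextStream` connects as a client, so something must be listening on localhost:9999 and writing one record per line. The following is a minimal sketch of such a test feeder, to be run in a separate process. The helper names `format_record` and `serve_records`, the sample values, and the one-second delay are all assumptions, not part of the exercise.

```python
import socket
import time


def format_record(timestamp, stock_id, price):
    """Build one input line in the Timestamp,StockID,Price format."""
    return f"{timestamp},{stock_id},{price}\n"


def serve_records(records, host="localhost", port=9999, delay=1.0):
    """Wait for one client (the Spark receiver) and send it the records."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _ = server.accept()   # blocks until the receiver connects
        with conn:
            for record in records:
                conn.sendall(format_record(*record).encode("utf-8"))
                time.sleep(delay)


# Hypothetical sample data: (timestamp, stock id, price)
sample_records = [
    (1001, "STOCK_A", 100.0),
    (1002, "STOCK_B", 50.0),
    (1003, "STOCK_A", 101.0),
]
```

Calling `serve_records(sample_records)` from a separate Python process before starting the streaming context should make these lines arrive in one of the first batches.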

In [28]:
# Compute for each stockID the price variation (compute it for each batch).
# Select only the stocks with a price variation (%) greater than 0.5%

In [29]:
# Return one pair (stockId, (price, price)) for each input record.
# The price is duplicated so that the first copy can track the max and the second the min.

def extractStockIdPricePrice(line):
    fields = line.split(",")
    
    stockId = fields[1]
    price = fields[2]
    
    return (stockId, (float(price), float(price)) )


stockIdPriceDStream = linesDStream.map(extractStockIdPricePrice)

**`map` is a narrow transformation: it processes each record locally and does not send data over the network**

In [30]:
# Compute max and min for each stockId
stockIdMaxMinPriceDStream = stockIdPriceDStream\
.reduceByKey(lambda v1, v2: ( max(v1[0],v2[0]), min(v1[1],v2[1]) ) )

In [31]:
# Compute variation for each stock
stockIdVariationDStream = stockIdMaxMinPriceDStream\
.mapValues(lambda MaxMinValue: 100.0*(MaxMinValue[0]-MaxMinValue[1])/MaxMinValue[0] )

In [32]:
# Select only the stocks with variation greater than 0.5%
selectedStockIdsVariationsDStream = stockIdVariationDStream.filter(lambda pair: pair[1]>0.5)
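
The per-batch logic of the steps above (map, `reduceByKey`, `mapValues`, `filter`) can be checked without a cluster. The sketch below reproduces it in plain Python on a single hypothetical batch. The helper `reduce_by_key` is a local stand-in written for this illustration, not a Spark API, and the records are made-up sample data.

```python
def extract(line):
    """Mirror of extractStockIdPricePrice: (stockId, (price, price))."""
    fields = line.split(",")
    price = float(fields[2])
    return fields[1], (price, price)


def reduce_by_key(pairs, func):
    """Local stand-in for reduceByKey: merge the values that share a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Hypothetical content of one 30-second batch
batch = [
    "1001,STOCK_A,100.0",
    "1002,STOCK_B,50.0",
    "1003,STOCK_A,101.0",
    "1004,STOCK_B,50.1",
]

pairs = [extract(line) for line in batch]

# (max, min) per stock, using the same reduce function as the notebook
max_min = reduce_by_key(
    pairs, lambda v1, v2: (max(v1[0], v2[0]), min(v1[1], v2[1]))
)

# Variation (%) per stock, then keep only those above 0.5%
variations = {k: 100.0 * (mx - mn) / mx for k, (mx, mn) in max_min.items()}
selected = {k: v for k, v in variations.items() if v > 0.5}

print(selected)
```

With this sample, STOCK_A varies by about 0.99% and is kept, while STOCK_B varies by about 0.2% and is filtered out.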

In [33]:
selectedStockIdsVariationsDStream.pprint()

In [38]:
# Start the computation
ssc.start()

In [None]:
# Run this application for 90 seconds
ssc.awaitTerminationOrTimeout(90)
ssc.stop(stopSparkContext=False)