# ***Spark Streaming - Stateful Computation***
The updateStateByKey transformation maintains a “state” for each key of a pair DStream. The state of each key is updated every time a new batch is processed.

The use of updateStateByKey is based on two steps:

 - **Define the state**
     - The state associated with each key can be of an arbitrary data type
     

 - **Define the state update function**
     - Specify a function that computes the new state of a key from its previous state and the new values associated with that key in the input stream
     
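In every batch, Spark applies the state update function to all existing keys, including keys that have no new values in that batch (for them, the list of new values is empty). If the update function returns None, the key is removed from the state.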
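
The two steps and the per-batch semantics can be mimicked in plain Python, without Spark, to see how the state evolves. The sketch below is a local simulation under simplifying assumptions, not Spark's implementation: `simulate_update_state_by_key` and `count_with_expiry` are hypothetical names, and the state is a `(count, idle_batches)` tuple to show that the state does not have to be a number. The expiry rule (drop a key after two consecutive batches without new values) is an illustrative policy, not something Spark applies on its own.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


# Hypothetical helper: local simulation of what updateStateByKey does in one batch
def simulate_update_state_by_key(
    state: Dict[Any, Any],
    batch: List[Tuple[Any, Any]],
    update_fn: Callable[[List[Any], Optional[Any]], Optional[Any]],
) -> Dict[Any, Any]:
    # Group the values of the current batch by key
    new_values: Dict[Any, List[Any]] = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)

    new_state: Dict[Any, Any] = {}
    # Update every key that already has a state or appears in the batch;
    # keys without new values receive an empty list
    for key in set(state) | set(new_values):
        updated = update_fn(new_values.get(key, []), state.get(key))
        # Returning None removes the key from the state
        if updated is not None:
            new_state[key] = updated
    return new_state


# Hypothetical update function whose state is a (count, idle_batches) tuple
def count_with_expiry(new_values, current):
    count, idle = current if current is not None else (0, 0)
    if new_values:
        return (count + sum(new_values), 0)
    # No new values: drop the key at its 2nd consecutive idle batch (illustrative policy)
    if idle + 1 >= 2:
        return None
    return (count, idle + 1)


state = {}
for batch in [[("a", 1), ("b", 1), ("a", 1)], [("b", 1)], [], []]:
    state = simulate_update_state_by_key(state, batch, count_with_expiry)
    print(sorted(state.items()))
# Printed states, one per batch:
# [('a', (2, 0)), ('b', (1, 0))]
# [('a', (2, 1)), ('b', (2, 0))]
# [('b', (2, 1))]
# []
```

In this simulation, as in Spark, returning None is what removes a key; otherwise the state keeps every key seen so far.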

### **Word Count - Stateful Version**
The cumulative word counts are updated every 5 seconds (the batch interval).

In [None]:
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# Set prefix of the output folders
outputPathPrefix = "resSparkStreamingExamples"

# Create a configuration object and
# set the name of the application
conf = SparkConf().setAppName("Streaming word count")

# Create a Spark Context object
# (getOrCreate reuses the existing context, e.g., the notebook's sc, if there is one)
sc = SparkContext.getOrCreate(conf=conf)

# Create a Spark Streaming Context object (batch interval of 5 seconds)
ssc = StreamingContext(sc, 5)

# Set the checkpoint folder (it is required by updateStateByKey)
ssc.checkpoint("checkpointfolder")

In [None]:
# Create a (Receiver) DStream that will connect to localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

# Apply a chain of transformations to perform the word count task
# Each transformation returns a new DStream
words = lines.flatMap(lambda line: line.split(" "))

wordsOnes = words.map(lambda word: (word, 1))
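
For a single batch, the `flatMap` and `map` steps behave like the list comprehensions below (plain Python, with hypothetical input lines). The example also shows a detail of `line.split(" ")`: consecutive spaces produce empty strings, which the streaming job would count as a word `''`; `line.split()` without arguments splits on runs of whitespace and drops them.

```python
# Hypothetical content of one batch; note the double space in the second line
batch_lines = ["to be or", "not to  be"]

# flatMap: each line produces zero or more words, flattened into a single list
words = [word for line in batch_lines for word in line.split(" ")]

# map: each word becomes a (word, 1) pair
words_ones = [(word, 1) for word in words]

print(words)       # ['to', 'be', 'or', 'not', 'to', '', 'be']
print(words_ones)
```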

In [None]:
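# Define the function used to update the state of one key at a time:
# newValues is the list of values of the key in the current batch,
# currentCount is its previous state (None if the key has no state yet)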

def updateFunction(newValues, currentCount):
    if currentCount is None:
        currentCount = 0
    
    # Sum the new values to the previous state for the current key
    return sum(newValues, currentCount)

# DStream made of cumulative counts for each key that get updated in every batch
totalWordsCounts = wordsOnes.updateStateByKey(updateFunction)
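
To see how `updateFunction` builds the cumulative count, it can be called by hand, outside Spark, with the values a single key might receive in three consecutive batches (hypothetical input). In the second batch the key receives no new values, so the function is called with an empty list and the count is carried over unchanged.

```python
def updateFunction(newValues, currentCount):
    if currentCount is None:
        currentCount = 0
    # Sum the new values to the previous state for the current key
    return sum(newValues, currentCount)


# Hypothetical values received by one key (e.g., "spark") in three batches
batches = [[1, 1, 1], [], [1]]

state = None  # the key has no state before the first batch
for newValues in batches:
    state = updateFunction(newValues, state)
    print(state)  # prints 3, then 3, then 4
```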

In [None]:
# Print the cumulative number of occurrences of each word
# (only the first 10 elements of each batch)
totalWordsCounts.pprint()

# Store the output of the computation in the folders with prefix
# outputPathPrefix
totalWordsCounts.saveAsTextFiles(outputPathPrefix, "")

# Start the computation
ssc.start()

# Run this application for 90 seconds
ssc.awaitTerminationOrTimeout(90)

ssc.stop(stopSparkContext=False)
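
The receiver connects to localhost:9999, so a process must be listening on that port and sending newline-terminated text lines while the application runs; the Spark documentation uses `nc -lk 9999` for this. Where netcat is not available, a minimal stand-in can be written with the Python standard library. The helpers `make_server` and `send_lines` are hypothetical; this sketch serves a single client and closes the connection after the last line.

```python
import socket
import time
from typing import Iterable


def make_server(host: str = "localhost", port: int = 9999) -> socket.socket:
    """Create a TCP socket listening on (host, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def send_lines(server: socket.socket, lines: Iterable[str], delay_s: float = 1.0) -> None:
    """Wait for one client (e.g., the Spark receiver) and send it one line at a time."""
    conn, _ = server.accept()  # blocks until the receiver connects
    with conn:
        for line in lines:
            # socketTextStream treats each newline-terminated line as one record
            conn.sendall((line + "\n").encode("utf-8"))
            time.sleep(delay_s)  # spread the lines over several batches
```

For example, calling `send_lines(make_server(), ["to be or not to be"] * 20)` in a separate Python process, started before the streaming cells, feeds one line per second to the stream.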